10720171

Audio Processing

Published: July 21, 2020
Assignee: not available in USPTO data we have

Patent Claims
36 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A method of audio processing, the method comprising: receiving a plurality of audio samples; concatenating the plurality of audio samples to form a composite audio signal; analysing the composite audio signal to identify audio artefacts associated with concatenation in the composite audio signal, wherein analysing comprises: monitoring an energy level of the composite audio signal; monitoring a rate of change of a tracking envelope of the composite audio signal; and identifying audio artefacts associated with concatenation based on both the monitored energy level of the composite audio signal and the monitored rate of change of the energy level of the composite audio signals; compensating for the identified audio artefacts to form a corrected composite audio signal; and providing the corrected composite audio signal to a voice biometrics module.

Plain English Translation

A method for audio processing involves receiving multiple audio samples and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. The analysis monitors both the audio signal's energy level and the rate of change of a tracking envelope of the signal. Audio artifacts are identified based on a combination of these monitored energy levels and their rates of change. After identification, these artifacts are compensated for, resulting in a corrected composite audio signal. This cleaned signal is then sent to a voice biometrics module for use in tasks like speaker identification or verification.
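To illustrate why concatenation can introduce artifacts, the following minimal Python sketch joins segments end to end and reports the sample-to-sample jump at each joint. The claim does not specify how the samples are joined, so the function names `concatenate` and `joint_steps` are hypothetical and the plain list representation is an assumption.

```python
def concatenate(segments):
    """Join segments end to end.

    Returns the composite signal and the index of the first sample
    of every segment after the first (the joints).
    """
    composite = []
    joints = []
    for segment in segments:
        if composite and segment:
            joints.append(len(composite))
        composite.extend(segment)
    return composite, joints


def joint_steps(composite, joints):
    """Absolute jump between the two samples on either side of each joint."""
    return [abs(composite[j] - composite[j - 1]) for j in joints]
```

A large value from `joint_steps` marks a joint where the waveform jumps abruptly, which is the kind of discontinuity that may be heard as a pop or click.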

Claim 2

Original Legal Text

2. A method according to claim 1, wherein the step of analysing the composite audio signal to identify audio artefacts associated with concatenation in the composite audio signal comprises: identifying a pop or click in the composite audio signal.

Plain English Translation

A method for audio processing involves receiving multiple audio samples and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. This analysis, which specifically includes identifying *pops or clicks* in the composite audio signal, monitors both the audio signal's energy level and the rate of change of a tracking envelope of the signal. Audio artifacts are identified based on a combination of these monitored energy levels and their rates of change. After identification, these artifacts are compensated for, resulting in a corrected composite audio signal. This cleaned signal is then sent to a voice biometrics module for use in tasks like speaker identification or verification.

Claim 3

Original Legal Text

3. A method according to claim 1, wherein the step of monitoring an energy level of the composite audio signal comprises: forming an energy tracking envelope of the composite audio signal.

Plain English Translation

A method for audio processing involves receiving multiple audio samples and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. The analysis monitors the audio signal's energy level *by forming an energy tracking envelope of the composite audio signal*, and also monitors the rate of change of a tracking envelope of the signal. Audio artifacts are identified based on a combination of the monitored energy level (derived from the energy tracking envelope) and the monitored rate of change of the energy level. After identification, these artifacts are compensated for, resulting in a corrected composite audio signal. This cleaned signal is then sent to a voice biometrics module for use in tasks like speaker identification or verification.
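One common way to form a tracking envelope is a one-pole follower with separate attack and release time constants. The sketch below applies that idea to the squared samples. The claim does not say how the envelope is computed, so the smoothing formula, the use of squared samples as "energy", and all function and parameter names here are assumptions.

```python
import math


def smoothing_coeff(time_constant_s, sample_rate_hz):
    """One-pole smoothing coefficient for a given time constant."""
    return math.exp(-1.0 / (time_constant_s * sample_rate_hz))


def energy_tracking_envelope(signal, attack_s, release_s, sample_rate_hz):
    """Follow the squared signal, rising with the attack constant
    and falling with the release constant (assumed design)."""
    attack = smoothing_coeff(attack_s, sample_rate_hz)
    release = smoothing_coeff(release_s, sample_rate_hz)
    envelope = []
    level = 0.0
    for sample in signal:
        energy = sample * sample
        # Use the attack coefficient while the input is above the envelope.
        coeff = attack if energy > level else release
        level = coeff * level + (1.0 - coeff) * energy
        envelope.append(level)
    return envelope
```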

Claim 4

Original Legal Text

4. A method according to claim 1, wherein the step of monitoring a rate of change of the energy level of the composite audio signal comprises: forming a signal tracking envelope of the composite audio signal; and determining a rate of change of the signal tracking envelope of the composite audio signal.

Plain English Translation

A method for audio processing involves receiving multiple audio samples and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. The analysis monitors the audio signal's energy level and also monitors the rate of change of the energy level *by forming a signal tracking envelope of the composite audio signal and then determining the rate of change of this signal tracking envelope*. Audio artifacts are identified based on a combination of the monitored energy level and the monitored rate of change of the energy level (derived from the signal tracking envelope's rate of change). After identification, these artifacts are compensated for, resulting in a corrected composite audio signal. This cleaned signal is then sent to a voice biometrics module for use in tasks like speaker identification or verification.

Claim 5

Original Legal Text

5. A method according to claim 4, wherein the signal tracking envelope has a faster attack time constant than the energy tracking envelope.

Plain English Translation

A method for audio processing involves receiving multiple audio samples and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. The analysis monitors the audio signal's energy level by forming an *energy tracking envelope* of the composite audio signal. It also monitors the rate of change of the energy level by forming a *signal tracking envelope* of the composite audio signal and determining its rate of change. Crucially, this *signal tracking envelope* is configured to have a *faster attack time constant* than the *energy tracking envelope*, allowing it to respond more quickly to sudden changes. Audio artifacts are identified based on both the monitored energy level (from the energy tracking envelope) and the rate of change (from the signal tracking envelope). After identification, these artifacts are compensated for, resulting in a corrected composite audio signal that is then sent to a voice biometrics module.
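The effect of a faster attack time constant can be seen by counting how many samples a one-pole envelope needs to reach a given fraction of a unit step. This is a sketch under the assumption of one-pole smoothing. The function name `samples_to_reach` and the time constants used in the usage note are illustrative and not taken from the patent.

```python
import math


def samples_to_reach(fraction, attack_s, sample_rate_hz):
    """Number of samples a one-pole envelope needs to rise from 0
    to `fraction` of a unit step input."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    coeff = math.exp(-1.0 / (attack_s * sample_rate_hz))
    level = 0.0
    count = 0
    while level < fraction:
        level = coeff * level + (1.0 - coeff)
        count += 1
    return count
```

Calling `samples_to_reach(0.9, 0.001, 16000)` for an assumed fast signal envelope and `samples_to_reach(0.9, 0.02, 16000)` for an assumed slower energy envelope shows the first settling in far fewer samples. A click therefore moves the signal envelope sharply while the energy envelope is still low.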

Claim 6

Original Legal Text

6. A method according to claim 4, wherein the step of monitoring an energy level of the composite audio signal comprises forming an energy tracking envelope of the composite audio signal, and wherein the step of identifying audio artefacts associated with concatenation based on both the monitored energy level of the composite audio signal and monitored rate of change of the energy level of the composite audio signal comprises: determining whether a parameter of the energy tracking envelope exceeds a first threshold level; determining whether the rate of change of the signal tracking envelope exceeds a second threshold level; and responsive to the parameter of the energy tracking envelope not exceeding the first threshold level, and the rate of change of the signal tracking envelope exceeding the second threshold level, identifying an audio artefact.

Plain English Translation

A method for audio processing involves receiving multiple audio samples and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. The analysis monitors the audio signal's energy level by forming an *energy tracking envelope*. It also monitors the rate of change of the energy level by forming a *signal tracking envelope* and determining its rate of change. Audio artifacts are specifically identified when two conditions are met: first, a parameter of the *energy tracking envelope does not exceed a first threshold level*; and second, the *rate of change of the signal tracking envelope exceeds a second threshold level*. If both conditions are true, an audio artifact is identified. After identification, these artifacts are compensated for, resulting in a corrected composite audio signal that is then sent to a voice biometrics module.
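The two-threshold test can be sketched as follows, given two envelopes that have already been computed. The claim leaves "a parameter of the energy tracking envelope" open, so this sketch assumes the parameter is the envelope value itself and takes the rate of change as the per-sample rise of the signal envelope. The function name and arguments are hypothetical.

```python
def detect_artefacts(energy_env, signal_env, energy_threshold, slope_threshold):
    """Return sample indices where the energy envelope is at or below
    its threshold while the signal envelope rises faster than its threshold."""
    hits = []
    for n in range(1, len(signal_env)):
        slope = signal_env[n] - signal_env[n - 1]  # per-sample rate of change
        low_energy = energy_env[n] <= energy_threshold
        fast_rise = slope > slope_threshold
        if low_energy and fast_rise:
            hits.append(n)
    return hits
```

Under these assumptions, a fast rise while the energy envelope is already high is treated as ordinary loud audio, and only a fast rise out of a low-energy region is flagged.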

Claim 7

Original Legal Text

7. A method according to claim 6, wherein the second threshold level is set based on a maximum expected slew rate of the composite audio signal.

Plain English Translation

A method for audio processing involves receiving multiple audio samples and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. The analysis monitors the audio signal's energy level by forming an *energy tracking envelope*. It also monitors the rate of change of the energy level by forming a *signal tracking envelope* and determining its rate of change. Audio artifacts are specifically identified when two conditions are met: first, a parameter of the *energy tracking envelope does not exceed a first threshold level*; and second, the *rate of change of the signal tracking envelope exceeds a second threshold level*. This *second threshold level is specifically set based on the maximum expected slew rate of the composite audio signal*. If both conditions are true, an audio artifact is identified. After identification, these artifacts are compensated for, resulting in a corrected composite audio signal that is then sent to a voice biometrics module.
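As a rough illustration of a maximum expected slew rate, a sinusoid of peak amplitude A and frequency f changes by at most 2πfA per second, or 2πfA divided by the sample rate per sample. The sketch below computes that bound. The claim does not say how the threshold is derived from the slew rate, so using this sinusoid bound as a starting point is an assumption, and both function names are hypothetical.

```python
import math


def max_slew_per_sample(peak_amplitude, max_freq_hz, sample_rate_hz):
    """Upper bound on the per-sample change of a sinusoid with the
    given peak amplitude and frequency."""
    return 2.0 * math.pi * max_freq_hz * peak_amplitude / sample_rate_hz


def largest_step(samples):
    """Largest absolute difference between consecutive samples."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))
```

A clean tone stays within the bound returned by `max_slew_per_sample`, whereas an abrupt jump at a joint can exceed it, which is why a threshold near this value may help separate clicks from ordinary audio.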

Claim 8

Original Legal Text

8. A method according to claim 4, wherein the step of monitoring an energy level of the composite audio signal comprises forming an energy tracking envelope of the composite audio signal, and wherein the step of identifying audio artefacts associated with concatenation based on both the monitored energy level of the composite audio signal and monitored rate of change of the energy level of the composite audio signal comprises determining whether the ratio of the rate of change of the signal tracking envelope and a parameter of the energy tracking envelope exceeds a third threshold level; and responsive to the ratio of the rate of change of the signal tracking envelope and the parameter of the energy tracking envelope exceeding the third threshold level, identifying an audio artefact.

Plain English Translation

A method for audio processing involves receiving multiple audio samples and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. The analysis monitors the audio signal's energy level by forming an *energy tracking envelope*. It also monitors the rate of change of the energy level by forming a *signal tracking envelope* and determining its rate of change. Audio artifacts are specifically identified when a calculated *ratio* exceeds a certain threshold: this ratio is derived from the *rate of change of the signal tracking envelope divided by a parameter of the energy tracking envelope*. If this ratio *exceeds a third threshold level*, an audio artifact is identified. After identification, these artifacts are compensated for, resulting in a corrected composite audio signal that is then sent to a voice biometrics module.
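The ratio test can be sketched in the same style as a single comparison. As before, the envelope value is assumed to be the "parameter" and the rate of change is the per-sample rise. The small `eps` term guarding against division by zero is an addition of this sketch, not something the claim mentions.

```python
def detect_by_ratio(energy_env, signal_env, ratio_threshold, eps=1e-9):
    """Return sample indices where the rise of the signal envelope,
    divided by the energy envelope, exceeds the threshold."""
    hits = []
    for n in range(1, len(signal_env)):
        slope = signal_env[n] - signal_env[n - 1]
        ratio = slope / (energy_env[n] + eps)  # eps avoids division by zero
        if ratio > ratio_threshold:
            hits.append(n)
    return hits
```

Compared with two separate thresholds, a single ratio scales the allowed rate of change with the current energy, so louder passages tolerate faster changes.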

Claim 9

Original Legal Text

9. A method according to claim 1, wherein the plurality of audio samples represent speech.

Plain English Translation

A method for audio processing involves receiving multiple audio samples, which *represent speech*, and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. The analysis monitors both the audio signal's energy level and the rate of change of a tracking envelope of the signal. Audio artifacts are identified based on a combination of these monitored energy levels and their rates of change. After identification, these artifacts are compensated for, resulting in a corrected composite audio signal. This cleaned signal is then sent to a voice biometrics module for use in tasks like speaker identification or verification.

Claim 10

Original Legal Text

10. A method according to claim 9, wherein the plurality of samples representing speech are received from a speaker diarisation process.

Plain English Translation

A method for audio processing involves receiving multiple audio samples, which represent speech and are *specifically received from a speaker diarisation process*, and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. The analysis monitors both the audio signal's energy level and the rate of change of a tracking envelope of the signal. Audio artifacts are identified based on a combination of these monitored energy levels and their rates of change. After identification, these artifacts are compensated for, resulting in a corrected composite audio signal. This cleaned signal is then sent to a voice biometrics module for use in tasks like speaker identification or verification.

Claim 11

Original Legal Text

11. A method according to claim 9, wherein the plurality of samples representing speech comprise a plurality of utterances received from multiple different sessions where an individual has provided speech to the system.

Plain English Translation

A method for audio processing involves receiving multiple audio samples, which represent speech and *comprise a plurality of utterances received from multiple different sessions where an individual has provided speech to the system*. These samples are then combined (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. The analysis monitors both the audio signal's energy level and the rate of change of a tracking envelope of the signal. Audio artifacts are identified based on a combination of these monitored energy levels and their rates of change. After identification, these artifacts are compensated for, resulting in a corrected composite audio signal. This cleaned signal is then sent to a voice biometrics module for use in tasks like speaker identification or verification.

Claim 12

Original Legal Text

12. A method according to claim 1, wherein the step of compensating for the identified audio artefacts to form a corrected composite audio signal comprises: applying a time-variable gain to the composite audio signal.

Plain English Translation

A method for audio processing involves receiving multiple audio samples and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. The analysis monitors both the audio signal's energy level and the rate of change of a tracking envelope of the signal. Audio artifacts are identified based on a combination of these monitored energy levels and their rates of change. Once identified, these artifacts are compensated for *by applying a time-variable gain to the composite audio signal*, resulting in a corrected composite audio signal. This cleaned signal is then sent to a voice biometrics module for use in tasks like speaker identification or verification.

Claim 13

Original Legal Text

13. A method according to claim 12, wherein the time-variable gain comprises a Gaussian profile.

Plain English Translation

A method for audio processing involves receiving multiple audio samples and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. The analysis monitors both the audio signal's energy level and the rate of change of a tracking envelope of the signal. Audio artifacts are identified based on a combination of these monitored energy levels and their rates of change. Once identified, these artifacts are compensated for *by applying a time-variable gain to the composite audio signal, wherein this time-variable gain comprises a Gaussian profile*, resulting in a corrected composite audio signal. This cleaned signal is then sent to a voice biometrics module for use in tasks like speaker identification or verification.
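A time-variable gain with a Gaussian profile can be sketched as a smooth dip centered on the detected artifact. The claim only says that the gain "comprises a Gaussian profile", so the dip shape, the `depth` and `sigma` parameters, and the function names below are assumptions made for illustration.

```python
import math


def gaussian_gain(length, centre, sigma, depth=1.0):
    """Gain curve equal to 1 far from `centre` and dipping to
    (1 - depth) at `centre`, following a Gaussian shape."""
    return [
        1.0 - depth * math.exp(-0.5 * ((n - centre) / sigma) ** 2)
        for n in range(length)
    ]


def apply_gain(signal, gain):
    """Multiply each sample by the gain at the same index."""
    return [x * g for x, g in zip(signal, gain)]
```

Because the Gaussian falls off smoothly, the attenuation fades in and out gradually rather than introducing a new discontinuity of its own.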

Claim 14

Original Legal Text

14. A method according to claim 1, wherein the method further comprises: using the corrected composite audio signal in a speaker enrolment process.

Plain English Translation

A method for audio processing involves receiving multiple audio samples and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. The analysis monitors both the audio signal's energy level and the rate of change of a tracking envelope of the signal. Audio artifacts are identified based on a combination of these monitored energy levels and their rates of change. After identification, these artifacts are compensated for, resulting in a corrected composite audio signal. This cleaned signal is then sent to a voice biometrics module, and *is further used in a speaker enrolment process*.

Claim 15

Original Legal Text

15. A method according to claim 1, wherein the method further comprises: using the corrected composite audio signal in a speaker verification process.

Plain English Translation

A method for audio processing involves receiving multiple audio samples and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. The analysis monitors both the audio signal's energy level and the rate of change of a tracking envelope of the signal. Audio artifacts are identified based on a combination of these monitored energy levels and their rates of change. After identification, these artifacts are compensated for, resulting in a corrected composite audio signal. This cleaned signal is then sent to a voice biometrics module, and *is further used in a speaker verification process*.

Claim 16

Original Legal Text

16. A system for audio processing, the system comprising: an input for receiving a plurality of audio samples; a processor, wherein the processor is configured for: concatenating the plurality of audio samples to form a composite audio signal; analysing the composite audio signal to identify audio artefacts associated with concatenation in the composite audio signal, wherein analysing comprises: monitoring an energy level of the composite audio signal; monitoring a rate of change of a tracking envelope of the composite audio signal; and identifying audio artefacts associated with concatenation based on both the monitored energy level of the composite audio signal and the monitored rate of change of the energy level of the composite audio signal; compensating for the identified audio artefacts to form a corrected composite audio signal; and an output for providing the corrected composite audio signal to a voice biometrics module.

Plain English Translation

A system for audio processing includes an input for receiving multiple audio samples. A processor within the system is configured to combine (concatenate) these samples into a composite audio signal. The processor then analyzes this composite signal to identify audio artifacts caused by concatenation. This analysis involves monitoring the composite audio signal's energy level and the rate of change of a tracking envelope of the signal. Based on both the monitored energy level and its rate of change, the processor identifies any audio artifacts. It then compensates for these identified artifacts to create a corrected composite audio signal. Finally, an output provides this corrected audio signal to a voice biometrics module.

Claim 17

Original Legal Text

17. A system according to claim 16, further comprising a voice biometrics module connected to said output.

Plain English Translation

A system for audio processing includes an input for receiving multiple audio samples. A processor within the system is configured to combine (concatenate) these samples into a composite audio signal. The processor then analyzes this composite signal to identify audio artifacts caused by concatenation. This analysis involves monitoring the composite audio signal's energy level and the rate of change of a tracking envelope of the signal. Based on both the monitored energy level and its rate of change, the processor identifies any audio artifacts. It then compensates for these identified artifacts to create a corrected composite audio signal. Finally, an output provides this corrected audio signal to a voice biometrics module, and the system *further includes this voice biometrics module connected to said output*.

Claim 18

Original Legal Text

18. A computer program product, comprising a non-transitory computer-readable medium, containing instructions for causing a suitably programmed processor to perform a method comprising: receiving a plurality of audio samples; concatenating the plurality of audio samples to form a composite audio signal; analysing the composite audio signal to identify audio artefacts associated with concatenation in the composite audio signal, wherein analysing comprises: monitoring an energy level of the composite audio signal; monitoring a rate of change of a tracking envelope of the composite audio signal; and identifying audio artefacts associated with concatenation based on both the monitored energy level of the composite audio signal and the monitored rate of change of the energy level of the composite audio signal; compensating for the identified audio artefacts to form a corrected composite audio signal; and providing the corrected composite audio signal to a voice biometrics module.

Plain English Translation

A computer program product, stored on a non-transitory computer-readable medium, contains instructions that, when executed by a processor, cause the processor to perform an audio processing method. This method includes receiving multiple audio samples and combining them (concatenating) into a composite audio signal. The instructions then guide the processor to analyze this composite signal to identify audio artifacts caused by concatenation. This analysis involves monitoring the composite audio signal's energy level and the rate of change of a tracking envelope of the signal. Audio artifacts are identified based on both the monitored energy level and its rate of change. The processor then compensates for these identified artifacts to create a corrected composite audio signal, and finally provides this corrected signal to a voice biometrics module.

Claim 19

Original Legal Text

19. A method of audio processing, the method comprising: receiving a plurality of audio samples; concatenating the plurality of audio samples to form a composite audio signal; analysing the composite audio signal to identify audio artefacts associated with concatenation in the composite audio signal; compensating for the identified audio artefacts to form a corrected composite audio signal; and providing the corrected composite audio signal to a voice biometrics module; reversing the composite audio signal; and analysing the reversed composite audio signal to identify audio artefacts associated with concatenation in the composite audio signal.

Plain English Translation

A method for audio processing involves receiving multiple audio samples and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. After this initial analysis, the method further processes the composite audio signal by *reversing it* and then *analyzing the reversed composite audio signal again to identify additional audio artifacts associated with concatenation*. Once identified, all audio artifacts are compensated for, resulting in a corrected composite audio signal. This cleaned signal is then sent to a voice biometrics module for use in tasks like speaker identification or verification.
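One plausible motivation for analyzing the reversed signal, which the claim itself does not state, is that a detector tuned to sudden rises will miss sudden drops. Reversing the signal turns each drop into a rise. The sketch below runs a simple rise detector in both directions and maps the reversed hits back to original indices. The detector and both function names are hypothetical stand-ins for whatever analysis is actually used.

```python
def find_rises(levels, slope_threshold):
    """Indices where the level rises by more than the threshold
    relative to the previous sample."""
    return [
        n for n in range(1, len(levels))
        if levels[n] - levels[n - 1] > slope_threshold
    ]


def find_rises_both_directions(levels, slope_threshold):
    """Run the rise detector on the signal and on its reverse,
    then merge the hits using original sample indices."""
    forward = find_rises(levels, slope_threshold)
    backward = find_rises(levels[::-1], slope_threshold)
    last = len(levels) - 1
    # A hit at index n in the reversed signal is index (last - n) in the
    # original, which is the final high sample before a sudden drop.
    mapped = [last - n for n in backward]
    return sorted(set(forward) | set(mapped))
```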

Claim 20

Original Legal Text

20. A method according to claim 19, wherein the step of analysing the composite audio signal to identify audio artefacts associated with concatenation in the composite audio signal comprises: monitoring an energy level of the composite audio signal; monitoring a rate of change of a tracking envelope of the composite audio signal; and identifying audio artefacts associated with concatenation based on both the monitored energy level of the composite audio signal and the monitored rate of change of the energy level of the composite audio signal.

Plain English Translation

A method for audio processing involves receiving multiple audio samples and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. This analysis involves *monitoring the audio signal's energy level, monitoring the rate of change of a tracking envelope of the composite audio signal, and identifying audio artifacts based on both the monitored energy level and the monitored rate of change of the energy level*. After this initial analysis, the method further processes the composite audio signal by reversing it and then analyzing the reversed composite audio signal again to identify additional audio artifacts associated with concatenation. Once identified, all audio artifacts are compensated for, resulting in a corrected composite audio signal. This cleaned signal is then sent to a voice biometrics module.

Claim 21

Original Legal Text

21. A method according to claim 20, wherein the step of monitoring an energy level of the composite audio signal comprises: forming an energy tracking envelope of the composite audio signal.

Plain English Translation

A method for audio processing involves receiving multiple audio samples and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. This analysis monitors the audio signal's energy level *by forming an energy tracking envelope of the composite audio signal*, and also monitors the rate of change of a tracking envelope of the composite audio signal. Audio artifacts are identified based on both the monitored energy level (from the energy tracking envelope) and the monitored rate of change of the energy level. After this initial analysis, the method further processes the composite audio signal by reversing it and then analyzing the reversed composite audio signal again to identify additional audio artifacts associated with concatenation. Once identified, all audio artifacts are compensated for, resulting in a corrected composite audio signal. This cleaned signal is then sent to a voice biometrics module.

Claim 22

Original Legal Text

22. A method according to claim 20, wherein the step of monitoring a rate of change of the energy level of the composite audio signal comprises: forming a signal tracking envelope of the composite audio signal; and determining a rate of change of the signal tracking envelope of the composite audio signal.

Plain English Translation

A method for audio processing involves receiving multiple audio samples and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. This analysis monitors the audio signal's energy level and also monitors the rate of change of the energy level *by forming a signal tracking envelope of the composite audio signal and determining a rate of change of the signal tracking envelope*. Audio artifacts are identified based on both the monitored energy level and the monitored rate of change of the energy level (derived from the signal tracking envelope's rate of change). After this initial analysis, the method further processes the composite audio signal by reversing it and then analyzing the reversed composite audio signal again to identify additional audio artifacts associated with concatenation. Once identified, all audio artifacts are compensated for, resulting in a corrected composite audio signal. This cleaned signal is then sent to a voice biometrics module.

Claim 23

Original Legal Text

23. A method according to claim 22, wherein the signal tracking envelope has a faster attack time constant than the energy tracking envelope.

Plain English Translation

A method for audio processing involves receiving multiple audio samples and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. This analysis monitors the audio signal's energy level by forming an *energy tracking envelope* and monitors the rate of change of the energy level by forming a *signal tracking envelope* and determining its rate of change. Crucially, this *signal tracking envelope* has a *faster attack time constant* than the *energy tracking envelope*. Audio artifacts are identified based on both envelopes. After this initial analysis, the method further processes the composite audio signal by reversing it and then analyzing the reversed composite audio signal again to identify additional audio artifacts. Once identified, all artifacts are compensated for, resulting in a corrected composite audio signal. This cleaned signal is then sent to a voice biometrics module.

Claim 24

Original Legal Text

24. A method according to claim 22, wherein the step of monitoring an energy level of the composite audio signal comprises forming an energy tracking envelope of the composite audio signal, and wherein the step of identifying audio artefacts associated with concatenation based on both the monitored energy level of the composite audio signal and monitored rate of change of the energy level of the composite audio signal comprises: determining whether a parameter of the energy tracking envelope exceeds a first threshold level; determining whether the rate of change of the signal tracking envelope exceeds a second threshold level; and responsive to the parameter of the energy tracking envelope not exceeding the first threshold level, and the rate of change of the signal tracking envelope exceeding the second threshold level, identifying an audio artefact.

Plain English Translation

A method for audio processing involves receiving multiple audio samples and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. This analysis monitors the audio signal's energy level by forming an *energy tracking envelope* and monitors the rate of change of the energy level by forming a *signal tracking envelope* and determining its rate of change. Audio artifacts are specifically identified when two conditions are met: first, a parameter of the *energy tracking envelope does not exceed a first threshold level*; and second, the *rate of change of the signal tracking envelope exceeds a second threshold level*. If both are true, an artifact is identified. After this initial analysis, the method reverses the composite signal and analyzes the reversed signal for additional artifacts. All identified artifacts are compensated for, producing a corrected signal sent to a voice biometrics module.

Claim 25

Original Legal Text

25. A method according to claim 24 , wherein the second threshold level is set based on a maximum expected slew rate of the composite audio signal.

Plain English Translation

A method for audio processing involves receiving multiple audio samples and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. This analysis monitors the audio signal's energy level by forming an *energy tracking envelope* and monitors the rate of change of the energy level by forming a *signal tracking envelope* and determining its rate of change. An audio artifact is identified when two conditions are both met: first, a parameter of the *energy tracking envelope does not exceed a first threshold level*; and second, the *rate of change of the signal tracking envelope exceeds a second threshold level*. This *second threshold level is set based on the maximum expected slew rate of the composite audio signal*. After this initial analysis, the method reverses the composite signal and analyzes the reversed signal for additional artifacts. All identified artifacts are compensated for, producing a corrected signal sent to a voice biometrics module.
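The claim does not say how the maximum expected slew rate is obtained. One standard bound, used here purely as an assumption, is that a sinusoid of peak amplitude A and frequency f has a maximum slope of 2πfA per second, so a signal limited to that amplitude and bandwidth should not change faster. The function name and the numeric values below are hypothetical.

```python
import math

def max_slew_per_sample(peak_amplitude, max_freq_hz, sample_rate_hz):
    """Upper bound on the slope of a sinusoid, expressed per sample period."""
    return 2.0 * math.pi * max_freq_hz * peak_amplitude / sample_rate_hz

# Illustrative numbers only: full-scale signal, 4 kHz content, 16 kHz sampling.
second_threshold = max_slew_per_sample(1.0, 4000.0, 16000.0)
```

A change faster than this bound would be hard to explain as ordinary band-limited audio, which is one way to motivate tying the second threshold to the slew rate.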

Claim 26

Original Legal Text

26. A method according to claim 22 , wherein the step of monitoring an energy level of the composite audio signal comprises forming an energy tracking envelope of the composite audio signal, and wherein the step of identifying audio artefacts associated with concatenation based on both the monitored energy level of the composite audio signal and monitored rate of change of the energy level of the composite audio signal comprises: determining whether the ratio of the rate of change of the signal tracking envelope and a parameter of the energy tracking envelope exceeds a third threshold level; and responsive to the ratio of the rate of change of the signal tracking envelope and the parameter of the energy tracking envelope exceeding the third threshold level, identifying an audio artefact.

Plain English Translation

A method for audio processing involves receiving multiple audio samples and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. This analysis monitors the audio signal's energy level by forming an *energy tracking envelope* and monitors the rate of change of the energy level by forming a *signal tracking envelope* and determining its rate of change. An audio artifact is identified when a *ratio* exceeds a *third threshold level*, where the ratio is the *rate of change of the signal tracking envelope divided by a parameter of the energy tracking envelope*. After this initial analysis, the method reverses the composite signal and analyzes the reversed signal for additional artifacts. All identified artifacts are compensated for, producing a corrected signal sent to a voice biometrics module.
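The ratio test collapses the two thresholds of the previous variant into one. A minimal sketch follows, assuming the ratio is slope divided by energy as the translation reads it. The function name, the small `floor` that guards against division by zero, and the example numbers are all hypothetical additions, not taken from the claim.

```python
def is_artefact(envelope_slope, energy_level, ratio_threshold, floor=1e-9):
    """True when slope / energy exceeds the threshold.

    `floor` is an assumed safeguard so that silence (energy of zero)
    does not cause a division by zero.
    """
    return envelope_slope / max(energy_level, floor) > ratio_threshold

# Illustrative values: a sharp rise over low energy versus a gentle rise
# over high energy.
click_like = is_artefact(0.72, 0.008, ratio_threshold=10.0)
speech_like = is_artefact(0.04, 0.5, ratio_threshold=10.0)
```

Because the slope is normalised by the energy, the same threshold can serve for quiet and loud passages, which may be why the claim offers it as an alternative to two separate thresholds.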

Claim 27

Original Legal Text

27. A method according to claim 19 , wherein the plurality of audio samples represent speech.

Plain English Translation

A method for audio processing involves receiving multiple audio samples, which *represent speech*, and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. After this initial analysis, the method further processes the composite audio signal by reversing it and then analyzing the reversed composite audio signal again to identify additional audio artifacts associated with concatenation. Once identified, all audio artifacts are compensated for, resulting in a corrected composite audio signal. This cleaned signal is then sent to a voice biometrics module for use in tasks like speaker identification or verification.

Claim 28

Original Legal Text

28. A method according to claim 27 , wherein the plurality of samples representing speech are received from a speaker diarisation process.

Plain English Translation

A method for audio processing involves receiving multiple audio samples, which represent speech and are *specifically received from a speaker diarisation process*, and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. After this initial analysis, the method further processes the composite audio signal by reversing it and then analyzing the reversed composite audio signal again to identify additional audio artifacts associated with concatenation. Once identified, all audio artifacts are compensated for, resulting in a corrected composite audio signal. This cleaned signal is then sent to a voice biometrics module for use in tasks like speaker identification or verification.

Claim 29

Original Legal Text

29. A method according to claim 27 , wherein the plurality of samples representing speech comprise a plurality of utterances received from multiple different sessions where an individual has provided speech to the system.

Plain English Translation

A method for audio processing involves receiving multiple audio samples, which represent speech and *comprise a plurality of utterances received from multiple different sessions where an individual has provided speech to the system*. These samples are then combined (concatenated) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. After this initial analysis, the method further processes the composite audio signal by reversing it and then analyzing the reversed composite audio signal again to identify additional audio artifacts associated with concatenation. Once identified, all audio artifacts are compensated for, resulting in a corrected composite audio signal. This cleaned signal is then sent to a voice biometrics module.

Claim 30

Original Legal Text

30. A method according to claim 19 , wherein the step of compensating for the identified audio artefacts to form a corrected composite audio signal comprises: applying a time-variable gain to the composite audio signal.

Plain English Translation

A method for audio processing involves receiving multiple audio samples and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. After this initial analysis, the method further processes the composite audio signal by reversing it and then analyzing the reversed composite audio signal again to identify additional audio artifacts associated with concatenation. Once identified, all audio artifacts are compensated for *by applying a time-variable gain to the composite audio signal*, resulting in a corrected composite audio signal. This cleaned signal is then sent to a voice biometrics module for use in tasks like speaker identification or verification.

Claim 31

Original Legal Text

31. A method according to claim 30 , wherein the time-variable gain comprises a Gaussian profile.

Plain English Translation

A method for audio processing involves receiving multiple audio samples and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. After this initial analysis, the method further processes the composite audio signal by reversing it and then analyzing the reversed composite audio signal again to identify additional audio artifacts associated with concatenation. Once identified, all audio artifacts are compensated for *by applying a time-variable gain to the composite audio signal, wherein this time-variable gain comprises a Gaussian profile*, resulting in a corrected composite audio signal. This cleaned signal is then sent to a voice biometrics module for use in tasks like speaker identification or verification.
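A time-variable gain with a Gaussian profile can be sketched as a smooth dip centred on the detected artifact. The claim gives no parameters, so the function names, the `width` and `depth` arguments, and the sample data below are assumptions chosen only to show the shape of the idea.

```python
import math

def gaussian_gain(length, centre, width, depth=1.0):
    """Gain curve close to 1 far from `centre`, dipping to 1 - depth at it."""
    return [1.0 - depth * math.exp(-0.5 * ((n - centre) / width) ** 2)
            for n in range(length)]

def suppress(samples, centre, width):
    """Multiply the signal by the Gaussian dip to attenuate one artifact."""
    gain = gaussian_gain(len(samples), centre, width)
    return [g * x for g, x in zip(gain, samples)]

# Made-up data: constant low-level signal with a click at index 50.
signal = [0.01] * 100
signal[50] = 0.8
corrected = suppress(signal, centre=50, width=3.0)
```

A smooth dip of this kind attenuates the click without introducing a new abrupt edge, which a hard mute of the same samples could do.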

Claim 32

Original Legal Text

32. A method according to claim 19 , wherein the method further comprises: using the corrected composite audio signal in a speaker enrolment process.

Plain English Translation

A method for audio processing involves receiving multiple audio samples and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. After this initial analysis, the method further processes the composite audio signal by reversing it and then analyzing the reversed composite audio signal again to identify additional audio artifacts associated with concatenation. Once identified, all audio artifacts are compensated for, resulting in a corrected composite audio signal. This cleaned signal is then sent to a voice biometrics module, and *is further used in a speaker enrolment process*.

Claim 33

Original Legal Text

33. A method according to claim 19 , wherein the method further comprises: using the corrected composite audio signal in a speaker verification process.

Plain English Translation

A method for audio processing involves receiving multiple audio samples and combining them (concatenating) into a single composite audio signal. This composite signal is then analyzed to detect specific audio artifacts caused by the concatenation. After this initial analysis, the method further processes the composite audio signal by reversing it and then analyzing the reversed composite audio signal again to identify additional audio artifacts associated with concatenation. Once identified, all audio artifacts are compensated for, resulting in a corrected composite audio signal. This cleaned signal is then sent to a voice biometrics module, and *is further used in a speaker verification process*.

Claim 34

Original Legal Text

34. A system for audio processing, the system comprising: an input for receiving a plurality of audio samples; a processor, wherein the processor is configured for: concatenating the plurality of audio samples to form a composite audio signal; analysing the composite audio signal to identify audio artefacts associated with concatenation in the composite audio signal; and compensating for the identified audio artefacts to form a corrected composite audio signal; reversing the composite audio signal; and analysing the reversed composite audio signal to identify audio artefacts associated with concatenation in the composite audio signal; and an output for providing the corrected composite audio signal to a voice biometrics module.

Plain English Translation

A system for audio processing includes an input for receiving multiple audio samples. A processor within the system is configured to combine (concatenate) these samples into a composite audio signal. The processor then analyzes this composite signal to identify audio artifacts caused by concatenation. It then compensates for these identified artifacts to create a corrected composite audio signal. Additionally, the processor is configured to *reverse the composite audio signal* and *analyze the reversed composite audio signal to identify further audio artifacts associated with concatenation*. Finally, an output provides this corrected audio signal to a voice biometrics module.
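The claims do not say why the reversed pass helps. One plausible reading, offered here as an assumption, is that a detector tuned to sudden rises misses sudden drops, and a drop becomes a rise once the signal is played backwards. The sketch below runs a toy detector in both directions and maps the reversed indices back; `rising_jumps` is a hypothetical stand-in, not the detector of the patent.

```python
def rising_jumps(samples, jump=0.3):
    """Toy detector: flags a sample whose magnitude rises sharply."""
    return [i for i in range(1, len(samples))
            if abs(samples[i]) - abs(samples[i - 1]) > jump]

def detect_both_directions(samples, detector):
    """Run `detector` forwards and on the reversed signal, then merge."""
    n = len(samples)
    forward = set(detector(samples))
    # Index i in the reversed signal corresponds to n - 1 - i in the original.
    backward = {n - 1 - i for i in detector(samples[::-1])}
    return sorted(forward | backward)

# Made-up data: a block that switches on abruptly and off abruptly.
signal = [0.0] * 10 + [0.5] * 10 + [0.0] * 10
found = detect_both_directions(signal, rising_jumps)
```

In this example the forward pass finds only the abrupt onset, and the reversed pass adds the abrupt ending.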

Claim 35

Original Legal Text

35. A system according to claim 34 , further comprising a voice biometrics module connected to said output.

Plain English Translation

A system for audio processing includes an input for receiving multiple audio samples. A processor within the system is configured to combine (concatenate) these samples into a composite audio signal. The processor then analyzes this composite signal to identify audio artifacts caused by concatenation. It then compensates for these identified artifacts to create a corrected composite audio signal. Additionally, the processor is configured to reverse the composite audio signal and analyze the reversed composite audio signal to identify further audio artifacts associated with concatenation. Finally, an output provides this corrected audio signal to a voice biometrics module, and the system *further includes this voice biometrics module connected to said output*.

Claim 36

Original Legal Text

36. A computer program product, comprising a non-transitory tangible computer-readable medium, containing instructions for causing a suitably programmed processor to perform a method comprising: receiving a plurality of audio samples; concatenating the plurality of audio samples to form a composite audio signal; analysing the composite audio signal to identify audio artefacts associated with concatenation in the composite audio signal; compensating for the identified audio artefacts to form a corrected composite audio signal; and providing the corrected composite audio signal to a voice biometrics module; reversing the composite audio signal; and analysing the reversed composite audio signal to identify audio artefacts associated with concatenation in the composite audio signal.

Plain English Translation

A computer program product, stored on a non-transitory tangible computer-readable medium, contains instructions that, when executed by a processor, cause the processor to perform an audio processing method. This method includes receiving multiple audio samples and combining them (concatenating) into a composite audio signal. The instructions then guide the processor to analyze this composite signal to identify audio artifacts caused by concatenation and compensate for them to form a corrected composite audio signal. Furthermore, the instructions cause the processor to *reverse the composite audio signal* and *analyze the reversed composite audio signal to identify additional audio artifacts associated with concatenation*. Finally, the corrected signal is provided to a voice biometrics module.

Patent Metadata

Filing Date

Unknown

Publication Date

July 21, 2020

Inventors

John Paul LESSO
Gordon Richard MCLEOD

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “AUDIO PROCESSING” (10720171). https://patentable.app/patents/10720171
