An encoding technique that encodes a sound signal at a low bit rate with reduced processing. The technique includes interval determination, which determines an interval T between samples, corresponding to the periodicity of an audio signal or to an integer multiple of its fundamental frequency, from a set S of candidates for T; and side information generation, which encodes the determined interval T to obtain side information. Interval determination selects T from a set S of Y candidates (Y < Z) drawn from the Z candidates representable with the side information. S includes Z2 candidates (Z2 < Z) selected without depending on the candidate subjected to interval determination in a previous frame a predetermined number of frames before the current frame, and it also includes that previous-frame candidate.
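The candidate-set construction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the Z representable intervals are consecutive integers, that the Z2 candidates form a fixed subsampled grid, and that the previous-frame candidate is supplied by the caller. The names `build_candidate_set`, `Z_VALUES` and `Z2` are hypothetical.

```python
def build_candidate_set(z2, previous, representable):
    """Build the candidate set S for the current frame (hypothetical helper).

    z2: candidates chosen without looking at earlier frames
    previous: candidate(s) from the frame a predetermined number of frames earlier
    representable: the Z values that the side information can encode
    """
    allowed = set(representable)
    s = {t for t in z2 if t in allowed}
    # Add the previous-frame candidate(s); S must stay within the Z codable values.
    s.update(t for t in previous if t in allowed)
    return sorted(s)


# Hypothetical parameters: Z = 64 intervals (a 6-bit index), Z2 = every 8th value.
Z_VALUES = list(range(8, 72))
Z2 = Z_VALUES[::8]  # 8 candidates that do not depend on the past
S = build_candidate_set(Z2, previous=[37], representable=Z_VALUES)
print(len(Z_VALUES), len(S), S)
```

Under these assumptions the search evaluates Y = 9 candidates instead of Z = 64, while in this sketch the side information can still be a 6-bit index into all Z values, because every member of S is one of them.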
Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A computer-implemented encoding method for encoding a sample string in a frequency domain that is derived from an audio signal in frames, executing on a processor, the method comprising: a step of receiving the sample string of the audio signal in the time-domain; a step of transforming the audio signal in the time-domain to the frequency-domain; an interval determination step of determining an interval T between samples from a set S of candidates for the interval T, the interval T corresponding to a periodicity of the audio signal or to an integer multiple of a fundamental frequency of the audio signal; a side information generating step of encoding the interval T determined at the interval determination step to obtain side information; outputting the side information to a decoder; a sample string encoding step of encoding a rearranged sample string to obtain a code string, the rearranged sample string (1) including all of the samples in the sample string, and (2) being a sample string in which at least some of the samples are rearranged so that all or some of one or a plurality of successive samples including a sample corresponding to the periodicity or the fundamental frequency of the audio signal in the sample string and one or a plurality of successive samples including a sample corresponding to an integer multiple of the periodicity or the fundamental frequency of the audio signal in the sample string are gathered together into a cluster on the basis of the interval T determined by the interval determination step; wherein the interval determination step determines the interval T from a set S of candidates for the interval T, the set S being made up of Y candidates among Z candidates for the interval T, the Y candidates including Z2 candidates selected without depending on a previous candidate for the interval T corresponding to a periodicity of the audio signal or to an integer multiple of a fundamental frequency of the audio signal, the previous candidate subjected to the interval determination step in a previous frame a predetermined number of frames before the current frame and including the previous candidate subjected to the interval determination step in the previous frame the predetermined number of frames before the current frame, the Z candidates being representable with the side information, where Z2 < Z and Y < Z; and outputting the code string to the decoder, wherein the code string has a compressed amount of data compared to the received sample string of the audio signal, and the decoder is configured to reproduce a sample string of an audio signal in the time-domain based on the code string and the side information.
An audio encoder receives a time-domain audio signal, transforms it into the frequency domain, and encodes it frame by frame. It determines an interval T, corresponding to the periodicity of the audio signal or to an integer multiple of its fundamental frequency, from a set S of candidate intervals, and encodes T as side information. The frequency-domain samples are rearranged on the basis of T so that samples at and around the periodicity and its integer multiples are gathered into clusters; no sample is discarded. The rearranged sample string is then encoded into a code string, which is output to the decoder together with the side information. The set S contains Y of the Z candidates representable by the side information: Z2 candidates chosen independently of earlier frames, plus the candidate subjected to interval determination in the frame a predetermined number of frames earlier. The code string is smaller than the original sample data, and the decoder reconstructs the time-domain audio from the code string and the side information.
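The rearrangement and the search over S can be illustrated with the sketch below. It is a hedged example under stated assumptions: each cluster is taken to be the sample at a multiple of T plus `half_width` neighbours on each side, and T is chosen by a hypothetical criterion (mean energy of the clustered samples). The claim fixes neither choice, and `rearrange`, `cluster_density` and `determine_interval` are illustrative names.

```python
def rearrange(x, t, half_width=1):
    """Gather the samples around every integer multiple of t to the front.

    Returns (rearranged, order, clustered): order[i] is the original index of
    rearranged[i] and is a permutation, so no sample is dropped; the first
    `clustered` entries are the gathered clusters.
    """
    if t < 1:
        raise ValueError("interval t must be a positive integer")
    n = len(x)
    taken = [False] * n
    order = []
    k = 1
    while k * t < n:
        centre = k * t
        # One cluster: the sample at k*t plus half_width neighbours per side.
        for i in range(max(0, centre - half_width), min(n, centre + half_width + 1)):
            if not taken[i]:
                taken[i] = True
                order.append(i)
        k += 1
    clustered = len(order)
    # Remaining samples follow in their original order.
    order.extend(i for i in range(n) if not taken[i])
    return [x[i] for i in order], order, clustered


def cluster_density(x, t, half_width=1):
    """Hypothetical criterion: mean energy of the clustered samples."""
    rearranged, _, clustered = rearrange(x, t, half_width)
    if clustered == 0:
        return 0.0
    return sum(v * v for v in rearranged[:clustered]) / clustered


def determine_interval(x, candidates, half_width=1):
    """Pick the T in S whose clusters concentrate the most energy per sample."""
    return max(candidates, key=lambda t: cluster_density(x, t, half_width))


x = [0.0] * 32
for i in range(5, 32, 5):
    x[i] = 8.0  # synthetic peaks every 5 bins
t = determine_interval(x, [3, 4, 5, 7])
rearranged, order, clustered = rearrange(x, t)
print(t, clustered, rearranged[:6])
```

Because `order` is a permutation, a decoder that knows T from the side information can undo the rearrangement exactly. Note that multiples of the true interval (10, 15) would score the same as 5 under this toy criterion, so it cannot resolve that ambiguity on its own.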
2. The encoding method according to claim 1, wherein the interval determination step further comprises an adding step of adding to the set S a value adjacent to the previous candidate subjected to the interval determination step in a previous frame the predetermined number of frames before the current frame and/or a value having a predetermined difference from the candidate.
In the audio encoding method of claim 1, the candidate set S is augmented with values adjacent to the candidate from the frame a predetermined number of frames earlier and/or values that differ from it by a predetermined amount. This lets the search track a periodicity that drifts slowly from frame to frame.
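The adding step above might look like the following sketch. It assumes "adjacent" means the next smaller and next larger value on the grid of representable intervals, and treats the predetermined difference as an optional parameter; `augment_with_previous` and `extra_diff` are hypothetical names.

```python
def augment_with_previous(s, previous_t, representable, extra_diff=None):
    """Add the previous-frame candidate, its grid neighbours and, optionally,
    values at +/- extra_diff to the candidate set s (hypothetical helper)."""
    grid = sorted(set(representable))
    out = set(s)
    if previous_t not in grid:
        return sorted(out)
    j = grid.index(previous_t)
    out.add(previous_t)
    # Adjacent values: the next smaller and next larger representable interval.
    for k in (j - 1, j + 1):
        if 0 <= k < len(grid):
            out.add(grid[k])
    # Values having a predetermined difference from the previous candidate.
    if extra_diff is not None:
        for t in (previous_t - extra_diff, previous_t + extra_diff):
            if t in grid:
                out.add(t)
    return sorted(out)


S = augment_with_previous([8, 16, 24, 32, 40, 48, 56, 64], 37, range(8, 72), extra_diff=4)
print(S)
```

Candidates outside the representable range are silently skipped, since the side information could not encode them anyway.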
3. The encoding method according to claim 1 or 2, wherein the interval determination step further comprises a preliminary selection step of selecting some of Z1 candidates among the Z candidates for the interval T representable with the side information as the Z2 candidates on the basis of an indicator obtainable from the audio signal and/or sample string in the current frame, where Z2 < Z1.
In the audio encoding method described previously, the Z2 candidates are chosen by a preliminary selection step: starting from Z1 of the Z representable intervals, the encoder keeps only some of them (Z2 < Z1), ranked by an indicator derived from the current frame's audio signal or its frequency-domain sample string. This reduces the computational cost of the subsequent search for the best interval.
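A sketch of the preliminary selection, assuming a hypothetical indicator (the mean magnitude of the frequency-domain samples at multiples of each candidate T); the claim does not say which indicator is used. `harmonic_indicator` and `preselect` are illustrative names.

```python
import heapq


def harmonic_indicator(x, t):
    """Hypothetical indicator: mean magnitude of the samples at multiples of t."""
    if t < 1:
        raise ValueError("interval must be positive")
    vals = [abs(x[i]) for i in range(t, len(x), t)]
    return sum(vals) / len(vals) if vals else 0.0


def preselect(x, z1_candidates, z2_count):
    """Keep the z2_count candidates (Z2 < Z1) with the largest indicator."""
    best = heapq.nlargest(z2_count, z1_candidates, key=lambda t: harmonic_indicator(x, t))
    return sorted(best)


x = [0.0] * 32
for i in range(5, 32, 5):
    x[i] = 8.0  # synthetic peaks every 5 bins
print(preselect(x, range(2, 17), 3))
```

In this toy input the true interval and its multiples (10, 15) all score equally, which is the kind of ambiguity a later, finer selection stage has to resolve.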
4. The encoding method according to claim 1 or 2, wherein the interval determination step further comprises: a preliminary selection step of selecting some of Z1 candidates among the Z candidates for the interval T representable with the side information on the basis of an indicator obtainable from the audio signal and/or sample string in the current frame; and a second adding step of selecting, as the Z2 candidates, a set of a candidate selected at the preliminary selection step and a value adjacent to the candidate selected at the preliminary selection step and/or a value having a predetermined difference from the candidate selected at the preliminary selection step.
In the audio encoding method described previously, the encoder first pre-selects some of Z1 candidate intervals (drawn from the Z representable values) on the basis of an indicator derived from the current frame. It then forms the Z2 candidates from the pre-selected candidates together with values adjacent to them and/or values that differ from them by a predetermined amount.
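The second adding step can be sketched as below: it expands the current frame's pre-selected candidates in the same way that claim 2 expands the previous-frame candidate. Adjacency is assumed to mean neighbours on the grid of representable intervals; `expand_preselected` and `extra_diff` are hypothetical.

```python
def expand_preselected(preselected, representable, extra_diff=None):
    """Form the Z2 set from pre-selected candidates plus their neighbours on
    the representable grid and, optionally, values at +/- extra_diff."""
    grid = sorted(set(representable))
    position = {t: j for j, t in enumerate(grid)}
    out = set()
    for t in preselected:
        j = position[t]  # pre-selected values are assumed to be representable
        out.add(t)
        for k in (j - 1, j + 1):
            if 0 <= k < len(grid):
                out.add(grid[k])
        if extra_diff is not None:
            for u in (t - extra_diff, t + extra_diff):
                if u in position:
                    out.add(u)
    return sorted(out)


print(expand_preselected([5, 10], range(2, 17)))
print(expand_preselected([5, 10], range(2, 17), extra_diff=5))
```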
5. The encoding method according to claim 1 or 2, wherein the interval determination step comprises: a second preliminary selection step of selecting some of candidates for the interval T that are included in the set S on the basis of an indicator obtainable from the audio signal and/or sample string in the current frame; and a final selection step of determining the interval T from a set made up of some of the candidates selected at the second preliminary selection step.
In the audio encoding method described previously, interval determination proceeds in two stages: a second preliminary selection picks some of the candidates in set S using an indicator derived from the current frame's audio signal or sample string, and a final selection step then determines T from among those pre-selected candidates only.
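The two-stage search above can be sketched generically: a cheap indicator ranks every candidate in S, and a more expensive criterion runs only on a shortlist. Both scoring functions in the usage example are placeholders, not the patent's indicators; `two_stage_select` and `keep` are hypothetical names.

```python
def two_stage_select(candidates, cheap_score, costly_score, keep):
    """Second preliminary selection followed by final selection.

    cheap_score: inexpensive indicator computed for every candidate in S
    costly_score: expensive criterion (e.g. an estimated code amount),
                  evaluated only on the `keep` shortlisted candidates
    """
    shortlist = sorted(candidates, key=cheap_score, reverse=True)[:keep]
    return max(shortlist, key=costly_score)


calls = []


def costly(t):
    calls.append(t)  # record how often the expensive step runs
    return -abs(t - 12)  # placeholder: pretend T = 12 is optimal


best = two_stage_select(range(8, 72), cheap_score=lambda t: -abs(t - 10),
                        costly_score=costly, keep=4)
print(best, len(calls))
```

The expensive criterion runs 4 times instead of 64. The trade-off is visible too: 12 never reaches the shortlist, so the final choice is the best shortlisted value, 11.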
6. The encoding method according to claim 1, wherein the greater an indicator indicating the degree of stationarity of the audio signal in the current frame, the greater the proportion of candidates subjected to the interval determination step in the previous frame the predetermined number of frames before the current frame to the set S is.
In the audio encoding method described previously, the algorithm adapts its search for the periodicity interval based on the stationarity of the audio signal. If the audio signal is highly stationary (doesn't change much over time), the set S of candidate periodicity intervals is weighted to include a greater proportion of values used in a previous frame.
7. The encoding method according to claim 1, wherein when an indicator indicating the degree of stationarity of the audio signal in the current frame is smaller than a predetermined threshold, only the Z2 candidates are included in the set S.
In the audio encoding method described previously, if a stationarity indicator shows the current audio frame is not stationary (changes rapidly), only the Z2 candidates (those independent of previous frames) are included in the candidate set S. This avoids biasing the search towards potentially outdated periodicity values.
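Claims 6 and 7 together can be sketched as one rule: below a threshold S holds only the Z2 candidates, and above it more previous-frame-based candidates are admitted as stationarity grows. The linear ramp (`step`), the threshold value and the name `compose_set` are hypothetical; the claims only require the proportion to grow with the indicator.

```python
def compose_set(z2, previous_based, stationarity, threshold, step=0.25):
    """Build S from the Z2 candidates and previous-frame-based candidates.

    previous_based is ordered by priority (previous candidate first, then
    e.g. its neighbours). Below the threshold only Z2 is used; above it,
    one more previous-based candidate is admitted per `step` of stationarity.
    """
    if stationarity < threshold:
        return sorted(set(z2))
    n_prev = min(len(previous_based), 1 + int((stationarity - threshold) / step))
    return sorted(set(z2) | set(previous_based[:n_prev]))


Z2 = [8, 16, 24, 32]
PREV = [37, 36, 38, 35, 39]
for s in (0.2, 0.5, 1.0):
    print(s, compose_set(Z2, PREV, s, threshold=0.5))
```

Here the share of previous-frame-based candidates in S rises from 0/4 to 1/5 to 3/7 as the indicator increases.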
8. The encoding method according to claim 6 or 7, wherein the indicator indicating the degree of stationarity of the audio signal in the current frame increases when at least one of the following conditions occurs: (a-1) that a prediction gain of the audio signal in the current frame increases, (a-2) that an estimated prediction gain of the audio signal in the current frame increases, (b-1) that the difference between a prediction gain of the audio signal in the frame immediately preceding the current frame and the prediction gain of the audio signal in the current frame decreases, (b-2) that the difference between an estimated prediction gain in the immediately preceding frame and the estimated prediction gain in the current frame decreases, (c-1) that the sum of amplitudes of samples of the audio signal included in the current frame increases, (c-2) that the sum of amplitudes of samples included in a sample string obtained by transforming a sample string of the audio signal included in the current frame into a frequency domain increases, (d-1) that the difference between the sum of amplitudes of samples of the audio signal included in the immediately preceding frame and the sum of amplitudes of samples of the audio signal included in the current frame decreases, (d-2) that the difference between the sum of amplitudes of samples included in a sample string obtained by transforming a sample string of the audio signal included in the immediately preceding frame into a frequency domain and the sum of amplitudes of samples included in a sample string obtained by transforming a sample string of the audio signal included in the current frame into a frequency domain decreases, (e-1) that power of the audio signal in the current frame increases, (e-2) that power of a sample string obtained by transforming a sample string of the audio signal in the current frame into a frequency domain increases, (f-1) that the difference between power of the audio signal in the immediately preceding frame and power of the audio signal in the current frame decreases, and (f-2) that the difference between power of a sample string obtained by transforming a sample string of the audio signal in the immediately preceding frame into a frequency domain and power of a sample string obtained by transforming a sample string of the audio signal in the current frame into a frequency domain decreases.
In the audio encoding method from the previous descriptions, the stationarity indicator increases when at least one of the following occurs: the prediction gain (actual or estimated) of the current frame increases; the difference in prediction gain between the immediately preceding frame and the current frame decreases; the sum of sample amplitudes in the current frame increases; the difference in that sum between consecutive frames decreases; the signal power in the current frame increases; or the difference in power between consecutive frames decreases. The amplitude and power conditions each come in a time-domain and a frequency-domain variant.
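One way to build such an indicator from conditions (a-1) and (b-1) only is sketched below. It assumes a first-order linear predictor, whose optimal prediction gain is 1 / (1 - rho^2) with rho = R(1) / R(0), and combines the gains with hypothetical weights `w_gain` and `w_diff`. Keeping `w_diff` below `w_gain` makes the indicator rise with the current gain and fall as the inter-frame difference grows.

```python
import math


def prediction_gain_db(frame):
    """Prediction gain of an optimal first-order predictor, in dB."""
    r0 = sum(v * v for v in frame)
    if r0 == 0.0:
        return 0.0
    r1 = sum(frame[n] * frame[n + 1] for n in range(len(frame) - 1))
    rho2 = min((r1 / r0) ** 2, 1.0 - 1e-12)  # guard against division by zero
    return 10.0 * math.log10(1.0 / (1.0 - rho2))


def stationarity_indicator(gain_cur_db, gain_prev_db, w_gain=1.0, w_diff=0.5):
    """Hypothetical indicator: rewards high current gain (a-1) and a small
    change in gain from the preceding frame (b-1)."""
    return w_gain * gain_cur_db - w_diff * abs(gain_cur_db - gain_prev_db)


smooth = [math.sin(0.1 * n) for n in range(256)]                 # highly predictable
rough = [1.0 if (n // 2) % 2 == 0 else -1.0 for n in range(256)]  # near-zero lag-1 correlation
g_smooth = prediction_gain_db(smooth)
g_rough = prediction_gain_db(rough)
print(round(g_smooth, 1), round(g_rough, 3))
```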
9. The encoding method according to claim 1, wherein the sample string encoding step comprises the step of outputting the code string obtained by encoding the sample string before being rearranged or the code string obtained by encoding the rearranged sample string and the side information, whichever has a smaller code amount.
In the audio encoding method described previously, the encoder outputs either the code string for the original (un-rearranged) samples, or the code string for the rearranged samples together with the side information, whichever has the smaller code amount.
10. The encoding method according to claim 1, wherein the sample string encoding step outputs the code string obtained by encoding the rearranged sample string and the side information when the sum of the code amount of or an estimated value of the code amount of the code string obtained by encoding the rearranged sample string and the code amount of the side information is smaller than the code amount of or an estimated value of the code amount of the code string obtained by encoding the sample string before being rearranged, and outputs the code string obtained by encoding the sample string before being rearranged when the code amount of or an estimated value of the code amount of the code string obtained by encoding the sample string before being rearranged is smaller than the sum of the code amount of or an estimated value of the code amount of the code string obtained by encoding the rearranged sample string and the code amount of the side information.
In the audio encoding method described previously, the encoder decides whether to output the code for the original or the rearranged samples by comparing actual or estimated code amounts. If the code for the rearranged samples plus the side information is smaller than the code for the original samples, the rearranged version is output; otherwise the original samples are encoded, keeping the overall bit rate as low as possible.
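The decision in claims 9 and 10 reduces to a single comparison. This sketch assumes the side information is a fixed-length index of ceil(log2 Z) bits and that the two code amounts come from some coder or estimator supplied by the caller. Ties go to the original samples, a case the claims do not address; `choose_output` is a hypothetical name.

```python
import math


def choose_output(bits_original, bits_rearranged, num_representable):
    """Return (mode, total_bits) for the cheaper of the two code strings."""
    side_bits = math.ceil(math.log2(num_representable))  # assumed fixed-length index
    if bits_rearranged + side_bits < bits_original:
        return "rearranged", bits_rearranged + side_bits
    return "original", bits_original


print(choose_output(500, 480, 64))  # side information costs 6 bits here
print(choose_output(500, 496, 64))
```

The second call shows why the side information must be counted: a 4-bit saving from rearrangement does not pay for a 6-bit index.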
11. The encoding method according to claim 9 or 10, wherein the proportion of candidates subjected to the interval determination step in the previous frame the predetermined number of frames before the current frame to the set S is greater when a code string output in the immediately preceding frame is a code string obtained by encoding a rearranged sample string than when a code string output in the immediately preceding frame is a code string obtained by encoding a sample string before being rearranged.
In the audio encoding method described previously, if the previous frame's output was encoded using the rearranged samples, the current frame's candidate set S for periodicity T will include a greater proportion of values used in previous frames. This prioritizes temporal consistency when the rearrangement method was successful in the previous frame.
12. The encoding method according to claim 9 or 10, wherein when a code string output in the immediately preceding frame is a code string obtained by encoding a sample string before being rearranged, the set S includes only the Z2 candidates.
In the audio encoding method described previously, if the previous frame was encoded using the original (un-rearranged) samples, the set S includes only the Z2 candidates (those independent of previous frames). This reduces the reliance on potentially incorrect or irrelevant periodicity values from prior frames.
13. The encoding method according to claim 9 or 10, wherein when the current frame is a temporally first frame, or when the immediately preceding frame is coded by an encoding method different from the encoding method, or when a code string output in the immediately preceding frame is a code string obtained by encoding a sample string before being rearranged, the set S includes only the Z2 candidates.
In the audio encoding method described previously, if the current frame is the first frame, or the immediately preceding frame was coded by a different encoding method, or if the immediately preceding frame's output was encoded using a sample string before being rearranged, the set S of candidates only includes the Z2 candidates.
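The frame-state rules of claims 11 to 13 can be sketched as a single function. The mode labels `"rearranged"` and `"original"`, the flag names and the cap `n_prev` are hypothetical; the sketch only encodes the stated conditions under which S falls back to the Z2 candidates.

```python
def candidate_set_for_frame(z2, previous_based, *, first_frame,
                            prev_used_other_codec, prev_mode, n_prev=3):
    """Choose S from the state of the immediately preceding frame.

    prev_mode says which code string the preceding frame output:
    "rearranged" or "original" (hypothetical labels).
    """
    if first_frame or prev_used_other_codec or prev_mode != "rearranged":
        return sorted(set(z2))  # claims 12 and 13: Z2 candidates only
    # Claim 11: more previous-frame candidates after a rearranged frame.
    return sorted(set(z2) | set(previous_based[:n_prev]))


Z2 = [8, 16, 24, 32]
PREV = [37, 36, 38]
print(candidate_set_for_frame(Z2, PREV, first_frame=False,
                              prev_used_other_codec=False, prev_mode="rearranged"))
print(candidate_set_for_frame(Z2, PREV, first_frame=False,
                              prev_used_other_codec=False, prev_mode="original"))
```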
14. A computer-implemented method for determining a periodic feature amount of an input audio signal in frames, executing on a processor, the method comprising: a step of receiving the audio signal in the time-domain; a step of transforming the audio signal in the time-domain to the frequency-domain; a periodic feature amount determination step of determining a periodic feature amount of the audio signal from a set of candidates for the periodic feature amount of the audio signal on a frame-by-frame basis; outputting the periodic feature amount of the audio signal; a side information generating step of encoding the periodic feature amount obtained at the periodic feature amount determination step to obtain side information; and outputting the side information, wherein the periodic feature amount determination step determines a periodic feature amount of the audio signal from a set S of candidates for the periodic feature amount of the audio signal, the set S being made up of Y candidates among Z candidates for the periodic feature amount of the audio signal, the Y candidates including Z2 candidates selected without depending on a previous candidate for the periodic feature amount of the audio signal, the previous candidate subjected to the periodic feature amount determination step in a previous frame a predetermined number of frames before the current frame and including the previous candidate subjected to the periodic feature amount determination step in the previous frame the predetermined number of frames before the current frame, the Z candidates being representable with the side information, where Z2 < Z and Y < Z; wherein the periodic feature amount of the audio signal is a fundamental frequency or pitch period of the audio signal, wherein the side information is configured to be outputted to a decoder along with a code string, the code string being generated by encoding a rearranged sample of the audio signal and having a compressed amount of data compared to the received sample string of the audio signal, and the decoder is configured to reproduce a sample string of an audio signal in the time-domain based on the code string and the side information.
A method determines a periodic feature amount (the fundamental frequency or pitch period) of an audio signal frame by frame. The signal is transformed to the frequency domain, the periodic feature amount is determined from a set S of candidate values, and side information representing it is generated and output. The set S contains Y of the Z values representable by the side information: Z2 values chosen independently of earlier frames, plus the candidate from the frame a predetermined number of frames earlier. The decoder uses the side information together with a code string, generated by encoding rearranged samples of the audio signal, to reproduce the audio signal.
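Claim 14 expresses the periodic feature as a fundamental frequency or pitch period, whereas claim 1 uses an interval T in the frequency domain. The sketch below relates the two under an assumption not stated in the claims: a transform (for example an MDCT) with N coefficients spaced uniformly from 0 to fs/2, so each bin spans fs/(2N) Hz. Harmonics of f0 are then T = 2N*f0/fs = 2N/P bins apart, where P = fs/f0 is the pitch period in samples. The function names and example parameters are hypothetical.

```python
def f0_to_interval(f0_hz, fs_hz, n_bins):
    """Harmonic spacing in bins, assuming n_bins coefficients spanning 0..fs/2."""
    return 2.0 * n_bins * f0_hz / fs_hz


def pitch_period_to_interval(period_samples, n_bins):
    """Same spacing from a pitch period P = fs / f0 given in samples."""
    return 2.0 * n_bins / period_samples


# Hypothetical example: 16 kHz sampling, 320 coefficients per 20 ms frame.
print(f0_to_interval(200.0, 16000.0, 320))
print(pitch_period_to_interval(80.0, 320))
```

In general T need not be an integer; how it is quantized onto the Z representable values is not specified in the claims.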
15. The periodic feature amount determination method according to claim 14, wherein the periodic feature amount determination step further comprises an adding step of adding to the set S a value adjacent to a candidate subjected to the periodic feature amount determination step in a previous frame the predetermined number of frames before the current frame and/or a value having a predetermined difference from the candidate.
In the periodic feature determination method described previously, the set S of candidate periodic feature amounts is augmented by adding values adjacent to a periodic feature amount used in a previous frame and/or values that have a predetermined difference from the candidate value.
16. The periodic feature amount determination method according to claim 14, wherein the greater an indicator indicating the degree of stationarity of the audio signal in the current frame, the greater the proportion of candidates subjected to the periodic feature determination step in the previous frame the predetermined number of frames before the current frame to the set S is.
In the periodic feature determination method described previously, the method uses a stationarity indicator. The more stationary the audio signal in the current frame, the greater the proportion of values subjected to the periodic feature determination step in a previous frame that are included in the set S.
17. The periodic feature amount determination method according to claim 16, wherein when the indicator indicating the degree of stationarity of the audio signal in the current frame is smaller than a predetermined threshold, only the Z2 candidates are included in the set S.
In the periodic feature determination method described previously, if the indicator of stationarity for the audio signal in the current frame is below a certain threshold, the method includes only the Z2 candidates in set S.
18. The periodic feature amount determination method according to claim 16 or 17, wherein the indicator indicating the degree of stationarity of the audio signal in the current frame increases when at least one of the following conditions occurs: (a-1) that a prediction gain of the audio signal in the current frame increases, (a-2) that an estimated prediction gain of the audio signal in the current frame increases, (b-1) that the difference between a prediction gain of the audio signal in the frame immediately preceding the current frame and the prediction gain of the audio signal in the current frame decreases, (b-2) that the difference between an estimated prediction gain in the immediately preceding frame and the estimated prediction gain in the current frame decreases, (c-1) that the sum of amplitudes of samples of the audio signal included in the current frame increases, (c-2) that the sum of amplitudes of samples included in a sample string obtained by transforming a sample string of the audio signal included in the current frame into a frequency domain increases, (d-1) that the difference between the sum of amplitudes of samples of the audio signal included in the immediately preceding frame and the sum of amplitudes of samples of the audio signal included in the current frame decreases, (d-2) that the difference between the sum of amplitudes of samples included in a sample string obtained by transforming a sample string of the audio signal included in the immediately preceding frame into a frequency domain and the sum of amplitudes of samples included in a sample string obtained by transforming a sample string of the audio signal included in the current frame into a frequency domain decreases, (e-1) that power of the audio signal in the current frame increases, (e-2) that power of a sample string obtained by transforming a sample string of the audio signal in the current frame into a frequency domain increases, (f-1) that the difference between power of the audio signal in the immediately preceding frame and power of the audio signal in the current frame decreases, and (f-2) that the difference between power of a sample string obtained by transforming a sample string of the audio signal in the immediately preceding frame into a frequency domain and power of a sample string obtained by transforming a sample string of the audio signal in the current frame into a frequency domain decreases.
In the periodic feature determination method from the previous descriptions, the stationarity indicator increases when at least one of the following occurs: the prediction gain (actual or estimated) of the current frame increases; the difference in prediction gain between the immediately preceding frame and the current frame decreases; the sum of sample amplitudes in the current frame increases; the difference in that sum between consecutive frames decreases; the signal power in the current frame increases; or the difference in power between consecutive frames decreases. The amplitude and power conditions each come in a time-domain and a frequency-domain variant.
19. An encoder encoding a sample string in a frequency domain that is derived from an audio signal in frames, the encoder comprising a processor configured to act as: a frequency-domain transform unit that receives the sample string of the audio signal in the time domain and transforms the audio signal in the time-domain to the frequency-domain; an interval determination unit that determines an interval T between samples from a set S of candidates for the interval T, the interval T corresponding to a periodicity of the audio signal or to an integer multiple of a fundamental frequency of the audio signal; a side information generating unit that encodes the interval T determined by the interval determination unit to obtain side information and outputs the side information to a decoder; a sample string encoding unit that encodes a rearranged sample string to obtain a code string and outputs the code string to the decoder, the rearranged sample string (1) including all of the samples in the sample string, and (2) being a sample string in which at least some of the samples are rearranged so that all or some of one or a plurality of successive samples including a sample corresponding to the periodicity or the fundamental frequency of the audio signal in the sample string and one or a plurality of successive samples including a sample corresponding to an integer multiple of the periodicity or the fundamental frequency of the audio signal in the sample string are gathered together into a cluster on the basis of the interval T determined by the interval determination unit; wherein the interval determination unit determines the interval T from a set S of candidates for the interval T, the set S being made up of Y candidates among Z candidates for the interval T, the Y candidates including Z2 candidates selected without depending on a previous candidate for the interval T corresponding to a periodicity of the audio signal or to an integer multiple of a fundamental frequency of the audio signal, the previous candidate subjected to processing by the interval determination unit in a previous frame a predetermined number of frames before the current frame and including the previous candidate subjected to the processing by the interval determination unit in the previous frame the predetermined number of frames before the current frame, the Z candidates being representable with the side information, where Z2 < Z and Y < Z, wherein the code string and the side information have a compressed amount of data compared to the received sample string of the audio signal, and the decoder is configured to reproduce a sample string of an audio signal in the time-domain based on the code string and the side information.
An audio encoder transforms an audio signal to the frequency domain and processes it frame by frame. It determines the interval T (corresponding to the periodicity or an integer multiple of the fundamental frequency) from a candidate set S, encodes T as side information, and outputs it. The samples are rearranged on the basis of T so that related samples are clustered, then encoded into a code string that is output along with the side information. S contains Y of the Z representable candidates: Z2 candidates independent of past frames, plus the candidate from the frame a predetermined number of frames earlier. The decoder reconstructs the time-domain audio from this compressed data.
20. The encoder according to claim 19, wherein the sample string encoding unit outputs the code string obtained by encoding the rearranged sample string and the side information when the sum of the code amount of or an estimated value of the code amount of the code string obtained by encoding the rearranged sample string and the code amount of the side information is smaller than the code amount of or an estimated value of the code amount of the code string obtained by encoding the sample string before being rearranged, and outputs the code string obtained by encoding the sample string before being rearranged when the code amount of or an estimated value of the code amount of the code string obtained by encoding the sample string before being rearranged is smaller than the sum of the code amount of or an estimated value of the code amount of the code string obtained by encoding the rearranged sample string and the code amount of the side information.
The audio encoder described previously decides whether to output the code for the original or the rearranged samples by comparing actual or estimated code amounts. If the code for the rearranged samples plus the side information is smaller than the code for the original samples, the encoder outputs the rearranged version; otherwise, it outputs the code for the original samples.
21. A periodic feature amount determination apparatus determining a periodic feature amount of an input audio signal in frames, the apparatus comprising a processor configured to act as: a frequency-domain transform unit that receives the sample string of the audio signal in the time domain and transforms the audio signal in the time-domain to the frequency-domain; a periodic feature amount determination unit that determines a periodic feature amount of the audio signal from a set of candidates for the periodic feature amount on a frame-by-frame basis and outputs the periodic feature amount of the audio signal; and a side information generating unit that encodes the periodic feature amount obtained at the periodic feature amount determination unit to obtain side information and outputs the side information; wherein the periodic feature amount determination unit determines a periodic feature amount of the audio signal from a set S of candidates for the periodic feature amount of the audio signal, the set S being made up of Y candidates among Z candidates for the periodic feature amount of the audio signal, the Y candidates including Z2 candidates selected without depending on a previous candidate for the periodic feature amount of the audio signal, the previous candidate subjected to the periodic feature amount determination unit in a previous frame a predetermined number of frames before the current frame and including the previous candidate subjected to the periodic feature amount determination unit in the previous frame the predetermined number of frames before the current frame, the Z candidates being representable with the side information, where Z2 < Z and Y < Z; wherein the periodic feature amount of the audio signal is a fundamental frequency or pitch period of the audio signal, wherein the side information is configured to be outputted to a decoder along with a code string, the code string being generated by encoding a rearranged sample of the audio signal and having a compressed amount of data compared to the received sample string of the audio signal, and the decoder is configured to reproduce a sample string of an audio signal in the time-domain based on the code string and the side information.
An apparatus determines the periodic feature amount (fundamental frequency or pitch period) of an audio signal frame by frame. It transforms the signal to the frequency domain, determines the periodic feature amount from a candidate set S, and generates and outputs side information for it. The set S contains Y of the Z representable values: Z2 values chosen independently of earlier frames, plus the candidate from the frame a predetermined number of frames earlier. The decoder uses the side information together with a code string, generated by encoding rearranged samples, to reproduce the audio signal.
22. A non-transitory computer-readable recording medium having recorded thereon a computer program for causing a computer to execute the steps of the encoding method according to claim 1 or the periodic feature amount determination method according to claim 14.
A non-transitory computer-readable medium stores a program that causes a computer to perform either the audio encoding method of claim 1, which transforms an audio signal into the frequency domain and encodes it frame by frame, or the periodic feature determination method of claim 14, which determines the fundamental frequency or pitch period of an audio signal frame by frame.
January 18, 2012
July 18, 2017