According to the present invention, multiple parametrically encoded audio signals can be efficiently combined using an audio signal generator, which generates an audio output signal by combining the down-mix channels and the associated parameters of the audio signals directly within the parameter domain, i.e. without reconstructing or decoding the individual input audio signals prior to the generation of the audio output signal. This is achieved by direct mixing of the associated down-mix channels of the individual input signals. It is one key feature of the present invention that the combination of the down-mix channels is achieved by simple, computationally inexpensive arithmetic operations.
Original claims text from the patent document.
Claim 1: Audio signal generator for generating an audio output signal, comprising:
Claim 2: Audio signal generator in accordance with, in which the channel combiner is operative to derive the combined down-mix channel using a linear combination of the first and the second down-mix channel.
Claim 3: Audio signal generator in accordance with, in which the channel combiner is operative to use a linear combination having coefficients depending on the energy E{s_A^2(n)} within the first down-mix channel and on the energy E{s_B^2(n)} within the second down-mix channel.
Claim 4:
Claim 5: Audio signal generator in accordance with, in which the channel combiner is operative to use a linear combination having coefficients depending on the number U of the first original channels and the number V of the second original channels.
Claim 6:
Claim 7: Audio signal generator in accordance with, in which the parameter calculator is operative to use a predetermined channel of the first original channels or the at least one second original channel as common reference channel.
Claim 8: Audio signal generator in accordance with, in which the parameter calculator is operative to use the reference channel of the first audio signal as the common reference channel.
Claim 9: Audio signal generator in accordance with, in which the parameter calculator is operative to use the combined down-mix channel as the common reference channel.
Claim 10: Audio signal generator in accordance with, in which the parameter calculator is operative to use the original channel as the common reference channel which has the highest energy.
Claim 11:
Claim 12: Audio signal generator in accordance with, in which the parameter calculator is operative to use the reference channel as the common reference channel and the original parameter a as first combined parameter y_u and to derive the second combined parameter y_{u+1} for the at least one second original channel with respect to the reference channel.
Claim 13: Audio signal generator in accordance with, in which the parameter calculator is operative to derive the combined parameters using the energy E{s_A^2(n)} of the first down-mix channel and the energy E{s_B^2(n)} of the second down-mix channel.
Claim 14: Audio signal generator in accordance with, in which the parameter calculator is operative to further use coefficients g_A associated to the first down-mix channel and g_B associated to the second down-mix channel, the coefficients used for the linear combination of the first and second down-mix used by the channel combiner.
Claim 15:
Claim 16: Audio signal generator in accordance with, in which the parameter calculator is operative to process frequency-portions of the first and the second down-mix channels associated with discrete frequency intervals such that combined parameters are derived for each discrete frequency interval.
Claim 17: Audio signal generator in accordance with, in which the audio signal receiver is operative to receive audio signals comprising down-mix channels represented by sampling parameters sampled with a predetermined sample frequency.
Claim 18: Method of generating an audio output signal, the method comprising:
Claim 19: Conferencing System having an audio signal generator for generating an audio output signal, comprising:
Claim 20: A non-transitory storage medium having stored thereon a computer program for, when running on a computer, implementing a method for generating an audio output signal, the method comprising:
Claim 21: Method of generating an audio output signal, the method comprising:
Claim 22: Method in accordance with, wherein the combined down-mix channel is derived using a linear combination which has coefficients depending on the energy within the first down-mix channel and on the energy within the second down-mix channel.
Claim 23:
Claim 24: Method in accordance with, the linear combination having coefficients depending on the number U of the first original channels and the number V of the second original channels.
Claim 25:
Claim 26: Method in accordance with, further comprising using a predetermined channel of the first original channels or the at least one second original channel as common reference channel.
Claim 27: Method in accordance with, further comprising using the reference channel of the first audio signal as the common reference channel.
Claim 28: Method in accordance with, further comprising using the combined down-mix channel as the common reference channel.
Claim 29: Method in accordance with, further comprising using the original channel as the common reference channel which has the highest energy.
Claim 30:
Claim 31: Method in accordance with, further comprising using the reference channel as the common reference channel and the original parameter a as first combined parameter y and deriving the second combined parameter ỹ for the at least one second original channel with respect to the reference channel.
Claim 32: Method in accordance with, further comprising deriving the combined parameters using the energy E{s_A^2(n)} of the first down-mix channel and the energy E{s_B^2(n)} of the second down-mix channel.
Claim 33: Method in accordance with, further comprising using coefficients g_A associated to the first down-mix channel and g_B associated to the second down-mix channel, the coefficients used for the linear combination of the first and second down-mix used by the channel combiner.
Claim 34:
Claim 35: Method in accordance with, further comprising processing frequency-portions of the first and the second down-mix channels associated with discrete frequency intervals such that combined parameters are derived for each discrete frequency interval.
Claim 36: Method in accordance with, further comprising receiving audio signals comprising down-mix channels represented by sampling parameters sampled with a predetermined sample frequency.
Claim 37: A non-transitory storage medium having stored thereon a computer program for, when running on a computer, implementing a method for generating an audio output signal, the method comprising:
Complete technical specification and implementation details from the patent document.
Notice: More than one reissue application has been filed on Aug. 2, 2024. Applicant has filed the following continuation reissues of U.S. patent application Ser. No. 18/793,472 filed Aug. 2, 2024, which itself is an application for reissue of issued U.S. Pat. No. 8,139,775, issued Mar. 20, 2012 (U.S. application Ser. No. 11/739,544 filed Apr. 24, 2007). These reissue applications include continuation reissue application Ser. Nos. 18/793,552, 18/793,601, 18/793,626, 18/793,642, 18/793,654.
This is a continuation reissue application of U.S. patent application Ser. No. 18/793,472 filed Aug. 2, 2024, which is an application for reissue of issued U.S. Pat. No. 8,139,775, issued Mar. 20, 2012 (U.S. application Ser. No. 11/739,544 filed Apr. 24, 2007), which claims priority to U.S. Provisional Application No. 60/819,419, filed Jul. 7, 2006, all of which are incorporated herein by reference in their entirety.
The present invention relates to multi-channel audio coding and, in particular, to a concept of combining parametrically coded audio-streams in a flexible and efficient way.
The recent development in the area of audio coding has brought forward several parametric audio coding techniques for jointly coding a multi-channel audio signal (e.g. 5.1 channels) into one (or more) down-mix channel plus a side information stream. Generally, the side information stream has parameters relating to properties of the original channels of the multi-channel signal either with respect to other original channels of the multi-channel signal or with respect to the down-mix channel. The particular definition of the parameters and of the reference channel, to which these parameters relate, depends on the specific implementation. Some of the techniques known in the art are “binaural cue coding”, “spatial audio coding”, and “parametric stereo”.
For details of these particular implementations, reference is herewith made to related publications. Binaural cue coding is for example detailed in:
C. Faller and F. Baumgarte, “Efficient representation of spatial audio using perceptual parametrization,” IEEE WASPAA, Mohonk, N.Y., October 2001; F. Baumgarte and C. Faller, “Estimation of auditory spatial cues for binaural cue coding,” ICASSP, Orlando, Fla., May 2002; C. Faller and F. Baumgarte, “Binaural cue coding: a novel and efficient representation of spatial audio,” ICASSP, Orlando, Fla., May 2002; C. Faller and F. Baumgarte, “Binaural cue coding applied to audio compression with flexible rendering,” AES 113th Convention, Los Angeles, Preprint 5686, October 2002; C. Faller and F. Baumgarte, “Binaural Cue Coding—Part II: Schemes and applications,” IEEE Trans. on Speech and Audio Proc., vol. 11, no. 6, November 2003.
While binaural cue coding uses multiple original channels, parametric stereo is a related technique for the parametric coding of a two-channel stereo signal resulting in a transmitted mono signal and parameter side information, as for example reviewed in the following publications: J. Breebaart, S. van de Par, A. Kohlrausch, E. Schuijers, “High-Quality Parametric Spatial Audio Coding at Low Bitrates”, AES 116th Convention, Berlin, Preprint 6072, May 2004; E. Schuijers, J. Breebaart, H. Purnhagen, J. Engdegard, “Low Complexity Parametric Stereo Coding”, AES 116th Convention, Berlin, Preprint 6073, May 2004.
Other technologies are based on multiplexing of arbitrary numbers of audio sources or objects into a single transmission audio channel. Schemes based on multiplexing are, for example, introduced as “flexible rendering” in BCC (binaural cue coding) related publications or, more recently, by a scheme called “joint source coding” (JSC). Related publications are, for example: C. Faller, “Parametric Joint Coding of Audio Sources”, Convention Paper 6752, 120th AES Convention, Paris, May 2006. Similar to the parametric stereo and binaural cue coding schemes, these techniques are intended to encode multiple original audio objects (channels) for transmission by fewer down-mix channels. By additionally deriving object-based parameters for each input channel, which can be encoded at a very low data rate and which are also transmitted to a receiver, these objects can be separated at the receiver side and rendered (mixed) to a certain number of output devices, as for example headphones, two-channel stereo loudspeakers, or multi-channel loudspeaker set-ups. This approach allows for level adjustment and redistribution (panning) of the different audio objects to different locations in the reproduction set-up, i.e. at the receiver side.
Basically, such techniques operate as an M-k-N transmitter, with M being the number of audio objects at the input and k being the number of transmitted down-mix channels, typically k≤2. N is the number of audio channels at the renderer output, i.e. for example the number of loudspeakers. That is, N=2 for a stereo renderer or N=6 for a 5.1 multi-channel speaker set-up. In terms of compression efficiency, typical values are e.g. 64 kbps or less for a perceptually coded down-mix channel (consisting of k audio channels) and approximately 3 kbps for object parameters per transmitted audio object.
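As a rough illustration of these figures, the total rate of such a stream can be estimated by adding the down-mix rate to the per-object parameter rate. The following is a minimal sketch that assumes the typical values quoted above; the function name and its defaults are hypothetical and not taken from any codec.

```python
def total_bitrate_kbps(num_objects, downmix_kbps=64.0, per_object_kbps=3.0):
    """Estimate the rate of an M-k-N stream in kbps.

    Assumes one perceptually coded down-mix (default 64 kbps) plus a
    fixed parameter rate per transmitted audio object (default 3 kbps).
    """
    if num_objects < 1:
        raise ValueError("at least one audio object is required")
    return downmix_kbps + per_object_kbps * num_objects


# Example: M = 4 audio objects sharing one down-mix.
estimate = total_bitrate_kbps(4)
```

With these assumed values, the parameter side information for four objects adds 12 kbps to the 64 kbps down-mix.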
Application scenarios for the above techniques are for example encoding of spatial audio scenes related to cinema-movie productions to allow for a spatial reproduction of sound in a home-theatre system. Common examples are the widely known 5.1 and 7.1 surround-sound tracks on movie media, such as DVD and the like. Movie-productions are becoming more and more complex with respect to the audio-scenes, which are intended to provide a spatial listening experience and thus have to be mixed with great care. Different sound engineers may be commissioned with the mixing of different surround sources or sound-effects and therefore, transmission of parametrically encoded multi-channel scenarios between the individual sound engineers is desirable, to transport the audio-streams of the individual sound engineers efficiently.
Another application scenario for such a technology is teleconferencing with multiple talkers at either end of a point-to-point connection. To save bandwidth, most teleconferencing set-ups operate with monophonic transmission. Using, for example, joint source coding or one of the other multi-channel encoding techniques for transmission, redistribution and level-alignment of the different talkers at the receiving end (each end) can be achieved and thus the intelligibility and balance of the speakers is enhanced by spending a marginally increased bit rate as compared to a monophonic system. The advantage of increased intelligibility becomes particularly evident in the special case of assigning each individual participant of the conference to a single channel (and thus speaker) of a multi-channel speaker set-up at a receiving end. This, however, is a special case. In general, the number of participants will not match the number of speakers at the receiving end. However, using the existing speaker setup it is possible to render the signal associated with each participant such that it appears to be originating from any desired position. That is, the individual participant is not only recognized by his/her different voice but also by the location of the audio-source related to the talking participant.
While the state of the art techniques implement concepts as to how to efficiently encode multiple channels or audio objects, all of the presently known techniques lack the possibility to combine two or more of these transmitted audio-streams efficiently to derive an output stream (output signal), which is a representation of all of the input audio-streams (input audio signals).
The problem arises, for example, when a teleconferencing scenario with more than two locations is considered, each location having one or more speakers. Then, an intermediate instance is required to receive the audio input signals of the individual sources and to generate an audio output signal for each teleconferencing location having only the information of the remaining teleconferencing locations. That is, the intermediate instance has to generate an output signal, which is derived from a combination of two or more audio input signals and which allows for a reproduction of the individual audio channels or audio objects of the two or more input signals.
A similar scenario may occur when two audio-engineers in a cinema-movie production want to combine their spatial-audio signals to check for the listening impression generated by both signals. Then, it may be desirable to directly combine two encoded multi-channel signals to check for the combined listening impression. That is, a combined signal needs to be such that it resembles all of the audio objects (sources) of the two audio-engineers.
However, according to prior art techniques, such a combination is only feasible by decoding of the audio signals (streams). Then, the decoded audio signals may again be re-encoded by prior art multi-channel encoders to generate a combined signal in which all of the original audio channels or audio objects are represented appropriately.
This has the disadvantage of high computational complexity, thus wasting a lot of energy and sometimes even making it infeasible to apply the concept, especially in real-time scenarios. Furthermore, a combination by subsequent audio decoding and re-encoding can cause a considerable delay due to the two processing steps, which is unacceptable for certain applications, such as teleconferencing/telecommunications.
It is the object of the present invention to provide a concept to efficiently combine multiple parametrically coded audio signals.
In accordance with a first aspect of the present invention, this object is achieved by an audio signal generator for generating an audio output signal, the audio signal generator comprising: an audio signal receiver for receiving a first audio signal comprising a first down-mix channel having information on two or more first original channels, and comprising an original parameter associated with one of the first original channels describing a property of one of the first original channels with respect to a reference channel; and a second audio-signal comprising a second down-mix channel having information on at least one second original channel; a channel combiner for deriving a combined down-mix channel by combining the first down-mix channel and the second down-mix channel; a parameter calculator for deriving a first combined parameter describing the property of one of the first original channels with respect to a common reference channel, and a second combined parameter describing the property of another one of the first original channels or of the at least one second original channel with respect to the common reference channel; and an output interface for outputting the audio output signal comprising the combined down-mix channel, the first and second combined parameters.
In accordance with a second aspect of the present invention, this object is achieved by a method of generating an audio output signal, the method comprising: receiving a first audio signal comprising a first down-mix channel having information on two or more first original channels, and comprising an original parameter associated with one of the first original channels describing a property of one of the first original channels with respect to a reference channel and a second audio signal comprising a second down-mix channel having information on at least one second original channel; deriving a combined down-mix channel by combining the first down-mix channel and the second down-mix channel; deriving a first combined parameter describing the property of one of the first original channels with respect to a common reference channel and a second combined parameter describing the property of another one of the first original channels or of the at least one second original channel with respect to a common reference channel; and outputting the audio output signal comprising the combined down-mix channel and the first and second combined parameters.
In accordance with a third aspect of the present invention, this object is achieved by a representation of three or more audio channels, comprising: a combined down-mix channel being a combination of a first down-mix channel having information on at least two first original channels and a second down-mix channel having information on at least one second original channel; a first parameter describing a property of one of the at least two first original channels with respect to a reference channel; and a second parameter describing the property of another channel of the first original channels or the property of the at least one second original channel with respect to the reference channel.
In accordance with a fourth aspect of the present invention, this object is achieved by a computer program implementing a method for generating an audio output signal, the method comprising: receiving a first audio signal comprising a first down-mix channel having information on two or more first original channels, and comprising an original parameter associated with one of the first original channels describing a property of one of the first original channels with respect to a reference channel and a second audio signal comprising a second down-mix channel having information on at least one second original channel; deriving a combined down-mix channel by combining the first down-mix channel and the second down-mix channel; deriving a first combined parameter describing the property of one of the first original channels with respect to a common reference channel and a second combined parameter describing the property of another one of the first original channels or of the at least one second original channel with respect to a common reference channel; and outputting the audio output signal comprising the combined down-mix channel and the first and second combined parameters.
In accordance with a fifth aspect of the present invention, this object is achieved by a conferencing system having an audio signal generator for generating an audio output signal, comprising: an audio signal receiver for receiving a first audio signal comprising a first down-mix channel having information on two or more first original channels, and comprising an original parameter associated with one of the first original channels describing a property of one of the first original channels with respect to a reference channel; and a second audio signal comprising a second down-mix channel having information on at least one second original channel; a channel combiner for deriving a combined down-mix channel by combining the first down-mix channel and the second down-mix channel; a parameter calculator for deriving a first combined parameter describing the property of one of the first original channels with respect to a common reference channel, and a second combined parameter describing the property of another one of the first original channels or of the at least one second original channel with respect to the common reference channel; and an output interface for outputting the audio output signal comprising the combined down-mix channel, the first and second combined parameter.
The present invention is based on the finding that multiple parametrically encoded audio signals can be efficiently combined using an audio signal generator or audio signal combiner, which generates an audio output signal by combining the down-mix channels and the associated parameters of the audio input signals directly within the parameter domain, i.e. without reconstructing or decoding the individual audio input signals prior to the generation of the audio output signal. To be more specific, this is achieved by direct mixing of the associated down-mix channels of the individual input signals, for example by summation or formation of a linear combination of the same. It is a key feature of the present invention that the combination of the down-mix channels is achieved by simple, computationally inexpensive arithmetical operations, such as summation.
The same holds true for the combination of the parameters associated with the down-mix channels. As generally at least a sub-set of the associated parameters will have to be altered during the combination of the input audio signals, it is most important that the calculations performed to alter the parameters are simple, so that they neither need significant computational power nor incur additional delay, e.g. by using filterbanks or other operations involving memory.
According to one embodiment of the present invention, an audio signal generator for generating an audio output signal is implemented to combine a first and a second audio signal, both being parametrically encoded. For generating the audio output signal, the inventive audio signal generator extracts the down-mix channels of the input audio signals and generates a combined down-mix channel by forming a linear combination of the two down-mix channels. That is, the individual channels are added with additional weights applied.
In a preferred embodiment of the present invention, the applied weights are derived by extremely simple arithmetical operations, for example by using the number of channels represented by the first audio signal and the second audio signal as a basis for the calculation.
In a further preferred embodiment, the weight calculation is performed under the assumption that each original audio channel of the input signals contributes to the total signal energy with the same quantity. That is, the weights applied are simple ratios of the channel numbers of the input signals and the total number of channels.
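The channel-count weighting can be sketched as follows. This is only one reading of the paragraph above: it assumes weights g_A = U/(U+V) and g_B = V/(U+V), where U and V are the numbers of original channels represented by the two input signals, and it assumes both down-mix channels are sample-aligned lists of equal length. The function names are hypothetical.

```python
def channel_count_weights(u, v):
    """Return (g_a, g_b) as ratios of channel counts to the total count.

    Assumption: each original channel contributes equally to the total
    energy, so the weights are u/(u+v) and v/(u+v).
    """
    if u < 1 or v < 1:
        raise ValueError("each input signal must represent at least one channel")
    total = u + v
    return u / total, v / total


def combine_downmix(s_a, s_b, g_a, g_b):
    """Form the combined down-mix as the linear combination g_a*s_a + g_b*s_b."""
    if len(s_a) != len(s_b):
        raise ValueError("down-mix channels must have the same length")
    return [g_a * x + g_b * y for x, y in zip(s_a, s_b)]


# Example: first signal carries 3 original channels, second carries 1.
g_a, g_b = channel_count_weights(3, 1)
combined = combine_downmix([1.0, 2.0], [4.0, 0.0], g_a, g_b)
```

The combination is a single multiply-add per sample, which is the kind of inexpensive arithmetic the text emphasizes.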
In a further preferred embodiment of the present invention, the weights of the individual down-mix channels are calculated based on the energy contained within the down-mix channels such as to allow for a more authentic reproduction of the combined down-mix channel included in the output audio signal generated.
In a further preferred embodiment of the present invention, the computational effort is further decreased in that only the parameters associated to one of the two audio signals are altered. That is, the parameters of the other audio signal are transmitted unaltered, therefore not causing any computations and hence minimizing the load on the inventive audio signal generator.
In the following paragraphs, the inventive concept will be detailed mainly for a coding scheme using joint source coding (JSC). In that sense, the current invention extends this technology for connecting multiple monophonic or JSC-enabled transceivers to remote stations by mixing JSC down-mix signals and object information within the parameter domain. As the above considerations have shown, the inventive concept is by no means restricted to the use of JSC-coding but could also be implemented with BCC-coding, or other multi-channel coding schemes, such as MPEG spatial audio coding (MPEG Surround) and the like.
As the inventive concept will be detailed mainly by using JSC coding, JSC coding will be briefly reviewed within the following paragraphs in order to more clearly point out the flexibility of the inventive concept and the enhancements achievable over prior art when applying the inventive concept to existing multi-channel audio coding schemes.
For the explanation of JSC coding, reference will in the following be made to the figures, in which functionally identical components share the same reference marks, indicating that individual components providing the same functionality may be interchanged between the single embodiments of the present invention without losing or restricting functionality and without limiting the scope of the present invention.
The block diagram of the joint source coding scheme shows a corresponding encoder and a corresponding decoder.
The encoder receives discrete audio inputs s_i(n) (a, b, and c) and creates a down-mix signal s(n), for example by a summation of the waveforms.
Additionally, a parameter extractor within the encoder extracts parametric side information for each single object (signals a, b, and c). Although not shown in the block diagram, the down-mix signal may be further compressed by a speech or audio coder and is transmitted with the adjacent parametric side information to the JSC decoder. A synthesis module within the decoder regenerates estimates ŝ_i(n) of the input objects (channels a, b, and c).
In order to reconstruct estimates that are perceptually similar to the discrete input objects (input channels) a, b, and c, appropriate parametric side information for each channel has to be extracted. As the individual channels are summed up for generation of the down-mix signal, power ratios between channels are suitable quantities. Therefore, the parametric information for the different objects or channels consists of power ratios Δp of each object relative to the first object (reference object).
This information is derived in the frequency domain in non-equally spaced frequency bands (sub-bands) corresponding to the critical band resolution of human auditory perception. This is a concept described in more detail for example in: J. Blauert, “Spatial Hearing: The Psychophysics of Human Sound Localization”, The MIT Press, Cambridge, Mass., revised edition 1997.
That is, the broad band input audio channels are filtered into several frequency bands of finite bandwidth and for each of the individual frequency bands, the following calculations are performed. As already mentioned, the bandwise power of the first object (reference object or reference channel) acts as a reference value.
To avoid further introduction of artefacts, for example introduced by a division by zero, these power ratios (in the logarithmic representation) can further be limited to a maximum of, for example, 24 dB in each subband. The power ratio may furthermore be quantized prior to transmission to additionally save transmission bandwidth.
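The per-band calculation can be sketched as below. The defining formula is not reproduced in this text, so the sketch assumes the usual logarithmic power ratio, 10·log10(p_i / p_1), with p_1 the sub-band power of the reference object. The 24 dB upper limit follows the example value above; the symmetric lower limit for silent objects is an additional assumption, and the function name is hypothetical.

```python
import math

MAX_RATIO_DB = 24.0  # example upper limit mentioned in the text


def power_ratios_db(band_powers, max_db=MAX_RATIO_DB):
    """Power ratios (in dB) of objects 2..M relative to object 1 for one sub-band.

    band_powers[0] is the power of the reference object in this band.
    """
    ref = band_powers[0]
    ratios = []
    for p in band_powers[1:]:
        if p <= 0.0:
            # Silent object: assumed floor, symmetric to the upper limit.
            ratios.append(-max_db)
        elif ref <= 0.0:
            # Silent reference: avoid division by zero, use the upper limit.
            ratios.append(max_db)
        else:
            ratio = 10.0 * math.log10(p / ref)
            ratios.append(min(max_db, max(-max_db, ratio)))
    return ratios


# Example: reference power 1.0, then a 10x louder, a silent, and a very loud object.
example = power_ratios_db([1.0, 10.0, 0.0, 1e6])
```

Only M-1 ratios are produced per band, since the reference object is implicit.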
It is not necessary to explicitly transmit the power of the first object. Instead, this value can be derived from the assumption that for statistically independent objects, the sum of the powers of the synthesized signals ŝ_i(n) is equal to the power of the down-mix signal s(n). In terms of a mathematical expression, this means:
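The expression itself is not reproduced in this text. Under the stated independence assumption, and assuming the M synthesized objects are indexed i = 1 … M with E{·} denoting the (sub-band) power estimate, it can plausibly be written as:

```latex
\sum_{i=1}^{M} E\{\hat{s}_i^{2}(n)\} = E\{s^{2}(n)\}
```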
Based on this assumption and equation, the subband powers for the first object (the reference object or reference channel) can be reconstructed, as it will be described further below when detailing the inventive concept.
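One way to carry out this reconstruction is sketched below. It assumes the power ratios are given in dB relative to the reference object and that the object powers add up to the down-mix power in each sub-band, so that p_1 = P / (1 + Σ 10^(Δp_i/10)). The function name is hypothetical.

```python
def reconstruct_band_powers(downmix_power, ratios_db):
    """Recover the sub-band powers of all objects from the down-mix power.

    ratios_db holds the power ratios (dB) of objects 2..M relative to
    object 1. Returns [p_1, p_2, ..., p_M], assuming the powers sum to
    downmix_power (statistically independent objects).
    """
    linear = [10.0 ** (r / 10.0) for r in ratios_db]
    ref_power = downmix_power / (1.0 + sum(linear))
    return [ref_power] + [ref_power * g for g in linear]


# Example: three equally loud objects sharing a down-mix power of 3.0.
powers = reconstruct_band_powers(3.0, [0.0, 0.0])
```

The reference power therefore costs no extra bits; it follows from the down-mix power and the transmitted ratios.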
To summarize, an audio signal or audio-stream according to JSC comprises a down-mix channel and associated parameters, the parameters describing power ratios of original channels with respect to one original reference channel. It may be noted that this scenario may easily be altered in that other channels are selected to be the reference channel. For example, the down-mix channel itself may be the reference channel, requiring the transmission of one additional parameter, relating the power of the first, former reference channel, to the power of the down-mix channel. Also, the reference channel may be chosen to be varying in that the one channel having the most power is selected to be the reference channel. Hence, as the power within the individual channels may change with time, the reference channel may also vary with time. Also, due to the fact that all processing is typically carried out in a frequency selective fashion, the reference channel can be different for different frequency bands.
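The band-wise choice of a reference channel can be illustrated with a short sketch. It assumes the sub-band powers of all channels are available as one list per frequency band and picks the channel with the most power in each band; the function name and the data layout are hypothetical.

```python
def pick_reference_per_band(powers_by_band):
    """Return, for each frequency band, the index of the channel with the most power.

    powers_by_band is a list of bands, each a list of per-channel powers.
    Ties are resolved in favour of the lowest channel index.
    """
    return [max(range(len(band)), key=band.__getitem__) for band in powers_by_band]


# Example: two bands, three channels each.
references = pick_reference_per_band([[1.0, 3.0, 2.0], [5.0, 0.1, 0.2]])
```

Calling the function again for each time frame lets the reference vary with time as well as with frequency.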
A further enhanced scheme of JSC coding builds on the scheme described above. The features detailed so far are enclosed within the storage or transmission box, which receives the input channels to be encoded and outputs estimates of the input channels. The scheme is enhanced in that it furthermore comprises a mixer receiving the estimates. That is, the synthesized objects are not output as single audio signals directly, but rendered to N output channels in the mixer module. Such a mixer can be implemented in different ways, for example by receiving additional mixing parameters as input to steer the mixing of the synthesized objects. As an example only, one may consider a teleconferencing scenario, in which each of the output channels is attributed to one participant of the conference. Therefore, a participant at the receiving end has the possibility to virtually separate the other participants by assigning their voices to individual positions. Thus, not only the voice may serve as criterion to distinguish between different participants of a telephone-conference, but also the direction from which a listener receives the voice of a participant. Furthermore, a listener may arrange the output channel such that all the participants from the same teleconferencing location are grouped in the same direction, enhancing the perceptual experience even more.
In this enhanced scheme, s_1(n) … s_M(n) denote the discrete audio objects at the input of the JSC encoder. At the JSC decoder output, ŝ_1(n) … ŝ_M(n) represent the ‘virtually’ separated audio objects that are fed into the mixer. Mixing parameters can be interactively modified at the receiver side to place the different objects in a sound stage that is reproduced by the output channels x̂_1(n) … x̂_N(n).
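The mixer stage can be sketched as a gain matrix applied to the synthesized objects. This is an assumption made for illustration: the text does not specify the form of the mixing parameters, so the sketch treats them as an N-by-M matrix of gains, with one row per output channel. All names are hypothetical.

```python
def render(objects, gains):
    """Render M synthesized objects to N output channels.

    objects: list of M equally long sample lists (the separated objects).
    gains:   list of N rows, each holding M gains (assumed mixing parameters).
    Returns a list of N output channels.
    """
    num_samples = len(objects[0])
    outputs = []
    for row in gains:
        channel = [0.0] * num_samples
        for gain, obj in zip(row, objects):
            for n, sample in enumerate(obj):
                channel[n] += gain * sample
        outputs.append(channel)
    return outputs


# Example: two objects rendered to two channels; the first channel carries
# only object 1, the second an equal mix of both.
rendered = render([[1.0, 2.0], [10.0, 20.0]], [[1.0, 0.0], [0.5, 0.5]])
```

Changing a row of the gain matrix moves or rebalances objects in the sound stage without touching the transmitted stream.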
The application of multi-channel audio coding schemes to a basic teleconferencing scenario, taking place between two locations, is considered next. Here, a first location communicates with a second location. The first location may have A participants, i.e. A audio objects, while the second location has B participants or audio objects. For point-to-point teleconferencing, the described technology of JSC coding can be applied straightforwardly to transmit audio signals of multiple objects at each location to the corresponding remote station. That is, (A-1) parameters a_i and an associated down-mix are transferred to the second location. In the opposite direction, (B-1) parameters b_i are transmitted together with an associated down-mix to the first location.
For teleconferencing with more than two end points, the situation is completely different, as illustrated in the following.
Apart from the first and second locations, this scenario includes a third location. Such a scenario needs a central distributor for the associated audio signals, generally called a multi point control unit, MCU. Each of the locations (sites) is connected to the MCU. For each site, there is a single upstream to the MCU containing the signal from the site. As each individual site needs to receive the signals from the remaining sites, the down-stream to each site is a mix of the signals of the other sites, excluding the site's own signal, which is also referred to as the (N-1) signal. Generally, to fulfill the requirements of the set-up and to keep the transmission bandwidth reasonably low, transmitting N-1 JSC coded streams from the MCU to each site is not feasible. This would, of course, be the straightforward option.
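The (N-1) routing described here can be illustrated with a small sketch that lists, for every site, which upstreams contribute to its down-stream. The site names are placeholders, and the sketch only covers the routing decision, not the combination of the streams themselves.

```python
def downstream_sources(sites):
    """Map each site to the sites whose signals make up its (N-1) down-stream."""
    return {site: [other for other in sites if other != site] for site in sites}


# Example with three hypothetical sites connected to one MCU.
routing = downstream_sources(["site_1", "site_2", "site_3"])
```

For three sites, each down-stream combines two upstreams, so the MCU has to produce three different mixes.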
The state of the art approach to derive the individual down-streams is to resynthesize all incoming streams (objects) within the MCU using a JSC decoder. Then, the resynthesized audio objects could be regrouped and re-encoded such as to provide every site with audio streams comprising the desired audio objects or audio channels. Even within this simple scenario, this would mean three decoding and three encoding tasks, which must be simultaneously performed within the MCU. Apart from the significant computational demands, audible artefacts can additionally be expected from this parametric “tandem coding” (repeated encoding/decoding) process. Increasing the number of sites would further increase the number of streams and hence the number of required encoding or decoding processes, making none of the straightforward approaches feasible for real-time scenarios.
April 7, 2026