US-9728193

Frame erasure concealment for a multi-rate speech and audio codec

Published: August 8, 2017

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

An audio coding terminal and method is provided. The terminal includes a coding mode setting unit to set an operation mode, from plural operation modes, for input audio coding by a codec, configured to code the input audio based on the set operation mode such that when the set operation mode is a high frame erasure rate (FER) mode the codec codes a current frame of the input audio according to a select frame erasure concealment (FEC) mode of one or more FEC modes. Upon the setting of the operation mode to be the High FER mode, the one FEC mode is selected, from the one or more FEC modes predetermined for the High FER mode, to control the codec by incorporating of redundancy within a coding of the input audio or as separate redundancy information separate from the coded input audio according to the selected one FEC mode.

Patent Claims

20 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A terminal comprising: at least one processor configured to: set an operation mode of a codec, wherein the operation mode is associated with a high frame erasure rate (FER) condition; and add partial redundant data of a current frame onto at least one neighboring frame, according to a coding mode, wherein the at least one processor is configured to add the partial redundant data without changing a total packet size, when the operation mode is set.

Plain English Translation

A device includes a processor that sets an audio/speech codec to an operation mode associated with a high frame erasure rate (FER) condition. In this mode, partial redundant data for the current audio frame is added onto at least one neighboring frame, according to a coding mode, without changing the total packet size. If the current frame is lost in transmission, the copy carried by a neighboring frame helps the receiver recover it, which improves audio quality in poor network conditions.
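The constant-packet-size idea in Claim 1 can be sketched in a few lines of Python. This is an illustration only, not the patented method: the 264-bit budget, the 64-bit redundancy share, the one-frame offset, and the names Packet, packetize, and recoverable_frames are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

PACKET_BITS = 264  # hypothetical fixed packet budget, in bits


@dataclass
class Packet:
    seq: int                      # frame coded as primary data in this packet
    primary_bits: int             # bits spent on the primary coding
    redundant_for: Optional[int]  # frame whose partial copy rides along, if any
    redundant_bits: int           # bits spent on that partial copy

    @property
    def total_bits(self) -> int:
        return self.primary_bits + self.redundant_bits


def packetize(num_frames: int, high_fer: bool,
              redundant_bits: int = 64, offset: int = 1) -> List[Packet]:
    """Attach a partial copy of frame n to the packet of frame n + offset."""
    packets = []
    for n in range(num_frames):
        source = n - offset
        if high_fer and source >= 0:
            # Shrink the primary coding so the total stays at PACKET_BITS.
            packets.append(Packet(n, PACKET_BITS - redundant_bits,
                                  source, redundant_bits))
        else:
            packets.append(Packet(n, PACKET_BITS, None, 0))
    return packets


def recoverable_frames(packets: List[Packet], lost: Set[int]) -> Set[int]:
    """Frames the receiver can rebuild from the packets that arrived."""
    received = [p for p in packets if p.seq not in lost]
    frames = {p.seq for p in received}
    frames |= {p.redundant_for for p in received
               if p.redundant_for is not None}
    return frames
```

In this sketch, losing the packet for frame 1 still leaves frame 1 recoverable from the partial copy carried by the packet for frame 2, while every packet keeps the same total size.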

Claim 2

Original Legal Text

2. The terminal of claim 1 , wherein a size of the partial redundant data is determined based on signal characteristics.

Plain English Translation

The size of the partial redundant data added to neighboring frames is determined from the signal characteristics of the audio. For example, a frame whose loss would be more audible could be given more redundancy than a less critical one. Adjusting the size this way balances redundancy against the bits left for the primary coding.

Claim 3

Original Legal Text

3. The terminal of claim 1 , wherein the processor is further configured to code at least one frame at a reduced frame rate, when the operation mode is set.

Plain English Translation

When operating in the high frame erasure rate mode, the audio/speech codec can code one or more frames at a reduced frame rate. This frees up bits that can be used for redundancy to protect against frame erasures. This adaptive approach prioritizes error resilience over high fidelity, resulting in more robust audio transmission in challenging network environments.

Claim 4

Original Legal Text

4. The terminal of claim 1 , wherein the High FER condition is used for an Enhanced Voice Services (EVS) codec of a 3GPP standard and the codec is the EVS codec.

Plain English Translation

The high frame erasure rate (FER) mode is specifically designed for the Enhanced Voice Services (EVS) codec, a standard defined by 3GPP (3rd Generation Partnership Project). The codec uses the high FER mode to improve voice quality under high frame loss conditions in mobile networks. This is a particular implementation of the error resilience method within the EVS standard.

Claim 5

Original Legal Text

5. The terminal of claim 4 , wherein the EVS codec adds encoded audio from the at least one neighboring frame, including respectively encoded audio of one or more previous frames and/or one or more future frames, to results of the encoding of the current frame in a current packet for the current frame as combined EVS encoded source bits, with the combined EVS encoded source bits being represented in the current packet distinct from any RTP payload portion of the current packet, and wherein the EVS codec is configured to respectively encode audio from each of the at least one neighboring frame, as the encoded audio, and include the respectively encoded audio from each of the at least one neighboring frame in separate packets from the current packet.

Plain English Translation

In the EVS implementation, encoded audio from neighboring frames (previous and/or future frames) is added to the encoded current frame in the current packet. These "combined EVS encoded source bits" are represented in the packet distinct from any RTP payload portion. Each neighboring frame is also encoded and sent in its own packet, so the current packet carries redundancy for frames that are transmitted separately as well.

Claim 6

Original Legal Text

6. The terminal of claim 1 , wherein the codec is further configured to add a High FER condition flag to a current packet for the current frame to identify the operation mode for the current frame as being associated with the High FER condition.

Plain English Translation

The codec adds a High FER condition flag to the current packet to indicate that the current frame is being coded with the high frame erasure rate (FER) mode. This flag informs the receiver that the packet contains redundancy information and that the audio was encoded with special error correction methods. This allows for appropriate decoding based on the known error resilience mode.

Claim 7

Original Legal Text

7. The terminal of claim 6 , wherein the High FER condition flag is represented in the current packet by a single bit in the RTP payload portion of the current packet.

Plain English Translation

The High FER condition flag in the current packet is represented by a single bit located within the RTP payload portion of the packet. This single bit indicates to the decoder that the frame has been encoded using high frame erasure resilience methods. By using a single bit, the overhead is minimized.

Claim 8

Original Legal Text

8. The terminal of claim 1 , wherein the codec is further configured to add a frame erasure concealment (FEC) mode flag to a current packet for the current frame identifying which one of one or more FEC modes is selected for the current frame.

Plain English Translation

The codec adds a frame erasure concealment (FEC) mode flag to the current packet, specifying which FEC mode is being used for the current frame. This flag indicates which of several possible methods for error concealment was applied during encoding. The receiver can then use this flag to appropriately decode the audio and handle potential frame losses.

Claim 9

Original Legal Text

9. The terminal of claim 8 , wherein the FEC mode flag is represented in the current packet by only two bits.

Plain English Translation

The frame erasure concealment (FEC) mode flag is represented in the current packet by only two bits, which can distinguish up to four FEC modes. Keeping the flag this small minimizes the signaling overhead.
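As an illustration of the signaling in Claims 7 and 9, the one-bit High FER flag and the two-bit FEC mode flag fit in three bits. The bit layout below (flag in bit 2, mode in bits 1 and 0) and the function names are assumptions; the claims do not specify where the bits sit.

```python
from typing import Tuple


def pack_flags(high_fer: bool, fec_mode: int) -> int:
    """Pack a 1-bit High FER flag and a 2-bit FEC mode into three bits."""
    if not 0 <= fec_mode <= 3:
        raise ValueError("fec_mode must fit in two bits (0..3)")
    # Hypothetical layout: bit 2 = High FER flag, bits 1..0 = FEC mode.
    return (int(high_fer) << 2) | fec_mode


def unpack_flags(bits: int) -> Tuple[bool, int]:
    """Recover (high_fer, fec_mode) from the three packed bits."""
    return bool((bits >> 2) & 0b1), bits & 0b11
```

With this layout, pack_flags(True, 2) yields 0b110, and unpack_flags(0b110) returns (True, 2).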

Claim 10

Original Legal Text

10. The terminal of claim 9 , wherein the codec adds the FEC mode flag for the current frame with redundancy data in packets of other frames.

Plain English Translation

The codec places the frame erasure concealment (FEC) mode flag for the current frame, together with redundancy data, in packets of other frames. The flag and the redundancy can therefore still be decoded if the current packet is lost.

Claim 11

Original Legal Text

11. The terminal of claim 1 , wherein, the processor is configured to set the operation mode with different, increased, and/or varied partial redundant data compared to other modes of a plurality of operation modes based upon an analysis of feedback information including at least one of quality of transmission determined outside the terminal, a determination that the current frame is more sensitive to frame erasure upon transmission, and an importance of the current frame.

Plain English Translation

The processor sets the operation mode with different, increased, and/or varied partial redundant data compared to the other operation modes, based on an analysis of feedback information. The feedback includes at least one of: the quality of transmission determined outside the terminal, a determination that the current frame is more sensitive to frame erasure, and the importance of the current frame. The amount of redundancy can therefore follow both network conditions and audio content.

Claim 12

Original Legal Text

12. The terminal of claim 11 , wherein the feedback information comprises at least one of: fast feedback (FFB) information, a hybrid automatic repeat request (HARQ) feedback transmitted at a physical layer, slow feedback (SFB) information, feedback from network signaling transmitted at a layer higher than the physical layer; in-band feedback (ISB) information, in-band signaling from the codec at a far end; and high sensitivity frame (HSF) information, a selection by the codec of specific critical frames to be sent in a redundant fashion.

Plain English Translation

The feedback information comprises at least one of four types. Fast feedback (FFB) information is hybrid automatic repeat request (HARQ) feedback transmitted at the physical layer. Slow feedback (SFB) information is feedback from network signaling transmitted at a layer higher than the physical layer. In-band feedback (ISB) information is in-band signaling from the codec at the far end. High sensitivity frame (HSF) information is a selection by the codec of specific critical frames to be sent redundantly.

Claim 13

Original Legal Text

13. The terminal of claim 12 , wherein the terminal receives the at least one of the FFB information, the HARQ feedback, the SFB information, and ISB information and performs the analysis of the received feedback information to determine the one or more qualities of transmission outside the terminal.

Plain English Translation

The terminal receives at least one of the FFB information, the HARQ feedback, the SFB information, and the ISB information, and analyzes it to determine the quality of transmission outside the terminal. The error resilience mode and the amount of redundancy can then be adjusted to match current network conditions.

Claim 14

Original Legal Text

14. The terminal of claim 12 , wherein the terminal receives information indicating that the analysis of the at least one of the FFB information, the HARQ feedback, the SFB information, and ISB information has been previously performed based upon a received flag in a packet indicating that the current frame in the current packet is coded according the High FER mode or indicating that an encoding of the current packet should be performed by the codec in the High FER mode.

Plain English Translation

The device can receive a flag in a packet indicating that the analysis of feedback information has already been performed and that the current frame is coded in the High FER mode. Alternatively, the flag can request the codec to encode the current packet in the High FER mode. This allows for external control over the error resilience mode.

Claim 15

Original Legal Text

15. The terminal of claim 1 , wherein, the processor is further configured to set the operation mode to be associated with a frame error concealment (FEC) mode of one or more FEC modes based upon one of a determined coding type of at least one of the current frame and neighboring frames, from a plurality of available coding types, or a determined frame classification of at least one of the current frame and the neighboring frames, from a plurality of available frame classifications.

Plain English Translation

The processor associates the operation mode with one of one or more frame error concealment (FEC) modes, based on either a determined coding type (for example, voiced or unvoiced) or a determined frame classification. The coding type or classification can be that of the current frame, of neighboring frames, or both.

Claim 16

Original Legal Text

16. The terminal of claim 15 , wherein the plurality of available coding types comprise an unvoiced wideband type for unvoiced speech frames, a voiced wideband type for voiced speech frames, a generic wideband type for non-stationary speech frames, and a transition wideband type used for enhanced frame erasure performance.

Plain English Translation

The available coding types for setting the high frame erasure rate (FER) mode include unvoiced wideband, voiced wideband, generic wideband, and transition wideband. The transition wideband is specifically used for enhanced frame erasure performance. Different coding types have differing sensitivity to frame loss. The selection of a particular wideband type affects how the FEC mode is selected.
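A minimal sketch of selecting an FEC mode from the coding types named in Claim 16. The four type names come from the claim; the numeric mode values, the mapping, and the rule of taking the stronger of the current and neighboring frame's modes are assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class CodingType(Enum):
    UNVOICED_WB = "unvoiced wideband"
    VOICED_WB = "voiced wideband"
    GENERIC_WB = "generic wideband"
    TRANSITION_WB = "transition wideband"


# Hypothetical mapping: a higher mode number means more protection.
FEC_MODE_BY_TYPE = {
    CodingType.UNVOICED_WB: 0,
    CodingType.VOICED_WB: 1,
    CodingType.GENERIC_WB: 2,
    CodingType.TRANSITION_WB: 3,
}


def select_fec_mode(current: CodingType,
                    neighbor: Optional[CodingType] = None) -> int:
    """Pick the stronger of the modes suggested by the two frames."""
    modes = [FEC_MODE_BY_TYPE[current]]
    if neighbor is not None:
        modes.append(FEC_MODE_BY_TYPE[neighbor])
    return max(modes)
```

The returned value stays in the range 0 to 3, so it would fit the two-bit FEC mode flag described in Claim 9.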

Claim 17

Original Legal Text

17. The terminal of claim 1 , wherein the processor is further configured to identify the High FER condition in response to a frame error rate being greater than a threshold.

Plain English Translation

The High FER condition is identified when the frame error rate is greater than a threshold. The codec can then be set to the corresponding operation mode in response to worsening network conditions.
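One way to picture the threshold test in Claim 17 is a sliding window over recent frame outcomes. The window length, the threshold value, and the FerMonitor class are assumptions; the claim only requires that the frame error rate be compared with a threshold.

```python
from collections import deque


class FerMonitor:
    """Track recent frame losses and flag a High FER condition."""

    def __init__(self, window: int = 100, threshold: float = 0.06):
        self.history = deque(maxlen=window)  # True marks a lost frame
        self.threshold = threshold           # hypothetical FER threshold

    def report(self, frame_lost: bool) -> bool:
        """Record one frame outcome and return the current condition."""
        self.history.append(frame_lost)
        return self.high_fer

    @property
    def high_fer(self) -> bool:
        if not self.history:
            return False
        rate = sum(self.history) / len(self.history)
        return rate > self.threshold  # strictly greater, as in the claim
```

A sender could call report() once per acknowledged or missing frame and switch the operation mode whenever the returned value changes.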

Claim 18

Original Legal Text

18. The terminal of claim 1 , wherein the processor is further configured to identify the High FER condition based on a network condition.

Plain English Translation

The High FER condition is identified based on a network condition. The claim does not limit which condition is used; plausible examples include network congestion, bandwidth limitations, or packet loss probability, in addition to the frame error rate itself.

Claim 19

Original Legal Text

19. The terminal of claim 1 , further comprising: a transmitter configured to transmit the current frame to a receiver, wherein information about the High FER condition is received from the receiver.

Plain English Translation

The device includes a transmitter that sends the current frame to a receiver. Information about the High FER condition is received from the receiver, so the receiver can signal that frame loss is occurring and the transmitter can then switch to the High FER mode.

Claim 20

Original Legal Text

20. The terminal of claim 1 , wherein the processor is further configured to set the operation mode to one sub-mode of a plurality of sub-modes based on at least one of network bandwidth and an amount of frame error concealment, wherein the codec is configured to add the partial redundant data based on the one sub-mode of the plurality of sub-modes.

Plain English Translation

The processor can set the operation mode to one of several sub-modes based on at least one of network bandwidth and an amount of frame error concealment. The codec then adds the partial redundant data according to the selected sub-mode, so the level of redundancy can be tuned to the available bandwidth and the protection needed.
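The sub-mode choice in Claim 20 can be sketched as a table lookup. The sub-mode names, redundancy sizes, bandwidth limits, and the select_sub_mode function are all hypothetical; the claim states only that the choice depends on at least one of network bandwidth and an amount of frame error concealment.

```python
from typing import List, Tuple

# (name, redundant_bits, min_bandwidth_kbps); every value is hypothetical.
SUB_MODES: List[Tuple[str, int, float]] = [
    ("light", 32, 0.0),
    ("medium", 64, 13.2),
    ("heavy", 96, 24.4),
]


def select_sub_mode(bandwidth_kbps: float,
                    wanted_redundant_bits: int) -> Tuple[str, int, float]:
    """Pick the strongest sub-mode that bandwidth and the request allow."""
    if bandwidth_kbps < 0:
        raise ValueError("bandwidth must be non-negative")
    allowed = [m for m in SUB_MODES if m[2] <= bandwidth_kbps]
    fitting = [m for m in allowed if m[1] <= wanted_redundant_bits]
    # Fall back to the lightest allowed sub-mode if the request is too small.
    candidates = fitting or allowed[:1]
    return max(candidates, key=lambda m: m[1])
```

For instance, with 13.2 kbps available and up to 100 redundant bits requested, this table yields the "medium" sub-mode, because "heavy" needs more bandwidth.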

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G10L

Patent Metadata

Filing Date

February 6, 2017

Publication Date

August 8, 2017
