Artificially Generated Speech for a Communication Session

Published: February 23, 2021

Assignee: not available in USPTO data we have

Inventors: Ross G. Cutler, Sriram Srinivasan, Ramin Mehran, Karlton David Sequeira, Jayant Ajit Gupchup, Senthil K. Velayutham


Patent Claims

19 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A system comprising first and second devices in communication with each other via a communication network, the system comprising: a processor; and a memory in communication with the processor, the memory comprising executable instructions that, when executed by the processor, cause the processor to control the system to perform functions of: during a communication session between the first and second devices, capturing, at the first device, a first speech spoken by a person, the first and second device storing a speech model specific to the person; generating audio data representing the captured first speech; converting, based on a speech model stored at the first device, the audio data to text data representing the captured first speech; during the communication session, transmitting, to the second device via the communication network, the audio data and the text data; receiving, at the first device, a user input that trains the speech model stored at the first device, the user input comprising correction of the text data converted from the audio data; updating, based on the received user input, a voice parameter value of the speech model stored at the first device; during the communication session, transmitting, to the second device via the communication network, the updated voice parameter value of the speech model; during the communication session, updating, at the second device, the speech model stored at the second device based on the updated voice parameter value transmitted to the second device; and converting, at the second device, the text data transmitted to the second device to a second speech based on the updated speech model stored at the second device.

Plain English Translation

Two devices connected over a communication network each store a speech model specific to the same person. During a communication session, the first device captures the person's speech, generates audio data from it, and uses its copy of the speech model to convert that audio data to text data. Both the audio data and the text data are transmitted to the second device. When a user corrects the converted text at the first device, that correction trains the first device's speech model and updates a voice parameter value in it. Still within the same session, the first device transmits the updated voice parameter value to the second device, which updates its own copy of the speech model accordingly. The second device then converts the text data it received into a second speech using the updated, person-specific model. The two copies of the model are therefore kept in step while the session is in progress.
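The sequence in claim 1 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `Device` class, the `adaptation_steps` parameter, the in-memory `inbox` that stands in for the network, and the placeholder training and synthesis steps. The claims do not specify how the model is trained or which voice parameters exist.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """One endpoint holding its own copy of the person-specific speech model."""
    name: str
    # Hypothetical voice parameter; the claims do not name any.
    voice_params: dict = field(default_factory=lambda: {"adaptation_steps": 0})
    inbox: list = field(default_factory=list)         # stands in for the network
    pending_text: list = field(default_factory=list)  # text received, not yet spoken

    def send(self, peer: "Device", kind: str, payload) -> None:
        peer.inbox.append((kind, payload))

    def train_on_correction(self, recognized: str, corrected: str) -> dict:
        """Placeholder training step: returns the parameter values that changed."""
        if recognized == corrected:
            return {}
        self.voice_params["adaptation_steps"] += 1
        return {"adaptation_steps": self.voice_params["adaptation_steps"]}

    def process_inbox(self) -> None:
        """Store received text and apply model updates in arrival order."""
        for kind, payload in self.inbox:
            if kind == "text":
                self.pending_text.append(payload)
            elif kind == "model_update":
                self.voice_params.update(payload)
            # "audio" messages are ignored in this sketch.
        self.inbox.clear()

    def speak_pending(self) -> list:
        """Placeholder text-to-speech: tags each text with the model state used."""
        step = self.voice_params["adaptation_steps"]
        spoken = [f"[model step {step}] {text}" for text in self.pending_text]
        self.pending_text.clear()
        return spoken


def run_session():
    first, second = Device("first"), Device("second")
    recognized = "wreck a nice beach"
    first.send(second, "audio", b"\x00\x01")  # audio and text are both transmitted
    first.send(second, "text", recognized)
    # A user correction trains the first device's model...
    update = first.train_on_correction(recognized, "recognize speech")
    # ...and the updated parameter value is sent within the same session.
    first.send(second, "model_update", update)
    second.process_inbox()
    return first, second, second.speak_pending()
```

Calling `run_session()` leaves both devices with the same parameter values, and the second device synthesizes the received text only after its copy of the model has been updated.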

Claim 2

Original Legal Text

2. The system of claim 1 , wherein: the voice parameter value is dynamically updated in response to the user input received during the communication session, and in response to the updating of the voice parameter value, the updated voice parameter value is dynamically transmitted to the second device during the communication session.


Claim 3

Original Legal Text

3. The system of claim 1 , wherein the text data and the audio data are continuously transmitted in parallel to the second device via the communication network.


Claim 4

Original Legal Text

4. The system of claim 1 , wherein the text data is selectively transmitted to the second device when a predetermined condition is met.

Plain English Translation

Claim 4 narrows the system of claim 1: the text data is transmitted to the second device selectively, only when a predetermined condition is met. The claim itself does not say what that condition is. Claim 5 gives one example, namely that the quality of the audio data received by the second device is below a predetermined level, and claim 6 describes the second device reporting that the condition is met through a feedback signal.

Claim 5

Original Legal Text

5. The system of claim 4 , wherein the predetermined condition includes a condition that a quality of the audio data received by the second device is below a predetermined level.


Claim 6

Original Legal Text

6. The system of claim 4 , wherein, for selectively transmitting the audio data, the instructions further include instructions that, when executed by the processor, cause the processor to control the system to perform functions of: transmitting the audio data to the second device via the communication network; receiving, from the second device, a feedback signal indicating that the predetermined condition is met; and in response to receiving the feedback signal, stopping transmitting the audio data and starting to transmit the text data to the second device via the communication network.


Claim 7

Original Legal Text

7. The system of claim 1 , wherein the instructions further include instructions that, when executed by the processor, cause the processor to control the system to perform a function of synchronizing the text data with the audio data.

Plain English Translation

Claim 7 adds one function to the system of claim 1: synchronizing the text data with the audio data, so that a portion of the text can be matched to the portion of audio it was converted from. The claim does not specify how the synchronization is performed. Claim 8 gives one mechanism: a first time stamp is inserted into a portion of the audio data, and a second time stamp is inserted into the corresponding portion of the text data.

Claim 8

Original Legal Text

8. The system of claim 7 , wherein, for synchronizing the text data with the audio data, the instructions further include instructions that, when executed, cause the processor to control the system to perform functions of: inserting a first time stamp into a portion of the audio data; and inserting a second time stamp into a portion of the text data corresponding to the portion of the audio data.

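The time-stamp mechanism of claim 8 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `AudioChunk` and `TextSegment` types, the millisecond unit, and the choice to give both stamps the same capture time are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AudioChunk:
    timestamp_ms: int  # "first time stamp", carried with the audio portion
    samples: bytes


@dataclass(frozen=True)
class TextSegment:
    timestamp_ms: int  # "second time stamp", carried with the matching text portion
    text: str


def stamp(capture_ms: int, samples: bytes, text: str) -> Tuple[AudioChunk, TextSegment]:
    """Stamp an audio portion and the text converted from it with the same capture time."""
    return AudioChunk(capture_ms, samples), TextSegment(capture_ms, text)


def align(
    audio_chunks: List[AudioChunk], text_segments: List[TextSegment]
) -> List[Tuple[AudioChunk, Optional[TextSegment]]]:
    """Pair each audio chunk with the text segment bearing the same time stamp, if any."""
    text_by_ts = {seg.timestamp_ms: seg for seg in text_segments}
    return [(chunk, text_by_ts.get(chunk.timestamp_ms)) for chunk in audio_chunks]
```

Because the two streams may arrive separately, the receiver in this sketch looks text up by time stamp rather than relying on arrival order; an audio chunk whose text has not arrived is paired with `None`.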

Claim 9

Original Legal Text

9. The system of claim 1 , wherein, for transmitting the audio data and the text data, the instructions, when executed by the processor, further cause the processor to control the system to perform a function of transmitting the audio data and the text data in separate packets to the second device via the communication network.


Claim 10

Original Legal Text

10. The system of claim 1 , wherein, for transmitting the audio data and the text data, the instructions, when executed by the processor, further cause the processor to control the system to perform functions of: transmitting, to the second device, the audio data via a first communication modality; and transmitting, to the second device, the text data via a second communication modality having a higher robustness than the first communication modality.


Claim 11

Original Legal Text

11. The system of claim 10 , wherein, for transmitting the audio data and the text data, the instructions, when executed by the processor, further cause the processor to control the system to perform functions of: transmitting, to the second device, the text data via a first transport layer protocol requiring retransmission of unreceived packets, and transmitting, to the second device, the audio data via a second transport layer protocol not involving retransmission of unreceived packets.

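Claim 11 distinguishes a transport layer protocol that retransmits unreceived packets (used for text) from one that does not (used for audio). TCP and UDP are common examples of each, although the claim names neither. The sketch below, using Python's standard `socket` module on the loopback interface, is only an illustration under that assumption; the function name and the single-process setup are hypothetical.

```python
import socket


def send_text_and_audio(text: bytes, audio: bytes) -> tuple:
    """Send text over TCP (retransmitting) and audio over UDP (no retransmission)."""
    # Receiver side: a TCP listener for text and a UDP socket for audio.
    tcp_listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tcp_listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    tcp_listener.listen(1)
    udp_receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp_receiver.bind(("127.0.0.1", 0))
    udp_receiver.settimeout(2.0)

    # Sender side.
    tcp_sender = socket.create_connection(tcp_listener.getsockname(), timeout=2.0)
    udp_sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        tcp_sender.sendall(text)
        tcp_sender.shutdown(socket.SHUT_WR)  # signal end of the text stream
        udp_sender.sendto(audio, udp_receiver.getsockname())

        conn, _ = tcp_listener.accept()
        conn.settimeout(2.0)
        with conn:
            received_text = b""
            while chunk := conn.recv(4096):
                received_text += chunk
        received_audio, _ = udp_receiver.recvfrom(65535)
    finally:
        for sock in (tcp_sender, udp_sender, udp_receiver, tcp_listener):
            sock.close()
    return received_text, received_audio
```

On a real network, a lost UDP datagram would simply be missing at the receiver, while lost TCP segments would be retransmitted, which is the asymmetry the claim relies on.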

Claim 12

Original Legal Text

12. The system of claim 11 , wherein, for transmitting the audio data and the text data, the instructions, when executed by the processor, further cause the processor to control the system to perform functions of: transmitting, to the second device, the text data at a first quality of service level, and transmitting, to the second device, the audio data at a second quality of service level that is lower than the first quality of service level.

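Claim 12 does not say how the two quality of service levels are realized. One possibility, assumed here purely for illustration, is DSCP marking of outgoing IP packets, with the text socket given a higher-priority class than the audio socket. The constants `TEXT_DSCP` and `AUDIO_DSCP` and both helper functions are hypothetical, and whether a network honors the marking depends on its configuration.

```python
import socket

# Hypothetical class choices: Expedited Forwarding for text, best effort for audio.
TEXT_DSCP = 46
AUDIO_DSCP = 0


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP value in the upper six bits of the IP TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def mark_socket(sock: socket.socket, dscp: int) -> bool:
    """Try to mark a socket's outgoing packets; return False if the platform refuses."""
    ip_tos = getattr(socket, "IP_TOS", None)
    if ip_tos is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_IP, ip_tos, dscp_to_tos(dscp))
    except OSError:
        return False
    return True
```

A sender following this sketch would call `mark_socket(text_socket, TEXT_DSCP)` and `mark_socket(audio_socket, AUDIO_DSCP)` before transmitting, where both socket names are placeholders.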

Claim 13

Original Legal Text

13. A method of operating a system comprising first and second devices in communication with each other via a communication network, comprising: during a communication session between the first and second devices, capturing, at the first device, a first speech spoken by a person, the first and second devices storing a speech model specific to the person; generating audio data representing the captured first speech; converting, based on a speech model specific to the person, the audio data to text data representing the captured first speech; during the communication session, transmitting, to the second device via the communication network, the audio data and the text data; receiving, at the first device, a user input that trains the speech model stored at the first device, the user input comprising correction of the text data converted from the audio data; updating, based on the received user input, a voice parameter value of the speech model stored at the first device; during the communication session, transmitting, to the second device via the communication network, the updated voice parameter value of the speech model; during the communication session, updating, at the second device, the speech model stored at the second device based on the updated voice parameter value transmitted to the second device; and converting, at the second device, the text data transmitted to the second device to a second speech based on the updated speech model stored at the second device.


Claim 14

Original Legal Text

14. The method of claim 13 , wherein transmitting the audio data and the text data comprises continuously transmitting the text data and the audio data in parallel to the second device via the communication network.


Claim 15

Original Legal Text

15. The method of claim 13 , wherein transmitting the audio data and the text data comprises selectively transmitting the text data to the second device when a predetermined condition is met.


Claim 16

Original Legal Text

16. The method of claim 15 , wherein the predetermined condition includes a condition that a quality of the audio data received by the second device is below a predetermined level.


Claim 17

Original Legal Text

17. The method of claim 15 , wherein transmitting the audio data and the text data comprises: transmitting the audio data to the second device via the communication network; receiving, from the second device, a feedback signal indicating that the predetermined condition is met; and in response to receiving the feedback signal, stopping transmitting the audio data to the second device and starting to transmit the text data to the second device via the communication network.

Plain English Translation

Claim 17 narrows the method of claim 15 by spelling out how the switch from audio to text happens. The first device starts by transmitting audio data to the second device over the communication network. When the predetermined condition is met, the first device receives a feedback signal from the second device indicating this. In response, the first device stops transmitting the audio data and starts transmitting the text data instead. Claim 16 gives one such condition: the quality of the audio data received by the second device is below a predetermined level. Because the text data is converted from the captured speech (claim 13), the second device can still convert it to speech with its own copy of the speech model.
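The feedback-driven switch in claim 17 can be sketched as a small state machine. This is an illustration under stated assumptions: the `Receiver` and `Sender` classes, the numeric quality score, and the `QUALITY_THRESHOLD` value are hypothetical, and the condition used is the low-audio-quality example from claim 16. The claims do not describe switching back to audio, so this sketch stays in text mode once it has switched.

```python
from dataclasses import dataclass
from typing import List

QUALITY_THRESHOLD = 0.5  # hypothetical "predetermined level"


@dataclass
class Receiver:
    """Second device: reports when received audio quality is below the threshold."""
    threshold: float = QUALITY_THRESHOLD

    def condition_met(self, audio_quality: float) -> bool:
        return audio_quality < self.threshold


class Sender:
    """First device: transmits audio until a feedback signal arrives, then text."""

    def __init__(self) -> None:
        self.mode = "audio"

    def next_payload_kind(self) -> str:
        return self.mode

    def on_feedback(self) -> None:
        # Stop transmitting audio and start transmitting text.
        self.mode = "text"


def simulate(qualities: List[float]) -> List[str]:
    """Return what the sender transmits per frame, given measured audio quality."""
    sender, receiver = Sender(), Receiver()
    sent = []
    for quality in qualities:
        kind = sender.next_payload_kind()
        sent.append(kind)
        # Quality is only measured on frames that actually carried audio.
        if kind == "audio" and receiver.condition_met(quality):
            sender.on_feedback()
    return sent
```

With measured qualities `[0.9, 0.8, 0.3, 0.7]`, the third audio frame triggers the feedback signal, so the sender transmits audio for the first three frames and text from the fourth onward.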

Claim 18

Original Legal Text

18. The device of claim 15 , wherein transmitting the audio data and the text data comprises: transmitting, to the second device, the audio data via a first communication modality; and transmitting, to the second device, the text data via a second communication modality having a higher robustness than the first communication modality.


Claim 19

Original Legal Text

19. A non-transitory computer readable medium containing instructions which, when executed by a processor, cause the processor to control a system, which comprises first and second devices in communication with each other via a communication network, to perform functions of: during a communication session between the first and second devices, capturing, at the first device, a first speech spoken by a person, the first and second devices storing a speech model specific to the person; generating audio data representing the captured first speech; converting, based on the speech model stored at the first device, the audio data to text data representing the captured first speech; during the communication session, transmitting, to the second device via the communication network, the audio data and the text data; receiving, at the first device, a user input that trains the speech model stored at the first device, the user input comprising correction of the text data converted from the audio data; updating, based on the received user input, a voice parameter value of the speech model stored at the first device; during the communication session, transmitting, to the second device via the communication network, the updated voice parameter value of the speech model; during the communication session, updating, at the second device, the speech model stored at the second device based on the updated voice parameter value transmitted to the second device; and converting, at the second device, the text data transmitted to the second device to a second speech based on the updated speech model stored at the second device.


Patent Metadata

Filing Date

Unknown

Publication Date

February 23, 2021

Inventors

Ross G. Cutler

Sriram Srinivasan

Ramin Mehran

Karlton David Sequeira

Jayant Ajit Gupchup

Senthil K. Velayutham
