US-11238232

Written-modality prosody subsystem in a natural language understanding (NLU) framework

Published: February 1, 2022

Assignee: not available in the USPTO data we have

Inventors: not available in the USPTO data we have

Technical Abstract

Present embodiments include a prosody subsystem of a natural language understanding (NLU) framework that is designed to analyze collections of written messages for various prosodic cues to break down the collection into a suitable level of granularity (e.g., into episodes, sessions, segments, utterances, and/or intent segments) for consumption by other components of the NLU framework, enabling operation of the NLU framework. These prosodic cues may include, for example, source prosodic cues that are based on the author and the conversation channel associated with each message, temporal prosodic cues that are based on a respective time associated with each message, and/or written prosodic cues that are based on the content of each message. For example, to improve the domain specificity of the agent automation system, intent segments extracted by the prosody subsystem may be consumed by a training process for an ML-based structure subsystem of the NLU framework.

Patent Claims

20 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. An agent automation system, comprising: a memory configured to store a written conversation log and natural language understanding (NLU) framework including a prosody subsystem; and a processor configured to execute instructions of the NLU framework to cause the agent automation system to perform actions comprising: processing, via the prosody subsystem, the written conversation log based on prosodic cues to divide the written conversation log into conversation channel groups, to divide the conversation channel groups into sessions, to divide the sessions into conversation segments, to divide the conversation segments into utterances, and to divide the utterances into intent segments, wherein the prosodic cues comprise temporal prosodic cues and written prosodic cues.

Plain English Translation

The agent automation system analyzes and structures written conversation logs using a natural language understanding (NLU) framework whose prosody subsystem segments the log according to prosodic cues. The cues comprise temporal prosodic cues and written prosodic cues. Under the dependent claims, temporal cues are based on the time associated with each message, such as the gap between messages (claims 3 and 4), while written cues include punctuation, emojis, emphasis, or linguistic structure (claim 6). The system divides the conversation log into conversation channel groups, the channel groups into sessions, the sessions into conversation segments, the segments into utterances, and the utterances into intent segments. This hierarchy gives other components of the NLU framework input at a suitable level of granularity, which supports intent recognition in applications such as chatbots and automated customer service.

Claim 2

Original Legal Text

2. The system of claim 1 , wherein, to process the conversation log, the processor is configured to execute the instructions of the NLU framework to cause the agent automation system to perform actions comprising: dividing the written conversation log into the conversation channel groups, dividing the conversation channel groups into the sessions, dividing the sessions into the conversation segments, and dividing the conversation segments into the utterances based on metadata prosodic cues.

Plain English Translation

This claim narrows how the system of claim 1 performs the upper levels of the segmentation. The written conversation log is divided into conversation channel groups, the channel groups into sessions, the sessions into conversation segments, and the conversation segments into utterances, all based on metadata prosodic cues. Metadata prosodic cues are drawn from information associated with each message; claim 14, for example, describes metadata cues based on the conversation channel associated with each message. The claim covers only the steps down to utterances; the final division of utterances into intent segments is not recited here.
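
As an illustration of the first step, the following minimal Python sketch groups messages by a channel field. The message layout (a dict with hypothetical "channel" and "text" keys) and the function name `group_by_channel` are assumptions made for this example; the claim does not prescribe a data format or an implementation.

```python
from collections import defaultdict


def group_by_channel(messages):
    """Group messages into conversation channel groups.

    Each message is assumed (hypothetically) to be a dict with a
    "channel" key; messages keep their original order within a group.
    """
    groups = defaultdict(list)
    for message in messages:
        groups[message["channel"]].append(message)
    return dict(groups)


log = [
    {"channel": "chat", "text": "My laptop won't boot"},
    {"channel": "email", "text": "Invoice question"},
    {"channel": "chat", "text": "It shows a blank screen"},
]
channel_groups = group_by_channel(log)
```

Each resulting group could then be passed to a later step that splits it into sessions.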

Claim 3

Original Legal Text

3. The system of claim 1 , wherein the temporal prosodic cues comprise a temporal prosody cue that is based on a respective time associated with each message of the conversation log.

Plain English Translation

This claim specifies that the temporal prosodic cues of claim 1 include a cue based on the time associated with each message of the conversation log, such as a message timestamp. Because the log is written, the cue reflects when messages were sent rather than acoustic features such as speech rate or pitch. The prosody subsystem can use these times when deciding where one part of the conversation ends and the next begins.

Claim 4

Original Legal Text

4. The system of claim 3 , wherein the temporal prosody cue comprises a time gap between the respective times associated with each message of the conversation log.

Plain English Translation

This claim narrows the temporal prosody cue of claim 3 to a time gap between the times associated with the messages of the conversation log. Each message has an associated time, and the system measures the interval between messages. A long gap can suggest a boundary in the conversation, for example the end of one session and the start of another, while short gaps suggest that the messages belong together. The claim does not specify a threshold for the gap.
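
A time-gap cue of this kind can be sketched in a few lines of Python. This is only an illustration: the `(timestamp, text)` message layout, the function name `split_by_time_gap`, and the 30-minute threshold are assumptions chosen for the example and do not come from the patent.

```python
from datetime import datetime, timedelta


def split_by_time_gap(messages, max_gap):
    """Split (timestamp, text) messages into groups.

    A new group starts whenever the gap to the previous message
    exceeds max_gap. Messages are sorted by timestamp first.
    """
    groups = []
    previous_time = None
    for timestamp, text in sorted(messages, key=lambda m: m[0]):
        if previous_time is None or timestamp - previous_time > max_gap:
            groups.append([])
        groups[-1].append((timestamp, text))
        previous_time = timestamp
    return groups


log = [
    (datetime(2022, 2, 1, 9, 0), "Hi, my laptop won't boot"),
    (datetime(2022, 2, 1, 9, 2), "It shows a blank screen"),
    (datetime(2022, 2, 1, 14, 30), "Following up on my ticket"),
]
# Hypothetical threshold: gaps longer than 30 minutes start a new session.
sessions = split_by_time_gap(log, timedelta(minutes=30))
```

With this sample log, the first two messages fall into one group and the afternoon follow-up starts a second group. The same function with a smaller threshold could be reused to split sessions into conversation segments.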

Claim 5

Original Legal Text

5. The system of claim 1 , wherein, to divide the utterances into the intent segments, the processor is configured to execute the instructions of the NLU framework to cause the agent automation system to perform actions comprising: analyzing the utterances for the written prosodic cues; and dividing the utterances into the intent segments based on the written prosodic cues.

Plain English Translation

This invention relates to natural language understanding (NLU) systems for agent automation, specifically the segmentation of user utterances into distinct intent segments. The problem addressed is the difficulty of parsing free-form written text into meaningful intent segments, which automated agents need in order to respond appropriately. The system analyzes each utterance for written prosodic cues, such as punctuation, emojis, emphasis, or linguistic structure (see claim 6), to identify breaks between intents. These cues are used to divide the utterance into intent segments.
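
As a rough illustration of splitting on one written prosodic cue, the sketch below breaks an utterance at sentence-ending punctuation. Treating punctuation alone as the boundary signal, the chosen punctuation set, and the function name `split_intent_segments` are all simplifying assumptions for this example; the claim leaves the actual analysis open and also contemplates other cues.

```python
import re


def split_intent_segments(utterance):
    """Split an utterance after '.', '!', '?' or ';' followed by whitespace.

    The punctuation mark stays attached to the segment it ends.
    """
    parts = re.split(r"(?<=[.!?;])\s+", utterance.strip())
    return [part for part in parts if part]


segments = split_intent_segments(
    "My password expired. Can you reset it? Thanks!"
)
# segments == ["My password expired.", "Can you reset it?", "Thanks!"]
```

A fuller treatment would also need to handle abbreviations, emojis, and emphasis, which this sketch ignores.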

Claim 6

Original Legal Text

6. The system of claim 1 , wherein the written prosodic cues comprise punctuation, emojis, emphasis, or linguistic structure.

Plain English Translation

This claim lists the kinds of written prosodic cues the system of claim 1 may use: punctuation, emojis, emphasis, or linguistic structure. These are features of the message text itself that play a role in writing similar to the one pauses and intonation play in speech. The prosody subsystem analyzes them to decide how to segment the text, for example where one intent segment ends and the next begins.

Claim 7

Original Legal Text

7. The system of claim 1 , wherein the written prosodic cues comprise an interrupt, a change in topic, a change in context, or a combination thereof.

Plain English Translation

This claim specifies that the written prosodic cues of claim 1 include an interrupt, a change in topic, a change in context, or a combination of these. The system detects these cues in the message text and uses them as signals of conversational structure when segmenting the log. For example, a change in topic partway through a conversation can indicate that a new segment has begun. Recognizing such shifts helps the NLU framework keep unrelated content apart when interpreting user requests in applications such as chatbots or virtual assistants.

Claim 8

Original Legal Text

8. The system of claim 1 , wherein the processor is configured to execute the instructions of the NLU framework to cause the agent automation system to perform actions comprising: providing the intent segments as inputs to a training process for a machine-learning (ML)-based parser of the NLU framework, wherein, within the training process, the NLU framework is configured to apply a plurality of other parsers of the NLU framework to generate a plurality of utterance trees for each intent segment, and in response to determining that a majority of the plurality of utterance trees for a particular intent segment are the same utterance tree, update a model of the ML-based parser such that the ML-based parser generates the same utterance tree for the particular intent segment.

Plain English Translation

This invention relates to natural language understanding (NLU) systems for agent automation, specifically improving the accuracy of machine-learning (ML)-based parsers in processing intent segments. The problem addressed is the inconsistency in parsing natural language inputs, where different parsers may generate varying utterance trees for the same intent segment, leading to unreliable automation. The system includes a processor executing instructions from an NLU framework to enhance parser training. The framework processes intent segments by feeding them into a training process for an ML-based parser. During training, multiple other parsers within the NLU framework generate utterance trees for each intent segment. If a majority of these parsers produce the same utterance tree for a particular intent segment, the ML-based parser's model is updated to ensure it consistently generates that same utterance tree for future instances of the segment. This consensus-based approach improves parsing reliability by leveraging agreement among multiple parsers to refine the ML-based parser's output. The system ensures that the ML-based parser aligns with the most common parsing structure, reducing ambiguity and enhancing automation accuracy.
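
The majority check at the heart of this training process can be sketched as follows. The sketch assumes utterance trees can be represented as hashable values (strings here, for brevity) and uses the hypothetical name `majority_tree`; how real utterance trees are compared and how the ML model is updated are outside its scope.

```python
from collections import Counter


def majority_tree(trees):
    """Return the tree produced by a strict majority of parsers, else None.

    `trees` holds one hashable tree representation per parser.
    """
    if not trees:
        return None
    tree, count = Counter(trees).most_common(1)[0]
    return tree if count > len(trees) / 2 else None


# Three hypothetical parsers; two agree on the same tree.
agreed = majority_tree(["(reset (password))", "(reset (password))", "(password (reset))"])
# No majority when every parser disagrees.
no_consensus = majority_tree(["A", "B", "C"])
```

Only when `majority_tree` returns a tree would the intent segment and that tree be used as a training example for the ML-based parser; segments without a majority would be skipped under this reading.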

Claim 9

Original Legal Text

9. The system of claim 1 , wherein the processor is configured to execute the instructions of the NLU framework to cause the agent automation system to perform actions comprising: providing the utterances as inputs to a training process for a vocabulary subsystem of the NLU framework, wherein, within the training process, the utterances are used to generate a plurality of word vectors of a refined word vector distribution model that replaces a word vector distribution model of the vocabulary subsystem, wherein the NLU framework is configured to use the refined word vector distribution model to determine a suitable word vector for words of received natural language requests.

Plain English Translation

This invention relates to natural language understanding (NLU) systems used in agent automation, particularly vocabulary processing. The problem addressed is that a general-purpose word vector distribution model may not reflect the language actually used in a given set of conversations. The processor provides the utterances extracted from the conversation log as inputs to a training process for the vocabulary subsystem of the NLU framework. During training, the utterances are used to generate a plurality of word vectors that form a refined word vector distribution model, which replaces the existing word vector distribution model of the vocabulary subsystem. The NLU framework then uses the refined model to determine a suitable word vector for the words of received natural language requests.

Claim 10

Original Legal Text

10. The system of claim 1 , wherein the processor is configured to execute the instructions of the NLU framework to cause the agent automation system to perform actions comprising: providing the intent segments as inputs to a semantic mining pipeline of the NLU framework, wherein the semantic mining pipeline is configured to: generate intent vectors for the intent segments; generate meaning clusters of intent vectors based on distances between the intent vectors; detect stable ranges of cluster radius values for the meaning clusters; and generate an intent/entity model from the meaning clusters and the stable ranges of cluster radius values, wherein the intent/entity model stores relationships between a representative intent of each of the meaning clusters and corresponding intent segments as sample utterances, and wherein the NLU framework is configured to use the intent/entity model to classify intents in received natural language requests.

Plain English Translation

This invention relates to natural language understanding (NLU) systems for agent automation, specifically improving intent classification in natural language requests. The system addresses the challenge of accurately interpreting user inputs by enhancing the semantic mining process within an NLU framework. The processor executes instructions to process intent segments through a semantic mining pipeline, which generates intent vectors representing the meaning of each segment. These vectors are grouped into meaning clusters based on their distances, forming semantically similar groups. The system then detects stable ranges of cluster radius values, ensuring consistent clustering. From these clusters, an intent/entity model is generated, storing relationships between representative intents and sample utterances. This model enables the NLU framework to classify intents in new natural language requests by comparing them to the stored clusters. The approach improves intent recognition accuracy by leveraging stable clustering and vector-based semantic analysis, making it suitable for automated agent systems handling diverse user inputs.
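
To make the idea of distance-based meaning clusters and stable radius ranges concrete, here is a small Python sketch. It swaps in a simple greedy clustering rule and defines a "stable range" as a run of radius values over which the cluster count does not change; both choices, along with the function names and sample vectors, are assumptions for illustration and are not the algorithm described in the patent.

```python
import math


def cluster(vectors, radius):
    """Greedy clustering: a vector joins the first cluster whose seed
    (its first member) lies within `radius`; otherwise it seeds a new one."""
    clusters = []
    for vector in vectors:
        for members in clusters:
            if math.dist(vector, members[0]) <= radius:
                members.append(vector)
                break
        else:
            clusters.append([vector])
    return clusters


def stable_radius_ranges(vectors, radii):
    """Return (start_radius, end_radius, cluster_count) runs over which
    the number of clusters stays the same as the radius grows."""
    ranges = []
    for radius in radii:
        count = len(cluster(vectors, radius))
        if ranges and ranges[-1][2] == count:
            ranges[-1] = (ranges[-1][0], radius, count)
        else:
            ranges.append((radius, radius, count))
    return ranges


# Toy 2-D "intent vectors": two tight pairs that are far apart.
intent_vectors = [(0, 0), (0, 1), (10, 10), (10, 11)]
ranges = stable_radius_ranges(intent_vectors, [0.5, 2, 5, 20])
# ranges == [(0.5, 0.5, 4), (2, 5, 2), (20, 20, 1)]
```

In this toy data the cluster count holds at two for radii from 2 to 5, which is the kind of plateau a stable-range detector would look for before choosing representative intents.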

Claim 11

Original Legal Text

11. The system of claim 1 , wherein the processor is configured to execute the instructions of the NLU framework to cause the agent automation system to perform actions comprising: providing the sessions, the conversational segments, or a combination thereof, as inputs to a persona of a reasoning agent/behavior engine (RA/BE) of the NLU framework, wherein RA/BE is configured to generate an episode frame tree set in a persona context database of the persona based on each of the sessions, the conversational segments, or the combination thereof, wherein the episode frame tree set comprises an episode start time and an episode end time that are heuristically determined from the sessions, the conversational segments, or the combination thereof.

Plain English Translation

This invention relates to natural language understanding (NLU) frameworks for agent automation systems, specifically addressing the challenge of dynamically generating and managing conversational context for automated agents. The system processes sessions or conversational segments to extract structured data, which is then used to create episode frame trees within a persona context database. These episode frame trees represent conversational episodes, each defined by a start time and end time that are heuristically determined from the input data. The reasoning agent/behavior engine (RA/BE) within the NLU framework processes these inputs to generate the episode frame trees, which are stored in a persona-specific context database. This allows the system to maintain contextual awareness during interactions, enabling more coherent and personalized automated responses. The invention improves upon traditional NLU systems by dynamically structuring conversational data into episodic frames, enhancing the agent's ability to track and respond to context over time. The system is particularly useful in applications requiring long-term conversational memory, such as customer service automation or virtual assistants.

Claim 12

Original Legal Text

12. The system of claim 1 , wherein the processor is configured to execute the instructions of the NLU framework to cause the agent automation system to perform actions comprising: receiving, from a persona of a RA/BE of the NLU framework, a new message that is part of a conversation between the persona and a user; providing a first indication to the persona of the RA/BE in response to determining that the conversation is a continuation of a previous episode of conversation between the persona and the user; and providing a second indication to the persona of the RA/BE in response to determining that the conversation is a new conversation episode.

Plain English Translation

The system relates to natural language understanding (NLU) frameworks used in agent automation systems, specifically to managing conversations between a persona of a reasoning agent/behavior engine (RA/BE) and a user. The problem addressed is distinguishing a continuing conversation from a new conversation episode so that interactions stay coherent. The system receives, from the persona, a new message that is part of a conversation between the persona and the user. If the system determines that the conversation continues a previous episode of conversation between the persona and the user, it provides a first indication to the persona. If it determines that the conversation is a new conversation episode, it provides a second indication. The persona can then respond appropriately, for example by carrying over context from the earlier episode or by starting fresh. The claim does not specify how the determination is made.
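
One way to picture the two indications is the sketch below, which decides between them using the time elapsed since the previous episode ended. The time-based rule, the one-hour default, and the names `EpisodeIndication` and `classify_message` are assumptions for illustration only; the claim does not say how the determination is made.

```python
from datetime import datetime, timedelta
from enum import Enum


class EpisodeIndication(Enum):
    CONTINUATION = "continuation"  # stands in for the "first indication"
    NEW_EPISODE = "new_episode"    # stands in for the "second indication"


def classify_message(previous_episode_end, message_time,
                     max_gap=timedelta(hours=1)):
    """Return CONTINUATION if the message arrives within max_gap of the
    previous episode's end, otherwise NEW_EPISODE.

    previous_episode_end may be None when there is no earlier episode.
    """
    if (previous_episode_end is not None
            and message_time - previous_episode_end <= max_gap):
        return EpisodeIndication.CONTINUATION
    return EpisodeIndication.NEW_EPISODE


last_end = datetime(2022, 2, 1, 9, 0)
soon_after = classify_message(last_end, datetime(2022, 2, 1, 9, 20))
next_day = classify_message(last_end, datetime(2022, 2, 2, 9, 0))
```

Here `soon_after` is a continuation and `next_day` is a new episode; a first-ever message, with no previous episode, is also treated as a new episode.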

Claim 13

Original Legal Text

13. A method of operating a prosody subsystem of a natural language understanding (NLU) framework, comprising: dividing a conversation log comprising plurality of messages into a plurality of conversation channel groups based on a first set of prosodic cues; dividing each of the plurality of conversation channel groups into a plurality of sessions based on a second set of prosodic cues; dividing each of the plurality of sessions into a plurality of conversation segments based on a third set of prosodic cues; dividing each of the plurality of conversation segments into a plurality of utterances based on a fourth set of prosodic cues; dividing each of the plurality of utterances into a plurality of intent segments based on a fifth set of prosodic cues, wherein the second, third, fourth, and fifth sets of prosodic cues comprise temporal prosodic cues, written prosodic cues, or a combination thereof; and providing the plurality of intent segments, the plurality of utterances, the plurality of conversation segments, or the plurality of sessions, or a combination thereof, as inputs to processes of the NLU framework.

Plain English Translation

This invention relates to natural language understanding (NLU) systems and addresses the challenge of parsing and structuring conversational data for language processing. The method takes a conversation log containing a plurality of messages and breaks it down into hierarchical layers using five sets of prosodic cues. First, the log is divided into conversation channel groups based on a first set of prosodic cues. Each channel group is then divided into sessions using a second set, each session into conversation segments using a third set, each segment into utterances using a fourth set, and each utterance into intent segments using a fifth set. The second through fifth sets comprise temporal prosodic cues, written prosodic cues, or a combination of the two. The resulting intent segments, utterances, conversation segments, or sessions, or any combination of them, are provided as inputs to processes of the NLU framework.

Claim 14

Original Legal Text

14. The method of claim 13 , wherein the first set of prosodic cues comprises metadata prosodic cues that are based on a conversation channel associated with each of the plurality of messages of the conversation log, and wherein the second set of prosodic cues and third set of prosodic cues comprise the temporal cues that are based on a time associated with each of the plurality of messages of the conversation log.

Plain English Translation

This claim specifies which prosodic cues drive the first three levels of the segmentation method of claim 13. The first set, used to divide the conversation log into conversation channel groups, comprises metadata prosodic cues based on the conversation channel associated with each message. The second and third sets, used to divide channel groups into sessions and sessions into conversation segments, comprise temporal cues based on the time associated with each message. In other words, messages are first grouped by their associated channel and then split by timing into sessions and segments.

Claim 15

Original Legal Text

15. The method of claim 13 , wherein the fourth set of prosodic cues comprise metadata prosodic cues, and wherein each of the plurality of utterances corresponds to one of the plurality of messages of the conversation log.

Plain English Translation

This claim covers the case in which each utterance corresponds to one message of the conversation log. The fourth set of prosodic cues, used to divide conversation segments into utterances, comprises metadata prosodic cues. In this arrangement, utterance boundaries coincide with message boundaries.

Claim 16

Original Legal Text

16. The method of claim 13 , wherein the fourth set of prosodic cues comprise metadata prosodic cues and the written prosodic cues, wherein at least a portion of the plurality of utterances corresponds to more than one of the plurality of messages of the conversation log.

Plain English Translation

This claim covers the case in which at least some utterances correspond to more than one message of the conversation log. The fourth set of prosodic cues, used to divide conversation segments into utterances, comprises both metadata prosodic cues and written prosodic cues. This allows for situations in which a single utterance is spread over several messages, for example when an author sends one thought as a series of short messages, so utterance boundaries need not coincide with message boundaries.

Claim 17

Original Legal Text

17. The method of claim 13 , wherein the fifth set of prosodic cues comprise the written prosodic cues, and wherein the written prosodic cues include punctuation, emojis, emphases, or linguistic structure.

Plain English Translation

This claim specifies the cues used for the final step of the method of claim 13. The fifth set of prosodic cues, used to divide utterances into intent segments, comprises written prosodic cues, which include punctuation, emojis, emphases, or linguistic structure. The method analyzes the text of each utterance for these cues to find the boundaries between intent segments. For example, an utterance containing two sentences separated by a period might be split into two intent segments, each expressing a separate intent.

Claim 18

Original Legal Text

18. The method of claim 13, wherein providing comprises: providing the plurality of intent segments as inputs to a first training process of a machine-learning (ML)-based parser of the NLU framework, wherein, within the first training process, the NLU framework is configured to apply a plurality of other parsers of the NLU framework to generate a plurality of utterance trees for each intent segment, and in response to determining that a majority of the plurality of utterance trees for a particular intent segment are the same utterance tree, update a model of the ML-based parser such that the ML-based parser generates the same utterance tree for the particular intent segment; providing the plurality of utterances as inputs to a second training process of a vocabulary subsystem of the NLU framework, wherein, within the second training process, the plurality of utterances is used to generate a plurality of word vectors of a refined word vector distribution model that replaces a word vector distribution model of the vocabulary subsystem, wherein the NLU framework is configured to use the refined word vector distribution model to determine a suitable word vector for words of received natural language requests; providing the plurality of intent segments as inputs to a semantic mining pipeline of the NLU framework, wherein the semantic mining pipeline is configured to: generate intent vectors for each of the plurality of intent segments; generate meaning clusters of intent vectors based on distances between the intent vectors; detect stable ranges of cluster radius values for the meaning clusters; and generate an intent/entity model from the meaning clusters and the stable ranges of cluster radius values, wherein the intent/entity model stores relationships between a representative intent of each of the meaning clusters and corresponding intent segments as sample utterances, and wherein the NLU framework is configured to use the intent/entity model to classify intents in the received natural language requests; and providing the plurality of sessions or the plurality of conversation segments as inputs to a persona of a reasoning agent/behavior engine (RA/BE) of the NLU framework, wherein RA/BE is configured to generate an episode frame tree set in a persona context database of the persona based on the plurality of sessions or the plurality of conversation segments, wherein the episode frame tree set comprises an episode start time and an episode end time that are heuristically determined from the plurality of sessions or the plurality conversational segments.

Plain English Translation

This invention relates to natural language understanding (NLU) frameworks for processing and interpreting natural language requests. This claim specifies how the outputs of the method of claim 13 are provided to four consumers within the NLU framework. First, the intent segments are inputs to a training process for a machine-learning (ML)-based parser. During this process, the framework applies several of its other parsers to each intent segment to generate a set of utterance trees. If a majority of the trees for a given segment are identical, the model of the ML-based parser is updated so that it generates that same tree for the segment. Second, the utterances are inputs to a training process for the vocabulary subsystem, where they are used to generate word vectors for a refined word vector distribution model. The refined model replaces the existing one and is then used to select suitable word vectors for the words of incoming requests. Third, the intent segments are inputs to a semantic mining pipeline that generates an intent vector for each segment, groups the vectors into meaning clusters based on the distances between them, and detects stable ranges of cluster radius values. From these it builds an intent/entity model that relates the representative intent of each cluster to the corresponding intent segments, stored as sample utterances, and the framework uses this model to classify intents in new requests. Finally, the sessions or conversation segments are inputs to a persona of a reasoning agent/behavior engine (RA/BE), which generates an episode frame tree set in the persona's context database, including an episode start time and end time determined heuristically from those sessions or segments.
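The majority-agreement step of the parser training process can be sketched as follows. This is a simplified illustration under stated assumptions: utterance trees are represented as hashable nested tuples, each parser is a plain callable, and the names `majority_tree` and `build_training_pairs` are hypothetical. The sketch only selects the (segment, tree) pairs on which a strict majority of parsers agree; updating the ML-based parser's model with those pairs is outside its scope.

```python
from collections import Counter


def majority_tree(trees):
    """Return the tree produced by a strict majority of parsers, else None."""
    if not trees:
        return None
    tree, votes = Counter(trees).most_common(1)[0]
    return tree if votes > len(trees) / 2 else None


def build_training_pairs(intent_segments, parsers):
    """Keep only the segments whose parses reach majority agreement."""
    pairs = []
    for segment in intent_segments:
        trees = [parse(segment) for parse in parsers]
        agreed = majority_tree(trees)
        if agreed is not None:
            pairs.append((segment, agreed))
    return pairs


# Toy stand-ins for the framework's other parsers (assumptions for the demo).
def by_words(segment):
    return tuple(segment.split())


def by_words_again(segment):
    return tuple(segment.split())


def as_single_node(segment):
    return (segment,)


pairs = build_training_pairs(
    ["reset password"], [by_words, by_words_again, as_single_node]
)
```

Two of the three toy parsers agree on the tree for the sample segment, so the pair is kept as a training example. When no tree reaches a strict majority, the segment is skipped.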

Claim 19

Original Legal Text

19. The method of claim 13, comprising: receiving, from a persona of a RA/BE of the NLU framework, a user message; analyzing the user message based on a sixth set of prosodic cues to determine whether the user message corresponds to a prior episode of conversation between a user and the persona, wherein the sixth set of prosodic cues comprises a temporal prosody cue based on a first time associated with the user message and a second time associated with a previous user message; providing a first indication to the persona of the RA/BE in response to determining that the user message corresponds to the prior episode of conversation; and providing a second indication to the persona of the RA/BE in response to determining that the user message does not correspond to the prior episode of conversation.

Plain English Translation

This invention relates to natural language understanding (NLU) frameworks, specifically to determining whether a new user message continues a prior episode of conversation or begins a new one. The problem addressed is that a conversational agent that cannot tell a continuation from a fresh interaction may carry stale context forward or discard context it still needs, leading to disjointed or irrelevant responses. In this claim, a user message is received from a persona of the reasoning agent/behavior engine (RA/BE) and analyzed using a sixth set of prosodic cues. That set includes a temporal prosodic cue based on the time associated with the user message (the first time) and the time associated with a previous user message (the second time). If the analysis determines that the message corresponds to the prior episode, a first indication is provided to the persona; if it does not, a second indication is provided. The persona (e.g., a virtual assistant or chatbot) can then, for example, keep or reset its conversational context. Because the claim states only that the sixth set comprises the temporal cue, other prosodic cues may also contribute to the determination.
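A temporal cue of this kind can be approximated with a simple gap check. The sketch below is an assumption-laden illustration: the 30-minute threshold, the function names, and the indication labels `CONTINUE_EPISODE` and `NEW_EPISODE` are invented for the example and do not come from the claim.

```python
from datetime import datetime, timedelta


def belongs_to_prior_episode(current_time, previous_time,
                             max_gap=timedelta(minutes=30)):
    """Temporal cue: treat the message as a continuation when it arrives
    within max_gap of the previous user message."""
    if previous_time is None:  # no earlier message, so no prior episode
        return False
    return timedelta(0) <= current_time - previous_time <= max_gap


def indication_for(current_time, previous_time):
    """Map the decision to the first or second indication for the persona."""
    if belongs_to_prior_episode(current_time, previous_time):
        return "CONTINUE_EPISODE"  # first indication
    return "NEW_EPISODE"           # second indication


previous = datetime(2019, 3, 11, 9, 0)
soon = indication_for(datetime(2019, 3, 11, 9, 5), previous)
later = indication_for(datetime(2019, 3, 11, 14, 0), previous)
```

A fixed threshold is the simplest possible choice. A system could instead derive the gap from observed conversation patterns or combine it with written cues in the message itself.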

Claim 20

Original Legal Text

20. A non-transitory, computer-readable medium storing instructions of a natural language understanding (NLU) framework executable by one or more processors of a computing system, the instructions comprising instructions to: process, via a prosody subsystem of the NLU framework, a conversation log based on prosodic cues to divide the conversation log into conversation channel groups, to divide the conversation channel groups into sessions, to divide the sessions into conversation segments, to divide the conversation segments into utterances, and to divide the utterances into intent segments; provide the intent segments as inputs to a first training process for a machine-learning (ML)-based parser of the NLU framework, wherein, within the first training process, the NLU framework is configured to apply a plurality of other parsers of the NLU framework to generate a plurality of utterance trees for each intent segment, and in response to determining that a majority of the plurality of utterance trees for a particular intent segment are the same utterance tree, update a model of the ML-based parser such that the ML-based parser generates the same utterance tree for the particular intent segment; provide the utterances as inputs to a second training process for a vocabulary subsystem of the NLU framework, wherein, within the second training process, the utterances are used to generate a first plurality of word vectors for a refined word vector distribution model that replaces a word vector distribution model of the vocabulary subsystem, wherein the NLU framework is configured to use the refined word vector distribution model to determine a suitable word vector for words of received natural language requests; provide the intent segments as inputs to a semantic mining pipeline of the NLU framework, wherein the semantic mining pipeline is configured to: generate intent vectors for the intent segments; generate meaning clusters of intent vectors based on distances between the intent vectors; detect stable ranges of cluster radius values for the meaning clusters; and generate an intent/entity model from the meaning clusters and the stable ranges of cluster radius values, wherein the intent/entity model stores relationships between a representative intent of each of the meaning clusters and corresponding intent segments as sample utterances, and wherein the NLU framework is configured to use the intent/entity model to classify intents in the received natural language requests; and provide the sessions, the conversation segments, or a combination thereof, as inputs to a persona of a reasoning agent/behavior engine (RA/BE) of the NLU framework, wherein RA/BE is configured to generate an episode frame tree set in a persona context database of the persona based on each of the sessions, the conversation segments, or the combination thereof, wherein the episode frame tree set comprises an episode start time and an episode end time that are heuristically determined from the sessions, the conversational segments, or the combination thereof.

Plain English Translation

This invention relates to a natural language understanding (NLU) framework designed to process and analyze conversational data. This claim covers a non-transitory, computer-readable medium storing instructions of the framework. The instructions cause a prosody subsystem to process a conversation log based on prosodic cues, dividing the log into conversation channel groups, the channel groups into sessions, the sessions into conversation segments, the segments into utterances, and the utterances into intent segments. The resulting pieces are then provided to four consumers. The intent segments are inputs to a training process for a machine-learning (ML)-based parser: several other parsers of the framework generate utterance trees for each intent segment, and if a majority of the trees for a given segment are identical, the parser model is updated so that it generates that same tree. The utterances are inputs to a second training process that generates word vectors for a refined word vector distribution model, which replaces the existing model of the vocabulary subsystem and is used to select suitable word vectors for the words of incoming requests. The intent segments are also inputs to a semantic mining pipeline that generates intent vectors, groups them into meaning clusters based on the distances between them, and detects stable ranges of cluster radius values to create an intent/entity model. This model stores relationships between the representative intent of each cluster and the corresponding intent segments as sample utterances, enabling intent classification in new requests. Finally, the sessions, the conversation segments, or both are inputs to a persona of a reasoning agent/behavior engine (RA/BE), which generates an episode frame tree set in the persona's context database. The set includes an episode start time and end time determined heuristically from those inputs.
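The clustering and stable-range steps of the semantic mining pipeline can be illustrated with a small, self-contained sketch. Everything here is an assumption chosen for simplicity rather than the patented procedure: intent vectors are plain non-zero tuples, distance is cosine distance, clustering is a greedy pass that assigns each vector to the first cluster whose seed lies within the radius, and a "stable range" is taken to mean a run of consecutive candidate radii that produce identical clusters.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def cluster(vectors, radius):
    """Greedy clustering: each vector joins the first cluster whose seed
    (its first member) is within radius, otherwise it starts a new cluster."""
    clusters = []  # lists of vector indices; index 0 of each list is the seed
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine_distance(vectors[members[0]], vec) <= radius:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def stable_ranges(vectors, radii, min_run=2):
    """Return (low, high, clusters) for each run of at least min_run
    consecutive radii that yield the same clusters."""
    results = [cluster(vectors, r) for r in radii]
    ranges, run_start = [], 0
    for i in range(1, len(radii) + 1):
        if i == len(radii) or results[i] != results[run_start]:
            if i - run_start >= min_run:
                ranges.append((radii[run_start], radii[i - 1], results[run_start]))
            run_start = i
    return ranges


intent_vectors = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
stable = stable_ranges(intent_vectors, [0.001, 0.05, 0.1, 0.5, 0.95])
```

For these four toy vectors, the radii from 0.05 through 0.5 all give the same two clusters, so that span is reported as stable. In an intent/entity model, the seed of each cluster could serve as the representative intent and the remaining members as its sample utterances.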

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06F G06N G10L

Patent Metadata

Filing Date

March 11, 2019

Publication Date

February 1, 2022
