US-11978434

Developing an automatic speech recognition system using normalization

Published: May 7, 2024

Assignee: not available in the USPTO data we have

Inventors: not available in the USPTO data we have

Technical Abstract

A computer-implemented technique identifies terms in an original reference transcription and original ASR output results that are considered valid variants of each other, even though these terms have different textual forms. Based on this finding, the technique produces a normalized reference transcription and normalized ASR output results in which valid variants are assigned the same textual form. In some implementations, the technique uses the normalized text to develop a model for an ASR system. For example, the technique may generate a word error rate (WER) measure by comparing the normalized reference transcription with the normalized ASR output results, and use the WER measure as guidance in developing the model. Some aspects of the technique involve identifying occasions in which a term can be properly split into component parts. Other aspects can identify other ways in which two terms may vary in spelling, but nonetheless remain valid variants.

Patent Claims

7 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 2

Original Legal Text

2. The computer-implemented method of claim 1, wherein the method further includes generating a word error rate (WER) measure by comparing the normalized reference transcription with the normalized ASR output results, and wherein the model is developed, in part, based on guidance provided by the WER measure.

Plain English Translation

This invention relates to improving automatic speech recognition (ASR) systems by refining their training models using word error rate (WER) metrics. The method addresses the challenge of accurately transcribing spoken language by leveraging WER to guide model development. The process involves generating a normalized reference transcription and a normalized ASR output, which are then compared to compute the WER. This metric quantifies the discrepancy between the ASR output and the reference transcription, providing feedback to adjust and enhance the ASR model. The method ensures that the model is trained with iterative improvements based on WER, leading to more accurate speech recognition. The normalization step standardizes the transcriptions and outputs, eliminating variations that could skew the WER calculation. By incorporating WER as a guiding metric, the system refines its performance, reducing errors in speech-to-text conversion. This approach is particularly useful in applications requiring high-accuracy transcription, such as voice assistants, transcription services, and real-time captioning. The method optimizes ASR models by systematically analyzing and correcting errors, ensuring continuous improvement in recognition accuracy.
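
To illustrate the comparison step, the following is a minimal sketch of a WER calculation, assuming the conventional definition (word-level edit distance divided by the number of reference words). The claim does not specify a formula, so the function name `word_error_rate` and the whitespace tokenization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: tokens are separated by whitespace. This is a generic WER
    definition, not necessarily the one used in the patent.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Without normalization, a valid spelling variant counts as two errors.
raw = word_error_rate("the note book is here", "the notebook is here")
# After both texts are normalized to the same form, the errors disappear.
normalized = word_error_rate("the notebook is here", "the notebook is here")
```

In this hypothetical example the raw comparison is penalized for "note book" versus "notebook", while the normalized comparison is not, which is the kind of skew the normalization step is meant to remove.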

Claim 5

Original Legal Text

5. The computer-implemented method of claim 1, wherein one validity test involves determining whether segmentation of the compound term into the first sub-term and the second sub-term satisfies a language-specific rule.

Plain English Translation

This invention relates to natural language processing (NLP) systems that analyze compound terms in text. The problem addressed is ensuring accurate segmentation of compound terms into meaningful sub-terms while adhering to language-specific grammatical or syntactic rules. Many NLP systems struggle with correctly splitting compound terms, leading to errors in text analysis, machine translation, or information retrieval. The method involves a computer-implemented process that evaluates the validity of segmenting a compound term into two sub-terms. One key validity test checks whether the segmentation follows language-specific rules, such as morphological, syntactic, or semantic constraints. For example, in German, compound nouns are often split at morpheme boundaries, while in English, segmentation may depend on part-of-speech rules. The system may also use statistical models or predefined dictionaries to verify the correctness of the split. Additionally, the method may include preprocessing steps like tokenization and part-of-speech tagging to prepare the text for analysis. The results of the validity tests are used to determine whether the segmentation is acceptable or if alternative splits should be considered. This approach improves the accuracy of NLP tasks by ensuring that compound terms are processed in a linguistically valid manner.

Claim 6

Original Legal Text

6. The computer-implemented method of claim 5, wherein one language-specific rule specifies that no sub-term can begin with a specified character.

Plain English Translation

This invention relates to natural language processing (NLP) and text analysis, specifically addressing challenges in parsing and validating text strings according to language-specific grammatical or syntactic rules. The method involves processing a text string by applying predefined language-specific rules to determine whether the string conforms to expected linguistic patterns. One such rule enforces that no sub-term within the text can begin with a specified character, ensuring compliance with language-specific constraints. The method may also include additional rules, such as validating the presence or absence of certain characters, enforcing positional constraints on sub-terms, or checking for valid character sequences. The system may analyze the text string by breaking it into sub-terms, applying the rules to each sub-term, and generating an output indicating whether the text string is valid or invalid based on the applied rules. This approach is useful in applications like spell-checking, syntax validation, or automated text generation where adherence to language-specific conventions is critical. The method ensures that processed text strings meet predefined linguistic criteria, improving accuracy and consistency in NLP tasks.
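
As a rough sketch of the rule in claim 6, the check below rejects any split of a compound term in which a sub-term begins with a disallowed character. The names `split_is_valid` and `valid_splits`, and the choice of a hyphen as the specified character, are hypothetical and chosen only for illustration; the claim does not say which characters are specified.

```python
def split_is_valid(first: str, second: str, forbidden_initials: set) -> bool:
    """Return True if neither sub-term starts with a forbidden character.

    Assumption: empty sub-terms are also treated as invalid.
    """
    if not first or not second:
        return False
    return (first[0] not in forbidden_initials
            and second[0] not in forbidden_initials)


def valid_splits(compound: str, forbidden_initials: set) -> list:
    """Enumerate every two-way split of `compound` that passes the rule."""
    candidates = ((compound[:i], compound[i:]) for i in range(1, len(compound)))
    return [(a, b) for a, b in candidates
            if split_is_valid(a, b, forbidden_initials)]


# Hypothetical rule: no sub-term may begin with a hyphen.
splits = valid_splits("a-b", {"-"})
```

Here the split ("a", "-b") is rejected because its second sub-term starts with the forbidden character, while ("a-", "b") passes. A real system would presumably combine this with the other validity tests mentioned in claim 5.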

Claim 7

Original Legal Text

7. The computer-implemented method of claim 5, wherein said producing further comprises, based on the entry in the substitution data store, replacing each occurrence of the combination of the first sub-term and the second sub-term in the original reference transcription and the original ASR output results with the compound term.

Plain English Translation

This invention relates to natural language processing and speech recognition, specifically improving the accuracy of automated speech recognition (ASR) systems by handling compound terms that also appear as separate words. The problem addressed is that a compound term (e.g., "notebook") may appear in a transcription or in ASR output as its component sub-terms ("note book"), leading to mismatches in downstream applications like transcription, search, or translation. The solution involves a method that resolves these mismatches by replacing split sub-terms with their compound form. The method processes an original reference transcription and ASR output results, either of which may contain the split form. A substitution data store is used, containing entries that map combinations of sub-terms (e.g., "note" and "book") to their compound term (e.g., "notebook"). The system scans the original reference and ASR output for occurrences of these sub-term combinations and replaces them with the corresponding compound term from the substitution data store. This ensures consistency between the reference and ASR output, improving accuracy in applications that rely on these results. The method is particularly useful in domains where compound terms are common, such as place names, technical jargon, or proper nouns.
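
The replacement step can be sketched as follows, assuming the substitution data store is a dictionary keyed by a pair of adjacent sub-terms. The structure of the store, the function name `apply_substitutions`, and the sample entry are assumptions for illustration; the claim only states that an entry in the store drives the replacement.

```python
def apply_substitutions(text: str, store: dict) -> str:
    """Replace each adjacent (first, second) sub-term pair found in `store`
    with its compound term, scanning left to right."""
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if len(pair) == 2 and pair in store:
            out.append(store[pair])  # emit the compound form
            i += 2                   # skip both sub-terms
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)


# Hypothetical store entry mapping two sub-terms to a compound term.
store = {("note", "book"): "notebook"}

# The same store is applied to both texts so they end up consistent.
reference = apply_substitutions("my note book is here", store)
asr_output = apply_substitutions("my notebook is here", store)
```

Applying the same store to the reference transcription and to the ASR output leaves both with the compound form, so a later comparison no longer treats the split form as an error.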

Claim 13

Original Legal Text

13. The computer-implemented method of claim 12, wherein said selecting chooses a term of the group that has highest frequency of use.

Plain English Translation

This invention relates to natural language processing and information retrieval, specifically improving the accuracy of term selection in text analysis. The problem addressed is the challenge of identifying the most relevant term from a group of semantically similar terms in a document or corpus, which is critical for tasks like search, summarization, and machine translation. Existing methods often struggle with ambiguity and context, leading to suboptimal term selection. The method involves analyzing a group of terms that share a common semantic meaning but differ in usage frequency. For example, in a medical document, terms like "hypertension" and "high blood pressure" may be semantically equivalent but appear with different frequencies. The method selects the term with the highest frequency of use within a given context, such as a document, corpus, or domain-specific dataset. This frequency-based selection ensures that the most commonly used term is chosen, improving consistency and relevance in downstream applications. The method may also involve preprocessing steps like tokenization, part-of-speech tagging, and semantic analysis to identify term groups and their frequencies. By prioritizing high-frequency terms, the approach enhances the accuracy of text processing tasks where term selection impacts output quality.
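
A minimal sketch of frequency-based selection is shown below, assuming usage counts are taken from a tokenized corpus. The function name `select_canonical`, the sample corpus, and the tie-breaking behavior (the first term listed in the group wins) are assumptions; the claim only says that the term with the highest frequency of use is chosen.

```python
from collections import Counter


def select_canonical(group: list, counts: Counter) -> str:
    """Pick the term in `group` with the highest usage count.

    Terms missing from `counts` are treated as having frequency 0.
    On a tie, the term that appears first in `group` is returned.
    """
    if not group:
        raise ValueError("group must contain at least one term")
    return max(group, key=lambda term: counts[term])


# Hypothetical usage counts gathered from a small corpus.
counts = Counter("color colour color gray grey grey grey".split())

canonical_a = select_canonical(["colour", "color"], counts)
canonical_b = select_canonical(["gray", "grey"], counts)
```

Every other member of a group could then be rewritten to the selected term, so that valid variants share one textual form.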

Claim 17

Original Legal Text

17. The computer-readable storage medium of claim 16, wherein a first conversion process translates a particular term from one natural language to another natural language, a second conversion process identifies a pronunciation of the particular term, and a third conversion process identifies a transliteration of the particular term.

Plain English Translation

This invention relates to a system for processing multilingual text data, specifically addressing challenges in language translation, pronunciation identification, and transliteration. The system includes a computer-readable storage medium containing instructions for executing multiple conversion processes. A first conversion process translates a specific term from one natural language to another, enabling cross-lingual communication. A second conversion process identifies the pronunciation of the term, which is useful for applications like speech synthesis or language learning. A third conversion process generates a transliteration of the term, converting it into a phonetic representation using characters from another language, aiding in pronunciation guidance or text normalization. The system may also include a user interface for inputting the term and displaying the results of these processes. Additionally, the system may support batch processing of multiple terms, allowing for efficient handling of large datasets. The invention is particularly useful in applications requiring accurate language conversion, such as machine translation, linguistic research, or educational tools.

Claim 18

Original Legal Text

18. The computer-readable storage medium of claim 16, wherein said selecting chooses a term of the group that has highest frequency of use.

Plain English Translation

A system and method for natural language processing (NLP) and text analysis involves extracting and analyzing terms from a text corpus to identify relevant groups of terms. The system processes input text to detect and group related terms, such as synonyms or semantically similar words, based on linguistic or statistical analysis. When analyzing a group of terms, the system selects the most frequently used term from the group to represent the group in further processing, such as indexing, search, or summarization. This approach improves efficiency and accuracy in text analysis by reducing redundancy and focusing on the most prominent terms. The method may involve preprocessing text, applying linguistic rules, or using statistical models to identify term relationships. The selected term is then used in downstream applications, such as search engines, document classification, or information retrieval, to enhance performance and relevance. The system may also include additional steps like filtering, ranking, or contextual analysis to refine term selection. The invention addresses challenges in NLP, such as handling synonyms, reducing noise, and improving term representation in large-scale text processing.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G10L

Patent Metadata

Filing Date

September 29, 2021

Publication Date

May 7, 2024
