10915233

Automated Entity Correlation and Classification Across Heterogeneous Datasets

Published: February 9, 2021
Assignee: not available in the USPTO data on hand

Patent Claims
26 claims

Legal claims defining the scope of protection. Claims are shown in the original legal language, with a plain English translation where one is available.

Claim 1

Original Legal Text

1. A method comprising: receiving, by a computing system, a data set comprising a column of data from one or more data sources, wherein the computing system is a big data system configured to analyze large data sets; normalizing, by the computing system, the data set comprising the column of data and having a first format to create a normalized data set in a second column of data by modifying the data in the data set to have a common format for data in the normalized data set; identifying, by the computing system, a set of patterns for a set of entities in the normalized data set in the column of data using a hierarchy of regular expressions, wherein the set of patterns for the set of entities is identified based on a semantic similarity between the set of patterns for the set of entities in the normalized data set and one or more data sets in a knowledge source, wherein the knowledge source comprises information published by one of a web site, a web service, or a knowledge store; extracting, by the computing system, based on the identified set of patterns, entity information corresponding to the set of entities from the normalized data set comprising the data set having the common format; classifying, by the computing system, the set of entities using the entity information in order to obtain a classification for attributes of the set of entities, wherein the classification for the attributes of the set of entities identifies a type of the set of entities in the column of data; transforming, automatically by the computing system, the data set based on the classification and the entity information by generating a transformed data set comprising classification attribute metadata for the set of entities in the normalized data set in the second column of data; determining, by the computing system, a transform script comprising a plurality of transformations applied to the normalized data in the column of data to generate the transformed data set; rendering, by the 
computing system, a graphical interface that displays the transformed data set and the transform script comprising the plurality of transformations applied to the normalized data set to generate the transformed data set; receiving, via the graphical interface, an input by a user selecting the transform script comprising the plurality of transformations; applying the selected transform script to the data set; and generating the transformed data set based on the selected transform script.

Plain English Translation

This claim covers a method, performed by a big data system, for classifying and transforming a column of data. The system receives a data set containing a column of data from one or more sources and normalizes it into a common format, placing the normalized data in a second column. It then identifies patterns for the entities in the normalized column using a hierarchy of regular expressions, where the patterns are identified by their semantic similarity to data sets in a knowledge source (information published by a web site, a web service, or a knowledge store). Using those patterns, the system extracts entity information and classifies the entities, and the classification identifies the type of the entities in the column. The system automatically generates a transformed data set that carries classification attribute metadata for the entities, and it determines a transform script listing the transformations applied to the normalized data to produce that result. A graphical interface displays both the transformed data set and the transform script. When the user selects the script, the system applies it to the data set and generates the transformed data set.
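
The steps of Claim 1 can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the pattern hierarchy, the type names, and the helpers `normalize`, `classify_column`, and `build_transform_script` are all hypothetical, and the knowledge-source comparison and the graphical interface are left out.

```python
import re

# Hypothetical hierarchy: specific patterns are tried before general ones.
PATTERN_HIERARCHY = [
    ("us_zip_code", re.compile(r"\d{5}(-\d{4})?")),
    ("integer", re.compile(r"\d+")),
    ("text", re.compile(r".+")),
]

def normalize(values):
    """Bring raw values to a common format: trimmed, single-spaced, lowercase."""
    return [" ".join(v.split()).lower() for v in values]

def classify_value(value):
    """Return the first (most specific) type whose pattern matches the whole value."""
    for type_name, pattern in PATTERN_HIERARCHY:
        if pattern.fullmatch(value):
            return type_name
    return "unknown"

def classify_column(values):
    """Use the most common per-value type as the column classification."""
    counts = {}
    for v in values:
        t = classify_value(v)
        counts[t] = counts.get(t, 0) + 1
    return max(counts, key=counts.get)

def build_transform_script(column_name, column_type):
    """Record the applied steps so a user could review and re-apply them."""
    return [
        {"action": "normalize", "column": column_name},
        {"action": "annotate_type", "column": column_name, "type": column_type},
    ]

raw = [" 94065 ", "10001-0001", "60614", "n/a"]
normalized = normalize(raw)
column_type = classify_column(normalized)
transformed = {
    "column": "postal",
    "values": normalized,
    "metadata": {"classification": column_type},  # classification attribute metadata
}
script = build_transform_script("postal", column_type)
```

In this sketch three of the four values match the ZIP code pattern, so the column is labeled `us_zip_code` even though one value falls through to the generic `text` pattern.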

Claim 2

Original Legal Text

2. The method of claim 1, further comprising: computing a set of pattern metrics, wherein each of the set of pattern metrics is computed for a different pattern in the set of patterns, and wherein the set of patterns includes a plurality of different patterns identified for the set of entities; determining a difference amongst the set of pattern metrics; and selecting, based on the difference amongst the set of pattern metrics, a pattern in the set of patterns, wherein the classification is determined based on the selected pattern.
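
One way to read Claim 2 is as a scoring-and-margin step. The sketch below is an assumption-laden illustration: the claim does not say what a pattern metric is or how the difference is used, so the metric here (fraction of values fully matched), the `min_margin` threshold, and the function names are all hypothetical.

```python
import re

def pattern_metrics(values, patterns):
    """Metric per pattern: fraction of values the pattern fully matches.

    Assumes a non-empty list of values.
    """
    total = len(values)
    return {
        name: sum(1 for v in values if rx.fullmatch(v)) / total
        for name, rx in patterns.items()
    }

def select_pattern(metrics, min_margin=0.2):
    """Pick the best pattern only if it beats the runner-up by min_margin.

    Assumes at least two candidate patterns.
    """
    ranked = sorted(metrics.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_score), (_, second_score) = ranked[0], ranked[1]
    difference = best_score - second_score
    return (best if difference >= min_margin else None), difference

patterns = {
    "phone": re.compile(r"\d{3}-\d{3}-\d{4}"),
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}
values = ["650-555-0100", "415-555-0199", "2021-02-09", "212-555-0123"]
metrics = pattern_metrics(values, patterns)
selected, difference = select_pattern(metrics)
```

Returning `None` when the margin is too small is a design choice of this sketch: it leaves an ambiguous column unclassified rather than guessing.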

Claim 3

Original Legal Text

3. The method of claim 1, further comprising: identifying text data in the normalized data set, the text data corresponding to each entity of the set of entities; determining a set of classifications for the set of entities; and computing a set of classification metrics, each of the set of classification metrics computed for a different classification in the set of classifications; and wherein the classification is determined based on determining a difference amongst the set of classifications.

Claim 4

Original Legal Text

4. The method of claim 1, wherein normalizing the data set to create the normalized data set includes modifying the data set having the first format to an adjusted format of the normalized data set, the adjusted format being different from the first format.
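
As a small illustration of Claim 4, the sketch below converts dates written in several first formats into one adjusted format (ISO 8601). The choice of dates, the list of candidate formats, and the helper name `to_iso_date` are assumptions made for the example; the claim itself does not name any specific format.

```python
from datetime import datetime

# Hypothetical input formats the column might arrive in.
CANDIDATE_FORMATS = ["%m/%d/%Y", "%d.%m.%Y", "%Y-%m-%d"]

def to_iso_date(value):
    """Return the value as YYYY-MM-DD, or unchanged if no format parses it."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return value

dates = ["02/09/2021", "09.02.2021", "2021-02-09", "unknown"]
normalized_dates = [to_iso_date(d) for d in dates]
```

Values that match none of the candidate formats are passed through untouched, so the step never discards data it cannot interpret.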

Claim 5

Original Legal Text

5. The method of claim 1, wherein identifying the set of patterns includes comparing column data amongst a plurality of columns in the normalized data set, and determining that each of a set of columns in the plurality of columns has an attribute, and wherein the set of patterns for the set of entities is identified based on the normalized data for each column having the attribute.
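
Claim 5 can be sketched as a two-stage filter: first find the columns that share some attribute, then derive patterns only from those columns. The attribute used here (every value contains a digit), the coarse "shape" notation, and the function names are hypothetical choices for illustration; the claim leaves the attribute unspecified.

```python
import re

def has_attribute(values):
    """Hypothetical attribute: every value in the column contains a digit."""
    return all(re.search(r"\d", v) for v in values)

def shape(value):
    """Reduce a value to a coarse pattern: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

def patterns_for_matching_columns(table):
    """Collect value shapes only from columns that have the attribute."""
    selected = {name: vals for name, vals in table.items() if has_attribute(vals)}
    return {name: sorted({shape(v) for v in vals}) for name, vals in selected.items()}

table = {
    "city": ["austin", "boston"],
    "zip": ["73301", "02108"],
    "phone": ["512-555-0100", "617-555-0101"],
}
found = patterns_for_matching_columns(table)
```

Here the `city` column is skipped because it lacks the attribute, and each remaining column collapses to a single shape.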

Claim 6

Original Legal Text

6. The method of claim 1, wherein the transformed data set may be generated from the normalized data set by modifying the normalized data set to include data about the classification for the set of entities.

Plain English Translation

This claim depends on Claim 1 and specifies how the transformed data set may be produced: the normalized data set is modified to include data about the classification of the entities. In other words, the transformed data set can be the normalized data augmented with classification information, such as a metadata field that records the type of the entities. The claim does not name a particular classification technique or application domain.

Claim 7

Original Legal Text

7. The method of claim 1, further comprising: rendering, by the computing system, an additional graphical interface that displays the normalized data, that identifies the entity information, and that indicates the classification; and receiving an input indicating selection of the classification in the additional graphical interface, wherein the transformed data set is generated upon receiving the input.

Claim 8

Original Legal Text

8. The method according to claim 1, wherein the data set comprises a plurality of columns of data.

Claim 9

Original Legal Text

9. The method according to claim 1, wherein the graphical interface is an interactive graphical interface.

Plain English Translation

This claim depends on Claim 1 and adds a single limitation: the graphical interface that displays the transformed data set and the transform script is interactive. The claim does not specify particular interaction mechanisms. Under Claim 1, the interface already receives the user's selection of the transform script.

Claim 10

Original Legal Text

10. The method according to claim 1, wherein the graphical interface displays the transformed data set and information indicating a recommended transformation to the normalized data set.

Plain English Translation

This claim depends on Claim 1 and adds that the graphical interface displays, together with the transformed data set, information indicating a recommended transformation to the normalized data set. Normalization here has the meaning given in Claim 1, that is, modifying the data to a common format, rather than statistical scaling or centering. The claim does not say how the recommendation is derived or what form it takes.

Claim 11

Original Legal Text

11. The method according to claim 10, wherein the recommended transformation is performed in accordance with a user input on the graphical interface.

Plain English Translation

This claim depends on Claim 10 and adds that the recommended transformation shown in the graphical interface is performed in accordance with a user input on that interface. In other words, the user's input on the interface governs how the recommended transformation is carried out. The claim does not describe how recommendations are generated or refined.

Claim 12

Original Legal Text

12. The method according to claim 1, wherein the classification comprises classifying the set of entities into one or more matching domains in accordance with a similarity score.
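
Claim 12 does not name a similarity measure, so the sketch below substitutes Jaccard similarity purely for illustration. The in-memory `knowledge` dictionary stands in for a knowledge source, and the `threshold` value and function names are likewise hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def matching_domains(column_values, knowledge, threshold=0.3):
    """Return (domain, score) pairs at or above the threshold, best first."""
    scores = {name: jaccard(column_values, ref) for name, ref in knowledge.items()}
    hits = [(name, s) for name, s in scores.items() if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Hypothetical reference sets standing in for a knowledge source.
knowledge = {
    "us_state": {"texas", "ohio", "utah", "iowa"},
    "color": {"red", "green", "blue"},
}
column = ["texas", "ohio", "utah", "red"]
domains = matching_domains(column, knowledge)
```

Because the function returns every domain that clears the threshold, a column can match more than one domain, which mirrors the "one or more matching domains" wording of the claim.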

Claim 13

Original Legal Text

13. The method according to claim 1, wherein the plurality of transformations in the transform script comprises a plurality of actions, and wherein the plurality of actions comprise one or more of an update action, a split action, a filter action, an edit column action, an extract action, an insert action, a rename action, a sample action, a join action, an export action, an obfuscate action, a data reformat action, a change case action, or a whitelist filter action.
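
A transform script made of named actions, as listed in Claim 13, could be modeled as an ordered list of steps dispatched through a lookup table. The sketch below covers only three of the listed actions (rename, change case, filter); the step format, argument names, and the `apply_script` helper are assumptions for illustration and are not taken from the patent.

```python
def rename(rows, old, new):
    """Rename action: change a column name in every row."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]

def change_case(rows, column, mode):
    """Change case action: upper- or lowercase one column."""
    convert = str.upper if mode == "upper" else str.lower
    return [{**row, column: convert(row[column])} for row in rows]

def filter_rows(rows, column, equals):
    """Filter action: keep rows whose column equals the given value."""
    return [row for row in rows if row[column] == equals]

ACTIONS = {"rename": rename, "change_case": change_case, "filter": filter_rows}

def apply_script(rows, script):
    """Apply each step in order; every step is {'action': name, 'args': {...}}."""
    for step in script:
        rows = ACTIONS[step["action"]](rows, **step["args"])
    return rows

script = [
    {"action": "rename", "args": {"old": "st", "new": "state"}},
    {"action": "change_case", "args": {"column": "state", "mode": "upper"}},
    {"action": "filter", "args": {"column": "state", "equals": "TX"}},
]
rows = [{"st": "tx", "city": "austin"}, {"st": "oh", "city": "akron"}]
result = apply_script(rows, script)
```

Keeping the script as plain data makes it easy to display for review and to re-apply to another data set, which fits the select-then-apply flow described in Claim 1.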

Claim 14

Original Legal Text

14. A data enrichment system comprising: a plurality of data sources; and a cloud computing infrastructure system comprising: one or more processors communicatively coupled to the plurality of data sources over at least one communication network; and a memory coupled to the one or more processors, the memory storing instructions to provide a data enrichment service, wherein the data enrichment system is a big data system configured to analyze large data sets, wherein the instructions, when executed by the one or more processors, cause the one or more processors to: receive a data set comprising a column of data from one or more data sources of the plurality of data sources; normalize the data set to create a normalized data set comprising the column of data and having a first format by modifying the data in the data set in a second column of data to have a common format for data in the normalized data set; identify a set of patterns for a set of entities in the normalized data set in the column of data using a hierarchy of regular expressions, wherein the set of patterns for the set of entities is identified based on a semantic similarity between the set of patterns for the set of entities in the normalized data set and one or more data sets in a knowledge source, wherein the knowledge source comprises information published by one of a web site, a web service, or a knowledge store; extract, based on the identified set of patterns, entity information corresponding to the set of entities from the normalized data set comprising the data set having the common format; classify the set of entities using the entity information in order to obtain a classification for attributes of the set of entities, wherein the classification for the attributes of the set of entities identifies a type of the set of entities in the column of data; automatically transform the data set based on the classification and the entity information by generating a transformed data set comprising 
classification attribute metadata for the set of entities in the normalized data set in the second column of data; determine a transform script comprising a plurality of transformations applied to the normalized data in the column of data to generate the transformed data set; and render a graphical interface that displays the transformed data set and the transform script comprising the plurality of transformations applied to the normalized data set to generate the transformed data set; receive, via the graphical interface, an input by a user selecting the transform script comprising the plurality of transformations; apply the selected transform script to the data set; and generate the transformed data set based on the selected transform script.

Claim 15

Original Legal Text

15. The data enrichment system of claim 14, wherein normalizing the data set to create the normalized data includes modifying the data set having the first format to an adjusted format of the normalized data set, the adjusted format being different from the first format.

Claim 16

Original Legal Text

16. The data enrichment system of claim 14, wherein the instructions, when executed by the one or more processors, further cause the one or more processors to: compute a set of pattern metrics, wherein each of the set of pattern metrics is computed for a different pattern in the set of patterns, and wherein the set of patterns includes a plurality of different patterns identified for the set of entities; determine a difference amongst the set of pattern metrics; and select, based on the difference amongst the set of pattern metrics, a pattern in the set of patterns, wherein the classification is determined based on the selected pattern.

Claim 17

Original Legal Text

17. The data enrichment system of claim 14, wherein the instructions, when executed by the one or more processors, further cause the one or more processors to: identify text data in the normalized data set, the text data corresponding to each entity of the set of entities; determine a set of classifications for the set of entities; and compute a set of classification metrics, each of the set of classification metrics computed for a different classification in the set of classifications; and wherein the classification is determined based on determining a difference amongst the set of classifications.

Claim 18

Original Legal Text

18. The data enrichment system of claim 14, wherein the transformed data set may be generated from the normalized data set by modifying the normalized data set to include data about the classification for the set of entities.

Claim 19

Original Legal Text

19. The data enrichment system of claim 14, wherein the instructions, when executed by the one or more processors, further cause the one or more processors to: render an additional graphical interface that displays the normalized data, that identifies the entity information, and that indicates the classification; and receive an input indicating selection of the classification in the additional graphical interface, wherein the transformed data set is generated upon receiving the input.

Claim 20

Original Legal Text

20. A non-transitory computer readable storage medium including instructions stored thereon which, when executed by one or more processors, cause the one or more processors to: receive, by a computing system, a data set comprising a column of data from one or more data sources, wherein the computing system is a big data system configured to analyze large data sets; normalize, by the computing system, the data set comprising the column of data and having a first format to create a normalized data set in a second column of data by modifying the data in the data set to have a common format for data in the normalized data set; identify, by the computing system, a set of patterns for a set of entities in the normalized data set in the column of data using a hierarchy of regular expressions, wherein the set of patterns for the set of entities is identified based on a semantic similarity between the set of patterns for the set of entities in the normalized data set and one or more data sets in a knowledge source, wherein the knowledge source comprises information published by one of a web site, a web service, or a knowledge store; extract, by the computing system, based on the identified set of patterns, entity information corresponding to the set of entities from the normalized data set; classify, by the computing system, the set of entities using the entity information in order to obtain a classification for attributes of the set of entities, wherein the classification for the attributes of the set of entities identifies a type of the set of entities in the column of data; transform, automatically by the computing system, the data set based on the classification and the entity information by generating a transformed data set comprising classification attribute metadata for the set of entities in the normalized data set comprising the data set having the common format in the second column of data; determine, by the computing system, a transform script comprising a 
plurality of transformations applied to the normalized data to generate the transformed data set; and render, by the computing system, a graphical interface that displays the transformed data set and the transform script comprising the plurality of transformations applied to the normalized data set in the column of data to generate the transformed data set; receive, via the graphical interface, an input by a user selecting the transform script comprising the plurality of transformations; apply the selected transform script to the data set; and generate the transformed data set based on the selected transform script.

Claim 21

Original Legal Text

21. The non-transitory computer readable storage medium of claim 20, wherein the instructions, when executed by the one or more processors, further cause the one or more processors to: compute a set of pattern metrics, wherein each of the set of pattern metrics is computed for a different pattern in the set of patterns, and wherein the set of patterns includes a plurality of different patterns identified for the set of entities; determine a difference amongst the set of pattern metrics; and select, based on the difference amongst the set of pattern metrics, a pattern in the set of patterns, wherein the classification is determined based on the selected pattern.

Claim 22

Original Legal Text

22. The non-transitory computer readable storage medium of claim 20, wherein the instructions, when executed by the one or more processors, further cause the one or more processors to: identify text data in the normalized data set, the text data corresponding to each entity of the set of entities; determine a set of classifications for the set of entities; and compute a set of classification metrics, each of the set of classification metrics computed for a different classification in the set of classifications; and wherein the classification is determined based on determining a difference amongst the set of classifications.

Claim 23

Original Legal Text

23. The non-transitory computer readable storage medium of claim 20, wherein normalizing the data set to create the normalized data includes modifying the data set having the first format to an adjusted format of the normalized data set, the adjusted format being different from the first format.

Claim 24

Original Legal Text

24. The non-transitory computer readable storage medium of claim 20, wherein identifying the set of patterns includes comparing column data amongst a plurality of columns in the normalized data set, and determining that each of a set of columns in the plurality of columns has an attribute, and wherein the set of patterns for the set of entities is identified based on the normalized data for each column having the attribute.

Claim 25

Original Legal Text

25. The non-transitory computer readable storage medium of claim 20, wherein the transformed data set may be generated from the normalized data set by modifying the normalized data set to include data about the classification for the set of entities.

Claim 26

Original Legal Text

26. The non-transitory computer readable storage medium of claim 20, wherein the instructions, when executed by the one or more processors, further cause the one or more processors to: render an additional graphical interface that displays the normalized data, that identifies the entity information, and that indicates the classification; and receive an input indicating selection of the classification in the additional graphical interface, wherein the transformed data set is generated upon receiving the input.

Patent Metadata

Filing Date

Unknown

Publication Date

February 9, 2021

Inventors

Alexander Sasha Stojanovic
Philip Ogren
Kevin L. Markey
Mark Kreider

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “AUTOMATED ENTITY CORRELATION AND CLASSIFICATION ACROSS HETEROGENEOUS DATASETS” (10915233). https://patentable.app/patents/10915233

© 2026 Nomic Interactive Technology LLC. Machine-readable context available at /api/llm-context/10915233. See llms.txt for full attribution policy.