Embodiments of the present invention are directed to facilitating data preprocessing for machine learning. In accordance with aspects of the present disclosure, a training set of data is accessed. A preprocessing query specifying a set of preprocessing parameter values that indicate a manner in which to preprocess the training set of data is received. Based on the preprocessing query, a preprocessing operation is performed to preprocess the training set of data in accordance with the set of preprocessing parameter values to obtain a set of preprocessed data. The set of preprocessed data can be provided for presentation as a preview. Based on an acceptance of the set of preprocessed data, the set of preprocessed data is used to train a machine learning model that can be subsequently used to predict data.
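The flow described in the abstract (preprocess, preview, accept, then train) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the dictionary-based parameter values, the fill-missing operation, and the `accept` and `train` callbacks are all hypothetical.

```python
def preprocess(rows, params):
    """Fill missing values in one field (a stand-in preprocessing operation)."""
    field = params["field"]            # hypothetical parameter: field to preprocess
    default = params.get("default", 0)  # hypothetical parameter: fill value
    return [
        {**row, field: default if row.get(field) is None else row[field]}
        for row in rows
    ]


def preview(rows, limit=3):
    """Return the first few preprocessed rows for presentation."""
    return rows[:limit]


def run_workflow(rows, params, accept, train):
    """Preprocess, show a preview, and train only if the preview is accepted."""
    processed = preprocess(rows, params)
    if not accept(preview(processed)):
        return None  # rejected: the caller may retry with other parameter values
    return train(processed)
```

Here `accept` stands in for whatever mechanism records the user's acceptance, and `train` for any training routine; returning `None` on rejection is a design choice of this sketch only.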
Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
2. The computer-implemented method of claim 1, wherein the machine learning model is used to subsequently predict data.
A computer-implemented method involves using a machine learning model to predict data. The model is trained on a dataset that includes input data and corresponding output data. During training, the model learns patterns and relationships within the dataset to improve its predictive accuracy. The trained model can then be applied to new, unseen input data to generate predictions. This process may involve preprocessing the input data, feeding it into the model, and post-processing the model's output to refine the predictions. The method may also include evaluating the model's performance using metrics such as accuracy, precision, recall, or mean squared error to assess its effectiveness. The predictions generated by the model can be used for various applications, including decision-making, anomaly detection, or forecasting, depending on the specific use case. The method ensures that the model is trained and validated properly to provide reliable and accurate predictions.
3. The computer-implemented method of claim 1 further comprising generating a first preprocessing query based on a user selection of the first set of preprocessing parameter values.
This invention relates to a computer-implemented method for data preprocessing in analytical systems. The method addresses the challenge of efficiently preparing raw data for analysis by allowing users to customize preprocessing steps through parameter selection. The system generates a preprocessing query based on user-defined parameters, enabling flexible and automated data transformation. The method involves receiving a set of preprocessing parameter values selected by a user, which define how raw data should be processed. These parameters may include filtering criteria, normalization settings, or aggregation rules. The system then generates a preprocessing query that encodes these parameters into executable instructions. This query is applied to the raw data to produce a processed dataset ready for further analysis. The preprocessing query is designed to be dynamically adjustable, allowing users to modify parameters and regenerate the query without manual intervention. This ensures consistency and reproducibility in data preparation. The method supports various preprocessing operations, such as data cleaning, feature extraction, and transformation, tailored to the user's specifications. By automating the preprocessing workflow, the invention reduces manual effort and minimizes errors in data preparation. It is particularly useful in environments where data analysis requires iterative refinement of preprocessing steps. The system enhances efficiency by enabling rapid adjustments to preprocessing logic based on evolving analytical needs.
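As a rough illustration of generating a preprocessing query from user-selected parameter values, the sketch below encodes the selections into a single query string. The query syntax, the `PreprocessingParams` class, and all field and option names are assumptions made for this example; the claim does not specify a query format.

```python
from dataclasses import dataclass, field


@dataclass
class PreprocessingParams:
    """User-selected preprocessing parameter values (hypothetical structure)."""
    method: str                      # e.g. "standardize"
    target_fields: tuple             # fields the method applies to
    options: dict = field(default_factory=dict)


def build_preprocessing_query(params):
    """Encode the selected parameter values into a query string.

    The 'preprocess key=value ...' syntax is invented for illustration.
    """
    parts = [
        "preprocess",
        f"method={params.method}",
        "fields=" + ",".join(params.target_fields),
    ]
    # Sort options so the same selections always yield the same query text.
    parts += [f"{key}={value}" for key, value in sorted(params.options.items())]
    return " ".join(parts)
```

Because the query is derived entirely from the parameter values, changing a selection and calling the builder again regenerates the query without hand editing.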
4. The computer-implemented method of claim 1, wherein the first set of preprocessing parameter values includes at least one preprocessing method for use in preprocessing the at least the portion of the initial set of data.
This invention relates to data preprocessing in computer-implemented systems, specifically addressing the challenge of efficiently preparing raw data for analysis or machine learning tasks. The method involves selecting and applying preprocessing techniques to optimize data quality and usability. A key aspect is the use of a first set of preprocessing parameter values, which includes at least one preprocessing method designed to transform or clean a portion of the initial data set. These methods may include normalization, noise reduction, feature extraction, or other techniques to enhance data consistency and relevance. The preprocessing step ensures that the data is in a suitable form for subsequent processing stages, such as model training or statistical analysis. The invention emphasizes adaptability, allowing different preprocessing methods to be applied based on the specific characteristics of the data and the requirements of the downstream tasks. By systematically applying these preprocessing steps, the method improves data quality, reduces errors, and enhances the accuracy of subsequent analytical processes. The approach is particularly useful in fields like machine learning, data mining, and predictive analytics, where high-quality input data is critical for reliable results.
5. The computer-implemented method of claim 1, wherein the first set of preprocessing parameter values includes an indication of a field to preprocess.
This invention relates to a computer-implemented method for preprocessing data fields in a dataset. The method addresses the challenge of efficiently preparing data for analysis by allowing users to specify which fields require preprocessing and how they should be processed. The system identifies a first set of preprocessing parameter values, which includes an indication of a specific field to preprocess. These parameters define the preprocessing operations to be applied, such as normalization, encoding, or transformation, ensuring the data is in a suitable format for further analysis. The method then applies these preprocessing steps to the specified field, enhancing data quality and consistency. Additionally, the system may generate a second set of preprocessing parameter values based on the results of the first preprocessing step, allowing for iterative refinement. The method ensures that preprocessing is both targeted and adaptable, improving the efficiency and accuracy of subsequent data analysis tasks. This approach is particularly useful in machine learning and data analytics, where preprocessing is critical for model performance and reliability.
6. The computer-implemented method of claim 1, wherein the first preprocessing operation preprocesses the at least the portion of the initial set of data by formatting, cleaning, and/or sampling.
The invention relates to data preprocessing in computer-implemented systems, specifically addressing the need for efficient and effective preparation of raw data before analysis or processing. The method involves performing a first preprocessing operation on at least a portion of an initial set of data, where the preprocessing includes formatting, cleaning, and/or sampling the data. Formatting ensures the data is structured consistently, cleaning removes errors or inconsistencies, and sampling reduces the dataset size while maintaining representativeness. This preprocessing step enhances data quality and reduces computational overhead in subsequent operations. The method may also include additional preprocessing steps, such as normalization or transformation, to further refine the data for analysis. The invention aims to improve data processing efficiency and accuracy by systematically preparing data before further computational tasks.
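The three operations named in the claim (formatting, cleaning, and sampling) can be illustrated with a small sketch over records held as dictionaries. The specific rules used here (trimming whitespace, dropping records with empty required fields, seeded random sampling) are assumptions chosen for the example, not details taken from the claim.

```python
import random


def format_records(records):
    """Formatting: normalize key names and trim whitespace from string values."""
    return [
        {key.strip().lower(): (value.strip() if isinstance(value, str) else value)
         for key, value in record.items()}
        for record in records
    ]


def clean_records(records, required):
    """Cleaning: drop records where any required field is missing or empty."""
    return [
        record for record in records
        if all(record.get(name) not in (None, "") for name in required)
    ]


def sample_records(records, k, seed=0):
    """Sampling: pick up to k records at random, reproducibly for a given seed."""
    if k >= len(records):
        return list(records)
    return random.Random(seed).sample(records, k)
```

The claim's "and/or" wording means any subset of these steps may be applied; chaining all three, as a caller might do here, is only one option.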
7. The computer-implemented method of claim 1 further comprising causing display of a display area for receiving a set of training parameter values indicating a manner in which to train the machine learning model.
This invention relates to machine learning model training systems, specifically addressing the need for user-configurable training parameters. The method involves providing a graphical user interface (GUI) that includes a display area for receiving a set of training parameter values. These parameters define how the machine learning model should be trained, such as learning rate, batch size, number of epochs, or optimization algorithms. The GUI allows users to input these values, which are then used to configure the training process. The system ensures that the training parameters are properly validated and applied to the machine learning model during training. This approach enhances flexibility and control over the training process, enabling users to fine-tune model performance based on specific requirements. The invention may also include additional features, such as pre-defined parameter templates or real-time feedback on parameter selections, to assist users in optimizing the training process. The method is implemented on a computer system, leveraging computational resources to process and apply the specified training parameters efficiently. This solution is particularly useful in environments where customizable model training is required, such as research, enterprise applications, or specialized AI development.
8. The computer-implemented method of claim 1 further comprising receiving a set of training parameter values indicating a manner in which to train the machine learning model, the set of training parameter values including at least one of an algorithm to use for training the machine learning model, an indication of a field to predict, or an indication of a field to use for predicting.
This invention relates to machine learning model training, specifically addressing the need for flexible and configurable training processes. The method involves receiving a set of training parameter values that define how a machine learning model should be trained. These parameters include at least one of the following: the algorithm to use for training, the field to predict (i.e., the target variable), or a field to use for making predictions (i.e., an input feature). The method ensures that the training process can be customized based on user-defined parameters, allowing for adaptability across different datasets and prediction tasks. The training parameters enable the selection of appropriate algorithms, such as regression, classification, or clustering, and specify the data fields involved in the prediction process. This flexibility allows the system to handle diverse machine learning tasks without requiring manual adjustments to the underlying training logic. The invention improves efficiency by automating the training configuration process, reducing the need for manual intervention while ensuring the model is trained according to specified requirements. The method is particularly useful in environments where multiple models must be trained with different configurations, such as in large-scale data analysis or automated machine learning pipelines.
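To make the training parameter values concrete, the sketch below models them as a small structure, checks them against the available fields, and splits records into inputs and targets. The class name, function names, and validation rules are hypothetical; the claim requires only at least one of the three values, whereas this example assumes all three are supplied.

```python
from dataclasses import dataclass


@dataclass
class TrainingParams:
    """Training parameter values (hypothetical structure)."""
    algorithm: str        # algorithm to use for training
    target_field: str     # field to predict
    feature_fields: list  # fields to use for predicting


def validate(params, available_fields, known_algorithms):
    """Return a list of problems; an empty list means the values are usable."""
    problems = []
    if params.algorithm not in known_algorithms:
        problems.append(f"unknown algorithm: {params.algorithm}")
    if params.target_field not in available_fields:
        problems.append(f"unknown field to predict: {params.target_field}")
    for name in params.feature_fields:
        if name not in available_fields:
            problems.append(f"unknown field to use for predicting: {name}")
    if params.target_field in params.feature_fields:
        problems.append("field to predict is also listed as a predictor")
    return problems


def to_xy(records, params):
    """Split records into feature rows X and target values y."""
    X = [[float(r[name]) for name in params.feature_fields] for r in records]
    y = [float(r[params.target_field]) for r in records]
    return X, y
```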
9. The computer-implemented method of claim 1 further comprising receiving a set of training parameter values indicating a manner in which to train the machine learning model, wherein the set of training parameter values is based on a user selection of the set of training parameter values via a graphic user interface that enables concurrent user selection of the first set of preprocessing parameter values and the second set of preprocessing parameter values.
This invention relates to a computer-implemented method for training a machine learning model, focusing on user-configurable preprocessing and training parameters. The method addresses the challenge of optimizing machine learning model performance by allowing users to fine-tune preprocessing and training steps through an interactive graphical user interface (GUI). The GUI enables concurrent selection of a first set and a second set of preprocessing parameter values, covering data transformations such as normalization, feature extraction, or dimensionality reduction, and the same GUI is used to select the training parameters that dictate how the model learns from the data. By providing a unified interface for adjusting these parameters, the method simplifies the model development process, reducing the need for manual scripting or separate tools. The training parameters may include learning rates, batch sizes, or optimization algorithms, which are selected alongside preprocessing steps to ensure compatibility and efficiency. This approach enhances flexibility and reproducibility in machine learning workflows, particularly for users with varying levels of technical expertise. The method is applicable across domains where data preprocessing significantly impacts model accuracy, such as image recognition, natural language processing, or predictive analytics.
10. The computer-implemented method of claim 1 further comprising storing the first set of preprocessed data and the first set of preprocessing parameter values for subsequent use.
This invention relates to data preprocessing in computer systems, specifically addressing the need to efficiently store and reuse preprocessing steps for data analysis or machine learning tasks. The method involves preprocessing a first set of data using a set of preprocessing parameter values, such as normalization, scaling, or feature extraction techniques. After preprocessing, the preprocessed data and the corresponding preprocessing parameters are stored for later retrieval and reuse. This allows subsequent data processing tasks to apply the same preprocessing steps consistently, ensuring reproducibility and reducing computational overhead. The stored preprocessing parameters can be applied to new datasets with similar characteristics, maintaining consistency in data transformations across different analyses. The method supports efficient data handling by avoiding redundant preprocessing steps, particularly in iterative or large-scale data processing workflows. By storing both the preprocessed data and the parameters used, the system enables quick retrieval and application of the same preprocessing logic, improving efficiency in data-driven applications. This approach is particularly useful in machine learning pipelines where consistent data preprocessing is critical for model performance and reliability.
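Storing the preprocessed data together with the parameter values that produced it might look like the sketch below. The JSON file layout and the `save_run` and `load_run` helpers are assumptions made for illustration; the claim does not prescribe a storage format.

```python
import json
from pathlib import Path


def save_run(path, params, preprocessed_rows):
    """Persist the parameter values and the preprocessed data side by side."""
    payload = {"params": params, "preprocessed": preprocessed_rows}
    Path(path).write_text(json.dumps(payload, indent=2), encoding="utf-8")


def load_run(path):
    """Reload a stored run as a (params, preprocessed_rows) pair."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    return payload["params"], payload["preprocessed"]
```

Keeping both pieces in one record means a later step can either reuse the stored result directly or reapply the same parameter values to other data.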
11. The computer-implemented method of claim 1 further comprising storing the second set of preprocessed data and the second set of preprocessing parameter values for subsequent use.
This invention relates to data preprocessing in computer systems, specifically addressing the challenge of efficiently managing and reusing preprocessing steps for large datasets. The method involves preprocessing data using a first set of preprocessing parameter values to generate a first set of preprocessed data. The preprocessing may include operations such as normalization, feature extraction, or noise reduction. The method also preprocesses data using a second set of preprocessing parameter values, producing a second set of preprocessed data. The second set of preprocessed data and the second set of preprocessing parameter values are stored for future use, allowing the stored result to be retrieved, and the same parameter values to be reapplied, without redundant computation. This approach improves efficiency in data processing workflows, particularly in machine learning and analytics applications where consistent preprocessing is critical. The stored preprocessing parameters and results enable quick reprocessing of similar datasets, reducing computational overhead and ensuring reproducibility.
12. The computer-implemented method of claim 1, wherein the machine learning model is trained using the second set of preprocessed data and a set of non-preprocessed data.
A computer-implemented method involves training a machine learning model using a combination of preprocessed and non-preprocessed data to improve accuracy and robustness. The method addresses the challenge of optimizing machine learning performance by leveraging both preprocessed and non-preprocessed data inputs. The machine learning model is trained on a second set of preprocessed data, which has undergone transformations such as normalization, feature extraction, or noise reduction, alongside a set of non-preprocessed data that retains its raw form. This hybrid training approach allows the model to learn from both refined and unrefined data, enhancing its ability to generalize across different data distributions. The method may also include preprocessing steps such as data cleaning, feature selection, or dimensionality reduction to prepare the second set of data for training. By incorporating both processed and unprocessed data, the model can better adapt to real-world scenarios where data quality and structure may vary. This technique is particularly useful in applications where data preprocessing is time-consuming or where raw data contains valuable information that preprocessing might otherwise remove. The method ensures the machine learning model remains versatile and effective across diverse datasets.
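One way to read "trained using preprocessed data and non-preprocessed data" is that each training row carries some raw fields next to some preprocessed fields. The sketch below assembles such rows. This row-wise interpretation, the function name, and the `pp_` prefix are assumptions of this example; the claim does not say how the two kinds of data are combined.

```python
def build_training_rows(raw_rows, preprocessed_rows, raw_fields, preprocessed_fields):
    """Combine raw and preprocessed fields into one training row per record."""
    if len(raw_rows) != len(preprocessed_rows):
        raise ValueError("raw and preprocessed data must have the same length")
    combined = []
    for raw, pre in zip(raw_rows, preprocessed_rows):
        row = {name: raw[name] for name in raw_fields}
        # Prefix preprocessed fields so they never overwrite a raw field.
        row.update({f"pp_{name}": pre[name] for name in preprocessed_fields})
        combined.append(row)
    return combined
```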
15. The computer-implemented method of claim 1, wherein the first set of preprocessing parameter values is different from the second set of preprocessing parameter values.
This invention relates to a computer-implemented method for optimizing preprocessing parameters in data analysis or machine learning workflows. The method addresses the challenge of selecting appropriate preprocessing steps, such as normalization, feature scaling, or data transformation, which can significantly impact the performance of subsequent analytical or predictive models. The invention provides a technique to dynamically adjust preprocessing parameters based on the characteristics of the input data or the requirements of the downstream task, ensuring improved accuracy and efficiency. The method involves applying a first set of preprocessing parameter values to an input dataset to generate a first processed dataset. A second set of preprocessing parameter values, distinct from the first, is then applied to the same input dataset to produce a second processed dataset. The two processed datasets are compared or evaluated, either automatically or manually, to determine which preprocessing approach yields better results for the intended application. This comparison may involve metrics such as model performance, computational efficiency, or data quality. The method may also include iterative refinement, where preprocessing parameters are further adjusted based on the comparison results. The technique is particularly useful in scenarios where the optimal preprocessing steps are unknown or vary depending on the data characteristics, ensuring adaptability and robustness in data processing pipelines.
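Applying two different sets of preprocessing parameter values to the same input and placing the results side by side can be sketched as below. The two scaling methods (min-max and z-score), the `method` parameter name, and the preview structure are assumptions picked for illustration; the claim states only that the two sets differ.

```python
from statistics import fmean, pstdev


def scale(values, params):
    """Scale numeric values according to a hypothetical 'method' parameter."""
    method = params["method"]
    if method == "minmax":
        low, high = min(values), max(values)
        span = high - low
        return [0.0 if span == 0 else (v - low) / span for v in values]
    if method == "zscore":
        mu, sigma = fmean(values), pstdev(values)  # population standard deviation
        return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]
    raise ValueError(f"unsupported method: {method}")


def preview_both(values, first_params, second_params, limit=5):
    """Preprocess the same values twice and return both previews for comparison."""
    return {
        "first": scale(values, first_params)[:limit],
        "second": scale(values, second_params)[:limit],
    }
```

A reviewer, or an automated metric, could then compare the two previews before choosing which result to train on.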
17. The computer-implemented method of claim 1 further comprising receiving a user selection to use the at least the portion of the second set of preprocessed data to train the machine learning model.
A system and method for machine learning model training involves preprocessing data to generate a first set of preprocessed data and a second set of preprocessed data. The first set is used to train a machine learning model, while the second set is stored for potential future use. The method includes receiving a user selection to use at least a portion of the second set of preprocessed data to further train the machine learning model. This allows for flexible and iterative model training, enabling users to incorporate additional data as needed. The system may also include a user interface for selecting and managing the preprocessed data used in training. The method ensures efficient data handling by separating preprocessing from model training, allowing for modular and scalable machine learning workflows. This approach is particularly useful in scenarios where training data is updated or expanded over time, as it avoids redundant preprocessing steps and streamlines the training process. The system may also include validation and testing mechanisms to ensure the quality of the preprocessed data before it is used for training.
19. The one or more computer-readable storage media of claim 18, further comprising receiving a user selection to use the at least the portion of the second set of preprocessed data to train the machine learning model.
A system and method for machine learning model training involves preprocessing data to generate a first set of preprocessed data and a second set of preprocessed data. The first set is used to train a machine learning model, while the second set is stored for potential future use. The system allows a user to select and retrieve at least a portion of the second set of preprocessed data to further train or retrain the machine learning model. This approach enables efficient model training by leveraging preprocessed data, reducing redundant processing steps, and providing flexibility in model refinement. The system may also include preprocessing steps such as data normalization, feature extraction, or data augmentation to enhance the quality of the training data. By storing preprocessed data separately, the system optimizes computational resources and accelerates model development cycles. The user selection mechanism ensures that the most relevant or updated data can be incorporated into the training process as needed. This method is particularly useful in scenarios where continuous model improvement is required, such as in dynamic environments or applications with evolving data requirements.
October 27, 2022
April 16, 2024