Features are disclosed for using a neural network to tag sequential input without using an internal representation of the neural network generated when scoring previous positions in the sequence. A predicted or determined label (e.g., the highest scoring or otherwise most probable label) for input at a given position in the sequence can be used when scoring input corresponding to the next position in the sequence. Additional features are disclosed for training a neural network for use in tagging sequential input without using an internal representation of the neural network generated when scoring previous positions in the sequence.
Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A system comprising: a computer-readable memory storing executable instructions; and one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to at least: obtain input data regarding a sequence of tokens, the sequence of tokens comprising a first token and a second token, wherein the first token immediately precedes the second token in the sequence of tokens; generate, using a neural network and data regarding the first token, a first probability distribution reflecting, for each label of a plurality of labels, a probability that the label corresponds to the first token; generate, using the first probability distribution, preceding label data indicating that a particular label of the plurality of labels corresponds to the first token; generate, using the neural network and the preceding label data and information regarding the second token, a second probability distribution reflecting, for each label of the plurality of labels, a probability that the label corresponds to the second token, wherein the second probability distribution is generated by the neural network independently of any label corresponding to any token in the sequence of tokens preceding the first token; and determine a sequence of labels corresponding to the sequence of tokens based at least partly on the first and second probability distributions.
A system for tagging sequences of data (like words in a sentence) uses a neural network. The system takes a sequence of tokens (e.g., words) and processes them one by one. For each token, it generates a probability distribution over possible labels (like part-of-speech tags). From the distribution for a token, it derives "preceding label data" identifying a particular label for that token (for example, the highest-scoring one), and feeds that data, together with information about the next token, back into the network to score the next token. Importantly, the distribution for a token does not depend on the label of any token earlier than the immediately preceding one, so the network does not need to carry hidden states or other internal representations forward from earlier tokens. Based on these probability distributions, the system determines a sequence of labels for the entire input sequence.
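The loop described in claim 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the single-layer `toy_network`, the three-label set, the feature vectors, and the hand-picked weights `W_X` and `W_PREV` are all hypothetical and do not come from the patent. The point it shows is that the only history passed from one position to the next is a vector naming the previous label.

```python
import math

LABELS = ["DET", "NOUN", "VERB"]  # hypothetical label set


def softmax(scores):
    """Convert raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index, size):
    """Vector with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def toy_network(features, prev_label_vec, w_x, w_prev):
    """Stand-in for the neural network (an assumption, not the patent's model).

    Its inputs are the current token's features and the preceding label data
    only; no hidden state from earlier positions is used.
    """
    scores = []
    for k in range(len(LABELS)):
        s = sum(w * x for w, x in zip(w_x[k], features))
        s += sum(w * p for w, p in zip(w_prev[k], prev_label_vec))
        scores.append(s)
    return softmax(scores)


def greedy_tag(sequence_features, w_x, w_prev):
    """Tag each position, feeding the chosen label into the next step."""
    prev_vec = [0.0] * len(LABELS)  # no preceding label at the first position
    labels, distributions = [], []
    for features in sequence_features:
        dist = toy_network(features, prev_vec, w_x, w_prev)
        best = max(range(len(LABELS)), key=lambda k: dist[k])
        labels.append(LABELS[best])
        distributions.append(dist)
        prev_vec = one_hot(best, len(LABELS))  # preceding label data
    return labels, distributions


# Hand-picked illustrative weights: w_prev[k][j] rewards label k after label j.
W_X = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
W_PREV = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
# The second and third tokens have identical, ambiguous features.
SEQUENCE = [[2, 0, 0], [0, 1, 1], [0, 1, 1]]

labels, distributions = greedy_tag(SEQUENCE, W_X, W_PREV)
print(labels)
```

In this toy run the second and third tokens have the same features but receive different labels, because the preceding label data differs between the two steps.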
2. The system of claim 1, wherein one or more processors are programmed to at least use the neural network as a Markov model.
The system described above, which uses a neural network to tag sequences of data, specifically uses the neural network in a way that mimics a Markov model. This means that the prediction for the current token depends only on the immediately preceding token's label, similar to how a Markov model operates, even though a neural network is used to generate the probabilities.
3. The system of claim 1, wherein the preceding label data comprises a plurality of values, each value of the plurality of values corresponding to a different label of the plurality of labels, and wherein the value corresponding to the particular label is larger than all other values of the plurality of values.
In the system for sequence tagging using a neural network, the "preceding label data" (the information about the previous token's label) is represented as a set of values. Each value corresponds to a different possible label. The value associated with the predicted label for the previous token is larger than all other values. One example is one-hot encoding, where the selected label has a value of '1' and the rest '0'. This largest value indicates to the neural network which label was predicted for the previous token.
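As a small illustration of the encoding described for claim 3, the helper below turns a probability distribution into preceding label data. The function name and the `high`/`low` parameters are hypothetical; the claim only requires that the chosen label's value be larger than every other value, and one-hot encoding is just one way to satisfy that.

```python
def preceding_label_data(distribution, high=1.0, low=0.0):
    """Encode the most probable label as a vector with one largest entry.

    `high` and `low` are illustrative parameters; any values with
    high > low keep the chosen label's entry strictly larger than the rest.
    """
    if not high > low:
        raise ValueError("high must be greater than low")
    best = max(range(len(distribution)), key=lambda i: distribution[i])
    return [high if i == best else low for i in range(len(distribution))]


# Distribution over three labels; the second label is the most probable.
print(preceding_label_data([0.2, 0.7, 0.1]))
```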
4. A computer-implemented method comprising: under control of one or more computing devices configured with specific computer-executable instructions, obtaining input data regarding a current position in a sequence comprising a plurality of positions; obtaining preceding label data regarding a label for each of a finite number of preceding positions in the sequence; generating, using a neural network, a probability distribution reflecting, for each label of a plurality of labels, a probability that the label corresponds to the current position, wherein the probability distribution is based at least partly on the input data and the preceding label data; and determining a sequence of labels corresponding to the sequence of the plurality of positions based at least partly on the probability distribution.
A computer-implemented method tags sequences of data. For a current position in the sequence, the method obtains input data (e.g., the word itself). It also obtains "preceding label data," which represents labels predicted for a limited number of previous positions in the sequence. A neural network uses this input data and the preceding label data to generate a probability distribution over possible labels for the current position. The method then determines the most likely sequence of labels for the entire input sequence based on these probability distributions.
5. The computer-implemented method of claim 4, wherein the finite number of preceding positions is limited to fewer than a total number of preceding positions in the sequence.
The computer-implemented method for sequence tagging described above, where a neural network predicts labels based on previous labels, only considers a *limited* number of preceding positions. This number is *less* than the total number of positions that came before the current one in the sequence. For example, the method might consider only the immediately previous token and ignore all earlier tokens.
6. The computer-implemented method of claim 4, wherein the finite number of preceding positions corresponds to only the position immediately preceding the current position.
The computer-implemented method for sequence tagging, which predicts labels using a neural network, considers only the label of the *immediately* preceding token in the sequence. The neural network uses this single previous label, along with the input data for the current token, to generate a probability distribution over possible labels for the current token.
7. The computer-implemented method of claim 4, wherein the neural network is not a recurrent neural network.
The computer-implemented method for sequence tagging uses a neural network that is *not* a recurrent neural network (RNN). Instead of relying on internal memory of hidden states between steps, the network receives explicit "preceding label data" as input to inform its prediction for the current token.
8. The computer-implemented method of claim 4, wherein the input data regarding the current position comprises a feature vector extracted from an input signal.
In the sequence tagging method using a neural network, the "input data" for the current token is not the raw data directly. Instead, it's a "feature vector" that's been extracted or computed from the raw input signal (e.g., the word). This feature vector represents relevant characteristics of the input.
9. The computer-implemented method of claim 4, wherein the preceding label data comprises a plurality of values, each value of the plurality of values corresponding to a different label of the plurality of labels.
In the computer-implemented sequence tagging method, the preceding label data consists of a set of values, where each value corresponds to a different possible label.
10. The computer-implemented method of claim 4, further comprising training the neural network using the input data regarding the current position and correct label data regarding the correct label for each of the finite number of preceding positions.
The computer-implemented method for sequence tagging also *trains* the neural network. The training process uses the input data for the current token, along with the correct labels for a finite number of preceding positions. This helps the network learn to predict labels accurately based on the input and the context provided by the previous labels.
11. The computer-implemented method of claim 10, further comprising training the neural network using non-sequential training data.
The computer-implemented method for sequence tagging, which also trains the neural network, trains the network using "non-sequential training data." This means the training data isn't necessarily presented in the order that tokens appear in a sequence.
12. The computer-implemented method of claim 11, wherein training the neural network using non-sequential training data comprises using input data for the current position prior to using input data for a preceding position.
When training the neural network for sequence tagging with non-sequential data, the method may use input data for the current position *before* using the input data for a preceding position. In other words, training examples need not be presented in sequence order; for example, their order can be shuffled.
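The training idea in claims 10 to 12 can be sketched as follows. Each position becomes an independent training example whose preceding-label part is filled in from the correct label, so the examples can be visited in any order. Everything concrete here is an assumption made for illustration: the softmax-regression model standing in for the neural network, the toy data, the learning rate, and the seeded shuffle.

```python
import math
import random

LABELS = ["DET", "NOUN", "VERB"]  # hypothetical label set
K = len(LABELS)


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def build_examples(sequences):
    """Turn annotated sequences into independent (input, target) pairs.

    The preceding label data in each input comes from the correct label of
    the previous position, not from a model prediction.
    """
    examples = []
    for features_seq, label_seq in sequences:
        prev_vec = [0.0] * K  # no preceding label at the first position
        for features, label in zip(features_seq, label_seq):
            target = LABELS.index(label)
            examples.append((list(features) + prev_vec, target))
            prev_vec = one_hot(target, K)
    return examples


def predict_distribution(weights, x):
    """Softmax over linear scores; a stand-in for the neural network."""
    scores = [sum(w * v for w, v in zip(row, x)) for row in weights]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(examples, epochs=500, lr=0.1, seed=0):
    """Stochastic gradient descent on cross-entropy, in shuffled order."""
    rng = random.Random(seed)
    dim = len(examples[0][0])
    weights = [[0.0] * dim for _ in range(K)]
    order = list(examples)
    for _ in range(epochs):
        rng.shuffle(order)  # a later position may be used before an earlier one
        for x, target in order:
            dist = predict_distribution(weights, x)
            for k in range(K):
                grad = dist[k] - (1.0 if k == target else 0.0)
                for j in range(dim):
                    weights[k][j] -= lr * grad * x[j]
    return weights


# Toy annotated sequence: (feature vectors, correct labels).
TRAINING = [([[2, 0, 0], [0, 1, 1], [0, 1, 1]], ["DET", "NOUN", "VERB"])]
examples = build_examples(TRAINING)
weights = train(examples)
```

Shuffling is possible here because no example depends on state computed while processing another example, which is the practical consequence of not using the network's internal representation from earlier positions.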
13. The computer-implemented method of claim 4, wherein the sequence of labels corresponding to the sequence of the plurality of positions is determined using a Viterbi process.
The computer-implemented method determines the most likely sequence of labels for the entire input sequence using a "Viterbi process." The Viterbi algorithm efficiently finds the optimal sequence of labels given the probability distributions generated by the neural network for each token.
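A minimal sketch of a Viterbi search over such distributions is shown below, assuming only the immediately preceding label is used as context. The callback `cond_prob(t, prev, cur)` is a hypothetical interface standing in for the neural network's output (the probability of label `cur` at position `t` given preceding label `prev`), and the toy probabilities are invented for illustration. The sketch assumes at least one position and strictly positive probabilities.

```python
import math


def viterbi(num_positions, num_labels, cond_prob):
    """Return the highest-probability label sequence as a list of indices.

    cond_prob(t, prev, cur) gives the probability of label `cur` at
    position t given label `prev` at position t - 1 (prev is None at t = 0).
    """
    # Log-score of the best path ending in each label at position 0.
    score = [math.log(cond_prob(0, None, k)) for k in range(num_labels)]
    backpointers = []
    for t in range(1, num_positions):
        new_score, pointers = [], []
        for cur in range(num_labels):
            # Best preceding label for ending in `cur` at position t.
            best_prev = max(
                range(num_labels),
                key=lambda p: score[p] + math.log(cond_prob(t, p, cur)),
            )
            new_score.append(score[best_prev] + math.log(cond_prob(t, best_prev, cur)))
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final label.
    path = [max(range(num_labels), key=lambda k: score[k])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def toy_cond_prob(t, prev, cur):
    """Invented two-label example in which the greedy choice is not optimal."""
    if t == 0:
        return [0.6, 0.4][cur]
    table = {0: [0.5, 0.5], 1: [0.1, 0.9]}
    return table[prev][cur]


best_path = viterbi(2, 2, toy_cond_prob)
print(best_path)
```

In the toy example, picking the most probable label at the first position (label 0, probability 0.6) leads to a best total probability of 0.30, while the search finds the path through label 1 with total probability 0.36.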
14. One or more non-transitory computer readable media comprising executable code that, when executed, cause one or more computing devices to perform a process comprising: obtaining input data regarding a current position in a sequence comprising a plurality of positions; generating, using a neural network, a probability distribution reflecting, for each label of a plurality of labels, a probability that the label corresponds to the current position, wherein the probability distribution is based at least partly on the input data and a label prediction for each of a finite number of preceding positions in the sequence; and determining a sequence of labels corresponding to the sequence of the plurality of positions based at least partly on the probability distribution.
A computer-readable medium stores instructions that cause a computer to perform a sequence tagging process. The process involves obtaining input data for the current position in a sequence, and using a neural network to generate a probability distribution over possible labels. The probability distribution is based partly on the input data, and partly on label *predictions* for a finite number of preceding positions in the sequence. The process determines the sequence of labels using the probability distributions.
15. The one or more non-transitory computer readable media of claim 14, wherein the probability distribution is generated independently of a label prediction for any position of the sequence occurring before the finite number of preceding positions.
In the sequence tagging process performed by a computer, the probability distribution generated by the neural network is independent of any label predictions made for positions *before* the "finite number of preceding positions". Only a limited history is used to make predictions.
16. The one or more non-transitory computer readable media of claim 14, wherein the finite number of preceding positions is limited to fewer than a total number of preceding positions in the sequence.
The sequence tagging process performed by a computer only considers a *limited* number of preceding positions when predicting labels. This number is *less* than the total number of positions that came before the current one in the sequence.
17. The one or more non-transitory computer readable media of claim 14, wherein the neural network is not a recurrent neural network.
The sequence tagging process uses a neural network that is *not* a recurrent neural network (RNN). It relies on explicit label predictions from previous tokens.
18. The one or more non-transitory computer readable media of claim 14, wherein the input data regarding the current position comprises a feature vector extracted from an input signal.
In the sequence tagging process, the "input data" for the current token is a "feature vector" that represents relevant characteristics of the input.
19. The one or more non-transitory computer readable media of claim 14, wherein the preceding label data comprises a plurality of values, each value of the plurality of values corresponding to a different label of the plurality of labels.
The preceding label data used in the sequence tagging process contains a plurality of values, with each value corresponding to a different possible label.
20. The one or more non-transitory computer readable media of claim 14, the process further comprising training the neural network using the input data regarding the current position and correct label data regarding the correct label for the preceding position.
The sequence tagging process also *trains* the neural network, using the input data for the current position and the correct label for the preceding position.
21. The one or more non-transitory computer readable media of claim 14, the process further comprising training the neural network using input data for the current position prior to using input data for the preceding position.
When training the neural network for sequence tagging, the process can use input data for the current position *before* using input data for the preceding position, enabling non-sequential training.
22. The one or more non-transitory computer readable media of claim 14, wherein the neural network generates, for a single input of the input data, a plurality of probability distributions, wherein individual probability distributions of the plurality of probability distributions correspond to different label predictions for the preceding position in the sequence.
The neural network used in the sequence tagging process can generate multiple probability distributions for a *single* input. Each probability distribution corresponds to a *different* label prediction for the preceding position in the sequence. This potentially explores multiple options for previous token labels.
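One way to picture claim 22 is a function that, for a single input, returns a table with one probability distribution per hypothesised preceding label. The sketch below is an assumption-laden illustration: the linear scoring, the weight matrices `W_X` and `W_PREV`, and the function name are hypothetical, and the patent does not state that the network is structured this way. A table like this is the kind of input a search such as the Viterbi process of claim 13 could consume.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distributions_per_previous_label(features, w_x, w_prev):
    """Return table[prev] = distribution over current labels given `prev`.

    The input-dependent part of each score is computed once and reused for
    every hypothesised preceding label (an illustrative design choice).
    """
    num_labels = len(w_x)
    base = [sum(w * x for w, x in zip(w_x[k], features)) for k in range(num_labels)]
    table = []
    for prev in range(num_labels):
        scores = [base[k] + w_prev[k][prev] for k in range(num_labels)]
        table.append(softmax(scores))
    return table


# Hand-picked illustrative weights: w_prev[k][j] rewards label k after label j.
W_X = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
W_PREV = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]

table = distributions_per_previous_label([0, 1, 1], W_X, W_PREV)
for prev, dist in enumerate(table):
    print(prev, [round(p, 3) for p in dist])
```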
23. The one or more non-transitory computer readable media of claim 14, the process further comprising generating, using a second neural network, a second probability distribution reflecting, for each label of the plurality of labels, a probability that the label corresponds to the current position, wherein the second probability distribution is based at least partly on the input data and a second label prediction for the preceding position in the sequence, and wherein the second label prediction is different than the label prediction.
The sequence tagging process uses a *second* neural network to generate a *second* probability distribution. This second distribution also reflects the probabilities of different labels for the current position. It's based on the input data and a *different* label prediction for the preceding position than was used to generate the *first* probability distribution. This allows the system to consider multiple hypotheses for the previous token's label, using multiple neural networks to generate distributions.
June 17, 2014
March 21, 2017