US-9600764

Markov-based sequence tagging using neural networks

Published March 21, 2017

Assignee: Not available in the USPTO data

Inventors: Not available in the USPTO data

Technical Abstract

Features are disclosed for using a neural network to tag sequential input without using an internal representation of the neural network generated when scoring previous positions in the sequence. A predicted or determined label (e.g., the highest-scoring or otherwise most probable label) for input at a given position in the sequence can be used when scoring input corresponding to the next position in the sequence. Additional features are disclosed for training a neural network for use in tagging sequential input without using an internal representation of the neural network generated when scoring previous positions in the sequence.

Patent Claims

23 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A system comprising: a computer-readable memory storing executable instructions; and one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to at least: obtain input data regarding a sequence of tokens, the sequence of tokens comprising a first token and a second token, wherein the first token immediately precedes the second token in the sequence of tokens; generate, using a neural network and data regarding the first token, a first probability distribution reflecting, for each label of a plurality of labels, a probability that the label corresponds to the first token; generate, using the first probability distribution, preceding label data indicating that a particular label of the plurality of labels corresponds to the first token; generate, using the neural network and the preceding label data and information regarding the second token, a second probability distribution reflecting, for each label of the plurality of labels, a probability that the label corresponds to the second token, wherein the second probability distribution is generated by the neural network independently of any label corresponding to any token in the sequence of tokens preceding the first token; and determine a sequence of labels corresponding to the sequence of tokens based at least partly on the first and second probability distributions.

Plain English Translation

A system for tagging sequences of data (such as words in a sentence) uses a neural network. The system takes a sequence of tokens (e.g., words) and processes them one at a time. For each token, it generates a probability distribution over possible labels (such as part-of-speech tags). It uses a label chosen from the previous token's distribution (for example, the highest-scoring label) to help predict the label for the current token. Importantly, the neural network does not carry over hidden states or other internal representations from processing earlier tokens. The distribution for the current token also does not depend on the labels of any tokens before the previous one. Based on these probability distributions, the system determines a sequence of labels for the entire input sequence.
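The loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation. `ToyTagger` is a hypothetical single-layer scorer with hand-picked weights. The previous label is fed in as a one-hot vector, and the first token gets an all-zero "no previous label" vector. Labels are chosen greedily (highest probability) at each step. Only the chosen label is carried from one token to the next.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index: int, size: int) -> List[float]:
    """Vector of zeros with 1.0 at the chosen label index."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


class ToyTagger:
    """Hypothetical non-recurrent scorer: one weight row per label.

    The input is the token's feature vector concatenated with a vector
    describing the previous label. No state is stored between calls.
    """

    def __init__(self, weights: List[List[float]], bias: List[float]) -> None:
        self.weights = weights
        self.bias = bias

    def distribution(self, features: Sequence[float],
                     prev_label_vec: Sequence[float]) -> List[float]:
        x = list(features) + list(prev_label_vec)
        scores = [sum(w * xi for w, xi in zip(row, x)) + b
                  for row, b in zip(self.weights, self.bias)]
        return softmax(scores)


def greedy_tag(net: ToyTagger, token_features: Sequence[Sequence[float]],
               num_labels: int) -> Tuple[List[int], List[List[float]]]:
    """Tag tokens left to right, feeding back only the previously chosen label."""
    labels: List[int] = []
    dists: List[List[float]] = []
    prev_vec = [0.0] * num_labels  # assumption: no previous label for the first token
    for feats in token_features:
        dist = net.distribution(feats, prev_vec)
        best = max(range(num_labels), key=lambda i: dist[i])
        labels.append(best)
        dists.append(dist)
        prev_vec = one_hot(best, num_labels)  # the only thing carried forward
    return labels, dists


# Usage: 2 labels, 2 input features; the last two weight columns favor
# repeating the previous label.
net = ToyTagger(weights=[[1.0, 0.0, 0.5, 0.0],
                         [0.0, 1.0, 0.0, 0.5]],
                bias=[0.0, 0.0])
labels, dists = greedy_tag(net, [[2.0, 0.0], [0.0, 0.0], [0.0, 3.0]], num_labels=2)
print(labels)  # [0, 0, 1]
```

The middle token has no evidence of its own, so the one-hot vector for the previous label (0) decides its tag. Greedy selection is only one way to turn the distributions into a label sequence; claim 13 names a Viterbi process.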

Claim 2

Original Legal Text

2. The system of claim 1 , wherein one or more processors are programmed to at least use the neural network as a Markov model.

Plain English Translation

The system described above uses the neural network as a Markov model. The label distribution for the current token depends only on the current token's input and the label of the immediately preceding token, as in a first-order Markov model. The difference is that a neural network, rather than a fixed probability table, produces the probabilities.
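One way to write this Markov assumption down is shown below. It assumes a first-order model with the following notation, which is ours and not the patent's: x_t is the input at position t, y_t is its label, y_0 is a placeholder "start" label, and P_\theta is the distribution produced by the neural network with parameters \theta.

```latex
P(y_1, \dots, y_T \mid x_1, \dots, x_T) = \prod_{t=1}^{T} P_\theta(y_t \mid x_t, y_{t-1})
```

Each factor sees only the current input and one previous label. In that sense a single learned conditional replaces the transition-and-emission product of a classical hidden Markov model.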

Claim 3

Original Legal Text

3. The system of claim 1 , wherein the preceding label data comprises a plurality of values, each value of the plurality of values corresponding to a different label of the plurality of labels, and wherein the value corresponding to the particular label is larger than all other values of the plurality of values.

Plain English Translation

In the system for sequence tagging using a neural network, the "preceding label data" (the information about the previous token's label) is represented as a set of values. Each value corresponds to a different possible label, and the value for the label predicted for the previous token is larger than all the other values. One common example is one-hot encoding, in which the selected label has the value 1 and every other label has the value 0. The largest value tells the neural network which label was predicted for the previous token.
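A minimal sketch of the encoding described above. The tag set `LABELS` and the helper names `one_hot` and `marks_label` are hypothetical. `marks_label` checks the property the claim states: the chosen label's value exceeds every other value.

```python
from typing import List, Sequence


def one_hot(label_index: int, num_labels: int) -> List[float]:
    """Encode a label as 1.0 at its index and 0.0 everywhere else."""
    vec = [0.0] * num_labels
    vec[label_index] = 1.0
    return vec


def marks_label(vec: Sequence[float], label_index: int) -> bool:
    """True if the entry for label_index is larger than every other entry."""
    return all(v < vec[label_index] for i, v in enumerate(vec) if i != label_index)


LABELS = ["NOUN", "VERB", "ADJ"]  # hypothetical tag set
prev = one_hot(LABELS.index("VERB"), len(LABELS))
print(prev)                             # [0.0, 1.0, 0.0]
print(marks_label(prev, 1))             # True
print(marks_label([0.2, 0.7, 0.1], 1))  # True
```

The last line shows that a peaked, non-binary vector would also meet the claim's wording, since only the "largest value" relationship is required.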

Claim 4

Original Legal Text

4. A computer-implemented method comprising: under control of one or more computing devices configured with specific computer-executable instructions, obtaining input data regarding a current position in a sequence comprising a plurality of positions; obtaining preceding label data regarding a label for each of a finite number of preceding positions in the sequence; generating, using a neural network, a probability distribution reflecting, for each label of a plurality of labels, a probability that the label corresponds to the current position, wherein the probability distribution is based at least partly on the input data and the preceding label data; and determining a sequence of labels corresponding to the sequence of the plurality of positions based at least partly on the probability distribution.

Plain English Translation

A computer-implemented method tags sequences of data. For a current position in the sequence, the method obtains input data (e.g., the word itself). It also obtains "preceding label data," which represents labels predicted for a limited number of previous positions in the sequence. A neural network uses this input data and the preceding label data to generate a probability distribution over possible labels for the current position. The method then determines the most likely sequence of labels for the entire input sequence based on these probability distributions.

Claim 5

Original Legal Text

5. The computer-implemented method of claim 4 , wherein the finite number of preceding positions is limited to fewer than a total number of preceding positions in the sequence.

Plain English Translation

The computer-implemented method for sequence tagging described above, in which a neural network predicts labels based on previous labels, considers only a *limited* number of preceding positions. This number is *fewer* than the total number of positions that came before the current one in the sequence. For example, the method might consider only the immediately preceding position and ignore all earlier ones.
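A minimal sketch of how a finite label history could be assembled into the network input. The function name `build_input`, the window size `k`, and the zero-vector padding at the start of the sequence are illustrative assumptions. The claims only require that the history be finite and shorter than the full prefix.

```python
from typing import List, Sequence


def one_hot(label_index: int, num_labels: int) -> List[float]:
    vec = [0.0] * num_labels
    vec[label_index] = 1.0
    return vec


def build_input(features: Sequence[float], prev_labels: Sequence[int],
                k: int, num_labels: int) -> List[float]:
    """Concatenate current features with one-hot vectors for the last k labels.

    Labels older than k positions are dropped entirely, so they cannot
    influence the output. Missing history at the start of the sequence is
    padded with all-zero vectors, which keeps the input length fixed.
    """
    window = list(prev_labels)[-k:] if k > 0 else []  # [-0:] would keep everything
    padding = [[0.0] * num_labels for _ in range(k - len(window))]
    out = list(features)
    for vec in padding + [one_hot(label, num_labels) for label in window]:
        out.extend(vec)
    return out


# With k = 1 (claim 6), only the immediately preceding label (1) is used.
print(build_input([0.5], [2, 0, 1], k=1, num_labels=3))  # [0.5, 0.0, 1.0, 0.0]
# With k = 2 at the start of a sequence, both history slots are zero padding.
print(build_input([0.5], [], k=2, num_labels=3))
```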

Claim 6

Original Legal Text

6. The computer-implemented method of claim 4 , wherein the finite number of preceding positions corresponds to only the position immediately preceding the current position.

Plain English Translation

The computer-implemented method for sequence tagging, which predicts labels using a neural network, considers only the label of the *immediately* preceding token in the sequence. The neural network uses this single previous label, along with the input data for the current token, to generate a probability distribution over possible labels for the current token.

Claim 7

Original Legal Text

7. The computer-implemented method of claim 4 , wherein the neural network is not a recurrent neural network.

Plain English Translation

The computer-implemented method for sequence tagging uses a neural network that is *not* a recurrent neural network (RNN). Instead of relying on internal memory of hidden states between steps, the network receives explicit "preceding label data" as input to inform its prediction for the current token.

Claim 8

Original Legal Text

8. The computer-implemented method of claim 4 , wherein the input data regarding the current position comprises a feature vector extracted from an input signal.

Plain English Translation

In the sequence tagging method using a neural network, the "input data" for the current token is not the raw data directly. Instead, it's a "feature vector" that's been extracted or computed from the raw input signal (e.g., the word). This feature vector represents relevant characteristics of the input.

Claim 9

Original Legal Text

9. The computer-implemented method of claim 4 , wherein the preceding label data comprises a plurality of values, each value of the plurality of values corresponding to a different label of the plurality of labels.

Plain English Translation

In the computer-implemented sequence tagging method, the preceding label data consists of a set of values, where each value corresponds to a different possible label.

Claim 10

Original Legal Text

10. The computer-implemented method of claim 4 , further comprising training the neural network using the input data regarding the current position and correct label data regarding the correct label for each of the finite number of preceding positions.

Plain English Translation

The computer-implemented method for sequence tagging also *trains* the neural network. The training process uses the input data for the current token, along with the correct labels for a finite number of preceding positions. This helps the network learn to predict labels accurately based on the input and the context provided by the previous labels.
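The training setup described above can be sketched as follows. The sketch assumes a single-layer, bias-free softmax model trained by plain stochastic gradient descent on cross-entropy loss; the patent text here does not specify the model or optimizer. The key point is in `make_examples`, a hypothetical helper: the previous-label part of each input is built from the *correct* label, not from the network's own prediction.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index: int, size: int) -> List[float]:
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def make_examples(features_seq: Sequence[Sequence[float]], gold_labels: Sequence[int],
                  num_labels: int) -> List[Tuple[List[float], int]]:
    """Pair each position's features with the correct label of the previous position."""
    examples = []
    prev = [0.0] * num_labels  # assumption: all-zero vector before the first position
    for feats, gold in zip(features_seq, gold_labels):
        examples.append((list(feats) + prev, gold))
        prev = one_hot(gold, num_labels)
    return examples


def sgd_step(weights: List[List[float]], x: Sequence[float], target: int,
             lr: float) -> float:
    """One cross-entropy SGD update of a bias-free linear softmax model.

    Returns the loss measured before the update.
    """
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    probs = softmax(scores)
    for j, row in enumerate(weights):
        grad = probs[j] - (1.0 if j == target else 0.0)  # d loss / d score_j
        for i in range(len(row)):
            row[i] -= lr * grad * x[i]
    return -math.log(probs[target])


examples = make_examples([[1.0], [0.0]], gold_labels=[1, 0], num_labels=2)
weights = [[0.0] * 3 for _ in range(2)]  # 2 labels x (1 feature + 2 label slots)
first_epoch = sum(sgd_step(weights, x, y, lr=0.5) for x, y in examples)
last_epoch = first_epoch
for _ in range(50):
    last_epoch = sum(sgd_step(weights, x, y, lr=0.5) for x, y in examples)
print(examples)                  # [([1.0, 0.0, 0.0], 1), ([0.0, 0.0, 1.0], 0)]
print(last_epoch < first_epoch)  # True
```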

Claim 11

Original Legal Text

11. The computer-implemented method of claim 10 , further comprising training the neural network using non-sequential training data.

Plain English Translation

The computer-implemented method for sequence tagging, which also trains the neural network, trains the network using "non-sequential training data." This means the training data isn't necessarily presented in the order that tokens appear in a sequence.

Claim 12

Original Legal Text

12. The computer-implemented method of claim 11 , wherein training the neural network using non-sequential training data comprises using input data for the current position prior to using input data for a preceding position.

Plain English Translation

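When training the neural network for sequence tagging with non-sequential data, the method may use input data for the current position *before* using the input data for preceding positions. Each training example already includes the correct labels for its preceding positions, so examples do not have to be processed in sequence order. For instance, they can be shuffled randomly.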
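Because each training input already contains the correct preceding label, the examples can be built once and then consumed in any order. A minimal sketch follows. It assumes each example is a (position, input, target) triple whose input is the position's features followed by a one-hot vector of the correct previous label. `make_examples`, `shuffled`, and the fixed seed are illustrative choices.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[int, List[float], int]  # (position, network input, correct label)


def one_hot(index: int, size: int) -> List[float]:
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def make_examples(features_seq: Sequence[Sequence[float]], gold_labels: Sequence[int],
                  num_labels: int) -> List[Example]:
    """Build self-contained examples; none needs another to be processed first."""
    examples: List[Example] = []
    prev = [0.0] * num_labels
    for pos, (feats, gold) in enumerate(zip(features_seq, gold_labels)):
        examples.append((pos, list(feats) + prev, gold))
        prev = one_hot(gold, num_labels)
    return examples


def shuffled(examples: Sequence[Example], seed: int) -> List[Example]:
    """Return the examples in a random but reproducible order."""
    order = list(examples)
    random.Random(seed).shuffle(order)
    return order


examples = make_examples([[0.1], [0.2], [0.3], [0.4]], [0, 1, 1, 0], num_labels=2)
# Processing position 3 before position 2 is fine: its input already holds the
# correct label of position 2, so nothing has to be computed in sequence order.
reverse_order = list(reversed(examples))
random_order = shuffled(examples, seed=7)
print([pos for pos, _, _ in reverse_order])      # [3, 2, 1, 0]
print(sorted(random_order) == sorted(examples))  # True
```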

Claim 13

Original Legal Text

13. The computer-implemented method of claim 4 , wherein the sequence of labels corresponding to the sequence of the plurality of positions is determined using a Viterbi process.

Plain English Translation

The computer-implemented method determines the most likely sequence of labels for the entire input sequence using a "Viterbi process." The Viterbi algorithm efficiently finds the optimal sequence of labels given the probability distributions generated by the neural network for each token.
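A minimal Viterbi sketch over network outputs follows. It assumes the network supplies a start distribution `first[j]` for the first position. For each later position, it also supplies a table `cond[t-1][i][j]` giving the probability of label j when the previous label is i. These names and this layout are assumptions. The code works in log space and requires every probability to be strictly positive; softmax outputs are, unless they underflow.

```python
import math
from typing import List, Sequence, Tuple


def viterbi(first: Sequence[float],
            cond: Sequence[Sequence[Sequence[float]]]) -> Tuple[List[int], float]:
    """Return the most probable label path and its log-probability.

    first[j]        = P(y_0 = j | x_0)
    cond[t-1][i][j] = P(y_t = j | x_t, y_{t-1} = i)   for t >= 1
    """
    n = len(first)
    score = [math.log(p) for p in first]  # best log-prob of any path ending in label j
    backptrs: List[List[int]] = []
    for table in cond:
        new_score: List[float] = []
        ptr: List[int] = []
        for j in range(n):
            # Best previous label i for current label j.
            best_i = max(range(n), key=lambda i: score[i] + math.log(table[i][j]))
            new_score.append(score[best_i] + math.log(table[best_i][j]))
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    last = max(range(n), key=lambda j: score[j])
    path = [last]
    for ptr in reversed(backptrs):  # follow back-pointers from the end
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


# Greedy decoding would pick label 0 first (0.6 > 0.4) and end with probability 0.3;
# Viterbi finds the better path [1, 1] with probability 0.4 * 0.9 = 0.36.
path, logp = viterbi([0.6, 0.4], [[[0.5, 0.5], [0.1, 0.9]]])
print(path, round(math.exp(logp), 6))  # [1, 1] 0.36
```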

Claim 14

Original Legal Text

14. One or more non-transitory computer readable media comprising executable code that, when executed, cause one or more computing devices to perform a process comprising: obtaining input data regarding a current position in a sequence comprising a plurality of positions; generating, using a neural network, a probability distribution reflecting, for each label of a plurality of labels, a probability that the label corresponds to the current position, wherein the probability distribution is based at least partly on the input data and a label prediction for each of a finite number of preceding positions in the sequence; and determining a sequence of labels corresponding to the sequence of the plurality of positions based at least partly on the probability distribution.

Plain English Translation

A computer-readable medium stores instructions that cause a computer to perform a sequence tagging process. The process involves obtaining input data for the current position in a sequence, and using a neural network to generate a probability distribution over possible labels. The probability distribution is based partly on the input data, and partly on label *predictions* for a finite number of preceding positions in the sequence. The process determines the sequence of labels using the probability distributions.

Claim 15

Original Legal Text

15. The one or more non-transitory computer readable media of claim 14 , wherein the probability distribution is generated independently of a label prediction for any position of the sequence occurring before the finite number of preceding positions.

Plain English Translation

In the sequence tagging process performed by a computer, the probability distribution generated by the neural network is independent of any label predictions made for positions *before* the "finite number of preceding positions". Only a limited history is used to make predictions.

Claim 16

Original Legal Text

16. The one or more non-transitory computer readable media of claim 14 , wherein the finite number of preceding positions is limited to fewer than a total number of preceding positions in the sequence.

Plain English Translation

The sequence tagging process performed by a computer only considers a *limited* number of preceding positions when predicting labels. This number is *fewer* than the total number of positions that came before the current one in the sequence.

Claim 17

Original Legal Text

17. The one or more non-transitory computer readable media of claim 14 , wherein the neural network is not a recurrent neural network.

Plain English Translation

The sequence tagging process uses a neural network that is *not* a recurrent neural network (RNN). Instead of carrying hidden state from step to step, it relies on explicit label predictions for preceding tokens.

Claim 18

Original Legal Text

18. The one or more non-transitory computer readable media of claim 14 , wherein the input data regarding the current position comprises a feature vector extracted from an input signal.

Plain English Translation

In the sequence tagging process, the "input data" for the current token is a "feature vector" that represents relevant characteristics of the input.

Claim 19

Original Legal Text

19. The one or more non-transitory computer readable media of claim 14 , wherein the preceding label data comprises a plurality of values, each value of the plurality of values corresponding to a different label of the plurality of labels.

Plain English Translation

The preceding label data used in the sequence tagging process contains a plurality of values, with each value corresponding to a different possible label.

Claim 20

Original Legal Text

20. The one or more non-transitory computer readable media of claim 14 , the process further comprising training the neural network using the input data regarding the current position and correct label data regarding the correct label for the preceding position.

Plain English Translation

The sequence tagging process also *trains* the neural network, using the current input data and the correct label for the preceding position.

Claim 21

Original Legal Text

21. The one or more non-transitory computer readable media of claim 14 , the process further comprising training the neural network using input data for the current position prior to using input data for the preceding position.

Plain English Translation

When training the neural network for sequence tagging, the process can use input data for the current position *before* using input data for the preceding position, enabling non-sequential training.

Claim 22

Original Legal Text

22. The one or more non-transitory computer readable media of claim 14 , wherein the neural network generates, for a single input of the input data, a plurality of probability distributions, wherein individual probability distributions of the plurality of probability distributions correspond to different label predictions for the preceding position in the sequence.

Plain English Translation

The neural network used in the sequence tagging process can generate multiple probability distributions for a *single* input. Each probability distribution corresponds to a *different* label prediction for the preceding position in the sequence. As a result, the network can score the current position under several hypotheses for the previous token's label from the same input.
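One way to read this claim is as an output layer with one block of label scores per possible previous label, where each block is normalized separately. The sketch below assumes a bias-free linear layer with hypothetical hand-set weights. The function name `per_previous_label_distributions` is also illustrative. `dists[i][j]` is then the probability of label j at the current position if the previous label was i, which is the kind of table a Viterbi decoder consumes.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def per_previous_label_distributions(features: Sequence[float],
                                     weights: Sequence[Sequence[float]],
                                     num_labels: int) -> List[List[float]]:
    """Score num_labels * num_labels output units from one input, then
    normalize each block of num_labels units on its own.

    Block i holds the current-label scores assuming the previous label was i.
    """
    if len(weights) != num_labels * num_labels:
        raise ValueError("expected one weight row per (previous, current) label pair")
    scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return [softmax(scores[i * num_labels:(i + 1) * num_labels])
            for i in range(num_labels)]


# Hypothetical weights for 2 labels and a 1-dimensional feature vector:
# each block strongly prefers repeating the previous label.
weights = [[1.0], [0.0],   # block for previous label 0
           [0.0], [1.0]]   # block for previous label 1
dists = per_previous_label_distributions([2.0], weights, num_labels=2)
print([round(p, 4) for p in dists[0]])  # [0.8808, 0.1192]
print([round(p, 4) for p in dists[1]])  # [0.1192, 0.8808]
```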

Claim 23

Original Legal Text

23. The one or more non-transitory computer readable media of claim 14 , the process further comprising generating, using a second neural network, a second probability distribution reflecting, for each label of the plurality of labels, a probability that the label corresponds to the current position, wherein the second probability distribution is based at least partly on the input data and a second label prediction for the preceding position in the sequence, and wherein the second label prediction is different than the label prediction.

Plain English Translation

The sequence tagging process uses a *second* neural network to generate a *second* probability distribution. This second distribution also reflects the probabilities of different labels for the current position. It's based on the input data and a *different* label prediction for the preceding position than was used to generate the *first* probability distribution. This allows the system to consider multiple hypotheses for the previous token's label, using multiple neural networks to generate distributions.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06N

Patent Metadata

Filing Date

June 17, 2014

Publication Date

March 21, 2017
