(S)-carnitine 3-dehydrogenase: Difference between revisions

Latest revision as of 02:49, 27 May 2013

Katz back-off is a generative n-gram language model that estimates the conditional probability of a word given its history in the n-gram. It does so by backing off to models with shorter histories under certain conditions, so that the estimate comes from the model with the most reliable information about a given history.

The method

The equation for Katz's back-off model is: ^[1]

$$P_{bo}(w_i \mid w_{i-n+1} \cdots w_{i-1}) = \begin{cases} d_{w_{i-n+1} \cdots w_i} \dfrac{C(w_{i-n+1} \cdots w_{i-1} w_i)}{C(w_{i-n+1} \cdots w_{i-1})} & \text{if } C(w_{i-n+1} \cdots w_i) > k \\ \alpha_{w_{i-n+1} \cdots w_{i-1}} \, P_{bo}(w_i \mid w_{i-n+2} \cdots w_{i-1}) & \text{otherwise} \end{cases}$$

where,

$C(x)$ = number of times $x$ appears in training

$w_i$ = $i$th word in the given context

Essentially, this means that if the n-gram has been seen more than k times in training, the conditional probability of a word given its history is proportional to the maximum likelihood estimate of that n-gram, with the discount d as the constant of proportionality. Otherwise, the conditional probability is the back-off weight α times the back-off conditional probability of the (n-1)-gram.

The more difficult part is determining the values for k, d and α.

$k$ is the least important of the parameters. It is usually chosen to be 0. However, empirical testing may find better values for k.

$d$ is typically the amount of discounting found by Good-Turing estimation. In other words, if Good-Turing estimates the count $C$ as $C^{*}$, then $d = \frac{C^{*}}{C}$.
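The discount ratio can be illustrated with a minimal sketch. It assumes the basic (unsmoothed) Good-Turing re-estimate $C^{*} = (C + 1) N_{C+1} / N_{C}$, where $N_{C}$ is the number of distinct n-grams seen exactly $C$ times. The function name `good_turing_discounts`, the `max_count` cutoff, and the toy counts are hypothetical and serve only as illustration; practical implementations usually smooth the $N_{C}$ values first.

```python
from collections import Counter


def good_turing_discounts(ngram_counts, max_count=5):
    """Return {c: d_c} where d_c = c* / c and c* = (c + 1) * N_{c+1} / N_c.

    N_c is the number of distinct n-grams observed exactly c times.
    Counts above max_count, or counts for which N_{c+1} is zero, are left
    undiscounted (d = 1). That cutoff is an assumption of this sketch.
    """
    count_of_counts = Counter(ngram_counts.values())
    discounts = {}
    for c in sorted(count_of_counts):
        n_c = count_of_counts[c]
        n_c_plus_1 = count_of_counts.get(c + 1, 0)
        if c > max_count or n_c_plus_1 == 0:
            discounts[c] = 1.0
        else:
            c_star = (c + 1) * n_c_plus_1 / n_c
            discounts[c] = c_star / c
    return discounts


# Toy bigram counts: six bigrams seen once, two seen twice, one seen three times.
toy_counts = {
    "a b": 1, "a c": 1, "b c": 1, "c a": 1, "c d": 1, "d a": 1,
    "b a": 2, "c b": 2,
    "a a": 3,
}
discounts = good_turing_discounts(toy_counts)
```

With these toy counts, n-grams seen once are discounted more heavily than n-grams seen twice, and the highest count is left untouched because no n-gram occurs four times.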

To compute $α$ , it is useful to first define a quantity β, which is the left-over probability mass for the (n-1)-gram:

$$\beta_{w_{i-n+1} \cdots w_{i-1}} = 1 - \sum_{\{w_i : C(w_{i-n+1} \cdots w_i) > k\}} d_{w_{i-n+1} \cdots w_i} \frac{C(w_{i-n+1} \cdots w_{i-1} w_i)}{C(w_{i-n+1} \cdots w_{i-1})}$$

Then the back-off weight, α, is computed as follows:

$$\alpha_{w_{i-n+1} \cdots w_{i-1}} = \frac{\beta_{w_{i-n+1} \cdots w_{i-1}}}{\sum_{\{w_i : C(w_{i-n+1} \cdots w_i) \leq k\}} P_{bo}(w_i \mid w_{i-n+2} \cdots w_{i-1})}$$
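The computation of β and α can be sketched for the simplest case, a bigram model that backs off to unigram maximum likelihood estimates. This is an illustrative sketch under several assumptions, not a reference implementation: the function name `train_katz_bigram` is hypothetical, the discount is passed in as a function of the count (a constant 0.75 stands in for Good-Turing in the toy run), the history count is taken as the number of bigrams that start with the history so that each conditional distribution sums to 1, and an unseen history falls back to the unigram estimate.

```python
from collections import Counter


def train_katz_bigram(tokens, discount, k=0):
    """Build a Katz back-off bigram model; returns prob(word, history)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = sum(unigrams.values())
    p_uni = {w: c / total for w, c in unigrams.items()}

    # C(history): total count of bigrams beginning with that history word.
    hist = Counter()
    for (h, _), c in bigrams.items():
        hist[h] += c

    # Discounted estimates for bigrams seen more than k times.
    seen = {h: {} for h in hist}
    for (h, w), c in bigrams.items():
        if c > k:
            seen[h][w] = discount(c) * c / hist[h]

    # beta = left-over mass; alpha = beta / (unigram mass of backed-off words).
    alpha = {}
    for h in hist:
        beta = 1.0 - sum(seen[h].values())
        denom = sum(p for w, p in p_uni.items() if w not in seen[h])
        alpha[h] = beta / denom if denom > 0 else 0.0

    def prob(word, history):
        if word in seen.get(history, {}):
            return seen[history][word]
        if history in alpha:
            return alpha[history] * p_uni.get(word, 0.0)
        # Unseen history: use the unigram estimate (assumption of this sketch).
        return p_uni.get(word, 0.0)

    return prob


tokens = "a b a b a c b a".split()
prob = train_katz_bigram(tokens, discount=lambda c: 0.75)
vocab = sorted(set(tokens))
row_sums = {h: sum(prob(w, h) for w in vocab) for h in vocab}
```

In the toy run, the bigram "a a" never occurs, yet `prob("a", "a")` is non-zero because the left-over mass for the history "a" is redistributed over the unseen continuations in proportion to their unigram probabilities.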

Discussion

This model generally works well in practice, but fails in some circumstances. For example, suppose that the bigram "a b" and the unigram "c" are very common, but the trigram "a b c" is never seen. Since "a b" and "c" are very common, it may be significant (that is, not due to chance) that "a b c" is never seen. Perhaps it's not allowed by the rules of the grammar. Instead of assigning a more appropriate value of 0, the method will back off to the bigram and estimate P(c | b), which may be too high.^[2]

References

[1] Katz, S. M. (1987). Estimation of probabilities from sparse data for the language model component of a speech recogniser. IEEE Transactions on Acoustics, Speech, and Signal Processing, 35(3), 400–401.
[2] Manning and Schütze, Foundations of Statistical Natural Language Processing, MIT Press (1999), ISBN 978-0-262-13360-9.

