Look carefully at the denominator of that formula (page 370). You will see that it refers to all the possible N-grams in your corpus, not just the one constrained to a particular w_i. That denominator is simply a count of all the N-grams present in your corpus. If you interpret that count correctly, you will see that your denominator cannot be equal to 0 (except when your corpus contains no N-grams at all).
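To make the counting concrete, here is a minimal bigram sketch, assuming the interpolated Kneser-Ney form with a single discount D = 0.75 (the modified variant in the paper uses several discounts). The denominator is obtained by summing the counts over every word that follows the history, rather than looking at one particular w_i. The function names, the toy sentence, and the fallback to the lower-order distribution for a history that never precedes any word are assumptions for illustration, not something taken from the paper or from this thread.

```python
from collections import Counter, defaultdict


def train_bigram_kn(tokens, discount=0.75):
    """Collect the counts needed for interpolated Kneser-Ney on bigrams."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    history_total = Counter()      # sum over w of c(h, w): the denominator
    followers = defaultdict(set)   # distinct words seen after h
    preceders = defaultdict(set)   # distinct words seen before w
    for (h, w), c in bigrams.items():
        history_total[h] += c
        followers[h].add(w)
        preceders[w].add(h)
    return bigrams, history_total, followers, preceders, discount


def p_continuation(w, model):
    """Lower-order estimate: share of distinct bigram types that end in w."""
    bigrams, _, _, preceders, _ = model
    return len(preceders.get(w, ())) / len(bigrams)


def p_kn(w, h, model):
    """Interpolated Kneser-Ney estimate of P(w | h) for bigrams."""
    bigrams, history_total, followers, _, d = model
    lower = p_continuation(w, model)
    total = history_total[h]       # Counter returns 0 for an unseen history
    if total == 0:
        # Assumption: with no evidence for h, use the lower-order estimate.
        return lower
    discounted = max(bigrams[(h, w)] - d, 0) / total
    gamma = d * len(followers[h]) / total
    return discounted + gamma * lower


# Hypothetical toy corpus; "mat" occurs only as the last token,
# so it is never the history of any bigram.
tokens = "de kat zit op de mat".split()
model = train_bigram_kn(tokens)
vocab = sorted(set(tokens))

print(p_kn("kat", "de", model))                    # seen bigram
print(p_kn("zit", "de", model))                    # unseen bigram, seen history
print(p_kn("de", "mat", model))                    # history never seen
print(sum(p_kn(w, "de", model) for w in vocab))    # should be close to 1
```

In this sketch a word that never occurs in the training data still gets probability zero, so handling out-of-vocabulary words would need a separate step.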
Let me know if you have understood your mistake.
Bye, michele.
On Thu, Jun 21, 2012 at 3:20 PM, Coen Jonker <coen.j.jonker at gmail.com> wrote:
> Dear readers of the corpora list,
>
>
> As part of the AI master's course on handwriting recognition, I am working on
> the implementation of a Statistical Language Model for 19th century Dutch.
> I am running into a problem and hope you may be able to help. I have
> already spoken with prof. Ernst Wit and he suggested I contact you. I
> would be very grateful if you could help me along.
>
> The purpose of the statistical language model is to provide a
> knowledge-based estimation for the conditional probability of a word w
> given the history h (previous words), let this probability be P(w|h).
>
> Since the available corpus for this project is quite sparse I want to use
> statistical smoothing on the conditional probabilities. I have learned that
> using a simple maximum likelihood estimation for P(w|h) will yield zero
> probabilities for word sequences that are not in the corpus, even though
> many grammatically correct sequences are not in the corpus. Furthermore,
> the actual probabilities for P(w|h) will be overestimated by maximum
> likelihood.
>
> There are many smoothing techniques available, but empirically a modified
> form of Kneser-Ney smoothing has proven very effective (I have
> attached a paper by Stanley Chen and Joshua Goodman explaining this). A
> quick intro to the topic is at: http://www.youtube.com/watch?v=ody1ysUTD7o
>
> Kneser-Ney smoothing interpolates discounted probabilities for
> trigrams with lower-order bigram probabilities. The equations on page 12
> (370 in the journal numbering) of the attached PDF are the ones I use. The
> problem I run into is that the denominator of the fraction, which is the
> count of the history h in the corpus, may be zero, yielding errors but also
> making the gamma term zero, yielding zero probabilities. Avoiding zero
> probabilities was one of the reasons to implement smoothing in the first
> place.
>
> This problem has frustrated me for a few weeks now. After reading most of
> the available literature on the topic, I am afraid that my knowledge of
> language modeling or statistics may be insufficient or that I misunderstood
> a fundamental part of the technique.
>
> Did I misunderstand anything? I sincerely hope you are able to point me in
> the direction of a solution.
>
> Sincerely,
>
> Coen Jonker
--
Michele Filannino
CDT PhD student in Computer Science
Room IT301 - IT Building
The University of Manchester
filannim at cs.manchester.ac.uk