Yamamoto and Church, Using Suffix Arrays to Compute Term Frequency
and Document Frequency for All Substrings in a Corpus,
Comp. Linguistics, 2000.
Some years back I used this technique to help identify bilingual phrasal equivalents:
McNamee and Mayfield, Translation of Multiword Expressions Using Parallel
Suffix Arrays, AMTA 2006.
An actual use of longest common substring is found in proper name variant matching (e.g., is "Mikhail Sergeyevich Gorbachev" coreferent with "Michail Gorbatchev"?).
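To make the name-matching idea concrete, here is a minimal sketch in Python using the textbook dynamic-programming approach to longest common substring. The function names and the normalized score are my own illustrative assumptions, not taken from any particular system; a real matcher would likely combine this with other evidence.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return one longest contiguous substring shared by a and b."""
    best_len = 0
    best_end = 0  # end index (exclusive) in `a` of the best match so far
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


def name_similarity(x: str, y: str) -> float:
    """Hypothetical score: match length divided by the shorter name's length."""
    x, y = x.lower(), y.lower()
    if not x or not y:
        return 0.0
    return len(longest_common_substring(x, y)) / min(len(x), len(y))


if __name__ == "__main__":
    print(name_similarity("Mikhail Sergeyevich Gorbachev", "Michail Gorbatchev"))
```

This runs in O(len(a) * len(b)) time, which is fine for pairs of names; for whole corpora a suffix array or suffix tree is the usual choice.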
LCS is also widely used as a means to identify spans of text that are duplicates or near duplicates; similar methods can also be applied to the problems of plagiarism detection and authorship attribution.
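As a rough illustration of duplicate-span detection, the sketch below uses Python's standard difflib module over word tokens. The function name shared_spans and the min_words threshold are hypothetical choices of mine; note also that difflib reports matching blocks in order, so this would miss passages that have been reordered between the two texts.

```python
from difflib import SequenceMatcher


def shared_spans(text_a: str, text_b: str, min_words: int = 4) -> list[str]:
    """Return word spans of at least min_words that occur in both texts."""
    a = text_a.lower().split()
    b = text_b.lower().split()
    # autojunk=False turns off difflib's heuristic that ignores very
    # frequent tokens in long sequences.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    spans = []
    for block in matcher.get_matching_blocks():
        # The final block is a zero-length sentinel and is filtered out here.
        if block.size >= min_words:
            spans.append(" ".join(a[block.a:block.a + block.size]))
    return spans


if __name__ == "__main__":
    doc1 = "the quick brown fox jumps over the lazy dog"
    doc2 = "yesterday the quick brown fox jumps over a fence"
    print(shared_spans(doc1, doc2))
```

Working at the word level rather than the character level keeps the sequences short and makes the threshold easier to interpret.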
On Tue, 19 Feb 2013, Albretch Mueller wrote:
> LCS algorithms are heavily used in bioinformatics to analyze DNA sequences
> How are they used in corpora research?