[Corpora-List] Re : Longest common subsequences algorithms in corpora

Gael Lejeune gael.lejeune at unicaen.fr
Thu Feb 28 14:42:29 CET 2013


Another example of applying LCS algorithms to corpora:

We used the detection of repeated strings in press articles to identify texts relevant to epidemic surveillance and to detect which disease is spreading where.

It proved particularly useful for articles written in morphologically rich languages (Greek, Polish, Russian...) or languages with different writing systems (Arabic, Chinese).
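To give a rough idea of why character-level repetition helps with inflected languages, here is a minimal sketch in Python. It is only an illustration, not the code behind the system described in the paper: it assumes that "repeated string" means a contiguous run of characters (a longest common substring rather than a subsequence), that the comparison is made between the head of an article and its body, and that a minimum length of 4 characters is a sensible cut-off. The function names and the Polish example are made up for this message.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous run of characters shared by a and b."""
    best_len = 0
    best_end = 0  # end index (exclusive) of the best match inside a
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


def repeated_string(head: str, body: str, min_len: int = 4) -> str:
    """Hypothetical helper: longest string repeated between head and body.

    Works on raw characters, so no tokenizer or lemmatizer is needed.
    Returns an empty string if the match is shorter than min_len
    (an assumed threshold, to be tuned).
    """
    match = longest_common_substring(head.lower(), body.lower()).strip()
    return match if len(match) >= min_len else ""


# Polish "grypa" (flu) appears in two inflected forms: "grypy" and "grypa".
head = "Epidemia grypy w Polsce"
body = "Lekarze ostrzegają: grypa atakuje kolejne województwa."
print(repeated_string(head, body))  # -> gryp
```

The shared stem "gryp" is found even though the two word forms differ, and nothing in the code depends on whitespace-delimited words, which is what makes the same idea usable for Chinese. In practice the repeated string would then be compared against a list of disease names, and a suffix-array or suffix-tree approach would scale better than this quadratic table.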

Some examples are shown here: https://daniel.greyc.fr/

More details can be found in this paper: http://www.cs.helsinki.fi/u/doucet/papers/JapTAL2012.pdf

Gaël


-- 
PhD Student, HUman Language TECHnologies (HULTECH)
Caen Campus 2, Bureau S3-365, Boulevard du Maréchal Juin
14000 Caen
Tel: 02 31 56 73 98
http://lejeuneg.users.greyc.fr/



