[Corpora-List] automatic search for orthographic recurring patterns

Shlomo Argamon argamon at iit.edu
Wed Dec 8 18:00:00 CET 2004

See our paper in COLING-04:

Shlomo Argamon, Navot Akiva, Amihood Amir, and Oren Kapah.
Efficient Unsupervised Recursive Word Segmentation Using Minimum
Description Length.
Proceedings of The 20th International Conference on Computational
Linguistics (COLING), August 2004.

Available at http://lingcog.iit.edu/pub.xml


MARC FRYD wrote:

> Hi,
>
> Perhaps someone on the List will be able to help me with the following
> datamining problem:
>
> Given a corpus of isolated lexical units or collocations, I would like
> to determine recurring orthographic patterns, whether initial, e.g.
> "CARPO" (carpogenic, carpogenous, carpolite), final, e.g. "IONALISM"
> (sensationalism, functionalism, etc.), or internal, e.g. "CHRON"
> (synchrony, synchronize, etc.).
>
> The output should be arranged so as to show respective productivity for
> each pattern.
>
> Important constraint: the various patterns will *not* be fed in
> initially but should be extracted as a result of the algorithm.
>
> I'll post a summary if I get several replies.
>
> Regards to all list members.
>
> Marc Fryd
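
For illustration only, the task described above can be sketched with a
brute-force substring count. This is not the MDL-based segmentation method
of the COLING paper cited above; it is a naive baseline. It assumes that
"productivity" means the number of distinct words containing a pattern, and
the function names and the min_len / min_words thresholds are hypothetical
choices made for this example.

```python
from collections import defaultdict


def recurring_patterns(words, min_len=4, min_words=2):
    """Return {(position, pattern): sorted list of distinct words}.

    position is "initial", "final" or "internal", depending on where the
    pattern occurs in the word. No patterns are supplied in advance: every
    substring of at least min_len characters is a candidate.
    """
    found = defaultdict(set)
    for word in {w.lower() for w in words}:
        n = len(word)
        for i in range(n):
            for j in range(i + min_len, n + 1):
                if i == 0 and j == n:
                    continue  # the whole word is not a sub-word pattern
                if i == 0:
                    position = "initial"
                elif j == n:
                    position = "final"
                else:
                    position = "internal"
                found[(position, word[i:j])].add(word)
    # Keep only patterns that recur in at least min_words distinct words.
    return {k: sorted(v) for k, v in found.items() if len(v) >= min_words}


def by_productivity(patterns):
    """Sort by number of words (descending), then by pattern length (descending)."""
    return sorted(
        patterns.items(),
        key=lambda kv: (-len(kv[1]), -len(kv[0][1]), kv[0]),
    )


if __name__ == "__main__":
    corpus = [
        "carpogenic", "carpogenous", "carpolite",
        "sensationalism", "functionalism",
        "synchrony", "synchronize",
    ]
    for (position, pattern), members in by_productivity(recurring_patterns(corpus)):
        print(f"{len(members):3d}  {position:8s}  {pattern.upper():12s}  {', '.join(members)}")
```

The cost grows quadratically with word length, and overlapping candidates
(CARP and CARPO, for instance) are all reported; choosing among them is the
kind of decision a principled criterion such as MDL is meant to make.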

