[Corpora-List] Re: Chinese-English-Russian parallel corpora:

Jiangping Chen jpchen at unt.edu
Mon Jan 30 18:25:00 CET 2006

Thanks a lot for sharing these resources. Jiangping

Jiangping Chen, Ph.D.
Assistant Professor
School of Library and Information Sciences
University of North Texas
P.O. Box 311068
Denton, TX 76203
Phone: (940) 369-8393
Fax: (940) 565-3101

>>> Philip Resnik <resnik at umiacs.umd.edu> 01/30/06 9:44 AM >>>

"Olga Mitrofanova" <alkonost at OM12520.spb.edu> wrote:

> Here is a summary of useful links concerning Chinese-English-Russian
> parallel corpora prepared by Inna Lazareva (St-Petersburg


Here are three more resources that may be useful to those interested
in Chinese-English parallel text:

- The Linguist's Search Engine (http://lse.umiacs.umd.edu) provides
access to a collection of over 118,000 Chinese pages. These were
mined automatically from the Web using a technique that
automatically finds Chinese-English page pairs, which means that the
English translation is also available when you look at a Chinese
result. To search the Chinese collection, go to "Query Options", and
under "Collection to Search", select "Public Collection:
chinese_web"; then, under "Example Sentence", change "Language" from
English to Chinese. To see the corresponding English for a hit,
click "Annotation".

The LSE Web page has links to detailed documentation. Note that the
Chinese pages have also been automatically classified as to level of
document difficulty, and this "Level" can be used to narrow the

- The Linguist's Search Engine also provides English search of the
Bible (in modern English translation). When you click "Annotation"
for a result, it shows the corresponding verse in dozens of other
languages, including Chinese.

- For a collection of over 500,000 Chinese-English Web page pairs,
mined automatically, see http://umiacs.umd.edu/~resnik/strand/ under
the "English-Chinese (July 2003)" link. A heavily filtered version
of this collection was used to create the LSE's chinese_web
collection, above.

Hope this is helpful!


Philip Resnik, Associate Professor
Department of Linguistics and Institute for Advanced Computer

1401 Marie Mount Hall UMIACS phone: (301) 405-6760
University of Maryland Linguistics phone: (301) 405-8903
College Park, MD 20742 USA Fax: (301) 314-2644 / (301)
http://umiacs.umd.edu/~resnik E-mail: resnik at umiacs.umd.edu
