[Corpora-List] A New Turkish Corpus From TS Corpus: TS Corpus Wikipedia -Beta-

Taner Sezer tanersezerr at gmail.com
Thu Aug 29 23:34:33 CEST 2013


Dear Members, TS Wikipedia Corpus -Beta- is now available and freely accessible online. It is a POS-tagged and morphologically tagged Turkish corpus consisting of 45,245,304 POS-tagged tokens. TS Wikipedia Corpus -Beta- is the first Turkish corpus based on Turkish Wikipedia pages.

TS Wikipedia Corpus -Beta- features:

TS Wikipedia Corpus -Beta- is POS-tagged

TS Wikipedia Corpus -Beta- is morphologically tagged

TS Wikipedia Corpus -Beta- includes the lemma form of each token

Keyword-in-context (KWIC) view

Word & Lemma search

Frequency search

Regular expression search

Search with CQP queries

Case-sensitive search

Building frequency lists

Saving the results in different formats

This version is called beta because the corpus is still under development. The main version is planned to support queries restricted by Wikipedia category.

Further information can be found on the corpus web page at http://tscorpus.com, and documentation at http://tscorpus.com/

Best Regards

--
Taner Sezer
http://tscorpus.com
http://tanersezer.com


