[Corpora-List] Turkish Corpus - TS Corpus -

Taner Sezer tanersezerr at gmail.com
Thu Aug 30 16:53:59 CEST 2012


Dear Members,

TS Corpus is a Turkish corpus project that is freely available online. It is a general-purpose Turkish corpus containing 491 million POS-tagged tokens. TS Corpus is built and maintained by Taner Sezer and is based on CWB (the IMS Open Corpus Workbench). The second version of TS Corpus was released today. The corpus can be reached at: http://tscorpus.com

TS Corpus offers the following features:

* TS Corpus is POS-tagged

* TS Corpus has morphological annotation

* TS Corpus includes the lemma form of each token

* Key word in context view (KWIC)

* Word & Lemma search

* Frequency search

* Regular expression search

* Search with CQP Query

* Case sensitive search

* Building frequency lists

* Saving the results in different formats
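
For readers unfamiliar with two of the features listed above, the following is a minimal Python sketch of what a key word in context (KWIC) view and a frequency list compute. It is only an illustration under stated assumptions: the function names `kwic` and `frequency_list`, the `window` parameter, and the sample sentence are hypothetical and are not taken from TS Corpus or CWB.

```python
from collections import Counter


def _normaliser(case_sensitive):
    # Note: str.lower is not Turkish-aware (it maps "I" to "i", not "ı"),
    # so a real Turkish tool would need locale-specific case folding.
    return (lambda s: s) if case_sensitive else str.lower


def kwic(tokens, keyword, window=3, case_sensitive=False):
    """Return (left context, hit, right context) for each occurrence of keyword."""
    norm = _normaliser(case_sensitive)
    target = norm(keyword)
    lines = []
    for i, tok in enumerate(tokens):
        if norm(tok) == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


def frequency_list(tokens, case_sensitive=False):
    """Return (token, count) pairs, most frequent first."""
    norm = _normaliser(case_sensitive)
    return Counter(norm(t) for t in tokens).most_common()


# Hypothetical sample sentence, not drawn from the corpus.
tokens = "Ali kitap okudu ve Elif kitap aldı".split()
hits = kwic(tokens, "kitap", window=2)
freqs = frequency_list(tokens)

for left, hit, right in hits:
    print(f"{left:>12} [{hit}] {right}")
```

Each hit is shown with a fixed number of tokens on either side, which is the essence of a KWIC display; the frequency list is simply a sorted token count.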

New Features of the Second Version

* Queries based on Morphological Annotation

* Restricted query

* Simplified POS tag set and disambiguation

* Displaying POS tags on the KWIC screen and morphological annotation in the context view

* Distribution of hit sets based on metadata restrictions

* Hit sets can now be categorised

* Users can create subcorpora
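
As a rough illustration of queries over annotation layers, the Python sketch below emulates the idea behind a CQP-style token query such as `[lemma="kitap" & morph="Pl.*"]`, where each attribute is matched against a regular expression that must cover the whole value. The attribute names (`word`, `lemma`, `pos`, `morph`), the tag values, and the sample tokens are assumptions made for this example; the actual attribute names and tag set of TS Corpus are described in its documentation and may differ.

```python
import re

# Hypothetical annotated tokens; tag values are invented for illustration.
corpus = [
    {"word": "kitaplar", "lemma": "kitap", "pos": "Noun", "morph": "Plural"},
    {"word": "kitap", "lemma": "kitap", "pos": "Noun", "morph": "Singular"},
    {"word": "okudu", "lemma": "oku", "pos": "Verb", "morph": "Past"},
]


def matches(token, **constraints):
    """True if every named attribute fully matches its regular expression."""
    return all(
        re.fullmatch(pattern, token[attr]) is not None
        for attr, pattern in constraints.items()
    )


def query(tokens, **constraints):
    """Return the word forms of all tokens satisfying the constraints."""
    return [t["word"] for t in tokens if matches(t, **constraints)]


plural_books = query(corpus, lemma="kitap", morph="Pl.*")
nouns = query(corpus, pos="Noun")
print(plural_books, nouns)
```

Combining constraints on several attributes is what makes morphology-based and restricted queries possible; a subcorpus can be thought of as the same kind of filter applied to text-level metadata instead of token attributes.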

Further information can be found on the corpus web page at http://tscorpus.com and in the documentation at http://tscorpus.com/wiki

Best Regards

--
Taner Sezer
http://tscorpus.com
http://tanersezer.com
