I would like to announce release v1.0 of Colibri Core, software for working with basic linguistic constructions such as n-grams and skipgrams in a quick, memory-efficient, yet lossless way that is suitable for big data.

See https://proycon.github.io/colibri-core

Colibri Core enables you to:

* extract patterns and their frequency from corpora

* preserve the exact indices where patterns occur in the corpus, allowing reverse-lookup as well

* model various relationships between patterns (subsumption, succession, abstraction, co-occurrence)

* compare patterns between different corpora (using coverage metrics and/or log-likelihood)
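To illustrate the idea behind the first two points and the log-likelihood comparison, here is a minimal pure-Python sketch. It does not use the Colibri Core API. It assumes pre-tokenised sentences, marks a single-token skipgram gap with a hypothetical placeholder `{*}`, and maps each pattern to the (sentence, token) positions where it begins. A pattern's frequency is then the length of that list, and the list itself doubles as the reverse index. The `log_likelihood` helper uses a common corpus-comparison formulation (observed versus expected frequency across two corpora); whether Colibri Core computes it in exactly this way is an assumption.

```python
from collections import defaultdict
from math import log

# Hypothetical placeholder for a one-token gap in a skipgram (assumption of this sketch)
GAP = "{*}"


def extract_patterns(sentences, max_n=3):
    """Map every n-gram and skipgram (as a tuple of tokens) to the list of
    (sentence_index, token_index) positions where it starts."""
    index = defaultdict(list)
    for s, tokens in enumerate(sentences):
        for i in range(len(tokens)):
            for n in range(1, max_n + 1):
                if i + n > len(tokens):
                    break
                ngram = tuple(tokens[i:i + n])
                index[ngram].append((s, i))
                # Skipgrams: replace one interior token with a gap. A gap at either
                # edge would just yield a shorter n-gram, so only interior positions count.
                for g in range(1, n - 1):
                    skipgram = ngram[:g] + (GAP,) + ngram[g + 1:]
                    index[skipgram].append((s, i))
    return dict(index)


def log_likelihood(freq1, size1, freq2, size2):
    """Log-likelihood of a pattern's frequency difference between two corpora
    of size1 and size2 tokens; 0.0 means identical relative frequency."""
    total = freq1 + freq2
    expected1 = size1 * total / (size1 + size2)
    expected2 = size2 * total / (size1 + size2)
    ll = 0.0
    if freq1 > 0:
        ll += freq1 * log(freq1 / expected1)
    if freq2 > 0:
        ll += freq2 * log(freq2 / expected2)
    return 2 * ll


corpus = [["to", "be", "or", "not", "to", "be"]]
index = extract_patterns(corpus)
print(len(index[("to", "be")]))    # 2 (frequency)
print(index[("to", "be")])         # [(0, 0), (0, 4)] (reverse lookup)
print(index[("to", GAP, "or")])    # [(0, 0)]
print(log_likelihood(10, 1000, 10, 1000))  # 0.0
```

Plain tuples and lists keep the sketch readable; they are not memory-efficient, which is precisely the problem Colibri Core is designed to address at scale.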

The software is open-source (GPL) and consists of command-line tools and a programming library for C++, with bindings for Python. It aims to lay a foundation on which more specialised or end-user-oriented software can be built.



