> On Sunday, April 27, 2008 6:44 PM [GMT+1=CET],
> Mark Davies <Mark_Davies at byu.edu> wrote:
>> Most really large corpora that I'm aware of do use a relational
>> database architecture, including systems like IMS Corpus Workbench.
> The IMS Corpus Workbench software's architecture is based on
> specific indexing techniques for processing and querying textual data.
> Those techniques were described in the book:
> "Managing Gigabytes: Compressing and Indexing Documents and Images"
> by Ian H. Witten, Alistair Moffat and Timothy C. Bell, 1999, Morgan Kaufmann.
> No RDBMS or similar architecture was used, as can be seen from the
> source code: http://cwb.sourceforge.net/
This is also true of Xaira, of eXist, and of many other XML-based systems. They use specialised indexing and storage techniques optimised for handling large quantities of text, rather than the specialised indexing and storage techniques used by relational systems, which are optimised for handling large numbers of, er, relations. It's true that you can translate text into relations (with some loss of information), but that doesn't mean you *have* to do so to get your text processed efficiently.
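For anyone who hasn't met the text-oriented techniques mentioned above, here is a minimal Python sketch of the core idea: a positional inverted index whose postings lists are stored as gaps between successive positions and then compressed. It is an illustration only, not the actual storage format of CWB, Xaira or eXist. Variable-byte coding is used as a simpler stand-in for the gamma and Golomb codes discussed in Managing Gigabytes, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import accumulate


def build_positional_index(tokens):
    """Map each word type to the ascending list of corpus positions where it occurs."""
    index = defaultdict(list)
    for pos, word in enumerate(tokens):
        index[word].append(pos)  # enumerate yields positions in order, so lists stay sorted
    return dict(index)


def vbyte_encode(numbers):
    """Variable-byte code: 7 data bits per byte; the high bit marks a number's final byte."""
    out = bytearray()
    for n in numbers:
        groups = []
        while True:
            groups.append(n & 0x7F)
            n >>= 7
            if n == 0:
                break
        groups.reverse()      # most significant 7-bit group first
        groups[-1] |= 0x80    # flag the last byte of this number
        out.extend(groups)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n = [], 0
    for b in data:
        n = (n << 7) | (b & 0x7F)
        if b & 0x80:          # final byte reached: emit the number and reset
            numbers.append(n)
            n = 0
    return numbers


def compress_postings(positions):
    """Store the gaps between successive positions (usually small), then vbyte-code them."""
    gaps = [p - prev for p, prev in zip(positions, [0] + positions[:-1])]
    return vbyte_encode(gaps)


def decompress_postings(data):
    """Decode the gaps and turn them back into absolute positions with a running sum."""
    return list(accumulate(vbyte_decode(data)))


def phrase_positions(index, first, second):
    """Start positions where `first` is immediately followed by `second`."""
    following = set(index.get(second, []))
    return [p for p in index.get(first, []) if p + 1 in following]


if __name__ == "__main__":
    tokens = "the cat sat on the mat the cat".split()
    index = build_positional_index(tokens)
    packed = compress_postings(index["the"])
    print(index["the"], packed)                   # [0, 4, 6] b'\x80\x84\x82'
    print(decompress_postings(packed))            # [0, 4, 6]
    print(phrase_positions(index, "the", "cat"))  # [0, 6]
```

By way of contrast, a relational encoding would typically store one (position, word) row per token and answer the same two-word phrase query with a self-join on position. Here, each word's occurrences are already a sorted, compact list that can be scanned directly.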