[Corpora-List] chaker jebari
adam at lexmasterclass.com
Mon Dec 5 11:16:02 CET 2005
In the general case, this is a very big question. Once you limit it to
particular types of documents, e.g., scientific papers, journalism, or CVs,
it becomes somewhat tractable, and this is what CiteSeer and DBLP are doing
on an industrial scale for academic papers.
As a general rule, you depend on the conventions that people use in
structuring each particular document type: the stronger the conventions,
the more tractable it is, and the more different conventions (and markup
languages, etc.) there are, the more work there is to cover them all.
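
To illustrate what relying on conventions can look like for one document
type, here is a minimal sketch in Python that splits a plain-text scientific
paper into labelled sections by matching conventional headings. The heading
patterns, the 40-character limit on heading lines, and the names
HEADING_PATTERNS, match_heading and segment are all assumptions made for
illustration; this is not how CiteSeer or DBLP work, and real papers will
need many more patterns.

```python
import re

# Hypothetical heading conventions for scientific papers (illustrative only).
HEADING_PATTERNS = [
    ("abstract", re.compile(r"^abstract\b", re.IGNORECASE)),
    ("keywords", re.compile(r"^key\s?words\b", re.IGNORECASE)),
    ("introduction", re.compile(r"^(\d+\.?\s+)?introduction\b", re.IGNORECASE)),
    ("conclusion", re.compile(r"^(\d+\.?\s+)?conclusions?\b", re.IGNORECASE)),
    ("references", re.compile(r"^(references|bibliography)\b", re.IGNORECASE)),
]


def match_heading(line, max_len=40):
    """Return a section label if the line looks like a known heading, else None.

    Assumption: headings are short lines, so long lines are never headings.
    """
    if len(line) > max_len:
        return None
    for label, pattern in HEADING_PATTERNS:
        if pattern.match(line):
            return label
    return None


def segment(text):
    """Split text into a list of (label, lines) pairs.

    Assumptions: the first non-empty line is the title, and any material
    that follows the title before a recognised heading is labelled "text".
    """
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not sections:
            sections.append(("title", [line]))
            continue
        label = match_heading(line)
        if label is not None:
            # A recognised heading opens a new section; the heading line
            # itself is not stored.
            sections.append((label, []))
        elif sections[-1][0] == "title":
            sections.append(("text", [line]))
        else:
            sections[-1][1].append(line)
    return sections
```

Calling segment() on the text of a paper returns the sections in document
order. A call for papers would need its own pattern list (topics, important
dates, submission), which is exactly the per-convention work described above.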
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of Chaker Jabbari
Sent: 04 December 2005 08:40
To: CORPORA at UIB.NO
Subject: [Corpora-List] chaker jebari
I need a tool to identify the logical structure of a textual document.
For example:
the logical structure of a scientific paper is: title, abstract, keywords,
introduction, text, conclusion, references;
the logical structure of a call for papers is: title, topics, important
dates, submission, ...
Does anyone know of a tool or an algorithm to identify
the logical structure?