This looks like an interesting resource, and it complements similar data sets obtained from other corpora. I would only ask you to define a license for using it (Creative Commons Attribution?) and a way to cite it (a proper URL? a paper? your thesis?). Otherwise, people cannot be sure whether they may use your data in a legally safe way, or how to refer to it in their publications (your data certainly contains some noise, and I would not want to take responsibility for it).
At least for me, these points (especially the first) are serious obstacles to working with your data.
On Mon, 07 Nov 2011 09:56:16 +0100, Tural Gurbanov <madcat1991 at gmail.com> wrote:
> Hello everyone!
> During my master's degree work I extracted combinations such as
> verb+preposition+noun and adjective+noun from a Reuters news dump.
> As a result I obtained nearly 1.2M unique combinations, together with the
> number of times each combination occurs.
> The result has been pushed here:
> If you are looking for something like this, you are welcome to take it (I can
> guarantee the syntactic coherence of the words in every combination).
> In return, I would like you to check a small fragment of the resulting
> combinations (500-1000 combinations) for correctness, because my knowledge
> of English is not good enough to make a reliable estimate.
> Also, if it is not a secret, please tell us what problems you intend to tackle
> with the database. You need not describe your solution, just why you need it.
> This is needed for my thesis review.