> Hi Sir
> Tried your script but ........ it has some problems. Probably the large
> size of txt files was the reason. Corpus A was about 1.9 million and
> corpus B was almost as A.
I'll leave Alex to comment on the use of his script, but I wonder what you are reporting here with these numbers. Do you mean 1.9 million documents, words, or characters?
The texts I used for my pipe-line script are all about 1.9 MB (1.9 million characters) in size. The individual filters I used do not have a problem processing that amount of data; I've processed larger files with the same pipe-line.
It might be that Alex's quick script can't cope with the volume of data you are throwing at it. If so, you'll either have to use something else or improve the script to cope with large volumes.
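In case it helps to pin down which unit your 1.9 million refers to, here is a rough sketch that reports lines, words, and characters for a text file. It is my own illustration, not Alex's script: the name corpus_size is made up, and I am assuming the files are UTF-8 text. It reads one line at a time, so the size of the file should not matter much.

```python
import sys


def corpus_size(path, encoding="utf-8"):
    """Return (lines, words, characters) for a text file.

    The file is read one line at a time, so it is never held
    in memory as a whole. UTF-8 is an assumption; pass another
    encoding if your corpus uses one.
    """
    lines = words = chars = 0
    # errors="replace" keeps going on bytes that are not valid in the encoding
    with open(path, encoding=encoding, errors="replace") as handle:
        for line in handle:
            lines += 1
            words += len(line.split())  # whitespace-separated tokens
            chars += len(line)          # includes the line ending
    return lines, words, chars


if __name__ == "__main__":
    # Usage: python corpus_size.py corpusA.txt corpusB.txt
    for name in sys.argv[1:]:
        print(name, *corpus_size(name))
```

Running it on both corpora would tell us whether we are comparing like with like.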
Regards, Trevor
<>< Re: deemed!