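What I suspect you will find is that your partial reimplementation of Perl's [:punct:] class is causing the problems. I would either reimplement it completely (see: http://en.wikipedia.org/wiki/Regular_expression ) or look into C#'s regular expressions, which should offer an equivalent of that class. As far as I know, .NET does not accept the POSIX bracket syntax itself, but it does support Unicode categories such as \p{P} (punctuation) and \p{S} (symbols), which together cover roughly what [:punct:] matches.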
Finally, if you are working with languages other than English, you most certainly should look into regular expression libraries. They take into account Unicode's rules as well, something you really don't want to have to duplicate in your own code.
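To make the suggestion concrete, here is a minimal sketch of how punctuation stripping and word counting might look with .NET's regular expressions. I have not seen your code, so the class and method names (PunctDemo, StripPunctuation, CountWord) are made up for illustration, and the choice of \p{P} plus \p{S} as a stand-in for [:punct:] is my assumption rather than an exact match for what the Perl script does.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative, not from the original script.
public static class PunctDemo
{
    // Removes every character in Unicode category P (punctuation) or S (symbols).
    // Assumption: this pair approximates Perl's [:punct:] class.
    public static string StripPunctuation(string text)
    {
        return Regex.Replace(text, @"[\p{P}\p{S}]", "");
    }

    // Counts case-insensitive occurrences of a word after stripping punctuation.
    public static int CountWord(string text, string word)
    {
        string cleaned = StripPunctuation(text);
        int count = 0;

        // A null separator array makes Split break on any white-space character.
        foreach (string token in cleaned.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            if (string.Equals(token, word, StringComparison.OrdinalIgnoreCase))
            {
                count++;
            }
        }
        return count;
    }
}
```

Calling PunctDemo.CountWord(text, "hello") on the contents of each file would then give the two frequencies to compare. Note that stripping punctuation this way also removes apostrophes and hyphens inside words, so "don't" becomes "dont"; whether that is acceptable depends on how your corpus should be tokenized.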
--
Personal blog: http://blog.outerthoughts.com/
Research group: http://www.clt.mq.edu.au/Research/
On Sun, Jun 1, 2008 at 7:07 AM, True Friend <true.friend2004 at gmail.com> wrote:
> I am a corpus linguistics student and learning C# for this purpose as well.
> I've created a simple application to find the frequency of a given word in
> two files. Actually, this simple application is a practice version in C# of a
> Perl script that a respected subscriber of this list (Alexander Schutz) wrote
> for me at my request on this list. I needed it then; now I am trying to
> program it myself, so I tried to implement that idea in C#.