[Corpora-List] AntConc 3.2.2 released for Windows and Mac OS X

Laurence Anthony anthony0122 at gmail.com
Wed Apr 13 20:10:48 CEST 2011


Hi Mike,

On Thu, Apr 14, 2011 at 2:50 AM, maxwell <maxwell at umiacs.umd.edu> wrote:
> Laurence Anthony <anthony0122 at gmail.com> wrote:
>> Basically, all (pre-Win 7?) Windows systems had their
>> own legacy encodings, which varied from country to country.
>> So, even if you have a file saved as UTF-8, the file *name*
>> is saved in the legacy encoding.
>
> Are you sure?  I thought NTFS filenames were Unicode:
> http://en.wikipedia.org/wiki/Ntfs (see "Allowed characters in filenames")
> http://msdn.microsoft.com/en-us/library/dd317748%28v=vs.85%29.aspx
> --and NTFS superseded the older FAT filesystem as of Windows NT.
>
>   Mike Maxwell
>

It's a good question. I think the underlying OS stores everything as Unicode, but each system also has a locale setting (the "ANSI" code page) that selects a legacy encoding, such as Shift_JIS here in Japan, for programs that don't go through the Unicode APIs. This is the Windows code page problem; see http://en.wikipedia.org/wiki/Windows_code_page.
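To make the code-page point concrete, here is a minimal Python sketch (not from the original thread; the Japanese file name is a made-up example). It shows one Unicode file name turning into different byte sequences under different encodings, and how decoding Shift_JIS bytes with a Western code page garbles the name.

```python
import locale

# Hypothetical Japanese file name; Python str objects are Unicode.
name = "コーパス.txt"

# The same name rendered in two encodings. "cp932" is Windows code page 932,
# Microsoft's variant of Shift_JIS used as the ANSI code page on Japanese Windows.
sjis_bytes = name.encode("cp932")
utf8_bytes = name.encode("utf-8")
print(len(sjis_bytes), len(utf8_bytes))  # 12 16

# Reading the Shift_JIS bytes back under a Western code page (cp1252) yields
# mojibake; errors="replace" substitutes U+FFFD for bytes cp1252 leaves undefined.
garbled = sjis_bytes.decode("cp1252", errors="replace")
print(garbled)

# Only the matching code page restores the original name.
assert sjis_bytes.decode("cp932") == name

# The encoding this machine's locale would assume for byte-oriented text I/O
# (varies by system, e.g. a Japanese Windows install versus a UTF-8 Linux locale).
print(locale.getpreferredencoding(False))
```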

So you can never be sure which encoding a file name will arrive in when a program opens files. If anybody has any advice on this, I would be very grateful!

Laurence.
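One defensive approach, sketched below in Python as an assumption rather than a tested recipe, is to try strict UTF-8 first and then fall back to the locale's encoding and a few likely legacy code pages. The helper `guess_decode` and its fallback list (cp932, cp1252) are hypothetical choices for illustration.

```python
import locale
from typing import Optional, Sequence, Tuple


def guess_decode(raw: bytes, candidates: Optional[Sequence[str]] = None) -> Tuple[str, str]:
    """Hypothetical helper: return (text, encoding_used) for bytes of unknown encoding."""
    if candidates is None:
        # Strict UTF-8 first: legacy multibyte text rarely validates as UTF-8 by accident.
        # Then the locale's preferred encoding, then two assumed legacy code pages.
        candidates = ["utf-8", locale.getpreferredencoding(False), "cp932", "cp1252"]
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            continue  # wrong guess (or unknown codec name): try the next one
    # Last resort: never fail, but mark undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


# Usage: Shift_JIS bytes fail strict UTF-8, then decode cleanly as cp932.
print(guess_decode("コーパス.txt".encode("cp932"), ["utf-8", "cp932", "cp1252"]))
# ('コーパス.txt', 'cp932')
```

The order of candidates matters: single-byte code pages such as cp1252 will decode almost any byte string without raising an error, so they belong at the end of the list, and even a "successful" decode is only a guess.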


