[Corpora-List] Tool/program to estimate percent of English in a given text file?

Muhammad Shakir Aziz true.friend2004 at gmail.com
Fri Nov 24 23:08:03 CET 2017

Hi, I am working with computer-mediated discourse, which also contains code-switching, and I plan to study that after my current project. To mark sentences containing code-switched text, I thought Google Translate toolkit might be useful, since it takes input text and returns the name of the detected language along with a number (0-1) giving the confidence of the detection. As far as I have explored, however, Google does not offer this service for free. It is probably not what you are looking for, but I am sharing it in case someone can suggest a better way to detect the language, or the percentage of a certain language, used in a given string. Regards
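As a free, rough alternative to a detection service, one could count how many word tokens in a speaker's text appear in an English word list and report that share. Below is a minimal sketch in Python under that assumption. It is a lexicon lookup, not Google's detector, and the tiny `ENGLISH_WORDS` set and the function names (`english_share`, `mark_sentences`) are hypothetical placeholders. A real estimate would load a large English word list instead. It also flags sentences that mix English and non-English tokens, which is one way to mark code-switched sentences.

```python
import re

# Hypothetical placeholder lexicon for illustration only. A real estimate
# would load a large English word list (one word per line) instead.
ENGLISH_WORDS = {
    "the", "and", "is", "at", "in", "of", "to",
    "meeting", "office", "phone", "tomorrow", "please", "call",
}

# Runs of letters only (Unicode-aware), so digits and punctuation are ignored.
TOKEN_RE = re.compile(r"[^\W\d_]+")
# Naive sentence boundary: whitespace that follows ., ! or ?
SENTENCE_RE = re.compile(r"(?<=[.!?])\s+")


def tokenize(text):
    """Return lower-cased alphabetic tokens from text."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


def english_share(text, lexicon=ENGLISH_WORDS):
    """Estimate the fraction (0-1) of word tokens that are English,
    counting a token as English if it appears in the lexicon."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in lexicon)
    return hits / len(tokens)


def mark_sentences(text, lexicon=ENGLISH_WORDS):
    """Return (sentence, english_share, is_mixed) for each sentence.
    A sentence is flagged as mixed if some, but not all, tokens are English."""
    results = []
    for sentence in SENTENCE_RE.split(text.strip()):
        if not sentence:
            continue
        share = english_share(sentence, lexicon)
        results.append((sentence, share, 0.0 < share < 1.0))
    return results


if __name__ == "__main__":
    sample = "Mañana tengo meeting at the office. Llámame por teléfono."
    print(f"Overall English share: {english_share(sample):.2f}")
    for sentence, share, mixed in mark_sentences(sample):
        print(f"{share:.2f}  mixed={mixed}  {sentence}")
```

For 50-60 speakers, one would call `english_share` once per speaker on that speaker's combined text. Keep in mind that this measures the share of word tokens, not characters. Words that exist in both languages (for example Spanish "a" or "me") and proper nouns will skew the count, so the result is only a rough estimate.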

On Nov 24, 2017 10:56 PM, "Tristan Purvis" <tristan.purvis at aun.edu.ng> wrote:

> Hello,
> Quick version: Are there any publicly available tools or program modules I
> could use to estimate the percent of English that is found in a given
> sample of bilingual/multilingual text?
> In a study that includes looking at instances of code-switching (to
> English words) for certain lexical items whose distribution and usage I'll
> be tracking, I want to keep track of a given speaker's overall tendency for
> mixing in English. It's not a high priority as a formal variable, so if
> it's too time consuming to pursue, I'll be inclined to drop it, but it
> seems like there might be some ready-made tool in the language detection
> field that might incidentally serve my purposes ... Can anyone point me to
> a tool or quick solution that can calculate an estimate of the percent of
> English found in a given text sample?
> (Note: I only have 50-60 speakers to apply this to, so I can feasibly run
> each one, one by one, through a tool that can measure this. That is, I don't
> necessarily need a tool that can run this in batches, though obviously that
> would be a nice added convenience.)
> Thanks in advance,
> Tristan
> ==========================
> Mohamed Tristan Purvis, PhD
> Assistant Professor, School of Arts & Sciences
> American University of Nigeria
