[Corpora-List] "Language Immersion for Chrome", and a Better Idea

Ziyuan Yao yaoziyuan at gmail.com
Sat May 26 14:09:32 CEST 2012

Hi Mike,

This service is awesome!

One problem with movie subtitles is they're not quite legal. Fan-contributed subtitles (fansub: http://en.wikipedia.org/wiki/Fansub) are derivative works from copyrighted works (the original movies).

But it seems redistributing small samples of subtitles and audio clips (as seen in your service) is "fair use" :-)

Regards, Ziyuan Yao

On Sat, May 26, 2012 at 4:37 PM, Mike Fabrikant <mikefabrikant at gmail.com> wrote:
> Dear Ziyuan,
> Doug Cooper, a computational linguist in Bangkok I worked for, forwarded
> me your exchange with Ramesh.
> About a year ago I started developing a subtitle-and-audio search
> engine for language acquisition. It's not really a translation tool,
> but you might still find it intriguing.
> You can read about it here: http://english.audioverb.com/eng/about
> At the moment the English learning site at www.audioverb.com has audio
> and subtitles from several thousand videos from dotsub.com, mostly TED
> talks.
> The Spanish site at http://spanish.audioverb.com/eng has content from
> around 16 mainstream Spanish-language movies.
> Please also see chinese.audioverb.com. This is the most presentable of
> the three sites in my opinion.
> Although not loaded at the moment, audioverb can also host content
> from TV shows and music videos gleaned from lyricstraining.com
> and YouTube. I'm doing testing at the moment and simply haven't added
> them yet, but the database is designed to scale to many times its
> current size.
> Your impression of the site would be greatly appreciated.
> Thank you,
> Mike Fabrikant
>>> -------- Original Message --------
>>> Subject:        [Corpora-List] "Language Immersion for Chrome", and a Better
>>> Idea
>>> Date:   Tue, 15 May 2012 12:03:20 +0000
>>> From:   Krishnamurthy, Ramesh <r.krishnamurthy at aston.ac.uk>
>>> To:     yaoziyuan at gmail.com <yaoziyuan at gmail.com>
>>> CC:     corpora at uib.no <corpora at uib.no>
>>> Dear Ziyuan Yao
>>> I would like to applaud your idea! A Japanese teacher of English, Teruhiko
>>> Kadoyama, who was a distance-learning MA student of mine at Birmingham
>>> University, used English subtitles from Hollywood movies very effectively
>>> to help his students to learn a) onomatopoeic words (e.g. bark, quack,
>>> chirp, twitter, etc.) b) a range of verbs of motion for similar actions
>>> (e.g. walk, stroll, amble, dash, rush, scramble, etc.), and reported on his
>>> experiments for his MA Dissertation in 1999.
>>> However, I don’t know whether he has conducted any further experiments, or
>>> published his Dissertation or other papers on this topic, or developed any
>>> software for the purpose.
>>> But it may be worth you doing a Google Search on his name, to check? I had
>>> a quick look, and discovered he is now a Professor at Hiroshima
>>> International University, and is President of the Society for Teaching
>>> English Through Media…
>>> http://www.stemedia.co.kr/menu_1_1.htm?searchkey=&searchvalue=&page=1&board_seq=79&mode=read
>>> best wishes
>>> Ramesh
>>> Ramesh Krishnamurthy
>>> Visiting Academic Fellow, School of Languages and Social Sciences, Aston
>>> University, Birmingham B4 7ET
>>> ----------------------------------------------
>>> Date: Tue, 15 May 2012 04:41:24 +0800
>>> From: Ziyuan Yao <yaoziyuan at gmail.com>
>>> Subject: [Corpora-List] "Language Immersion for Chrome", and a Better
>>> Idea
>>> To: corpora at uib.no
>>> Google's "Language Immersion for Chrome"
>>> Recently a Chrome browser extension called "Language Immersion for Chrome"
>>> has been much publicized. Developed by "Use All Five Inc." on behalf of
>>> Google, the extension translates certain words and phrases on the Web page
>>> you're browsing to a foreign language via Google Translate, for the purpose
>>> of helping you learn that foreign language while browsing the Web.
>>> I have been researching this kind of thing for years, and one of my main
>>> positions is that machine translation shouldn't be used in serious language
>>> learning because it is error-prone: it takes a learner a great effort to
>>> memorize a piece of erroneous knowledge, another great effort to "unlearn"
>>> this wrong knowledge and yet another great effort to "relearn" the right
>>> knowledge.
>>> But I do understand that online machine translation services like Google
>>> Translate and Bing Translator are so readily available that using them
>>> directly for the translation can minimize development costs. Upon seeing
>>> this news, I asked myself: "Can we use some kind of freely available,
>>> manually prepared data, instead of machine translation, to do this better?"
>>> And the answer is YES!
>>> A Better Idea
>>> Imagine we have a database of manually translated bilingual sentence pairs
>>> (such as the multilingual movie subtitle files found on subtitle
>>> websites), e.g.
>>> (German) Er ist ein guter Schüler.
>>> (English) He is a good student.
>>> Now suppose a German speaker wants to learn English and is browsing a German
>>> Web page that contains the German word "Schüler" (student), and the computer
>>> finds that this word also occurs in a bilingual sentence pair like the one
>>> above. The computer can then teach the English for this German word by
>>> inserting that bilingual sentence pair into the Web page, like an
>>> embedded advertisement. This way, the reader will learn the English word
>>> "student", and better yet, learn it in a bilingual sentence pair! This means
>>> he will learn not only the word "student" itself, but also its syntax,
>>> semantics and pragmatics, all implied by this example sentence. As for
>>> phonetics, the computer can use text-to-speech to read aloud the English
>>> sentence, or display some kind of pronunciation guide above or alongside the
>>> English sentence (see my recent project "Phonetically Intuitive English" for
>>> such a pronunciation aid:
>>> https://sites.google.com/site/phoneticallyintuitiveenglish/).
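The lookup step described above could be sketched roughly as follows in Python. This is only a minimal illustration, not an implementation from the original message: the in-memory list PAIRS, the function names tokenize, build_index and find_pairs, and the simple regex tokenizer are all hypothetical, and a real system would load actual subtitle files and handle inflected forms.

```python
import re
from collections import defaultdict

# Hypothetical sample data: (German, English) sentence pairs.
PAIRS = [
    ("Er ist ein guter Schüler.", "He is a good student."),
    ("Das Wetter ist heute schön.", "The weather is nice today."),
]

def tokenize(text):
    """Lowercase the text and split it into word tokens (simplistic)."""
    return re.findall(r"\w+", text.lower())

def build_index(pairs):
    """Map each source-language word to the indices of pairs containing it."""
    index = defaultdict(set)
    for i, (source, _target) in enumerate(pairs):
        for word in tokenize(source):
            index[word].add(i)
    return index

def find_pairs(page_text, pairs, index):
    """Return {word: [pairs]} for every page word that occurs in some pair."""
    hits = {}
    for word in tokenize(page_text):
        if word in index and word not in hits:
            hits[word] = [pairs[i] for i in sorted(index[word])]
    return hits

index = build_index(PAIRS)
page = "Jeder Schüler bekommt ein Buch."  # hypothetical Web page text
for word, matches in find_pairs(page, PAIRS, index).items():
    for german, english in matches:
        print(f"{word}: {german} / {english}")
```

Note that this naive version also matches function words such as "ein"; a practical version would presumably skip very common words and only teach content words.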
>>> That's the basic idea. But of course we can further refine this idea.
>>> For example, if there are multiple bilingual sentence pairs containing
>>> "Schüler", the computer can prefer a pair that contains words that appear
>>> near "Schüler" on the Web page (i.e. context words). This would be very
>>> useful if the word in question (Schüler) is ambiguous.
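One possible way to sketch this context-word preference in Python is shown below. Everything here is an assumption made for illustration: the ambiguous example word "Bank" (bench / bank), the sample pairs, the function name rank_pairs, the window size and the plain word-overlap score are not from the original message.

```python
import re

# Hypothetical pairs for the ambiguous German word "Bank" (bench / bank).
PAIRS = [
    ("Er sitzt auf der Bank im Park.", "He is sitting on the bench in the park."),
    ("Sie bringt das Geld zur Bank.", "She takes the money to the bank."),
]

def tokenize(text):
    """Lowercase the text and split it into word tokens (simplistic)."""
    return re.findall(r"\w+", text.lower())

def rank_pairs(target, page_text, pairs, window=5):
    """Order the pairs containing `target` by overlap with nearby page words."""
    tokens = tokenize(page_text)
    target = target.lower()
    context = set()
    for i, tok in enumerate(tokens):
        if tok == target:
            # Collect up to `window` words on each side of the target word.
            context.update(tokens[max(0, i - window):i])
            context.update(tokens[i + 1:i + 1 + window])
    context.discard(target)
    candidates = [p for p in pairs if target in tokenize(p[0])]
    # sorted() is stable, so pairs with equal overlap keep their original order.
    return sorted(candidates, key=lambda p: -len(context & set(tokenize(p[0]))))

page = "Ich muss Geld zur Bank bringen."  # hypothetical Web page text
best = rank_pairs("Bank", page, PAIRS)[0]
print(best[1])
```

Because the page mentions "Geld" near "Bank", the money-related pair ranks first here. Raw overlap counts are easily swayed by shared function words, so a real ranking would probably weight or filter them.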
>>> Besides bilingual sentence pairs, we may also explore multilingual data from
>>> Wiktionary and Wikipedia, although their usage may not be as straightforward
>>> as the model discussed above. I leave this as homework for the reader.
>>> I also intend to develop a Chrome extension based on the idea discussed
>>> above :-)
>>> Best Regards,
>>> Ziyuan Yao
>>> https://sites.google.com/site/yaoziyuan/
