[Corpora-List] English idiom dataset

Alexander Osherenko osherenko at gmx.de
Sat Nov 18 14:42:18 CET 2017


Maria,

Before you collect comprehensive data, you might choose the languages you are working on and a field of interest -- you can't study all idioms. Be aware: you can't trust your own intuition, since you will find many astonishing things about the correct understanding of idioms you thought you knew. ;-)

In my research, I worked on emotional verbs and phrases. As references, I chose Cambridge idioms (ISBN-13: 978-0521677691), Cambridge Phrasal Verbs (ISBN-13: 978-0521677707), The Free Dictionary https://idioms.thefreedictionary.com/ and Thesaurus http://www.thesaurus.com/browse. These dictionaries are comprehensively annotated, so you can work with them directly. They also contain usage examples that you can draw on when composing your corpus.

Translation software can help. It does not matter that some translations are incorrect; they still give you an idea of where to start. You can use Wikipedia: find an idiom in one language and then switch to the corresponding page in another language. Google Translate can also help you find a translation. Otherwise, use dictionaries in your target languages (they might have proper translations of your idioms).
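The Wikipedia step can also be scripted. Below is a minimal sketch, assuming the idiom has its own English Wikipedia article and that the public MediaWiki API (action=query, prop=langlinks) is used to follow the interlanguage link. The function names and the User-Agent string are made up for illustration, and the result is only a candidate translation that still has to be checked by hand.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# English Wikipedia endpoint of the MediaWiki API.
API = "https://en.wikipedia.org/w/api.php"


def langlink_url(title, target_lang):
    """Build a query URL asking for the interlanguage link of one article."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllang": target_lang,  # restrict the answer to one target language
        "redirects": 1,         # follow redirects to the canonical article
        "format": "json",
    }
    return API + "?" + urlencode(params)


def extract_langlinks(response):
    """Collect linked article titles from a decoded API response (default JSON format)."""
    titles = []
    for page in response.get("query", {}).get("pages", {}).values():
        # Pages without an article in the target language have no "langlinks" key.
        for link in page.get("langlinks", []):
            titles.append(link["*"])
    return titles


def lookup(title, target_lang):
    """Fetch the candidate equivalents of an article title (needs network access)."""
    request = Request(
        langlink_url(title, target_lang),
        headers={"User-Agent": "idiom-lookup-sketch/0.1"},  # hypothetical client name
    )
    with urlopen(request) as reply:
        return extract_langlinks(json.load(reply))
```

A call such as lookup("Break a leg", "de") returns a list of linked article titles, or an empty list when no such link exists. Many idioms have no article of their own, so this only complements the dictionaries mentioned above.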

In any case, the research you are doing is quite hard.

HIH, Alexander

-- Alexander Osherenko, Dr. rer. nat. Senior HCI architect

Founder and R&D Socioware Development <http://www.socioware.de/osherenko_page.html>

Humboldt Innovation Humboldt-Universität zu Berlin

Profile: ResearchGate <https://www.researchgate.net/profile/Alexander_Osherenko>
Channel: LinkedIn <https://www.linkedin.com/pub/alexander-osherenko/1/30a/a74>
Channel: Google+ <https://plus.google.com/105305790720313348886>, Google Scholar <https://scholar.google.com/citations?user=q_0QJBoAAAAJ&hl=en>
Channel: Youtube <https://www.youtube.com/user/MrOsherenko>
Channel: Twitter <https://twitter.com/mrosherenko>

Social interaction, globalization and computer-aided analysis <https://www.researchgate.net/publication/281644865_Social_Interaction_Globalization_and_Computer-Aided_Analysis_A_Practical_Guide_to_Developing_Social_Simulation> at Springer

2017-11-17 23:31 GMT+01:00 Jelena Mitrovic <jecovit at gmail.com>:


> Hello, Maria,
>
> You might find outcomes of the PARSEME COST Action useful
>
> https://typo.uni-konstanz.de/parseme/
>
> Kind regards
> Jelena
>
> On 17 November 2017 at 10:07, Maria Pia di Buono <mariapia.dibuono at fer.hr>
> wrote:
>
>> Hi all,
>>
>> I'm working on a cross-lingual classification system for idioms and I was
>> wondering if there are some available resources for English (I'm sure there
>> are but I was not able to find them).
>> At first glance, I was looking for VNC-Tokens Dataset by Fazly and
>> Stevenson (2006). I know that this dataset includes just constructions with
>> a verb and a noun in its direct object position, so, probably I'd need
>> other comprehensive resources.
>>
>> Do you have any suggestions?
>>
>> Thank you.
>>
>> Best,
>> Maria Pia
>>
>>
>>
>> Maria Pia di Buono
>> --
>> Text Analysis and Knowledge Engineering Lab <http://takelab.fer.hr>
>> Faculty of Electrical Engineering and Computing
>> University of Zagreb, Croatia
>> mail: mariapia.dibuono at fer.hr
>> web: http://takelab.fer.hr/maria-pia-di-buono/
>>
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> https://mailman.uib.no/listinfo/corpora
>>
>>
>
>
> --
> Jelena Mitrović, PhDc
> Wissenschaftliche Mitarbeiterin
>
> Lehrstuhl für Informatik
> Digital Libraries and Web Information Systems
> Universität Passau / ITZ / Raum 108
> Innstr. 43
> 94032 Passau
> +49 851 509 3395
>
> jelena.mitrovic at uni-passau.de
> www.uni-passau.de
>
>
>