[Corpora-List] Corpora Digest, Vol 66, Issue 20

Xiaodong HAN (6511974) zx11974 at nottingham.edu.cn
Wed Dec 19 13:16:19 CET 2012


Hi, how can one distinguish native from non-native speakers from the perspective of corpus-based studies?

best

Mack

________________________________________
From: corpora-bounces at uib.no [corpora-bounces at uib.no] On Behalf Of corpora-request at uib.no [corpora-request at uib.no]
Sent: Wednesday, December 19, 2012 7:00 PM
To: corpora at uib.no
Subject: Corpora Digest, Vol 66, Issue 20

Today's Topics:

1. Re: machine translation (Mahdi Mohseni)

2. job: PhD position, Joint Doctorate Degree Radboud University Nijmegen and University of Leuven (Antal van den Bosch)

3. job: PhD position, Joint Doctorate Degree Radboud University Nijmegen and University of Leuven (Antal van den Bosch)

4. Video - Social interaction: overview, sciences, benefits, example (Alexander Osherenko)

5. Re: Video - Social interaction: overview, sciences, benefits, example (M.E.Sciubba)

----------------------------------------------------------------------

Message: 1
Date: Wed, 19 Dec 2012 10:29:00 +0330
From: Mahdi Mohseni <mohseni48 at gmail.com>
Subject: Re: [Corpora-List] machine translation
To: amin farajian <ma.farajian at gmail.com>, Mohammad Sadegh Rasooli <rasooli.ms at gmail.com>, corpora at uib.no

Dear Sadegh and Amin,

The project is now finished and will be published soon, but I don't know whether it will be published completely or partially, or exactly when.

Regards, Mahdi Mohseni

On Tue, Dec 18, 2012 at 8:44 PM, amin farajian <ma.farajian at gmail.com> wrote:


> Dear Mohammad Sadegh,
>
> Good news about the SCICT corpus. It took a long time, but I hope the
> resulting corpus was fine.
> Now I am doing my PhD at FBK-IRST, Italy, so I am not in Iran and I don't
> have access to the people at SCICT. Is there any other way of obtaining
> this corpus? As far as I know, Ms Khamesi is in Bojnourd, Iran. So, if
> possible, please give her the information she needs to contact the SCICT
> people and get this corpus.
>
> Best regards,
> Amin
>
>
>
> On Tue, Dec 18, 2012 at 5:32 PM, Mohammad Sadegh Rasooli <
> rasooli.ms at gmail.com> wrote:
>
>> Thanks Amin,
>> As far as I know, the SCICT corpus is a large collection of classic
>> novels, and the project was finished in the summer. I don't think the
>> corpus is completely available, but if you live in Iran I think it's
>> easy to obtain the dataset. I think you should contact the people in
>> charge at SCICT.
>> Best
>>
>> On Tue, Dec 18, 2012 at 11:12 AM, amin farajian <ma.farajian at gmail.com> wrote:
>>
>>> Dear Karine,
>>> the corpus that you talked about (at Payame Noor University of Yazd) is
>>> actually the one that is available from ELRA. There is also another
>>> parallel corpus, entitled PEN, which I developed. It is not yet publicly
>>> available, but I'm going to publish it. The following paper gives
>>> some information about it:
>>> Mohammad Amin Farajian (2011). PEN: Parallel English-Persian News Corpus <http://world-comp.org/p2011/ICA4953.pdf>.
>>> Proceedings of 2011 International Conference on Artificial Intelligence
>>> (ICAI'11), Nevada, USA.
>>>
>>> There are some other researchers (Dr. Khadivi at Amirkabir University,
>>> Dr. Faili at University of Tehran, Dr. Analoui at Iran University of
>>> Science and Technology) and research centers (ITRC and SCICT) in Iran
>>> that are working on SMT and building parallel corpora, but as far as I
>>> know their corpora are not available yet.
>>>
>>> Best regards,
>>> Amin
>>>
>>> On 12/18/2012 03:33 PM, Megerdoomian, Karine wrote:
>>>
>>> I haven't seen any other parallel English-Persian corpora besides the
>>> ones already mentioned below. However, I have heard about a corpus being
>>> developed by the English department at Payame Noor University in Yazd,
>>> Iran. You may want to contact them. Here's the info online:
>>> http://www.eurac.edu/it/newsevents/focus/Newsdetails.html?entryid=22181
>>>
>>> "Our developmental English-Persian parallel corpus consists of about
>>> *three million words* (more than 50,000 corresponding sentences in two
>>> languages). This is a kind of ongoing corpus, that is, an open corpus in
>>> which more material can be added as the need arises."
>>>
>>> Karine
>>>
>>> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no]
>>> On Behalf Of Hieu Hoang
>>> Sent: Tuesday, December 18, 2012 7:31 AM
>>> To: Khamesi Fahime
>>> Cc: corpora at uib.no
>>> Subject: Re: [Corpora-List] machine translation
>>>
>>> Hi Khamesi
>>>
>>> According to this website
>>> http://opus.lingfil.uu.se/
>>> there are 3 freely available parallel corpora for Persian-English:
>>> TEP
>>> KDE
>>> OpenSubtitles
>>>
>>> I've noticed that other people, especially in Tehran, are also working on
>>> MT and collecting data, e.g.
>>> http://ece.ut.ac.ir/iis/resources.html
>>>
>>> Kind Regards
>>> Hieu
>>>
>>> On 12 December 2012 21:15, Khamesi Fahime <khamesi_fahime at yahoo.com>
>>> wrote:
>>>
>>> Hi,
>>> I am a student of Linguistics in Iran and I am working on English to
>>> Persian statistical machine translation.
>>>
>>> Unfortunately I haven't found any EN-PER corpus except TEP and ELRA.
>>>
>>> There are many restrictions in Iran (boycott) on ordering from ELRA.
>>> I would appreciate it if you could help me in this respect.
>>>
>>> I am looking forward to your reply.
>>>
>>> Best regards,
>>>
>>> Khamesi
>>>
>>>
>>> _______________________________________________
>>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>>> Corpora mailing list
>>> Corpora at uib.no
>>> http://mailman.uib.no/listinfo/corpora
>>>
>>
>>
>> --
>> Mohammad Sadegh Rasooli
>> PhD Student, Computer Science Department, Columbia University
>> Research Assistant, Center for Computational Learning Systems, Columbia
>> University
>>
>
>
>
>

------------------------------

Message: 2
Date: Wed, 19 Dec 2012 09:46:47 +0100
From: Antal van den Bosch <a.vandenbosch at let.ru.nl>
Subject: [Corpora-List] job: PhD position, Joint Doctorate Degree Radboud University Nijmegen and University of Leuven
To: corpORA at UIB.NO

job: PhD position, Joint Doctorate Degree Radboud University Nijmegen and University of Leuven

http://www.ru.nl/vacatures/details/details_vacature_0?recid=525672

Closing date: 13 January 2013

The Centre for Language Studies at Radboud University Nijmegen, the Netherlands, in cooperation with ESAT-PSI Centre for Speech and Image Processing, Leuven University, Belgium, is seeking candidates for a PhD project that upon successful completion will result in a joint doctorate degree awarded by the two universities.

You will work on the intersecting fields of latent variable models (a focus of the ESAT-PSI group in Leuven) and rich language models (investigated at CLS in Nijmegen). The first research subgoal is to investigate whether phrases and skipgrams could be better units in latent variable models than unigrams, the current standard. The second subgoal is to investigate whether richer latent variable models will be better adaptable to domains, in which phrases (multi-word units) and skipgrams may play an important role.

We offer a rich environment of two vibrant well-connected research labs in neighboring countries. Your main affiliation will be with Radboud University Nijmegen, but you will be a member of both labs and will work at Leuven University for at least 6 months during your project. We offer an excellent computer infrastructure, data, expertise, and colleagues working in overlapping areas in both labs.

More details: http://www.ru.nl/vacatures/details/details_vacature_0?recid=525672

prof. Antal van den Bosch Telephone: +31 24 3611647 E-mail: a.vandenbosch at let.ru.nl

prof. Hugo Van hamme Telephone: +32 16 321842, +32 16 321713 E-mail: hugo.vanhamme at esat.kuleuven.be

------------------------------

Message: 3
Date: Wed, 19 Dec 2012 09:47:21 +0100
From: Antal van den Bosch <a.vandenbosch at let.ru.nl>
Subject: [Corpora-List] job: PhD position, Joint Doctorate Degree Radboud University Nijmegen and University of Leuven
To: corpORA at UIB.NO

------------------------------

Message: 4
Date: Wed, 19 Dec 2012 10:40:10 +0100
From: Alexander Osherenko <osherenko at gmx.de>
Subject: [Corpora-List] Video - Social interaction: overview, sciences, benefits, example
To: "Corpora at uib.no" <corpora at uib.no>

Hi all,

I've just uploaded a new video with an overview of social interaction, the sciences studying it, its benefits, and an example of how InfoFramework composes a population of ECAs simulating social interaction (http://www.youtube.com/watch?v=Pe4u94ar89I).

Best Alexander

-- Alexander Osherenko Dr. rer. nat, CEO and R&D <http://www.socioware.de/>

------------------------------

Message: 5
Date: Wed, 19 Dec 2012 11:37:17 +0100
From: "M.E.Sciubba" <mesciubba at gmail.com>
Subject: Re: [Corpora-List] Video - Social interaction: overview, sciences, benefits, example
To: Alexander Osherenko <osherenko at gmx.de>
Cc: "Corpora at uib.no" <corpora at uib.no>

Dear Alexander,

are you aware of the work of interactional linguists and conversation analysts? Interaction is rarely dyadic, as it appears to be in your video. Even in contexts that are supposed to be strictly dyadic, like courts of justice, where the interaction is dictated by procedural law (at least in Italy), much overlapping and 'other stuff' typical of ordinary interactions occurs. I understand that it is very simple to create a model from dyadic interactions, but if the aim is to create multi-agent software, you might want to take into account the work of interactional linguists and conversation analysts.

Kind regards, Eleonora Sciubba

2012/12/19 Alexander Osherenko <osherenko at gmx.de>


> Alexander Osherenko

-- *Be green. Keep it on the screen*

----------------------------------------------------------------------

Send Corpora mailing list submissions to corpora at uib.no

To subscribe or unsubscribe via the World Wide Web, visit http://mailman.uib.no/listinfo/corpora or, via email, send a message with subject or body 'help' to corpora-request at uib.no

You can reach the person managing the list at corpora-owner at uib.no

When replying, please edit your Subject line so it is more specific than "Re: Contents of Corpora digest..."


End of Corpora Digest, Vol 66, Issue 20


