[Corpora-List] PhD position in Multi-factor Data Augmentation and Transfer Learning for Embedded Automatic Speech Recognition

Firas Hmida firas.hmida at gmail.com
Thu Jul 8 10:23:23 CEST 2021


Dear Colleagues,

Please find below the description of a PhD position in “Multi-factor Data Augmentation and Transfer Learning for Embedded Automatic Speech Recognition”.

Starting date: October 01, 2021

Deadline for Applications: July 16, 2021

All details are available at: https://recrutement.inria.fr/public/classic/en/offres/2021-03756

Keywords: automatic speech recognition, speech synthesis, voice conversion, transfer learning, deep learning

Context

Founded in 2015 and awarded two CES Innovation Awards, Vivoka <https://vivoka.com/en/> has created and sells the Voice Development Kit (VDK), the very first solution allowing a company to design a voice interface in a simple, autonomous and quick way. Moreover, this interface is embedded: it can be deployed on devices without an Internet connection and fully preserves privacy. Spurred by the COVID-19 health crisis and the need for "no-touch" interfaces, Vivoka is now optimizing this technology by developing its own speech and language processing solutions able to compete with the most efficient current technologies. This research project, which involves the entire Vivoka R&D team, is carried out within the framework of a long-standing partnership with Inria's Multispeech <https://team.inria.fr/multispeech/> team.

The hired PhD student will split their time between Vivoka's R&D team and Inria's Multispeech team. They will benefit from the startup spirit of Vivoka, where they will interact with other PhD students, interns and researchers hired as part of the partnership, as well as with the engineers responsible for integrating their results into the VDK. They will also benefit from the skills of the Multispeech team, the largest research team in the field of speech processing in France, and from the overall Inria environment.

Assignment

Conversational automatic speech recognition (ASR) has seen tremendous progress over the past decade, with a word error rate now similar to that of humans [1]. This progress is explained by the maturity of deep neural networks and, above all, by the growth of the training corpora available as open or proprietary data. These corpora must be annotated, that is, manually transcribed into text. The cost of this operation means that the amount of proprietary data collected and annotated by large industry players, on the order of 10,000 hours or more for languages such as French, is out of reach for SMEs. It is also reflected in the fact that current commercial solutions are only available for about 100 of the 7,000 languages spoken in the world.

The objective of this PhD is to design an embedded ASR system capable of competing with current solutions while being trained on non-proprietary data only, for example open data from the Mozilla Common Voice initiative [2], i.e., less than 1,000 h of annotated data in French or less than 100 h for less-resourced languages.

[1] W. Xiong, J. Droppo, X. Huang, F. Seide, M. L. Seltzer, A. Stolcke, D. Yu and G. Zweig, "Toward human parity in conversational speech recognition," IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(12): 2410-2423, 2017.

[2] https://commonvoice.mozilla.org/ <https://commonvoice.mozilla.org/fr>

Skills

Master 2 in computer science or data science.

Programming experience in Python and in a deep learning framework.

Previous experience in the field of speech processing or computational footprint reduction is a plus.

Instructions for applying

Application deadline: July 16, 2021

Submit your complete application online at https://recrutement.inria.fr/public/classic/en/offres/2021-03756 and send a copy to recrutement at vivoka.com

Applications will be considered on a rolling basis. It is therefore advisable to apply as soon as possible.


