Thank you for providing the pretrained word vectors. I am specifically interested in the Arabic version, and I have a question regarding hamza handling. When I searched for أحمد (Ahmad, or >Hmd in Buckwalter transliteration), the results were empty, whereas searching for احمد without the hamza returned results. Did you normalize all hamza forms to a plain alef?
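To make the question concrete, here is a minimal Python sketch of the kind of alef normalization being asked about. It is only an illustration: whether the Sketch Engine pipeline applies this step, or maps exactly these characters, is not known from this thread. The names ALEF_VARIANTS, normalize_alef and lookup_forms are hypothetical.

```python
import unicodedata

# Assumed mapping (not confirmed for the published embeddings):
# alef variants carrying hamza, madda or wasla are folded to bare alef.
ALEF_VARIANTS = {
    "\u0623": "\u0627",  # alef with hamza above -> alef
    "\u0625": "\u0627",  # alef with hamza below -> alef
    "\u0622": "\u0627",  # alef with madda above -> alef
    "\u0671": "\u0627",  # alef wasla -> alef
}
_TABLE = str.maketrans(ALEF_VARIANTS)


def normalize_alef(word: str) -> str:
    """Fold hamza-bearing alef variants to bare alef.

    NFC is applied first so that a decomposed sequence (alef followed by a
    combining hamza) is composed into a single code point before mapping.
    """
    return unicodedata.normalize("NFC", word).translate(_TABLE)


def lookup_forms(word: str) -> list[str]:
    """Return the query as typed plus its normalized form, without duplicates."""
    return list(dict.fromkeys([word, normalize_alef(word)]))
```

If the vectors were trained on text normalized this way, querying with each form returned by lookup_forms("أحمد") would try both the spelling with hamza and the one without, so a lookup would not come back empty just because of the hamza.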
On Fri, Feb 2, 2018 at 9:07 AM, Miloš Jakubíček <milos.jakubicek at sketchengine.co.uk> wrote:
> Dear all,
> this is to announce the public availability of word embedding models calculated
> for the large corpora that we have in Sketch Engine. At this moment, we have
> processed corpora for the following languages:
> English, Arabic, Chinese, Czech, Danish, French, German, Italian, Korean,
> Portuguese, Russian, Spanish
> See https://embeddings.sketchengine.co.uk/ where you can find an online
> interface for executing word similarity queries (such as the infamous
> king-man+woman) and download the datasets. They can be used freely for
> non-commercial purposes, for the commercial ones do not hesitate to get
> back to me to work out a mutually suitable model of collaboration.
> We continue building further models as our spare computing capacity
> allows, and will continue publishing them. If you are interested in a
> particular language that is missing at this moment, let me know and I can
> try to prioritise (no guarantees though).
> The embeddings were calculated using FastText with various parameters and
> on various corpus attributes (word, lemma, lemma+PoS combination, lowercase
> We have had an increasing number of requests to obtain corpora from Sketch
> Engine for these purposes, so this is our response, intended to support
> research in this area.
> Milos Jakubicek
> CEO, Lexical Computing
> Brno, CZ | Brighton, UK