[Corpora-List] Featuring SentiLecto v2.4 - NLU engine for Spanish

Fernando Balbachan fernando_balbachan at yahoo.com.ar
Sat Aug 15 03:07:51 CEST 2015


SentiLecto demo http://dev.natural.do/sentilecto

SentiLecto is an NLU engine that yields a highly fine-grained representation of complex texts. The pipeline starts by splitting text into sentences and clauses, then maps each clause onto SVO slots, much as a native speaker would understand natural language. SentiLecto relies on rich linguistic features such as passive/active voice transformation, negation scope, anaphora resolution and co-reference chains, modality treatment, semantic features (animacy and others), and accurate verbal frames for all Spanish verbs. These frames also cover 'se-impersonal' usages ('se mostraron retratos' = 'alguien mostró retratos' = 'somebody showed portraits') and 'se-clitic' usages (for example, the plain action 'mostrar', 'to show something', vs. 'mostrarSE', 'to show yourself', namely to feel some way before a situation).
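To make the clause-to-SVO step concrete, here is a minimal sketch in Python. It is not SentiLecto's code or API: the `Clause` class, its fields, the `normalize` function and the placeholder agent 'alguien' are all hypothetical names chosen for illustration, and the clause is assumed to be parsed already.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Clause:
    """Hypothetical container for one parsed clause (illustration only)."""
    subject: Optional[str]        # surface (grammatical) subject
    verb: str                     # verb lemma
    obj: Optional[str] = None     # direct object, if any
    agent: Optional[str] = None   # 'por' phrase of a passive clause, if any
    voice: str = "active"         # "active", "passive" or "se-impersonal"
    negated: bool = False         # whether the clause is under negation


def normalize(clause: Clause) -> Clause:
    """Rewrite passive and se-impersonal clauses as active SVO clauses."""
    if clause.voice == "passive":
        # The surface subject is the logical object; the agent, when stated,
        # becomes the subject. Otherwise fall back to a generic 'alguien'.
        return replace(clause, subject=clause.agent or "alguien",
                       obj=clause.subject, agent=None, voice="active")
    if clause.voice == "se-impersonal":
        # 'se mostraron retratos' -> 'alguien' + 'mostrar' + 'retratos'
        return replace(clause, subject="alguien",
                       obj=clause.subject, voice="active")
    return clause


parsed = Clause(subject="retratos", verb="mostrar", voice="se-impersonal")
active = normalize(parsed)
print(active.subject, active.verb, active.obj)
```

Under these assumptions, 'se mostraron retratos' ends up in the same SVO slots as 'alguien mostró retratos', which is the equivalence described above.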

SentiLecto can also identify whether or not an utterance states a real fact (fact mining) over which an opinion could span, and it can recognize and classify named entities (NERC) with identity matching.

Finally, SentiLecto fits well into the entity-based Sentiment Analysis paradigm. Unlike other approaches, this solution can deal with polarity shifting within the same sentence ('I like chocolate but I hate strawberry ice-cream'), within embedded clauses ('Norwegians, who are an aggressive people, export the exquisite herring'), or even within a single word ('Somebody who wasted a chance to do something' means that person did something bad with something good). SentiLecto builds on the premise that the entities involved in an opinion are syntactically mapped onto SVO (subject-verb-object) slots for their sentiment assignments: 'Mary hates John' (2 entities, but only the object has a negative presentation) vs. 'Mary defames John' (the same 2 entities, but only the subject has a negative presentation).
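The slot-based sentiment assignment can be illustrated with a toy sketch in Python. This is not SentiLecto's implementation: the `VERB_FRAMES` lexicon, its three entries, the +1/-1 scores and the `entity_sentiment` function are assumptions made up for this example, and the clauses are assumed to be already reduced to (subject, verb lemma, object) triples.

```python
# Hypothetical verb-frame lexicon: for each verb lemma, which SVO slot
# receives which polarity (+1 positive, -1 negative).
VERB_FRAMES = {
    "like":   {"object": +1},
    "hate":   {"object": -1},
    "defame": {"subject": -1},
}


def entity_sentiment(subject, verb, obj):
    """Return {entity: polarity} for one SVO clause, using the toy lexicon."""
    frame = VERB_FRAMES.get(verb, {})
    scores = {}
    if "subject" in frame:
        scores[subject] = frame["subject"]
    if "object" in frame:
        scores[obj] = frame["object"]
    return scores


# Same two entities, but the polarity lands on different slots.
print(entity_sentiment("Mary", "hate", "John"))
print(entity_sentiment("Mary", "defame", "John"))

# Polarity shifting inside one sentence, handled clause by clause.
clauses = [("I", "like", "chocolate"), ("I", "hate", "strawberry ice-cream")]
sentence_scores = {}
for s, v, o in clauses:
    sentence_scores.update(entity_sentiment(s, v, o))
print(sentence_scores)
```

With this lexicon, 'hate' marks only John as negative and 'defame' marks only Mary, while the two clauses of the chocolate sentence give opposite polarities to their objects.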

SentiLecto is being used to automatically generate this blog: http://entretenimientobit.com

It publishes more than 300 high-quality posts a day, rewriting and enriching content and, more interestingly, merging news items that cover the same facts. This is just a showcase of SentiLecto's NLU capabilities.

SentiLecto currently works only for Spanish, but it will soon be available for Brazilian Portuguese (in 1 month) and English (in 3 months).

Looking forward to hearing feedback from NLP specialists.

Dr. Fernando Balbachan, Ph.D.
fernandobalbachan at gmail.com


