This new Kaggle contest is directly related to Corpus Linguistics: "Given pairs of sentences (a premise and a hypothesis), you will predict whether they are related. For an added challenge, the train and test set include text in fifteen different languages!"

Please take a look at this challenge and consider taking it on as an interesting exercise. There will be a Kaggle community of researchers working on it, and a discussion forum for sharing ideas. At minimum, you may learn some new techniques that you can then apply to your own corpus linguistics research. I am also thinking about using this dataset and task as an exercise in online teaching next year.


How quickly can you tell whether two sentences are related? Our brains do this surprisingly well. It’s a more difficult problem for a computer, and it is the focus of a new Getting Started Kaggle Competition. We’re even providing a starter notebook <https://www.google.com/appserve/mkt/p/AD-FnEz5-8cCuHG_Mud2ULyrrnnNLg9KV5zyj3AKloFGSRunpb9wP65t1mEkOanyN-lu-M65d1fIVOfSqQTnPqr_V92mVN8NCozosEdPN68UGdSYr-AmRfXfpcVp> so you can try your hand at this problem using the power of Tensor Processing Units (TPUs).

You will work on Natural Language Inference (NLI), a popular Natural Language Processing (NLP) task. Given pairs of sentences (a premise and a hypothesis), you will predict whether they are related. For an added challenge, the train and test sets include text in fifteen different languages!
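
To make the task format concrete, here is a minimal Python sketch of a premise/hypothesis pair scored with a naive word-overlap heuristic. Everything in it is an assumption for illustration only: the `SentencePair` class, the `overlap_score` and `looks_related` functions, the `lang` field, and the 0.3 threshold are hypothetical and are not taken from the competition or its starter notebook. Whitespace tokenization will also not work for languages written without spaces, so treat this as a toy baseline and not as a serious multilingual model.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class SentencePair:
    """One example: a premise, a hypothesis, and a language code (hypothetical field)."""
    premise: str
    hypothesis: str
    lang: str


def tokenize(text: str) -> Set[str]:
    """Lowercase, split on whitespace, and strip common punctuation from token edges."""
    tokens = {tok.strip(".,!?;:\"'").lower() for tok in text.split()}
    tokens.discard("")  # drop tokens that were punctuation only
    return tokens


def overlap_score(pair: SentencePair) -> float:
    """Jaccard similarity between the token sets of premise and hypothesis."""
    premise_tokens = tokenize(pair.premise)
    hypothesis_tokens = tokenize(pair.hypothesis)
    if not premise_tokens or not hypothesis_tokens:
        return 0.0
    shared = premise_tokens & hypothesis_tokens
    combined = premise_tokens | hypothesis_tokens
    return len(shared) / len(combined)


def looks_related(pair: SentencePair, threshold: float = 0.3) -> bool:
    """Crude guess: call the pair related if enough words overlap (threshold is arbitrary)."""
    return overlap_score(pair) >= threshold


if __name__ == "__main__":
    examples = [
        SentencePair("A man is playing a guitar.", "A man is playing music.", "en"),
        SentencePair("The cat sleeps.", "Stocks fell sharply today.", "en"),
    ]
    for pair in examples:
        print(f"{overlap_score(pair):.2f}", looks_related(pair))
```

A heuristic like this only measures surface similarity; it cannot tell whether a hypothesis follows from or contradicts its premise, which is why the competition points towards learned models.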

This Getting Started competition will remain open, so anyone can participate at any time. However, self-nominations for the TPU Star Awards <https://www.google.com/appserve/mkt/p/AD-FnEyyMtsaKZ4QHrl0B3SDdT3gNTjU3sGO1kdzEEUkQWB-23GVKK_GzntXEk1byIoTgHuNLzraL4Mw0CGLFiKKriBAQ0QSdKKGYYVP93Y6RptHkLOPWOLSxx0CkBQZTyZzco8qqmPoVOzP>, which grant extra TPU quota, are only open until September 30, 2020! This is a great opportunity to flex your NLP muscles and solve an exciting problem!

