[Corpora-List] [NTCIR-14 FinNum] New financial dataset is released

CHUNG-CHI CHEN chunji0112 at gmail.com
Sat Dec 15 12:09:27 CET 2018


Call for Participation: http://nlpfin.com

Register: http://research.nii.ac.jp/ntcir/ntcir-14/howto.html

Dataset: https://sites.google.com/nlg.csie.ntu.edu.tw/finnum/data

FinNum, a pilot shared task at NTCIR-14, releases a new dataset for understanding numerals in financial tweets. The training and development sets contain over 7,600 annotated instances, and 8,868 annotated instances will be released in total.

------ Motivation ------

When analyzing a financial instrument, investors typically rely on two kinds of analysis: fundamental and technical. Investors using fundamental analysis attempt to evaluate the intrinsic value of the instrument. For a company's securities, they may focus on the numerals in financial statements. For a treasury bond, they may evaluate the price based on the US Fed Funds Target Rate. Those who use technical analysis may employ technical indicators such as the moving average (MA), the relative strength index (RSI), and so on. Whichever analysis method investors use, numerals play an important role and provide much pivotal information in financial data.

Numerals carry much important information in the financial domain. For example, investors may use the price-earnings ratio (P/E ratio) to evaluate the value of a company's securities, where both the P/E ratio and the value of the securities are numeral information. To understand fine-grained numeral information in social media data, we provide a taxonomy for numerals that classifies them into 7 categories and further divides several categories into subcategories. In particular, the most important category, Monetary, is divided into 8 subcategories. (T1) is a tweet that contains several numerals, each of a different kind: 8 is a quantity, 17.99 is a stop-loss price, 200 is the parameter of a technical indicator, and 1 is a stock price. Even in such a short text there are 4 kinds of numerals, which shows the importance of numerals in financial narratives.

*(T1)* **8** breakouts: $CHMT (stop: $**17.99**), $FLO (**200**-day MA), $OMX (gap), $SIRO (gap). One sub-$**1** stock. Modest selection on attempted swing low.

------ Task Design ------

*Subtask 1*: Classify a numeral into one of 7 categories: Monetary, Percentage, Option, Indicator, Temporal, Quantity, and Product/Version Number.

*Subtask 2*: Extend the classification task to the subcategory level, and classify numerals into 17 classes, including Indicator, Quantity, Product/Version Number, and all subcategories shown in Table 1.
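To make the Subtask 1 setup concrete, the following is a minimal Python sketch, not the official data format or a baseline. It lists the 7 categories named above, locates the numerals in tweet (T1) with a simple regular expression, and attaches the labels described in the Motivation section. The names `Category`, `find_numerals`, and `T1_LABELS` are hypothetical, and mapping the stop-loss price and the stock price to Monetary is our reading of the description; the released dataset may encode its annotations differently.

```python
import re
from enum import Enum


class Category(Enum):
    """The 7 Subtask 1 categories listed in the announcement."""
    MONETARY = "Monetary"
    PERCENTAGE = "Percentage"
    OPTION = "Option"
    INDICATOR = "Indicator"
    TEMPORAL = "Temporal"
    QUANTITY = "Quantity"
    PRODUCT_VERSION = "Product/Version Number"


# Tweet (T1) from the announcement.
T1 = (
    "8 breakouts: $CHMT (stop: $17.99), $FLO (200-day MA), $OMX (gap), "
    "$SIRO (gap). One sub-$1 stock. Modest selection on attempted swing low."
)

# Labels for (T1) as described in the text. Assumption: stop-loss price and
# stock price both fall under Monetary at the category level.
T1_LABELS = {
    "8": Category.QUANTITY,
    "17.99": Category.MONETARY,
    "200": Category.INDICATOR,
    "1": Category.MONETARY,
}

# Integers or decimals written with digits; a deliberately simple pattern.
NUMERAL = re.compile(r"\d+(?:\.\d+)?")


def find_numerals(text):
    """Return (numeral, character offset) pairs in order of appearance."""
    return [(m.group(), m.start()) for m in NUMERAL.finditer(text)]


numerals = find_numerals(T1)
for value, offset in numerals:
    print(f"{value:>6} at offset {offset:3d} -> {T1_LABELS[value].value}")
```

A classifier for Subtask 1 would replace the hand-written `T1_LABELS` lookup with a model that predicts a `Category` from the numeral and its surrounding text.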

------ Important dates ------

*2018*

*Sep 10* Training Data Release

*Dec 5* Test Data Release

*Dec 31* Registration Due

*2019*

*Jan 5* Experimental Results Submission Due

*Feb 1* Evaluation Result Release & Task Overview Paper Release

*Mar 15* Participant Paper Submission Due

*Apr 1* Acceptance Notification

*May 1* Camera-ready Participant Paper Due

*Jun 10-13* NTCIR-14 Conference

Read more:

FinNum: http://nlpfin.com

NTCIR-14: http://research.nii.ac.jp/ntcir/ntcir-14/index.html


The FinNum Organizers

NTCIR-14