Cracking Tabular Presentation Diversity for Automatic Cross-Checking over Numerical Facts
Hongwei Li: Institute of Computing Technology, CAS, University of Chinese Academy of Sciences ; Qingping Yang: Institute of Computing Technology, CAS, University of Chinese Academy of Sciences ; Yixuan Cao: Institute of Computing Technology, CAS, University of Chinese Academy of Sciences ; Jiaquan Yao: School of Management, Jinan University ; Ping Luo: Institute of Computing Technology, CAS, University of Chinese Academy of Sciences
Numerical facts in tabular form are widespread in the disclosure documents of vertical domains, especially finance. The same fact is often mentioned multiple times in different tables with diverse tabular presentations. Firms' disclosure documents are the main source of accounting information for individual investors, and their accuracy is crucial both for firms' development and for investors' decisions. However, because of the large volume of tables, frequent updates during editing, and limited time for manual cross-checking, these facts may be inconsistent with each other even after official publication. Such errors may bring huge reputational risk, and even economic losses, even when the mistakes are unintentional rather than deliberate. This creates an opportunity for Automatic Numerical Cross-Checking over Tables. This paper introduces the key module of such a system, which aims to identify whether a pair of table cells are semantically equivalent, that is, whether they refer to the same fact. We observed that, because of tabular presentation diversity, facts in tabular form are difficult to parse into relational tuples. We therefore present an end-to-end solution that performs binary classification over each pair of table cells and does not involve explicit semantic parsing of tables. We also discuss how the design of this neural model trades off prediction accuracy against inference time over a large number of table cell pairs, and propose some practical techniques to address the extreme class imbalance among pairs. Experiments show that our model achieves macro F1 = 0.8297 in linking semantically equivalent table cells from IPO prospectuses. Finally, an auditing tool is built to support guided cross-checking over financial documents, reducing work hours by 52% to 68%. This system has received wide recognition in the Chinese financial community: nine of the top ten Chinese securities brokers have adopted it to support their investment banking business.
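As a small illustration of why macro F1 is a natural metric when equivalent cell pairs are extremely rare, the sketch below computes it from scratch on made-up labels. The function names, the 10-in-1000 positive rate, and the "predict never equivalent" baseline are assumptions chosen for illustration; they are not taken from the paper's data or model.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """F1 score when `cls` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0 when the class never appears at all.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the F1 scores of class 0 and class 1."""
    return (f1_for_class(y_true, y_pred, 0) + f1_for_class(y_true, y_pred, 1)) / 2


# Hypothetical toy data: 1000 cell pairs, only 10 of them truly equivalent.
y_true = [1] * 10 + [0] * 990

# Trivial baseline that labels every pair "not equivalent".
all_negative = [0] * 1000

accuracy = sum(t == p for t, p in zip(y_true, all_negative)) / len(y_true)
baseline_macro_f1 = macro_f1(y_true, all_negative)

print(f"accuracy = {accuracy:.4f}")
print(f"macro F1 = {baseline_macro_f1:.4f}")
```

On this toy data the trivial baseline looks strong on accuracy but scores below 0.5 on macro F1, because the rare "equivalent" class contributes an F1 of zero to the average.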