Facebook AI & The George Washington University
Data Paucity and Low Resource Scenarios: Challenges and Opportunities
In an era of unstructured data abundance, one might think we have met the data requirements for building robust language processing systems. However, this is not the case on a global scale: of the world's more than 7,000 languages, only a handful have digital resources. Moreover, systems that perform well at scale typically require annotated resources. The concentration of resources in a small number of languages reflects the digital disparity across societies, leading to inadvertent biases in our systems. In this talk I will present some solutions for low resource scenarios, both across domains and genres as well as cross-lingually.
Mona Diab conducts research in Statistical Natural Language Processing (NLP), a rapidly growing, exciting field of research in artificial intelligence and computer science. Interdisciplinarity is inherent to NLP, which draws on computer algorithms, software engineering, statistics, machine learning, linguistics, pragmatics, information technology, and more. In NLP, researchers model language and its use, building both analytical and predictive models. In Professor Diab's NLP lab, they address problems in social media processing, building robust enabling technologies such as syntactic and semantic processing tools for written texts in different languages, information extraction tools for large data, multilingual processing, machine translation, and computational sociolinguistic processing. Professor Diab has a special interest in Arabic NLP, with an emphasis on Arabic dialect processing, where very few automated resources are available.