Samsung Research America
Multimodal Machine Learning for Video and Image Analysis
Videos typically contain data in multiple modalities, e.g., audio, video, and text (captions). Understanding and modeling the interaction between these modalities is key for video analysis tasks like categorization, object detection, and activity recognition. However, the modalities are not always correlated, so learning when they are correlated, and using that to guide how much one modality influences another, is crucial. Another salient feature of videos is the coherence between successive frames that arises from the continuity of video and audio, a property we refer to as temporal coherence. We show how using non-linear guided cross-modal signals and temporal coherence can improve the performance of multimodal ML models for video analysis tasks like categorization. We also created a hierarchical taxonomy of categories internally. Our experiments on the large-scale YouTube-8M dataset show that our approach significantly outperforms state-of-the-art multimodal ML models for video categorization using our taxonomy, and generalizes well to an internal dataset of video segments from actual TV programs. We will conclude by discussing other problems in multimodal learning, e.g., visual dialog and model explainability.
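The two ideas in the abstract, gating one modality's influence by its estimated correlation with another, and encouraging smoothness across successive frames, can be illustrated with a minimal NumPy sketch. This is not the speakers' actual model; the sigmoid gate parameterization, the residual fusion, and the squared-difference coherence penalty are all illustrative assumptions.

```python
import numpy as np

def gated_crossmodal_fusion(video_feats, audio_feats, W_gate):
    """Fuse per-frame video and audio features with a learned gate.

    The gate estimates, per time step, how much the audio modality
    should influence the fused representation, so an uncorrelated
    modality cannot dominate it. (Hypothetical parameterization.)
    """
    # Sigmoid gate computed from the concatenated modalities.
    z = np.concatenate([video_feats, audio_feats], axis=-1) @ W_gate
    gate = 1.0 / (1.0 + np.exp(-z))            # shape: (T, 1), values in (0, 1)
    return video_feats + gate * audio_feats    # gated residual fusion

def temporal_coherence_penalty(fused):
    """Penalize abrupt changes between successive frames.

    A regularizer of this form is one way to encode the temporal
    coherence property described in the abstract.
    """
    diffs = fused[1:] - fused[:-1]
    return float(np.mean(diffs ** 2))

# Toy example: 16 frames of 8-dimensional features per modality.
T, d = 16, 8
rng = np.random.default_rng(0)
video = rng.normal(size=(T, d))
audio = rng.normal(size=(T, d))
W = rng.normal(size=(2 * d, 1)) * 0.1

fused = gated_crossmodal_fusion(video, audio, W)
loss = temporal_coherence_penalty(fused)
```

In a trainable model, `W_gate` would be learned jointly with the categorization objective, and the coherence penalty would be added to the loss so that the fused representation varies smoothly over time.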
Shalini Ghosh is currently Principal Scientist (Global) and Leader of the Machine Learning Research team in the Smart TV division of the Visual Display Intelligence Lab at Samsung Research America.
Before this, from May 2018 to July 2019, she served as the Director of AI Research in the Artificial Intelligence Center of Samsung Research America in Mountain View, reporting to Dr. Larry Heck.
Before May 2018, she was a Principal Computer Scientist in the Computer Science Laboratory at SRI in Menlo Park, reporting to Dr. Patrick Lincoln.
She completed her PhD in 2005 at the Computer Engineering Research Center in ECE at the University of Texas at Austin, where she worked with Prof. Nur Touba in the Computer Aided Testing (CAT) Laboratory. Previously, she earned her MS from the Computer Engineering Department of the University of California at Santa Cruz (UCSC), where she worked with the Semiconductor Test Group. Her MS thesis advisor was Prof. F. Joel Ferguson.
She was invited to be a Visiting Scientist at Google Research in Mountain View, as part of the Google Visiting Faculty Program, for more than a year (July 2014 to August 2015), where she worked on applying deep learning (Google Brain) models to problems in natural language understanding.