Applications
Curated by:
Related KDD2016 Papers
| Title & Authors |
|---|
Curated by: Aidong Zhang
Classification, the task of assigning labels to objects, is one of the cornerstone applications of data mining. Many day-to-day activities, some so involuntary that we don’t even realize we are doing them, are classification tasks: “identifying your car in the parking lot” or “recognizing a family member in a crowd”. These tasks, seemingly simple for humans, are extremely difficult for computers and form the core of many AI problems.
The domain applications of classification have expanded from the early days of handwritten digit recognition and face recognition in the 1990s to identifying and classifying data in high-throughput environments such as bioinformatics and social media.
With “big” data and the growth of deep learning algorithms, a new paradigm of techniques and use cases has arisen that significantly reduces human intervention in training the system or algorithm. Some very interesting demos and applications are listed in [1]; one specific example, online handwritten character recognition, is available at http://deep.host22.com/.
A general introduction and survey of classification algorithms can be found in [2].
[1] http://deeplearning.net/demos/
[2] http://wen.ijs.si/ojs-2.4.3/index.php/informatica/article/download/148/140
| Title & Authors |
|---|
| Text Mining in Clinical Domain: Dealing with Noise. Author(s): Hoang Nguyen*, National ICT Australia; Jon Patrick, University of Sydney |
| Days on Market: Measuring Liquidity in Real Estate Markets Author(s): Hengshu Zhu*, Baidu Inc.; Hui Xiong, Rutgers; Fangshuang Tang, University of Science and Technology of China; Yong Ge, ; Qi Liu, University of Science and Technology of China; Enhong Chen, ; Yanjie Fu, Rutgers University |
| Designing Policy Recommendations to Reduce Home Abandonment in Mexico Author(s): Klaus Ackermann, Monash University; Eduardo Blancas Reyes*, The University of Chicago; Sue He, University of Virginia; Thomas Anderson Keller, UC San Diego; Paul van der Boor, Data Science for Social Good; Romana Khan, Data Science for Social Good; Rayid Ghani, University of Chicago |
| Predictors without Borders: Behavioral Modeling of Product Adoption in Three Developing Countries Author(s): Muhammad Khan, University of Washington; Joshua Blumenstock*, University of Washington |
| Bid-aware Gradient Descent for Unbiased Learning with Censored Data in Display Advertising Author(s): Weinan Zhang*, University College London; Tianxiong Zhou, TukMob; Jun Wang, University College London; Jian Xu, TouchPal Inc |
| How to Get Them a Dream Job? Author(s): Jia Li, University of Illinois at Chicago; Dhruv Arya, LinkedIn; Viet Ha-Thuc*, LinkedIn; Shakti Sinha, LinkedIn |
| The Profile of an Online Purchaser: A Case Study of Pinterest Author(s): Caroline Lo*, Stanford University; Dan Frankowski, ; Jure Leskovec, Stanford University |
| EMBERS AutoGSR: Automated Coding of Civil Unrest Events Author(s): PARANG SARAF*, VIRGINIA TECH; Naren Ramakrishnan, Virginia Tech |
| DopeLearning: A Computational Approach to Rap Lyrics Generation Author(s): Eric Malmi*, Aalto University; Pyry Takala, Aalto University; Hannu Toivonen, University of Helsinki; Tapani Raiko, Aalto University; Aristides Gionis, Aalto University |
| Developing a Data-Driven Player Ranking in Soccer using Predictive Model Weights Author(s): Joel Brooks*, Massachusetts Institute of Tec; Matthew Kerr, Massachusetts Institute of Technology; John Guttag, MIT |
| Identifying Earmarks in Congressional Bills Author(s): Vrushank Vora*, Data Science for Social Good; Joe Walsh, Data Science for Social Good; Madian Khabasa, Microsoft ; Ellery Wulczyn, Wikimedia Foundation; Matthew Heston, Northwestern University; Rayid Ghani, University of Chicago; Chris Berry, University of Chicago |
| Detecting Devastating Diseases in Search Logs Author(s): John Paparrizos, Columbia University; Ryen White*, Microsoft; Eric Horvitz, Microsoft Research |
| Audience Expansion for Online Social Network Advertising Author(s): Haishan Liu*, LinkedIn Corporation; David Pardoe, LinkedIn Corporation; Kun Liu, LinkedIn Corporation |
| Question Independent Grading using Machine Learning: The Case of Computer Program Grading Author(s): Gursimran Singh*, Aspiring Minds; Shashank Srikant, ; Varun Aggarwal, |
| The Legislative Influence Detector: Finding Text Reuse in State Legislation Author(s): Matthew Burgess, University of Michigan; Eugenia Giraudy, YouGov; Julian Katz-Samuels, University of Michigan; Joe Walsh*, University of Chicago; Derek Willis, ProPublica; Lauren Haynes, University of Chicago; Rayid Ghani, University of Chicago |
| A Non-parametric Approach to Detect Epileptogenic Lesions using Restricted Boltzmann Machines Author(s): Yijun Zhao*, Tufts University; Bilal Ahmed, Tufts; Carla Brodley, Northeastern University; Jennifer Dy, NEU |
| Repeat Buyer Prediction for E-Commerce Author(s): Guimei Liu*, ; Tam T. Nguyen, Institute for Infocomm Research; Gang Zhao, Development Bank of Singapore; Wei Zha, Institute for Infocomm Research; Jianbo Yang, General Electric; Jianneng Cao, Institute for Infocomm Research; Min Wu, Institute for Infocomm Research; Peilin Zhao, Institute for Infocomm Research, A*STAR; Wei Chen, Development Bank of Singapore |
| Firebird: Predicting Fire Risk and Prioritizing Fire Inspections in Atlanta Author(s): Michael Madaio, Carnegie Mellon University; Shang-Tse Chen*, Georgia Institute of Technology; Oliver Haimson, University of California, Irvine; Wenwen Zhang, Georgia Institute of Technology; Xiang Cheng, Emory University; Matthew Hinds-Aldrich, Atlanta Fire Rescue Department; Duen Horng Chau, Georgia Tech; Bistra Dilkina, Georgia Tech |
| Identifying Police Officers at Risk of Adverse Events Author(s): Samuel Carton, University of Michigan; Jennifer Helsby*, University of Chicago; Kenneth Joseph, Carnegie Mellon University; Ayesha Mahmud, Princeton University; Youngsoo Park, University of Arizona; Joe Walsh, University of Chicago; Crystal Cody, Charlotte-Mecklenburg Police Department; Estella Patterson, Charlotte-Mecklenburg Police Department; Lauren Haynes, University of Chicago; Rayid Ghani, University of Chicago |
| Predicting Disk Replacement towards Reliable Data Centers Author(s): Mirela Botezatu*, IBM Research; Ioana Giurgiu, IBM Research; Jasmina Bogojeska, IBM Research; Dorothea Wiesmann, IBM Research |
| Domain adaptation in the absence of source domain data Author(s): Boris Chidlovskii*, XRCE; Stephane Clinchant, Xerox Research Centre Europe; Gabriela Csurka, Xerox Research Centre Europe |
| Crystal: Employer Name Normalization in the Online Recruitment Industry Author(s): Qiaoling Liu, CareerBuilder; Faizan Javed*, CareerBuilder; Matt McNair, CareerBuilder |
| Gemello: Creating a Detailed Energy Breakdown from just the Monthly Electricity Bill Author(s): Nipun Batra*, IIIT Delhi; Amarjeet Singh, ; Kamin Whitehouse, |
| Ranking Relevance in Yahoo Search Author(s): Dawei Yin, Yahoo Labs; Yuening Hu, ; Jiliang Tang, Yahoo Labs; Tim Daly, yahoo; Mianwei Zhou, Yahoo Inc; Hua Ouyang, ; Jianhui Chen, Yahoo!; Changsung Kang, Yahoo Labs; Hongbo Deng, Yahoo!; Chikashi Nobata, ; Jean-Marc Langlois, ; Yi Chang*, Yahoo! Labs |
| Dynamic and Robust Wildfire Risk Prediction System: An Unsupervised Approach Author(s): Mahsa Salehi*, IBM Australia; Laura Rusu, IBM Research; Timothy Lynar, IBM Research; Anna Phan, IBM Research |
| An Engagement-Based Customer Lifetime Value System for E-commerce Author(s): Ali Vanderveld*, Groupon; Angela Han, Groupon; Addhyan Pandey, Groupon; Rajesh Parekh, |
Curated by: Eric P. Xing and Qirong Ho
The rise of Big Data requires complex Machine Learning (ML) models with millions to billions of parameters that promise adequate capacity to digest massive datasets and offer powerful predictive analytics (such as high-dimensional latent features, intermediate representations, and decision functions). In turn, this has led to new demands for ML systems that can learn such models. To support the computational needs of ML algorithms at these scales, an ML system often needs to operate on distributed clusters with tens to thousands of machines; however, implementing algorithms and writing systems software for such clusters demands significant design and engineering effort. A recent and increasingly popular trend in industrial-scale machine learning is to explore new principles and strategies for either highly specialized monolithic designs for large-scale vertical applications, such as various distributed topic models or regression models, or flexible and easily programmable general-purpose distributed ML platforms, such as GraphLab, based on vertex programming, and Petuum, which uses a parameter server. It has been recognized that, in addition to familiarity with distributed system architectures and programming, large-scale ML systems can benefit greatly from ML-rooted statistical and algorithmic insights, which can lead to principles and strategies unique to distributed machine learning programs. These principles and strategies shed light on the following key questions: How to distribute an ML program over a cluster? How to bridge ML computation with inter-machine communication? How to perform such communication? What should be communicated between machines? They span a broad continuum from application, to engineering, to theoretical research and development of Big ML systems and architectures.
The ultimate goal of large-scale ML systems research is to understand how these principles and strategies can be made efficient, generally applicable, and easy to program and deploy, while ensuring that they are supported by scientifically validated correctness and scaling guarantees.
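As a rough illustration of the questions above (how to distribute the program, and what to communicate), the following sketch simulates data-parallel training with a parameter server inside a single process. It is not how GraphLab or Petuum are implemented; the function names, the toy linear model, and the synchronous averaging scheme are all assumptions made for the example.

```python
# Hypothetical single-process simulation of data-parallel training.
# Real systems run each worker on a separate machine; here a "worker"
# is just a function call on its own shard of the data.

def worker_gradient(w, b, shard):
    """Gradient of the mean squared error of y ~ w*x + b on one data shard."""
    n = len(shard)
    gw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2 * (w * x + b - y) for x, y in shard) / n
    return gw, gb

def train(shards, steps=2000, lr=0.05):
    w, b = 0.0, 0.0  # parameters held by the (simulated) server
    for _ in range(steps):
        # Each worker reads the current parameters and computes a local gradient.
        grads = [worker_gradient(w, b, shard) for shard in shards]
        # Only gradients are communicated; the server averages them and updates.
        gw = sum(g[0] for g in grads) / len(grads)
        gb = sum(g[1] for g in grads) / len(grads)
        w -= lr * gw
        b -= lr * gb
    return w, b

# Toy data generated from y = 3x + 1, split across three simulated machines.
data = [(x / 10.0, 3 * (x / 10.0) + 1) for x in range(30)]
shards = [data[0::3], data[1::3], data[2::3]]
w, b = train(shards)
print(round(w, 4), round(b, 4))
```

In this sketch the data stays on the workers and only two numbers per worker cross the "network" at each step, which is one possible answer to the question of what should be communicated.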
| Title & Authors |
|---|
| Safe Pattern Pruning: An Efficient Approach for Predictive Pattern Mining Author(s): Kazuya Nakagawa, Nagoya Institute of Technology; Shinya Suzumura, Nagoya Institute of Technology; Masayuki Karasuyama, ; Koji Tsuda, University of Tokyo; Ichiro Takeuchi*, Nagoya Institute of Technology Japan |
| Parallel Dual Coordinate Descent Method for Large-scale Linear Classification in Multi-core Environm Author(s): Wei-Lin Chiang, National Taiwan University; Mu-Chu Lee, National Taiwan University; Chih-Jen Lin*, National Taiwan University |
| Compressing Graphs and Indexes with Recursive Graph Bisection Author(s): Laxman Dhulipala, Carnegie Mellon University; Igor Kabiljo, Facebook; Brian Karrer, Facebook; Giuseppe Ottaviano, Facebook; Sergey Pupyrev*, Facebook; Alon Shalita, Facebook |
| Accelerated Stochastic Block Coordinate Descent with Optimal Sampling Author(s): Aston Zhang*, UIUC; Quanquan Gu, University of Virginia |
| Stochastic Optimization Techniques for Quantification Performance Measures Author(s): Harikrishna Narasimhan, IACS, Harvard University; Shuai Li, University of Insubria; Purushottam Kar*, IIT Kanpur; Sanjay Chawla, QCRI-HBKU, Qatar; Fabrizio Sebastiani, QCRI-HBKU, Qatar |
| Parallel Lasso Screening for Big Data Optimization Author(s): Qingyang Li*, Arizona State University; Shuang Qiu, Umich; Shuiwang Ji, Washington State University; Jieping Ye, University of Michigan at Ann Arbor; Jie Wang, University of Michigan |
| Fast Component Pursuit for Large-Scale Inverse Covariance Estimation Author(s): Lei Han*, Rutgers University; Yu Zhang, Hong Kong University of Science and Technology; Tong Zhang, Rutgers University |
| Convex Optimization for Linear Query Processing under Approximate Differential Privacy Author(s): Ganzhao Yuan*, SCUT; Yin Yang, ; Zhenjie Zhang, ; Zhifeng Hao, |
Curated by: Jiawei Han and Jing Gao
The data reliability issue poses great difficulty for many decision-making tasks when the data contains inconsistent, inaccurate, or even false information that could mislead decisions and eventually result in substantial losses. Unfortunately, we cannot expect real-world data to be clean and accurate; instead, data inconsistency, ambiguity, and uncertainty are widespread. Such ubiquitous veracity problems motivate numerous efforts towards improving information quality, trustworthiness, and reliability. These efforts approach the problem from different perspectives to identify reliable information sources and trustworthy claims. Some popular subtopics are listed below:
(1) Truth discovery is an emerging topic that attracts much attention. The goal is to discover truths from multiple conflicting information sources without supervision. The basic idea is to estimate both source reliability and claim trustworthiness simultaneously by examining the relationship between sources and claims.
(2) Much effort has been devoted to detecting spam, rumors, and other types of untruthful information in the online world. Supervised approaches are typically adopted, in which features are extracted to capture distinctions between rumors (or spam) and facts. Trust between sources (users) is also an important factor in evaluating information trustworthiness, and many graph-based approaches have been developed to assess source trustworthiness.
(3) Anomaly detection and denoising also contribute directly to solving the data reliability issue. Anomaly detection aims to detect anomalous data points that deviate significantly from the rest of the data. Denoising is usually conducted to reduce the level of noise in the data and to repair low-quality data by exploiting underlying distributions or patterns in the data.
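The basic idea of truth discovery described in (1) can be sketched as a simple alternating procedure: vote on each object using the current source weights, then re-estimate each source's weight from how often it agrees with the current truths. This is a minimal illustration, not the method of any paper listed here; the function name, the smoothed-agreement weight formula, and the toy claims are assumptions made for the example.

```python
from collections import defaultdict

def truth_discovery(claims, iterations=10):
    """claims: dict mapping source -> {object: claimed value}."""
    weights = {s: 1.0 for s in claims}  # start by trusting all sources equally
    truths = {}
    for _ in range(iterations):
        # Step 1: weighted vote per object using the current source weights.
        votes = defaultdict(lambda: defaultdict(float))
        for s, obs in claims.items():
            for obj, val in obs.items():
                votes[obj][val] += weights[s]
        truths = {obj: max(vals, key=vals.get) for obj, vals in votes.items()}
        # Step 2: source weight = smoothed fraction of its claims that
        # agree with the current truths (an assumed, simple reliability score).
        for s, obs in claims.items():
            agree = sum(1 for obj, val in obs.items() if truths[obj] == val)
            weights[s] = (agree + 1.0) / (len(obs) + 2.0)
    return truths, weights

# Toy input: three sources make conflicting claims about three objects.
claims = {
    "A": {"o1": "Paris", "o2": "Canberra", "o3": "Ottawa"},
    "B": {"o1": "Paris", "o2": "Canberra", "o3": "Toronto"},
    "C": {"o1": "Lyon", "o2": "Sydney", "o3": "Ottawa"},
}
truths, weights = truth_discovery(claims)
print(truths)
print(weights)
```

No labels are used anywhere, which matches the unsupervised setting in the text; the two estimates reinforce each other only through the source-claim relationships.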
Surveys:
Yaliang Li, Jing Gao, Chuishi Meng, Qi Li, Lu Su, Bo Zhao, Wei Fan, Jiawei Han. A Survey on Truth Discovery. SIGKDD Explorations Newsletter, 17(2): 1-16, 2015.
Xian Li, Xin Luna Dong, K. B. Lyons, Weiyi Meng, Divesh Srivastava. Truth finding on the deep web: Is the problem solved? PVLDB, 6(2):97–108, 2012.
Manish Gupta, Jiawei Han. Heterogeneous Network-Based Trust Analysis: A Survey. SIGKDD Explorations Newsletter, 13(1):60-77, 2011.
Meng Jiang, Peng Cui, Christos Faloutsos. Suspicious Behavior Detection: Current Trends and Future Directions. Special Issue on Online Behavioral Analysis and Modeling, IEEE Intelligent Systems Magazine, 2016.
Jiliang Tang, Huan Liu. Trust in Social Media. Morgan & Claypool Publishers, 2015.
Data and code:
http://www.cse.buffalo.edu/~jing/software.htm
http://lunadong.com/fusionDataSets.htm
http://cogcomp.cs.illinois.edu/page/resource_view/16
http://www.jiliang.xyz/trust.html
| Title & Authors |
|---|
| Towards Confidence in the Truth: A Bootstrapping based Truth Discovery Approach Author(s): Houping Xiao*, SUNY Buffalo; Jing Gao, ; Qi Li, SUNY Buffalo; Fenglong Ma, SUNY Buffalo; Lu Su, SUNY Buffalo; Yunlong Feng, KU Leuven; Aidong Zhang, |
| From Truth Discovery to Trustworthy Opinion Discovery: An Uncertainty-Aware Quantitative Modeling Ap Author(s): Mengting Wan*, UC San Diego; Xiangyu Chen, University of Illinois, Urbana-Champaign; Lance Kaplan, U.S. Army Research Laboratory; Jiawei Han, University of Illinois at Urbana-Champaign; Jing Gao, ; Bo Zhao, LinkedIn |
Curated by: Jerry Xiaojin Zhu
Semi-supervised learning uses both labeled and unlabeled data to improve supervised learning. The goal is to learn a predictor that predicts future test data better than the predictor learned from the labeled training data alone. Semi-supervised learning is motivated by its practical value in learning faster, better, and cheaper. In many real-world applications, it is relatively easy to acquire a large amount of unlabeled data x. For example, documents can be crawled from the Web, images can be obtained from surveillance cameras, and speech can be collected from broadcasts. However, the corresponding labels y for the prediction task, such as sentiment orientation, intrusion detection, and phonetic transcripts, often require slow human annotation and expensive laboratory experiments. This labeling bottleneck results in a scarcity of labeled data and a surplus of unlabeled data. Therefore, being able to utilize the surplus unlabeled data is desirable. Common semi-supervised learning methods include generative models, semi-supervised support vector machines, graph Laplacian based methods, co-training, and multiview learning. These methods make different assumptions about the link between the unlabeled data distribution and the classification function. Such assumptions are equivalent to prior domain knowledge, and the success of semi-supervised learning depends to a large degree on the validity of the assumptions.
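To make the role of unlabeled data concrete, here is a minimal sketch of self-training with a nearest-centroid classifier on one-dimensional data. Self-training is a simple technique chosen for brevity; it is not one of the methods named above, and the function names and toy data are assumptions. It relies on the assumption that points of the same class form clusters, so it can hurt when that assumption is wrong.

```python
def centroids(labeled):
    """Mean of the points for each label."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(cents, x):
    """Label of the nearest centroid."""
    return min(cents, key=lambda y: abs(x - cents[y]))

def self_train(labeled, unlabeled):
    labeled = list(labeled)
    pool = list(unlabeled)
    while pool:
        cents = centroids(labeled)
        # Pick the unlabeled point closest to any centroid (the most confident one),
        # give it the predicted label, and treat it as labeled from now on.
        x = min(pool, key=lambda p: min(abs(p - c) for c in cents.values()))
        labeled.append((x, predict(cents, x)))
        pool.remove(x)
    return centroids(labeled)

labeled = [(1.0, "neg"), (6.0, "pos")]                       # scarce labels
unlabeled = [0.0, 0.5, 1.5, 2.0, 8.0, 8.5, 9.0, 9.5, 10.0]   # plentiful unlabeled x

supervised = centroids(labeled)
semi = self_train(labeled, unlabeled)
print(predict(supervised, 4.5), predict(semi, 4.5))
```

With only the two labeled points the decision boundary sits at 3.5; after absorbing the unlabeled clusters it moves to 4.75, so the prediction for x = 4.5 changes. Whether that change is an improvement depends on whether the cluster assumption holds for the data at hand.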
References:
O. Chapelle, B. Schölkopf, and A. Zien. Semi-Supervised Learning. MIT Press, 2006.
A. Subramanya and P. Talukdar. Graph-Based Semi-Supervised Learning. Morgan & Claypool Publishers, 2014.
X. Zhu and A. B. Goldberg. Introduction to Semi-Supervised Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2009.
| Title & Authors |
|---|
| A Multi-Task Learning Formulation for Survival Analysis Author(s): Yan Li*, Wayne State University; Jie Wang, University of Michigan; Jieping Ye, University of Michigan at Ann Arbor; Chandan Reddy, Wayne State University |
| Smart Reply: Automated Response Suggestion for Email Author(s): Anjuli Kannan, ; Karol Kurach*, Google; Sujith Ravi, Google; Tobias Kaufmann, Google, Inc.; Andrew Tomkins, ; Balint Miklos, Google, Inc.; Greg Corrado, ; László Lukács, ; Marina Ganea, ; Peter Young, ; Vivek Ramavajjala |
| Partial Label Learning via Feature-Aware Disambiguation Author(s): Min-Ling Zhang*, Southeast University; Binbin Zhou, Southeast University; Xu-Ying Liu, Southeast University |
| Semi-Markov Switching Vector Autoregressive Model-based Anomaly Detection in Aviation Systems Author(s): Igor Melnyk*, University of Minnesota; Arindam Banerjee, University of Minnesota; Bryan Matthews, Nasa Ames Research Center; Nikunj Oza, Nasa Ames Research Center |
| Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label Embedding Author(s): Xiang Ren*, UIUC; Wenqi He, UIUC; Meng Qu, UIUC; Heng Ji, PRI; Clare Voss, ARL; Jiawei Han, University of Illinois at Urbana-Champaign |
| Fast Unsupervised Online Drift Detection Using Incremental Kolmogorov-Smirnov Test Author(s): Denis Dos Reis*, Universidade de São Paulo; Gustavo Batista, Universidade de Sao Paulo at Sao Carlos; Peter Flach, University of Bristol; Stan Matwin, Dalhousie University |
| Overcoming key weaknesses of Distance-based Neighbourhood Methods using a Data Dependent Dissimilari Author(s): Ting Kai Ming*, Federation University; YE ZHU, Monash University; Mark Carman, Monash University; Yue Zhu, Nanjing University |
| FRAUDAR: Bounding Graph Fraud in the Face of Camouflage Author(s): Bryan Hooi*, Carnegie Mellon University; Hyun Ah Song, Carnegie Mellon University; Alex Beutel, Carnegie Mellon University; Neil Shah, Carnegie Mellon University; Kijung Shin, Carnegie Mellon University; Christos Faloutsos, Carnegie Mellon University |
| Goal-Directed Inductive Matrix Completion Author(s): Si Si*, Ut austin; Kai-Yang Chiang, UT Austin; Cho-Jui Hsieh, UT Austin; Nikhil Rao, Technicolor Research; Inderjit Dhillon, UTexas |
| Modeling Precursors for Event Forecasting via Nested Multi-Instance Learning Author(s): Yue Ning*, Virginia Tech; Sathappan Muthiah, Virginia Tech; Huzefa Rangwala, George Mason University; Naren Ramakrishnan, Virginia Tech |
Curated by: Tao Li
The field of data mining increasingly adapts methods and algorithms from advanced matrix computations, graph theory and optimization. In these methods, the data is described using matrix representations (graphs are represented by their adjacency matrices) and the data mining problem is formulated as an optimization problem with matrix variables. With these, the data mining task becomes a process of minimizing or maximizing a desired objective function of matrix variables.
Prominent examples include spectral clustering, matrix factorization, tensor analysis, and regularization. These matrix-formulated, optimization-centric methodologies are rapidly evolving into a popular research area for solving challenging data mining problems. These methods are amenable to rigorous analysis and benefit from the well-established knowledge in linear algebra, graph theory, and optimization accumulated over centuries. They are also simple to implement and easy to understand in comparison with probabilistic, information-theoretic, and other methods. In addition, they are well suited to parallel and distributed processing for solving large-scale problems. Last but not least, these methodologies are quite flexible and can be used to formulate a large number of data mining tasks.
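As a small example of a data mining task posed as optimization over matrix variables, the sketch below fits a rank-1 factorization by alternating least squares, minimizing the squared Frobenius error between a matrix A and the outer product of two vectors u and v. The function names and the toy matrix are assumptions made for the example, and the code assumes A is not the zero matrix.

```python
def rank1_als(A, iterations=50):
    """Alternating least squares for min over u, v of ||A - u v^T||_F^2."""
    m, n = len(A), len(A[0])
    u = [1.0] * m
    v = [1.0] * n
    for _ in range(iterations):
        # Fix u, solve for v in closed form: v_j = (sum_i u_i A_ij) / (sum_i u_i^2)
        uu = sum(x * x for x in u)
        v = [sum(u[i] * A[i][j] for i in range(m)) / uu for j in range(n)]
        # Fix v, solve for u in closed form: u_i = (sum_j A_ij v_j) / (sum_j v_j^2)
        vv = sum(x * x for x in v)
        u = [sum(A[i][j] * v[j] for j in range(n)) / vv for i in range(m)]
    return u, v

def residual(A, u, v):
    """Squared Frobenius norm of A - u v^T (the objective being minimized)."""
    return sum((A[i][j] - u[i] * v[j]) ** 2
               for i in range(len(A)) for j in range(len(A[0])))

# Toy matrix that is exactly rank 1: the outer product of [2, 1, 3] and [1, 2, 3].
A = [[2.0, 4.0, 6.0],
     [1.0, 2.0, 3.0],
     [3.0, 6.0, 9.0]]
u, v = rank1_als(A)
print(residual(A, u, v))
```

Each half-step is an ordinary least-squares problem with a closed-form solution, which is part of why such formulations are easy to implement and analyze.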
Resources
Workshop on algorithms for modern massive datasets (MMDS)
| Title & Authors |
|---|
| Portfolio Selections in P2P Lending: A Multi-Objective Perspective Author(s): Hongke Zhao*, USTC; Guifeng Wang, USTC; Yong Ge, UNC Charlotte; Qi Liu, University of Science and Technology of China; Enhong Chen, |
| Optimally Discriminative Choice Sets in Discrete Choice Models: Application to Data-Driven Test Desi Author(s): Igor Labutov*, Cornell University |
| Lossless Separation of Web Pages into Layout Code and Data Author(s): Adi Omari*, Technion; Benny Kimelfeld, Technion; Sharon Shoham, Academic College of Tel Aviv Yaffo; Eran Yahav, Technion |
| Joint Optimization of Multiple Performance Metrics in Online Video Advertising Author(s): Sahin Geyik*, Turn Inc.; Sergey Faleev, Turn Inc.; Jianqiang Shen, Turn Inc.; Sean O'Donnell, Turn Inc.; Santanu Kolay, Turn Inc. |
| Matrix Computations and Optimization in Apache Spark Author(s): Reza Zadeh*, Stanford University; Xiangrui Meng, ; Alexander Ulanov, ; Burak Yavuz, ; Li Pu, ; Shivaram Venkataraman, ; Evan Sparks, ; Aaron Staple, ; Matei Zaharia, |
| Online dual decomposition for performance and delivery-based distributed ad allocation Author(s): Jim Huang*, Amazon; Rodolphe Jenatton, Amazon; Cedric Archambeau, Amazon |
| MAP: Frequency-Based Maximization of Airline Profits based on an Ensemble Forecasting Approach Author(s): Bo An, ; Haipeng Chen, Nanyang Technological Universi; Noseong Park*, University of Maryland; V.S. Subrahmanian, Univ of Maryland |
| Email Volume Optimization at LinkedIn Author(s): Rupesh Gupta*, LinkedIn; Xiaoyu Chen, ; Guanfeng Liang, ; Romer Rosales, LinkedIn; Hsiao-Ping Tseng, ; Ravi Kiran Holur Vijay, |
| Scalable Partial Least Squares Regression on Grammar-Compressed Data Matrices Author(s): Yasuo Tabei*, JST; Hiroto Saigo, Kyushu Institute of Technology; Yoshihiro Yamanishi, Kyushu University; Simon Puglisi, Helsinki University |
Curated by: Chris Clifton
The ever-increasing collection of personal data, and the growing capabilities to analyze that data, pose increased risks to personal privacy. This has long been a concern for the SIGKDD community; in 2003 there was actually a Data Mining Moratorium Act proposed in the U.S. Senate (for more details, see the SIGKDD response). While there have been examples of data mining that people feel are privacy violations, such as Target’s pregnancy prediction, most privacy problems have come from security failures leading to data breaches, rather than the data analysis itself.
The SIGKDD community has long been active in research to protect privacy while enabling data analysis. This year, there are two papers on privacy-protection techniques. Both use the model of ε-differential privacy. The idea behind differential privacy is that sufficient noise is added to analysis results to hide the impact of any single individual, thus ensuring that the outcome of the analysis does not reveal information specific to any individual in the data. For a more complete introduction to differential privacy, I recommend this article by Cynthia Dwork or you can watch a tutorial by Christine Task.
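To make the idea of adding noise concrete, here is a minimal sketch of the Laplace mechanism, one standard way to achieve ε-differential privacy for a counting query. It is not the method of either paper mentioned here. The function names and the toy data are assumptions for illustration. A counting query has sensitivity 1 (adding or removing one person changes the count by at most 1), so Laplace noise with scale 1/ε is used.

```python
import random

def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records that satisfy `predicate`.

    Sensitivity of a count is 1, so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical data: 100 records, half of which satisfy the predicate.
rng = random.Random(0)
records = list(range(100))
noisy = private_count(records, lambda r: r % 2 == 0, epsilon=1.0, rng=rng)
print(noisy)
```

A smaller ε gives stronger privacy but a larger noise scale, so each released answer is less accurate.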
While knowledge discovery is often on an aggregate level, access to instance-level data (and the attendant privacy risks) is often a part of the learning process. In Privacy-Preserving Class Ratio Estimation, Arun Iyer, Saketh Nath, and Sunita Sarawagi provide a way around this. The goal, estimating the ratio of instances belonging to different classes, is aggregate knowledge rather than instance level. But how do we estimate this without labeling individual instances (putting the privacy of those individuals at risk)? This paper shows that by labeling at the set level, rather than the instance level, good estimates for class ratios can be obtained, while satisfying the requirements of differential privacy.
Naïve approaches to differential privacy require that a privacy budget be divided among all accesses to the data; the accuracy of an answer is limited by its share of the budget. If we simultaneously perform a set of analyses, we can share this budget, giving more accurate results while still protecting privacy. Unfortunately, this becomes a (computationally infeasible) non-convex optimization problem. In the poster presentation Optimal Linear Aggregate Query Processing under Approximate Differential Privacy, authors Ganzhao Yuan, Yin Yang, Zhenjie Zhang and Zhifeng Hao show how to efficiently perform a set of analyses under the slightly relaxed (ε, δ)-differential privacy.
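The cost of dividing the budget can be seen with a little arithmetic. The sketch below assumes the naïve scheme: the total budget ε is split evenly across k queries and each query is answered with the Laplace mechanism. The function name and the even split are illustrative assumptions; the paper above is about doing better than this.

```python
def laplace_scale_per_query(total_epsilon, num_queries, sensitivity=1.0):
    """Noise scale for each query when the budget is split evenly.

    Each query receives total_epsilon / num_queries of the budget, and the
    Laplace mechanism then needs noise of scale sensitivity / epsilon_share.
    """
    epsilon_share = total_epsilon / num_queries
    return sensitivity / epsilon_share

# The noise scale (the expected absolute error of a Laplace-noised answer)
# grows linearly with the number of queries sharing the budget.
for k in (1, 10, 100):
    print(k, laplace_scale_per_query(1.0, k))
```

This linear growth in error is what motivates answering a whole workload of queries jointly rather than one at a time.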
| Title & Authors |
|---|
| Convex Optimization for Linear Query Processing under Approximate Differential Privacy Author(s): Ganzhao Yuan*, SCUT; Yin Yang, ; Zhenjie Zhang, ; Zhifeng Hao, |
| Privacy-preserving Class Ratio Estimation Author(s): Arun Iyer*, ; Saketh Nath, IIT Bombay; Sunita Sarawagi, IIT Bombay |
| Ranking Causal Anomalies via Temporal and Dynamical Analysis on Vanishing Correlations Author(s): Wei Cheng*, NEC Labs America; Kai Zhang, NEC labs America; Haifeng Chen, NEC Research Lab; Guofei Jiang, NEC labs America; Wei Wang, UC Los Angeles |
| FRAUDAR: Bounding Graph Fraud in the Face of Camouflage Author(s): Bryan Hooi*, Carnegie Mellon University; Hyun Ah Song, Carnegie Mellon University; Alex Beutel, Carnegie Mellon University; Neil Shah, Carnegie Mellon University; Kijung Shin, Carnegie Mellon University; Christos Faloutsos, Carnegie Mellon University |
| Modeling Precursors for Event Forecasting via Nested Multi-Instance Learning Author(s): Yue Ning*, Virginia Tech; Sathappan Muthiah, Virginia Tech; Huzefa Rangwala, George Mason University; Naren Ramakrishnan, Virginia Tech |
Curated by: Huan Liu
The very first issue of data mining and knowledge discovery is to properly handle data. It is essential to take into account different data types. Rich data types can be categorized into non-dependency and dependency data. Non-dependency data is the most commonly encountered type, and refers to data without specified dependencies between data instances. In other words, data instances are, or are assumed to be, independent and identically distributed. Examples of non-dependency data include multidimensional data, text data, and image data. In practice, data can be more complex, and there exist dependencies between data instances. Dependency data can be correlated with temporal, spatial, sequential, and social relationships such as time-series, sequence, graph, multi-media, and social-media data.
Publications
Non-Dependency Data
1. Text
· Jiawei Han, Heng Ji, and Yizhou Sun. “Successful Data Mining Methods for NLP.” ACL-IJCNLP 2015 (2015). [Tutorial]
2. Image
· Foundations and Trends® in Computer Graphics and Vision, Now Publishers Inc. 2015. http://www.nowpublishers.com/CGV/ [Book chapters]
Dependency Data
3. Time Series Data
o Keogh, Eamonn. “Machine Learning in Time Series Databases (and Everything Is a Time Series).” AAAI’10. http://www.cs.ucr.edu/~eamonn/tutorials.html [Tutorial]
4. Sequence Data
o Mabroukeh, Nizar R., and Christie I. Ezeife. “A taxonomy of sequential pattern mining algorithms.” ACM Computing Surveys (CSUR) 43.1 (2010): 3. [Survey]
5. Dynamic/Streaming Data
o Hans-Peter Kriegel, Irene Ntoutsi, Myra Spiliopoulou, Grigorios Tsoumakas, and Arthur Zimek. “Mining Complex Dynamic Data.” ECML-PKDD 2011. [Tutorial]
6. Graph/Network Data
o Getoor, Lise, and Christopher P. Diehl. “Link Mining: a Survey.” ACM SIGKDD Explorations Newsletter 7.2 (2005): 3-12. [Survey]
o Shamanth Kumar, Fred Morstatter, and Huan Liu. “Analyzing Twitter Data.” Twitter Data Analytics. Springer New York, 2014. 35-48.
7. Social Data
o Mohammad Ali Abbasi, Huan Liu, and Reza Zafarani. Social Media Mining: Fundamental Issues and Challenges. ICDM’13 [Tutorial] http://ecs.syr.edu/faculty/reza/tutorials/ICDM13/TutorialICDM13SMM.pdf
o Jiebo Luo and Tao Mei. Social Multimedia as Sensors. ICDM’14 [Tutorial] http://icdm2014.sfu.ca/program_tutorials.html
8. Spatial and Spatial-Temporal Data
o Aggarwal, Charu C. Chapter 16: Mining Spatial Data, Data mining: The textbook. Springer, 2015. [Book chapter]
9. Multimedia
o Deng, Li, and D. Yu. “Foundations and Trends in Signal Processing.” Signal Processing 7 (2014): 3-4. [Survey]
10. Multi-modularity
o Sun, Shiliang. “A survey of multi-view machine learning.” Neural Computing and Applications 23.7-8 (2013): 2031-2038. [Survey]
Publicly Available Resources
Text Data
o New York Times Annotated Corpus https://catalog.ldc.upenn.edu/LDC2008T19
o 20 Newsgroups Dataset http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.html
o UCI Reuters-21578 Text Categorization Collection
Image Data
o ImageNET http://image-net.org/
Time Series Data
o TREC 2013/2014 Temporal Summarization http://trec.nist.gov/data/tempsumm.html
o UCI Machine Learning Repository (UCI) Synthetic Control Chart Time Series
Sequence Data
o UCI Molecular Biology
Dynamic/Streaming Data
o UCI Synthetic Control Chart Time Series & Pseudo Periodic Synthetic Time Series
Graph/Network Data
o AMiner Citation Network Dataset https://aminer.org/citation
o Stanford Large Network Dataset Collection https://snap.stanford.edu/data/
Social Data
o Social Computing Data Repository at ASU http://socialcomputing.asu.edu/pages/datasets
o MIRFlickr Retrieval Evaluation Dataset http://press.liacs.nl/mirflickr/
Spatial Data
o GDELT Project http://gdeltproject.org/
o UCI Connect-4
Spatio-Temporal Data
o Microsoft Urban Computing Dataset http://research.microsoft.com/en-us/people/yuzheng/#Datasets
o UCI El Nino
Video Data
o TRECVID ’01-’15 http://trecvid.nist.gov/
Audio Data
o Aurora: Timit with noise and additional information http://aurora.hsnr.de/index-2.html
o TIMIT Speech Corpus http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S1
Multi-Modularity Data
o UCSD SVCL Cross Modal Dataset http://www.svcl.ucsd.edu/projects/crossmodal/
| Title & Authors |
|---|
| A Subsequence Interleaving Model for Sequential Pattern Mining Author(s): Jaroslav Fowkes*, University of Edinburgh; Charles Sutton, University of Edinburgh |
| A Real Linear and Parallel Multiple Longest Common Subsequences (MLCS) Algorithm Author(s): Yanni Li, Xidian University; Hui Li*, Xidian University; Tihua Duan, Shanghai Finance University; Sheng Wang, Coventry University; Zhi Wang, Xidian University; Yang Cheng, Xidian University |
| Improving Survey Aggregation with Sparsely Represented Signals Author(s): Tianlin Shi, Stanford University; Forest Agostinelli*, Univ of California - Irvine; Matthew Staib, MIT; David Wipf, Microsoft Research; Thomas Moscibroda, Microsoft Research |
| CatchTartan: Representing and Summarizing Dynamic Multicontextual Behaviors Author(s): Meng Jiang*, UIUC; Christos Faloutsos, Carnegie Mellon University; Jiawei Han, University of Illinois at Urbana-Champaign |
| Structural Deep Network Embedding Author(s): DAIXIN WANG*, TSINGHUA UNIVERSITY; Peng Cui, Tsinghua University; Wenwu Zhu, Tsinghua University |
| Predicting Socio-Economic Indicators using News Events Author(s): Sunandan Chakraborty*, NYU; Ashwin Venkataraman, New York University; Srikanth Jagabathula, New York University; Lakshminarayanan Subramanian, New York University |
| Probabilistic Robust Route Recovery with Spatio-Temporal Dynamics Author(s): Hao Wu, Fudan University; Jiangyun Mao, Fudan University; Weiwei Sun*, Fudan University; Baihua Zheng, Singapore Management University; Hanyuan Zhang, Fudan University; Ziyang Chen, Fudan University; Wei Wang, Fudan University |
| Unified Point-of-Interest Recommendation with Temporal Interval Assessment Author(s): Yanchi Liu*, Rutgers University; Chuanren Liu, Drexel University; Bin Liu, Rutgers University; Meng Qu, Rutgers University; Hui Xiong, Rutgers |
| Graph Wavelets via Sparse Cuts Author(s): Arlei Lopes da Silva*, UC, Santa Barbara; Xuan-Hong Dang, UCSB; Prithwish Basu, Raytheon BBN; Ambuj Singh, UCSB; Ananthram Swami, Army Lab |
| Beyond Sigmoids: the NetTide Model for Social Network Growth, and its Applications Author(s): Chengxi Zang*, Tsinghua University; Peng Cui, Tsinghua University; Christos Faloutsos, Carnegie Mellon University |
| Diversified Temporal Subgraph Pattern Mining Author(s): Yi Yang, Fudan University; Da Yan, CUHK; Huanhuan Wu, CUHK; James Cheng*, CUHK; Shuigeng Zhou, Fudan University; John C.S. Lui, The Chinese University of Hong Kong |
| GMove: Group-Level Mobility Modeling using Geo-Tagged Social Media Author(s): Chao Zhang*, UIUC; Keyang Zhang, ; Quan Yuan, University of Illinois Urbana-; Luming Zhang, ; Tim Hanratty, ; Jiawei Han, University of Illinois at Urbana-Champaign |
| Topic Modeling of Short Texts: A Pseudo-Document View Author(s): Yuan Zuo*, Beihang University; Junjie Wu, ; Has Lin, ; Hui Xiong, Rutgers |
| Predicting Matchups and Preferences in Context Author(s): Shuo Chen*, Cornell; Thorsten Joachims, Cornell University |
| FRAUDAR: Bounding Graph Fraud in the Face of Camouflage Author(s): Bryan Hooi*, Carnegie Mellon University; Hyun Ah Song, Carnegie Mellon University; Alex Beutel, Carnegie Mellon University; Neil Shah, Carnegie Mellon University; Kijung Shin, Carnegie Mellon University; Christos Faloutsos, Carnegie Mellon University |
| Asymmetric Transitivity Preserving Graph Embedding Author(s): Mingdong Ou*, Tsinghua University; Peng Cui, Tsinghua University; Jian Pei, Simon Fraser University; Wenwu Zhu, Tsinghua University |
| Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label Embedding Author(s): Xiang Ren*, UIUC; Wenqi He, UIUC; Meng Qu, UIUC; Heng Ji, PRI; Clare Voss, ARL; Jiawei Han, University of Illinois at Urbana-Champaign |
| Absolute Fused Lasso and Its Application to Genome-Wide Association Studies Author(s): Tao Yang*, Arizona State University; Jun Liu, SAS Institute Inc.; Pinghua Gong, University of Michigan; Ruiwen Zhang, SAS Institute Inc.; Xiaotong Shen, University of Minnesota; Jieping Ye, University of Michigan at Ann Arbor |
| Unbounded Human Learning: Optimal Scheduling for Spaced Repetition Author(s): Siddharth Reddy*, Cornell University; Igor Labutov, Cornell University; Siddhartha Banerjee, Cornell University; Thorsten Joachims, Cornell University |
| Recurrent Marked Temporal Point Processes: Embedding Event History to Vector Author(s): NAN DU*, GEORGIA TECH; Hanjun Dai, ; Rakshit Trivedi, ; Utkarsh Upadhyay, Max Planck Institute; Manuel Gomez-Rodriguez, MPI-SWS; Le Song, |
| Latent Space Model for Road Networks to Predict Time-Varying Traffic Author(s): Dingxiong Deng*, USC; Cyrus Shahabi, USC; Ugur Demiryurek, ; Linhong Zhu, ; Rose Yu, University of Southern Cal; Yan Liu, |
| Squish: Near-Optimal Compression for Archival of Relational Datasets Author(s): Yihan Gao*, University of Illinois; Aditya Parameswaran, |
| Mining Subgroups with Exceptional Transition Behavior Author(s): Florian Lemmerich*, Gesis; Martin Becker, University of Würzburg; Philipp Singer, Gesis; Denis Helic, TU Graz; Andreas Hotho, University of Wuerzburg; Markus Strohmaier, |
| Finding Gangs in War from Signed Networks Author(s): Lingyang Chu*, Simon Fraser University; Zhefeng Wang, University of Science and Technology of China; Jian Pei, Simon Fraser University; Jiannan Wang, Simon Fraser University; Zijin Zhao, Simon Fraser University; Enhong Chen, |
| Compact and Scalable Graph Neighborhood Sketching Author(s): Takuya Akiba*, NII; Yosuke Yano, National Institute of Informatics |
| FINAL: Fast Attributed Network Alignment Author(s): Si Zhang*, Arizona State University; Hanghang Tong, Arizona State University |
| Efficient Shift-Invariant Dictionary Learning Author(s): Guoqing Zheng*, Carnegie Mellon University; Yiming Yang, ; Jaime Carbonell, |
| Lexis: An Optimization Framework for Discovering the Hierarchical Structure of Sequential Data Author(s): Payam Siyari*, Georgia Institute of Technology; Bistra Dilkina, Georgia Tech; Constantine Dovrolis, Georgia Institute of Technology |
| Transfer Knowledge between Cities Author(s): Ying Wei*, Hong Kong Univ. of Sci. & Tech; Yu Zheng, Microsoft Research; Qiang Yang, HKUST |
| Burstiness Scale: a highly parsimonious model for characterizing random series of events Author(s): Rodrigo Alves*, CEFET-MG; Renato Assunção, DCC-UFMG; Pedro O.S. Vaz de Melo, DCC-UFMG |
| Structural Neighborhood based Classification of Nodes in a Network Author(s): Sharad Nandanwar*, Indian Institute of Science; Musti Narasimha Murty, Indian Institute of Science |
| DeepIntent: Learning Attentions for Online Advertising with Recurrent Neural Networks Author(s): Shuangfei Zhai*, Binghamton University; Keng-hao Chang, Microsoft; Ruofei Zhang, Microsoft; Zhongfei Zhang, |
| Point-of-Interest Recommendations: Learning Potential Check-ins from Friends Author(s): Yong Ge, UNC Charlotte; Huayu Li*, University of North Carolina a; Hengshu Zhu, Baidu Inc. |
| QUINT: On Query-Specific Optimal Networks Author(s): Liangyue Li*, Arizona State University; Yuan Yao, Nanjing University; Jie Tang, Tsinghua University; Wei Fan, Baidu; Hanghang Tong, Arizona State University |
| Temporal Order-based First-Take-All Hashing for Fast Attention-Deficit-Hyperactive-Disorder Detectio Author(s): Hao Hu, University of Central Florida; Joey Velez-Ginorio, University of Central Florida; Guojun Qi*, University of Central Florida |
| Regime Shifts in Streams: Real-time Forecasting of Co-evolving Time Sequences Author(s): Yasuko Matsubara*, Kumamoto University; Yasushi Sakurai, Kumamoto University |
| Dynamics of Large Multi-View Social Networks: Synergy, Cannibalization and Cross-View Interplay Author(s): Yu Shi*, UIUC; Myunghwan Kim, LinkedIn Corporation; Shaunak Chatterjee, LinkedIn Corporation; Mitul Tiwari, LinkedIn Corporation; Souvik Ghosh, LinkedIn; Romer Rosales, LinkedIn |
| Keeping it Short and Simple: Summarising Complex Event Sequences with Multivariate Patterns Author(s): Roel Bertens*, Universiteit Utrecht; Jilles Vreeken, Max-Planck Institute for Informatics and Saarland University; Arno Siebes, |
| Ranking Causal Anomalies via Temporal and Dynamical Analysis on Vanishing Correlations Author(s): Wei Cheng*, NEC Labs America; Kai Zhang, NEC labs America; Haifeng Chen, NEC Research Lab; Guofei Jiang, NEC labs America; Wei Wang, UC Los Angeles |
| Multi-layer Representation Learning for Medical Concepts Author(s): Edward Choi*, Georgia Institute of Technolog; Mohammad Bahadori, Georgia Institute of Technology; Jimeng Sun, Georgia Institute of Technology |
| Efficient Processing of Network Proximity Queries via Chebyshev Acceleration Author(s): Mustafa Coskun*, Case Western University; Ananth Grama, ; Mehmet Koyuturk, |
| PTE: Enumerating Trillion Triangles On Distributed Systems Author(s): Ha-Myung Park*, KAIST; Sung-Hyon Myaeng, KAIST; U Kang, Seoul National University |
| Rebalancing Bike Sharing Systems: A Multi-source Data Smart Optimization Author(s): Junming Liu, Rutgers University; Leilei Sun, ; Hui Xiong*, Rutgers; Weiwei Chen, |
| MANTRA: A Scalable Approach to Mining Temporally Anomalous Sub-trajectories Author(s): Prithu Banerjee*, UBC; Pranali Yawalkar, IIT Madras; Sayan Ranu, IIT Madras |
| ABRA: Approximating Betweenness Centrality in Static and Dynamic Graphs with Rademacher Averages Author(s): Matteo Riondato*, Two Sigma Investments; Eli Upfal, Brown University |
| Distributing the Stochastic Gradient Sampler for Large-Scale LDA Author(s): Yuan Yang*, Beihang University; Jianfei Chen, Tsinghua University; Jun Zhu, |
| Data-driven Automatic Treatment Regimen Development and Recommendation Author(s): Leilei Sun*, Dalian University of Technolog; Chuanren Liu, Drexel University; Chonghui Guo, ; Hui Xiong, Rutgers; Yanming Xie, |
| Taxi Driving Behavior Analysis in Latent Vehicle-to-Vehicle Networks: A Social Influence Perspective Author(s): Tong Xu*, USTC; Hengshu Zhu, Baidu Inc.; Xiangyu Zhao, USTC; Hao Zhong, Rutgers University; Qi Liu, University of Science and Technology of China; Enhong Chen, ; Hui Xiong, Rutgers |
| TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fixed Memory Size Author(s): Lorenzo De Stefani*, Brown University; Alessandro Epasto, Brown; Matteo Riondato, Two Sigma Investments; Eli Upfal, Brown University |
| Semi-Markov Switching Vector Autoregressive Model-based Anomaly Detection in Aviation Systems Author(s): Igor Melnyk*, University of Minnesota; Arindam Banerjee, University of Minnesota; Bryan Matthews, Nasa Ames Research Center; Nikunj Oza, Nasa Ames Research Center |
| Fast Memory-efficient Anomaly Detection in Streaming Heterogeneous Graphs Author(s): Emaad Manzoor, Stony Brook University; Leman Akoglu*, SUNY Stony Brook |
| Inferring Network Effects from Observational Data Author(s): David Arbour*, University of Massachusetts Am; Dan Garant, University of Massachusetts Amherst; David Jensen, UMass Amherst |
| City-Scale Map Creation and Updating using GPS Collections Author(s): Chen Chen*, Stanford University; Cewu Lu, Stanford University; Qixing Huang, Stanford University; Dimitrios Gunopulos, ; Leonidas Guibas, Stanford University; Qiang Yang, HKUST |
| Modeling Precursors for Event Forecasting via Nested Multi-Instance Learning Author(s): Yue Ning*, Virginia Tech; Sathappan Muthiah, Virginia Tech; Huzefa Rangwala, George Mason University; Naren Ramakrishnan, Virginia Tech |
| Smart Reply: Automated Response Suggestion for Email Author(s): Anjuli Kannan, ; Karol Kurach*, Google; Sujith Ravi, Google; Tobias Kaufmann, Google, Inc.; Andrew Tomkins, ; Balint Miklos, Google, Inc.; Greg Corrado, ; László Lukács, ; Marina Ganea, ; Peter Young, ; Vivek Ramavajjala |
Curated by: Eamonn Keogh
It is in the nature of humans to measure things, and (with rare exceptions) things change over time. A familiar example is a heartbeat, which represents the change in the heart's electrical activity. A collection of such temporal measurements is called a “time series”. Other familiar examples include a politician’s popularity waxing and waning, or the temperature rising and falling over the short term (each day), the medium term (each year), and the long term (climate change drift).
Because such data is ubiquitous, touching almost every aspect of human life, data mining researchers have long paid significant attention to time series. One paradox of time series is that we do not typically care about the individual values in a time series, but only about the shapes, trends and patterns. Therefore, one of the most basic operations one can perform with time series is to ask “are there any other patterns in this dataset that look like this pattern?”. This task is called similarity search (or query-by-content). There are two challenges in doing this: how can we do it fast, given that the database may be massive, and how can we do it right, given that the patterns may match according to the human eye, but not be exactly the same. Perhaps the first paper to consider this problem was [a], written about 25 years ago. Since then, there have been thousands of papers on the topic, including dozens that have appeared in SIGKDD.
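A brute-force version of similarity search fits in a few lines. The sketch below slides the query across the series and compares z-normalized windows with Euclidean distance, so that patterns with the same shape but a different offset or amplitude still match. The function names and toy data are assumptions for illustration. Z-normalized Euclidean distance is only one common choice of measure, and this linear scan ignores the speed-ups that the literature focuses on.

```python
import math

def znorm(values):
    """Rescale a window to zero mean and unit variance (flat windows map to zeros)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]

def best_match(series, query):
    """Return (position, distance) of the window in `series` most similar to `query`."""
    m = len(query)
    q = znorm(query)
    best_pos, best_dist = -1, float("inf")
    for i in range(len(series) - m + 1):
        w = znorm(series[i:i + m])
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, w)))
        if dist < best_dist:
            best_pos, best_dist = i, dist
    return best_pos, best_dist

# Hypothetical data: the query is a peak ten times taller than the one in the series.
series = [0, 0, 0, 1, 3, 1, 0, 0, 5, 4, 3, 0]
query = [10, 30, 10]
pos, dist = best_match(series, query)
print(pos, dist)
```

The scan costs time proportional to the series length times the query length, which is why indexing, lower bounding, and early abandoning matter at the scale of massive databases.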
By any standard, the KDD community has made great progress on this problem; early papers searched datasets with only a few thousand objects, while more recent papers have conducted searches on datasets with up to a trillion objects [b]. Moreover, these ideas have been used to support research in biology, neuroscience, social media, robotics, music and medicine. Similarity search requires that we know what patterns are interesting in advance. A significant advance in time series data mining is the introduction of time series motifs [c]. Time series motifs are previously unknown patterns that reoccur in the data. If such patterns repeat, we can assume they are conserved for some reason, and use that observation as a starting point for further research. While these time series data mining technologies may seem obscure, with the advent of wearable devices (smartwatches, Fitbit, smartphones, etc.) you probably have had your gestures/behaviors classified by one of these algorithms.
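Motif discovery can be sketched as a closest-pair search over subsequences. The brute-force version below compares every pair of non-overlapping, z-normalized windows of a fixed length and returns the closest pair. The function names, the fixed window length, and the toy data are assumptions for illustration. It takes time quadratic in the series length and is not the exact algorithm of [c], which is designed to avoid most of these comparisons.

```python
import math

def znorm(values):
    """Rescale a window to zero mean and unit variance (flat windows map to zeros)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]

def find_motif(series, m):
    """Return (distance, i, j) for the closest pair of non-overlapping windows of length m."""
    n = len(series) - m + 1
    windows = [znorm(series[i:i + m]) for i in range(n)]
    best = (float("inf"), -1, -1)
    for i in range(n):
        # Start at i + m so a window is never matched with an overlapping copy of itself.
        for j in range(i + m, n):
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(windows[i], windows[j])))
            if dist < best[0]:
                best = (dist, i, j)
    return best

# Hypothetical data: the pattern 1, 4, 2, 8 occurs at positions 2 and 10.
series = [0, 5, 1, 4, 2, 8, 3, 3, 9, 0, 1, 4, 2, 8, 6, 2]
motif_dist, first, second = find_motif(series, 4)
print(first, second, motif_dist)
```

Unlike similarity search, no query is supplied: the repeated pattern is found from the data alone.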
Further Resources:
To allow researchers to test and compare time series data mining algorithms, there is a large collection of datasets at the UCR Time Series Classification Archive www.cs.ucr.edu/~eamonn/time_series_data/
While there is currently no “time series data mining for beginners” book, the more general “Data Mining: The Textbook”, by Charu Aggarwal has an excellent and accessible section on time series.
[a] Rakesh Agrawal, Christos Faloutsos, Arun N. Swami: Efficient Similarity Search In Sequence Databases. FODO 1993: 69-84.
[b] Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen, Gustavo Batista, Brandon Westover, Qiang Zhu, Jesin Zakaria, Eamonn Keogh (2012). Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping. SIGKDD 2012.
[c] Abdullah Mueen, Eamonn Keogh, Qiang Zhu, Sydney Cash, Brandon Westover (2009). Exact Discovery of Time Series Motifs. SDM 2009
| Title & Authors |
|---|
| Towards Optimal Cardinality Estimation of Unions and Intersections with Sketches Author(s): Daniel Ting*, Facebook |
| Aircraft Trajectory Prediction made easy with Predictive Analytics Author(s): Samet Ayhan*, University of Maryland; Hanan Samet, University of Maryland |
| Anomaly Detection Using Program Control Flow Graph Mining from Execution Logs Author(s): Animesh Nandi*, IBM Research; Atri Mandal, IBM Research; Shubham Atreja, IIT Kanpur; Gargi Dasgupta, IBM Research; Subhrajit Bhattacharya, IBM Research |
| Temporal Order-based First-Take-All Hashing for Fast Attention-Deficit-Hyperactive-Disorder Detectio Author(s): Hao Hu, University of Central Florida; Joey Velez-Ginorio, University of Central Florida; Guojun Qi*, University of Central Florida |
| Computational Drug Repositioning Using Continuous Self-controlled Case Series Author(s): Zhaobin Kuang, UW-Madison; James Thomson, Morgridge Institute; Michael Caldwell, Marshfield Clinic; Peggy Peissig, ; Ron Stewart, Morgridge Institute; Page David*, University of Wisconsin |
| Lightweight Monitoring of Distributed Streams Author(s): Daniel Keren*, University of Haifa; Assaf Schuster, Technion; Arnon Lazerson, Israeli Institute of technology |
| Smart Reply: Automated Response Suggestion for Email Author(s): Anjuli Kannan, ; Karol Kurach*, Google; Sujith Ravi, Google; Tobias Kaufmann, Google, Inc.; Andrew Tomkins, ; Balint Miklos, Google, Inc.; Greg Corrado, ; László Lukács, ; Marina Ganea, ; Peter Young, ; Vivek Ramavajjala |
| Taxi Driving Behavior Analysis in Latent Vehicle-to-Vehicle Networks: A Social Influence Perspective Author(s): Tong Xu*, USTC; Hengshu Zhu, Baidu Inc.; Xiangyu Zhao, USTC; Hao Zhong, Rutgers University; Qi Liu, University of Science and Technology of China; Enhong Chen, ; Hui Xiong, Rutgers |
| TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fixed Memory Size Author(s): Lorenzo De Stefani*, Brown University; Alessandro Epasto, Brown; Matteo Riondato, Two Sigma Investments; Eli Upfal, Brown University |
| Recurrent Marked Temporal Point Processes: Embedding Event History to Vector Author(s): NAN DU*, GEORGIA TECH; Hanjun Dai, ; Rakshit Trivedi, ; Utkarsh Upadhyay, Max Planck Institute; Manuel Gomez-Rodriguez, MPI-SWS; Le Song, |
| Online Asymmetric Active Learning with Imbalanced Data Author(s): Xiaoxuan Zhang*, University of Iowa; Tianbao Yang, Univ of Iowa; Padmini Srinivasan, University of Iowa |
| MANTRA: A Scalable Approach to Mining Temporally Anomalous Sub-trajectories Author(s): Prithu Banerjee*, UBC; Pranali Yawalkar, IIT Madras; Sayan Ranu, IIT Madras |
| Improving Survey Aggregation with Sparsely Represented Signals Author(s): Tianlin Shi, Stanford University; Forest Agostinelli*, Univ of California - Irvine; Matthew Staib, MIT; David Wipf, Microsoft Research; Thomas Moscibroda, Microsoft Research |
| Modeling Precursors for Event Forecasting via Nested Multi-Instance Learning Author(s): Yue Ning*, Virginia Tech; Sathappan Muthiah, Virginia Tech; Huzefa Rangwala, George Mason University; Naren Ramakrishnan, Virginia Tech |
| Burstiness Scale: a highly parsimonious model for characterizing random series of events Author(s): Rodrigo Alves*, CEFET-MG; Renato Assunção, DCC-UFMG; Pedro O.S. Vaz de Melo, DCC-UFMG |
| Efficient Frequent Directions Algorithm for Sparse Matrices Author(s): Mina Ghashami*, University of utah; Edo Liberty, Yahoo ; Jeff Phillips, School of Computing, University of Utah |
| Assessing Human Error Against a Benchmark of Perfection Author(s): Ashton Anderson*, Stanford University; Jon Kleinberg, Cornell University; Sendhil Mullainathan, Harvard |
| Data-driven Automatic Treatment Regimen Development and Recommendation Author(s): Leilei Sun*, Dalian University of Technolog; Chuanren Liu, Drexel University; Chonghui Guo, ; Hui Xiong, Rutgers; Yanming Xie, |
| Fast Memory-efficient Anomaly Detection in Streaming Heterogeneous Graphs Author(s): Emaad Manzoor, Stony Brook University; Leman Akoglu*, SUNY Stony Brook |
Curated by: H V Jagadish
The development of massively distributed computing infrastructures has changed the economics of data management, and made it possible to apply sophisticated data distillation and learning methods to datasets of unprecedented scale, diversity, and freshness; a technical and social phenomenon that has been dubbed Big Data. The sheer size of the data, of course, is a major challenge, and is the one that is most easily recognized. However, there are others. Industry analysis companies like to point out that there are challenges not just in Volume, but also in Variety and Velocity [See Gartner Group press release available at http://www.gartner.com/it/page.jsp?id=1731916], and that companies should not focus on just the first of these. Variety refers to heterogeneity of data types, representation, and semantic interpretation. Velocity denotes both the rate at which data arrive and the time frame in which they must be acted upon. While these three are important, this short list fails to include additional important requirements. Several additions have been proposed by various parties, such as Veracity. Other concerns, such as privacy and usability, still remain.
The analysis of Big Data is an iterative process that involves many distinct phases, each with its own challenges. An excellent overview is available in a community whitepaper hosted at http://cra.org/ccc/docs/init/bigdatawhitepaper.pdf. A few dozen papers, chosen on account of their coverage and importance, have been collected at http://db.cs.pitt.edu/bigdata/resources .
The papers at this KDD conference on this topic do not disappoint in the breadth of questions asked, from the efficiency of algorithms to trust in results. Enjoy!
| Title & Authors |
|---|
| Positive-Unlabeled Learning in Streaming Networks Author(s): Shiyu Chang*, UIUC; Yang Zhang, UIUC; Jiliang Tang, Yahoo Labs; Dawei Yin, ; Yi Chang, Yahoo! Labs; Mark Hasegawa-Johnson, UIUC; Thomas Huang, UIUC |
| Fast Memory-efficient Anomaly Detection in Streaming Heterogeneous Graphs Author(s): Emaad Manzoor, Stony Brook University; Leman Akoglu*, SUNY Stony Brook |
| Multi-layer Representation Learning for Medical Concepts Author(s): Edward Choi*, Georgia Institute of Technolog; Mohammad Bahadori, Georgia Institute of Technology; Jimeng Sun, Georgia Institute of Technology |
| Fast Component Pursuit for Large-Scale Inverse Covariance Estimation Author(s): Lei Han*, Rutgers University; Yu Zhang, Hong Kong University of Science and Technology; Tong Zhang, Rutgers University |
| Crime Rate Inference with Big Data Author(s): Hongjian Wang*, Penn State University; Zhenhui Li, Penn State Univ; Daniel Kifer, PSU; Corina Graif, Penn state university |
| Communication Efficient Distributed Kernel Principal Component Analysis Author(s): Yingyu Liang*, Princeton University; Bo Xie, ; David Woodruff, IBM Research; Le Song, ; Maria-Florina Balcan, |
| Lightweight Monitoring of Distributed Streams Author(s): Daniel Keren*, University of Haifa; Assaf Schuster, Technion; Arnon Lazerson, Israeli Institute of technology |
| FLASH: Fast Bayesian Optimization for Data Analytic Pipelines Author(s): Yuyu Zhang*, Georgia Institute of Technolog; Mohammad Bahadori, Georgia Institute of Technology; Hang Su, Georgia Institute of Technology; Jimeng Sun, Georgia Institute of Technology |
| Scalable Partial Least Squares Regression on Grammar-Compressed Data Matrices Author(s): Yasuo Tabei*, JST; Hiroto Saigo, Kyushu Institute of Technology; Yoshihiro Yamanishi, Kyushu University; Simon Puglisi, Helsinki University |
| Evaluating Mobile App Release Author(s): Ya Xu*, LinkedIn Corporation; Nanyu Chen, LinkedIn Corporation |
| Modeling Precursors for Event Forecasting via Nested Multi-Instance Learning Author(s): Yue Ning*, Virginia Tech; Sathappan Muthiah, Virginia Tech; Huzefa Rangwala, George Mason University; Naren Ramakrishnan, Virginia Tech |
| Robust Influence Maximization Author(s): Wei Chen, Microsoft Research; Tian Lin*, Tsinghua University; Zihan Tan, IIIS, Tsinghua University; Mingfei Zhao, IIIS, Tsinghua University; Xuren Zhou, The Hong Kong University of Science and Technology |
| PTE: Enumerating Trillion Triangles On Distributed Systems Author(s): Ha-Myung Park*, KAIST; Sung-Hyon Myaeng, KAIST; U Kang, Seoul National University |
| Scalable Pattern Matching over Compressed Graphs via Dedensification Author(s): Antonio Maccioni*, Roma Tre University; Daniel Abadi, Yale University |
| Scalable Betweenness Centrality Maximization via Sampling Author(s): Ahmad Mahmoody*, Brown University; Eli Upfal, Brown University; Charalampos Tsourakakis, Harvard |
| Scalable Fast Rank-1 Dictionary Learning for fMRI Big Data Analysis Author(s): Xiang Li*, The University of Georgia; Milad Makkie, ; Binbin Lin, ; Mojtaba Sedigh Fazli, ; Ian Davidson, University of California-Davis; Jieping Ye, ; Tianming Liu, ; Shannon Quinn, |
| Boosted Decision Tree Regression Adjustment for Variance Reduction of Online Controlled Experiments Author(s): Alexey Poyarkov, Yandex; Alexey Drutsa*, Yandex; Andrey Khalyavin, Yandex; Gleb Gusev, Yandex; Pavel Serdyukov, Yandex |
| Revisiting Random Binning Feature: Fast Convergence and Strong Parallelizability Author(s): Lingfei Wu*, College of William and Mary; En-Hsu Yen, University of Texas at Austin; Jie Chen, IBM Research; Rui Yan, Baidu Inc. |
| Robust Large-Scale Machine Learning in the Cloud Author(s): Steffen Rendle*, Google; Dennis Fetterly, Google, Inc.; Eugene Shekita, Google, Inc.; Bor-Yiing Su, Google, Inc. |
| Skinny-dip: Clustering in a Sea of Noise Author(s): Samuel Maurus*, Helmholtz Zentrum München; Claudia Plant |
| ABRA: Approximating Betweenness Centrality in Static and Dynamic Graphs with Rademacher Averages Author(s): Matteo Riondato*, Two Sigma Investments; Eli Upfal, Brown University |
| Deep Visual-Semantic Hashing for Cross-Modal Retrieval Author(s): Yue Cao, Tsinghua university; Mingsheng Long*, Tsinghua University; Jianmin Wang, Tsinghua University; Qiang Yang, HKUST; Philip Yu, UIC |
| Transfer Knowledge between Cities Author(s): Ying Wei*, Hong Kong Univ. of Sci. & Tech; Yu Zheng, Microsoft Research; Qiang Yang, HKUST |
| Singapore in Motion: insights on public transport service level through farecard and mobile data ana Author(s): Hasan Poonawala*, IBM; Vinay Kolar, IBM; Sebastien Blandin, IBM; Laura Wynter, IBM; Sambit Sahu, IBM |
| Deploying Analytics with the Portable Format for Analytics (PFA) Author(s): Jim Pivarski, Open Data Group Inc.; Collin Bennett, Open Data Group Inc.; Robert Grossman*, University of Chicago |
| Recruitment Market Trend Analysis with Sequential Latent Variable Models Author(s): Chen Zhu*, Baidu hr; Hengshu Zhu, Baidu Inc.; Hui Xiong, Rutgers; ding pengliang, ; xie fang, |
| XGBoost: A Scalable Tree Boosting System Author(s): Tianqi Chen*, University of washington; Carlos Guestrin, Dato/Univ of Washington |
| Distributing the Stochastic Gradient Sampler for Large-Scale LDA Author(s): Yuan Yang*, Beihang University; Jianfei Chen, Tsinghua University; Jun Zhu, |
| Learning Cumulatively to Become More Knowledgeable Author(s): Geli Fei*, Univ of Illinois at Chicago; Shuai Wang, Univ of Illinois at Chicago; Bing Liu, Univ of Illinois at Chicago |
| “Why Should I Trust you?” Explaining the Predictions of Any Classifier Author(s): Marco Tulio Ribeiro*, University of Washington; Sameer Singh, University of Washington, Seattle; Carlos Guestrin, Dato/Univ of Washington |
| Parallel Dual Coordinate Descent Method for Large-scale Linear Classification in Multi-core Environm Author(s): Wei-Lin Chiang, National Taiwan University; Mu-Chu Lee, National Taiwan University; Chih-Jen Lin*, National Taiwan University |
| Safe Pattern Pruning: An Efficient Approach for Predictive Pattern Mining Author(s): Kazuya Nakagawa, Nagoya Institute of Technology; Shinya Suzumura, Nagoya Institute of Technology; Masayuki Karasuyama, ; Koji Tsuda, University of Tokyo; Ichiro Takeuchi*, Nagoya Institute of Technology Japan |
| Improving the Sensitivity of Online Controlled Experiments: Case Studies at Netflix Author(s): Huizhi Xie*, Netflix; Juliette Aurisset, Netflix |
| Taxi Driving Behavior Analysis in Latent Vehicle-to-Vehicle Networks: A Social Influence Perspective Author(s): Tong Xu*, USTC; Hengshu Zhu, Baidu Inc.; Xiangyu Zhao, USTC; Hao Zhong, Rutgers University; Qi Liu, University of Science and Technology of China; Enhong Chen, ; Hui Xiong, Rutgers |
| Accelerated Stochastic Block Coordinate Descent with Optimal Sampling Author(s): Aston Zhang*, UIUC; Quanquan Gu, University of Virginia |
| Stochastic Optimization Techniques for Quantification Performance Measures Author(s): Harikrishna Narasimhan, IACS, Harvard University; Shuai Li, University of Insubria; Purushottam Kar*, IIT Kanpur; Sanjay Chawla, QCRI-HBKU, Qatar; Fabrizio Sebastiani, QCRI-HBKU, Qatar |
| Fast Unsupervised Online Drift Detection Using Incremental Kolmogorov-Smirnov Test Author(s): Denis Dos Reis*, Universidade de São Paulo; Gustavo Batista, Universidade de Sao Paulo at Sao Carlos; Peter Flach, University of Bristol; Stan Matwin, Dalhousie University |
| Parallel Lasso Screening for Big Data Optimization Author(s): Qingyang Li*, Arizona State University; Shuang Qiu, Umich; Shuiwang Ji, Washington State University; Jieping Ye, University of Michigan at Ann Arbor; Jie Wang, University of Michigan |
| Compressing Graphs and Indexes with Recursive Graph Bisection Author(s): Laxman Dhulipala, Carnegie Mellon University; Igor Kabiljo, Facebook; Brian Karrer, Facebook; Giuseppe Ottaviano, Facebook; Sergey Pupyrev*, Facebook; Alon Shalita, Facebook |
| GLMix: Generalized Linear Mixed Models For Large-Scale Response Prediction Author(s): XianXing Zhang*, LinkedIn; Bee-Chung Chen, LinkedIn; Liang Zhang, LinkedIn; Yitong Zhou, LinkedIn Corporation; Yiming Ma, LinkedIn; Deepak Agarwal, LinkedIn |
| Data-Driven Metric Development for Online Controlled Experiments: Seven Lessons Learned Author(s): Xiaolin Shi*, Yahoo Labs; Alex Deng, Microsoft |
| Regime Shifts in Streams: Real-time Forecasting of Co-evolving Time Sequences Author(s): Yasuko Matsubara*, Kumamoto University; Yasushi Sakurai, Kumamoto University |
| TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fixed Memory Size Author(s): Lorenzo De Stefani*, Brown University; Alessandro Epasto, Brown; Matteo Riondato, Two Sigma Investments; Eli Upfal, Brown University |
| Online Asymmetric Active Learning with Imbalanced Data Author(s): Xiaoxuan Zhang*, University of Iowa; Tianbao Yang, Univ of Iowa; Padmini Srinivasan, University of Iowa |
| EMBERS at 4 years: Experiences operating an Open Source Indicators Forecasting System Author(s): Sathappan Muthiah*, Virginia Tech; Naren Ramakrishnan, Virginia Tech; Patrick Butler, Virginia Tech; Rupinder Khandpur, Virginia Tech; Parang Saraf, Virginia Tech; Anil Vullikanti, Virginia Tech; Achla Marathe, Virginia Tech; Graham Katz, CACI; Andrew Doyle, CACI; Jaime Arredondo, UCSD; Dipak Gupta, SDSU; David Mares, UCSD; Jose Cadena, Virginia Tech; Liang Zhao, VT; Nathan Self, ; Alla Rozovskaya, Virginia Tech; Kristen Summers, IBM |
| Approximate Personalized PageRank on Dynamic Graphs Author(s): Hongyang Zhang*, Stanford University; Peter Lofgren, Stanford University |
| Annealed Sparsity via Adaptive and Dynamic Shrinking Author(s): Kai Zhang*, NEC labs America; Shandian Shan, Purdue University; Zhengzhang Chen, NEC Lab America; Chaoran Cheng, New Jersey Institute of Technology; Zhi Wei, New Jersey Institute of Technology; Guofei Jiang, NEC labs America; Jieping Ye, |
| Dynamic Clustering of Streaming Short Documents Author(s): Shangsong Liang*, University College London; Emine Yilmaz, University College London; Evangelos Kanoulas, University of Amsterdam |
| Towards Optimal Cardinality Estimation of Unions and Intersections with Sketches Author(s): Daniel Ting*, Facebook |
| Rebalancing Bike Sharing Systems: A Multi-source Data Smart Optimization Author(s): Junming Liu, Rutgers University; Leilei Sun, ; Hui Xiong*, Rutgers; Weiwei Chen, |
| Accelerating Online CP Decompositions for Higher Order Tensors Author(s): Shuo Zhou*, University of melbourne; Nguyen Vinh, University of Melbourne; James Bailey, ; Yunzhe Jia, University of Melbourne; Ian Davidson, University of California-Davis |
| Convex Optimization for Linear Query Processing under Approximate Differential Privacy Author(s): Ganzhao Yuan*, SCUT; Yin Yang, ; Zhenjie Zhang, ; Zhifeng Hao, |
| Efficient Frequent Directions Algorithm for Sparse Matrices Author(s): Mina Ghashami*, University of utah; Edo Liberty, Yahoo ; Jeff Phillips, School of Computing, University of Utah |
| Assessing Human Error Against a Benchmark of Perfection Author(s): Ashton Anderson*, Stanford University; Jon Kleinberg, Cornell University; Sendhil Mullainathan, Harvard |
| Temporal Order-based First-Take-All Hashing for Fast Attention-Deficit-Hyperactive-Disorder Detectio Author(s): Hao Hu, University of Central Florida; Joey Velez-Ginorio, University of Central Florida; Guojun Qi*, University of Central Florida |
| Smart Reply: Automated Response Suggestion for Email Author(s): Anjuli Kannan, ; Karol Kurach*, Google; Sujith Ravi, Google; Tobias Kaufmann, Google, Inc.; Andrew Tomkins, ; Balint Miklos, Google, Inc.; Greg Corrado, ; László Lukács, ; Marina Ganea, ; Peter Young, ; Vivek Ramavajjala |
Curated by: Wei Fan
Deep learning attempts to model high-level abstractions in data by using multiple processing layers, with complex structures or otherwise, composed of multiple non-linear transformations. It is part of a broader family of machine learning methods based on learning representations of data. An observation, for example an image, can be represented in many ways such as a vector of intensity values per pixel, or in a more abstract way as a set of edges, regions of particular shape, etc. Some representations are better than others at simplifying the learning task. One of the promises of deep learning is replacing “handcrafting features” with “crafting architectures” by using efficient algorithms for unsupervised or semi-supervised feature learning and hierarchical feature extraction.
A good introductory deep learning tutorial can be found at http://deeplearning.stanford.edu/
A comprehensive tutorial was given by Hinton, LeCun and Bengio:
http://www.iro.umontreal.ca/~bengioy/talks/DL-Tutorial-NIPS2015.pdf
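The phrase "multiple processing layers composed of multiple non-linear transformations" in the introduction above can be made concrete with a very small sketch. The code below is only an illustration: the layer sizes, the hand-picked weights, and the function names (`dense`, `relu`, `forward`) are assumptions made for this example, and no learning takes place. It shows a raw "pixel" vector being turned into a hidden representation and then into a single score.

```python
import math


def dense(inputs, weights, biases):
    """Affine map: one output value per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Element-wise non-linearity: negative values are clipped to zero."""
    return [max(0.0, v) for v in values]


def sigmoid(value):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def forward(pixels, layers):
    """Apply every layer but the last with a ReLU, then a sigmoid output."""
    hidden = pixels
    for weights, biases in layers[:-1]:
        hidden = relu(dense(hidden, weights, biases))
    weights, biases = layers[-1]
    return sigmoid(dense(hidden, weights, biases)[0])


# Hypothetical weights chosen by hand; a real network would learn them.
layers = [
    ([[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]], [0.0, 0.0]),  # 3 inputs -> 2 hidden units
    ([[1.0, 1.0]], [-0.5]),                              # 2 hidden units -> 1 score
]
score = forward([0.9, 0.1, 0.4], layers)
print(score)
```

Each hidden unit here responds to a difference between neighboring inputs, which is a toy version of the "edges" representation mentioned above; deeper networks stack more such layers.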
| Title & Authors |
|---|
| Towards Robust and Versatile Causal Discovery for Business Applications Author(s): Giorgos Borboudakis*, University of Crete; Ioannis Tsamardinos, |
| Images Don’t Lie: Transferring Deep Visual Semantic Features to Large-Scale Multimodal Learning to R Author(s): Corey Lynch, ; Kamelia Aryafar*, Etsy Inc.; Josh Attenberg, Etsy |
| Large-Scale Item Categorization in e-Commerce Using Multiple Recurrent Neural Networks Author(s): Hyuna Pyo, NAVER LABS; Jung-Woo Ha*, NAVER LABS; Jeonghee Kim, NAVER LABS |
| Causal Clustering for 1-Factor Measurement Models Author(s): Erich Kummerfeld*, University of Pittsburgh; Joseph Ramsey, Carnegie Mellon University |
| Optimal Reserve Prices in Upstream Auctions: Empirical Application on Online Video Advertising Author(s): Miguel Angel Alcobendas Lisbona*, Yahoo Inc; Kuang-chih Lee, Yahoo; Sheide Chammas, Yahoo |
| Inferring Network Effects from Observational Data Author(s): David Arbour*, University of Massachusetts Am; Dan Garant, University of Massachusetts Amherst; David Jensen, UMass Amherst |
| Compressing Convolutional Neural Networks in the Frequency Domain Author(s): Wenlin Chen*, Washington University; James Wilson, University of Edinburgh; Stephen Tyree, NVIDIA; Kilian Weinberger, Cornell; Yixin Chen, |
| Convolutional Neural Networks for Steady Flow Approximation Author(s): Xiaoxiao Guo*, University of Michigan; Wei Li, Autodesk Research; Francesco Iorio, |
| Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features Author(s): Ying Shan*, Microsoft Corporation; Thomas Hoens, Microsoft; Jian Jiao, Microsoft Corporation; Haijing Wang, Microsoft Corporation; Dong Yu, Microsoft Research; JC Mao, Microsoft Corporation |
| Predict Risk of Relapse for Patients with Multiple Stages of Treatment of Depression Author(s): Zhi Nie*, Arizona State University; Pinghua Gong, ; Jieping Ye, University of Michigan at Ann Arbor |
| Interpretable Decision Sets: A Joint Framework for Description and Prediction Author(s): Himabindu Lakkaraju*, Stanford University; Stephen Bach, Stanford University; Jure Leskovec, Stanford University |
| Subjectively Interesting Component Analysis: Data Projections that Contrast with Prior Expectations Author(s): Bo Kang*, Ghent University; Jefrey Lijffijt, Ghent University; Raul Santos-Rodriguez, University of Bristol; Tijl De Bie, Ghent University |
| Robust and Effective Metric Learning Using Capped Trace Norm Author(s): Zhouyuan Huo, University of Texas, Arlington; Feiping Nie, University of Texas at Arlington; Heng Huang*, Univ. of Texas at Arlington |
| A Closed-Loop Approach in Data-Driven Resource Allocation to Improve Network User Experience Author(s): Yanan Bao*, University of California, Davi; Huasen Wu, UC Davis; Xin Liu, UC Davis |
| Multi-Task Feature Interaction Learning Author(s): KAIXIANG LIN*, Michigan State University; Jianpeng Xu, Michigan State University; Shuiwang Ji, Washington State University; Jiayu Zhou, Michigan State University |
| Online Feature Selection: A Limited-Memory Substitution Algorithm and its Asynchronous Parallel Vari Author(s): Haichuan Yang*, University of Rochester; Ryohei Fujimaki, NEC Laboratories America; Yukitaka Kusumura, NEC lab; Ji Liu, University of Rochester |
| Just One More: Modeling Binge Watching Behavior Author(s): William Trouleau, EPFL; Azin Ashkan*, Technicolor; Weicong Ding, Technicolor Research; Brian Eriksson, Technicolor |
Curated by: Christos Faloutsos
Have you ever wondered how Google finds the best page for your question? How would you spot the most important people on Facebook? How would you spot fake followers on Twitter? In a who-contacts-whom network, which are the best nodes to immunize to stop a flu epidemic?
All these problems, and myriad more, use “graph mining” methods. But graph mining is not restricted to social networks: in computer-to-computer communication networks we want to find whether a computer is under cyber-attack (and protect it beforehand); in a user-product review system, we want to find fake reviews; in a prey-predator ecological system, we want to find the most important species, to protect the system from unraveling.
Graph mining uses sophisticated mathematical methods (“linear algebra”, “eigenvalue analysis”, “matrix factorizations”, “tensors”), which pay off spectacularly - Google’s PageRank algorithm being the most obvious example.
Link: http://www.cs.cmu.edu/~christos/TALKS/16-graph-mining-intro-kdd/
For K-12
For CS professionals
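As a rough illustration of the eigenvalue-style computation behind PageRank, mentioned in the introduction above, the sketch below runs a plain power iteration on a tiny hypothetical link graph. The graph `web`, the damping factor of 0.85, the iteration count, and the uniform handling of nodes without out-links are all assumptions made for this example; it is not a description of any production system.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration for PageRank.

    `links` maps each node to the list of nodes it points to.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the "random jump" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: an arrow means "links to".
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(value, 3))
```

Page `c` collects links from every other page and ends up with the highest score, while `d`, which nobody links to, keeps only the random-jump share.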
| Title & Authors |
|---|
| Meta Structure: Computing Relevance in Large Heterogeneous Information Networks Author(s): Zhipeng Huang*, University of Hong Kong; Yudian Zheng, The University of Hong Kong; Reynold Cheng, ; Yizhou Sun, Northeastern Univ; Nikos Mamoulis, ; Xiang Li, The University of Hong Kong |
| Efficient Processing of Network Proximity Queries via Chebyshev Acceleration Author(s): Mustafa Coskun*, Case Western University; Ananth Grama, ; Mehmet Koyuturk, |
| Smart broadcasting: Do you want to be seen? Author(s): Erfan Tavakoli, Sharif University; Mohammad Reza Karimi, Sharif University; Mehrdad Farajtabar, Georgia Tech; Le Song, ; Manuel Gomez-Rodriguez*, MPI-SWS |
| FASCINATE: Fast Cross-Layer Dependency Inference on Multi-layered Networks Author(s): Chen Chen*, Arizona State University; Hanghang Tong, Arizona State University; Lei Xie, City University of New York; Lei Ying, Arizona State University; Qing He, Arizona State University |
| A Truth Discovery Approach with Theoretical Guarantee Author(s): Houping Xiao*, SUNY Buffalo; Jing Gao, ; Zhaoran Wang, Princeton University; Shiyu Wang, UIUC; Lu Su, SUNY Buffalo; Han Liu, Princeton University |
| Finding Gangs in War from Signed Networks Author(s): Lingyang Chu*, Simon Fraser University; Zhefeng Wang, University of Science and Technology of China; Jian Pei, Simon Fraser University; Jiannan Wang, Simon Fraser University; Zijin Zhao, Simon Fraser University; Enhong Chen, |
| How to Compete Online for News Audience: Modeling Words that Attract Clicks Author(s): Joon Hee Kim*, KAIST; Amin Mantrach, Yahoo! Research; Alex Jaimes, Yahoo!; Alice Oh, Korea Advanced Institute of Science and Technology |
| Come-and-Go Patterns of Group Evolution: A Dynamic Model Author(s): Tianyang Zhang*, Tsinghua University; Peng Cui, Tsinghua University; Christos Faloutsos, Carnegie Mellon University; Wenwu Zhu, Tsinghua University; Shiqiang Yang, |
| Compact and Scalable Graph Neighborhood Sketching Author(s): Takuya Akiba*, NII; Yosuke Yano, National Institute of Informatics |
| FINAL: Fast Attributed Network Alignment Author(s): Si Zhang*, Arizona State University; Hanghang Tong, Arizona State University |
| Reconstructing an Epidemic over Time Author(s): Polina Rozenshtein, Aalto University; Aristides Gionis*, Aalto University; B. Aditya Prakash, Virginia Tech; Jilles Vreeken, Max-Planck Institute for Informatics and Saarland University |
| Joint Community and Structural Hole Spanner Detection via Harmonic Modularity Author(s): Lifang He*, ; CHUN-TA LU, UIC; Jiaqi Ma, Tsinghua University; Jianping Cao, NUDT; Linlin Shen, ; Philip S. Yu, UI Chicago |
| When Social Influence Meets Item Inference Author(s): Hui-Ju Hung, Pennsylvania State University; Hong-Han Shuai, Academia Sinica; De-Nian Yang*, Academia Sinica; Liang-Hao Huang, Academia Sinica; Wang-Chien Lee, The Pennsylvania State University; Jian Pei, Simon Fraser University; Ming-Syan Chen, National Taiwan University |
| Burstiness Scale: a highly parsimonious model for characterizing random series of events Author(s): Rodrigo Alves*, CEFET-MG; Renato Assunção, DCC-UFMG; Pedro O.S. Vaz de Melo, DCC-UFMG |
| Engagement Capacity and Engaging Team Formation for Reach Maximization of Online Social Media Platfo Author(s): Alexander Nikolaev*, University at Buffalo; Shounak Gore, University at Buffalo; Venu Govindaraju, University at Buffalo |
| Sampling of Attributed Networks From Hierarchical Generative Models Author(s): Pablo Robles Granda*, Purdue University; Sebastian Moreno, ; Jennifer Neville, Purdue |
| node2vec: Scalable Feature Learning for Networks Author(s): Aditya Grover*, Stanford University; Jure Leskovec, Stanford University |
| Structural Neighborhood based Classification of Nodes in a Network Author(s): Sharad Nandanwar*, Indian Institute of Science; Musti Narasimha Murty, Indian Institute of Science |
| Talent Circle Detection in Job Transition Networks Author(s): Huang Xu*, Northwestern Polytechnical Uni; Jingyuan Yang, Rutgers University; zhi wen Yu, ; Hui Xiong, Rutgers; Hengshu Zhu, Baidu Inc. |
| User Identity Linkage by Latent User Space Modelling Author(s): Xin Mu*, Nanjing University; Feida Zhu, Singapore Management Univ.; Zhi-Hua Zhou, ; Ee-Peng Lim, Singapore Management University; Jing Xiao, ; Jianzong Wang, |
| Identifying Decision Makers from Professional Social Networks Author(s): Shipeng Yu*, LinkedIn; Evangelia Christakopoulou, University of Minnesota; Abhishek Gupta, LinkedIn |
| Robust Influence Maximization Author(s): Xinran He*, University of Southern California; David Kempe, University of Southern California |
| QUINT: On Query-Specific Optimal Networks Author(s): Liangyue Li*, Arizona State University; Yuan Yao, Nanjing University; Jie Tang, Tsinghua University; Wei Fan, Baidu; Hanghang Tong, Arizona State University |
| Approximate Personalized PageRank on Dynamic Graphs Author(s): Hongyang Zhang*, Stanford University; Peter Lofgren, Stanford University |
| A multiple test correction for streams and cascades of statistical hypothesis tests Author(s): Geoff Webb*, Monash University; Francois Petitjean, Monash |
| Dynamics of Large Multi-View Social Networks: Synergy, Cannibalization and Cross-View Interplay Author(s): Yu Shi*, UIUC; Myunghwan Kim, LinkedIn Corporation; Shaunak Chatterjee, LinkedIn Corporation; Mitul Tiwari, LinkedIn Corporation; Souvik Ghosh, LinkedIn; Romer Rosales, LinkedIn |
| Kam1n0: MapReduce-based Assembly Clone Search for Reverse Engineering Author(s): Steven H. H. Ding, McGill University; Benjamin C. M. Fung*, McGill University; Philippe Charland, Defence Research and Development Canada |
| Ranking Universities Based on Career Outcomes of Graduates Author(s): Navneet Kapur, GoFundMe; Nikita Lytkin, LinkedIn Corporation; Bee-Chung Chen, LinkedIn Corporation; Deepak Agarwal, LinkedIn Corporation; Igor Perisic, LinkedIn Corporation |
| The Limits of Popularity-Based Recommendations, and the Role of Social Ties Author(s): Marco Bressan*, Sapienza University of Rome; Stefano Leucci, Sapienza University of Rome; Alessandro Panconesi, Sapienza University of Rome; Prabhakar Raghavan, Google; Erisa Terolli, Sapienza University of Rome |
| Robust Influence Maximization Author(s): Wei Chen, Microsoft Research; Tian Lin*, Tsinghua University; Zihan Tan, IIIS, Tsinghua University; Mingfei Zhao, IIIS, Tsinghua University; Xuren Zhou, The Hong Kong University of Science and Technology |
| PTE: Enumerating Trillion Triangles On Distributed Systems Author(s): Ha-Myung Park*, KAIST; Sung-Hyon Myaeng, KAIST; U Kang, Seoul National University |
| Scalable Betweenness Centrality Maximization via Sampling Author(s): Ahmad Mahmoody*, Brown University; Eli Upfal, Brown University; Charalampos Tsourakakis, Harvard |
| ABRA: Approximating Betweenness Centrality in Static and Dynamic Graphs with Rademacher Averages Author(s): Matteo Riondato*, Two Sigma Investments; Eli Upfal, Brown University |
| CatchTartan: Representing and Summarizing Dynamic Multicontextual Behaviors Author(s): Meng Jiang*, UIUC; Christos Faloutsos, Carnegie Mellon University; Jiawei Han, University of Illinois at Urbana-Champaign |
| Structural Deep Network Embedding Author(s): DAIXIN WANG*, TSINGHUA UNIVERSITY; Peng Cui, Tsinghua University; Wenwu Zhu, Tsinghua University |
| Graph Wavelets via Sparse Cuts Author(s): Arlei Lopes da Silva*, UC, Santa Barbara; Xuan-Hong Dang, UCSB; Prithwish Basu, Raytheon BBN; Ambuj Singh, UCSB; Ananthram Swami, Army Lab |
| Diversified Temporal Subgraph Pattern Mining Author(s): Yi Yang, Fudan University; Da Yan, CUHK; Huanhuan Wu, CUHK; James Cheng*, CUHK; Shuigeng Zhou, Fudan University; John C.S. Lui, The Chinese University of Hong Kong |
| Asymmetric Transitivity Preserving Graph Embedding Author(s): Mingdong Ou*, Tsinghua University; Peng Cui, Tsinghua University; Jian Pei, Simon Fraser University; Wenwu Zhu, Tsinghua University |
| Fast Memory-efficient Anomaly Detection in Streaming Heterogeneous Graphs Author(s): Emaad Manzoor, Stony Brook University; Leman Akoglu*, SUNY Stony Brook |
| Ranking Causal Anomalies via Temporal and Dynamical Analysis on Vanishing Correlations Author(s): Wei Cheng*, NEC Labs America; Kai Zhang, NEC labs America; Haifeng Chen, NEC Research Lab; Guofei Jiang, NEC labs America; Wei Wang, UC Los Angeles |
| Compressing Graphs and Indexes with Recursive Graph Bisection Author(s): Laxman Dhulipala, Carnegie Mellon University; Igor Kabiljo, Facebook; Brian Karrer, Facebook; Giuseppe Ottaviano, Facebook; Sergey Pupyrev*, Facebook; Alon Shalita, Facebook |
| Positive-Unlabeled Learning in Streaming Networks Author(s): Shiyu Chang*, UIUC; Yang Zhang, UIUC; Jiliang Tang, Yahoo Labs; Dawei Yin, ; Yi Chang, Yahoo! Labs; Mark Hasegawa-Johnson, UIUC; Thomas Huang, UIUC |
| Inferring Network Effects from Observational Data Author(s): David Arbour*, University of Massachusetts Am; Dan Garant, University of Massachusetts Amherst; David Jensen, UMass Amherst |
| TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fixed Memory Size Author(s): Lorenzo De Stefani*, Brown University; Alessandro Epasto, Brown; Matteo Riondato, Two Sigma Investments; Eli Upfal, Brown University |
| FRAUDAR: Bounding Graph Fraud in the Face of Camouflage Author(s): Bryan Hooi*, Carnegie Mellon University; Hyun Ah Song, Carnegie Mellon University; Alex Beutel, Carnegie Mellon University; Neil Shah, Carnegie Mellon University; Kijung Shin, Carnegie Mellon University; Christos Faloutsos, Carnegie Mellon University |
Curated by: Yehuda Koren
Recommender systems assist users in selecting products or services most suitable to their tastes and needs. With the rapid growth of web content supply and of online item catalogs, the personalized advice offered by recommenders is vital. This, together with the widening availability of user data, has contributed to a vast interest in recommendation technologies.
The nature and quality of a recommender are greatly affected by the kind of signals it takes as input. Consequently, recommendation technologies are broadly divided into two types: (1) collaborative filtering, which analyzes past user activity such as explicit item ratings or implicit indications of preference like clicks and purchases; and (2) content-based filtering, which determines preferences by generalizing predefined item and user attributes like text, tags, genres, and demographics. Generally speaking, content-based methods are preferred for combating cold-start scenarios, when little activity is recorded on an item or a user. Yet, as more activity data becomes available, collaborative filtering gains an edge by being more accurate. Real-life situations usually involve both new and familiar users and items, which calls for hybrid recommenders that combine collaborative and content-based filtering.
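To make the collaborative filtering idea above more tangible, here is a minimal sketch of one classic variant, user-based neighborhood filtering with cosine similarity. It is an assumption-laden toy: the `ratings` dictionary, the user and item names, and the functions `cosine` and `predict` are invented for this example, and real systems typically use far more elaborate models.

```python
import math


def cosine(u, v):
    """Cosine similarity of two rating dicts; unrated items count as zero."""
    shared = set(u) & set(v)
    dot = sum(u[item] * v[item] for item in shared)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    weighted_sum, weight_total = 0.0, 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        similarity = cosine(ratings[user], other_ratings)
        weighted_sum += similarity * other_ratings[item]
        weight_total += similarity
    # With no usable neighbors there is nothing to predict from (cold start).
    return weighted_sum / weight_total if weight_total > 0 else None


# Hypothetical explicit ratings on a 1-5 scale.
ratings = {
    "ann": {"matrix": 5, "inception": 4, "titanic": 1},
    "bob": {"matrix": 4, "inception": 5, "titanic": 1, "alien": 5},
    "cat": {"matrix": 1, "inception": 2, "titanic": 5, "alien": 2},
}
prediction = predict(ratings, "ann", "alien")
print(prediction)
```

Because `ann` rates items much more like `bob` than like `cat`, the prediction for the unseen item leans toward `bob`'s rating. The `None` branch is where a content-based or hybrid method would have to step in for a new user or item.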
A recommender is only as good as the information it holds on the user. Therefore, recent trends in recommendation technology strive for a more complete understanding of user needs. This involves context-aware recommenders that adapt to the particular given context, accounting for current time, location and user need. Other systems use transfer learning methodologies to extend the user profile by also considering activities in different domains. Another approach to enhancing user modeling is to follow active learning practices and elicit ratings and preferences from the user.
The design of a recommender involves the optimization of multiple, sometimes conflicting, objectives. Systems can target a point-wise error between each predicted value and the believed ground truth. However, more recent systems opt for ranking metrics, which emphasize the quality of the few items present at the top of the suggested list. Other tradeoffs that influence the nature of the recommended items are narrow accuracy versus broader diversity of item types, as well as staying with well-understood, safe popular items versus admitting riskier long-tail items that offer the potential of enhancing perceived serendipity.
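The difference between a point-wise error and a ranking metric can be seen on a small made-up example. The sketch below compares root mean squared error (RMSE) with NDCG at cutoff k, using the linear-gain variant of DCG; the rating vectors `actual`, `model_a` and `model_b` are hypothetical and were chosen so that the two metrics disagree.

```python
import math


def rmse(predicted, actual):
    """Point-wise error: root mean squared difference over all items."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def dcg(relevances):
    """Discounted cumulative gain with linear gain and a log2 position discount."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg_at_k(predicted, actual, k):
    """Rank items by predicted score, then compare the top k with the ideal top k."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    ranked = [actual[i] for i in order[:k]]
    ideal = sorted(actual, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


actual = [5, 3, 3, 1, 1]                # hypothetical true ratings
model_a = [3.5, 2.5, 2.4, 0.5, 0.4]     # biased low, but the order is right
model_b = [4.0, 4.2, 3.0, 1.0, 1.0]     # closer values, but the top two are swapped

print("RMSE  ", rmse(model_a, actual), rmse(model_b, actual))
print("NDCG@2", ndcg_at_k(model_a, actual, 2), ndcg_at_k(model_b, actual, 2))
```

In this toy case `model_b` wins on RMSE while `model_a` wins on NDCG@2, which is the kind of conflict between objectives the paragraph above refers to.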
In summary, recommender systems are a booming field, merging disciplines like data mining, machine learning, human-computer interaction, system scaling and more. The field offers both scientific opportunities and practical engineering challenges. Practitioners who are interested in deeper knowledge are invited to visit the public resources listed below.
| Title & Authors |
|---|
| Scalable Time-Decaying Adaptive Prediction Algorithm Author(s): Yinyan Tan*, Huawei; Zhe Fan, ; Guilin Li, ; Fangshan Wang, ; Zhengbing Li, ; Shikai Liu, ; Qiuling Pan, ; Eric Xing, CMU; Qirong Ho, |
| Online Context-Aware Recommendation with Time Varying Multi-Arm Bandit Author(s): Chunqiu Zeng*, Florida International University; Qing Wang, Florida International Univ.; Tao Li, Florida International Univ; Shekoofeh Mokhtari, Florida International University |
| Towards Conversational Recommender Systems Author(s): Konstantina Christakopoulou*, University of Minnesota; Katja Hofmann, Microsoft; Filip Radlinski, Microsoft |
| Point-of-Interest Recommendations: Learning Potential Check-ins from Friends Author(s): Yong Ge, UNC Charlotte; Huayu Li*, University of North Carolina a; Hengshu Zhu, Baidu Inc. |
| Collaborative Knowledge Base Embedding for Recommender Systems Author(s): Fuzheng Zhang*, Microsoft; Nicholas Jing Yuan, Microsoft Research; Defu Lian, ; Xing Xie, Microsoft Research; Wei-Ying Ma, |
| An Empirical Study on Recommendation with Multiple Types of Feedback Author(s): Liang Tang*, LinkedIn Corp.; Bo Long, LinkedIn; Bee-Chung Chen, LinkedIn; Deepak Agarwal, LinkedIn |
| Compute Job Memory Recommender System Using Machine Learning Author(s): Taraneh Taghavi*, Qualcomm Inc.; Maria Lupetini, Qualcomm Inc.; Yaron Kretchmer, Qualcomm Inc. |
| From Online Behaviors to Offline Retailing Author(s): Ping Luo*, Chinese Academy of Sciences |
| Goal-Directed Inductive Matrix Completion Author(s): Si Si*, Ut austin; Kai-Yang Chiang, UT Austin; Cho-Jui Hsieh, UT Austin; Nikhil Rao, Technicolor Research; Inderjit Dhillon, UTexas |
| CaSMoS: A Framework for Learning Candidate Selection Models over Structured Queries and Documents Author(s): Fedor Borisyuk*, LinkedIn; Krishnaram Kenthapadi, LinkedIn Corporation; David Stein, LinkedIn Corporation; Bo Zhao, LinkedIn Corporation |
| When Recommendation Goes Wrong - Anomalous Link Discovery in Recommendation Networks Author(s): Bryan Perozzi*, Stony Brook University |
| Contextual Intent Tracking for Personal Assistants Author(s): Yu Sun*, University of Melbourne; Nicholas Jing Yuan, Microsoft Research; Yingzi Wang, Microsoft Research; Xing Xie, Microsoft Research; Kieran McDonald, Microsoft Corporation; Rui Zhang, University of Melbourne |
| Minimizing Legal Exposure for High-Tech Companies through Collaborative Filtering Methods Author(s): Bo Jin*, Dalian University of Technology; Chao Che, Dalian University; Kuifei Yu, Zhigu Inc.; Yue Qu, Dalian University of Technology; Li Guo, Dalian University of Technology; Cuili Yao, Dalian University of Technology |
| The Limits of Popularity-Based Recommendations, and the Role of Social Ties Author(s): Marco Bressan*, Sapienza University of Rome; Stefano Leucci, Sapienza University of Rome; Alessandro Panconesi, Sapienza University of Rome; Prabhakar Raghavan, Google; Erisa Terolli, Sapienza University of Rome |
| The Million Domain Challenge: Broadcast Email Prioritization by Cross-domain Recommendation Author(s): BEIDOU WANG*, Simon Fraser University; Martin Ester, Simon Fraser University; Yikang Liao, Zhejiang University; Jiajun Bu, Zhejiang University; Yu Zhu, Zhejiang University; Deng Cai, ; Ziyu Guan, |
| Continuous Experience-aware Language Model Author(s): Subhabrata Mukherjee*, Max Planck Informatics; Stephan Günnemann, Technical University of Munich; Gerhard Weikum, Max Planck Institute for Informatics |
| Unified Point-of-Interest Recommendation with Temporal Interval Assessment Author(s): Yanchi Liu*, Rutgers University; Chuanren Liu, Drexel University; Bin Liu, Rutgers University; Meng Qu, Rutgers University; Hui Xiong, Rutgers |
| Assessing Human Error Against a Benchmark of Perfection Author(s): Ashton Anderson*, Stanford University; Jon Kleinberg, Cornell University; Sendhil Mullainathan, Harvard |
| Positive-Unlabeled Learning in Streaming Networks Author(s): Shiyu Chang*, UIUC; Yang Zhang, UIUC; Jiliang Tang, Yahoo Labs; Dawei Yin, ; Yi Chang, Yahoo! Labs; Mark Hasegawa-Johnson, UIUC; Thomas Huang, UIUC |
| Smart Reply: Automated Response Suggestion for Email Author(s): Anjuli Kannan, ; Karol Kurach*, Google; Sujith Ravi, Google; Tobias Kaufmann, Google, Inc.; Andrew Tomkins, ; Balint Miklos, Google, Inc.; Greg Corrado, ; László Lukács, ; Marina Ganea, ; Peter Young, ; Vivek Ramavajjala |
Curated by: Aarti Singh
In modern datasets, the dimensionality of the input data is typically too large to measure, store, compute, transmit, visualize or interpret. This necessitates dimensionality reduction methods that can identify a few of the most relevant dimensions. Dimensionality reduction methods can be categorized into feature selection methods, which aim to select the most relevant subset of the given features (also known as coordinates, attributes or dimensions), and feature extraction methods, which aim to identify a few of the most relevant transformations of the given features. Feature extraction methods can yield more parsimonious representations than feature selection methods; however, the latter lead to interpretable solutions, e.g., which genes are most representative (or predictive of a disease), instead of which transformations of gene expressions are most representative (or predictive).
Feature selection based dimensionality reduction methods can be unsupervised, such as uniform subsampling or column subset selection (choosing the set of columns that provides the best rank-k approximation to the data matrix, e.g. using column norm or leverage score sampling), or supervised, scoring individual features by their dependence on the output variable using measures such as correlation, mutual information, χ² dependence, etc. Choosing the best subset of size k is typically a combinatorial problem and hence computationally infeasible. Standard approaches either select the top-k scoring features or score features iteratively, extracting one in each of k rounds. In the last decade, l1 penalized methods have become popular; these balance a cost function against a convex “sparsity” penalty for selecting more features. These methods include supervised approaches such as l1 penalized regression and classification, unsupervised approaches such as l1 penalized clustering, and methods such as learning graphical model structure, which can be either supervised or unsupervised. The l1 penalized methods are computationally efficient and achieve nearly the same performance as combinatorial solutions under some assumptions. However, they have been mostly successful in linear settings only, and optimal nonlinear feature selection remains open.
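The top-k scoring approach mentioned above can be sketched as follows. This is a minimal illustration, not a method from any of the listed papers: the helper names `pearson` and `top_k_features`, the toy data, and the choice of absolute Pearson correlation as the score are all assumptions made for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def top_k_features(X, y, k):
    """Score each column of X by |correlation with y| and return the k best column indices."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        scores.append((abs(pearson(column, y)), j))
    # Highest score first; ties broken by the lower column index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


# Hypothetical toy data: column 0 tracks y, column 1 is constant, column 2 is noise.
X = [
    [1.0, 5.0, 0.3],
    [2.0, 5.0, 0.1],
    [3.0, 5.0, 0.4],
    [4.0, 5.0, 0.2],
]
y = [2.1, 3.9, 6.2, 8.0]

selected = top_k_features(X, y, k=2)
print(selected)
```

Scoring each feature independently is cheap but ignores redundancy between features, which is one reason the iterative and l1 penalized alternatives described above exist.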
Feature extraction based dimensionality reduction methods form a few linear or nonlinear transformations of the original features, thus embedding the points in a low-dimensional space. Linear methods include random projections and Multi-Dimensional Scaling (MDS), which seek to preserve pairwise distances; Principal Component Analysis (PCA), which maximizes the variance of the projected data; Canonical Correlation Analysis (CCA), which maximizes the correlation between projections of two sets of variables; and Independent Component Analysis (ICA). Nonlinear methods include kernelized PCA or CCA, as well as methods such as Isomap, Locally Linear Embedding (LLE), Laplacian and Diffusion eigenmaps, etc. that preserve local structure (local distances or subspaces). The linear methods can be effectively combined with l1 penalties to yield feature selection along with feature extraction, i.e. to find embeddings that involve a transformation of only a few original features. Examples include sparse PCA, sparse CCA, etc. Dimensionality reduction can also be achieved by extracting features relevant for prediction in a supervised manner; examples include linear discriminant analysis, topic modeling, and hidden layers in neural networks.
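As a concrete instance of linear feature extraction, the sketch below estimates the first principal component of a small dataset by power iteration on the sample covariance matrix. It is an illustrative assumption-laden example: the function names, the toy points, and the use of power iteration (rather than a full eigendecomposition or SVD) are choices made here, not something prescribed by the text.

```python
import math


def center(X):
    """Subtract the per-column mean from every row."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]


def covariance(Xc):
    """Sample covariance matrix of already-centered data."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_principal_component(X, iters=200):
    """Approximate the top eigenvector of the covariance matrix by power iteration.

    Limitation of this sketch: the fixed starting vector must not be
    orthogonal to the true top eigenvector.
    """
    C = covariance(center(X))
    d = len(C)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return v
        v = [x / norm for x in w]
    return v


# Hypothetical 2-D points lying close to the line y = x.
points = [[0.0, 0.0], [1.0, 1.1], [2.0, 1.9], [3.0, 3.0]]
pc1 = first_principal_component(points)

# One-dimensional embedding: project each centered point onto the component.
embedding = [sum(a * b for a, b in zip(row, pc1)) for row in center(points)]
print(pc1, embedding)
```

For points near the line y = x, the recovered direction should be close to the diagonal, and the one-dimensional embedding preserves the ordering of the points along that line.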
References
Dimension Reduction: A Guided Tour. Christopher J. C. Burges, Foundations and Trends in Machine Learning, Vol. 2, No. 4 (2009) 275–365.
A survey of dimension reduction techniques. I. Fodor, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Technical Report UCRL-ID-148494 (2002).
Linear Dimensionality Reduction: Survey, Insights, and Generalizations. John P. Cunningham and Zoubin Ghahramani. Journal of Machine Learning Research, Vol. 16 (2015) 2859-2900.
Feature selection for classification: A review. Jiliang Tang, Salem Alelyani, and Huan Liu. Data Classification: Algorithms and Applications, (2014) page 37.
| Title & Authors |
|---|
| Dynamic Clustering of Streaming Short Documents Author(s): Shangsong Liang*, University College London; Emine Yilmaz, University College London; Evangelos Kanoulas, University of Amsterdam |
| Keeping it Short and Simple: Summarising Complex Event Sequences with Multivariate Patterns Author(s): Roel Bertens*, Universiteit Utrecht; Jilles Vreeken, Max-Planck Institute for Informatics and Saarland University; Arno Siebes, |
| Structured Doubly Stochastic Matrix for Graph Based Clustering Author(s): Xiaoqian Wang, Univ. of Texas at Arlington; Feiping Nie, University of Texas at Arlington; Heng Huang*, Univ. of Texas at Arlington |
| Mining Subgroups with Exceptional Transition Behavior Author(s): Florian Lemmerich*, Gesis; Martin Becker, University of Würzburg; Philipp Singer, Gesis; Denis Helic, TU Graz; Andreas Hotho, University of Wuerzburg; Markus Strohmaier, |
| Rebalancing Bike Sharing Systems: A Multi-source Data Smart Optimization Author(s): Junming Liu, Rutgers University; Leilei Sun, ; Hui Xiong*, Rutgers; Weiwei Chen, |
| Continuous Experience-aware Language Model Author(s): Subhabrata Mukherjee*, Max Planck Informatics; Stephan Günnemann, Technical University of Munich; Gerhard Weikum, Max Planck Institute for Informatics |
| MANTRA: A Scalable Approach to Mining Temporally Anomalous Sub-trajectories Author(s): Prithu Banerjee*, UBC; Pranali Yawalkar, IIT Madras; Sayan Ranu, IIT Madras |
| Streaming-LDA: A Copula-based Approach to Modeling Topic Dependencies in Document Streams Author(s): Hesam Amoualian*, University Grenoble Alps; Marianne Clausel, University of Grenoble Alps; Eric Gaussier, University of Grenoble Alps; Massih-Reza Amini, University of Grenoble Alps |
| Bayesian Inference of Arrival Rate and Substitution Behavior from Sales Transaction Data with Stocko Author(s): Ben Letham*, Facebook; Lydia Letham, MIT; Cynthia Rudin, MIT |
| Efficient Shift-Invariant Dictionary Learning Author(s): Guoqing Zheng*, Carnegie Mellon University; Yiming Yang, ; Jaime Carbonell, |
| Hierarchical Incomplete Multi-source Feature Learning for Spatiotemporal Event Forecasting Author(s): Liang Zhao*, VT; Jieping Ye, University of Michigan at Ann Arbor; Feng Chen, SUNY Albany; Chang-Tien Lu, Virginia Tech; Naren Ramakrishnan, Virginia Tech |
| Generalized Hierarchical Sparse Model for Arbitrary-Order Interactive Antigenic Sites Identification Author(s): Lei Han*, Rutgers University; Yu Zhang, Hong Kong University of Science and Technology; Xiu-Feng Wan, Mississippi State University; Tong Zhang, Rutgers University |
| Accelerating Online CP Decompositions for Higher Order Tensors Author(s): Shuo Zhou*, University of Melbourne; Nguyen Vinh, University of Melbourne; James Bailey, ; Yunzhe Jia, University of Melbourne; Ian Davidson, University of California-Davis |
| Targeted Topic Modeling for Focused Analysis Author(s): Shuai Wang*, University of Illinois at Chicago; Zhiyuan Chen, UIC; Geli Fei, Univ of Illinois at Chicago; Bing Liu, Univ of Illinois at Chicago; Sherry Emery, University of Illinois at Chicago |
| Infinite Ensemble for Image Clustering Author(s): Hongfu Liu*, Northeastern University; Ming Shao, Northeastern University; Sheng Li, Northeastern University; Yun Fu, Northeastern University |
| Data-driven Automatic Treatment Regimen Development and Recommendation Author(s): Leilei Sun*, Dalian University of Technolog; Chuanren Liu, Drexel University; Chonghui Guo, ; Hui Xiong, Rutgers; Yanming Xie, |
| FUSE: Full Spectral Clustering Author(s): Wei Ye*, University of Munich; Sebastian Goebl, University of Munich; Claudia Plant, ; Christian Boehm, University of Munich |
| Overcoming key weaknesses of Distance-based Neighbourhood Methods using a Data Dependent Dissimilari Author(s): Ting Kai Ming*, Federation University; YE ZHU, Monash University; Mark Carman, Monash University; Yue Zhu, Nanjing University |
| City-Scale Map Creation and Updating using GPS Collections Author(s): Chen Chen*, Stanford University; Cewu Lu, Stanford University; Qixing Huang, Stanford University; Dimitrios Gunopulos, ; Leonidas Guibas, Stanford University; Qiang Yang, HKUST |
| A Subsequence Interleaving Model for Sequential Pattern Mining Author(s): Jaroslav Fowkes*, University of Edinburgh; Charles Sutton, University of Edinburgh |
| Skinny-dip: Clustering in a Sea of Noise Author(s): Samuel Maurus*, Helmholtz Zentrum München; Claudia Plant |
| How to Compete Online for News Audience: Modeling Words that Attract Clicks Author(s): Joon Hee Kim*, KAIST; Amin Mantrach, Yahoo! Research; Alex Jaimes, Yahoo!; Alice Oh, Korea Advanced Institute of Science and Technology |
| Distributing the Stochastic Gradient Sampler for Large-Scale LDA Author(s): Yuan Yang*, Beihang University; Jianfei Chen, Tsinghua University; Jun Zhu, |
| AnyDBC: An Efficient Anytime Density-based Clustering Algorithm for Very Large Complex Datasets Author(s): Son Mai*, Aarhus University; Ira Assent, ; Martin Storgaard, Aarhus University |
| node2vec: Scalable Feature Learning for Networks Author(s): Aditya Grover*, Stanford University; Jure Leskovec, Stanford University |
| A Text Clustering Algorithm Using an Online Clustering Scheme for Initialization Author(s): Jianhua Yin*, Tsinghua University; Jianyong Wang, |
| Goal-Directed Inductive Matrix Completion Author(s): Si Si*, UT Austin; Kai-Yang Chiang, UT Austin; Cho-Jui Hsieh, UT Austin; Nikhil Rao, Technicolor Research; Inderjit Dhillon, UTexas |
| Positive-Unlabeled Learning in Streaming Networks Author(s): Shiyu Chang*, UIUC; Yang Zhang, UIUC; Jiliang Tang, Yahoo Labs; Dawei Yin, ; Yi Chang, Yahoo! Labs; Mark Hasegawa-Johnson, UIUC; Thomas Huang, UIUC |
| Online Feature Selection: A Limited-Memory Substitution Algorithm and its Asynchronous Parallel Vari Author(s): Haichuan Yang*, University of Rochester; Ryohei Fujimaki, NEC Laboratories America; Yukitaka Kusumura, NEC lab; Ji Liu, University of Rochester |
| Subjectively Interesting Component Analysis: Data Projections that Contrast with Prior Expectations Author(s): Bo Kang*, Ghent University; Jefrey Lijffijt, Ghent University; Raul Santos-Rodriguez, University of Bristol; Tijl De Bie, Ghent University |
| Safe Pattern Pruning: An Efficient Approach for Predictive Pattern Mining Author(s): Kazuya Nakagawa, Nagoya Institute of Technology; Shinya Suzumura, Nagoya Institute of Technology; Masayuki Karasuyama, ; Koji Tsuda, University of Tokyo; Ichiro Takeuchi*, Nagoya Institute of Technology Japan |
Curated by: Varun Chandola and Vipin Kumar
Anomalies are the unusual, unexpected, surprising patterns in the observed world. Identifying, understanding, and predicting anomalies from data form one of the key pillars of modern data mining. Effective detection of anomalies allows extracting critical information from data, which can then be used for a variety of applications, such as to stop malicious intruders, detect and repair faults in complex systems, and better understand the behavior of natural, social, and engineered systems.
Anomaly detection refers to the problem of finding anomalies in data. While anomaly is a generally accepted term, other synonyms, such as outliers, discordant observations, exceptions, aberrations, surprises, peculiarities or contaminants, are often used in different application domains. In particular, anomalies and outliers are often used interchangeably. Anomaly detection finds extensive use in a wide variety of applications such as fraud detection for credit cards, insurance or health care, intrusion detection for cyber-security, fault detection in safety-critical systems, and military surveillance for enemy activities. The importance of anomaly detection stems from the fact that, for a variety of application domains, anomalies in data often translate to significant (and often critical) actionable insights. For example, an anomalous traffic pattern in a computer network could mean that a hacked computer is sending out sensitive data to an unauthorized destination. An anomalous remotely sensed weather variable such as temperature could imply a heat wave or cold snap, or even faulty remote sensing equipment. An anomalous MRI image may indicate early signs of Alzheimer’s or the presence of malignant tumors. Anomalies in credit card transaction data could indicate credit card or identity theft, and anomalous readings from a spacecraft sensor could signify a fault in some component of the spacecraft.
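To make the temperature example concrete, the sketch below flags readings that lie far from the bulk of the data using a modified z-score based on the median and the median absolute deviation (MAD). This is only one simple, illustrative detector chosen for the example; the function name, the sample readings, and the conventional constants 0.6745 and 3.5 are assumptions and do not come from the text above.

```python
import statistics


def mad_anomalies(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold.

    The modified z-score is 0.6745 * (x - median) / MAD; the constant and the
    default threshold of 3.5 are common conventions, assumed here.
    """
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        # More than half of the values are identical; this sketch gives up.
        return []
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - median) / mad) > threshold]


# Hypothetical temperature readings with one suspicious spike.
readings = [20.1, 19.8, 20.5, 21.0, 20.2, 35.0, 19.9]
flagged = mad_anomalies(readings)
print(flagged)
```

Using the median and MAD instead of the mean and standard deviation keeps the spike itself from inflating the scale estimate. Whether a flagged reading is a heat wave or a faulty sensor still has to be decided with domain knowledge.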
Important links:
1. Anomaly Detection: A Survey, Varun Chandola, Arindam Banerjee and Vipin Kumar, ACM Computing Surveys (http://dl.acm.org/citation.cfm?id=1541882)
2. Outlier Analysis, Charu Aggarwal, Springer (http://www.amazon.com/Outlier-Analysis-Charu-C-Aggarwal/dp/1461463955)
3. Anomaly Detection: A Tutorial, Sanjay Chawla and Varun Chandola, ICDM 2011
(http://webdocs.cs.ualberta.ca/~icdm2011/downloads/ICDM2011_anomaly_detection_tutorial.pdf)
4. Data Mining for Anomaly Detection, Tutorial at ECML PKDD 2008
(http://videolectures.net/ecmlpkdd08_lazarevic_dmfa/)
| Title & Authors |
|---|
| Catch Me If You Can: Detecting Pickpocket Suspects from Large-Scale Transit Records Author(s): Bowen Du*, Beihang University; Chuanren Liu, Drexel University; Wenjun Zhou, U of Tennessee; Hui Xiong, Rutgers |
| Assessing Human Error Against a Benchmark of Perfection Author(s): Ashton Anderson*, Stanford University; Jon Kleinberg, Cornell University; Sendhil Mullainathan, Harvard |
| Semi-Markov Switching Vector Autoregressive Model-based Anomaly Detection in Aviation Systems Author(s): Igor Melnyk*, University of Minnesota; Arindam Banerjee, University of Minnesota; Bryan Matthews, NASA Ames Research Center; Nikunj Oza, NASA Ames Research Center |
| Modeling Precursors for Event Forecasting via Nested Multi-Instance Learning Author(s): Yue Ning*, Virginia Tech; Sathappan Muthiah, Virginia Tech; Huzefa Rangwala, George Mason University; Naren Ramakrishnan, Virginia Tech |
Curated by: Xifeng Yan
Frequent patterns are itemsets, subsequences, or substructures that appear in a data set with frequency no less than a user-specified threshold. For example, a set of items, such as milk and bread, that appear frequently together in a transaction data set is a frequent itemset. A subsequence, such as buying first a PC, then a digital camera, and then a memory card, is a (frequent) sequential pattern if it occurs frequently in a shopping history database. A substructure can refer to different structural forms, such as subgraphs, subtrees, or sublattices, which may be combined with itemsets or subsequences. If a substructure occurs frequently in a graph database, it is called a (frequent) structural pattern. Finding frequent patterns plays an essential role in mining associations, correlations, and many other interesting relationships among data. Moreover, it helps in data indexing, classification, clustering, and other data mining tasks. Frequent pattern mining is an important data mining task and a focused theme in data mining research. Abundant literature has been dedicated to this research and tremendous progress has been made, ranging from efficient and scalable algorithms for frequent itemset mining in transaction databases to numerous research frontiers, such as sequential pattern mining, structured pattern mining, correlation mining, associative classification, and frequent pattern-based clustering, as well as their broad applications [1]. A few textbooks are available on this topic, e.g., [2].
[1] Frequent Pattern Mining: Current Status and Future Directions, by J. Han, H. Cheng, D. Xin and X. Yan, Data Mining and Knowledge Discovery, Vol. 15, Issue 1, pp. 55-86, 2007.
[2] Frequent Pattern Mining, Ed. Charu Aggarwal and Jiawei Han, Springer, 2014.
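The definition of a frequent itemset (support no less than a user-specified threshold) can be illustrated with a small level-wise search in the spirit of Apriori. This is a minimal sketch under stated assumptions: the function name `frequent_itemsets`, the toy transactions, and the use of an absolute support count as the threshold are hypothetical choices for the example, and no attempt is made at the efficiency of the algorithms surveyed in [1].

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for all itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    candidates = [frozenset([item]) for item in items]
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Build size-k candidates from pairs of frequent (k-1)-itemsets and
        # keep only those whose (k-1)-subsets are all frequent.
        next_candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                next_candidates.add(union)
        candidates = list(next_candidates)
    return frequent


# Hypothetical shopping baskets echoing the milk-and-bread example.
baskets = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"milk", "eggs"},
]
result = frequent_itemsets(baskets, min_support=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The pruning step relies on the fact that every subset of a frequent itemset must itself be frequent, so larger candidates are only built from itemsets that already passed the threshold.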
| Title & Authors |
|---|
| Online Feature Selection: A Limited-Memory Substitution Algorithm and its Asynchronous Parallel Vari Author(s): Haichuan Yang*, University of Rochester; Ryohei Fujimaki, NEC Laboratories America; Yukitaka Kusumura, NEC lab; Ji Liu, University of Rochester |
| Multi-Task Feature Interaction Learning Author(s): KAIXIANG LIN*, Michigan State University; Jianpeng Xu, Michigan State University; Shuiwang Ji, Washington State University; Jiayu Zhou, Michigan State University |
| Lexis: An Optimization Framework for Discovering the Hierarchical Structure of Sequential Data Author(s): Payam Siyari*, Georgia Institute of Technology; Bistra Dilkina, Georgia Tech; Constantine Dovrolis, Georgia Institute of Technology |
| Just One More: Modeling Binge Watching Behavior Author(s): William Trouleau, EPFL; Azin Ashkan*, Technicolor; Weicong Ding, Technicolor Research; Brian Eriksson, Technicolor |
| DeepIntent: Learning Attentions for Online Advertising with Recurrent Neural Networks Author(s): Shuangfei Zhai*, Binghamton University; Keng-hao Chang, Microsoft; Ruofei Zhang, Microsoft; Zhongfei Zhang, |
| Analyzing Volleyball Match Data from the 2014 World Championships Using Machine Learning Techniques Author(s): Jan Van Haaren*, KU Leuven; Horesh Ben Shitrit, PlayfulVision; Jesse Davis, KU Leuven; Pascal Fua, EPFL |
| Annealed Sparsity via Adaptive and Dynamic Shrinking Author(s): Kai Zhang*, NEC labs America; Shandian Shan, Purdue University; Zhengzhang Chen, NEC Lab America; Chaoran Cheng, New Jersey Institute of Technology; Zhi Wei, New Jersey Institute of Technology; Guofei Jiang, NEC labs America; Jieping Ye, |
| Causal Clustering for 1-Factor Measurement Models Author(s): Erich Kummerfeld*, University of Pittsburgh; Joseph Ramsey, Carnegie Mellon University |
| Efficient Frequent Directions Algorithm for Sparse Matrices Author(s): Mina Ghashami*, University of Utah; Edo Liberty, Yahoo; Jeff Phillips, School of Computing, University of Utah |
| Inferring Network Effects from Observational Data Author(s): David Arbour*, University of Massachusetts Am; Dan Garant, University of Massachusetts Amherst; David Jensen, UMass Amherst |
| Generalized Hierarchical Sparse Model for Arbitrary-Order Interactive Antigenic Sites Identification Author(s): Lei Han*, Rutgers University; Yu Zhang, Hong Kong University of Science and Technology; Xiu-Feng Wan, Mississippi State University; Tong Zhang, Rutgers University |
| Subjectively Interesting Component Analysis: Data Projections that Contrast with Prior Expectations Author(s): Bo Kang*, Ghent University; Jefrey Lijffijt, Ghent University; Raul Santos-Rodriguez, University of Bristol; Tijl De Bie, Ghent University |
| Predict Risk of Relapse for Patients with Multiple Stages of Treatment of Depression Author(s): Zhi Nie*, Arizona State University; Pinghua Gong, ; Jieping Ye, University of Michigan at Ann Arbor |
| Interpretable Decision Sets: A Joint Framework for Description and Prediction Author(s): Himabindu Lakkaraju*, Stanford University; Stephen Bach, Stanford University; Jure Leskovec, Stanford University |
| Robust and Effective Metric Learning Using Capped Trace Norm Author(s): Zhouyuan Huo, University of Texas, Arlington; Feiping Nie, University of Texas at Arlington; Heng Huang*, Univ. of Texas at Arlington |
| A Closed-Loop Approach in Data-Driven Resource Allocation to Improve Network User Experience Author(s): Yanan Bao*, University of California, Davi; Huasen Wu, UC Davis; Xin Liu, UC Davis |
| Towards Robust and Versatile Causal Discovery for Business Applications Author(s): Giorgos Borboudakis*, University of Crete; Ioannis Tsamardinos, |
Curated by: Ian Davidson
Cluster analysis, or clustering, aims to take a collection of objects and divide them into a number of groups such that instances in the same group (cluster) are similar to each other and dissimilar to those in other groups. It is extensively used in many domains including image analysis, information retrieval and bioinformatics. Clustering is traditionally exploratory in that it takes no human guidance and aims to uncover the underlying structure in the data. Recent innovations include adding supervision (semi-supervised clustering), constraints (constrained clustering) and extensions to handle complex data such as graphs, evolving data and multi-view data.
A survey of classic methods is given in [1], with a perspective on challenges and future directions given in [2]. A talk based on [2] is freely available: http://videolectures.net/ecmlpkdd08_jain_dcyb/?q=anil%20jain
Lesson 2 of this MOOC covers many traditional clustering methods: https://www.class-central.com/mooc/1848/udacity-machine-learning-unsupervised-learning
[1] Jain, Anil K., M. Narasimha Murty, and Patrick J. Flynn. “Data clustering: a review.” ACM computing surveys (CSUR) 31.3 (1999): 264-323.
[2] Jain, Anil K. “Data clustering: 50 years beyond K-means.” Pattern recognition letters 31.8 (2010): 651-666.
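As an illustration of the classic setting described above, here is a minimal sketch of k-means (Lloyd's algorithm), one of the traditional methods covered in [1] and [2]. The function name, the toy points, and the explicitly supplied initial centers are assumptions made for the example; practical implementations usually choose initial centers randomly or with a seeding heuristic.

```python
def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm: alternate nearest-center assignment and center updates.

    `centers` is the caller-supplied initialization (an assumption of this
    sketch). Returns (labels, centers).
    """
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center (squared Euclidean).
        new_labels = [
            min(range(len(centers)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centers[c])))
            for point in points
        ]
        if new_labels == labels:
            break  # Assignments stopped changing, so the algorithm has converged.
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for c in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # An empty cluster keeps its previous center.
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centers


# Hypothetical 2-D data with two well-separated groups.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(data, centers=[(0.0, 0.0), (0.0, 1.0)])
print(labels, centers)
```

Even with both initial centers placed inside the same group, the alternating steps separate the two groups here. In general, k-means only reaches a local optimum that depends on the initialization.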
| Title & Authors |
|---|
| City-Scale Map Creation and Updating using GPS Collections Author(s): Chen Chen*, Stanford University; Cewu Lu, Stanford University; Qixing Huang, Stanford University; Dimitrios Gunopulos, ; Leonidas Guibas, Stanford University; Qiang Yang, HKUST |
| Efficient Frequent Directions Algorithm for Sparse Matrices Author(s): Mina Ghashami*, University of Utah; Edo Liberty, Yahoo; Jeff Phillips, School of Computing, University of Utah |
| Batch model for batched timestamps data analysis with application to the SSA disability program Author(s): Qingqi Yue*, NIH; Ao Yuan, NIH; Xuan Che, NIH; Elizabeth Rasch, NIH; Minh Huynh, Impaq; Chunxiao Zhou, NIH |
| AnyDBC: An Efficient Anytime Density-based Clustering Algorithm for Very Large Complex Datasets Author(s): Son Mai*, Aarhus University; Ira Assent, ; Martin Storgaard, Aarhus University |
| A Text Clustering Algorithm Using an Online Clustering Scheme for Initialization Author(s): Jianhua Yin*, Tsinghua University; Jianyong Wang, |
| Structured Doubly Stochastic Matrix for Graph Based Clustering Author(s): Xiaoqian Wang, Univ. of Texas at Arlington; Feiping Nie, University of Texas at Arlington; Heng Huang*, Univ. of Texas at Arlington |
| Infinite Ensemble for Image Clustering Author(s): Hongfu Liu*, Northeastern University; Ming Shao, Northeastern University; Sheng Li, Northeastern University; Yun Fu, Northeastern University |
| Data-driven Automatic Treatment Regimen Development and Recommendation Author(s): Leilei Sun*, Dalian University of Technolog; Chuanren Liu, Drexel University; Chonghui Guo, ; Hui Xiong, Rutgers; Yanming Xie, |