Time Series and Stream Mining
Curated by: Eamonn Keogh
It is in the nature of humans to measure things, and (with rare exceptions) things change over time. A familiar example is a heartbeat, which represents the change in heart's electrical activity. A collection of such temporal measurements are called a “time series”. Other familiar examples include a politician’s popularity waxing and waning, or the temperature rising and falling over both the short term (each day) the medium term (each year) and the long term (climate change drift).
Because such data is ubiquitous, touching almost every aspect of human life, data mining researchers have long paid significant attention to time series. One paradox of time series is that we do not typically care about the individual values in a time series, but only in the shapes, trends and patterns. Therefore, one of the most basic operations one can perform with time series is to ask “are there any other patterns in this dataset that look like this pattern”. This task is called similarity search (or query-by- content). There are two challenges in doing this: How can we do it fast, given the database may be massive, and how can we do it right, given that the patterns may match according to the human eye, but not be exactly the same. Perhaps the first paper to consider this problem was [a], written about 25 years ago. Since then, there have been thousands of papers on the topic, including dozens that have appeared in SIGKDD.
By any standard, the KDD community has made great progress on this problem; early papers searched datasets with only a few thousand objects, more recent papers have conducted searches on datasets with up to a trillion objects . Moreover these ideas have been used to support research in biology, neuroscience, social media, robotics, music and medicine. Similarity search requires that we know what patterns are interesting in advance. A significant advance in time series data mining is introduction of time series motifs [c]. Time series motifs are previously unknown patterns that reoccur in the data. If such patterns repeat, we can assume they are conserved for some reason, and use that observation as a starting point for further research. While these time series data mining technologies may seem obscure, with the advent of wearable devices (smartwatches, fitbit, smartphones, etc) you probably have had your gestures/behaviors classified by one of this algorithms.
Further Resources:
To allow researchers to test and compare time series data mining algorithms, there is a large collection of them at the UCR Time Series Classification Archive www.cs.ucr.edu/~eamonn/time_series_data/
While there is currently no “time series data mining for beginners” book, the more general “Data Mining: The Textbook”, by Charu Aggarwal has an excellent and accessible section on time series.
[a] Rakesh Agrawal, Christos Faloutsos, Arun N. Swami: Efficient Similarity Search In Sequence Databases. FODO 1993: 69-84.
Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen, Gustavo Batista, Brandon Westover, Qiang Zhu, Jesin Zakaria, Eamonn Keogh
(2012). Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping SIGKDD 2012.
[c] Abdullah Mueen, Eamonn Keogh, Qiang Zhu, Sydney Cash, Brandon Westover (2009). Exact Discovery of Time Series Motifs. SDM 2009
Related KDD2016 Papers
Title & Authors |
---|
Computational Drug Repositioning Using Continuous Self-controlled Case Series Author(s): Zhaobin Kuang, UW-Madison; James Thomson, Morgridge Institute; Michael Caldwell, Marshfield Clinic; Peggy Peissig, ; Ron Stewart, Morgridge Institute; Page David*, University of Wisconsin |
Towards Optimal Cardinality Estimation of Unions and Intersections with Sketches Author(s): Daniel Ting*, Facebook |
Temporal Order-based First-Take-All Hashing for Fast Attention-Deficit-Hyperactive-Disorder Detectio Author(s): Hao Hu, University of Central Florida; Joey Velez-Ginorio, University of Central Florida; Guojun Qi*, University of Central Florida |
Aircraft Trajectory Prediction made easy with Predictive Analytics Author(s): Samet Ayhan*, University of Maryland; Hanan Samet, University of Maryland |
Anomaly Detection Using Program Control Flow Graph Mining from Execution Logs Author(s): Animesh Nandi*, IBM Research; Atri Mandal, IBM Research; Shubham Atreja, IIT Kanpur; Gargi Dasgupta, IBM Research; Subhrajit Bhattacharya, IBM Research |
Online Asymmetric Active Learning with Imbalanced Data Author(s): Xiaoxuan Zhang*, University of Iowa; Tianbao Yang, Univ of Iowa; Padmini Srinivasan, University of Iowa |
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fixed Memory Size Author(s): Lorenzo De Stefani*, Brown University; Alessandro Epasto, Brown; Matteo Riondato, Two Sigma Investments; Eli Upfal, Brown University |
Recurrent Marked Temporal Point Processes: Embedding Event History to Vector Author(s): NAN DU*, GEORGIA TECH; Hanjun Dai, ; Rakshit Trivedi, ; Utkarsh Upadhyay, Max Plank Institute; Manuel Gomez-Rodriguez, MPI-SWS; Le Song, |
Improving Survey Aggregation with Sparsely Represented Signals Author(s): Tianlin Shi, Stanford University; Forest Agostinelli*, Univ of California - Irvine; Matthew Staib, MIT; David Wipf, Microsoft Research; Thomas Moscibroda, Microsoft Research |
Smart Reply: Automated Response Suggestion for Email Author(s): Anjuli Kannan, ; Karol Kurach*, Google; Sujith Ravi, Google; Tobias Kaufmann, Google, Inc.; Andrew Tomkins, ; Balint Miklos, Google, Inc.; Greg Corrado, ; László Lukács, ; Marina Ganea, ; Peter Young, ; Vivek Ramavajjala |
Lightweight Monitoring of Distributed Streams Author(s): Daniel Keren*, University of Haifa; Assaf Schuster, Technion; Arnon Lazerson, Israeli Institute of technology |
MANTRA: A Scalable Approach to Mining Temporally Anomalous Sub-trajectories Author(s): Prithu Banerjee*, UBC; Pranali Yawalkar, IIT Madras; Sayan Ranu, IIT Madras |
Taxi Driving Behavior Analysis in Latent Vehicle-to-Vehicle Networks: A Social Influence Perspective Author(s): Tong Xu*, USTC; Hengshu Zhu, Baidu Inc.; Xiangyu Zhao, USTC; Hao Zhong, Rutgers University; Qi Liu, University of Science and Technology of China; Enhong Chen, ; Hui Xiong, Rutgers |
Modeling Precursors for Event Forecasting via Nested Multi-Instance Learning Author(s): Yue Ning*, Virginia Tech; Sathappan Muthiah, Virginia Tech; Huzefa Rangwala, George Mason University; Naren Ramakrishnan, Virginia Tech |
Burstiness Scale: a highly parsimonious model forcharacterizing random series of events Author(s): Rodrigo Alves*, CEFET-MG; Renato Assunção, DCC-UFMG; Pedro O.S. Vaz de Melo, DCC-UFMG |
Data-driven Automatic Treatment Regimen Development and Recommendation Author(s): Leilei Sun*, Dalian University of Technolog; Chuanren Liu, Drexel University; Chonghui Guo, ; Hui Xiong, Rutgers; Yanming Xie, |
Assessing Human Error Against a Benchmark of Perfection Author(s): Ashton Anderson*, Stanford University; Jon Kleinberg, Cornell University; Sendhil Mullainathan, Harvard |
Fast Memory-efficient Anomaly Detection in Streaming Heterogeneous Graphs Author(s): Emaad Manzoor, Stony Brook University; Leman Akoglu*, SUNY Stony Brook |
Efficient Frequent Directions Algorithm for Sparse Matrices Author(s): Mina Ghashami*, University of utah; Edo Liberty, Yahoo ; Jeff Phillips, School of Computing, University of Utah |