Home / Topics

Time Series and Stream Mining

Curated by: Eamonn Keogh

It is in the nature of humans to measure things, and (with rare exceptions) things change over time. A familiar example is a heartbeat, which represents the change in heart's electrical activity. A collection of such temporal measurements are called a “time series”. Other familiar examples include a politician’s popularity waxing and waning, or the temperature rising and falling over both the short term (each day) the medium term (each year) and the long term (climate change drift).

Because such data is ubiquitous, touching almost every aspect of human life, data mining researchers have long paid significant attention to time series. One paradox of time series is that we do not typically care about the individual values in a time series, but only in the shapes, trends and patterns. Therefore, one of the most basic operations one can perform with time series is to ask “are there any other patterns in this dataset that look like this pattern”. This task is called similarity search (or query-by- content). There are two challenges in doing this: How can we do it fast, given the database may be massive, and how can we do it right, given that the patterns may match according to the human eye, but not be exactly the same. Perhaps the first paper to consider this problem was [a], written about 25 years ago. Since then, there have been thousands of papers on the topic, including dozens that have appeared in SIGKDD.

By any standard, the KDD community has made great progress on this problem; early papers searched datasets with only a few thousand objects, more recent papers have conducted searches on datasets with up to a trillion objects . Moreover these ideas have been used to support research in biology, neuroscience, social media, robotics, music and medicine. Similarity search requires that we know what patterns are interesting in advance. A significant advance in time series data mining is introduction of time series motifs [c]. Time series motifs are previously unknown patterns that reoccur in the data. If such patterns repeat, we can assume they are conserved for some reason, and use that observation as a starting point for further research. While these time series data mining technologies may seem obscure, with the advent of wearable devices (smartwatches, fitbit, smartphones, etc) you probably have had your gestures/behaviors classified by one of this algorithms.

Further Resources:

To allow researchers to test and compare time series data mining algorithms, there is a large collection of them at the UCR Time Series Classification Archive www.cs.ucr.edu/~eamonn/time_series_data/

While there is currently no “time series data mining for beginners” book, the more general “Data Mining: The Textbook”, by Charu Aggarwal has an excellent and accessible section on time series.

[a] Rakesh Agrawal, Christos Faloutsos, Arun N. Swami: Efficient Similarity Search In Sequence Databases. FODO 1993: 69-84.

Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen, Gustavo Batista, Brandon Westover, Qiang Zhu, Jesin Zakaria, Eamonn Keogh

(2012). Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping SIGKDD 2012.

[c] Abdullah Mueen, Eamonn Keogh, Qiang Zhu, Sydney Cash, Brandon Westover (2009). Exact Discovery of Time Series Motifs. SDM 2009

Related KDD2016 Papers

Title & Authors
Temporal Order-based First-Take-All Hashing for Fast Attention-Deficit-Hyperactive-Disorder Detectio
Author(s): Hao Hu, University of Central Florida; Joey Velez-Ginorio, University of Central Florida; Guojun Qi*, University of Central Florida
Aircraft Trajectory Prediction made easy with Predictive Analytics
Author(s): Samet Ayhan*, University of Maryland; Hanan Samet, University of Maryland
Anomaly Detection Using Program Control Flow Graph Mining from Execution Logs
Author(s): Animesh Nandi*, IBM Research; Atri Mandal, IBM Research; Shubham Atreja, IIT Kanpur; Gargi Dasgupta, IBM Research; Subhrajit Bhattacharya, IBM Research
Computational Drug Repositioning Using Continuous Self-controlled Case Series
Author(s): Zhaobin Kuang, UW-Madison; James Thomson, Morgridge Institute; Michael Caldwell, Marshfield Clinic; Peggy Peissig, ; Ron Stewart, Morgridge Institute; Page David*, University of Wisconsin
Towards Optimal Cardinality Estimation of Unions and Intersections with Sketches
Author(s): Daniel Ting*, Facebook
Improving Survey Aggregation with Sparsely Represented Signals
Author(s): Tianlin Shi, Stanford University; Forest Agostinelli*, Univ of California - Irvine; Matthew Staib, MIT; David Wipf, Microsoft Research; Thomas Moscibroda, Microsoft Research
Smart Reply: Automated Response Suggestion for Email
Author(s): Anjuli Kannan, ; Karol Kurach*, Google; Sujith Ravi, Google; Tobias Kaufmann, Google, Inc.; Andrew Tomkins, ; Balint Miklos, Google, Inc.; Greg Corrado, ; László Lukács, ; Marina Ganea, ; Peter Young, ; Vivek Ramavajjala
Lightweight Monitoring of Distributed Streams
Author(s): Daniel Keren*, University of Haifa; Assaf Schuster, Technion; Arnon Lazerson, Israeli Institute of technology
MANTRA: A Scalable Approach to Mining Temporally Anomalous Sub-trajectories
Author(s): Prithu Banerjee*, UBC; Pranali Yawalkar, IIT Madras; Sayan Ranu, IIT Madras
Taxi Driving Behavior Analysis in Latent Vehicle-to-Vehicle Networks: A Social Influence Perspective
Author(s): Tong Xu*, USTC; Hengshu Zhu, Baidu Inc.; Xiangyu Zhao, USTC; Hao Zhong, Rutgers University; Qi Liu, University of Science and Technology of China; Enhong Chen, ; Hui Xiong, Rutgers
Online Asymmetric Active Learning with Imbalanced Data
Author(s): Xiaoxuan Zhang*, University of Iowa; Tianbao Yang, Univ of Iowa; Padmini Srinivasan, University of Iowa
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fixed Memory Size
Author(s): Lorenzo De Stefani*, Brown University; Alessandro Epasto, Brown; Matteo Riondato, Two Sigma Investments; Eli Upfal, Brown University
Recurrent Marked Temporal Point Processes: Embedding Event History to Vector
Author(s): NAN DU*, GEORGIA TECH; Hanjun Dai, ; Rakshit Trivedi, ; Utkarsh Upadhyay, Max Plank Institute; Manuel Gomez-Rodriguez, MPI-SWS; Le Song,
Modeling Precursors for Event Forecasting via Nested Multi-Instance Learning
Author(s): Yue Ning*, Virginia Tech; Sathappan Muthiah, Virginia Tech; Huzefa Rangwala, George Mason University; Naren Ramakrishnan, Virginia Tech
Burstiness Scale: a highly parsimonious model forcharacterizing random series of events
Author(s): Rodrigo Alves*, CEFET-MG; Renato Assunção, DCC-UFMG; Pedro O.S. Vaz de Melo, DCC-UFMG
Assessing Human Error Against a Benchmark of Perfection
Author(s): Ashton Anderson*, Stanford University; Jon Kleinberg, Cornell University; Sendhil Mullainathan, Harvard
Fast Memory-efficient Anomaly Detection in Streaming Heterogeneous Graphs
Author(s): Emaad Manzoor, Stony Brook University; Leman Akoglu*, SUNY Stony Brook
Efficient Frequent Directions Algorithm for Sparse Matrices
Author(s): Mina Ghashami*, University of utah; Edo Liberty, Yahoo ; Jeff Phillips, School of Computing, University of Utah
Data-driven Automatic Treatment Regimen Development and Recommendation
Author(s): Leilei Sun*, Dalian University of Technolog; Chuanren Liu, Drexel University; Chonghui Guo, ; Hui Xiong, Rutgers; Yanming Xie,