ACM Special Interest Group on Knowledge Discovery & Data Mining

KDD-2000

Sixth ACM SIGKDD International Conference on
Knowledge Discovery & Data Mining
August 20-23, 2000
Boston, MA, USA

Time Series Similarity Measures

Gautam Das, Dimitrios Gunopulos

Abstract:

Time series data arise in a variety of domains, such as stock market analysis, environmental data, telecommunications data, and medical and financial data.

Typically, each time series describes the evolution of an object as a function of time at a given data collection station. Examples include the daily price fluctuations of a stock, or web data that count the number of clicks at different sites. Higher-dimensional time series can be used to describe the evolution of more complex objects, such as digital image sequences.

Time series data currently account for a large fraction of the data stored in commercial databases. This fact is increasingly recognized, and support for time series as a new data type in commercial database management systems is growing. IBM DB2, for example, implements support for time series using data-blades.

A fundamental problem of interest is to determine whether two given time series display similar behavior.

The problem is interesting (and difficult) because the similarity measures should allow for imprecise matches.
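As an illustration of a measure that tolerates imprecise matches, the sketch below implements dynamic time warping (DTW), one well-known measure of this kind. It is offered only as an example and is not necessarily the formulation presented in the tutorial; the function name `dtw_distance` and the use of absolute difference as the pointwise cost are assumptions made for this sketch.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Illustrative sketch: pointwise cost is the absolute difference,
    and no warping-window constraint is applied.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the best of: repeat b[j-1], repeat a[i-1], or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Two series with the same shape, one shifted by a single time step.
x = [0, 0, 1, 2, 1, 0, 0]
y = [0, 1, 2, 1, 0, 0, 0]
pointwise = sum(abs(p - q) for p, q in zip(x, y))
warped = dtw_distance(x, y)
```

Here a point-by-point comparison penalizes the one-step shift, while the warped alignment treats the two series as matching. The quadratic running time of this version is one reason faster approximations and indexing matter for large collections.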

There are several applications of such measures. For example, they can be used to cluster the different time series into similar groups, or to classify a time series based on a set of known examples.

Another problem of interest is the indexing problem: given a set of time series Q, prepare an index offline such that given a query series q, the time series in Q that are most similar to q can be reported quickly. As an application, an investor may wish to know the stocks that behave similarly to a certain query stock.
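To make the query side of the indexing problem concrete, the sketch below answers a similarity query by scanning the whole collection, which is the baseline an offline index is meant to beat. The choice of Euclidean distance on z-normalized series, the function names, and the sample data are all assumptions for illustration; equal-length series are required.

```python
import heapq
import math


def z_normalize(series):
    """Rescale a series to zero mean and unit variance."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n  # constant series: no shape to compare
    return [(x - mean) / std for x in series]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(Q, q, k=1):
    """Return the names of the k series in Q closest to the query q.

    Q maps a name to a series. This is a linear scan with no index.
    """
    q_norm = z_normalize(q)
    scored = ((euclidean(z_normalize(s), q_norm), name) for name, s in Q.items())
    return [name for _, name in heapq.nsmallest(k, scored)]


# Hypothetical price histories, invented for this example.
stocks = {
    "A": [1, 2, 3, 4],
    "B": [4, 3, 2, 1],
    "C": [10, 20, 30, 41],
}
nearest = most_similar(stocks, [2, 4, 6, 8], k=2)
```

Normalizing first means that stocks are compared by the shape of their movement rather than by price level, so "A" and "C" rank ahead of the falling series "B". Each query here costs time proportional to the total size of the collection, which is what an index prepared offline aims to reduce.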

In the database and data mining communities, various similarity measures and indexing techniques for time series have been proposed. In this tutorial we describe the state of the art in this area by comparing and summarizing several of these techniques in detail.

Biographies of Organizers:

Gautam Das received a Ph.D. in Computer Science from the University of Wisconsin-Madison in 1990, and a B.Tech. from the Indian Institute of Technology, Kanpur. Dr. Das is currently a Researcher in the Data Mining and Exploration group at Microsoft Research. He has also held positions at Compaq Computer Corp. and the University of Memphis. His research interests include data mining, databases, algorithms, and computational geometry. His current research focuses on techniques for defining context-based similarity measures between complex data objects, on sequence analysis, and on database indexing techniques.

Dimitrios Gunopulos received a Ph.D. in Computer Science from Princeton University in 1995. Prior to that he received an M.A. in Computer Science from Princeton and a Diploma in Computer Engineering from the University of Patras. Dr. Gunopulos is currently an Assistant Professor in the Department of Computer Science and Engineering at the University of California, Riverside. He has also held positions at IBM Almaden and the Max-Planck-Institut für Informatik. His research interests include data mining, databases, algorithms, and computational geometry. His current research focuses on techniques for approximating range queries, on applying data mining techniques to geospatial data, and on database indexing techniques.
