Prefix-querying: an approach for effective subsequence matching under time warping in sequence databases
Full text pdf formatPdf (1.66 MB)
Source Conference on Information and Knowledge Management archive
Proceedings of the tenth international conference on Information and knowledge management table of contents
Atlanta, Georgia, USA
Session: Sequence Mining table of contents
Pages: 255 - 262  
Year of Publication: 2001
ISBN:1-58113-436-3
Authors
Sanghyun Park  IBM T.J. Watson Research Center
Sang-Wook Kim  Kangwon National University
June-Suh Cho  IBM T.J. Watson Research Center
Sriram Padmanabhan  IBM T.J. Watson Research Center
Sponsors
SIGMIS: ACM Special Interest Group on Management Information Systems
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM Press   New York, NY, USA
Additional Information:

abstract   references   citings   index terms   collaborative colleagues   peer to peer  

Tools and Actions: Discussions    Find similar Articles   Review this Article  
Save this Article to a Binder    Display in BibTex Format   
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/502585.502629
What is a DOI?

ABSTRACT

This paper discusses an index-based subsequence matching that supports time warping in large sequence databases. Time warping enables finding sequences with similar patterns even when they are of different lengths. In our earlier work, we suggested an efficient method for whole matching under time warping. This method constructs a multi-dimensional index on a set of feature vectors, which are invariant to time warping, from data sequences. For filtering at feature space, it also applies a lower-bound function, which consistently underestimates the time warping distance as well as satisfies the triangular inequality.In this paper, we incorporate the prefix-querying approach based on sliding windows into the earlier approach. For indexing, we extract a feature vector from every subsequence inside a sliding window and construct a multi-dimensional index using a feature vector as indexing attributes. For query processing, we perform a series of index searches using the feature vectors of qualifying query prefixes. Our approach provides effective and scalable subsequence matching even with a large volume of a database. We also prove that our approach does not incur false dismissal. To verify the superiority of our method, we perform extensive experiments. The results reveal that our method achieves significant speedup with real-world S&P 500 stock data and with very large synthetic data.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1   R. Agrawal, C. Faloutsos, and A. Swami. Efficient similarity search in sequence databases. In Proc. FODO, pages 69-84, 1993.

2   R. Agrawal, K. Lin, H. S. Sawhney, and K. Shim. Fast similarity search in the presence of noise, scaling, and translation in time-series databases. In Proc. VLDB, pages 490-501, 1995.

3   Norbert Beckmann , Hans-Peter Kriegel , Ralf Schneider , Bernhard Seeger, The R*-tree: an efficient and robust access method for points and rectangles, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.322-331, May 23-26, 1990, Atlantic City, New Jersey, United States

4   S. Berchtold, D. A. Keim, and H. Kriegel. The X-tree: An index structure for high-dimensional data. In Proc. VLDB, pages 28-39, 1996.

5   Donald J. Berndt , James Clifford, Finding patterns in time series: a dynamic programming approach, Advances in knowledge discovery and data mining, American Association for Artificial Intelligence, Menlo Park, CA, 1996

6   M. S. Chen, J. Han, and P. S. Yu. Data mining: An overview from database perspective. IEEE TKDE, 8(6):866-883, 1996.

7   Kelvin Kam Wing Chu , Man Hon Wong, Fast time-series searching with scaling and shifting, Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.237-248, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States

8   G. Das, D. Gunopulos, and H. Mannila. Finding similar time series. In Proc. PKDD, pages 88-100, 1997.

9   Christos Faloutsos , M. Ranganathan , Yannis Manolopoulos, Fast subsequence matching in time-series databases, Proceedings of the 1994 ACM SIGMOD international conference on Management of data, p.419-429, May 24-27, 1994, Minneapolis, Minnesota, United States

10   D. Q. Goldin and P. C. Kanellakis. On similarity queries for time-series data: Constraint specification and implementation. In Proc. Constraint Programming, pages 137-153, 1995.

11   A. Guttman. R-tree: A dynamic index structure for spatial searching. In Proc. ACM SIGMOD, pages 47-57, 1984.

12   Ibrahim Kamel , Christos Faloutsos, On packing R-trees, Proceedings of the second international conference on Information and knowledge management, p.490-499, November 01-05, 1993, Washington, D.C., United States

13   S. W. Kim, S. Park, and W. W. Chu. An index-based approach for similarity search supporting time warping in large sequence databases. In Proc. IEEE ICDE, pages 607-614, 2001.

14   S. Park, W. W. Chu, J. Yoon, and C. Hsu. Efficient searches for similar subsequences of different lengths in sequence databases. In Proc. IEEE ICDE, pages 23-32, 2000.

15   Franco P. Preparata , Michael I. Shamos, Computational geometry: an introduction, Springer-Verlag New York, Inc., New York, NY, 1985

16   Lawrence Rabiner , Biing-Hwang Juang, Fundamentals of speech recognition, Prentice-Hall, Inc., Upper Saddle River, NJ, 1993

17   Davood Rafiei , Alberto Mendelzon, Similarity-based queries for time series data, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.13-25, May 11-15, 1997, Tucson, Arizona, United States

18   T. K. Sellis, N. Roussopoulos, and C. Faloutsos. The R+-tree: A dynamic index for multi-dimensional objects. In Proc. VLDB, pages 507-518, 1987.

19   K. Shim, R. Srikant, and R. Agrawal. High-dimensional similarity joins. In Proc. ICDE, pages 301-311, 1997.

20   Graham A. Stephen, String searching algorithms, World Scientific Publishing Co., Inc., River Edge, NJ, 1994

21   B.-K. Yi, H. V. Jagadish, and C. Faloutsos. Efficient retrieval of similar time sequences under time warping. In IEEE ICDE, pages 201-208, 1998.




Peer to Peer - Readers of this Article have also read: