
StatiX: Making XML Count

Freire, Juliana and Haritsa, Jayant R and Ramanath, Maya and Roy, Prasan and Simeon, Jerome (2002) StatiX: Making XML Count. In: Proceedings of the 2002 ACM SIGMOD international conference on Management of data, June 2002, Madison, Wisconsin, USA, pp. 181-192.


Abstract

The availability of summary data for XML documents has many applications, from providing users with quick feedback about their queries, to cost-based storage design and query optimization. StatiX is a novel XML Schema-aware statistics framework that exploits the structure derived by regular expressions (which define elements in an XML Schema) to pinpoint places in the schema that are likely sources of structural skew. As we discuss below, this information can be used to build concise, yet accurate, statistical summaries for XML data. StatiX leverages standard XML technology for gathering statistics, notably XML Schema validators, and it uses histograms to summarize both the structure and values in an XML document. In this paper we describe the StatiX system. We develop algorithms that decompose schemas to obtain statistics at different granularities and discuss how statistics can be gathered as documents are validated. We also present an experimental evaluation which demonstrates the accuracy and scalability of our approach and show an application of these statistics to cost-based XML storage design.
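
As a rough illustration of how a histogram can summarize structure, the sketch below counts child elements per parent and groups the parents, in document order, into equi-width buckets. It is a minimal sketch under stated assumptions, not the StatiX algorithm: it does not use an XML Schema validator, and the function name `structural_histogram` and the `show`/`review` element names are hypothetical.

```python
import xml.etree.ElementTree as ET


def structural_histogram(xml_text, parent_tag, child_tag, num_buckets):
    """Bucket the number of child_tag children by parent position.

    Parents are numbered in document order and split into equi-width
    ranges; each bucket holds the total child count for its range.
    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    parents = list(root.iter(parent_tag))
    # Count direct children of each parent that carry child_tag.
    counts = [len(p.findall(child_tag)) for p in parents]
    if not counts:
        return []
    width = -(-len(counts) // num_buckets)  # ceiling division
    return [sum(counts[i:i + width]) for i in range(0, len(counts), width)]


# Made-up document: reviews are concentrated under the first show.
doc = """<shows>
  <show><review/><review/><review/></show>
  <show/>
  <show><review/></show>
  <show/>
</shows>"""

print(structural_histogram(doc, "show", "review", 2))
```

With two buckets, the first bucket covers the first two `show` elements and the second covers the last two, so uneven bucket totals indicate that `review` elements are skewed toward particular parents.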

Item Type: Conference Paper
Additional Information: © ACM, 2002. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 2002 ACM SIGMOD international conference on Management of data, DOI: 10.1145/564691.564713
Keywords: XML Count; StatiX; schemas
Department/Centre: Division of Information Sciences > Supercomputer Education & Research Centre
Date Deposited: 22 Jul 2004
Last Modified: 19 Sep 2010 04:14
URI: http://eprints.iisc.ernet.in/id/eprint/1108
