Scientific applications require processing high-volume online streams of numerical data from instruments and simulations. We present an extensible stream database system that supports scalable and flexible continuous queries on such streams. Application-dependent streams and query functions are defined through an object-relational model. Distributed execution plans for continuous queries are described as high-level data flow distribution templates. Using a generic template, we define two partitioning strategies for scalable parallel execution of expensive stream queries: window split and window distribute. Window split provides operators for customized parallel execution of query functions whose complexity depends on the size of the data units to which they are applied. It reduces the size of stream data units, using application-dependent functions passed as parameters. By contrast, window distribute provides operators for customized distribution of entire data units without reducing their size. We evaluate these strategies on a typical high-volume scientific stream application and show that window split is favorable when computational resources are limited, while window distribute is better when resources are sufficient.
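The difference between the two strategies can be illustrated with a small sketch. It assumes, purely for illustration, a naive DFT as the expensive per-window query function and an even/odd (radix-2) split with a butterfly combine as the application-dependent parameters of window split. All names here (`dft`, `fft_split`, `fft_combine`, `window_split`, `window_distribute`) are hypothetical and are not the system's actual operators; threads stand in for compute nodes, and a finite list stands in for the stream.

```python
import cmath
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, Sequence

Window = List[complex]

def dft(x: Sequence[complex]) -> Window:
    # Naive O(n^2) DFT: an expensive query function whose cost grows with window size.
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fft_split(x: Sequence[complex]) -> List[Window]:
    # Hypothetical split function: even- and odd-indexed samples (window length must be even).
    return [list(x[0::2]), list(x[1::2])]

def fft_combine(parts: List[Window]) -> Window:
    # Hypothetical combine function: radix-2 butterfly merging two half-size spectra.
    even, odd = parts
    half = len(even)
    out = [0j] * (2 * half)
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / (2 * half)) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out

def window_split(stream: Iterable[Window], fn: Callable[[Window], Window],
                 split: Callable[[Window], List[Window]],
                 combine: Callable[[List[Window]], Window],
                 pool: Executor) -> Iterator[Window]:
    # Each window is split into smaller sub-windows, fn runs on them in parallel,
    # and combine rebuilds the result for the full window.
    for window in stream:
        yield combine(list(pool.map(fn, split(window))))

def window_distribute(stream: Iterable[Window], fn: Callable[[Window], Window],
                      workers: Sequence[Executor]) -> Iterator[Window]:
    # Whole windows are sent, unsplit, to worker i % len(workers) (round-robin).
    # All futures are submitted up front (finite input); results keep stream order.
    futures = [workers[i % len(workers)].submit(fn, w) for i, w in enumerate(stream)]
    for f in futures:
        yield f.result()

if __name__ == "__main__":
    stream = [[complex(t % 4) for t in range(8)], [complex(t) for t in range(8)]]
    with ThreadPoolExecutor(max_workers=2) as pool:
        split_out = list(window_split(stream, dft, fft_split, fft_combine, pool))
    workers = [ThreadPoolExecutor(max_workers=1) for _ in range(2)]
    try:
        dist_out = list(window_distribute(stream, dft, workers))
    finally:
        for w in workers:
            w.shutdown()
    for a, b in zip(split_out, dist_out):
        assert all(abs(p - q) < 1e-9 for p, q in zip(a, b))
    print("both strategies agree")
```

Both strategies yield the same spectra; they differ in where the work goes. In this sketch, split shrinks each data unit at the cost of an extra combine step per window, while distribute keeps units whole and gains parallelism only across successive windows.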