Uppsala universitet
Information Technology

Parallel Object Query System for Expensive Computations (POQSEC)

This work was funded by the Swedish Research Council.

Project description

Exceptionally large amounts of distributed data and computational resources will be available through the Grid. Many modern applications in, e.g., engineering, bioinformatics, neuroscience, music, and space physics require scalable data access. They also require support not only for traditional tabular databases but also for other data representations, such as numerical data structures. Very demanding and memory-intensive computations need to be done over these large amounts of data. Another important issue is that the user should be able to transparently utilize the resources required for an analysis without having to manually partition data access and computations.

The goal of this project is to achieve high performance for scientific queries utilizing the operational Grid infrastructure NorduGrid. A data manager and customizable query processor is being developed that allows transparent and efficient execution of database queries utilizing NorduGrid. The executions can access data from storage elements and wrapped external systems on the Grid. The system will support customizable data representations, allow user-defined long-running distributed computations in queries, and access conventional relational databases. It will process application-specific code on both local data and data distributed through the Grid.

NorduGrid is a distributed, peer-oriented Grid middleware system that does not rely on a central broker. Computer clusters accessed through NorduGrid impose certain restrictions on resource allocation, communication, and process management, and the POQSEC architecture is designed to cope with them.

The POQSEC data manager and query processor scales up by utilizing the Grid to transparently and dynamically incorporate new nodes and clusters for the combined processing of data and computations as the database and application demands grow. Conventional databases and file-based Grid storage elements are used as back-ends for data repositories. Extensible and object-oriented query processing and rewrite techniques are used to efficiently combine distributed data and computations in this environment.

The POQSEC prototype being developed uses as test cases data and queries from Particle Physics, where large amounts of data describing particle events are produced by proton-proton collisions. The queries involve regular data comparisons and aggregation operators along with user-defined filter operations expressed in terms of C++-based computational libraries, e.g. the ROOT library. A single such analysis of a single dataset of 1 million events often takes more than 1 hour to execute on a single machine. Thus, the processing needs to scale up to cover all distributed data produced by the LHC.
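
As a rough illustration of this kind of query, the following Python sketch applies a user-defined filter to a set of events and aggregates the result, first splitting the events into partitions that could in principle be processed on different nodes. It is not POQSEC code and does not use AmosQL or ROOT: the Event fields, the cut thresholds, and the function names passes_cuts, partition, count_selected, and run_query are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Event:
    """Hypothetical, heavily simplified description of one collision event."""
    event_id: int
    lepton_pt: List[float]  # assumed field: transverse momenta of the leptons
    missing_et: float       # assumed field: missing transverse energy


def passes_cuts(event: Event) -> bool:
    """Hypothetical user-defined filter; the thresholds are illustrative only."""
    hard_leptons = [pt for pt in event.lepton_pt if pt > 20.0]
    return len(hard_leptons) >= 2 and event.missing_et > 40.0


def partition(events: Sequence[Event], n_parts: int) -> List[Sequence[Event]]:
    """Split the events round-robin into n_parts partitions."""
    return [events[i::n_parts] for i in range(n_parts)]


def count_selected(events: Sequence[Event], keep: Callable[[Event], bool]) -> int:
    """Aggregate over one partition: count the events accepted by the filter."""
    return sum(1 for event in events if keep(event))


def run_query(events: Sequence[Event], keep: Callable[[Event], bool], n_parts: int) -> int:
    """Evaluate the filter per partition and combine the partial counts.

    The partitions are processed sequentially here; in a distributed setting
    each partial count could be computed on a separate node.
    """
    partial_counts = [count_selected(part, keep) for part in partition(events, n_parts)]
    return sum(partial_counts)


if __name__ == "__main__":
    events = [
        Event(0, [25.0, 30.0], 50.0),
        Event(1, [25.0], 50.0),
        Event(2, [25.0, 30.0], 10.0),
        Event(3, [21.0, 22.0, 5.0], 41.0),
    ]
    print(run_query(events, passes_cuts, n_parts=3))
```

Because counting is an aggregate whose partial results can simply be summed, the answer does not depend on how the events are partitioned, which is what lets the partitioning stay hidden from the user who writes the query.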

POQSEC utilizes the AMOS II database management system, which provides object-relational DBMS functionality, peer-to-peer communication, the declarative query language AmosQL, and interfaces to C++ and Java. The AMOS II kernel is being extended in order to implement the POQSEC architecture.

Resources

We use various computational resources to test and evaluate our system prototypes. Most of them are provided by the Swedish National Infrastructure for Computing (SNIC), namely the Swegrid, HPC2N, and UPPMAX resources. We also use other resources available through NorduGrid, mostly located in Sweden, Denmark, Finland, and Norway.

Publications

People

Tore Risch is responsible for this project. It is the basis for the PhD work of Ruslan Fomkin.

Acknowledgments

We would like to thank Christian Hansson and Professor Tord Ekelöf from the Department of Radiation Science, Uppsala University, for providing the Particle Physics analysis application and data as system test cases. The secure communication was implemented by Mehran Ahsant from the Center for Parallel Computers, Royal Institute of Technology, Stockholm. The support from the NorduGrid project and from Åke Sandgren (HPC2N) and Tore Sundqvist (UPPMAX) has been significant for the system implementation and testing.


Last update: 24/03/2005. Responsible: Tore Risch
Copyright © 2005 Uppsala University, Department of Information Technology