Project: Transparent inclusion, validation, and utilization of main memory domain indexes

Thanh Truong, Tore Risch

Department of Information Technology, Uppsala University
Box 337, SE-751 05, Sweden

thanh.truong@it.uu.se
tore.risch@it.uu.se

Table of Content

Abstract
Links

Full paper

Presentation

Software

Short version of the project in pictures
Code snippet
How to customize index extensions to ? It is all in 3 steps

This project is developed by Thanh Truong C. Visit his homepage here

Abstract

Main memory database systems (MMDBs) are viable solutions for many scientific applications. Scientific and engineering data often require special indexing methods, and there is a large of number domain specific main memory indexing implementations developed.
However, adding an index structure into a database system can be challenging. Mexima (Main-memory External Index Manager) provides an MMDB where new main memory index structures can be plugged in without modifying the index implementation code.
To transparently utilize new user-defined indexes in queries, Mexima provides index property tables containing index meta-data that enable automatic transformation of query fragments to call index operations. When indexed attributes are hidden inside
complex numerical expressions, an algebraic query transformation algorithm reasons on the expressions to expose hidden indexes. The index property tables are furthermore used for validating the correctness of an index implementation by executing automatically
generated test queries. Experiments show that the overhead of plugging in indexes in Mexima is low compared to the corresponding stand-alone C/C++ implementations. Substantial performance gains are shown by the index exposing rewrite mechanisms.

Short version of the project in pictures

There are many in-memory index structures but a very few of them was integrated into database system because such integration is very challenging or impossible (copyright, avaiablity only in binary)

Gist (Generalized Search Tree) based frameworks (stage 1, stage 2, and stage 3) were introduced to reduce integration efforts. However, it still required to re-engineer the index implementation followed Gist coding convention. No reusuability !

Mexima (Main-memory External Index Manager) goes beyond than Gist to provide inclusion of new index structures without code changes. (Transparent inclusion)

Mexima supports several query rewrite forms so that it transparently rewrites user-queries to utilize the new added index. (Transparent utilization)

The translation rules can rewrite conjunctions in queries having terms of one the following query fragment forms:
Form (i):

P(…iv,..) AND (iv r1 expression) AND
                         (iv r2 expression) AND
                        . . .
                        (iv rn expression)

Here, iv is a variable bound to an indexed column of table P(…).
We say iv is an indexed variable. ri are comparison operators in the set relop, ri  relop, where relop ={=, <, >, >=, <=}.

Form (ii):
P(…iv,..) AND (iv r1 expression) AND
                        (iv r2 expression) AND
                        . . .
                        (iv rn expression)
Here, iv is a variable bound to a indexed column of table P(…).

Form (iii):
P(…,iv,…) AND (..,iv,..) in isf(…..,P,..)
Here the isf() is an index sensitive function that takes a table P as argument and emits a set of rows.

Form (iv)
P(…iv,..) AND F(iv) relop expression
Here iv is an indexed variable and F(iv) is an expression consisting of a combination of transformable functions T. Currently T  {+, -, /, *, power, sqrt, abs} and the set can be extended.
The system tries to reformulate the query condition into an equivalent condition iv relop’ F’(expression) of Form (i) where the index is exposed to the query optimizer.
The algebraic inequality transformations automatically determine relop’ and F’(expression).

Moreover, Mexima takes advantages of index properties to enable testing index functionalities. (Transparent validation)

Experiments showed that penalty to use an index through Mexima (as database index) is very small compared to using it as standalone index structure.

The following figure illustrates how the execution time is spent in different layers of a plugged-in index implementation.
Here:

op: time spent to call algebra operations on an indexed table to add, delete, access, or map.

mc: time spent to dispatch and call the BAO function in an algebra operation. This includes time spent for type checking and automatic garbage collection.

ed: time spent in the index extension drivers for BAOs and index sepecial search functions (SSFs).

st: time spent on actually running the untouched index implementation code. This is the actual work to manipulate the index, i.e. the time to run the stand-alone C/C++ implementation.

In the experiment, we measured the execution times for the different index implementations both when plugging-in the implementation into Mexima and when running the
implementation as a stand-alone C/C++ program. The total execution time for using a plugged-in index implementation is tot = op + mc + ed + st.
The Mexima overhead, o, of calling a plugged-in index implementation is calculated as o= op + mc + ed.
The paper shows the average Mexima overheads o in microseconds for the BAOs put(), get(), delete(), and map(). The database size S was 5 million key/value pairs.
The total overhead was measured with both boxed keys bo and unboxed keys o. The standard deviations in all cases were less than 0.03 μs. The overhead of Mexima is well below one μs per call (low overhead)
and particularly low for unboxed keys.
The Table 5 in the paper furthermore breaks down the percentages of how the overhead o is spent in the different layers op, mc, and ed.

**Mexima is powered by Amos II

Snippet

Initialization function [a_initialization]
SSF range search [rangesearch]
Index properties [index properties]
Auto correctness test example [test]

How to customize index extensions? It is all in 3 steps

Step 1

B-tree index extension [Btree]
Linear Hashing index extension [Linh]
Xtree index extension [Xtree]
R*-tree index extension [R*-tree]
All-in-one [All-in-one]

Step 2

Here is instructions to compile the sourcecode of Btree, other sourcode work in the same manner

Download the following setenv.cmd script and modify it
Unizp the downloade zip file

If your AMOS_HOME environment is C:\thanh\downloads, then you should unizp the file to have
C:\thanh\downloads\AmosNT\extenders\.....

Browse the AmosNT\extenders\BTREE\extender\MVC and open the VS 6.0 project file ( or VS 2010 project file if exists)
Compile the project. You will find the corresponding bt.dll created in AmosNT\bin\ directory

Step 3

Place the new dll in your downloaded and running Mexima. Restart Mexima and enjoy.
If you want to learn more about the syntax of Mexima (Amos II), please refer to Amos II User Manual