Project:  Transparent inclusion, validation, and utilization of main memory domain indexes


Thanh Truong, Tore Risch

Department of Information Technology, Uppsala University
Box 337, SE-751 05, Sweden

thanh.truong@it.uu.se
tore.risch@it.uu.se


Table of Content

Abstract
Links

Short version of the project in pictures
Code snippet
How to customize index extensions to ? It is all in 3 steps

This project is developed by Thanh Truong C. Visit his homepage here


Abstract


Main memory database systems (MMDBs) are viable solutions for many scientific applications. Scientific and engineering data often require special indexing methods, and there is a large of number domain specific main memory indexing implementations developed.
However, adding an index structure into a database system can be challenging. Mexima (Main-memory External Index Manager) provides an MMDB where new main memory index structures can be plugged in without modifying the index implementation code.
To transparently utilize new user-defined indexes in queries, Mexima provides index property tables containing index meta-data that enable automatic transformation of query fragments to call index operations. When indexed attributes are hidden inside
complex numerical expressions, an algebraic query transformation algorithm reasons on the expressions to expose hidden indexes. The index property tables are furthermore used for validating the correctness of an index implementation by executing automatically
generated test queries. Experiments show that the overhead of plugging in indexes in Mexima is low compared to the corresponding stand-alone C/C++ implementations. Substantial performance gains are shown by the index exposing rewrite mechanisms.



Links


Short version of the project in pictures


       
   
  • Mexima supports several query rewrite forms so that it transparently rewrites user-queries to utilize the new added index. (Transparent utilization)
The translation rules can rewrite conjunctions in queries having terms of one the following query fragment forms:
Form (i):
P(…iv,..) AND (iv r1 expression) AND
                         (iv r2 expression) AND
                          . . .
                        (iv rn expression)
Here, iv is a variable bound to an indexed column of table P(…).
We say iv is an indexed variable. ri are comparison operators in the set relop, ri  relop, where relop ={=, <, >, >=, <=}.

Form (ii):
P(…iv,..) AND (iv r1 expression) AND
                        (iv r2 expression) AND
                        . . .
                        (iv rn expression)
Here, iv is a variable bound to a indexed column of table P(…).

Form (iii):
P(…,iv,…) AND (..,iv,..) in isf(…..,P,..)
Here the isf() is an index sensitive function that takes a table P as argument and emits a set of rows.

Form (iv)

P(…iv,..) AND F(iv) relop expression
Here iv is an indexed variable and F(iv) is an expression consisting of a combination of transformable functions T. Currently T  {+, -, /, *, power, sqrt, abs} and the set can be extended.
The system tries to reformulate the query condition into an equivalent condition iv relop’ F’(expression) of Form (i) where the index  is exposed to the query optimizer.
The algebraic inequality transformations automatically determine relop’ and F’(expression).
  • Moreover, Mexima takes advantages of index properties to enable testing index functionalities. (Transparent validation)

  • Experiments showed that penalty to use an index through Mexima (as database index) is very small compared to using it as standalone index structure.
The following figure illustrates how the execution time is spent in different layers of a plugged-in index implementation.
Here:
In the experiment, we measured the execution times for the different index implementations both when plugging-in the implementation into Mexima and when running the
implementation as a stand-alone C/C++ program. The total execution time for using a plugged-in index implementation is tot = op + mc + ed + st.
The Mexima overhead, o, of calling a plugged-in index implementation is calculated as o= op + mc + ed.
The paper shows the average Mexima overheads o in microseconds for the BAOs put(), get(), delete(), and map(). The database size S was 5 million key/value pairs.
The total overhead was measured with both boxed keys bo and unboxed keys o. The standard deviations in all cases were less than 0.03 μs. The overhead of Mexima is well below one μs per call (low overhead)
and particularly low for unboxed keys.
The Table 5 in the paper furthermore breaks down the percentages of how the overhead o is spent in the different layers op, mc, and ed.


**Mexima is powered by Amos II



Snippet

How to customize index extensions? It is all in 3 steps

Step 1

Step 2

Here is instructions to compile the sourcecode of Btree, other sourcode work in the same manner

If your AMOS_HOME environment is C:\thanh\downloads, then you should unizp the file to have
C:\thanh\downloads\AmosNT\extenders\.....

Step 3

    Place the new dll in your downloaded and running Mexima. Restart Mexima and enjoy.
    If you want to learn more about the syntax of Mexima (Amos II), please refer to Amos II User Manual