This
project is developed by Thanh Truong C. Visit his homepage
here
Abstract
Main memory database systems (MMDBs) are
viable solutions for many scientific applications. Scientific
and engineering data often require special indexing methods, and
there is a large of number domain specific main memory indexing
implementations developed.
However, adding an index structure into a database system can be
challenging. Mexima (Main-memory External Index Manager) provides
an MMDB where new main memory index structures can be plugged in
without modifying the index implementation code.
To transparently utilize new user-defined indexes in queries,
Mexima provides index property tables containing index
meta-data that enable automatic transformation of query
fragments to call index operations. When indexed attributes are
hidden inside
complex numerical expressions, an algebraic query transformation
algorithm reasons on the expressions to expose hidden indexes.
The index property tables are furthermore used for validating
the correctness of an index implementation by executing
automatically
generated test queries. Experiments show that the overhead of
plugging in indexes in Mexima is low compared to the
corresponding stand-alone C/C++ implementations. Substantial
performance gains are shown by the index exposing rewrite
mechanisms.
Links
Mexima was presented at the 27th Internaltional
Conference on Scientific and Satistical Database Management
(SSDBM) 2015,
UC San Diego Super Computer Center, June 29-July 1, 2015
There are many in-memory index structures but a very few
of them was integrated into database system because such
integration is very challenging or impossible (copyright,
avaiablity only in binary)
Gist (Generalized
Search Tree) based frameworks (stage 1, stage 2, and
stage 3) were introduced to reduce integration efforts.
However, it still required to re-engineer the index
implementation followed Gist coding convention. No
reusuability !
Mexima (Main-memory
External Index Manager) goes beyond than Gist to
provide inclusion of new index structures without code
changes. (Transparent
inclusion)
Mexima supports several query rewrite forms so that it
transparently rewrites user-queries to utilize the new
added index. (Transparent
utilization)
The translation rules can rewrite conjunctions in
queries having terms of one the following query fragment
forms: Form (i):
P(…iv,..) AND (iv r1 expression) AND (iv r2 expression) AND
. . . (iv rn expression)
Here, iv is a variable bound to an indexed column
of table P(…).
We say iv is an indexed variable. ri are
comparison operators in the set relop, ri relop, where relop
={=, <, >, >=, <=}.
Form (ii): P(…iv,..) AND (iv r1 expression) AND
(iv r2
expression) AND
. . .
(iv rn expression)
Here, iv is a variable bound to a indexed column of table
P(…).
Form (iii): P(…,iv,…) AND (..,iv,..) in isf(…..,P,..)
Here the isf() is an index sensitive function that
takes a table P as argument and emits a set of rows.
Form (iv) P(…iv,..) AND F(iv) relop expression
Here iv is an indexed variable and F(iv) is
an expression consisting of a combination of transformable
functions T. Currently T {+, -, /, *, power,
sqrt, abs} and the set can be extended.
The system tries to reformulate the query condition into an
equivalent condition iv relop’ F’(expression) of Form
(i) where the index is exposed to the
query optimizer.
The algebraic inequality transformations automatically
determine relop’ and F’(expression).
Moreover, Mexima takes advantages of index properties to
enable testing index functionalities. (Transparent validation)
Experiments showed that penalty to use an index through
Mexima (as database index) is very small compared to using
it as standalone index structure.
The following figure illustrates how the execution time is spent
in different layers of a plugged-in index implementation.
Here:
op: time spent to call algebra operations on an indexed
table to add, delete, access, or map.
mc: time spent to dispatch and call the BAO function in an
algebra operation. This includes time spent for type
checking and automatic garbage collection.
ed: time spent in the index extension drivers for BAOs and
index sepecial search functions (SSFs).
st: time spent on actually running the untouched index
implementation code. This is the actual work to manipulate
the index, i.e. the time to run the stand-alone C/C++
implementation.
In the experiment, we measured the execution times for the
different index implementations both when plugging-in the
implementation into Mexima and when running the
implementation as a stand-alone C/C++ program. The total execution
time for using a plugged-in index implementation is tot =
op + mc + ed + st.
The Mexima overhead, o, of calling a plugged-in index
implementation is calculated as o= op + mc + ed.
The paper shows the average Mexima overheads o in microseconds for
the BAOs put(), get(), delete(), and map(). The database size S
was 5 million key/value pairs.
The total overhead was measured with both boxed keys bo and
unboxed keys o. The standard deviations in all cases were less
than 0.03 μs. The overhead of Mexima is well below one μs per
call (low overhead)
and particularly low for unboxed keys.
The Table 5 in the paper furthermore breaks down the percentages
of how the overhead o is spent in the different layers op,
mc, and ed.
Here is instructions to compile the sourcecode of Btree, other
sourcode work in the same manner
Download the following setenv.cmd
script and modify it
Unizp the downloade zip file
If your AMOS_HOME environment is C:\thanh\downloads,
then you should unizp the file to have
C:\thanh\downloads\AmosNT\extenders\.....
Browse the AmosNT\extenders\BTREE\extender\MVC and open the
VS 6.0 project file ( or VS 2010 project file if exists)
Compile the project. You will find the corresponding bt.dll
created in AmosNT\bin\ directory
Step 3
Place the new dll in your downloaded and
running Mexima. Restart Mexima and enjoy.
If you want to learn more about the syntax of
Mexima (Amos II), please refer to Amos
II User Manual