AMOS II (Active Mediator Object System) is a light-weight, Object-Oriented (OO), multi-database system. AMOS II has a functional data model with a relationally complete object-oriented query language, AMOSQL.

Object-Oriented multi-database queries and views can be defined where external data sources of different kinds are translated through AMOS II and integrated through its OO mediation primitives. Through its multi-database facilities many distributed AMOS II systems can interoperate. Since most data reside in the data sources and to achieve good performance, the system is designed as a main-memory DBMS having a storage manager, query optimizer, transactions, client-server interface, etc.

Each AMOS II system is also a DBMS of its own, designed for main-memory and optimized for efficient execution when the entire database fits in main-memory. The AMOS II data manager is furthermore extensible so that new data types and operators can be added to AMOSQL, implemented in some external programming language (Java, C, or Lisp).

AMOS II is implemented in C and runs under Windows 95/98 and Windows NT 4.0. This document is an overview of the system.

Acknowledgements

The following persons have contributed to the development of the AMOS II project:
Silvio Brandani, Kristoffer Cassel, Daniel Elin, Marcus Eriksson, Gilles Fabre, Gustav Fahl, Staffan Flodin, Jörn Gebhardt, Björn Hellander, Vanja Josifovski, Jonas Karlsson, Timour Katchaounov, Salah-Eddine Machani, Joakim Näs, Kjell Orsborn, Thomas Padron-McCarthy, Tore Risch, Andreas Sjöstedt, Martin Sköld, Rickard Svensson, and Magnus Werner.

1 Introduction

This document gives an introduction to the AMOS II system [RJ00]. The core of AMOS II is an object-oriented, open, light-weight, and extensible database management system (DBMS). To achieve good performance AMOS II is designed as a main-memory DBMS. Each AMOS II server is also a DBMS of its own containing all the traditional database facilities, such as a storage manager, a recovery manager, a transaction manager, and an OO query language named AMOSQL. The system can be used as a single-user database or as a multi-user server to applications and to other AMOS II systems. The data manager is designed for main-memory and is optimized for efficient execution when the entire database fits in main-memory.

The query language of AMOS II, called AMOSQL, is similar to the OO parts of SQL:99 and is based on the functional data model DAPLEX [Shi81] and OSQL [Lyn91]. The main new features of AMOSQL compared to OSQL are generalized foreign functions [LR92], optimization of overloaded functions with late binding [FR95], functional active rules [Sko96][SR96] (being ported to AMOS II), and multi-database facilities [JR99a][JR99b]. AMOSQL furthermore has aggregation operators, nested subqueries, disjunctive queries, and quantifiers, and is relationally complete.

AMOS II is a distributed mediator system [Wie92] allowing several AMOS II mediator servers to communicate over the Internet. Applications can access data from several distributed data sources through a collection of distributed AMOS II servers. Each mediator server appears as a virtual OO database layer having OO data abstractions and an OO query language. OO views provide transparent access to data sources from clients and other mediator servers. Conflicts and overlaps between similar real-world entities being modeled differently in different data sources are reconciled through the mediation primitives [JR99a][JR99b] of AMOSQL. The mediation services allow transparent access to similar object structures represented differently in different data sources. AMOS II mediators are composable since a mediator server can regard other mediator servers as data sources. Different interconnecting topologies can be used to connect mediator servers depending on the integration requirements of the environment.

In order to access data from external data sources AMOS II mediators contain one or several wrappers which process data from different kinds of external data sources, e.g. ODBC based access to relational databases [FR97][Bra98] or access to XML files. A wrapper is a program module in AMOS II having specialized facilities for query processing and translation of data from a particular class of external data sources. It contains both interfaces to external data sources and knowledge of how to efficiently translate and process queries involving accesses to a class of external data sources. In particular external AMOS II servers known to a mediator are also regarded as external data sources and there is a special wrapper for accessing other AMOS II servers. However, among the AMOS II servers special query optimization methods are used that take into account the distribution, capabilities, costs, etc. of the different servers [JR00][Jos99].

The declarative multi-database query language AMOSQL requires queries to be optimized before execution. The query compiler translates AMOSQL statements into object calculus and algebra expressions in a simple internal logic-based language called ObjectLog [LR92], which is an OO dialect of Datalog [Ull88]. As part of the translation into object algebra programs, many optimizations are applied to AMOSQL expressions, relying on their OO and multi-database properties. During the optimization steps, the object calculus expressions are rewritten into equivalent but more efficient expressions. For distributed multi-database queries a multi-database query decomposer [JR00] distributes each object calculus query into local queries to be executed in the different distributed AMOS II servers and data sources. For better performance, the decomposed query plans are rebalanced over the distributed AMOS II servers [JKR99]. A cost-based optimizer on each site translates the local queries into procedural execution plans in an OO algebra, based on statistical estimates of the cost to execute each generated query execution plan. A query interpreter finally interprets the optimized algebra to produce the (partial) result of a query.

The query optimizer is extensible through a generalized foreign function mechanism, multi-directional foreign functions [LR92]. It gives transparent access from AMOSQL to special purpose data structures such as internal AMOS II meta-data representations or user-defined storage structures. The mechanism allows the programmer to implement query language operators in an external language (Java, C or Lisp) and to associate costs and selectivity estimates with different user-defined access paths. The architecture relies on extensible optimization of such foreign function calls [LR92]. They are important both for accessing external query processors [Bra98] and for integrating customized data representations from data sources.

To achieve good performance we have carefully optimized the representation of critical system data structures, e.g. the storage manager, object representation, type information, and the representation of function definitions. We use tailored main memory data structure representations of system objects, rather than, e.g., storing them in relational tables represented as B-trees [GS92]. For example, our object identifiers are represented as variable length records with pointers to data structures representing type-information, function definitions, dependent objects, etc. It is crucial that system information is represented efficiently, since it is extensively looked up during both compilation and interpretation of AMOSQL functions. The storage manager relies on an incremental garbage collector for removing unused data.

AMOS II runs under Windows NT. The system uses around 350KB of code and 1500KB of meta data. The system has client-server and inter-database communication primitives whereby AMOS II servers can communicate over TCP/IP.

A graphical browser for AMOS II, Goovi [CR01], is also available. It is implemented in Java and has facilities for multi-database browsing and integration.

The predecessor of AMOS II, Amos [FRS93], was built on top of WS-Iris [LR92], the workstation version of the Iris system [Fis89], running on Unix platforms. AMOS II has a completely new kernel developed for Windows NT/95/98 that is now also ported to Unix platforms. AMOS II provides OO multi-database queries and reconciliation of heterogeneous data, which were not present in Amos. Furthermore, AMOS II is designed for multi-layered distribution of mediator servers where distributed query optimization allows queries to be passed through many layers of mediators without any performance degradation.

The rest of this document gives an overview of the facilities of AMOS II, including its OO data modeling primitives, its OO query language constructs, and its distributed multi-database facilities. The document AMOS II User's Manual describes the details of how to run AMOS II, the AMOSQL language, etc.

2 The AMOS II Data Model

The data model of AMOS II is an OO extension of the Daplex [Shi81] functional data model. The basic concepts of the AMOS II data model are objects, types, and functions.

2.1 Objects

Objects model all entities in the database. Everything in AMOS II is represented as objects managed by the system, both system and user-defined objects. There are two main kinds of representations of objects: literals and surrogates. The surrogates have associated explicit object identifiers (OIDs) which are explicitly created and deleted by the user or the system. Examples of surrogates are objects representing real-world entities such as persons, meta-objects such as functions, or even AMOS II mediators as meta-mediator objects.

The literal objects are self-described system maintained objects which do not have explicit OIDs. Examples of literal objects are numbers and strings. Literal objects can also be collections, representing collections of other objects. The system-supported collections are vectors (1-dimensional arrays of objects) and bags (unordered sets with duplicates).

Objects persist in the database until they are no longer referenced from any other object or from external systems. The removal of unreferenced objects is done through an automatic garbage collector.

Surrogate objects are created by the create statement, e.g.:

   create person instances :a;

creates an object of type person and assigns it to the session variable :a. (A session variable is a user variable that holds temporary results of AMOSQL computations; it is NOT considered as part of the database).

2.2 Types

Objects are classified into types, making each object an instance of one or more types. The set of all instances of a type is called the extent of the type. The types are organized in a multiple inheritance, supertype/subtype hierarchy. If an object is an instance of a type, then it is also an instance of all the supertypes of that type; conversely, the extent of a type is a subset of the extent of a supertype of that type (extent-subset semantics). For example, if the type student is a subtype of type person then the extent of type student is also a subset of the extent of type person. A type that inherits from several supertypes has an extent that is the intersection of the extents of its supertypes.

Regular user-defined types are created by the create type statement, e.g.

   create type person;
   create type student under person;
   create type teacher under person;
   create type TA under student, teacher;

The above statements extend the OO database schema with four new types. A TA object is both a student and a teacher. The extent of type person is the union of all objects of type person, student, teacher, and TA. The extent of type TA is the intersection of the extents of type teacher and student.

The root in the type hierarchy is the type named object. All user-defined types are subtypes of the meta-type userobject. The above user type definitions create the lower part of the following type hierarchy:

Every object has an associated type set which is the set of those types that the object is an instance of. Every object also has one most specific type which is the type specified when the object is created. For example, objects of type TA have the most specific type named TA. The full type set includes the most specific type and all types above the type in the type hierarchy. For example, objects of type TA have the type set {TA, teacher, student, person, userobject, object}. Notice that user-defined objects always have the meta-type userobject in their type set and that the extent of type userobject contains all user-defined objects in the database.
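The type set and extent-subset semantics can be illustrated with a small Python model. This is only a sketch under stated assumptions: the dictionary SUPERTYPES, the helper names type_set and extent, and the object identifiers o1..o4 are hypothetical and are not part of AMOS II.

```python
# Hypothetical model of the example schema: each type maps to its direct supertypes.
SUPERTYPES = {
    "object": [],
    "userobject": ["object"],
    "person": ["userobject"],
    "student": ["person"],
    "teacher": ["person"],
    "TA": ["student", "teacher"],
}


def type_set(most_specific):
    """Return the most specific type plus every type above it in the hierarchy."""
    result = set()
    pending = [most_specific]
    while pending:
        t = pending.pop()
        if t not in result:
            result.add(t)
            pending.extend(SUPERTYPES[t])
    return result


def extent(type_name, objects):
    """Objects whose type set contains type_name (extent-subset semantics)."""
    return {oid for oid, t in objects.items() if type_name in type_set(t)}


# Hypothetical objects, each recorded with its most specific type.
objects = {"o1": "person", "o2": "student", "o3": "teacher", "o4": "TA"}

print(sorted(type_set("TA")))
print(sorted(extent("person", objects)))
print(sorted(extent("TA", objects)))
```

In this model the extent of person contains all four objects, while the extent of TA equals the intersection of the extents of student and teacher.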

The type set of an object can dynamically change during the lifetime of the object through AMOSQL statements that change the most specific type of an object. The reason for such facilities is that the role of an object may change during the lifetime of the database. For example, a TA might become a student for a while and then a teacher.

All objects in the database are typed, including meta-objects such as those representing the types themselves. The meta-objects representing types are instances of the meta-type named type. In the example the extent of the type named type includes the meta-objects representing the types named TA, teacher, student, and person.

The following picture shows parts of the system type hierarchy:

The type number is a supertype covering both integer and real numbers.

Collections are literals holding collections of other values.

Function definitions are instances of the meta-type function.

The meta-database type datasource represents descriptions of other AMOS II databases and data sources known to this database. This will be explained in Mediation Primitives.

2.3 Functions

Functions model the semantics (meaning) of objects. They model properties of objects, computations over objects, and relationships between objects. They furthermore are basic primitives in OO queries and views. Functions are instances of the meta-type function.

A function consists of two parts, the signature and the implementation:

The signature defines the types, and optional names, of the argument(s) and the result of a function. For example, the function modeling the attribute name of type person could have the signature

   name(person p)->charstring nm

Functions can be defined to take any number of arguments, e.g. the arithmetic addition function implementing the infix operator '+' has the signature:

    plus(number,number)->number

The implementation specifies how to compute the result of a function given a tuple of argument values. For example, the function plus computes the result by adding the two arguments, and name obtains the name of a person by accessing the database.

The implementation of a function is normally non-procedural, i.e. a function only computes values and does not have any side effects. The exception is database procedures which are special functions having side effects.

AMOS II functions are, furthermore, often multi-directional, meaning that the system is able to inversely compute one or several argument values if (some part of) the expected result value is known [LR92]. Inverses of multi-directional functions can be used in database queries and are important for specifying general queries with function calls over the database. For example, the following query, which finds the age of the person named 'Tore', uses the inverse of function name:

   select age(p) from person p where name(p)='Tore';
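One way to picture a multi-directional stored function is as a mapping that is indexed in both directions. The following Python sketch is an illustration only, not the AMOS II implementation; the class StoredFunction, the object identifiers p1 and p2, and the stored values are hypothetical.

```python
from collections import defaultdict


class StoredFunction:
    """Toy single-valued stored function that can be used in both directions."""

    def __init__(self):
        self._forward = {}                # argument -> result
        self._inverse = defaultdict(set)  # result -> set of arguments

    def set(self, arg, value):
        # Replace any previous value and keep the inverse index consistent.
        if arg in self._forward:
            self._inverse[self._forward[arg]].discard(arg)
        self._forward[arg] = value
        self._inverse[value].add(arg)

    def call(self, arg):
        return self._forward.get(arg)

    def inverse(self, value):
        return set(self._inverse.get(value, ()))


name = StoredFunction()
age = StoredFunction()
name.set("p1", "Tore")
name.set("p2", "Kjell")
age.set("p1", 50)   # hypothetical values
age.set("p2", 40)


def ages_of_persons_named(n):
    """Models: select age(p) from person p where name(p) = n"""
    return sorted(age.call(p) for p in name.inverse(n))


print(ages_of_persons_named("Tore"))
```

The query is answered by first using the inverse of name to find the matching person objects and then applying age in the forward direction, which avoids scanning the whole extent of type person.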

Depending on their implementation the basic functions can be classified into stored, derived, foreign, and proxy functions; and database procedures.

Stored functions represent properties of objects (attributes) stored in the database. Stored functions correspond to attributes in OO databases and tables in relational databases.
Derived functions are functions defined in terms of OO queries over other AMOSQL functions. Derived functions cannot have side effects and the query optimizer is applied when they are defined. Derived functions correspond to side-effect free methods in OO models and views in relational databases. AMOSQL has an SQL-like select statement for defining derived functions. It can also be used for ad hoc queries.
Foreign functions are implemented through an external programming language (Java, Lisp or C). Multi-directional foreign functions correspond to methods in OO databases and provide access to external storage structures similar to data 'blades', 'cartridges', or 'extenders' in object-relational databases. To help the query processor, a multidirectional foreign function can have several associated access path implementations with cost and selectivity functions.
Proxy functions represent functions in other databases. They are further explained in Mediation Primitives.
Database procedures are functions defined using a procedural sublanguage of AMOSQL. They correspond to methods with side effects in OO models.

AMOS II functions can furthermore be overloaded, meaning that they can have different implementations, called resolvents, depending on the type(s) of their argument(s). For example, the salary may be computed differently for types student and teacher. Resolvents can be any of the basic function types (a resolvent cannot be overloaded itself). AMOS II then chooses the resolvent based on the types of the argument(s), but not the result.
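Resolvent selection can be sketched in Python as a lookup on the type of the argument. The sketch assumes a single-inheritance chain and a search from the most specific type upwards; how AMOS II actually resolves calls (in particular under multiple inheritance) is not described here. All names and the salary formulas are hypothetical.

```python
# Hypothetical single-inheritance hierarchy: type -> direct supertype.
SUPERTYPE = {"person": None, "student": "person", "teacher": "person"}


def salary_student(obj):
    return obj["stipend"]


def salary_teacher(obj):
    return obj["base"] + obj["bonus"]


# One resolvent per argument type; there is none for plain persons.
SALARY_RESOLVENTS = {"student": salary_student, "teacher": salary_teacher}


def call_overloaded(resolvents, obj):
    """Pick the resolvent from the argument's type, walking up the hierarchy."""
    t = obj["type"]
    while t is not None:
        if t in resolvents:
            return resolvents[t](obj)
        t = SUPERTYPE[t]
    raise LookupError("no applicable resolvent")


print(call_overloaded(SALARY_RESOLVENTS, {"type": "student", "stipend": 900}))
print(call_overloaded(SALARY_RESOLVENTS, {"type": "teacher", "base": 3000, "bonus": 200}))
```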

The extent of a function is a mapping between its arguments and its results. For example, the extent of the function defined as

   create function name(person p)-> charstring as stored;

is a set of tuples <P_i,N_i> where P_i are person objects and N_i are their corresponding names. The extent of a stored function is stored in the database and the extent of a derived function is defined by its query. Usually only parts of the function extents are accessed, e.g. the names of all persons in the database or the name of a given person. The (partial) extents are accessed through database queries, e.g.

   select name(p) from person p;
   select name(:v);

Some functions may not have a fully computable extent, e.g. arithmetic functions have an indefinitely large extent. Queries over indefinite extents are not executable, e.g. the system will refuse to execute this query:

   select x+1 from number x;

3 Database manipulation

This section gives an introduction to the database manipulation primitives in AMOS II. There are three kinds of basic database manipulation commands: schema manipulation, database updates, and queries.

3.1 Schema manipulation

An AMOS II schema is represented by a set of type and function definitions. The definition of a new schema usually starts with defining a number of new types, e.g.

   create type Person;
   create type Teacher under Person;
   create type Student under Person;
   create type TA under Teacher, Student;
   create type Course;

The structure of the data to be associated with the new types is then defined through a set of function definitions. For example,

   create function name(Person) -> Charstring as stored;
   create function birthyear(Person) -> Integer as stored;
   create function hobbies(Person) -> bag of Charstring as stored;
   create function name(Course) -> Charstring as stored;
   create function teaches(Teacher) -> bag of Course as stored;
   create function enrolled(Student) -> bag of Course as stored;
   create function instructors(Course c) -> Teacher t as
        select t
        where teaches(t) = c; /* Inverse of teaches */

The function name is overloaded on types Person and Course. The function instructors is a derived function that uses the inverse of function teaches. The functions hobbies, teaches, and enrolled return sets of values. A bag (multiset) is a set with duplicates allowed. If bag of is declared for the value of a stored function it means that the result of the function is a set, otherwise it is a single value.

The types and their attributes can be defined together using the properties clause. For example, the above definition of types Person and Teacher with their attribute functions can also be defined by:

  create type Person properties
        (name Charstring,
         birthyear Integer,
         hobbies bag of Charstring);
   create type Teacher under Person properties
         (teaches bag of Course);

Functions (attributes) are inherited, so the above statements will make objects of type Teacher have the attributes name, birthyear, hobbies, and teaches. Type attribute definitions and function definitions can be mixed freely. For example, after the initial schema above is defined and the database populated, one may need to add another attribute phone to type Person, which is easily done with the statement:

   create function phone(Person) -> Charstring as stored;

We notice here that single argument AMOS functions are similar to relationships and attributes in the entity-relationship (ER) model and that AMOS types are similar to ER entities. The main difference between an AMOS function and an ER relationship is that AMOS functions have a logical direction from the argument to the result, while ER relationships are direction neutral. Notice that AMOS functions normally are invertible and thus can be used in the inverse direction too. The main difference between AMOS types and the entities in the original ER model is that AMOS types can be inherited.

The following figure illustrates an AMOS II schema through a Daplex diagram [Shi81], where ovals are types (entities), thick arrows indicate inheritance relationships, and thin arrows indicate functions (directed relationships).

Types are deleted with the delete type statement, e.g.:

    delete type Person;

The system maintains referential integrity for type definitions, so in the above example the types Teacher, Student, and TA will also be deleted, along with all functions defined on these types. Notice, however, that the objects defined for the deleted types will NOT be deleted; they will only lose the deleted types from their type sets and they will still exist with the most specific type userobject.

Functions are deleted with the delete function statement, e.g.:

   delete function teaches;

The referential integrity maintenance will in this case delete all functions dependent on the deleted function. In the example the function instructors is also deleted.

3.2 Database updates

Once the schema is defined the database can be populated. The create statement creates new objects and sets attributes of the new objects, e.g.:

   create person(name, birthyear, hobbies) instances
     :bill ("Bill", 1962, "Sailing"), :bob ("Bob", 1970, "Tennis");

The statement above creates two objects and sets their attributes name, birthyear, and hobbies. The (optional) environment variables :bill and :bob are set to the created objects, respectively.

The attribute assignments can be done separately too. The set command updates an instance of the extent of a function. For example, the create statement above is equivalent to:

   create person instances :bill, :bob;
   set name(:bill) = "Bill";
   set birthyear(:bill) = 1962;
   set hobbies(:bill) = "Sailing";
   set name(:bob) = "Bob";
   set birthyear(:bob) = 1970;
   set hobbies(:bob) = "Tennis";

Since AMOS functions can be bag valued there are two commands, add and remove, that add and delete elements of bag valued functions, respectively. For example, to add the hobby Stamps to the hobbies of :bill do:

   add hobbies(:bill) = "Stamps";

After this command the result of the function call:

 
   hobbies(:bill);

will be:

   "Sailing"
   "Stamps"

To remove the hobby Sailing from the hobbies of :bill do:

   remove hobbies(:bill) = "Sailing";

To remove all Bill's hobbies do:

   set hobbies(:bill) = nil;
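The effect of set, add, and remove on a bag valued function can be modeled with a Python list per object. This is a sketch only: the helper names are hypothetical, and it assumes that remove deletes a single occurrence of the value, which the text above does not specify.

```python
from collections import defaultdict

# Hypothetical model of a bag valued stored function: object -> list of values.
hobbies = defaultdict(list)


def set_value(fn, obj, value):
    """Models 'set f(obj) = value': replaces the bag; None stands for nil."""
    fn[obj] = [] if value is None else [value]


def add_value(fn, obj, value):
    """Models 'add f(obj) = value': appends one element to the bag."""
    fn[obj].append(value)


def remove_value(fn, obj, value):
    """Models 'remove f(obj) = value' (assumption: removes one occurrence)."""
    if value in fn[obj]:
        fn[obj].remove(value)


set_value(hobbies, "bill", "Sailing")
add_value(hobbies, "bill", "Stamps")
print(hobbies["bill"])   # ['Sailing', 'Stamps']
remove_value(hobbies, "bill", "Sailing")
print(hobbies["bill"])   # ['Stamps']
set_value(hobbies, "bill", None)
print(hobbies["bill"])   # []
```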

Set-oriented updates allow update statements to be applied to sets of objects. A set-oriented update is expressed as an iteration over the result of a query, applying a database update operation to each result. The iteration is expressed through a from clause, for example:

  set income(p) = pi
  from Person p, Integer pi
  where pi = sumagg(income(parents(p)))
    and income(p) > pi;

The above statement sets the incomes of all persons earning more than their parents' total incomes to their parents' total incomes. The updates are applied on a copy of the iterated data, which means that the database update operations are not cascaded.
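The non-cascading behaviour can be illustrated by a two-phase Python sketch: the query is first evaluated against the current state, and the collected updates are applied afterwards. The data and helper name are hypothetical, and the sketch assumes that persons without recorded parents are skipped.

```python
# Hypothetical three-generation data set.
income = {"gp": 100, "parent": 500, "child": 400}
parents = {"gp": [], "parent": ["gp"], "child": ["parent"]}


def cap_income_at_parents_total(income, parents):
    """Two-phase update: evaluate the query first, then apply the changes."""
    pending = []
    for p in income:
        if not parents[p]:
            continue  # assumption: persons without recorded parents are skipped
        total = sum(income[q] for q in parents[p])
        if income[p] > total:
            pending.append((p, total))
    # The updates never see each other's effects, so they do not cascade.
    for p, total in pending:
        income[p] = total


cap_income_at_parents_total(income, parents)
print(income)   # {'gp': 100, 'parent': 100, 'child': 400}
```

Here the income of child stays at 400 because the condition was evaluated while the income of parent was still 500. An in-place update that processed parent first would have lowered child to 100 as well.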

Instances of objects are deleted with the delete statement, for example:

   delete :bill;
   delete :bob;

When an object is deleted it is also removed from the functions where it is stored.

3.3 Queries

3.3.1 Function calls

The simplest form of queries in AMOS II is function calls, for example:

   hobbies(:tore);

The query optimizer is not invoked for such unnested calls to AMOS II functions and the invocation is therefore very fast. The query optimizer is instead applied when a derived function is defined, and the function is then optimized for such unnested calls to the function.

Function calls can also be nested which can be seen as traversals of the function diagram, for example:

   name(instructors(:math));

Notice here that AMOS II uses Daplex semantics on function applications, which means that if a function returns a set (bag) of values (as, e.g., instructors) and another function is applied on the result of that function (as, e.g., name) then the outer function is applied on each instance of the result from the set valued function. In the example the result is the set of names of teachers of the course :math. The Daplex semantics makes it very simple to traverse the database through the function diagram. Daplex semantics is a form of generalized path expressions.

Another example of Daplex semantics is:

   sqrt(sqrt(16.0));

which will return the numbers 2.0 and -2.0, since the inner call to sqrt(16.0) returns 4.0 and -4.0 and sqrt(-4.0) returns NIL.
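The sqrt example can be reproduced with a small Python model of Daplex semantics, in which a bag valued function returns a list and the outer function is applied to every element of that list. The names sqrt_both and daplex_apply are hypothetical; an empty list stands for NIL.

```python
import math


def sqrt_both(x):
    """Bag valued square root: both roots for x > 0, nothing (NIL) for x < 0."""
    if x < 0:
        return []
    if x == 0:
        return [0.0]
    r = math.sqrt(x)
    return [r, -r]


def daplex_apply(fn, bag):
    """Apply fn to every element of bag and flatten the results into one bag."""
    return [y for x in bag for y in fn(x)]


inner = sqrt_both(16.0)                 # [4.0, -4.0]
outer = daplex_apply(sqrt_both, inner)  # sqrt_both(-4.0) contributes nothing
print(outer)                            # [2.0, -2.0]
```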

For nested function calls the query optimizer will be applied to produce an execution plan for the call, which is then immediately executed. Therefore nested function calls are significantly slower than unnested ones, and, if the calls are to be executed more than once, the user is recommended to avoid nested function calls by defining suitable derived functions. In the example above, define:

   create function teacher_name(Course p) -> Charstring
          as select name(instructors(p)); /* Optimizer invoked here
                                             and execution plan saved in db */

and then call

   teacher_name(:math); /* Optimizer NOT invoked here */

3.3.2 The select statement

General queries are formulated through the select statement having the format:

   select result
   from   type extents
   where  condition

For example:

   select name(p), birthyear(p)
   from   Person p
   where  birthyear(p) > 1970;

The above query retrieves tuples with the name and birth year of every person in the database born after 1970.

In general the semantics of an AMOSQL query is as follows:

Form the cartesian product of the type extents.
Restrict the cartesian product by the condition.
For each possible variable binding to tuple elements in the restricted cartesian product, evaluate the result expressions to form a result tuple.
Result tuples containing NIL are not included in the result set; queries are null intolerant.
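The four steps above can be written down directly as a naive evaluator in Python. This is a sketch of the declarative semantics only, not of how AMOS II executes queries; the function naive_select and the sample persons are hypothetical, and None stands for NIL.

```python
from itertools import product


def naive_select(extents, condition, result):
    """Evaluate a query literally: product, restrict, project, drop NIL rows."""
    rows = []
    for binding in product(*extents):      # 1. cartesian product of the extents
        if condition(*binding):            # 2. restrict by the condition
            row = result(*binding)         # 3. evaluate the result expressions
            if None not in row:            # 4. queries are null intolerant
                rows.append(row)
    return rows


# Hypothetical extent of type Person; the last person has no stored name.
persons = [
    {"name": "Bill", "birthyear": 1962},
    {"name": "Bob", "birthyear": 1970},
    {"name": "Eva", "birthyear": 1975},
    {"name": None, "birthyear": 1980},
]

rows = naive_select(
    [persons],
    lambda p: p["birthyear"] > 1970,
    lambda p: (p["name"], p["birthyear"]),
)
print(rows)   # [('Eva', 1975)]
```

The person born in 1980 satisfies the condition but is dropped in step 4 because the result tuple would contain NIL.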

It would be very inefficient to directly use the above semantics to execute a query. It is therefore necessary for the system to do extensive query optimization to transform the query into an efficient execution strategy. Actually, in AMOSQL one may formulate queries that are not executable at all without query optimization. For example, the previous query could also have been formulated as:

   select nm, by
   from Person p, Charstring nm, Integer by
   where by = birthyear(p) and
         nm = name(p) and
         by > 1970;

The cartesian product of all persons, integers, and strings is indefinite so the above query is not executable without query optimization.

3.3.3 Defining derived functions

Derived functions are defined through the select statement. The following function definitions illustrate some of the power of using AMOSQL queries for defining derived functions:

   create type person;
   create type student under person;
   create function income(Person) -> Number as stored; 
   create function taxes(Person) -> Number as stored; 
   create function parents(Person) -> bag of Person as stored; 
   create function netincome(Person p) -> Number as 
        select income(p)-taxes(p);
   create function sparents(Person c) -> Student as 
        select parents(c); /* Parent if parent is student; 
                              bag of implicit for derived functions */ 
   create function grandsparentsnetincomes(Person c) -> Number as 
        select netincome(sparents(parents(c)));

Notice that in the definition of sparents the result is restricted to those persons also being students.

The function grandsparentsnetincomes illustrates the power of traversing the function diagram through the Daplex semantics, where each arc traversal becomes a function application. The function computes the net incomes of the studying grandparents of a person. Notice that derived functions, such as grandsparentsnetincomes, implicitly return bags (multisets) of values and therefore always have the result implicitly declared as bag of. In this case there will be more than one number returned when several grandparents study.

Given these function definitions one can formulate advanced queries, such as:

  select name(c) 
         from Person c
         where grandsparentsnetincomes(c) > 100000 and
         income(c) < 10000;

The above query selects the names of the persons earning less than 10000 who have a studying grandparent earning more than 100000.

3.3.4 Aggregation operators

The Daplex semantics is NOT used for aggregation operators which are functions that aggregate over subqueries. For example, to count how many grandparents :adam has, the aggregation operator count can be used which counts the elements in its argument:

   count(parents(parents(:adam)));

Aggregation operators are defined by functions having an argument of some type p declared as

   bag of p

For example, count has the signature:

   count(bag of object b) -> integer

Another very useful aggregation operator is sumagg which sums the numbers in a bag of numbers. For example, the following query finds those persons whose studying grandparents have a total income larger than 100000:

  select name(c) 
         from Person c
         where sumagg(grandsparentsnetincomes(c)) > 100000;

Nested subqueries return bags as results. For example, the following query totals the netincome for all persons:

  select sumagg(select netincome(p) from Person p);
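The difference between ordinary function application and aggregation can be sketched in Python: an ordinary function is applied to each element of a bag, whereas an aggregation operator receives the whole bag as one argument. The data and helper names below are hypothetical.

```python
# Hypothetical family data.
PARENTS = {"adam": ["p1", "p2"], "p1": ["g1", "g2"], "p2": ["g3"]}
NETINCOME = {"g1": 40000, "g2": 70000, "g3": 10000}


def parents(p):
    return PARENTS.get(p, [])


def daplex_apply(fn, bag):
    """Ordinary application: fn is applied to each element of the bag."""
    return [y for x in bag for y in fn(x)]


def count(bag):
    """Aggregation: receives the whole bag and returns a single value."""
    return len(bag)


def sumagg(bag):
    return sum(bag)


grandparents = daplex_apply(parents, parents("adam"))
print(count(grandparents))                            # 3
print(sumagg([NETINCOME[g] for g in grandparents]))   # 120000
```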

3.3.5 Quantifiers

Queries with existential and universal quantification (exist and forall) are expressed through the aggregation operators some and notany, with signatures:

   some(bag of object o) -> boolean
   notany(bag of object o) -> boolean

some returns true if there is at least one object in the argument bag. It implements existential quantification.
notany returns true if there is no object in the argument bag. It implements not exists through which universal quantification can be expressed.

For example, the following queries find those persons having some grandparent, and having no grandparent, respectively:

   select name(p)
   from Person p
   where some(parents(parents(p)));

   select name(p)
   from Person p
   where notany(parents(parents(p)));

The following query finds those persons all of whose grandparents have a net income of at least 100000:

   select name(p)
   from Person p
   where notany(select gp from Person gp
                          where gp = parents(parents(p))
                            and netincome(gp)<100000);

Universal quantification is thus expressed through a notany with a negated subquery.
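The rewriting of a universal condition into notany over a negated subquery can be checked with a short Python sketch. The data and the helper all_grandparents_earn_at_least are hypothetical; some and notany follow the definitions given above.

```python
def some(bag):
    """True if the bag contains at least one object."""
    return len(bag) > 0


def notany(bag):
    """True if the bag is empty."""
    return len(bag) == 0


# Hypothetical data: person -> grandparents, grandparent -> net income.
GRANDPARENTS = {"ann": ["g1", "g2"], "bo": ["g3"], "cy": []}
NETINCOME = {"g1": 150000, "g2": 90000, "g3": 200000}


def all_grandparents_earn_at_least(p, limit):
    """forall gp: netincome(gp) >= limit, written as notany over the negation."""
    below = [gp for gp in GRANDPARENTS[p] if NETINCOME[gp] < limit]
    return notany(below)


for person in GRANDPARENTS:
    print(person, all_grandparents_earn_at_least(person, 100000))
```

Note that a person without any grandparents, such as cy in this data, satisfies the universal condition vacuously, since the negated subquery is empty.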

3.3.6 Disjunctive Queries

The 'or' operator works like a bag union operator, i.e. it returns the union of the objects satisfying its operands, and duplicates are not removed. Queries and function definitions can have arbitrary nesting of 'and' and 'or'.

Example:

create function father(person) -> person as stored;
create function mother(person) -> person as stored; 
create function parent(person p) -> person q 
        as select q where q=father(p) or q=mother(p);

The function body of parent is a disjunctive query, since it contains an 'or'. parent generates the bag of all fathers and mothers for a given person.
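
The bag union behaviour of 'or' can be sketched in Python as follows. This is only an illustrative model: FATHER and MOTHER are hypothetical sample data for the stored functions, and bag_or is a made-up helper name, not an AMOS II function.

```python
# Toy model of 'or' as bag union: the results of both operands are
# concatenated and duplicates are kept.
FATHER = {"kain": "adam", "abel": "adam"}   # hypothetical stored data
MOTHER = {"kain": "eve", "abel": "eve"}


def bag_or(left, right):
    """Bag union of two operand results, without duplicate removal."""
    return list(left) + list(right)


def parent(p):
    """Model of: select q where q=father(p) or q=mother(p)."""
    from_father = [FATHER[p]] if p in FATHER else []
    from_mother = [MOTHER[p]] if p in MOTHER else []
    return bag_or(from_father, from_mother)


kain_parents = parent("kain")
# If both operands produce the same object, it appears twice in the bag.
duplicated = bag_or(["adam"], ["adam"])
print(kain_parents, duplicated)
```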

3.3.7 Transitive closures

The transitive closure of an object, s, under a function, f, is the set of all objects, o, reached directly or indirectly from s by applying f. The classical example is to find all ancestors of a given person following the parent function (or finding all subparts of a given part). Transitive closures in AMOS II are computed by the built-in function tclose:

tclose(function f,object o,integer maxdepth)-> <object r,integer depth>

Starting with object o it constructs the transitive closure by successively applying f(o), f(f(o)) etc. down to level maxdepth. tclose returns the objects, r, in the closure and their distance, depth, from o. f must be a function with a single argument and result.

tclose is overloaded so that, as an alternative, the name of the traversal function can be specified as a string.

Example:

create function ancestors(person o)-> bag of person a
        as select a from integer d
        where  tclose("person.parents->person",o,200) = <a,d> 
                                and a != o;

The tclose function is invertible if the traversal function is invertible. This means that the direction of the transitive closure can be inverted. Thus both these queries are legal:

ancestors(:kain);
select p from person p where ancestors(p) = :eve;

The first query (function call) returns all ancestors of :kain while the other query returns all descendants of :eve .

An alternative definition of ancestors would be as a recursive function. This is, however, NOT supported in AMOS II!
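
The behaviour of tclose described above can be modeled by an iterative breadth-first traversal. The Python sketch below is not the AMOS II implementation. It assumes that the start object itself is reported at depth 0 (suggested by the condition a != o in the ancestors example), that each object is reported once at its smallest depth, and that the traversal function returns a list. PARENTS and PERSONS are hypothetical sample data.

```python
from collections import deque

# Hypothetical sample data standing in for the stored parents function.
PARENTS = {"kain": ["adam", "eve"], "enoch": ["kain"]}
PERSONS = ["adam", "eve", "kain", "enoch"]


def parents(person):
    return PARENTS.get(person, [])


def tclose(f, o, maxdepth):
    """Return (object, depth) pairs reached from o by applying f
    repeatedly, down to level maxdepth (breadth-first, no recursion)."""
    result = [(o, 0)]          # assumption: o itself is at depth 0
    seen = {o}
    frontier = deque([(o, 0)])
    while frontier:
        obj, depth = frontier.popleft()
        if depth == maxdepth:
            continue           # do not expand below maxdepth
        for nxt in f(obj):
            if nxt not in seen:
                seen.add(nxt)
                result.append((nxt, depth + 1))
                frontier.append((nxt, depth + 1))
    return result


def ancestors(o):
    """Model of the ancestors function: closure members except o."""
    return [a for a, d in tclose(parents, o, 200) if a != o]


def descendants(q):
    """Inverse use: all persons p such that q is among ancestors(p)."""
    return [p for p in PERSONS if q in ancestors(p)]


print(ancestors("enoch"), descendants("eve"))
```

The descendants helper mimics the inverted query by testing every person. The text only states that tclose can be inverted when the traversal function is invertible, so this brute-force loop says nothing about how AMOS II evaluates the inverse.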

4 Multi-database Architecture

The multi-database architecture of AMOS II allows several AMOS II systems to connect and communicate over a network using TCP/IP. There are furthermore AMOSQL data interoperability primitives to exchange data between different AMOS II systems and to mediate semantically heterogeneous data. Finally there are facilities to wrap and access relational databases [Bra98] or XML files.

4.1 Distribution

The figure below illustrates how AMOS II systems can communicate and how they can be configured in different modes with respect to how they interact with other systems. The lines indicate communication between sub-systems where the arrows indicate the servers.

The system can be configured in two dimensions:

It can be a single-user, a server or an embedded system, where a single-user AMOS II system is a private database, a server is servicing several other AMOS II systems, and an embedded system is linked to some application.
It can be a stand-alone or a mediator system, where a stand-alone system is an isolated database and a mediator accesses data from some other mediator(s) or data source(s).

The green-shaded AMOS II systems in the figure above illustrate the following modes of operation along the two dimensions:

	Single-user	Server	Embedded
Stand-alone	C	F	G
Mediator	B	D, E	A

(A) is an embedded AMOS II mediator linked to an application program. The small footprint of an embedded AMOS II system makes it easy to link it to applications. The system has interfaces to application programs in Java, C, and Lisp. Applications always access mediator servers by AMOSQL commands that are passed through an embedded AMOS II mediator.
(B) is a single-user mediator which imports and integrates data from AMOS II servers through the multi-database facilities, but which is not servicing other systems.
(C) is a single-user stand-alone database where the user can enter AMOSQL commands to populate, search, and update a private database.
(D) is a mediator server which services inter-database requests from other AMOS II systems and defines mediating OO views that integrate data from other servers.
(E) is a mediator server which translates data from a relational database. It has knowledge of how to translate AMOSQL queries to SQL [FR97] and interfaces to call SQL through ODBC [Bra98]. It can use the facilities of AMOSQL for semantic mediation of data from its data source and its local database into views presented to other systems.
(F) is a stand-alone database server which is accessed from mediator (D) by TCP/IP. It is also a nameserver, which keeps track of the mediator servers and clients in this group of mediators, as indicated by the dotted arrows. The role of the nameserver is described in Section 4.1.1.
(G) is a stand-alone embedded AMOS II system which provides database facilities for an application, e.g. for FEA analysis [OR96].

When you start running AMOS II you initially will have a stand-alone single-user database which cannot communicate with other AMOS II stand-alone databases or mediators. The stand-alone database can become a server by issuing some system function calls. There are furthermore system calls for making the stand-alone system join or leave a mediator federation through updates to the nameserver database.

4.1.1 The nameserver

Every AMOS II mediator belongs to a group of mediators and must be given a unique name within the group. There is one particular AMOS II meta-mediator server, the nameserver (F), which keeps track of the mediator servers and clients in a group of AMOS II mediators. The nameserver is an ordinary AMOS II mediator server having the special task to store information about names, locations, and other meta-properties of the mediators in a group. A nameserver thus identifies a group of mediators and all mediators in the group will access meta-data about the federation of mediators from the nameserver (dotted lines in the figure). The nameserver can, however, also be an ordinary AMOS II database which may be accessed from mediators in the group (illustrated through the solid line from D to F).

Every mediator must have a name registered with a nameserver, and the nameserver is also its own nameserver. So the first thing to do when setting up a group of AMOS II mediators on the net is to start a nameserver as follows:

1. Start a stand-alone AMOS II database.

2. Call the AMOSQL function
nameserver("<server name>");
for example
nameserver("MyServer");
This call will register the stand-alone AMOS II database as a mediator server named <server name> in the nameserver, i.e. in itself.

3. Start running the nameserver by calling
listen();
After this call the system will no longer prompt for AMOSQL commands but will start listening for inter-database requests from other AMOS II systems.

You can interrupt the listening loop by typing CTRL-C. After that the server is not active and you can type AMOSQL commands as usual. Resume the server listening loop again by calling listen();.

An unnamed AMOS II system is not a mediator but can be a stand-alone (C) or an embedded stand-alone (G) database. It can also be an embedded AMOS II system (A) through which an application communicates with mediator servers. Before an AMOS II database has become a named client mediator it is NOT allowed to import or integrate data from mediators to store in the local database.

4.1.2 Mediator clients

The following AMOSQL function makes a stand-alone AMOS II database into a mediator client:
register(<mediator name>[,<host name>]);
For example
register("MediatorA","lina1.ida.liu.se");
The current AMOS II database is then registered as a client mediator in the nameserver running on host <host name>. The default host name is the local workstation.

Once a mediator client is registered it may send multi-database requests to the mediator servers registered in its nameserver. The most primitive way to communicate with mediator servers is through the system function
ship("<mediator server>","<AMOSQL command>");
For example
ship("MyServer","select t from type t;");
It executes an AMOSQL command in a mediator server and ships the result back to the issuing mediator. Section 9.2 describes sophisticated multi-database query facilities in AMOS II through which data can be retrieved and combined from several mediator servers.

4.1.3 Mediator servers

To make a client mediator into a mediator server, simply call
listen();
It is currently not possible to revert a mediator server back to a client mediator.
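
The life cycle described in Sections 4.1.1 to 4.1.3 (unnamed system, named client, listening server) can be summarized by a small in-process model. The Python sketch below is purely illustrative and involves no networking. The classes MediatorGroup and AmosSystem and the method execute are hypothetical names. The model passes the nameserver object where the real register call takes a host name, and it assumes that only named systems may listen or ship.

```python
class MediatorGroup:
    """Toy stand-in for the registry kept by one nameserver."""

    def __init__(self):
        self.members = {}   # mediator name -> AmosSystem

    def add(self, name, system):
        if name in self.members:
            raise ValueError(f"name {name!r} is already used in this group")
        self.members[name] = system


class AmosSystem:
    """Toy model of one AMOS II system; starts unnamed and stand-alone."""

    def __init__(self):
        self.name = None
        self.group = None
        self.listening = False

    def nameserver(self, name):
        """Become a nameserver: create a group and register in itself."""
        group = MediatorGroup()
        group.add(name, self)
        self.name, self.group = name, group

    def register(self, name, nameserver_system):
        """Become a client mediator of the given nameserver's group."""
        nameserver_system.group.add(name, self)
        self.name, self.group = name, nameserver_system.group

    def listen(self):
        """Become a server; the model offers no way to revert this."""
        if self.name is None:
            raise RuntimeError("assumed here: only named systems can listen")
        self.listening = True

    def execute(self, command):
        # Placeholder for evaluating an AMOSQL command in the server.
        return f"{self.name} executed: {command}"

    def ship(self, server_name, command):
        """Run a command in a named server and return its result."""
        if self.name is None:
            raise RuntimeError("an unnamed system is not a mediator")
        server = self.group.members.get(server_name)
        if server is None or not server.listening:
            raise LookupError(f"no listening server named {server_name!r}")
        return server.execute(command)


ns = AmosSystem()
ns.nameserver("MyServer")
ns.listen()
client = AmosSystem()
client.register("MediatorA", ns)
reply = client.ship("MyServer", "select t from type t;")
print(reply)
```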

4.2 Mediation

Several constructs are provided for accessing data from data sources and for combining different heterogeneous type hierarchies. The main such mediation primitives are:

A mechanism to access external data sources from AMOS II. The meta-type datasource describes properties of each data source and provides the basic mechanisms for multi-database queries to each kind of external data source. The wrappers for each class of data sources are defined as subtypes of the virtual type datasource.
Some special kinds of types and functions that are used for importing OO schemas from other AMOS II servers and for combining and transforming heterogeneous type hierarchies.

The following figure illustrates the meta-types used for data integration:

4.2.1 Mediation types

For data mediation, the types are divided into stored, proxy, derived, and integration union (IUT) types. Their type meta-objects are members of the extents of the meta-types storedtype, proxytype, derivedtype, and IUT, respectively.

Stored types are the regular types whose instances are explicitly stored in the local database and created by the user. The supertypes of a stored type must also be stored types.
Derived types (DTs) are defined in terms of other types through queries, i.e. DTs define types as views. Derived types are important for data mediation but can also be used for modeling OO views. The extent of a DT is a subset of the extents of one or more constituent supertypes specified through a query over the supertypes. Its extent is a subset of the intersection of the extents of the constituent types. The principles of the DTs are described in detail in [JR99a].
Proxy types represent external objects stored in other AMOS II servers or in some of the supported types of data sources, e.g. ODBC data sources [Bra98].
Integration union types (IUTs) provide a mechanism for defining OO views capable of resolving semantic heterogeneity among meta-data and data from multiple data sources. Informally, while the DTs represent restrictions and intersections of extents of other types, the IUTs represent reconciled unions of data in one or more mediators or data sources.

Proxy types provide specifications of general multi-database queries over AMOS II servers and data sources, while derived and integration union types provide mechanisms for resolving semantic heterogeneities between object structures through object views. Queries over the OO views are transformed into multi-database queries over data in multiple data sources.
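
The informal difference between derived types (restrictions and intersections of extents) and integration union types (reconciled unions) can be illustrated with plain Python sets and dictionaries. This is a sketch under stated assumptions, not the AMOS II mechanism: all type names, object identifiers, and the reconciliation rule (source A wins on conflicting attributes) are made up for the illustration.

```python
# Extents modeled as sets of object identifiers (hypothetical data).
EMPLOYEE = {"p2", "p3", "p4"}
STUDENT = {"p3", "p4", "p5"}
SALARY = {"p2": 90000, "p3": 20000, "p4": 120000}


def derived_extent(constituents, predicate):
    """Extent of a derived type: the objects in the intersection of the
    constituent extents that also satisfy the defining query."""
    common = set.intersection(*constituents)
    return {o for o in common if predicate(o)}


well_paid_working_students = derived_extent(
    [EMPLOYEE, STUDENT], lambda o: SALARY[o] > 50000)

# Two overlapping sources keyed by a shared identifier (hypothetical).
SOURCE_A = {"111": {"name": "Ann", "salary": 100},
            "222": {"name": "Bob", "salary": 200}}
SOURCE_B = {"222": {"name": "Robert", "phone": "555"},
            "333": {"name": "Cid", "phone": "777"}}


def integration_union(a, b):
    """Reconciled union: one merged record per key found in either
    source; on conflicting attributes source a wins (arbitrary rule)."""
    merged = {}
    for key in sorted(set(a) | set(b)):
        record = dict(b.get(key, {}))
        record.update(a.get(key, {}))
        merged[key] = record
    return merged


people = integration_union(SOURCE_A, SOURCE_B)
print(well_paid_working_students, sorted(people))
```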

4.2.2 External data access

Each kind of data source accessible from an AMOS II mediator must have a wrapper defined. A wrapper defines interfaces between the AMOS II kernel and the kind of data sources supported by the wrapper. The AMOS II schema also contains descriptions of the properties of the wrapper. Each such wrapper description is defined by a special type called a wrapper type which must be a subtype of the system type datasource. (The type datasource itself is virtual and has no instances of its own.) The figure illustrates wrapper definitions for the subtypes amos (accessing other AMOS II servers), relational (accessing relational databases), DTD (accessing XML files), and STEP (accessing STEP/EXPRESS files). (The wrappers for XML and STEP are not in the basic system, but in special AMOS II versions.)

When a new data source is to be accessed from a mediator the user creates a new instance of the particular wrapper type in the mediator through a data importation procedure specific for each wrapper. Each wrapper has a set of primitive data source access functions which are overloaded on the wrapper types. These data source access functions can be used in low level multi-database queries to data sources accessed through the wrapper. For example, relational database wrappers must supply an overloaded data access function sql(..query..) which submits an SQL string to the relational database of a data source for execution.

However, the user does not normally use the primitive data source access functions directly. Instead each wrapper provides a set of data importation procedures which import data from a particular data source either by materializing the data in the mediator or by defining proxy types that represent objects mapped to external data. The proxy types can then be used in multi-database queries.
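
The relationship between the virtual type datasource, its wrapper subtypes, and an access function such as sql can be mirrored with an abstract base class. The Python sketch below is an analogy only, not AMOS II code: the class names DataSource and Relational, the method wrapper_kind, and the helper materialize are hypothetical, and an in-memory SQLite database stands in for a relational database that a real wrapper would reach through ODBC.

```python
import sqlite3
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Plays the role of the virtual type datasource: it cannot be
    instantiated, only its wrapper subtypes can."""

    @abstractmethod
    def wrapper_kind(self):
        """Name of the wrapper type this data source belongs to."""


class Relational(DataSource):
    """Plays the role of the relational wrapper type."""

    def __init__(self, connection):
        self.connection = connection

    def wrapper_kind(self):
        return "relational"

    def sql(self, query):
        """Primitive access function: submit an SQL string for execution."""
        return self.connection.execute(query).fetchall()


def materialize(source, table):
    """Toy importation procedure: copy a table's rows into the mediator.
    The table name is assumed to be trusted, since it is spliced into SQL."""
    return source.sql(f"select * from {table}")


# Stand-in relational database with made-up contents.
connection = sqlite3.connect(":memory:")
connection.execute("create table person(name text, income integer)")
connection.executemany("insert into person values (?, ?)",
                       [("adam", 100), ("eve", 200)])

source = Relational(connection)
rows = source.sql("select name, income from person order by name")
imported = materialize(source, "person")
print(source.wrapper_kind(), rows)
```

A proxy-type style of importation would instead keep only a description of the external table and call sql when a query needs the data. That variant is left out to keep the sketch short.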

References

[Bra98] Silvio Brandani: Multi-database Access from Amos II using ODBC. In Linköping Electronic Press, Vol. 3, Nr. 19, Dec. 8th, 1998 (http://www.ep.liu.se/ea/cis/1998/019/).

[CR01] K.Cassel, T.Risch: An Object-Oriented Multi-Mediator Browser. Presented at 2nd International Workshop on User Interfaces to Data Intensive Systems, Zürich, Switzerland, May 31 - June 1, 2001.

[FRS93] G. Fahl, T. Risch, M. Sköld: AMOS - An Architecture for Active Mediators. Workshop on Next Generation Information Technologies and Systems (NGITS'93), Haifa, Israel, June 1993.

[FR97] G. Fahl, T. Risch: Query Processing over Object Views of Relational Data. VLDB Journal, November 1997 (http://www.dis.uu.se/~udbl/publ/vldbj97.pdf).

[Fis89] D.Fishman et al.: Overview of the IRIS DBMS, in W.Kim, F.H.Lochovsky (eds.): Object-Oriented Concepts, Databases, and Applications, ACM Press, Addison-Wesley, 1989.

[FR95] S. Flodin, T. Risch, Processing Object-Oriented Queries with Invertible Late Bound Functions, Proc. VLDB Conf., Zürich, Switzerland, 1995.

[GS92] H.Garcia-Molina and K.Salem: Main Memory Database Systems: An Overview, IEEE Transactions on Knowledge and Data Engineering, Vol. 4, No. 6, Dec. 1992.

[JKR99] V.Josifovski, T.Katchaounov, T.Risch: Optimizing queries in distributed and composable mediators, Presented at 4th Conference on Cooperative Information Systems, CoopIS'99, Edinburgh, Scotland, September 1999.

[Jos99] V.Josifovski: Design, Implementation and Evaluation of a distributed Mediator System for Data Integration, PhD Thesis No 582, Linköping University, 1999 (http://www.dis.uu.se/~udbl/publ/vanjaphd.pdf).

[JR99a] V.Josifovski, T.Risch: Functional Query Optimization over Object-Oriented Views for Data Integration. Journal of Intelligent Information Systems (JIIS), Vol. 12, No. 2-3, 1999.

[JR99b] V.Josifovski, T.Risch: Integrating Heterogeneous Overlapping Databases through Object-Oriented Transformations, Proc. 25th Intl. Conf. On Very Large Databases, Edinburgh, Scotland, September 1999 (http://www.dis.uu.se/~udbl/publ/vldb99.pdf).

[JR00] V.Josifovski, T.Risch: Query Decomposition for a Distributed Object-Oriented Mediator System, To be published in Distributed and Parallel Databases J., Kluwer, 2000.

[LR92] W.Litwin, T.Risch: Main Memory Oriented Optimization of OO Queries Using Typed Datalog with Foreign Predicates, IEEE Transactions on Knowledge and Data Engineering, Vol. 4, No. 6, December 1992 (http://www.dis.uu.se/~udbl/publ/tkde92.pdf).

[Lyn91] P. Lyngbaek et al: OSQL: A Language for Object Databases, Tech. Report, HP Labs, HPL-DTD-91-4, 1991

[OR96] K.Orsborn, T.Risch: Next Generation of O-O Database Techniques in Finite Element Analysis. The Third International Conference on Computational Structures Technology, Budapest, Hungary, August 21-23, 1996.

[RJ00] T.Risch, V.Josifovski: Distributed Data Integration through Object-Oriented Mediator Servers, to be published in Theory and Practice of Object Systems J., John Wiley & Sons, 2000.

[Shi81] D.Shipman: The Functional Data Model and the Data Language DAPLEX, ACM Transactions on Database Systems, 6(1), 1981.

[Sko96] M. Sköld, Active Rules based on Object Relational Queries - Efficient Change Monitoring Techniques, PhD Thesis No 494, Linköping University, 1996 (http://www.ida.liu.se/~edslab/publications.html).

[SR96] M. Sköld, T. Risch: Using Partial Differencing for Efficient Monitoring of Deferred Complex Rule Conditions. 12th International Conf. on Data Engineering (ICDE'96), (IEEE), New Orleans, Louisiana, Feb. 1996.

[Ull88] J.D.Ullman: Principles of Database and Knowledge-Base Systems, Volume I and II, Computer Science Press, 1988 and 1989.

[Wie92] G Wiederhold: Mediators in the Architecture of Future Information Systems, IEEE Computer, 1992.