**NUMERIC MULTIDIMENSIONAL
ARRAYS**

*API Reference*

** **

This reference covers C, ALisp and AmosQL functionality
regarding the use of NMA objects, both memory-resident and proxy. The terms *NMA*
and *array* are used interchangeably, and refer to all kinds of NMA
objects, unless stated otherwise.

The SciSPARQL functionality is described in SciSPARQL User Manual, and it makes no distinction between resident and proxy NMAs.

All dimension numbers, logical subscripts and storage indexes are 0-based in all the following interfaces. (In SciSPARQL they are 1-based by default, which can be changed by using _sq_python_ranges_ switch)

** **

** **

**1. NMA interface in C**

SSDM implements, and *ssdm.h* header makes available to
C programs the complete set of functionality to create and access NMAs.

**Warning:** all functions listed here are unsafe:
calling them might corrupt the memory or crash the process if the incorrect
arguments are supplied!

**1.1 Creating arrays**

The process should follow one of the paths shown in the following graph

**oidtype make_nma0(int ndims)**

Allocate and array (or array proxy) descriptor for the given number of dimensions

**void ****(oidtype x, int i, LONGINT size)**

Set the size of array x in dimension i to size.

**voi****d nma_init(oidtype x, int kind)**

Allocate the storage (for a resident array x with initialized dimensions) to accommodate elements of given kind. The default nesting of elements puts the first dimension to be the outmost one. The following kinds are used:

0 - integer

1 - double

2 - complex

**void nma_initProxy(oidtype x, int**** kind,
int proxyTag, oidtype s)**

Initialize the array proxy object, providing s ALisp value as an array identifier, and registered proxyTag value to identify the array storage subsystem. (The proxy tags are registered from ALisp). The default nesting order of dimensions puts the first dimension to be the outmost - this should be changed if the actual storage order is different.

**v****oid nma****_reverseStorageOrder(oidtype
x)**

Reverse the storage order of resident array or array proxy. Reversing the default storage order puts the first dimension to be the inmost one.

**oidtype nma_a****llocate(oidtype x, int newKind)**

Allocate a new resident array to store the elements of described by a given derived array or array proxy x. If newKind is negative, the same element type will be used, otherwise it can be widened by supplying a value as to nma_init().

**1.2 Reading array properties**

**int nma_ndims(oidtype x)**

Return the number of dimensions of an array x (resident or proxy).

**LONGINT nma_dim(oidtype x, int i)**

Return the size of array x in dimension i.

**int nma_kind(oidtype x)**

Return the element type of array, as listed for the kind argument to nma_init() function.

**int ****nma_kind2elemsize(int kind)**

Return the size (in bytes) of the element type identified by integer kind value.

**int nma_proxytag(oidtype x)**

Return the proxy tag of an array proxy, 0 for resident arrays (can be used to distinguish the former from the latter)

**int ****nma_isOriginal(oidtype x)**

Return 1 if the array (or array proxy) includes all elements
of its respective storage object (or stored array). Return 0 for derived
resident arrays and array proxies. There is typically one *original* array
or array proxy, and array projection/subset operations result in (non-original)
smaller arrays (or proxies).

**oidtype nma_s(oidtype x)**

Return array identifier in the storage system as an ALisp value. For resident arrays, the storage object is returned as a 1-dimensional NUMARRAY vector. Direct use of NUMARRAY is not advised, since the same object is returned for original array and its derived arrays. However, oidtype comparison can be used identify if two arrays are related. For proxies, appropriate deep comparison (available in ALisp) should be used to establish equivalence.

**1.3 Array elements and the iterator interface**

The elements (as integer, double or complex values) of
resident arrays (both original and derived) are accessible for reading and
writing by pointers returned by the *iterator* interface. Every array has
a built-in *iterator* to store the logical subscript of element being
currently accessed. Multiple *iterators* per
array will be supported in a future version.

**void nma_iter_reset(oidtype x)**

Reset the *iterator* to point to the first elements
(with every subscript set to 0) of a resident array x.

**int nma_iter_next(oidtype x)**

Advance the iterator to the next element. The rightmost logical subscript is incremented when possible. Return 1 on success and 0 if there are no more elements to iterate across.

**void nma_iter_setidx(oidtype x, int i, LONGINT idx)**

Set the logical subscript idx in dimension i of an array element to access.

**LONGINT nma_iter_getidx(oidtype x, int i)**

Get the current iterator index in dimension i.

**void* nma_iter2pointer(oidtype x)**

Get the pointer to the array element pointed to by iterator. This pointer should be cast to the respective C type.

**Warining**: Even though certain pointer arithmetics is
possible on original arrays, with the nesting order of dimensions in mind, it
is not recommended, as less straightforward storage approaches might be
implemented in the future versions. For derived arrays, the access logic is
already quite complex, and it is fully implemented in the iterator interface.

In order to apply vector operations to multiple array
elements at once, the use of **Fragment Mapper**, as described in the next
section, is encouraged.

**1.4 Fragment Mapper**

An original resident array is currently always stored in a single contiguous memory region. Array projection and subset operations result in smaller derived arrays, containing the same physical elements, however, contiguity is no longer guaranteed. It is sometimes desirable (e.g. for memory-to-file dumping, or SIMD vector arithmetics) to process the array elements in contiguously stored sequences. To identify the biggest contiguous sequences, the Fragment Mapper interface should be used.

The user *mapper* function should have the following
signature:

**int (*NMAFragmentMapper)(oidtype ****x,
LONGINT si, LONGINT fsize, void *xa);**

with arguments including the array (or array proxy) x, 1-dimensional storage index si, fragment size fsize, and a pointer to arbitrary data (typically used to store the context of the performed operations) xa. It returns 0 normally, or non-zero as a stop signal (so that no further calls to it will occur).

**void nma_mapfragments(oidtype x, NMAFragmentMapper mapperFn, void *xa)**

Call the mapperFn function for every (largest possible) contiguous fragment of array (or array proxy) x. Pass xa pointer on every call.

**1.5 Debug functions**

**void nma_inspect(oidtype x)**

Print (to the standard output) all the internal properties of the array (or array proxy) x for each dimension. This includes offset, original array size, storage order, logical origin, stride, and access multiplier, and derived array size, as described in [#2012].

**void nma****_sinspect(oidtype x, char* buf)**

Similar to nma_inspect(), but prints the output to the character buffer buf, which should be allocated upfront.

**2. NMA Interface in ALisp**

All functions are type-safe, unless stated otherwise. Appropriate ALisp errors are raised if wrong arguments are provided.

*Vector*
is used as a synonymous term for a standard lisp array, one created with make-array,
accessed with aref and serialized as #(...)
in Lisp.

**2.1 Creating arrays and accessing their elements**

**make-nma**** (kind dims)**

Create a new resident array of given kind and with dimensions provided as dims list or vector. The following kinds are used:

0 - integer

1 - double

2 - complex

**make-nmaproxy**** (kind dims
proxy-tag s)**

Create a new array proxy object of given kind and with dimensions provided as dims list. proxy-tag is the result of nma-register-proxytag described below. s is any Lisp value serving to identify the array in the external system. Derived proxies will share the same s value.

**nma-allocate**** (x new-kind)**

Allocate a new resident array to store the elements of described by a given derived array or array proxy x. If new-kind is negative, the same element type will be used, otherwise it can be widened by supplying a value as make-nma.

**nma-copy**** (x)**

Create a new resident array that is a deep copy of resident array x. Fails on array proxies.

**nma-elt (x idxs)**

Return the numeric element of resident array x identified by idxs list or vector of logical subscripts.

**nma-set (x idxs value)**

Set the element of resident array x identified by idxs to the given value.

**are-valid-subscripts (x idxs)**

Return T if idx contains the valid subscripts denoting an element in array x. Return NIL otherwise.

**nma-fill (x list)**

Fill the array x with values from the hierarchical list list or verctor. The latter should contain as many nesting levels as the dimensions of x, and a list or vector at each level should hold exactly as many elements as the array size in the respective dimension. Numeric types are widened when necessary. Errors are raised on incorrect element types or list shape.

**array-to-inma (vector)**

Transform a Lisp vector of integers into a 1-dimensional integer NMA.

**nma2array ****(x &optional
projection)**

Convert an NMA x to a nested Lisp vector. If projection list is specified, project NMA dimensions first according to indexes given in projection. If projection length is the same as number of dimensions in x, return single element as a number.

**2.2 Accessing array properties**

**nma-ndims (x)**

Return the number of dimensions of an array x (resident or proxy).

**nma-dim (x i)**

Return the size of array x in dimension i.

**nma-dims (x)**

Return a Lisp vector containing all dimensions sizes of x.

**nma-kind (x)**

Return the element type of array, as listed for the kind argument to make-nma function.

**nma-kind2elemsize**** (kind) **

Return the size (in bytes) of the element type identified by integer kind value.

**nma-s (x)**

Return array identifier in the storage system as an ALisp value. For resident arrays, the storage object is returned as a 1-dimensional NUMARRAY vector. Direct use of NUMARRAY is not advised, since the same object is returned for original array and its derived arrays. However, EQ comparison can be used identify if two arrays are related. For proxies, appropriate deep comparison should be used to establish equivalence.

**nma-set-s (x s)**

Set the array identifier for an array proxy, or a new NUMARRAY object for a resident array. Use with caution.

**nma-proxytag**** (x)**

Return 0 for resident arrays, or integer *proxy tag*,
identifying the storage subsystem for array proxies. Can be used to distinguish
resident arrays from array proxies)

**nma-is-original**** (x)**

Return T if the array (or array proxy) includes
all elements of its respective storage object (or stored array). Return NIL
for derived resident arrays and array proxies. There is typically one *original*
array or array proxy, and array projection/subset operations result in
(non-original) smaller arrays (or proxies)

**nma-elemcnt**** (x)**

Return the number of elements in array x.
Equals to the product of dimension sizes, or 1 for *single-element*
(dimensionless) proxies.

**2.3 Iterator interface**

The elements (as integer, double or complex values) of
resident arrays (both original and derived) are accessible for reading and
writing by pointers returned by the *iterator* interface. Every array has
a built-in *iterator* to store the logical subscript of element being
currently accessed. Multiple *iterators* per
array will be supported in a future version.

**void nma-iter-reset (x)**

Reset the *iterator* to point to the first elements
(with every subscript set to 0) of a resident array x.

**nma-iter-next (x)**

Advance the *iterator* to the next element. The
rightmost logical subscript is incremented when possible. Return T
on success and NIL if there are no more elements to iterate across.

**nma-iter-cur**** (x)**

Return the current element of x pointed to by the iterator.

**nma-iter-setcur (x value)**

Set to value the element of x pointed to by the iterator.

**2.4. Array transformations**

**nma-permute (x positions)**

Change the logical order of dimensions of x, given the positions list or
vector containing *N* distinct *0* to *N-1* integers, where *N*
is the dimensionality of x. For
example, transposition of 2-dimensional matrix x can be achieved by calling (nma-permute x '(1 0)).

**is-permutation-idxs****
(positions)**

Return T if positions is a list of distinct *0* to *N-1* integers
where *N* is list length. Return NIL otherwise

**nma-subset (x k lo stride hi)**

Return a derived array of x (with the same number of dimensions), containing slices in dimension
k starting from and including lo and up to
and including hi. The array shape in
other dimensions remains unaffected. If stride is
not equal to 1 then *lo, lo+stride, lo+2*stride, ...* subscripts, limited
by *hi* are selected in the given dimension.

**nma-project (x k idx)**

Return a derived array if x (with decreased number of dimensions), or its element, if x is 1-dimensional. A single index idx is selected in dimension k, resulting in a single slice.

The above operations are implemented so that no actual array data is touched, hence they are cheap to perform, are independent of array size, and work equally well on resident arrays and array proxies.

They can also be composed freely, resulting in a new derived array at each step. Distributivity is guaranteed, as long as the dimension index changes are accounted for. For example, for a 2-dimensional array x and valid parameters a...f the following equivalences hold:

(nma-subset (nma-subset x 0 a b c) 1 d e f) <=> (nma-subset (nma-subset x 1 d e f) 0 a b c)

(nma-project (nma-project x 1 d) 0 a) <=> (nma-project (nma-project x 0 a) 0 d)

(the dimension 1 of x becomes the dimension 0 after the inner projection)

(nma-project (nma-permute x '(1 0)) 0 d) <=> (nma-project x 1 d)

(nma-subset (nma-permute x '(1 0)) 1 a b c) <=> (nma-permute (nma-subset x 0 a b c) '(1 0))

** **

**2.5 Array computations**

**nma-vsum (x y)**

Element-wise sum of arrays of the same shapes. A new array with the element type widest among x and y is allocated and returned.

**nmau-visum (x y)**

Same as above, but array x is updated and returned. Fails if element type of y is wider.

**nma-scale (x s)**

Multiplication of each element of x by scalar value s. A new array of the shape of x and element type widest among x and s is allocated and returned.

**nmau-scale (x s)**

Same as above, but array x is updated and returned. Fails if numeric type of s is wider.

**nma-round (x)**

Round each element of array x, provided its element type is *double*.
A new array of the shape of x and element
type *integer* is allocated and returned.

**nma-roundto (x digits)**

Round each element of array x to the specified precision 10^(-digits). A new array of the shape and element type of x is allocated and returned.

**nmau-roundto (x digits)**

Same as above, but array x is updated and returned.

**2.6 Boolean matrix computations**

Boolean arrays are currently
indistinguishable from integer arrays, storing 4 bytes per element. More
compact representations will be introduced in a future version. The operations
below treat zero values as *false*, and non-zero values as *true*,
and are only defined on 2-dimensional matrices.

**nmau-reflexive-closure (x)**

Builds a reflexive closure of a binary predicate defined by x (by setting diagonal elements to 1). Matrix x is updated and returned.

**nmau-transitive-closure (x) **

Build a transitive closure of a binary
predicated defined by x, using Warschall algorithm (O(N^{3})
complexity, where *N* is the diagonal length of possibly non-square Boolean
matrix) Matrix x is updated and returned.

**nmau-cr-transitive-closure (x column-map)**

A version of the above algorithm, suited for Boolean matrices containing considerable amount of columns entirely filled with zeroes, and thus omitted from x. An 1-dimensional integer NMA column-map maps presented columns of x to the respective row indexes (none of the rows are omitted). Matrix x is updated and returned.

In order to handle the Boolean matrices containing lots of zero-filled rows instead, matrix x should be transposed before and after the operation.

**nma-boolean-product (x y padding)**

Computes the join of two binary predicates defined by x and y Boolean matrices. A new Boolean matrix is allocated and returned. If padding is 0, x and y are required to be of compatible sizes (i.e. number of columns in x should be the same as number of rows in y), otherwise, x and y can be of different sizes, and will be padded. The following padding modes are supported:

0 - strict (no padding)

1 - diagonal

2 - all zeroes

**2.7 Intra-array aggregation**

*(Unsafe)* **nma-sum (x)**

Computes the sum of all elements of x. returns the number of same type as element type of x.

*(Unsafe)* **nma-avg (x)**

Computes the average of all elements of x.
returns the floating-point number. (*Complex* element type is not
currently supported)

*(Unsafe)* **nma-min (x)**

Computes the minimum of all elements of x.
returns the number of same type as element type of x. Fails on *complex* data type.

*(Unsafe)* **nma-max (x)**

Computes the maximum of all elements of x.
returns the number of same type as element type of x. Fails on *complex* data type.

**2.8 Debug functions**

**nma-inspect (x)**

Print (to the standard output) all the internal properties of the array (or array proxy) x for each dimension. This includes offset, original array size, storage order, logical origin, stride, and access multiplier, and derived array size, as described in [#2012].

**2.9. Array bulk output**

** **

**nma-dump (x stream)**

Print the contents of the resident array x as a nested Lisp list to the Lisp stream.

*(Unsafe) * **nma-filedump
(x arrayid chunksize is-hex filesize-max get-filename-fn)**

Break the array into chunks of given chunksize and write to the sequence of files (of size up to filesize-max) with names returned by calls to get-filename-fn Lisp function or closure, with no parameters. Sizes are provided in bytes.

Each chunk is written as a record with
provided integer arrayid, sequentially
generated *chunk id*, and the
binary data representing the chunk itself. If is-hex parameter is 1, text (CSV) files are written, with chunk data in
hexadecimal format. Otherwise, MS SQL Server native binary format is used, so
that the resulting files can be loaded into

BULK INSERT ArrayChunks FROM <filename> WITH (DATATYPE = 'native')

into a table with fields (INTEGER, INTEGER, VARBINARY).

The last file remains open, so the next call to nma-filedump will write another array to the same file.

**nma-filedump-close**** ()**

Closes the last file open by a call nma-filedump. A subsequent call to nma-filedump will have to open a new file.

**2.10. Handling array proxies**

**apr (x)**

If x is not an array proxy, returne it unchanged. Otherwise, resolve it, by creating a resident array and filling it with data retrieved from the corresponding external storage system. If a proxy size limit was set using nma-proxy-setlimit, raise NMA_BIGPROXY_ERROR if amount of array elements would exceed that limit.

**nma-register-proxytag (resolve-fn)**

Register an array storage subsystem, and
return the integer proxy tag to be used when creating proxies referring to
arrays stored in that system. If resolve-fn is not NIL, this lisp function or closure will be called with a single
argument, whenever apr Lisp function, apr() and aapr() Amos functions encounter an array
proxy object with the associated *proxy tag*. In case of chunk-based proxy
resolving, resolve-fn here should be NIL, and nma-register-aapr should be subsequently called, providing the additional parameters.

**nma-register-aapr (proxytag buffer2query-fn query-fn-str
buffer-limit fp-limit)**

Register to use the Aggregated Array Proxy Resolve algorithm (AAPR) to resolve the proxies with specified proxytag.

Lisp function or closure buffer2query-fn will be called with two arguments

buffer - a list of Lisp vectors, where the
first element is *chunk id*, sorted on *chunk id* but possibly
containing repeated *chunk id* values

arrayid - an array identifier, as stored in the nma-s property of array proxies

This buffer2query-fn function or closure should return a vector of arguments to call an
Amos function identified by query-fn-str name (as a
string), which would return a bag vectors {chunkid, chunk} vectors, sorted by chunkid, with chunk values of *Binary*
type. The existence of function identified by query-fn-str is required at the time nma-register-aapr is
called.

The amount of distinct chunk ids in the buffer is limited by buffer-limit argument (the actual buffer size might be bigger due to repeated chunk id values). The amount tolerated irrelevant chunks returned from Amos (i.e. "false positives", in case a pattern-based query is induced from the buffer) is limited by fp-limit parameter.

**nma-proxy-set-default-chunksize (proxytag chunksize)**

Set the default chunk size (in bytes) for proxies with given proxytag to chunksize.

**nma-proxy-setlimit (limit)**

Set the maximum number of elements in a proxy that can be resolved by apr Lisp function, and also apr() and aapr() Amos functions.

**nma-proxy-enabled ()**

Return T if at least one *proxy tag* is registered, NIL otherwise.

**nma-cache-setlimit (limit)**

Set the total maximum size of chunk cache (in bytes). Only the sizes of cached chunks are counted to this limit.

**nma-proxy-startcache (x)**

Start the chunk cache on array proxy x. All derived proxies will share reference to the same cache, in the
same way the share the *array id*.

**nma-proxy-has-cache (x)**

Check if the proxy is connected to a chunk cache, return T if true, NIL otherwise.

**nma-cache-reset ()**

Empty all chunk caches.

**nma-cache-put (x chunkid chunk)**

Put the *Binary* object chunk with associated integer chunkid into
the chunk cache associated with array proxy x.

**3. NMA Interface in AmosQL**

** **

** **