Abstract Data Types

What is abstraction? A tool for handling complexity. Less important aspects/details are omitted, while the important aspects are kept. Sometimes out of ignorance, but often to simplify and permit us to see the "big picture" which often gives us new insights.

E.g. abstraction in science (physical theories), engineering (models), social sciences (statistics).

Programming: An inherently complex endeavour. Abstraction is our most important tool for writing good programs, handling complexity, separating "what" from "how".

In abstraction, you identify common patterns, usually with systematic variations. E.g. particular constants, procedural abstraction, non-basic control structures such as for-loops.

Definitional abstraction use abstract names instead of concrete values. This also allows us to extend our vocabulary by giving names to the common patterns (values, procedure, data types, control structures etc.)

Facilitates separation of concerns.

Pioneering computer scientist Edsger W. Dijkstra wrote in 1974: "Let me try to explain to you, what to my taste is characteristic for all intelligent thinking. It is, that one is willing to study in depth an aspect of one's subject matter in isolation for the sake of its own consistency, all the time knowing that one is occupying oneself only with one of the aspects. We know that a program must be correct and we can study it from that viewpoint only; we also know that it should be efficient and we can study its efficiency on another day, so to speak. In another mood we may ask ourselves whether, and if so: why, the program is desirable. But nothing is gained --on the contrary!-- by tackling these various aspects simultaneously. It is what I sometimes have called "the separation of concerns", which, even if not perfectly possible, is yet the only available technique for effective ordering of one's thoughts, that I know of. This is what I mean by "focusing one's attention upon some aspect": it does not mean ignoring the other aspects, it is just doing justice to the fact that from this aspect's point of view, the other is irrelevant. It is being one- and multiple-track minded simultaneously."

In programming, separation of concerns implies that different parts of the program should as far as possible avoid depending on each other.

E.g. the implementation of a feature (e.g. a data type) should not be concerned with the actual use of the feature, while the use of the feature should not be concerned with the implementation. This is data abstraction, the idea behind abstract data types. Focus on the concerns that must be shared, i.e. the interface. (observable behaviour). Separate what from how. ("information hiding"). Facilitates understanding, changes.

Example. What is a (natural) number? Is 14 a number? No - it is marks on the paper, pixels on the display, a representation, concrete, syntactic. Can be written in different ways.

"Base 1": iiiiiiiiiiiiii (14 "strokes")
Base 2: 1110
Base 10: 14
Roman: XIV

Numbers are ideas that must be represented. The representation does not usually help in defining operations, says nothing (or at least very little) about the relationship of different numbers. Needs an interpretation which depends on the representation.

An abstract description: Zero is a number. Each number has a unique successor which is also a number. This gives no info about the representation. It helps in defining operations (Peano arithmetic). Explains properties, semantics, behaviour

In mathematics, we can (almost) do with only the abstraction of numbers, since all we are going to do is, in effect, talking about them. In programming, we will do computations on hardware, so we must have the representation, but the use of numbers should focus on operations, not representation so as to be independent of the representation. Even on computers we have different representation (word size, floating point, bignums, even negative numbers historically). Thus separation of concerns. The usual integers in programming languages are abstract (more or less).

How to implement the operation to compute successor of a number:
"Base 1", add a "stroke"
Base 2, flip last bit. If flipped to 0 do same to the next bit to the left, repeat
Base 10, Look at last digit. Change 0 to 1, change 1 to 2 etc.... change 9 to 0. If changed to 0 too same to the next digit to the left, repeat.
Roman, tricky....

Implementation translates from abstract view to concrete representation.
Interpretation translates from concrete representation to abstract view.

The choice of representation is important

Memory usage
How to implement operations
Runtime, complexity
How to interpret

Some operations are more or less difficult to implement depending on the representation (see above).

Example: odd number predicate.

"Base 1", Group the "strokes" two-by-two. The number is odd if one stroke is left over.
Base 2, The number is odd if the last bit is 1.
Base 10, The number is odd if the last digit is 0, 2, 4, 6 or 8.
Roman, Tricky....

Some operations can be simplified by adding redundancy. E.g. In base 1, add an "odd/even" flag. Zero has the flag set to "even". Incrementing the number flips the flag in addition to adding a stroke. Checking if the number is odd amounts to looking at the flag. Note that this leads to representations that do not correspond to any number in the abstract view, e.g. 111+even.

Sometimes difficult to make the operations fully abstract - e.g. integers with different word lengths have different minimum and maximum values, floating point numbers have traditionally been represented different with different precision etc. (thus the IEEE floating point standard used today). Memory classes - heap/stack allocation - can be a major problem.

Built-in types may be fully abstract "in principle" (i.e. ML, Java) or almost not at all (i.e. C).

ML, Java are strongly typed, explicit type conversions are required to change the type of a value (always in ML, mostly in Java), no looking inside the representation.

C is weakly typed, e.g. the representation of pointers are visible as integers (permitting pointer arithmetic, null pointer checking by comparing to zero). Representation of booleans similarly visible as integers (false=0, true<>0). This is not good from the standpoint of producing good code and reliable, maintainable programs.

Example in C, the piece of code

if (x) {...}

means what? x not equal to 0, x not equal to NULL, x true? Or worse, x>0 (if it happens that x will not be negative)

Basic data structuring mechanisms. Are they abstract? Fully abstract in ML and Java, not at all in C. What happens in C?

Arrays (repetitions). E.g. int a[3]. In the abstract view functions from index sets to values. In C arrays are defined to be pointers to a reserved block of memory, no bounds checking.
Structures. E.g. struct s {int x; char y; ...}. Abstract view is objects with parts. In C the layout in memory of the parts is visible.
Union structures. E.g. union name {int x; char y;}. In the abstract view alternative kinds of values, the particular kind is known in every case. In C the layout in memory is visible, the particular kind is not kept track of - e.g. the integer x can be accessed as a real number y with the same bit pattern and vice versa.

Note that we do have expectations on how implementations do things even in ML or Java, but due to the abstraction it *can* be done differently in these languages but not in C. E.g. an array which is known to be sparse (most elements are zero) could be implemented as a table in ML or Java - not possible in C since C programs have direct access to how arrays are implemented.

Abstract Data Types (ADT). What is it?

A type (values with operations)
Abstract (only the abstract set of values and behaviour of operations is known, not the representation.)

The interface is all which is accessible to a user of the ADT. The interface includes the ADT name and abstract operations ("primitives"). Primitives can be loosely classified as constructors, initialisers, recognisers, selectors/accessors, comparators, modifiers, iterators, other things. The implementation provides a concrete operations of ADT values in terms of values of the representation types. It provides a concrete implementation of ADT operations in terms of the concrete representation and the operations on the representation types.

Again, the implementation can change as long as the abstract behaviour (interface) is unchanged.

ADT guidelines

Keep the interface small and simple.
Consider alternative abstract views. Is the particular view appropriate from the application point of view and not just chosen to reflect implementation?
An application-oriented interface is ok, but not ideosyncratic ones. I.e. don't omit obvious operations just because you don't happen to need them and don't put the application functionality into the ADT by defining overly specialised primitive operations. (The distinction is sometimes not obvious.)
Check that you have all operations you need for your application.
Consider different representations/implementations. Perhaps start simple and change later to a more complicated one if need be (because memory or performance problems). Application-oriented representations can be ok, they can always be changed later. E.g. vectors: cartesian or polar representation? You can even use alternative representations at the same time as long as the change between them is automatic and transparent.
Avoid having several concrete representations of one abstract value. It can complicate the implementation. E.g. Booleans in C (3 and 4 both mean "true") or rational numbers represented as two integers (divident and divisor). How do you compare 1/2 with 2/4 for equality. They are different representations but represent the same abstract number.
Having representations where some values do not correspond to abstract values is ok, just make sure that such concrete values can never be produced. Check input. E.g. rational number representation 1/0.
Don't lump different concepts together in the same ADT (e.g. int+real). Better to have a separate "union" ADT which combines other ADTs if need be.
An ADT should have high cohesion (a measure of how related and focused the responsibilities of a software module is).
The coupling (interdependency) between ADTs should be low. (Example of low coupling: parameter passing, of high coupling: shared global variables).

Some ADT examples

Lists. Abstract view: a finite sequence. Abstract operations: empty, cons, hd, tl, equality comparisons. Perhaps also nth, length, append, map, fold. Possible implementations: Arrays, linked chains, chunks (linked arrays). Possible redundant info: length, last link (for destructive append).

Tables. Abstract view a finite mapping: empty, member, lookup, update, remove, override, convert to list (convenient). Possible implementations: array, list, search tree, hash table etc.

File system (several ADTs):
Filename/Path. Abstract view should distinguish between various parts, e.g. proper name and type (The file type is a part of the file name in most operating systems but can be separate, e.g. on Mac OS, or can be hidden to appear separate as in Windows XP.) Unix has a convention that i/o to non-file system devices uses file names beginning with "/dev". The concrete syntax can be much different in different operating systems (e.g. / to separate directories in Unix, \ in Windows). All these differences in concrete representation should be abstracted away in the ADT.
Open file. Abstract view can be sequence, array etc.

Some hints on ADT implementation in C. Requires discipline - C cannot really hide information., Put the interface in a .h file, implementation in a .c file. See the coding standard lecture.

Discussion: Is abstraction necessarily good? No, efficiency and code size may suffer. E.g. modern GUIs, MS Word. Increased hardware speed allows developers to trade efficiency in execution for efficiency in development using many levels of abstraction. Also abstraction may be considered to make programming cumbersome and less fun (a common argument for C!!). Read Gabriel in the paper collection!

Check out these terms in e.g. Wikipedia: abstraction, separation of concerns, information hiding (encapsulation), ADT, interface, cohesion, coupling. Also read the paper collection (compendium).