We present a black-box active learning algorithm for inferring extended finite state machines (EFSM)s, combining data flow and control behavior. Different dialects of EFSMs are widely used in tools for model-based software development, verification, and testing. Our algorithm infers a class of EFSMs called register automata. Register automata have a finite control structure, extended with variables (registers), assignments, and guards. Our algorithm is parameterized on a particular theory, i.e., a set of operations and tests on the data domain that can be used in guards.
Key to our learning technique is a novel learning model based on so-called tree queries. The learning algorithm uses the tree queries to infer symbolic data constraints on parameters, e.g., sequence numbers, time stamps, identifiers, or even simple arithmetic. We describe sufficient conditions for the properties that the symbolic constraints provided by a tree query in general must have to be usable in our learning model. We also show that, under these conditions, our framework induces a generalization of the classical Nerode equivalence and canonical automata construction to the symbolic setting. We have evaluated our algorithm in a black-box scenario, where tree queries are realized through (black-box) testing. Our case studies include connection establishment in TCP and a priority queue from the Java Class Library.
Note: Updated and superceded by Technical Report 2015-032.
Download BibTeX entry.