The topic is distributed stepwise refinement of finite sequences (DSRFS for short).
The three (strongly interconnected) parts of the presentation took place in room A328 (Computer Science Building) from 10:15 to 11:00 AM on three consecutive Mondays: Aug 20, Aug 27, and Sep 3.
The presentation gives an introductory description of a proposed formal framework for DSRFS. (The topic is much more closely related to volume B than to volume A of Handbook of Theoretical Computer Science (J. van Leeuwen (ed.), Elsevier and MIT Press 1990). That is, "formal models and semantics" rather than "algorithms and complexity".) Proofs are omitted.
Below, an abstract can be found, and also a skeletal outline. Moreover, copies of the presented slides are downloadable.
Remark: For the developments after the presentation was given, see the afterthoughts at the bottom.
A formal device for distributed stepwise refinement of finite data sequences is introduced. The device, a tetrasystem, is also well suited for modeling such established tools as macro processors and parametric Lindenmayer systems. It differs from such classical formal devices as Chomsky grammars (which have more applications in program analysis than in program generation) by being able to operate on an infinite alphabet (which enables dealing with structured symbols such as macro calls) and by being sensitive to an unbounded two-sided context. However, even (context-sensitive) Chomsky grammars, pure grammars, and (classical) Lindenmayer systems can be rather satisfactorily modeled by suitable tetrasystems.
A tetrasystem has two main components: a letter-refiner, and a separate control mechanism. The letter-refiner is a single monolithic function comprising all the rewriting rules.
The central problem with a tetrasystem is to guarantee that the output has the intended semantics. If the letter-refiner is only known to be semantics-preserving, the control mechanism must not allow parallelism. But when the letter-refiner fulfills a sufficiently strong additional constraint, synchronous and even asynchronous parallelism become possible. Moreover, it may even become possible to switch between various subtypes of sequential and parallel rewriting, without any need to modify the letter-refiner. (These switches may change the structure but not the semantics of the output.) Any asynchronous parallelism then enables truly distributed processing.
This skeletal outline tries to avoid mathematical formalism. The actual presentation, however, includes more mathematics and hence more detail.
The estimated schedule is that each one of the three component sessions covers three of the nine top-level items listed below.
(See the abstract above.)
We consider trees with rather standard properties: they are finite, rooted, and ordered. (That is: the tree has only a finite number of nodes; there is a unique root node among the tree nodes; and the son, or direct descendant, nodes of each non-leaf node are totally ordered - (say) left to right.) We start with a very general function class: the belt-providers include every two-argument function that takes a given tree and a given node of the tree and returns a node set that constitutes a cross section of the tree. Tree cross sections are called belts for short. So given a tree and any one of its belts, each leaf node of the tree - by definition - has exactly one ancestor node in the belt. (Of course, the ancestor may be a non-proper one, that is, the leaf itself.) Moreover, the nodes in any belt - trivially, by the orderedness of each tree - constitute a finite node sequence.
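The tree and belt notions above can be sketched in code. The following is a minimal illustration only (the class and function names are our own; the presentation gives the formal definitions): finite, rooted, ordered trees, together with a check that a given node set is a belt.

```python
class Node:
    """A node of a finite, rooted, ordered tree; children are kept
    in their left-to-right order."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def leaves(node):
    """The leaves under `node`, in left-to-right order."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

def ancestors(root, target, path=()):
    """The nodes on the root-to-target path (both ends included),
    or None if `target` does not occur in the tree rooted at `root`."""
    path = path + (root,)
    if root is target:
        return path
    for child in root.children:
        found = ancestors(child, target, path)
        if found is not None:
            return found
    return None

def is_belt(root, node_set):
    """A node set is a belt (cross section) exactly when every leaf of
    the tree has exactly one ancestor - possibly the leaf itself - in
    the set."""
    return all(
        sum(1 for node in ancestors(root, leaf) if node in node_set) == 1
        for leaf in leaves(root)
    )
```

For the two-leaf tree S(a, b), both {S} and {a, b} are belts, whereas {S, a} is not (leaf a has two ancestors in the set) and {a} is not (leaf b has none).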
Given a belt-provider, a tree, and a node in the tree, there are obviously three (mutually exclusive) alternative scenarios when the belt-provider is applied to the given node: the belt that is returned includes (1) the given node itself; (2) one or more proper descendants of the given node; or (3) a proper ancestor of the given node.
We then formulate two rather restricted classes of belt-providers: belt-assigners and belt-selectors. Of these two function classes, the latter is a proper subclass of the former. All the listed alternatives (1)-(3) are possible scenarios with the belt-assigner class, but only alternatives (1) and (2) are possible with the belt-selector class. In what follows, belt-selectors are used heavily, but the more general belt-assigner class is not used at all. (It is however suggested that both notions might find use even outside the scope of the present work.)
Example: The following are the two extreme ones (in an obvious sense) among all the belt-assigners: the one that, for every tree and every node, returns the singleton belt consisting of the root alone, and the one that, for every tree and every node, returns the full leaf sequence of the tree.
Our formal device for distributed stepwise refinement is called a tetrasystem. The device has two main components - which are intended to be as orthogonal to each other as possible: a refinement rule base and a control mechanism. The control mechanism is constituted by a frame, which is simply a 4-tuple of belt-selectors. (So the prefix "tetra-" stems from the number of the belt-selectors involved.)
The refinement rule base is actually a single monolithic function operating on letters and words. The letter alphabet need not be finite; and the words are just finite letter sequences. The single function is called a letter-refiner. It takes a letter and two words, and returns a single word. The two argument words stand for the two sides of the refinement context, and the returned word represents the refinement result of the argument letter in the specified two-sided context. Note that the effective refinement context is thus unbounded in both directions.
In the previous paragraph, it was implicitly assumed that the letter-refiner is always strictly deterministic. But actually, even nondeterminism is supported: the letter-refiner can return a non-empty set of words (instead of just one word), of which any single one will do as the context-sensitive refinement result.
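As an illustration, a toy letter-refiner along these lines might look as follows in code. All rules and names here are invented for this sketch, and strings merely stand in for words; in general the alphabet may be infinite and the letters structured.

```python
def letter_refiner(letter, left_context, right_context):
    """Map a letter and its two-sided (unbounded) context to a non-empty
    set of candidate refinement words; a singleton set corresponds to
    the deterministic case."""
    if letter == "A":
        if left_context.endswith("x"):
            return {"xy"}        # deterministic in this particular context
        return {"ab", "ba"}      # nondeterministic in all other contexts
    return {letter}              # e.g. terminals refine to themselves
```

Note that the function sees the whole left and right context words, so the effective context is unbounded in both directions, as stated above.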
The distributed stepwise refinement is implemented as a tree generation process. Each tree node contains a single letter. Furthermore, the letter alphabet divides into nonterminals and terminals, and a terminal letter can appear in a leaf node only.
Suppose now that some finite data sequence is to be refined into a "lower-level" or "more detailed" finite data sequence (which is typically much longer than the original sequence). The process can be outlined as follows:
So the process terminates if (and only if) there are no fertile nonterminal-lettered leaves any more. But even if it never terminates, output words may still be produced, because mature nodes may well - in the general case at least - be found even before termination.
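Ignoring the tree structure and the frame machinery, the simplest sequential instance of such a refinement process can be sketched as follows. The function and its policy of always expanding the leftmost nonterminal are our own illustration, not the general tetrasystem mechanism; a step bound is enforced because, as noted above, termination is not guaranteed.

```python
def refine_sequentially(word, refiner, is_nonterminal, max_steps=10_000):
    """Repeatedly refine the leftmost nonterminal letter in its current
    two-sided context; stop when only terminal letters remain. Words
    are represented as tuples of letters."""
    word = tuple(word)
    for _ in range(max_steps):
        for i, letter in enumerate(word):
            if is_nonterminal(letter):
                left, right = word[:i], word[i + 1:]
                # In the nondeterministic case any member would do;
                # here we simply pick the first in sorted order.
                replacement = sorted(refiner(letter, left, right))[0]
                word = left + tuple(replacement) + right
                break
        else:
            return word  # no nonterminal letters remain: the output word
    raise RuntimeError("step bound exceeded without termination")
```

For instance, with the single context-independent rule S -> ab and uppercase letters as nonterminals, refine_sequentially(("S",), lambda l, u, v: {("a", "b")}, str.isupper) returns ("a", "b").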
Example: The tetrasystem device is claimed to be versatile enough to be able to rather satisfactorily (even if in an informal sense only) emulate the following well-known devices: macro processors, context-sensitive Chomsky grammars, pure grammars, and (classical) Lindenmayer systems, or L systems. (Below, "HFL" refers to Handbook of Formal Languages, vols. 1-3, G. Rozenberg and A. Salomaa (eds.), Springer 1997.)
In each of the four cases listed above, choosing a suitable 4-tuple of belt-selectors effectively does the job. (In the cases of context-sensitive Chomsky grammars and pure grammars, however, the progressiveness properties (see below) of the tetrasystem model reflect the original features only in a rather meaningless sense. This is because the (nondeterministic) letter-refiner of the appropriate tetrasystem models must in practice be augmented with trivial (even void) extra refinement rules that simply allow each nonterminal to refine to itself in every context.)
The only belt-selector we have completely characterized - in an earlier example - is the one that always returns the full leaf sequence. It might be mentioned (again, without supplying any evidence in this outline) that this belt-selector can serve as belt-selectors 2, 3, and 4 both in the macro processor frame and in the context-sensitive Chomsky frame, and moreover as belt-selectors 2 and 4 in the pure grammar frame.
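This particular belt-selector is simple enough to sketch directly in code (using a bare pair representation of trees invented for this illustration):

```python
def full_leaf_belt(tree, node):
    """The belt-selector that, regardless of the given node, returns the
    full leaf sequence of the tree. A tree is a pair (label, subtrees);
    the leaf sequence of any tree is always one of its belts."""
    label, subtrees = tree
    if not subtrees:
        return [tree]
    return [leaf for sub in subtrees for leaf in full_leaf_belt(sub, node)]
```

Since the returned belt never contains a proper ancestor of the given node, this function indeed stays within the belt-selector class (alternatives (1) and (2) only).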
Remark: The four devices discussed in the preceding example and their tetrasystem models will be dealt with also in later examples.
The tree generation process described above includes two potential sources of nondeterminism: (1) the letter-refiner may return several alternative refinement results for the same letter in the same context; and (2) the order of the expansion steps may vary.
However, a simple condition involving belt-selectors 1 and 2 effectively deals with the latter one of the sources listed above. When the condition holds, any two trees produced (independently of the number of the expansion steps involved) can always be merged (in a very natural sense and in a very straightforward fashion) into a compound tree that could have been produced from either one of the two trees. (Notice that in any imaginable practical application, the effects of the former source of nondeterminism seem to be eliminable by some trivial bookkeeping.) So confluence is achieved.
Example: The frames for macro processors and for L systems do both meet the above-mentioned condition. In contrast, the ones for context-sensitive Chomsky grammars and for pure grammars do not. (With macro processors, both of the two sources of nondeterminism are, in fact, inherently blocked.)
Three properties dealing with progressiveness are introduced: weak progressiveness, strong progressiveness, and distributive progressiveness. (Each is formulated as a property of belt-selector 1.)
When both distributive progressiveness and confluence hold, the tree generation process can fork into several subprocesses (each of which will eventually be able to fork again, etc., in a recurrent fashion). Thus distributed processing - without any need of re-synchronization - is enabled.
Distributive progressiveness implies strong progressiveness, which in turn implies weak progressiveness.
Example: In the case of macro processors, belt-selector 1 is weakly but not strongly progressive; in the case of L systems, it is strongly but not distributively progressive. (But, as mentioned in an earlier example, the formalized progressiveness properties of our tetrasystem models for context-sensitive Chomsky grammars and for pure grammars do not particularly well reflect the intuitive features of the original devices. With the augmented letter-refiners (also mentioned above), these models in fact employ a belt-selector 1 that is distributively progressive.)
Intuitively, a sound tetrasystem preserves the semantics of the input word. The formalization of this notion can be outlined as follows: soundness is defined separately for the letter-refiner, for the frame, and for the tetrasystem as a whole.
Note especially that both letter-refiner soundness and tetrasystem soundness do depend on the chosen semantics but frame soundness does not.
Example: The frames for macro processors, for context-sensitive Chomsky grammars, and for pure grammars are all three sound. In contrast, the frame for L systems is not sound.
Remark: There is an uncountable set of normal belt-selectors, each of which may be used as a component of a sound frame. Each normal belt-selector actually specifies a unique root-to-leaves tree traversal order.
Immediately above, the only thing we required from the letter-refiner was that it should preserve the semantics. That is sufficient for sequential rewriting only.
But the distributed processing scheme described above (in conjunction with distributive progressiveness) means parallel rewriting, which in this case is, moreover, asynchronous rather than synchronous. If one strives for sound parallel rewriting - synchronous or asynchronous - then one has to formulate additional constraints to be imposed on the letter-refiner.
So we now consider conditional frame soundness, which requires not only soundness but also non-trivial further properties of the letter-refiner.
Example: The L system frame implements synchronously parallel rewriting. There is a simple additional constraint on the letter-refiner which enables this particular frame to preserve the semantics.
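For comparison, the synchronously parallel mode is exactly what a classical (deterministic, context-free) L system uses: every letter of the current word is rewritten in the same step. A minimal sketch of one such step, using Lindenmayer's well-known "algae" rules:

```python
def l_step(word, rules):
    """One synchronously parallel rewriting step: every letter of the
    word is replaced at once; letters without a rule stay unchanged."""
    return "".join(rules.get(letter, letter) for letter in word)

rules = {"a": "ab", "b": "a"}  # the classical "algae" D0L system
word = "a"
for _ in range(4):
    word = l_step(word, rules)
# after 4 steps: "abaababa"
```

Note that no step of this kind inspects any context, which is why the additional constraint mentioned above can concern the letter-refiner alone.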
We briefly address the following issues:
(See the slides below.)
Copies of the presented slides are downloadable (as pdf):
Remark: These afterthoughts were updated 2012-11-04.
Some parts of the presentation (whose final part was given 2012-09-03) are now outdated or have been found to need improvement.
In section "Tetrasystems", it was required that every mature node must be a non-leaf one. This should be relaxed: even nonterminal-lettered leaves should be eligible candidates for matureness. This in turn implies that the frame for pure grammar emulation should be changed so that the third belt-selector would differ from σI in one detail: for 0, its characteristic comb should return 1 - and not 0 like the characteristic comb of σI. (The new belt-selector may be denoted as σI|1|σI, and its characteristic comb actually returns 1 for every integer.)
The following changes have taken place:
It should be pointed out that the dummy productions effectively added to the tetrasystem models of context-sensitive Chomsky grammars and pure grammars (in section "Tetrasystems") enable infinite derivations (which the original grammar may totally lack).
A new and highly condensed 20-slide version of the key sections "Belt-selectors" and "Tetrasystems" was uploaded 2012-11-04.