Fri, 18 Aug 1995 15:57:24 GMT

Related articles |
---|

More Taxonomies and Toolkits of Algorithms (my PhD dissert.) watson@win.tue.nl (1995-08-18) |

Newsgroups: | comp.compilers |

From: | watson@win.tue.nl (Bruce W. Watson) |

Keywords: | report, tools, available |

Organization: | Eindhoven University of Technology, the Netherlands |

Date: | Fri, 18 Aug 1995 15:57:24 GMT |

Hi All,

I would like to announce the availability of my PhD dissertation. The full

citation is:

Watson, Bruce W. ``Taxonomies and Toolkits of Regular Language

Algorithms'', Faculty of Computing Science, Eindhoven University of

Technology, 1995.

The research reported in it should appeal to anyone who read and used (and

hopefully enjoyed) my previous taxonomies or C++ toolkits. The summary appears

later in this posting. While the software will be available for anonymous ftp

from Eindhoven, only the paper version of the dissertation is (for now)

available.

If you would like a copy of it, please send me your full mailing address. If

you are interested in puting a copy in your company/university/institute

library, just ask for an extra copy. I would also welcome any suggestions of

other people who may be interested in a copy.

Bruce. (watson@win.tue.nl)

Summary:

-------

A number of fundamental computing science problems have been studied since the

1950s and 1960s. For each of these problems, numerous solutions (in the form of

algorithms) have been developed over the years. In these collections of

solutions, we can identify the following three deficiencies:

1. Algorithms solving the same problem are difficult to compare to one

another. This could be due to the use of different programming languages,

paradigms, styles of presentation --- or simply the addition of unnecessary

details.

2. Collections of algorithm implementations solving a problem are difficult to

find. Some of the algorithms are presented in a relatively obsolete manner,

either using a now-defunct notation or obsolete programming language,

making it difficult to either implement the algorithm or find an existing

implementation.

3. Little is known about the comparative practical running time performance

of the algorithms. The lack of existing implementations in one and the same

framework, especially of some of the older algorithms, has made it

difficult to determine the running time characteristics of the algorithms.

Selection of an algorithm must then be made on the basis of the theoretical

running time, or simply by guessing.

In this dissertation, a solution to each of these deficiencies is presented. To

present the solutions, we use the following three fundamental computing science

problems:

1. Keyword pattern matching in strings. Given a finite non-empty set of

keywords (the patterns) and an input string, find the set of all

occurrences of a keyword as a substring of the input string.

2. Finite automata (FA) construction. Given a regular expression, construct a

finite automaton which accepts the language denoted by the regular

expression.

3. Deterministic finite automata (DFA) minimization. Given a DFA, construct

the unique minimal DFA accepting the same language.

In the following paragraphs, we will outline the solutions presented for each

of the deficiencies.

The difficulty in comparing algorithms is overcome by creating a {\em taxonomy}

of algorithms for a given problem. Each of the algorithms is rewritten in a

common notation and is examined to determine the essential ingredients that

distinguish it from any other algorithm. These ingredients (known as {\em

details}) can take the form of problem details (a restriction on the class

of problems solved), or algorithm details (some correctness-preserving change

to the algorithm to improve efficiency). Once each algorithm has been reduced

in such a way, it can be characterized by its set of details. In presenting the

taxonomy, the common details of several algorithms can be factored and

presented together. In this fashion, a `family tree' of the algorithms is

constructed, showing clearly what any two algorithms have in common and where

they differ. Because the root of the family tree is a na\"{\i}ve, and correct,

algorithm and the details are applied in a correctness-preserving manner, the

correctness argument for each of the algorithms is implicit in the taxonomy.

The common notation and presentations in the taxonomies enable us to implement

the algorithms uniformly, in the form of a {\em class library} (also known as a

{\em toolkit}). The factoring of essential details, inherent in the taxonomies,

leads to factoring of common components in the inheritance hierarchy of the

class library. Object-oriented concepts, such as virtual (or deferred) member

functions, inheritance and template classes, prove to be useful in presenting a

coherent class library, the structure of which reflects the taxonomy from which

it was created. For the first time, most (if not all) solutions are presented

in a single class library, giving clients of the library a large choice of

objects and functions.

With a class library that contains most of the known solutions, we are finally

able to gather data on the performance of the algorithms in practice. Since

the algorithms are taken from a single class library which was implemented by

one person, and the quality of implementation of the class library is

homogeneous, the relative performance data gathered (comparing algorithms) is

not biased by the implementation. This performance data allows software

engineers to make more informed decisions (based upon their needs, and the

characteristics of their input data) concerning which algorithm and therefore

which objects and functions to use in their applications.

The development of the taxonomies has not been without its spin-offs. In each

of the three taxonomies presented, significant new algorithms have been

developed. These algorithms are also implemented in the corresponding class

libraries. The techniques developed in the taxonomy of pattern matching

algorithms proved to be particularly useful in deriving an algorithm for

regular expression pattern matching, and in doing so, answering an open

question posed by A.V. Aho in \cite[p.~342]{ah:pattern-matching-open}.

--

_____________________________________________________________________________

Bruce Watson

watson@win.tue.nl

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.