TAXONOMY
The following paper breaks down and explains the
innovative techniques that Dr. Levitin uses to get the taxonomy working
for him in the classroom. Just as algorithms have patterns and best practices,
so Anany has found that carefully crafting the taxonomy of a lesson
can vastly improve students' comprehension of the material by ordering
it with flexible technique grouping in mind. Below you will find
the abstract of the paper; the work in full can be ordered from Dr.Dobbs.com.
Do We Teach the Right Algorithm Design Techniques?
Abstract
Introduction
Four General Design Techniques
A Test of Generality
Further Refinements
Another Example
Conclusion
References
Abstract
Algorithms have come to be recognized as the cornerstone of computing.
Surprisingly, there has been little research or discussion of general
techniques for designing algorithms. Though several such techniques
have been identified, there are serious shortcomings in the existing
taxonomy. The paper points out these shortcomings, reevaluates some
of the techniques, and proposes a new, hierarchical classification scheme
by grouping techniques according to their level of generality. A variety
of examples from different areas of computing are used to demonstrate
the power and flexibility of the taxonomy being proposed.
1. Introduction
A study of algorithms has come to be recognized as the cornerstone of
computer science. The progress in this field to date, however, has been
very uneven. While the framework for analysis of algorithms has been
firmly established and successfully developed for quite some time, much
less effort has been devoted to algorithm design techniques. This comparative
lack of interest is surprising and unfortunate in view of the two important
payoffs in the study of algorithm design techniques: "First, it
leads to an organized way to devise algorithms. Algorithm design techniques
give guidance and direction on how to create a new algorithm. Though
there are literally thousands of algorithms, there are very few design
techniques. Second, the study of these techniques help us to categorize
or organize the algorithms we know and in that way to understand them
better." [7,p.33]
Despite the dearth of papers dedicated to the subject, primarily through
efforts of textbook writers [1,4,5,6,8,13], a consensus seems to have
evolved as to which approaches qualify as major techniques for designing
algorithms. This list includes: divide-and-conquer, greedy approach,
dynamic programming, backtracking, and branch-and-bound. (In addition,
probabilistic and parallel algorithms are two other important design
approaches; since they belong to different paradigms, we are not going
to discuss them here.)
This widely accepted taxonomy has serious shortcomings, however. First,
it includes techniques of different levels of generality. For example,
it seems obvious that divide-and-conquer is more general than, say,
the greedy approach and branch-and-bound, which are applicable to optimization
problems only. Second, it fails to distinguish divide-and-conquer and
what we will call decrease-and-conquer as two qualitatively different
techniques. Third, it also fails to include brute force and transform-and-conquer,
which we consider important general design techniques. Fourth, its linear,
as opposed to hierarchical, structure fails to reflect important special
cases of the techniques. Finally, and most importantly, it fails to
classify many classical algorithms (e.g., Euclid's algorithm, heapsort,
search trees, and hashing).
The paper seeks to rectify these shortcomings. Section 2 reviews four
strategies: brute force, divide-and-conquer, decrease-and-conquer, and
transform-and-conquer. Section 3 proposes a test of generality to distinguish
more general techniques from less general ones; its application explains
why the four techniques discussed in Section 2 were singled out from
the rest. Section 4 refines the new classification scheme further. Section
5 gives another convenient educational vehicle for illustrating the
major design techniques. Section 6 concludes the paper by summarizing
its findings and contains a discussion of the advantages and limitations
of the new taxonomy being proposed.
2. Four General Design Techniques
We will start with a discussion of a well-known design approach that
has been missing from the tables of contents of textbooks organized around
algorithm design techniques: brute force.
It can be defined as a straightforward approach to solving a problem,
usually directly based on the problem's statement and definitions of
the concepts involved. Though very rarely a source of efficient algorithms,
the brute-force approach should not be overlooked as an important algorithm
design technique in view of the following. First, unlike some of the
others, this approach is applicable to a very wide variety of problems.
(In fact, it seems to be the only general approach for which it is more
difficult to point out problems it cannot tackle.) In particular,
it is brute force that is used for many elementary but important algorithmic
tasks such as computing the sum of n numbers, finding the largest element
in a list, adding two matrices, etc. Second, for some important problems
(e.g., sorting, searching, matrix multiplication, string matching),
the brute-force approach yields reasonable algorithms of at least some
practical value with no limitations on instance sizes. Third, even if
too inefficient in general, a brute-force algorithm can still be a useful
(and economically sound!) choice for solving small-size instances
of a problem. Fourth, a brute-force algorithm can serve an important
theoretical or educational purpose, e.g., as the only deterministic
algorithm for an NP-hard problem or as a yardstick for more efficient
alternatives for solving a problem. Finally, no taxonomy of algorithm
design techniques would be complete without it; moreover, as we are
going to see below, it happens to be one of only four design approaches
classified as most general.
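To make the definition concrete, here is a minimal Python sketch of two of the brute-force algorithms named above: selection sort and straightforward string matching. The code is an illustration added to this text, not taken from the paper, and the function names are hypothetical. Each function follows the problem statement directly, with no attempt at cleverness.

```python
def selection_sort(items):
    """Return a sorted copy by repeatedly selecting the smallest remaining element."""
    a = list(items)
    for i in range(len(a) - 1):
        smallest = i
        # Scan the unsorted tail a[i+1:] for an element smaller than the current minimum.
        for k in range(i + 1, len(a)):
            if a[k] < a[smallest]:
                smallest = k
        a[i], a[smallest] = a[smallest], a[i]
    return a


def brute_force_match(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    # Try every possible alignment of the pattern against the text.
    for i in range(n - m + 1):
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            return i
    return -1
```

Both functions work for any instance size; they are simply slower than the alternatives discussed later (quadratic time for the sort, and time proportional to the product of the two lengths in the worst case for the matcher).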
Divide-and-conquer is probably the
best known general algorithm design technique. It is based on partitioning
a problem into a number of smaller subproblems, usually of the same
kind and ideally of about the same size. The subproblems are then solved
(usually recursively or, if they are small enough, by a simpler algorithm)
and their solutions combined to get a solution to the original problem.
Standard examples include mergesort, quicksort, multiplication of large
integers, and Strassen's matrix multiplication; several other interesting
applications are discussed by Bentley [3]. Though most applications
of divide-and-conquer partition a problem into two subproblems, other
situations do arise: e.g., the multiway mergesort [9] and Pan's algorithm
for matrix multiplication [14]. As to the case of a single subproblem,
it is difficult to disagree with Brassard and Bratley [5] that for such
applications, "... it is hard to justify calling the technique
divide-and-conquer." [p.223] Hence, though binary search
is often cited as a quintessential divide-and-conquer algorithm, it
fits better in a separate category we are about to discuss.
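The partition, solve, and combine steps can be sketched with mergesort, one of the standard examples cited above. This is an illustrative Python version added here under the assumption that the input fits in memory as a list; the name `merge_sort` is ours.

```python
def merge_sort(items):
    """Sort by splitting into two halves, sorting each, and merging the results."""
    if len(items) <= 1:
        return list(items)  # small enough to be solved directly
    mid = len(items) // 2
    left = merge_sort(items[:mid])    # first subproblem
    right = merge_sort(items[mid:])   # second subproblem, about the same size
    # Combine: merge the two sorted halves into one sorted list.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

Note that both halves must be solved before the answer can be assembled, which is what separates this technique from the single-subproblem approach discussed next.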
Solving a problem by reducing its instance to a smaller one, solving
the latter (recursively or otherwise), and then extending the obtained
solution to get a solution to the original instance is, of course, a
well-known design approach in its own right. For obvious reasons, we
will call it decrease-and-conquer.
(Brassard and Bratley [4,5] use the term "simplification,"
which we are going to use below for a different design technique.) This
approach has several important special cases. The first, and most frequently
encountered, decreases the size of an instance by a constant. The canonical
example here is insertion sort; other examples are provided by Manber
[11] who has investigated an intimate relationship between this approach
and mathematical induction. Though the size-reduction constant is equal
to one for most algorithms of this type, other situations may also arise:
e.g., recursive algorithms that have to distinguish between even and
odd sizes of their inputs.
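As an illustration of the decrease-by-a-constant case, the following Python sketch (added here, not part of the paper) implements insertion sort. The underlying idea is recursive: sort the first n-1 elements, then extend that solution by inserting the last one. The loop below carries out the same idea bottom-up.

```python
def insertion_sort(items):
    """Sort by extending a sorted prefix one element at a time."""
    a = list(items)
    for i in range(1, len(a)):
        # Invariant: a[:i] is already sorted (the solution to the smaller instance).
        value = a[i]
        j = i - 1
        # Shift larger elements right to open a slot for value.
        while j >= 0 and a[j] > value:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = value
    return a
```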
The second special case of the decrease-and-conquer technique covers
the size reduction by a constant factor. The examples include binary
search and multiplication à la russe. Though most natural examples
involve a size reduction by the factor of two, other situations do happen:
e.g., the Fibonacci search for locating the extremum of a unimodal function
(e.g., [2, pp. 153-155]) and the "divide-into-three" algorithm
for solving the problem of identifying a lighter false coin with a balance
scale.
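Both examples of decreasing by a constant factor can be sketched in a few lines of Python. The code is an illustration added to this text; the function names are hypothetical, and the multiplication routine assumes a non-negative first factor.

```python
def binary_search(sorted_items, key):
    """Return an index of key in sorted_items, or -1 if it is absent."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == key:
            return mid
        # Discard the half that cannot contain the key.
        if sorted_items[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


def russian_peasant_multiply(n, m):
    """Compute n * m for a non-negative integer n by halving n and doubling m."""
    if n < 0:
        raise ValueError("n must be non-negative")
    product = 0
    while n > 0:
        if n % 2 == 1:
            product += m   # an odd n contributes one extra copy of m
        n //= 2            # the instance size is halved at every step
        m += m
    return product
```

In each case only one instance of about half the size remains after a step, so the number of steps grows logarithmically with the size of the input.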
Finally, the third important special case of the approach covers more
sophisticated situations of the variable-size reduction. Examples include
Euclid's algorithm, interpolation search, and the quicksort-like algorithm
for the selection problem.
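A short Python sketch of two variable-size-decrease algorithms follows: Euclid's algorithm and a quicksort-like selection procedure. It is an added illustration rather than the paper's code; the partitioning scheme (a Lomuto-style partition around the first element) is our assumption, chosen for brevity.

```python
def gcd(m, n):
    """Euclid's algorithm: gcd(m, n) = gcd(n, m mod n) until n becomes 0."""
    while n != 0:
        # The size of the reduction depends on the values, not on a fixed constant.
        m, n = n, m % n
    return m


def quickselect(items, k):
    """Return the k-th smallest element of items (k = 0 gives the minimum)."""
    a = list(items)
    if not 0 <= k < len(a):
        raise IndexError("k out of range")
    lo, hi = 0, len(a) - 1
    while True:
        # Partition a[lo..hi] around the pivot a[lo].
        pivot = a[lo]
        s = lo
        for i in range(lo + 1, hi + 1):
            if a[i] < pivot:
                s += 1
                a[s], a[i] = a[i], a[s]
        a[lo], a[s] = a[s], a[lo]
        if s == k:
            return a[s]
        # Continue with only the part that contains position k.
        if s > k:
            hi = s - 1
        else:
            lo = s + 1
```

Unlike quicksort, the selection procedure continues with a single part of the partition, and how much smaller that part is varies from one iteration to the next.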
Though the decrease-and-conquer approach is well known, most authors
consider it either a special case of divide-and-conquer (e.g., [13])
or vice versa [11]. In our opinion, it is more appropriate, from theoretical,
practical and especially educational points of view, to consider divide-and-conquer
and decrease-and-conquer as two distinct design techniques.
The last technique to be considered here is based on the idea of transformation
and will be called transform-and-conquer.
One can identify several flavors of this approach. The first one ---
we will call it simplification ---
solves a problem by first transforming its instance to another instance
of the same problem (and of the same size) with some special property
which makes the problem easier to solve. Good examples include presorting
(e.g., for finding equal elements of a list), Gaussian elimination,
and heapsort (if the heap is interpreted as an array with the special
properties required from a heap).
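The presorting example can be sketched as follows. This Python illustration is added here and is not from the paper; it contrasts a direct check for equal elements with one that first transforms the list into a sorted instance of the same problem.

```python
def has_equal_elements_brute_force(items):
    """Compare every pair of elements directly (quadratic time)."""
    n = len(items)
    return any(items[i] == items[j] for i in range(n) for j in range(i + 1, n))


def has_equal_elements_presort(items):
    """Sort first; equal elements must then be adjacent."""
    a = sorted(items)  # the transformation: same problem, same size, but sorted
    return any(a[i] == a[i + 1] for i in range(len(a) - 1))
```

After sorting, a single linear scan suffices, so the running time is dominated by the sort itself.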
The second --- to be called representation change --- is based on a
transformation of a problem's input to a different representation
that is more conducive to an efficient
algorithmic solution. Examples include search trees, hashing, and heapsort
if the heap is interpreted as a binary tree.
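As a rough illustration of representation change, the Python sketch below (added here, not from the paper) sorts by moving the input into a heap and searches by moving it into a hash table. It relies on the standard `heapq` module and the built-in `dict`; note that `heapq` stores its heap in a plain list, which conceptually encodes the binary tree.

```python
import heapq


def heap_sort(items):
    """Sort by re-representing the input as a min-heap and extracting in order."""
    heap = list(items)
    heapq.heapify(heap)  # change of representation: list -> heap
    return [heapq.heappop(heap) for _ in range(len(heap))]


def build_index(items):
    """Re-represent a list as a hash table mapping each value to its first position."""
    index = {}
    for position, value in enumerate(items):
        index.setdefault(value, position)
    return index
```

Once `build_index` has been called, a lookup such as `index.get(key)` no longer needs to scan the original list.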
Preconditioning (or Preprocessing)
can be considered as yet another variety of the transformation strategy.
The idea is to process a part of the input or the entire input to get
some auxiliary information which speeds up solving the problem. The
examples include the Knuth-Morris-Pratt and Boyer-Moore algorithms for
string matching, Winograd's matrix multiplication, and determining ancestry
in a tree [5, p.293].
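The Knuth-Morris-Pratt algorithm shows the preprocessing idea clearly: the pattern alone is processed first to build an auxiliary table, which then lets the scan of the text avoid backing up. The Python sketch below is a common textbook formulation added for illustration; the names `failure_table` and `kmp_search` are ours.

```python
def failure_table(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    fail = failure_table(pattern)  # the preprocessing step
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        # On a mismatch, fall back using the table instead of rescanning the text.
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1
```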
Finally, in the last and most drastic version of the transform-and-conquer
approach, an instance of a problem is transformed to an instance of
a different problem altogether. Though this idea plays a central role
in the NP-completeness theory, practical algorithms based on this idea
are relatively rare. It is for this reason that we are not going to include
it in the taxonomy chart to be given below.
3. A Test of Generality
In addition to the observations made above, we propose to partition
design techniques into two categories: the first will include the most
general techniques, while the second will contain the remaining, i.e.,
more limited, approaches. Though one can probably come to a reasonable
consensus as to which of these two categories each of the known techniques
should belong to, it would clearly be better to have a specific criterion
or criteria for making such a determination. For our part, we would
like to suggest the following test. In order to qualify for inclusion
in the category of most general approaches, a technique must yield reasonable
(though not necessarily optimal) algorithms for the two problems: sorting
and searching.
We can justify this choice by the importance of the problems selected
("Indeed, I believe that every important aspect of programming
arises somewhere in the context of sorting or searching" [9, p.v])
and by noting with satisfaction that it results in partitioning the
known techniques in a manner quite supported by intuition. Indeed, only
four techniques --- brute force, divide-and-conquer, decrease-and-conquer,
and transform-and-conquer --- pass the test (see Table 1); the others
--- greedy approach, dynamic programming, backtracking, and branch-and-bound
--- fail to qualify as the most general design techniques.
Table 1: Applicability of design techniques to sorting and searching
| TECHNIQUE | SORTING | SEARCHING |
| Brute force | selection sort | sequential search |
| Divide & conquer | mergesort | applicable |
| Decrease & conquer | insertion sort | applicable |
| Transform & conquer | heapsort | search trees, hashing |
4. Further Refinements
By identifying special cases of decrease-and-conquer and transform-and-conquer above,
we have made a stride toward a multilevel taxonomy. One can strengthen
this line of thinking further. First, it might be useful to distinguish
between two types of divide-and-conquer algorithms. For some such
algorithms, the bulk of the work is done while combining solutions to
smaller subproblems (e.g., mergesort); for others, it is processing
before a partition into subproblems that constitutes the heart of the
algorithm in question (e.g., quicksort). We will call these two types
of the divide-and-conquer technique divide-before-processing and process-before-dividing,
respectively.
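The contrast between the two types is easiest to see in quicksort, where all the real work (the partition) happens before the recursive calls and the combining step is mere concatenation; in mergesort the split is trivial and the merge does the work. The following Python sketch of quicksort is an added illustration; the choice of the middle element as pivot and the three-way split are our assumptions, made for readability.

```python
def quick_sort(items):
    """Process-before-dividing: partition first, then recurse on the parts."""
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]
    # The bulk of the work: distribute elements relative to the pivot.
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    # Combining the sorted parts requires no comparisons at all.
    return quick_sort(smaller) + equal + quick_sort(larger)
```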
Second, Moret and Shapiro [12] have considered the greedy approach as
one of two types of a more general approach to optimization problems:
"Greedy methods build solutions piece by piece... Each step increases
the size of the partial solution and is based on local optimization:
the choice selected is that which produces the largest immediate gain
while maintaining feasibility. Iterative methods start with any feasible
solution and proceed to improve upon the solution by repeated applications
of a simple step. The step typically involves a small, localized change
which improves the value of the objective function." [p.254]. Moret
and Shapiro devote separate chapters to illustrate each of these techniques;
more examples can be found by following the references therein. We like
this delineation though the name improvement methods seems to be more
descriptive of the second type than "iterative methods."
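To illustrate the difference between the two kinds of local search, here is a small Python sketch on the traveling salesman problem. It is an illustration added to this text under our own assumptions: cities are points in the plane, the greedy method is the nearest-neighbor heuristic, and the improvement method is a simple 2-opt segment reversal. None of these choices is prescribed by the paper or by Moret and Shapiro.

```python
import math


def tour_length(points, tour):
    """Total length of the closed tour visiting points in the given order."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def nearest_neighbor_tour(points):
    """Greedy method: build the tour piece by piece, always moving to the
    closest city not yet visited."""
    if not points:
        return []
    unvisited = set(range(1, len(points)))
    tour = [0]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda c: math.dist(last, points[c]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


def two_opt(points, tour):
    """Improvement method: start from a feasible tour and repeatedly apply
    a small local change (reversing a segment) while it shortens the tour."""
    best = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(points, candidate) < tour_length(points, best) - 1e-12:
                    best = candidate
                    improved = True
    return best
```

The greedy function never revisits a decision, while `two_opt` never builds anything from scratch; it only modifies a complete solution it is handed.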
Further, one can point out two types of dynamic programming algorithms
as well. The canonical version of this approach is bottom-up: a table
of solutions to subproblems is filled starting with the problem's smallest
subproblems. A solution to the original instance of the problem is then
obtained from the table constructed, with many of the table's entries
remaining typically unused for the instance in question. In order to
overcome the latter shortcoming, a top-down version of dynamic programming
was developed, based on using so-called "memory functions"
(e.g., [5, sec. 8.8]).
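The two versions can be compared on the 0/1 knapsack problem, which we pick here purely as an illustration; the paper does not single out a problem at this point. In the Python sketch below (function names are ours), the bottom-up version fills the entire table, while the memory-function version computes only the entries that the recursion actually requests.

```python
def knapsack_bottom_up(weights, values, capacity):
    """Fill the whole (n+1) x (capacity+1) table, smallest subproblems first."""
    n = len(weights)
    table = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(capacity + 1):
            table[i][w] = table[i - 1][w]  # skip item i
            if weights[i - 1] <= w:        # or take it, if it fits
                table[i][w] = max(
                    table[i][w],
                    values[i - 1] + table[i - 1][w - weights[i - 1]],
                )
    return table[n][capacity]


def knapsack_memory_function(weights, values, capacity):
    """Top-down version: solve only the subproblems actually needed.
    Returns the optimal value and the number of table entries computed."""
    memo = {}

    def best(i, w):
        if i == 0 or w == 0:
            return 0
        if (i, w) not in memo:
            result = best(i - 1, w)
            if weights[i - 1] <= w:
                result = max(result, values[i - 1] + best(i - 1, w - weights[i - 1]))
            memo[(i, w)] = result
        return memo[(i, w)]

    return best(len(weights), capacity), len(memo)
```

For four items with weights 2, 1, 3, 2, values 12, 10, 20, 15, and capacity 5, both versions return the optimal value 37.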
Finally, the two techniques --- backtracking and branch-and-bound --- both
deal with combinatorial problems by constructing so-called "state-space
trees." The difference between them lies in that backtracking is
not limited to optimization problems, while branch-and-bound is not
restricted to a specific way of traversing the problem's state-space tree.
So, it would be natural to consider them as special cases of a more
general approach. What is this approach to be called? The name "exhaustive
search" is often used in the literature. However, there are two
problems with this term. First, both backtracking and branch-and-bound try,
in fact, to avoid exhaustive search by pruning a problem's tree. Second,
the exhaustive search can be, in fact, an alternative approach to solving
a problem (albeit usually an inferior one) by generating and checking
all the candidates for the problem's domain. Therefore it is more natural
to consider the latter approach as an application of the brute-force
technique. Thus, instead of "exhaustive search", we
will use state-space-tree techniques to refer to the general design
approach containing both backtracking and branch-and-bound as its special
cases.
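A compact example of a state-space-tree technique is backtracking on the n-queens problem, which we choose here only for illustration. In the Python sketch below (added to this text, with hypothetical names), each level of the tree corresponds to a row of the board, and a branch is pruned as soon as a partial placement becomes infeasible.

```python
def solve_n_queens(n):
    """Return one placement as a list of column indices, one per row,
    or None if no placement exists."""
    placement = []  # placement[r] = column of the queen in row r

    def safe(col):
        row = len(placement)
        # The new queen must not share a column or a diagonal with earlier queens.
        return all(
            c != col and abs(c - col) != row - r
            for r, c in enumerate(placement)
        )

    def extend():
        if len(placement) == n:
            return True  # a leaf of the state-space tree: complete solution
        for col in range(n):
            if safe(col):  # pruning: infeasible nodes are never expanded
                placement.append(col)
                if extend():
                    return True
                placement.pop()  # backtrack
        return False

    return list(placement) if extend() else None
```

Generating all n^n placements and checking each one would be the brute-force alternative mentioned above; the pruning in `safe` is what distinguishes the state-space-tree approach from it.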
Thus, we end up with the following alternative taxonomy of major design
techniques:
MAJOR ALGORITHM DESIGN TECHNIQUES
| More general techniques | Less general techniques |
| Brute force | Local search techniques: improvement methods; greedy methods |
| Divide-and-conquer: divide before processing; process before dividing | Dynamic programming: bottom-up; top-down (memory functions) |
| Decrease-and-conquer: decrease by a constant; decrease by a constant factor; variable-size decrease | State-space-tree techniques: backtracking; branch-and-bound |
| Transform-and-conquer: simplification; representation change; preconditioning | |
5. Another Example
It is not easy to find problems to which all four general design strategies
are applicable with a reasonable result. Sorting and searching are fortuitous
examples, exploited above as the criterion for separating more general
techniques from less general ones. Here, we will point out another problem
that can play a useful educational role: the exponentiation problem of
computing $a^n$. (Of course, the problem of computing $a^n \bmod p$ is of great
practical interest as well because of its importance to public-key encryption
algorithms.)
A brute-force algorithm would simply multiply a by itself n-1 times. A
divide-and-conquer algorithm would use the formula
$a^n = a^{\lfloor n/2 \rfloor} \cdot a^{\lceil n/2 \rceil}$. The
decrease-by-one variety of the decrease-and-conquer approach yields
$a^n = a^{n-1} \cdot a$; the decrease-by-half variety would be based on the formula
$a^n = (a^{\lfloor n/2 \rfloor})^2$ for even n's and
$a^n = (a^{\lfloor n/2 \rfloor})^2 \cdot a$ for odd n's. Finally, the transformation strategy
can be illustrated by two well-known algorithms that exploit the binary
representation of n (e.g., [15, pp. 524-525]).
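The exponentiation algorithms described above can be placed side by side in Python. This is an illustrative sketch added here; it assumes n is a positive integer, and the two binary-representation functions (left-to-right and right-to-left binary exponentiation) are our assumption about which two algorithms are meant.

```python
def power_brute_force(a, n):
    """Multiply a by itself n-1 times."""
    result = a
    for _ in range(n - 1):
        result *= a
    return result


def power_divide_and_conquer(a, n):
    """a^n = a^floor(n/2) * a^ceil(n/2): two subproblems of about half the size."""
    if n == 1:
        return a
    return power_divide_and_conquer(a, n // 2) * power_divide_and_conquer(a, n - n // 2)


def power_decrease_by_one(a, n):
    """a^n = a^(n-1) * a."""
    if n == 1:
        return a
    return power_decrease_by_one(a, n - 1) * a


def power_decrease_by_half(a, n):
    """a^n = (a^floor(n/2))^2, times an extra factor a when n is odd."""
    if n == 1:
        return a
    half = power_decrease_by_half(a, n // 2)  # only one subproblem is solved
    return half * half if n % 2 == 0 else half * half * a


def power_binary_left_to_right(a, n):
    """Scan the bits of n from the most significant one: square, then
    multiply by a whenever the bit is 1."""
    result = 1
    for bit in bin(n)[2:]:
        result *= result
        if bit == "1":
            result *= a
    return result


def power_binary_right_to_left(a, n):
    """Scan the bits of n from the least significant one, multiplying in
    the terms a^(2^i) for which bit i of n is 1."""
    result = 1
    term = a
    while n > 0:
        if n & 1:
            result *= term
        term *= term
        n >>= 1
    return result
```

Note that the divide-and-conquer version still performs n-1 multiplications, because it solves both halves independently; the decrease-by-half and binary versions need only a number of multiplications proportional to log n.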
6. Conclusion
The existing taxonomy of algorithm design techniques has several important
shortcomings pointed out in this paper. We suggested reclassifying and,
in a few cases, renaming some of the known approaches to algorithm design.
The paper further proposed grouping them according to the technique's
level of generality; the principal criterion suggested in the paper for
separating more general techniques from less general ones is the technique's
applicability to sorting and searching. The resulting taxonomy put four
techniques --- brute force, divide-and-conquer, decrease-and-conquer,
and transform-and-conquer --- in the category of more general techniques,
leaving the rest --- local search techniques, dynamic programming, and
state-space-tree techniques --- in the second, less general, category.
Further, important special types of the major design techniques were also
identified.
A few cautionary notes seem to be in order, however. First, no matter
how many general design techniques are recognized, there will always be
algorithms that cannot be naturally interpreted as an application of one
of those techniques. Some algorithms are just based on insights peculiar
to the problem they solve and do not allow for a broad and useful generalization.
On the other hand, some algorithms can be interpreted as an application
of different techniques. For example, selection sort can be legitimately
interpreted both as a brute-force algorithm and as a decrease-and-conquer
method. As another example, Horner's rule can be considered both as a
decrease-and-conquer algorithm and as an algorithm based on the transformation
strategy.
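Horner's rule makes the dual interpretation easy to see. The Python sketch below is an added illustration that assumes the coefficients are listed from the highest degree down to the constant term. It can be read as a transformation of the polynomial into the nested form (...((c_n x + c_{n-1}) x + c_{n-2}) x + ...) + c_0, or as a decrease-by-one algorithm in which the value of a polynomial of degree n is obtained by extending the value of a polynomial of degree n-1.

```python
def horner(coefficients, x):
    """Evaluate a polynomial at x; coefficients run from highest degree to constant."""
    result = 0
    for c in coefficients:
        # result holds the value of the polynomial formed by the coefficients seen so far.
        result = result * x + c
    return result
```

For example, `horner([2, -1, 3, 1, -5], 3)` evaluates 2x^4 - x^3 + 3x^2 + x - 5 at x = 3 and returns 160.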
Finally, some algorithms may incorporate ideas of several techniques.
For example, the fast Fourier transform takes advantage of both the transformation
and divide-and-conquer ideas. Further, most successful approximation algorithms
for the traveling salesman problem consist of a greedy heuristic
to get an initial approximation followed by one of the improvement procedures
(see [10]).
The above remarks notwithstanding, we see several advantages in this new
classification. First, it improves the currently accepted taxonomy by
eliminating the shortcomings enumerated above. Second, it better reflects
the richness of algorithm design techniques and allows showing them on
different levels of detail. Finally, it makes it possible to classify some important
algorithms (e.g., Euclid's algorithm, heapsort, search trees, hashing)
that the currently accepted taxonomy cannot accommodate.
References
1. Aho, A.V., Hopcroft, J.E., and Ullman, J.D. The Design and Analysis
of Computer Algorithms. Addison-Wesley, 1974.
2. Bellman, R.E. and Dreyfus, S.E. Applied Dynamic Programming. Princeton
University Press, 1962.
3. Bentley, J. "Multidimensional divide-and-conquer." Communications
of the ACM, Vol. 23, No. 4 (April 1980), pp. 214-229.
4. Brassard, G. and Bratley, P. Algorithmics: Theory and Practice. Prentice-Hall,
1988.
5. Brassard, G. and Bratley, P. Fundamentals of Algorithmics. Prentice-Hall,
1996.
6. Horowitz, E. and Sahni, S. Fundamentals of Computer Algorithms. Computer
Science Press, 1978.
7. Horowitz, E. "Algorithms, design and classification of."
In Encyclopedia of Computer Science, 3rd edition, Ralston, A. and Reilly,
E.D., Eds. Van Nostrand Reinhold, 1993, pp. 33-37.
8. Horowitz, E., Sahni, S., and Rajasekaran, S. Computer Algorithms. Computer
Science Press, New York, 1996.
9. Knuth, D.E. The Art of Computer Programming, Volume 3: Sorting and
Searching. Addison-Wesley, 1973.
10. Lawler, E.L. et al., Eds. The Traveling Salesman Problem: A Guided
Tour of Combinatorial Optimization. John Wiley & Sons, 1985.
11. Manber, U. Introduction to Algorithms: A Creative Approach. Addison-Wesley,
1989.
12. Moret, B.M.E. and Shapiro, H.D. Algorithms from P to NP, Volume
I: Design & Efficiency. Benjamin/Cummings Publishing, 1990.
13. Neapolitan, R.E. and Naimipour, K. Foundations of Algorithms. Jones
and Bartlett Publishers, 1996.
14. Pan, V. "Strassen's algorithm is not optimal." Proceedings
of the 19th Annual IEEE Symposium on the Foundations of Computer Science,
1978, pp. 166-176.
15. Sedgewick, R. Algorithms. Addison-Wesley, 2nd ed., 1988.