Exact Rational Solutions and Conditioning
The first section of this chapter deals with the solution of
linear systems of equations Ax = b when the matrix A and
the right-hand side b are known exactly as rational numbers.
This linear algebra problem is the one most familiar to people,
since it is by examples of this that most people first
encounter linear systems of equations. We will see that
this causes problems, because,
as we all know, in
an emergency people `return to the first stroke'
.
Since exact rational solution is the process most familiar to people, it is the one that they most often want --- for example, sometimes people convert a problem with imperfect (say, measured) data into one with rational numbers as entries and then want the exact, rational solution to that problem. This has the advantage of being exactly that `first problem' and so requires very little mental effort.
However, there are two difficulties with this approach: efficiency and stability. `Stability, you say? But I thought exact computation vitiated any stability difficulties?' Not necessarily, or even usually, as we will see. But first, we examine the efficiency question.
For large matrices, the cost of solving linear systems
with exact entries grows combinatorially --- and so in practice
large systems can often not be solved exactly no matter
how much computing power you have available.
[Mind you, it is not usually as bad as trying for pure symbolic
results: it is easy to see that the size of the elements of
must grow faster than
as n increases, for a fully symbolic
matrix.]
For example, consider the following Maple session exhibiting
solution of a particular class of exact rational matrices.
> with(linalg):
Warning: new definition for norm
Warning: new definition for trace
>
> A := n -> matrix(n,n, (i,j) -> (i-j+1)/(i+j+1) );
A := n -> matrix(n, n, (i,j) -> (i-j+1)/(i+j+1))
> B := n -> vector(n, i -> 1/i );
B := n -> vector(n, i -> 1/i)
>
> A(10);
[ 1/3 0 -1/5 -1/3 -3/7 -1/2 -5/9 -3/5 -7/11 -2/3 ]
[ ]
[ 1/2 1/5 0 -1/7 -1/4 -1/3 -2/5 -5/11 -1/2 -7/13 ]
[ ]
[ 3/5 1/3 1/7 0 -1/9 -1/5 -3/11 -1/3 -5/13 -3/7 ]
[ ]
[ 2/3 3/7 1/4 1/9 0 -1/11 -1/6 -3/13 -2/7 -1/3 ]
[ ]
[ 5/7 1/2 1/3 1/5 1/11 0 -1/13 -1/7 -1/5 -1/4 ]
[ ]
[ 3/4 5/9 2/5 3/11 1/6 1/13 0 -1/15 -1/8 -3/17 ]
[ ]
[ 7/9 3/5 5/11 1/3 3/13 1/7 1/15 0 -1/17 -1/9 ]
[ ]
[ 4/5 7/11 1/2 5/13 2/7 1/5 1/8 1/17 0 -1/19 ]
[ ]
[ 9/11 2/3 7/13 3/7 1/3 1/4 3/17 1/9 1/19 0 ]
[ ]
[ 5/6 9/13 4/7 7/15 3/8 5/17 2/9 3/19 1/10 1/21 ]
> B(10);
[ 1, 1/2, 1/3, 1/4, 1/5, 1/6, 1/7, 1/8, 1/9, 1/10 ]
> linsolve("",");
[ 1485, -40755, 435435, -2459457, 8240232, -17211480, 22644765, -18244655,
8221642, -1587222 ]
Note that the elements of the solution are all exact integers,
alternate in sign, and are many orders of magnitude larger than
the input data. This leads to gross inefficiency in the solution,
for large dimensions, as well as an extremely serious conditioning problem.
> times := NULL; # Use an expression sequence to record the times.
times :=
> for k from 20 to 48 by 4 do
> st := time();
> a := A(k);
> b := B(k);
> x := linsolve(a,b);
> times := times,k,time()-st;
> od:
> ratios := vector(7):
> for i to 7 do ratios[i] := times[2*i+2]/times[2*i]; od;
ratios[1] := 1.666666667
ratios[2] := 1.533333333
ratios[3] := 1.608695652
ratios[4] := 1.567567568
ratios[5] := 1.551724138
ratios[6] := 1.600000000
ratios[7] := 1.493055556
> 1.5^(1/4);
1.106681920
This calculation indicates that the growth rate for the computing
time with n is approximately exponential (I think that a more
careful analysis would show
that it is faster than exponential).
We could not do the case n = 160 on this computer in a year, even
assuming that enough memory was available for the
task. This too-high cost is a consequence of the choice of exact rational
arithmetic for the solution of this problem. It is not a consequence
of using one particular computer algebra package. Improvements in
the exact rational arithmetic may adjust the constants in the
formula for the exponential growth but will not alter the exponential
growth itself.
In contrast, a purely numerical solution of this problem can be found in less than 15 seconds (using Matlab on the same computer), for the n = 160 case. Thus for this size problem, numerical techniques are roughly a million times more efficient. We discuss the stability question shortly.
For this particular problem, of course, as there is for the Hilbert matrix problem, there may be an algebraic or explicit formula for the inverse that would greatly speed up the solution process --- but there was nothing special about this example (in fact it was the first one I tried, which says something about the relative frequency of expensive problems) and so one could change the example slightly to vitiate any analytical solution.
Now we return to the question of stability. The idea of `changing the problem slightly' is essential to the other difficulty with the exact rational solution of matrix problems. Sometimes (most of the time?) they are overly precise. One starts with a matrix of numbers known to (perhaps) three decimal places, with uncertainties in the fourth decimal place, and then one converts to rational numbers and solves this problem exactly. The difficulty is that the exactness of the answer is spurious. The stability that we have to consider is the stability of the solution to perturbations in the data. If, of course, your data is exact, then by all means, use exact rational arithmetic. However, this situation is in fact rather rare (except for the case of integer matrices --- see the next section).
What are the effects of the unknown uncertainties? To find this out one can use the traditional (for the numerical analysis community) method of computing a condition number. This idea is usually presented in an algebraic way, and in the context of the effects of rounding errors --- but rounding errors are never as severe in effect as measurement errors or other data errors, and the condition number can be presented more clearly in a geometric fashion, as follows. For simplicity, we consider a two-dimensional system.
Let

Then consider the effect of multiplying a unit vector x by the matrix A: y = Ax. The vector y will not, in general, be a unit vector --- it will be either longer or shorter than a unit vector. If we plot the tips of all possible unit vectors x we get a unit circle; if we plot the tips of all corresponding vectors y = Ax we get an oval. For the matrix A above, the oval is plotted in Figure 3.7.
Figure 1: The image of the unit circle under
.
Some Maple code which could be used to produce this oval is
> A := matrix([[2,3],[1,2]]);
[ 2 3 ]
A := [ ]
[ 1 2 ]
> x := vector([cos(t),sin(t)]);
x := [ cos(t), sin(t) ]
> y := evalm(A &* x);
y := [ 2 cos(t) + 3 sin(t), cos(t) + 2 sin(t) ]
> plot([y[1],y[2],t=0..2*Pi],x=-4..4,y=-4..4);
Inspection of the oval shows that there are vectors x which are
stretched by about a factor of 4 or so, and there are vectors which
are shrunk by about the same factor. To think about the consequences
of this diagram for the solution of linear systems Ax = y,
suppose first that y is the one of these large, expanded vectors
--- say
for definiteness.
[The vector in square brackets is a unit vector, and hence
.] Then x will be a unit vector ---
its magnitude compared to y will have shrunk the most possible.
Now consider a small error in y --- say in the direction of
the most shrinkage on multiplication by A --- for definiteness,
suppose
. Then
implies
by linearity, and we see that
will be the largest possible for a given magnitude
of
.
The following ratio measures the maximum possible amplification
of the relative size
of the error in the solution as compared to the (measurement or data)
error on the input. It is known as the condition number:

From our diagram before, we see that the condition number is the ratio of the length of the longest axis of the oval to the smallest axis of the oval. [This geometric picture restricts the definition of the condition number to the use of the Euclidean norm. Changing the norm changes the oval to another figure, but the ideas remain the same.]
In fact the oval is an ellipse, and the lengths of the semiaxes
of this ellipse are called the singular values of A.
There are widely available software packages for the efficient and accurate
computation of the singular values of a matrix --- see e.g. [89,106]. This can be done in Maple with evalf(SVD(A)), and in
Matlab by svd(A).
The condition number is then the ratio of the largest singular
value of A to the smallest, and this definition is general: for
a three-by-three matrix A, the map
maps the unit sphere
to an ellipsoid, with three semiaxes whose lengths are the three singular
values of A. The n-dimensional case is similar. We define then
the condition number
where
is
the largest singular value of A and
is the smallest.
See e.g. [47] for algorithms for the computation of
the singular values of a matrix A, or [89] or [106]
for programs to do so.
One sees directly that this quantity K is a characteristic of the matrix A and not of any technique used to solve the linear system Ax = b. In particular, it is still important if exact rational arithmetic is used and no rounding errors are incurred. This remark bears repeating:
A large condition number for a matrix cannot be dealt with by going to higher precision, or even exact arithmetic.
The difficult part is the data errors, not the rounding errors.
The final nail in the coffin of the use of exact rational arithmetic to solve linear systems with inexact data is the fact that good floating- point algorithms exist (see e.g. [106]) which give the exact solution to problems completely equivalent to the exact rational problem, given the uncertainties of measurement or data error. These algorithms have cost which grows only polynomially in the dimension, not combinatorially, and so are suitable for problems of much larger size. So, if you have a linear algebra problem with a matrix or right-hand side that contains inexact numbers, then use floating-point arithmetic to solve the problem and compute the condition number to find the effect of the inexactness (this will incidentally tell you the effect of the rounding errors in the computation, and they will usually be trivial in comparison to the effect of the inexactness of the matrix or right-hand sides). See also [25].
Example: Consider the matrix

This example was adapted from one in [99,78]. Suppose that our measured right-hand side data is

Then if we convert these to rational numbers and exactly solve, it so
happens that our answer is
.
What happens if we assume that the first entry is not
, but rather
? This is completely equivalent, of course, given that
our measured data is accurate only to 3 decimal places. If we now
convert to rationals and solve exactly, our solution becomes

or, in decimals for convenient understanding,

This means that a different assumption about the unknown trailing digits leads to a completely different answer. The exactness of the rational arithmetic was spurious.
Of course, this system is ill-conditioned. But a floating-point solution,
as done in, say, Matlab, will give you the exact solution to
where
affects only the unknown,
trailing digits --- so the floating-point answer is exactly equivalent
to the rational answer! It has only made a different assumption about
the unknown trailing digits --- the fact that the solution is sensitive
to such changes is something you have to deal with anyway, in the
face of measurement errors.
To make this completely concrete, Matlab's answer is the exact one, where the data have been perturbed by no more than 5.E-15. Thus Matlab's floating-point routines have provided the exact answer to a completely equivalent problem --- any conclusions we could draw from the rational solution, we could also draw from the floating-point one.
This begs the question `So, what conclusions can we draw?' --- it turns out that one can say a great deal about the effect of data perturbations by use of the singular value decomposition, but we leave that for good numerical analysis texts, such as [47].
We do not go into the idea of structured backward error analysis here, which attempts to take into account any structure that exists in the problem. This is an interesting area, and the reader is referred to [60] for an introduction. This analysis only makes the case for backward error analysis stronger, though, and extends the class of problems for which floating-point arithmetic is superior to exact rational arithmetic, and shrinks the class of remaining problems.