Next: So why give Up: Summary of relevant Previous: Eigenvalue problems

## The Singular Value Decompostion

Exact Rational Solutions and Conditioning The first section of this chapter deals with the solution of linear systems of equations Ax = b when the matrix A and the right-hand side b are known exactly as rational numbers. This linear algebra problem is the one most familiar to people, since it is by examples of this that most people first encounter linear systems of equations. We will see that this causes problems, because, as we all know, in an emergency people `return to the first stroke'.

Since exact rational solution is the process most familiar to people, it is the one that they most often want --- for example, sometimes people convert a problem with imperfect (say, measured) data into one with rational numbers as entries and then want the exact, rational solution to that problem. This has the advantage of being exactly that `first problem' and so requires very little mental effort.

However, there are two difficulties with this approach: efficiency and stability. `Stability, you say? But I thought exact computation vitiated any stability difficulties?' Not necessarily, or even usually, as we will see. But first, we examine the efficiency question.

For large matrices, the cost of solving linear systems with exact entries grows combinatorially --- and so in practice large systems can often not be solved exactly no matter how much computing power you have available. [Mind you, it is not usually as bad as trying for pure symbolic results: it is easy to see that the size of the elements of must grow faster than as n increases, for a fully symbolic matrix.] For example, consider the following Maple session exhibiting solution of a particular class of exact rational matrices.

```> with(linalg):
Warning: new definition for   norm
Warning: new definition for   trace
>
> A := n -> matrix(n,n, (i,j) -> (i-j+1)/(i+j+1) );
A := n -> matrix(n, n, (i,j) -> (i-j+1)/(i+j+1))

> B := n -> vector(n, i -> 1/i );
B := n -> vector(n, i -> 1/i)

>
> A(10);
[  1/3    0   -1/5  -1/3  -3/7   -1/2   -5/9   -3/5  -7/11   -2/3 ]
[                                                                 ]
[  1/2   1/5    0   -1/7  -1/4   -1/3   -2/5  -5/11   -1/2  -7/13 ]
[                                                                 ]
[  3/5   1/3   1/7    0   -1/9   -1/5  -3/11   -1/3  -5/13   -3/7 ]
[                                                                 ]
[  2/3   3/7   1/4   1/9    0   -1/11   -1/6  -3/13   -2/7   -1/3 ]
[                                                                 ]
[  5/7   1/2   1/3   1/5  1/11    0    -1/13   -1/7   -1/5   -1/4 ]
[                                                                 ]
[  3/4   5/9   2/5  3/11   1/6   1/13    0    -1/15   -1/8  -3/17 ]
[                                                                 ]
[  7/9   3/5  5/11   1/3  3/13   1/7    1/15    0    -1/17   -1/9 ]
[                                                                 ]
[  4/5  7/11   1/2  5/13   2/7   1/5    1/8    1/17    0    -1/19 ]
[                                                                 ]
[ 9/11   2/3  7/13   3/7   1/3   1/4    3/17   1/9    1/19    0   ]
[                                                                 ]
[  5/6  9/13   4/7  7/15   3/8   5/17   2/9    3/19   1/10   1/21 ]

> B(10);
[ 1, 1/2, 1/3, 1/4, 1/5, 1/6, 1/7, 1/8, 1/9, 1/10 ]

> linsolve("",");
[ 1485, -40755, 435435, -2459457, 8240232, -17211480, 22644765, -18244655,

8221642, -1587222 ]
```
Note that the elements of the solution are all exact integers, alternate in sign, and are many orders of magnitude larger than the input data. This leads to gross inefficiency in the solution, for large dimensions, as well as an extremely serious conditioning problem.
```> times := NULL;   # Use an expression sequence to record the times.
times :=

> for k from 20 to 48 by 4 do
>   st := time();
>   a := A(k);
>   b := B(k);
>   x := linsolve(a,b);
>   times := times,k,time()-st;
> od:
> ratios := vector(7):
> for i to 7 do ratios[i] := times[2*i+2]/times[2*i]; od;

ratios[1] := 1.666666667

ratios[2] := 1.533333333

ratios[3] := 1.608695652

ratios[4] := 1.567567568

ratios[5] := 1.551724138

ratios[6] := 1.600000000

ratios[7] := 1.493055556

> 1.5^(1/4);

1.106681920
```
This calculation indicates that the growth rate for the computing time with n is approximately exponential (I think that a more careful analysis would show that it is faster than exponential). We could not do the case n = 160 on this computer in a year, even assuming that enough memory was available for the task. This too-high cost is a consequence of the choice of exact rational arithmetic for the solution of this problem. It is not a consequence of using one particular computer algebra package. Improvements in the exact rational arithmetic may adjust the constants in the formula for the exponential growth but will not alter the exponential growth itself.

In contrast, a purely numerical solution of this problem can be found in less than 15 seconds (using Matlab on the same computer), for the n = 160 case. Thus for this size problem, numerical techniques are roughly a million times more efficient. We discuss the stability question shortly.

For this particular problem, of course, as there is for the Hilbert matrix problem, there may be an algebraic or explicit formula for the inverse that would greatly speed up the solution process --- but there was nothing special about this example (in fact it was the first one I tried, which says something about the relative frequency of expensive problems) and so one could change the example slightly to vitiate any analytical solution.

Now we return to the question of stability. The idea of `changing the problem slightly' is essential to the other difficulty with the exact rational solution of matrix problems. Sometimes (most of the time?) they are overly precise. One starts with a matrix of numbers known to (perhaps) three decimal places, with uncertainties in the fourth decimal place, and then one converts to rational numbers and solves this problem exactly. The difficulty is that the exactness of the answer is spurious. The stability that we have to consider is the stability of the solution to perturbations in the data. If, of course, your data is exact, then by all means, use exact rational arithmetic. However, this situation is in fact rather rare (except for the case of integer matrices --- see the next section).

What are the effects of the unknown uncertainties? To find this out one can use the traditional (for the numerical analysis community) method of computing a condition number. This idea is usually presented in an algebraic way, and in the context of the effects of rounding errors --- but rounding errors are never as severe in effect as measurement errors or other data errors, and the condition number can be presented more clearly in a geometric fashion, as follows. For simplicity, we consider a two-dimensional system.

Let

Then consider the effect of multiplying a unit vector x by the matrix A: y = Ax. The vector y will not, in general, be a unit vector --- it will be either longer or shorter than a unit vector. If we plot the tips of all possible unit vectors x we get a unit circle; if we plot the tips of all corresponding vectors y = Ax we get an oval. For the matrix A above, the oval is plotted in Figure 3.7.

Figure 1: The image of the unit circle under .

Some Maple code which could be used to produce this oval is

```> A := matrix([[2,3],[1,2]]);
[ 2  3 ]
A := [      ]
[ 1  2 ]

> x := vector([cos(t),sin(t)]);
x := [ cos(t), sin(t) ]

> y := evalm(A &* x);
y := [ 2 cos(t) + 3 sin(t), cos(t) + 2 sin(t) ]

> plot([y[1],y[2],t=0..2*Pi],x=-4..4,y=-4..4);
```

Inspection of the oval shows that there are vectors x which are stretched by about a factor of 4 or so, and there are vectors which are shrunk by about the same factor. To think about the consequences of this diagram for the solution of linear systems Ax = y, suppose first that y is the one of these large, expanded vectors --- say for definiteness. [The vector in square brackets is a unit vector, and hence .] Then x will be a unit vector --- its magnitude compared to y will have shrunk the most possible.

Now consider a small error in y --- say in the direction of the most shrinkage on multiplication by A --- for definiteness, suppose . Then implies by linearity, and we see that will be the largest possible for a given magnitude of . The following ratio measures the maximum possible amplification of the relative size of the error in the solution as compared to the (measurement or data) error on the input. It is known as the condition number:

From our diagram before, we see that the condition number is the ratio of the length of the longest axis of the oval to the smallest axis of the oval. [This geometric picture restricts the definition of the condition number to the use of the Euclidean norm. Changing the norm changes the oval to another figure, but the ideas remain the same.]

In fact the oval is an ellipse, and the lengths of the semiaxes of this ellipse are called the singular values of A. There are widely available software packages for the efficient and accurate computation of the singular values of a matrix --- see e.g. [89,106]. This can be done in Maple with `evalf(SVD(A))`, and in Matlab by `svd(A)`.

The condition number is then the ratio of the largest singular value of A to the smallest, and this definition is general: for a three-by-three matrix A, the map maps the unit sphere to an ellipsoid, with three semiaxes whose lengths are the three singular values of A. The n-dimensional case is similar. We define then the condition number where is the largest singular value of A and is the smallest. See e.g. [47] for algorithms for the computation of the singular values of a matrix A, or [89] or [106] for programs to do so.

One sees directly that this quantity K is a characteristic of the matrix A and not of any technique used to solve the linear system Ax = b. In particular, it is still important if exact rational arithmetic is used and no rounding errors are incurred. This remark bears repeating:

A large condition number for a matrix cannot be dealt with by going to higher precision, or even exact arithmetic.

The difficult part is the data errors, not the rounding errors.

The final nail in the coffin of the use of exact rational arithmetic to solve linear systems with inexact data is the fact that good floating- point algorithms exist (see e.g. [106]) which give the exact solution to problems completely equivalent to the exact rational problem, given the uncertainties of measurement or data error. These algorithms have cost which grows only polynomially in the dimension, not combinatorially, and so are suitable for problems of much larger size. So, if you have a linear algebra problem with a matrix or right-hand side that contains inexact numbers, then use floating-point arithmetic to solve the problem and compute the condition number to find the effect of the inexactness (this will incidentally tell you the effect of the rounding errors in the computation, and they will usually be trivial in comparison to the effect of the inexactness of the matrix or right-hand sides). See also [25].

Example: Consider the matrix

This example was adapted from one in [99,78]. Suppose that our measured right-hand side data is

Then if we convert these to rational numbers and exactly solve, it so happens that our answer is .

What happens if we assume that the first entry is not , but rather ? This is completely equivalent, of course, given that our measured data is accurate only to 3 decimal places. If we now convert to rationals and solve exactly, our solution becomes

or, in decimals for convenient understanding,

This means that a different assumption about the unknown trailing digits leads to a completely different answer. The exactness of the rational arithmetic was spurious.

Of course, this system is ill-conditioned. But a floating-point solution, as done in, say, Matlab, will give you the exact solution to where affects only the unknown, trailing digits --- so the floating-point answer is exactly equivalent to the rational answer! It has only made a different assumption about the unknown trailing digits --- the fact that the solution is sensitive to such changes is something you have to deal with anyway, in the face of measurement errors.

To make this completely concrete, Matlab's answer is the exact one, where the data have been perturbed by no more than 5.E-15. Thus Matlab's floating-point routines have provided the exact answer to a completely equivalent problem --- any conclusions we could draw from the rational solution, we could also draw from the floating-point one.

This begs the question `So, what conclusions can we draw?' --- it turns out that one can say a great deal about the effect of data perturbations by use of the singular value decomposition, but we leave that for good numerical analysis texts, such as [47].

We do not go into the idea of structured backward error analysis here, which attempts to take into account any structure that exists in the problem. This is an interesting area, and the reader is referred to [60] for an introduction. This analysis only makes the case for backward error analysis stronger, though, and extends the class of problems for which floating-point arithmetic is superior to exact rational arithmetic, and shrinks the class of remaining problems.

Next: So why give Up: Summary of relevant Previous: Eigenvalue problems

Robert Corless
Wed Jan 31 11:33:59 EST 1996