Next: Maple Implementation Up: Linear Algebra in Previous: Consequences of exact

## Row Echelon Decomposition

The main class of linear algebra problems for which symbolic manipulation systems are helpful, however, is for systems with parameters. These can usually be cast in the form of matrices with multivariate polynomial entries. For simplicity we concentrate first on the univariate case.

The main goals in this chapter are to show that the solutions of some parametric linear systems can be found using standard computer algebra packages, and that these solutions can be evaluated efficiently, and that they can give insight into the nature of the problem under study. A secondary goal is to show the limitations of this approach (again, efficiency and numerical stability of the resulting expressions are critical).

We will also see for the first time the problem of `specialization' --- it is common for computer algebra programs to produce an answer for a parametrized problem which is correct for `most' values of the parameters but is wrong for some values of the parameters. These errors can be visible (when, for example, singularities appear) or invisible. It is the invisible errors that concern the mathematician, as it is often the `special cases' that give the most insight into the nature of the problem.

We introduce some new (but simple) mathematics, the Row Echelon Decomposition, to help deal with this problem. This is an example of the `proviso' approach in computer algebra [34].

Consider the following example problem, adapted from the introductory text [82] and discussed in the present context in [36].

Problem. below is the augmented matrix for a system of linear equations. Find the values of the parameter x for which the system has no, one, and infinitely many solutions.

During row reduction, we divide by pivots and , both of which could equal zero and thus define special cases. Putting these cases to one side, we obtain the general row-echelon form of as

We must now go back to the original problem and calculate the special cases. The points we wish to make are clear if we only consider one root of , namely x=0, and of , namely . These special cases then give the row-echelon forms

and

From the above analysis the complete solution to the original problem is clear.

The important point to notice in this example is the fact that the special cases are only apparent while the row reductions are being carried out. However, computer algebra systems such as Maple typically take the problem as input and return one and only one answer. When this is the case, the only matrices that the user sees are (1) and (2), and all of the intermediate manipulations are lost.

Exercises

1. Use the Maple routine `rref` to calculate the row echelon form of (1).
2. Investigate other Maple linear algebra routines for the solution of this problem.
3. If you have access to Mathematica or Theorist, see what they can do with this problem.
4. If

solve by hand . Use Maple to solve the system, and compare your solutions with Maple's. Is Maple correct?

One can expect that the special case would be noticed by most users, because in that case the entries in the last column are undefined (of course the only x for which this is true are complex, and thus may be of no interest). There is nothing, however, in the general result ( 2) that gives any hint that is a special case and that the general result (2) must then be replaced by (3). Clearly some information has been lost during the transformation of (1) into (2). It is for all cases like this that we wish to devise a better way of proceeding.

There are essentially four approaches to this problem:

1. Ignore the difficulty, and hope the special cases never occur. This is the approach taken by Maple's built-in `rref` and it is usually satisfactory, if you don't mind being lied to occasionally. (To be fair, see the remark following this list).
2. Automatically investigate all special cases, regardless of whether or not they are of interest to the user. This is always the `correct' thing to do --- but the answer may be exponentially complicated. [It may not be, of course, and there are reasonably efficient ways of discovering and exploring all possible solutions when there aren't too many of them [97].]
3. Ask the user questions as the solution proceeds. `Is x > 0?' `Is ?' `Is x > my/temporary/local/variable?' This is the sort of thing that Macsyma does. Sometimes it is what you want, but often the CAS winds up asking questions that the user cannot possibly answer. Further, there may be an awful lot of questions, more than the user can bear to answer. And finally, this is not at all suitable to a batch or file-reading environment.
4. Return the most `generic' answer you can, together with a proviso or caveat: `The answer is 5, provided that '. If the proviso is in fact invalid --- `Oh, nuts, I forgot to specify that ' --- then it is the user's responsibility to re-do the calculation.
There is a fifth approach, one that Maple occasionally uses in other contexts (to the annoyance of the users): refuse to answer at all. This approach will be ignored here.

Remark on indeterminates versus parameters. Item (1) above is a fair criticism of Maple (and all other CAS) only if you take the view that parameters have values: p = 1 + x is a real number, given a certain (as yet unknown) real number x. This is the most natural view of polynomials and functions, but it is not the only one. Many abstract algebra courses discuss the algebra of polynomials in an indeterminate x, with (say) complex coefficients. In this perspective, the zero polynomial is , and p = 0 implies that p is the zero polynomial (and says nothing about x). Thus p = 1 + x is never zero, and so if you divide another polynomial by it, there is no problem. Thus in this perspective and that is the end of the story --- x is indeterminate and hence can never equal any particular value, and so in particular is not -1. This viewpoint is not natural to most people (I seem to remember arguments about it when first taking number theory some years ago) but does simplify the work greatly, by eliminating the worrisome zero divisor problem from your algebra.

This algebraic point of view is the one taken by most computer algebra systems, and not just in the context of polynomials or linear systems of equations. The Risch decision procedure, for example, is a procedure for working in differential fields and is essentially algebraic, and a related analytical difficulty surfaces there (see Chapter 7). This book takes the analytic point of view (because I think it is more interesting and useful --- please feel free to disagree) and I will point out some consequences of this shift in point of view on the use of computer algebra systems.

As the simplest possible example, consider solving the equation kx = 1 for x. All computer algebra systems (except Mathematica using its Reduce command, and Theorist) return , which is true only when k is not 0 (but is always true from the algebraic point of view). This, however, might be counted acceptable because the failure is visible (i.e. easily noticed) in the special case k=0. Now consider solving the equation kx = k. All systems, with the same exceptions, give only the (algebraically but not analytically correct) solution x = 1. The fact that there was a division by k is no longer evident from the general solution, and thus a special case has been lost from view. Row reduction is essentially similar to the last case, because it is possible to lose any visible reminder of the fact that divisions were made by quantities that could be zero.

Note that the definition of row-echelon form contributes to the difficulties in its automatic implementation. First, row reduction is usually presented as a transformation of one matrix into another. This old-fashioned point of view has largely been replaced by the idea of matrix factorizations in other topics of linear algebra (for example, the Gram-Schmidt reduction has been replaced by QR-factorization, Gauss-Jordan reduction by PA = LU factorization, and even the FFT is now being looked at from the factorization point of view [102]). The fact that this is a transformation, rather, means that some information is hidden from the user.

It is the normalization, however, which causes the central difficulty. A key property of row-echelon form is the fact that it is unique --- many proofs in linear algebra rely on this fact --- and to obtain uniqueness, we must normalize the leading term in each row to 1, thus forcing the division that is the point of discussion. One way to tackle the problem would be to change the definition of row-echelon form to remove the need to normalize to 1, and this leads, for example, to the Hermite normal form for matrices of integers or univariate polynomials (see e.g. [80]). However, the Hermite normal form is not defined for the general algebraic objects encountered in applied mathematics, necessarily. In any event, the same difficulty of `losing' a division by a parameter can be constructed for the Hermite normal form by considering polynomials with parameters in the coefficients.

By defining the row-echelon decomposition or factorization (the terms are used interchangeably here) we can resolve the difficulties described above, without requiring changes either in the definitions of linear algebra or in computer-algebra systems. Factorization offers many advantages over a transformation. Factorizations are written as equations, in which the given matrix is written as a product of matrices with special properties. The key advantage of this for our purpose is that if we express row reduction as an equation, the divisors must appear somewhere in that equation, and that means that a computer system does not have to watch during the reduction to identify special cases for us. The row-echelon decomposition thus preserves all information about the original matrix, and moreover presents it in a convenient form for the investigation of special cases.

The existence and continuity of a row-echelon decomposition is examined in Theorems 1 and 2 of [36]. We reproduce the theorems and examples here.

Theorem Theorem 1: If A is an matrix over C, there exists a permutation matrix P, a unit lower triangular matrix L, and a nonsingular upper triangular matrix U such that

where R is the (unique) row-echelon form of A.

Remark. This decomposition subsumes the definition of the row echelon form and the LU-factorization of a matrix. If a (square) matrix is nonsingular, its row-echelon form R is just the identity matrix. If A is singular or non-square, then there exists a square nonsingular matrix F such that A = F R where R is the unique row-echelon form of A [82]. Applying the LU-factorization to F gives the decomposition above. Further, we need do no extra computation to discover P, L, and U --- we merely remember the steps in the computation of R.

Definition. A family of matrices is continuous at x=a if for all there exists such that implies . The 2-norm is used for explicitness, both for the k-dimensional parameter space and for the space of matrices.

Theorem Theorem 2. Let be a matrix depending upon one or more variables or parameters x. Let A be continuous at a point x=a. If has a row-echelon decomposition in a neighbourhood of x=a (with some particular permutation matrix P)

with , then R is continuous at a.

Remark. There may be only one permutation matrix P which gives such a factorization, or there may be several. The hypotheses of the theorem say that if such a factorization exists with , then R is continuous. Verification that such a factorization exists is easy --- one just computes it.

Corollary 1: The only values of x at which can be discontinuous are those for which .

For proofs, see [36].

The continuity of the row echelon form is exactly the analytic expression of the computer-algebra concept of `correctness on specialization' for this problem. That is, the row echelon form will be correct on specialization if we can substitute the value of the parameter into the formula for and get the correct row echelon form of . Since the expressions contained in will always be continuous functions of the parameters if the original matrix is continuous at x, substitution of values into always gets the limiting value: if the actual row echelon form is discontinuous at that point, the computed formula will be incorrect `on specialization'.

Exercises

1. Show that the matrix (2) does not tend to the matrix (3) as x goes to 0, while the matrix (1) is continuous at x=0.
2. Parameterized matrices arise naturally in eigenvalue problems, such as the following. This problem is chosen to illustrate that, even if an individual diagonal element of U is zero, it is only when that discontinuous changes in the row echelon form may occur. Consider

1. Compute (by hand) the row-echelon factorization of and show that if and only if is an eigenvalue of B.
2. Show that if no pivoting (row echanges) take place then when some of the pivots are zero, in spite of the fact that is not an eigenvalue. Discuss how to remedy this, and the consequences for the theorem.
3. Re-factorize for each eigenvalue.
4. Compute the eigenvectors for each eigenvalue from the above factorizations. This example was constructed using the method of Ortega [85].
3. This problem shows that even when the row-echelon form might be continuous. Consider

1. Compute the row echelon decomposition of A.
2. Show that even though when b = a, the row echelon form is correct unless also c=a. This problem shows that Theorem 2 is as strong as possible, in that parameter values rendering U singular can only be said to `need further investigation'.

4. The so-called `binary powering' algorithm to compute given x and n runs as follows.
```c := 1
while (n > 0) do  # loop invariant: u = c*x^n at this point
if n is even then
n := n/2
x := x^2
else
c := c*x
n := n - 1
fi
od
terminate and return c  (which must now be x^n).
```
1. Use the loop invariant to prove by induction that this algorithm terminates and is correct.
2. Count the number of multiplications and additions this algorithm takes. You may assume .
3. Implement this algorithm in Maple, as a method for computing powers of matrices. Call your routine `mpower(A,n)`. Test your routine.
4. Compute where A is