Making orthonormal bases
If we have a inner product $\langle - , - \rangle$ on a vector space $V$ and we locate a basis $v_1,\ldots,v_d$, then generally it will not be orthonormal.
For example, take $V = k^3$ for a field $k$. The collection \(\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}\) is a basis but none of the pairs of vectors in it is orthogonal for the standard inner product $\langle v,w \rangle = v^T w$.
Moreover, starting from a basis $v_1,\ldots,v_d$, we know that we can
- swap any pair,
- scaled any vector by a nonzero scalar, or
- add a multiple of one vector to a different one
and still have a basis in the end.
If we started with an orthonormal basis, then, of these three classes of operations, only swapping a pair and scaling a vector by $\pm 1$ will output a new orthonormal basis. This is pretty constraining. Said another way, orthonormal bases are very special.
Orthonormal bases can be very useful, however, especially computationally. So figuring out a way to product an orthonormal basis from a general basis is useful.
The most well-known way is the Gram-Schmidt algorithm. Before we state, we want to recall one condition on inner product. The following only makes sense if we have some ordering on the field $k$. Therefore, we will restrict attention to $k = \mathbb{R}$.
Definition. An inner product $\langle -, - \rangle$ on a vector space over $\mathbb{R}$ is called positive definite if $ \langle v, v \rangle > 0$ for any $v \in V$ with $v \neq 0$.
Example. Let $A$ be an invertible $n \times n$ matrix with $\mathbb{R}$ entries. Then, the inner product \(\langle v, w \rangle := (Av)^T (Aw) = v^T A^TA w\) is a positive definite inner product on $\mathbb{R}^n$. Since \(\langle v, v \rangle = (Av)^T Av \geq 0\) from the standard inner product on $\mathbb{R}^m$.
Definition. If we have a positive definite inner product $\langle -,- \rangle$ on a vector space $v$, then we say the length (or norm) of a vector $v$ is \(||v|| := \sqrt{\langle v,v \rangle}\)
A vector is of unit length (or is norm one) if $||v|| = 1$.
Note that $v = 0$ if and only if its norm is $||v|| = 0$.
The Gram-Schmidt Algorithm
The inputs to Gram-Schmidt are a vector space $V$ with a positive definite inner product $\langle -, - \rangle$ and a basis for $V$. The output is an orthonormal basis $w_1,\ldots,w_d$.
Establish an ordering for your basis $v_1,\ldots,v_d$. Now recursively do 2 and 3.
Define \(u_i := v_i - \sum_{j=1}^{i-1} \langle v_i, w_j \rangle w_j\)
Set
\(w_i := \frac{1}{\sqrt{\langle u_i, u_i \rangle}} u_i\)
Note that \(w_1 := \frac{1}{\sqrt{\langle v_1, v_1 \rangle}} v_1\) by definition.
Lemma. Suppose $u_j \neq 0$ for all $1 \leq j \leq i$. Then $w_1,\ldots,w_i$ is an orthonormal set.
Proof. (Expand to view)
Note that from our assumption \(w_j := \frac{1}{\sqrt{\langle u_j,u_j \rangle}}u_j\) is well-defined for each $1 \leq j \leq i$.
We have \(\langle w_j, w_j \rangle = \left \langle \frac{1}{\sqrt{\langle u_j, u_j \rangle}} u_j, \frac{1}{\sqrt{\langle u_j, u_j \rangle}} u_j \right \rangle = \frac{\langle u_j, u_j \rangle}{\langle u_j, u_j \rangle} = 1\)
Next we prove that $w_1,\ldots,w_j$ is an orthogonal set for each $1 \leq j \leq i$. We work using induction. For the base case of $j = 1$ there is nothing to prove.
Assume that $w_1,\ldots,w_{j-1}$ is an orthogonal set. We want to show that $w_1,\ldots,w_j$ is an orthogonal. From the induction hypothesis, we know that $\langle w_r, w_s \rangle = 0$ for $1 \leq r,s \leq j-1$. We need to check that $\langle w_r, w_j \rangle = 0$. \(\begin{aligned} \langle w_r , w_j \rangle & = \langle u_j, u_j \rangle \langle w_r, v_j - \sum_{s =1}^{j-1} \langle v_j, w_s \rangle w_s \rangle \\ & = \langle u_j, u_j \rangle \left( \langle w_r, v_j \rangle - \sum_{s=1}^{j-1} \langle v_j, w_s \rangle \langle w_r, w_s \rangle \right) \end{aligned}\) Since $w_1,\ldots,w_{j-1}$ is an orthogonal set, $\langle w_r,w_s \rangle =0$ for $r \neq s$ and this expression simplifies as \(= \langle u_j, u_j \rangle ( \langle w_r, v_j \rangle - \langle v_j, w_r \rangle \langle w_r, w_r \rangle ) = 0\) This is zero since $\langle -,- \rangle$ is symmetry and we already saw that $\langle w_r, w_r \rangle = 1$. ■
Next we have a simple criteria for an orthogonal set to be a basis: it just has to have the dimension many vectors in it.
Lemma. Let $\langle -,- \rangle$ be a positive definite inner product on a vector space. If $w_1,\ldots,w_i$ is a set of nonzero orthogonal vectors, then it is a linearly independent. In particular, if $\dim V = d$, and $i = d$, it is a basis.
Proof. (Expand to view)
See Worksheet 15 ■
As corollary of the previous two results, we have the following.
Corollary. If $u_i \neq 0$ for $1 \leq i \leq j$, then $w_1,\ldots,w_j$ is an orthonormal basis for the $\operatorname{Span}(v_1,\ldots,v_j)$.
Proof. (Expand to view)
We first check that each $w_i$ lies in the span of the $v_1,\ldots,v_i$ using induction. In the base case, \(w_1 = \frac{1}{\sqrt{\langle v_1,v_1 \rangle}}v_1 \in \operatorname{Span}(v_1)\)
Assume now that $w_1,\ldots,w_{i-1} \in \operatorname{Span}(v_1,\ldots,v_{i-1})$. Since \(w_i = \frac{1}{\sqrt{\langle u_i, u_i \rangle}}\left(v_i - \sum_{s=1}^{i-1} \langle v_i, w_s \rangle w_s \right)\) we have written $w_i$ as a linear combination of $w_1,\ldots,w_{i-1}$ and $v_i$. Since we can write each $w_1,\ldots,w_{i-1}$ as a linear combination of $v_1,\ldots,v_{i-1}$, we can substitute those in and simplify to get $w_i$ as a linear combination of $v_1,\ldots,v_i$.
Since $v_1,\ldots,v_d$ is basis, it is linearly independent. Thus, the dimension of the span of $v_1,\ldots,v_i$ is $i$. We also have $i$ orthogonal vectors in $w_1,\ldots,w_i$. Each is nonzero since we assumed that $u_j \neq 0$ for all $j$. Using the previous lemma, we see that $w_1,\ldots,w_i$ is a basis for $\operatorname{Span}(v_1,\ldots,v_i)$. ■
We are left with checking that each $u_i \neq 0$ so Gram-Schmidt is well defined.
Lemma. Each $u_i \neq 0$ for $1 \leq i \leq d$.
Proof. (Expand to view)
We work by induction again. For case of $i=1$, we have $v_1 \neq 0$ since it is a member of a basis. And $u_1 = v_1$.
Assume that $u_j \neq 0$ for $j < i$. If $u_i = 0$, then we have \(v_i = \sum_{s=1}^{i-1} \langle v_i,w_s \rangle w_s\) In other words, $v_i$ is in the span of $w_1,\ldots,w_{i-1}$. But, we just saw that \(\operatorname{Span}(w_1,\ldots,w_{i-1}) = \operatorname{Span}(v_1,\ldots,v_{i-1})\) This implies that \(v_i \in \operatorname{Span}(v_1,\ldots,v_{i-1})\) which is a contradiction to the linear independence of $v_1,\ldots,v_i$. Thus, $u_i \neq 0$. ■
Example. Let’s go back to the example a basis which is not orthonormal \(v_1,v_2,v_3 := \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}\) and let’s apply Gram Schmidt. We have $w_1=v_1$ since the length of $v_1$ is already $1$. Next we have \(u_2 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} - \left\langle \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \right\rangle \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}\) Since $||u_2||=1$, we have $w_2 = u_2$. Finally, \(u_3 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} - \left\langle \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \right\rangle \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} - \left\langle \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \right\rangle \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}\) and $w_3 = u_3$.
So in this case we recover the standard basis. But that is not always the case. Suppose we reverse the ordering of our basis and run Gram Schmidt again. Then we would a different orthonormal basis \(\frac{1}{\sqrt{3}} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \frac{\sqrt{3}}{\sqrt{2}} \begin{pmatrix} 1/3 \\ 1/3 \\ -2/3 \end{pmatrix}, \sqrt{2} \begin{pmatrix} 1/2 \\ -1/2 \\ 0 \end{pmatrix}\)
In Sage, you can perform Gram Schmidt on the rows of a matrix $A$ by calling A.gram_schmidt()
. But you should be a little surprised. We will talk more about the output here next.