
Making orthonormal bases

If we have an inner product $\langle - , - \rangle$ on a vector space $V$ and we pick a basis $v_1,\ldots,v_d$, then generally it will not be orthonormal.

For example, take $V = k^3$ for a field $k$. The collection \(\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}\) is a basis, but no pair of vectors in it is orthogonal with respect to the standard inner product $\langle v,w \rangle = v^T w$.
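To see this concretely, here is a quick check in Python (using NumPy; a small sketch of our own, not part of the text):

```python
import numpy as np

v1 = np.array([1, 0, 0])
v2 = np.array([1, 1, 0])
v3 = np.array([1, 1, 1])

# Pairwise standard inner products <v, w> = v^T w.
for a, b in [(v1, v2), (v1, v3), (v2, v3)]:
    print(a @ b)  # prints 1, 1, 2 -- all nonzero, so no pair is orthogonal
```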

Moreover, starting from a basis $v_1,\ldots,v_d$, we know that we can

  • swap any pair,
  • scale any vector by a nonzero scalar, or
  • add a multiple of one vector to a different one

and still have a basis in the end.

If we started with an orthonormal basis, then, of these three classes of operations, only swapping a pair and scaling a vector by $\pm 1$ will output a new orthonormal basis. This is pretty constraining. Said another way, orthonormal bases are very special.

Orthonormal bases can be very useful, however, especially computationally. So figuring out a way to produce an orthonormal basis from a general basis is useful.

The most well-known way is the Gram-Schmidt algorithm. Before we state it, we want to recall one condition on inner products. The following only makes sense if we have some ordering on the field $k$. Therefore, we will restrict attention to $k = \mathbb{R}$.

Definition. An inner product $\langle -, - \rangle$ on a vector space $V$ over $\mathbb{R}$ is called positive definite if $\langle v, v \rangle > 0$ for any $v \in V$ with $v \neq 0$.

Example. Let $A$ be an invertible $n \times n$ matrix with real entries. Then \(\langle v, w \rangle := (Av)^T (Aw) = v^T A^TA w\) is a positive definite inner product on $\mathbb{R}^n$. Indeed, \(\langle v, v \rangle = (Av)^T (Av) \geq 0\) since this is the standard inner product of $Av$ with itself in $\mathbb{R}^n$, and it vanishes only when $Av = 0$, which forces $v = 0$ because $A$ is invertible.
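A quick numerical illustration (the matrix $A$ below is an arbitrary invertible choice, and the check is a sanity test, not a proof):

```python
import numpy as np

# An arbitrarily chosen invertible matrix A.
A = np.array([[2.0, 1.0],
              [0.0, 1.0]])

def inner(v, w):
    # <v, w> := v^T A^T A w
    return v @ A.T @ A @ w

rng = np.random.default_rng(0)
for _ in range(5):
    v = rng.standard_normal(2)
    # Positive definiteness: <v, v> = ||Av||^2 > 0 for v != 0.
    assert inner(v, v) > 0
```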

Definition. If we have a positive definite inner product $\langle -,- \rangle$ on a vector space $V$, then we say the length (or norm) of a vector $v$ is \(||v|| := \sqrt{\langle v,v \rangle}\)
A vector $v$ is of unit length (or is norm one) if $||v|| = 1$.
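For example, with the standard inner product on $\mathbb{R}^3$, the vector $v = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$ has \(||v|| = \sqrt{1^2 + 1^2 + 1^2} = \sqrt{3}\) so $\frac{1}{\sqrt{3}} v$ has unit length.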

Note that $v = 0$ if and only if $||v|| = 0$.

The Gram-Schmidt Algorithm

The inputs to Gram-Schmidt are a vector space $V$ with a positive definite inner product $\langle -, - \rangle$ and a basis for $V$. The output is an orthonormal basis $w_1,\ldots,w_d$.

  1. Fix an ordering of your basis $v_1,\ldots,v_d$. Then apply steps 2 and 3 for each $i = 1,\ldots,d$ in turn.

  2. Define \(u_i := v_i - \sum_{j=1}^{i-1} \langle v_i, w_j \rangle w_j\)

  3. Set
    \(w_i := \frac{1}{\sqrt{\langle u_i, u_i \rangle}} u_i\)

Note that \(w_1 := \frac{1}{\sqrt{\langle v_1, v_1 \rangle}} v_1\) by definition.
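Here is a minimal sketch of the algorithm in Python for the standard inner product on $\mathbb{R}^d$; the function name gram_schmidt and the use of NumPy are our own choices for illustration:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors in R^d
    with respect to the standard inner product <v, w> = v^T w."""
    ws = []
    for v in vectors:
        # Step 2: subtract the components of v_i along the previous w_j's.
        u = v - sum((v @ w) * w for w in ws)
        # Step 3: rescale u_i to unit length.
        ws.append(u / np.sqrt(u @ u))
    return ws
```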

Lemma. Suppose $u_j \neq 0$ for all $1 \leq j \leq i$. Then $w_1,\ldots,w_i$ is an orthonormal set.

Proof.

Note that, by our assumption, \(w_j := \frac{1}{\sqrt{\langle u_j,u_j \rangle}}u_j\) is well-defined for each $1 \leq j \leq i$.

We have \(\langle w_j, w_j \rangle = \left \langle \frac{1}{\sqrt{\langle u_j, u_j \rangle}} u_j, \frac{1}{\sqrt{\langle u_j, u_j \rangle}} u_j \right \rangle = \frac{\langle u_j, u_j \rangle}{\langle u_j, u_j \rangle} = 1\)

Next we prove that $w_1,\ldots,w_j$ is an orthogonal set for each $1 \leq j \leq i$. We proceed by induction on $j$. For the base case of $j = 1$ there is nothing to prove.

Assume that $w_1,\ldots,w_{j-1}$ is an orthogonal set. We want to show that $w_1,\ldots,w_j$ is an orthogonal set. From the induction hypothesis, we know that $\langle w_r, w_s \rangle = 0$ for distinct $1 \leq r,s \leq j-1$, so we only need to check that $\langle w_r, w_j \rangle = 0$ for $1 \leq r \leq j-1$. We have \(\begin{aligned} \langle w_r , w_j \rangle & = \frac{1}{\sqrt{\langle u_j, u_j \rangle}} \left\langle w_r, v_j - \sum_{s =1}^{j-1} \langle v_j, w_s \rangle w_s \right\rangle \\ & = \frac{1}{\sqrt{\langle u_j, u_j \rangle}} \left( \langle w_r, v_j \rangle - \sum_{s=1}^{j-1} \langle v_j, w_s \rangle \langle w_r, w_s \rangle \right) \end{aligned}\) Since $w_1,\ldots,w_{j-1}$ is an orthogonal set, $\langle w_r,w_s \rangle = 0$ for $r \neq s$ and this expression simplifies to \(\frac{1}{\sqrt{\langle u_j, u_j \rangle}} \left( \langle w_r, v_j \rangle - \langle v_j, w_r \rangle \langle w_r, w_r \rangle \right)\) This is zero since $\langle -,- \rangle$ is symmetric and we already saw that $\langle w_r, w_r \rangle = 1$.
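As a numerical sanity check of the lemma (reusing the hypothetical gram_schmidt sketch from above): if we stack the output vectors as the rows of a matrix $W$, orthonormality is exactly the statement $W W^T = I$.

```python
import numpy as np

rng = np.random.default_rng(1)
# Random vectors are (almost surely) a basis of R^4.
basis = list(rng.standard_normal((4, 4)))

W = np.array(gram_schmidt(basis))
# <w_r, w_j> should be 1 when r = j and 0 otherwise.
assert np.allclose(W @ W.T, np.eye(4))
```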

Next we have a simple criterion for an orthogonal set to be a basis: it just has to contain as many vectors as the dimension.

Lemma. Let $\langle -,- \rangle$ be a positive definite inner product on a vector space $V$. If $w_1,\ldots,w_i$ is a set of nonzero orthogonal vectors, then it is linearly independent. In particular, if $\dim V = d$ and $i = d$, it is a basis.

Proof.

See Worksheet 15
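(In brief, the key computation is: if $\sum_s a_s w_s = 0$, then pairing with $w_r$ gives $0 = \left\langle w_r, \sum_s a_s w_s \right\rangle = a_r \langle w_r, w_r \rangle$, and since $w_r \neq 0$ and the inner product is positive definite, every $a_r = 0$.)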

As a corollary of the previous two results, we have the following.

Corollary. If $u_i \neq 0$ for $1 \leq i \leq j$, then $w_1,\ldots,w_j$ is an orthonormal basis for $\operatorname{Span}(v_1,\ldots,v_j)$.

Proof.

We first check, using induction, that each $w_i$ lies in the span of $v_1,\ldots,v_i$. In the base case, \(w_1 = \frac{1}{\sqrt{\langle v_1,v_1 \rangle}}v_1 \in \operatorname{Span}(v_1)\)

Assume now that $w_1,\ldots,w_{i-1} \in \operatorname{Span}(v_1,\ldots,v_{i-1})$. Since \(w_i = \frac{1}{\sqrt{\langle u_i, u_i \rangle}}\left(v_i - \sum_{s=1}^{i-1} \langle v_i, w_s \rangle w_s \right)\) we have written $w_i$ as a linear combination of $w_1,\ldots,w_{i-1}$ and $v_i$. Since we can write each $w_1,\ldots,w_{i-1}$ as a linear combination of $v_1,\ldots,v_{i-1}$, we can substitute those in and simplify to get $w_i$ as a linear combination of $v_1,\ldots,v_i$.

Since $v_1,\ldots,v_d$ is a basis, it is linearly independent. Thus, the dimension of the span of $v_1,\ldots,v_i$ is $i$. We also have $i$ orthogonal vectors $w_1,\ldots,w_i$ in this span, each nonzero since we assumed that $u_j \neq 0$ for all $j$. Using the previous lemma, we see that $w_1,\ldots,w_i$ is a basis for $\operatorname{Span}(v_1,\ldots,v_i)$.
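One hedged numerical way to check the span statement (again reusing the gram_schmidt sketch): $\operatorname{Span}(w_1,\ldots,w_j) = \operatorname{Span}(v_1,\ldots,v_j)$ forces the rank of all $2j$ vectors stacked together to stay at $j$.

```python
import numpy as np

rng = np.random.default_rng(2)
vs = list(rng.standard_normal((4, 4)))
ws = gram_schmidt(vs)

for j in range(1, 5):
    # dim(Span(v_1..v_j) + Span(w_1..w_j)) = j exactly when the spans agree.
    combined = np.vstack([vs[:j], ws[:j]])
    assert np.linalg.matrix_rank(combined) == j
```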

We are left with checking that each $u_i \neq 0$, so that Gram-Schmidt is well-defined.

Lemma. Each $u_i \neq 0$ for $1 \leq i \leq d$.

Proof.

We work by induction again. For the case $i=1$, we have $u_1 = v_1$, and $v_1 \neq 0$ since it is a member of a basis.

Assume that $u_j \neq 0$ for $j < i$. If $u_i = 0$, then we have \(v_i = \sum_{s=1}^{i-1} \langle v_i,w_s \rangle w_s\) In other words, $v_i$ is in the span of $w_1,\ldots,w_{i-1}$. But we just saw that \(\operatorname{Span}(w_1,\ldots,w_{i-1}) = \operatorname{Span}(v_1,\ldots,v_{i-1})\) This implies that \(v_i \in \operatorname{Span}(v_1,\ldots,v_{i-1})\) which contradicts the linear independence of $v_1,\ldots,v_i$. Thus, $u_i \neq 0$.

Example. Let’s go back to our example of a basis which is not orthonormal \(v_1,v_2,v_3 := \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}\) and let’s apply Gram-Schmidt. We have $w_1=v_1$ since the length of $v_1$ is already $1$. Next, we have \(u_2 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} - \left\langle \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \right\rangle \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}\) Since $||u_2||=1$, we have $w_2 = u_2$. Finally, \(u_3 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} - \left\langle \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \right\rangle \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} - \left\langle \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \right\rangle \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}\) and $w_3 = u_3$.
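Running the hypothetical gram_schmidt sketch from above on this basis reproduces the computation:

```python
import numpy as np

basis = [np.array([1.0, 0.0, 0.0]),
         np.array([1.0, 1.0, 0.0]),
         np.array([1.0, 1.0, 1.0])]

for w in gram_schmidt(basis):
    print(w)  # prints e_1, e_2, e_3: the standard basis
```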

So in this case we recover the standard basis. But that is not always the case. Suppose we reverse the ordering of our basis and run Gram-Schmidt again. Then we get a different orthonormal basis \(\frac{1}{\sqrt{3}} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \frac{\sqrt{3}}{\sqrt{2}} \begin{pmatrix} 1/3 \\ 1/3 \\ -2/3 \end{pmatrix}, \sqrt{2} \begin{pmatrix} 1/2 \\ -1/2 \\ 0 \end{pmatrix}\)
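Continuing the snippet above, reversing the input order gives this second basis (up to floating point):

```python
for w in gram_schmidt(basis[::-1]):
    print(w)
# [0.5774, 0.5774, 0.5774]   = (1, 1, 1) / sqrt(3)
# [0.4082, 0.4082, -0.8165]  = (1, 1, -2) / sqrt(6)
# [0.7071, -0.7071, 0.0]     = (1, -1, 0) / sqrt(2)
```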

In Sage, you can perform Gram-Schmidt on the rows of a matrix $A$ by calling A.gram_schmidt(). But the output may surprise you a little; we will talk more about it next.
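For instance (this snippet runs in Sage, which is built on Python; over an exact field like $\mathbb{Q}$, the call returns a pair of matrices rather than a single orthonormal basis):

```python
A = matrix(QQ, [[1, 1, 1],
                [1, 1, 0],
                [1, 0, 0]])
G, mu = A.gram_schmidt()
# The rows of G are orthogonal and A == mu * G, but the rows of G
# are not rescaled to unit length: here they are
# (1, 1, 1), (1/3, 1/3, -2/3), and (1/2, -1/2, 0).
```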