Introduction

SVD (singular value decomposition) is a mathematical method for processing matrices. SVD can help extract the key traits of the numbers in a matrix. Naturally, we can use it to process images, because images are represented as matrices in computer science. This article aims to figure out why and how the SVD extracts the traits of a matrix, and to give several examples of its applications in computer science.

SVD

Start from one way to factorize a matrix

Here is a matrix with $\mathrm{rank}=1$:

$$A=\begin{pmatrix} 1&2&3 \\ 1&2&3 \end{pmatrix}$$

It can be written as a column times a row:

A=(123123)=(11)(123)A=\begin{pmatrix} 1&2&3 \\ 1&2&3 \end{pmatrix}=\begin{pmatrix} 1 \\ 1 \end{pmatrix}\begin{pmatrix} 1&2&3 \end{pmatrix}

We can say the matrix is decomposed as a column times a row. More generally, any matrix can be decomposed as a sum of $column \times row$ terms with different coefficients. A coefficient decides the weight of its specific $column \times row$ term in the whole sum. Here is an example:

$$A=\begin{pmatrix} a&0&0 \\ 0&b&0 \end{pmatrix}=a\begin{pmatrix} 1 \\ 0 \end{pmatrix}\begin{pmatrix} 1&0&0 \end{pmatrix}+b\begin{pmatrix} 0 \\ 1 \end{pmatrix}\begin{pmatrix} 0&1&0 \end{pmatrix}$$

It can be formalised as

$$A=\sigma_1 u_1 v_1^{T}+\sigma_2 u_2 v_2^{T}$$

Here $a$ ($\sigma_1$) is the weight of the first $column \times row$ term on the left and $b$ ($\sigma_2$) is the weight of the only other one. If $a$ is much larger than $b$, we can neglect the second product and approximate $A$ with only one $column \times row$ term and its coefficient. This is useful for storing the matrix $A$ in less space.
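As a quick illustration, here is a minimal sketch (using NumPy, which is not part of the original article) that keeps only the heaviest $column \times row$ term of such a matrix. Storing $\sigma_1$, $u_1$ and $v_1$ takes $m+n+1$ numbers instead of $m\times n$.

```python
import numpy as np

# A 2x3 matrix in which one "column times row" term dominates (a >> b).
a, b = 100.0, 1.0
A = np.array([[a, 0.0, 0.0],
              [0.0, b, 0.0]])

U, s, Vt = np.linalg.svd(A)            # s holds the sigma_i in descending order
A1 = s[0] * np.outer(U[:, 0], Vt[0])   # keep only the heaviest column x row term

print(s)                               # [100.   1.]
print(np.abs(A - A1).max())            # 1.0: only the small b term is lost
```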

But where is the trait? Let's find it now:
$A$ is $m\times n$, and suppose it has been decomposed as follows:

$$A=\sigma_1 u_1 v_1^{T}+\sigma_2 u_2 v_2^{T}+\cdots+\sigma_n u_n v_n^{T} \tag{1}$$

where $\sigma_1>\sigma_2>\cdots>\sigma_n$.

We can read equation (1) in this way:
$\sigma_i$ is the weight, $u_i$ is the trait vector, and $v_i^{T}$ is the combination vector. Together they give an $m \times n$ matrix $A_i = u_i v_i^{T}$, whose columns are multiples of $u_i$ with their coefficients taken from $v_i^{T}$.
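The following sketch (again NumPy, assumed here only for illustration) checks that the sum of all $\sigma_i u_i v_i^T$ terms rebuilds $A$ exactly, and that keeping only the leading terms already gives a good approximation:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 6))            # an arbitrary m x n matrix

U, s, Vt = np.linalg.svd(A)            # s is sorted in descending order
terms = [s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s))]

# The full sum of column x row terms reproduces A (up to rounding).
print(np.allclose(A, sum(terms)))      # True

# Keeping only the first k terms gives a rank-k approximation whose error shrinks with k.
for k in (1, 2, len(s)):
    print(k, np.linalg.norm(A - sum(terms[:k])))
```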

SVD form

In matrix notation, and more formally, the former example could be written

$$A=\begin{pmatrix} a&0&0 \\ 0&b&0 \end{pmatrix}=\begin{pmatrix} 1&0 \\ 0&1 \end{pmatrix}\begin{pmatrix} a&0&0 \\ 0&b&0 \end{pmatrix}\begin{pmatrix} 1&0&0 \\ 0&1&0 \\ 0&0&1 \end{pmatrix}$$

For a general $A_{m\times n}$, the SVD writes $A$ as

$$A=U\Sigma V^T \tag{2}$$

$U_{m\times m}$ is called the left singular matrix, with the left singular vectors as its columns, and $V^{T}_{n\times n}$ is the right one. $\Sigma_{m\times n}$ has the singular values $\sigma_i$ arranged in descending order in positions $\Sigma_{ii}$ and zeros everywhere else.
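To make the shapes concrete, here is a small sketch (NumPy assumed; `np.linalg.svd` returns the singular values as a vector, so the $m\times n$ matrix $\Sigma$ has to be rebuilt explicitly):

```python
import numpy as np

m, n = 2, 3
A = np.array([[3.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)       # (2, 2) (2,) (3, 3)

# Rebuild the m x n Sigma with the singular values on its diagonal.
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(A, U @ Sigma @ Vt))   # True: A = U Sigma V^T
```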

$U$ and $V$ are orthogonal matrices. Therefore $V^{-1}=V^{T}$, and multiplying both sides of equation (2) by $V$ gives another form of it, which is in fact where equation (2) comes from:

$$AV=U\Sigma \tag{3}$$

The idea is the same as that of diagonalising a real symmetric square matrix with an orthogonal matrix:

$$A=Q\Lambda Q^T \Leftrightarrow AQ=Q\Lambda$$

where $\Lambda$ is a diagonal matrix with the eigenvalues of $A$ and $Q$ is the orthogonal matrix whose columns are the corresponding orthonormal eigenvectors.
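For comparison, a quick numerical sketch of that diagonalisation (NumPy assumed; `np.linalg.eigh` is the eigendecomposition routine for symmetric matrices):

```python
import numpy as np

# A real symmetric matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, Q = np.linalg.eigh(A)     # columns of Q are orthonormal eigenvectors
Lam = np.diag(eigvals)

print(np.allclose(A, Q @ Lam @ Q.T))     # True: A = Q Lambda Q^T
print(np.allclose(Q.T @ Q, np.eye(2)))   # True: Q is orthogonal
```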

Back to equation (3), rewrite it with specific vectors:

$$A\begin{pmatrix} v_1&v_2&\cdots &v_n \end{pmatrix}=\begin{pmatrix} u_1&u_2&\cdots &u_m \end{pmatrix}\begin{pmatrix} \sigma_1&0&\cdots &0 \\ 0 &\sigma_2 &\cdots&0\\ \vdots &\vdots&\ddots&\vdots \end{pmatrix}_{m\times n}$$

Suppose there are $r$ nonzero singular values, $\sigma_1$ to $\sigma_r$. Then we have

$$Av_i=\sigma_i u_i,\quad i=1,2,\cdots,r \tag{4}$$

What we need to do is find these singular values $\sigma_i$ and the corresponding singular vectors $v_i$ and $u_i$, where $u_i$ can be obtained from $v_i$ and $\sigma_i$ with equation (4).
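Equation (4) is easy to check numerically (a NumPy sketch, assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 5))

U, s, Vt = np.linalg.svd(A)
V = Vt.T

# A v_i = sigma_i u_i for each singular value, as in equation (4).
for i in range(len(s)):
    print(np.allclose(A @ V[:, i], s[i] * U[:, i]))   # True for every i
```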

Transpose both sides of equation (2):

$$A^T=V\Sigma^T U^T$$

Then multiply it by $A$ on the left:

$$AA^T=(U\Sigma V^T)(V\Sigma^T U^T)=U(\Sigma\Sigma^T)U^T \tag{5}$$

$\Sigma\Sigma^T$ is an $m\times m$ matrix with $\sigma_1^2,\sigma_2^2,\cdots,\sigma_r^2$ (followed by zeros) on its diagonal positions $(\Sigma\Sigma^T)_{ii}$ and zeros everywhere else. Equation (5) is exactly the diagonalisation of the symmetric matrix $AA^T$, so $\sigma_i^2\ (i=1,2,\ldots,r)$ are eigenvalues of $AA^T$, with the $u_i$ being their corresponding eigenvectors.
Similarly, $\sigma_i^2\ (i=1,2,\ldots,r)$ are eigenvalues of $A^TA$, with the $v_i$ being their corresponding eigenvectors.
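The same fact can be checked numerically (NumPy sketch, assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 4))

s = np.linalg.svd(A, compute_uv=False)         # singular values of A, descending
eig_AAt = np.linalg.eigvalsh(A @ A.T)[::-1]    # eigenvalues of A A^T, descending
eig_AtA = np.linalg.eigvalsh(A.T @ A)[::-1]    # eigenvalues of A^T A, descending

print(np.allclose(eig_AAt, s**2))              # True
print(np.allclose(eig_AtA[:len(s)], s**2))     # True; the remaining ones are (near) zero
```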

  • Conclusion:
    Now let's summarize the whole process of SVD.
    Our aim is to write $A$ as $U\Sigma V^T$. Compute the eigenvalues and orthonormal eigenvectors of $A^TA$: the eigenvectors are the $v_i$ (the columns of $V$), and the square roots of the eigenvalues, placed on the diagonal of $\Sigma$ in descending order, are the singular values $\sigma_i$. Each $u_i$ then follows from equation (4) as $u_i=Av_i/\sigma_i$, and these vectors form the columns of $U$. A sketch of this procedure follows below.
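Here is a minimal sketch of that procedure (NumPy assumed, and restricted for simplicity to the reduced form, keeping only the $r=\min(m,n)$ leading vectors and assuming all of the kept $\sigma_i$ are nonzero):

```python
import numpy as np

def svd_via_eig(A):
    """Sketch: reduced SVD built from the eigendecomposition of A^T A.

    Assumes the kept singular values are all nonzero (A has rank min(m, n)).
    """
    m, n = A.shape
    # Step 1: eigenvalues/eigenvectors of A^T A give sigma_i^2 and v_i.
    eigvals, V = np.linalg.eigh(A.T @ A)
    order = np.argsort(eigvals)[::-1]            # descending order
    eigvals, V = eigvals[order], V[:, order]

    r = min(m, n)
    sigmas = np.sqrt(np.clip(eigvals[:r], 0.0, None))
    Vr = V[:, :r]

    # Step 2: equation (4) gives u_i = A v_i / sigma_i.
    Ur = (A @ Vr) / sigmas

    return Ur, np.diag(sigmas), Vr

A = np.array([[3.0, 1.0, 0.0],
              [1.0, 3.0, 0.0]])
Ur, Sigma, Vr = svd_via_eig(A)
print(np.allclose(A, Ur @ Sigma @ Vr.T))             # True
print(np.allclose(Ur.T @ Ur, np.eye(Ur.shape[1])))   # the u_i are orthonormal
```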

SVD application——PCA

PCA (principal component analysis) is a dimensionality reduction technique used in statistics and machine learning. It transforms a dataset with many correlated variables into a smaller set of uncorrelated variables called principal components, while preserving as much variance as possible.
SVD is an effective way to implement PCA.
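As a sketch of how this is usually done (NumPy assumed; the data layout follows the example below, with samples as columns and variables as rows):

```python
import numpy as np

def pca_via_svd(A, k):
    """Sketch of PCA for an m x n data matrix (m variables, n samples) via SVD."""
    A_centered = A - A.mean(axis=1, keepdims=True)    # center each variable (row)
    U, s, Vt = np.linalg.svd(A_centered, full_matrices=False)
    components = U[:, :k]                    # the k principal directions
    scores = components.T @ A_centered       # each sample expressed in those directions
    explained = s[:k] ** 2 / np.sum(s ** 2)  # fraction of variance kept
    return components, scores, explained

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 200))                # 200 samples of 5 measured variables
components, scores, explained = pca_via_svd(A, k=2)
print(components.shape, scores.shape)        # (5, 2) (2, 200)
print(explained)                             # variance fractions of the two components
```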

A visualized example

Think of $A_{m\times n}$ in this way: a set of $n$ samples with $m$ measured variables.
Here is an example from reference [1] listed below.

Suppose we have $n$ different points in a 2-D plane and we want to find out the principal direction of these points, namely a line that will be as close as possible to the points.

First, we center each of the measurements: $x$ and $y$ in this example. Subtract $\bar{x}$ and $\bar{y}$ from each $x_i$ and $y_i$. We get a matrix $A_{2\times n}$ with each of its rows having an average of $0$.

We can draw these centered points in a coordinate system. The center of these points is now the origin, which helps a lot when we try to find the direction formed by these points.

Figure: data points in $A$ are often close to a line in $R^2$ or a subspace in $R^m$.

Now consider doing SVD on $A$:

$$A=U\Sigma V^T=\begin{pmatrix} u_1&u_2 \end{pmatrix}\Sigma \begin{pmatrix} v_1&v_2&\cdots&v_n \end{pmatrix}^T$$

The leading singular vector $u_1$, the one with the larger singular value, shows the direction in the former scatter graph, for the reasons we talked about above.
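A concrete check (NumPy sketch, assumed): the points are generated along a known direction, so we can compare $u_1$ with it.

```python
import numpy as np

rng = np.random.default_rng(3)
true_dir = np.array([3.0, 1.0]) / np.hypot(3.0, 1.0)    # direction the points lie along

t = rng.normal(size=200)                                # positions along that line
A = np.outer(true_dir, t) + 0.05 * rng.normal(size=(2, 200))   # 2 x n data matrix

A = A - A.mean(axis=1, keepdims=True)                   # center each row
U, s, Vt = np.linalg.svd(A)

u1 = U[:, 0]                    # leading left singular vector
print(s)                        # sigma_1 is much larger than sigma_2
print(u1, true_dir)             # equal up to sign
```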

Least perpendicular squares

This direction is also the direction of least perpendicular squares.

Namely, the sum of squared distances from the points to the line is a minimum.

Proof: consider the right triangle formed by the origin, the line, and the point $a_i$ (the vector with coordinates $(x_i,y_i)$).
Using the Pythagorean theorem:

$$||a_i||^2=|a_i^Tu_1|^2+|a_i^Tu_2|^2$$

The first term on the right is the squared projection of $a_i$ onto the direction of $u_1$ (a unit vector), namely the principal direction, and the second term is the squared distance we want. Sum all the squared distances:

$$d_{sum}=\sum_{i=1}^{n}|a_i^Tu_2|^2=\sum_{i=1}^{n}||a_i||^2-\sum_{i=1}^{n}|a_i^Tu_1|^2$$

The first term on the right is a constant for given points and the second term can be written as

$$\sum_{i=1}^{n}|a_i^Tu_1|^2=\begin{pmatrix} a_1^Tu_1&a_2^Tu_1&\cdots&a_n^Tu_1 \end{pmatrix}\begin{pmatrix} a_1^Tu_1\\ a_2^Tu_1\\ \vdots\\ a_n^Tu_1 \end{pmatrix}=u_1^TAA^Tu_1$$

Minimising $d_{sum}$ means maximising $u_1^TAA^Tu_1$. This quantity reaches its maximum when $u_1$ is the singular vector for the maximum singular value, which agrees with the PCA-by-SVD result above.
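A small numerical confirmation (NumPy sketch, assumed): compared with a grid of other unit directions, $u_1$ gives the smallest sum of squared perpendicular distances.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(2, 100))
A = A - A.mean(axis=1, keepdims=True)        # centered 2 x n data, columns a_i

U, s, Vt = np.linalg.svd(A)
u1 = U[:, 0]

def perp_sum(u):
    """Sum of squared perpendicular distances to the line spanned by unit vector u."""
    proj = A.T @ u                           # a_i^T u for every point
    return np.sum(A * A) - np.sum(proj ** 2)

# Compare u1 against a grid of candidate unit directions.
angles = np.linspace(0.0, np.pi, 181)
candidates = np.stack([np.cos(angles), np.sin(angles)], axis=1)
best_on_grid = min(perp_sum(u) for u in candidates)

print(perp_sum(u1) <= best_on_grid + 1e-9)   # True: u1 minimises the perpendicular sum
```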

The general form

Application in computer science

Face recognition

SVD——from the operator perspective

Polar decomposition

$$A=U\Sigma V^T=(UV^T)(V\Sigma V^T)=QS$$
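Here $Q=UV^T$ is orthogonal and $S=V\Sigma V^T$ is symmetric positive semidefinite (for a square $A$). A quick numerical sketch of this factorisation (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(3, 3))          # polar decomposition of a square matrix

U, s, Vt = np.linalg.svd(A)
V = Vt.T

Q = U @ Vt                           # orthogonal factor
S = V @ np.diag(s) @ Vt              # symmetric positive semidefinite factor

print(np.allclose(A, Q @ S))                    # True: A = QS
print(np.allclose(Q.T @ Q, np.eye(3)))          # Q is orthogonal
print(np.allclose(S, S.T))                      # S is symmetric
print(np.all(np.linalg.eigvalsh(S) >= -1e-12))  # S is positive semidefinite
```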

References

  • [1] Gilbert Strang, Introduction to Linear Algebra, 5th edition, Chapter 7.
  • [2] Eigenface, Wikipedia.
  • [3] Sheldon Axler, Linear Algebra Done Right, 4th edition, Chapter 7.