Cosine Similarity: A simple technique in the era of deep learning

Intro to Singular Value Decomposition

Singular Value Decomposition (SVD)

SVD is a foundational technique in Machine Learning. This series approaches the SVD from a few viewpoints to build more useful intuition about what is actually happening behind the equations.

The equation behind SVD is a very elegant mathematical statement, but also a very “thick” one.

The SVD of a matrix is defined as

\[ X = U S V^T \\ (n \times m) = (n \times k) (k \times k) (k \times m) \\ n: \text{ number of rows/data points} \\ m: \text{ number of columns/features} \\ k: \text{ number of singular vectors} \]

In data problems, \(X\) is typically the dataset, and the SVD helps create a more “condensed” representation. Although simple to state, it is not so simple to understand what this statement actually gives us.
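To make the shapes concrete, here is a minimal sketch using NumPy’s `np.linalg.svd` on a small random matrix standing in for \(X\); the variable names mirror the equation above, and the data is made up purely for illustration.

```python
# A minimal sketch of the SVD factorization X = U S V^T using NumPy,
# with a small random matrix standing in for a dataset X.
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 4                        # n rows/data points, m columns/features
X = rng.normal(size=(n, m))

# full_matrices=False gives the "thin" SVD: U is (n, k), s holds the k
# singular values, Vt is (k, m), with k = min(n, m).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
S = np.diag(s)

print(U.shape, S.shape, Vt.shape)   # (6, 4) (4, 4) (4, 4)
print(np.allclose(X, U @ S @ Vt))   # True: the factorization reproduces X
```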

As a data scientist, I much prefer the following way to write the SVD:

\[ X V_{\text{weighted}} = U \\ \text{where } V_{\text{weighted}} = V W \\ \text{and } W = S^{-1} \text{ is a diagonal matrix that weights the columns of } V \text{ by the inverse singular values} \]

SVD Image

This way of writing the SVD gives us the interpretation that SVD extracts patterns from the data, represented by the columns of the matrix V. These are the red and green columns in the picture. Each data point is then compared with the patterns via the dot product, which is a way to measure overlap. In the end, the matrix U gives us a new representation of the data based on the patterns discovered in the original dataset.
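The same pattern-matching view can be checked numerically. The following sketch (again with a made-up toy matrix) builds \(V_{\text{weighted}} = V S^{-1}\) and verifies that projecting every data point onto the weighted patterns recovers \(U\).

```python
# A small sketch of the "pattern" view of the SVD: projecting X onto the
# columns of V and rescaling by the inverse singular values recovers U, so
# each row of U is a data point expressed as overlaps with the patterns in V.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))                  # toy dataset: 6 points, 4 features

U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T                                     # columns of V are the patterns
W = np.diag(1.0 / s)                         # W = S^{-1}, the weighting matrix
V_weighted = V @ W

# Every row of X is compared (dot product) with every weighted pattern.
print(np.allclose(X @ V_weighted, U))        # True: X V_weighted = U
```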

SVD is used in Search, Recommendation Systems, and Natural Language Processing. In those fields, the application of SVD is known under the names Latent Semantic Indexing and Latent Semantic Analysis.
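As one illustration of that use (not code from this series), here is a hedged Latent Semantic Analysis sketch with scikit-learn: a TF-IDF document-term matrix followed by a truncated SVD. The documents are made up, and the number of components is an arbitrary choice for the example.

```python
# A minimal LSA sketch: TF-IDF features followed by a truncated SVD,
# so each document becomes a point in a small "topic" space.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets fell sharply today",
    "investors worry about the markets",
]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)            # sparse (documents x terms) matrix

# TruncatedSVD works directly on the sparse matrix; the result corresponds
# to the rows of U S in the SVD notation used above.
lsa = TruncatedSVD(n_components=2, random_state=0)
doc_topics = lsa.fit_transform(X)
print(doc_topics.shape)                  # (4, 2): one 2-d vector per document
```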