Cosine Similarity

Cosine similarity is a metric used to measure how similar two vectors are by computing the cosine of the angle between them. It is widely used in machine learning, especially in text similarity, recommendation systems, and clustering.

Intuition

  • If two vectors point in exactly the same direction, their cosine similarity is 1.
  • If they are orthogonal (at right angles, sharing no directional component), the similarity is 0.
  • If they point in opposite directions, the similarity is -1.

It ignores magnitude, focusing on orientation, which makes it great for comparing text embeddings where length may vary but direction (semantic meaning) matters.
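
A quick way to see these three cases is to compute the value directly. The helper below is a minimal sketch using only NumPy; the vectors are chosen purely for illustration:

import numpy as np

def cosine(a, b):
    # cos(angle) = (a · b) / (|a| |b|)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

v = np.array([1.0, 0.0])

print(cosine(v, np.array([2.0, 0.0])))   # same direction  ->  1.0
print(cosine(v, np.array([0.0, 3.0])))   # orthogonal      ->  0.0
print(cosine(v, np.array([-1.0, 0.0])))  # opposite        -> -1.0

Note that scaling either vector (e.g. using [2, 0] instead of [1, 0]) leaves the result unchanged, which is exactly the magnitude-invariance described above.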

Formula

For two vectors A and B:

$ \text{cosine\_similarity}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|} $

Where:

  • $A \cdot B$ = dot product of vectors A and B
  • $\|A\|$ = Euclidean norm (length) of A
  • $\|B\|$ = Euclidean norm (length) of B
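
As a worked example of the formula, here is a short NumPy sketch that computes each piece explicitly (the vectors are illustrative and are reused in the scikit-learn example below):

import numpy as np

A = np.array([1, 2, 3])
B = np.array([4, 5, 6])

dot = np.dot(A, B)          # A · B = 1*4 + 2*5 + 3*6 = 32
norm_A = np.linalg.norm(A)  # |A| = sqrt(1 + 4 + 9)   ≈ 3.7417
norm_B = np.linalg.norm(B)  # |B| = sqrt(16 + 25 + 36) ≈ 8.7750

similarity = dot / (norm_A * norm_B)
print(similarity)           # ≈ 0.9746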

Example (Python)

from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

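# cosine_similarity expects 2-D arrays: each row is one sample vector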
A = np.array([[1, 2, 3]])
B = np.array([[4, 5, 6]])

similarity = cosine_similarity(A, B)
print(similarity)  # Output: [[0.97463185]]
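
cosine_similarity can also take a single 2-D array and return the full pairwise similarity matrix. A small illustrative extension (output values are approximate):

from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

X = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [-1, -2, -3],
])

print(cosine_similarity(X).round(4))
# approximately:
# [[ 1.      0.9746 -1.    ]
#  [ 0.9746  1.     -0.9746]
#  [-1.     -0.9746  1.    ]]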

Use Cases

  • NLP: comparing sentence or word embeddings
  • Recommendation: finding similar users/items
  • Clustering: grouping similar vectors
  • Document similarity: e.g., search engines (see the TF-IDF sketch below)
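
To make the document-similarity case concrete, here is a rough sketch using scikit-learn's TfidfVectorizer; the example texts are invented purely for illustration:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the cat sat on the mat",
    "a cat lay on a mat",
    "stock prices fell sharply today",
]

tfidf = TfidfVectorizer().fit_transform(docs)  # one TF-IDF row per document
print(cosine_similarity(tfidf).round(2))
# The first two documents score noticeably higher with each other
# than either does with the third, which shares no vocabulary with them.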