Cosine similarity measures how similar two vectors are by computing the cosine of the angle between them. It is widely used in machine learning, especially for text similarity, recommendation systems, and clustering.

Intuition

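Cosine similarity ignores magnitude and focuses on orientation: it ranges from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction). This makes it well suited to comparing text embeddings, where vector length may vary but direction (semantic meaning) matters.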
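To make the "direction, not magnitude" idea concrete, here is a minimal dependency-free sketch. The helper `cosine_sim` and the three toy vectors are illustrative assumptions, not part of any library; the helper divides the dot product of two vectors by the product of their lengths.

```python
import math

def cosine_sim(a, b):
    """Cosine of the angle between two equal-length sequences of numbers (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

short_doc = [1.0, 2.0, 3.0]
long_doc = [10.0, 20.0, 30.0]   # same direction, 10x the magnitude
other_doc = [3.0, -1.0, 0.5]    # a different direction

same_direction = cosine_sim(short_doc, long_doc)
different_direction = cosine_sim(short_doc, other_doc)
print(same_direction, different_direction)
```

Scaling a vector leaves the score at 1 (up to floating-point rounding), while the vector pointing elsewhere scores noticeably lower.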

Formula

For two vectors A and B:

\( \text{cosine\_similarity}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|} \)

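Where the numerator is the dot product of A and B, and the denominator is the product of their Euclidean (L2) norms, that is, their lengths.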
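As a quick worked check of the formula, take A = (1, 2, 3) and B = (4, 5, 6), the vectors used in the Python example:

\[
A \cdot B = 1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 = 32, \qquad \|A\| = \sqrt{1^2 + 2^2 + 3^2} = \sqrt{14}, \qquad \|B\| = \sqrt{4^2 + 5^2 + 6^2} = \sqrt{77}
\]

\[
\text{cosine\_similarity}(A, B) = \frac{32}{\sqrt{14}\,\sqrt{77}} = \frac{32}{\sqrt{1078}} \approx 0.9746
\]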

Example (Python)

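from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# cosine_similarity expects 2D arrays of shape (n_samples, n_features),
# so each vector is wrapped as a single-row matrix.
A = np.array([[1, 2, 3]])
B = np.array([[4, 5, 6]])

# The result is a matrix of pairwise similarities, here of shape (1, 1).
similarity = cosine_similarity(A, B)
print(similarity)  # Output: [[0.97463185]]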
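Because the scikit-learn function returns a matrix of pairwise scores, it may help to see roughly the same computation for several rows at once. The sketch below uses only NumPy; `pairwise_cosine` and the `docs` array are hypothetical names introduced for illustration, and the code assumes no row is all zeros (a zero row would cause a division by zero).

```python
import numpy as np

def pairwise_cosine(X, Y):
    """Cosine similarity between every row of X and every row of Y (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Scale each row to unit length; assumes no row has zero norm.
    X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    Y_unit = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # Dot products of unit vectors are the cosines of the angles between them.
    return X_unit @ Y_unit.T

docs = np.array([[1, 2, 3], [4, 5, 6], [-1, -2, -3]])
sims = pairwise_cosine(docs, docs)
print(np.round(sims, 4))
```

Entry (i, j) of `sims` is the similarity between row i and row j, so the diagonal is 1 and the first and third rows, which point in opposite directions, score -1.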

Use Cases