Cosine similarity is a metric used to measure how similar two vectors are by computing the cosine of the angle between them. It is widely used in machine learning, especially in text similarity, recommendation systems, and clustering.
Because it depends only on orientation and not on magnitude, it is well suited to comparing text embeddings, where vector length may vary but direction (semantic meaning) is what matters.
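To make the point about magnitude concrete, here is a minimal sketch in plain Python. The helper name `cosine` and the three example vectors are assumptions made up for illustration; the helper uses the standard definition (dot product divided by the product of the vector lengths).

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical vectors chosen only for illustration.
short_doc = [1.0, 2.0, 3.0]
long_doc = [10.0, 20.0, 30.0]   # same direction, 10x the magnitude
other_doc = [3.0, -1.0, 0.5]    # a different direction

same_direction = cosine(short_doc, long_doc)
different_direction = cosine(short_doc, other_doc)
print(same_direction, different_direction)
```

Scaling a vector leaves its score essentially unchanged (about 1.0 here, up to floating-point rounding), while the vector pointing in a different direction scores much lower.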
For two vectors A and B:
\( \text{cosine\_similarity}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|} \)
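Where \(A \cdot B\) is the dot product of A and B, and \(\|A\|\) and \(\|B\|\) are their Euclidean norms (magnitudes).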
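from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

A = np.array([[1, 2, 3]])  # 2-D input: one row per vector, as scikit-learn expects
B = np.array([[4, 5, 6]])

similarity = cosine_similarity(A, B)  # pairwise similarities between rows of A and rows of B
print(similarity)  # Output: [[0.97463185]]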
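For comparison, the formula can also be written out directly with NumPy. This is a minimal sketch rather than a replacement for the scikit-learn function: the name `cosine_similarity_manual` is hypothetical, it takes 1-D vectors, and raising an error for a zero vector is an assumption made here because the ratio is undefined in that case.

```python
import numpy as np

def cosine_similarity_manual(a, b):
    """Dot product divided by the product of Euclidean norms (1-D inputs)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # Assumption: treat a zero vector as an error, since the angle is undefined.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(a, b) / denom)

A = np.array([1, 2, 3])
B = np.array([4, 5, 6])

# Dot product: 1*4 + 2*5 + 3*6 = 32; norms: sqrt(14) and sqrt(77)
manual = cosine_similarity_manual(A, B)
print(round(manual, 4))  # 0.9746
```

The result is 32 / (sqrt(14) * sqrt(77)), which agrees with the value in the example above.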