-
Cosine Similarity
Cosine similarity is a metric used to measure how similar two vectors are by computing the cosine of the angle between them. It is widely used in machine learning, especially in text similarity, recommendation systems, and clustering.
Intuition
- If two vectors point in exactly the same direction, their cosine similarity is 1.
- If they are orthogonal (at right angles, sharing no common direction), the similarity is 0.
- If they point in opposite directions, the similarity is -1.
It ignores magnitude, focusing on orientation, which makes it great for comparing text embeddings where length may vary but direction (semantic meaning) matters.
Formula
For two vectors A and B:
$$\text{cosine\_similarity}(A, B) = \frac{A \cdot B}{|A|\,|B|}$$
Where:
- $A \cdot B$ = dot product of vectors A and B
- $|A|$ = Euclidean norm (length) of A
- $|B|$ = Euclidean norm of B
Example (Python)
```python
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

A = np.array([[1, 2, 3]])
B = np.array([[4, 5, 6]])

similarity = cosine_similarity(A, B)
print(similarity)  # Output: [[0.9746...]]
```
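The same number can be computed directly from the formula; a minimal sketch with plain NumPy, using the same vectors:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Dot product divided by the product of the Euclidean norms
cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_sim)  # ~0.9746, matching cosine_similarity above
```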
Use Cases
- NLP: comparing sentence or word embeddings
- Recommendation: finding similar users/items
- Clustering: grouping similar vectors
- Document similarity: e.g., search engines
-
Dependent Variable
A dependent variable is the variable being measured or predicted in an experiment or model. Its value depends on changes in one or more independent variables. In machine learning, it is often called the target or output variable, as it is the value the model aims to predict.
-
Embedding
An embedding is a learned representation of data in a lower-dimensional space. It transforms high-dimensional, discrete, or symbolic data (like words, users, or items) into dense, continuous vectors that preserve semantic or structural relationships.
Why use embeddings?
- Reduce dimensionality
- Enable similarity comparison
- Improve learning by preserving structure
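As a minimal sketch (assuming PyTorch; the vocabulary size and embedding dimension below are arbitrary illustration values), an embedding layer maps integer IDs to dense, trainable vectors:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: a vocabulary of 10,000 symbols, each mapped to a 64-dim vector
embedding = nn.Embedding(num_embeddings=10_000, embedding_dim=64)

token_ids = torch.tensor([3, 42, 917])   # three arbitrary symbol IDs
vectors = embedding(token_ids)           # dense, continuous representations
print(vectors.shape)                     # torch.Size([3, 64])
```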
See: Word Embedding, Matrix Embedding
-
Gradient Descent
Gradient Descent is an optimization algorithm used to minimize a function by iteratively moving in the direction of steepest descent…
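A minimal sketch of the idea on the toy function f(x) = x², where each update steps against the derivative (the starting point and learning rate are arbitrary):

```python
# Minimize f(x) = x**2 with plain gradient descent
def grad(x):
    return 2 * x            # derivative of x**2

x = 5.0                     # arbitrary starting point
learning_rate = 0.1         # arbitrary step size

for step in range(50):
    x -= learning_rate * grad(x)   # move against the gradient

print(x)  # close to 0, the minimizer of x**2
```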
-
Independent Variable
An independent variable is a variable that is manipulated or used as input to predict the value of the dependent variable. In machine learning, independent variables are also called features or predictors, and they provide the information used by the model to make predictions.
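A minimal sketch (assuming scikit-learn and made-up toy data) showing the two roles: the columns of X are the independent variables, and y is the dependent variable the model learns to predict:

```python
from sklearn.linear_model import LinearRegression
import numpy as np

# Toy data: X = independent variables (features), y = dependent variable (target)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 3.9, 6.2, 7.8])

model = LinearRegression().fit(X, y)
print(model.predict([[5.0]]))  # predicted value of the dependent variable for a new input
```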
-
Matrix Embedding
A matrix embedding refers to stacking multiple embeddings into matrix form. This is common when dealing with sequences such as:
- Sentences (word embeddings stacked into a 2D matrix)
- Paragraphs (sentence embeddings stacked)
- Users/items in recommender systems
Shape example:
If you have a sentence of 10 words and each word embedding is 300-dimensional, the sentence embedding matrix is:
shape = (10, 300)
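A minimal sketch (assuming NumPy, with random vectors standing in for learned word embeddings) of stacking per-word vectors into such a matrix:

```python
import numpy as np

num_words, embedding_dim = 10, 300

# Stand-in for learned embeddings: one 300-dim vector per word
word_vectors = [np.random.rand(embedding_dim) for _ in range(num_words)]

# Stack the word vectors into a single sentence embedding matrix
sentence_matrix = np.stack(word_vectors)
print(sentence_matrix.shape)  # (10, 300)
```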
-
Tensor
In machine learning (ML), a tensor is a generalization of scalars, vectors, and matrices to higher dimensions and is a core data structure used to represent and process data.
Formal Definition:
A tensor is a multidimensional array of numerical values. Its rank (or order) denotes the number of dimensions:
- 0D tensor: Scalar (e.g., 5)
- 1D tensor: Vector (e.g., [1, 2, 3])
- 2D tensor: Matrix (e.g., [[1, 2], [3, 4]])
- 3D+ tensor: Higher-dimensional arrays (e.g., a stack of matrices)
Why Tensors Matter in ML:
- Input/output representation: Data like images (3D: height × width × channels), text sequences (2D: batch × sequence length), and time series are commonly represented as tensors.
- Efficient computation: Libraries like PyTorch and TensorFlow use tensor operations heavily, leveraging GPUs/TPUs for fast computation.
- Backpropagation: Tensors support automatic differentiation, essential for training neural networks.
Example in Code (PyTorch):
```python
import torch

# 2D tensor (matrix)
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print(x.shape)  # torch.Size([2, 2])
```
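Higher-rank tensors follow the same pattern; for instance, a 3D tensor can hold a stack of matrices (the sizes below are arbitrary):

```python
import torch

# 3D tensor: e.g., a batch of 2 matrices, each 3 x 4
batch = torch.zeros(2, 3, 4)
print(batch.shape)   # torch.Size([2, 3, 4])
print(batch.ndim)    # 3 (the tensor's rank/order)
```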
In summary, a tensor is the fundamental building block for data in machine learning frameworks, offering a consistent and optimized structure for mathematical operations.
-
Token
In Natural Language Processing (NLP), a token is a basic unit of text used for processing and analysis. It typically represents a word, subword, character, or symbol, depending on the tokenization strategy.
Definition:
A token is a meaningful element extracted from raw text during tokenization, the process of breaking text into smaller pieces.
Common Types of Tokens:
| Token Type | Example for "I'm learning NLP!" |
| --- | --- |
| Word token | ["I", "'m", "learning", "NLP", "!"] |
| Subword (e.g., BERT) | ["I", "'", "m", "learn", "##ing", "NLP", "!"] |
| Character | ["I", "'", "m", " ", "l", "e", "a", "r", "n", "i", "n", "g", " ", "N", "L", "P", "!"] |
Why Tokens Matter:
- Input to models: NLP models operate on sequences of tokens, not raw text.
- Efficiency: Tokenizing helps standardize and normalize text, aiding in tasks like classification, translation, and summarization.
- Vocabulary mapping: Tokens are converted to numerical IDs using a vocabulary (lookup table), enabling neural models to process them.
Tokenization Example (Python + NLTK):
```python
from nltk.tokenize import word_tokenize
# Requires the Punkt tokenizer data: nltk.download('punkt')

text = "I'm learning NLP!"
tokens = word_tokenize(text)
print(tokens)  # Output: ['I', "'m", 'learning', 'NLP', '!']
```
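A minimal sketch of the vocabulary-mapping step described above, using a toy vocabulary built from those same tokens (a real tokenizer ships its own fixed vocabulary):

```python
tokens = ["I", "'m", "learning", "NLP", "!"]

# Toy vocabulary: map each distinct token to an integer ID
vocab = {token: idx for idx, token in enumerate(sorted(set(tokens)))}

token_ids = [vocab[token] for token in tokens]
print(token_ids)  # [2, 1, 4, 3, 0] -- numerical IDs a model can consume
```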
Summary:
A token in NLP is a unit of text—often a word or subword—that forms the basis for downstream processing and modeling. Tokenization strategy varies depending on the language and model architecture.
-
Word Embedding
A word embedding is a type of embedding specifically used in Natural Language Processing (NLP). It maps words (or subwords) to real-valued vectors in a continuous vector space, where semantically similar words are close together.
Example word embeddings:
- Word2Vec
- GloVe
- FastText
- BERT (contextual embeddings)
Properties:
- Vectors are typically 50 to 1,024 dimensions
- Similar meanings → similar vectors (cosine similarity)
Example:
word_vectors["king"] - word_vectors["man"] + word_vectors["woman"] ≈ word_vectors["queen"]
See: Cosine Similarity