Zero-Shot

In machine learning (especially few-shot learning) and prompt engineering for large language models (LLMs), the terms zero-shot, 1-shot (one-shot), and N-shot (few-shot) describe how many demonstration examples are provided to a model in the prompt to guide its output. These paradigms are critical for adapting pre-trained models to new tasks without extensive fine-tuning.

1. Zero-Shot Learning

Definition: The model is asked to perform a task without any explicit examples of the desired input-output pairs. It relies solely on its pre-trained knowledge and a natural language description of the task.

Key Characteristics

  • No demonstration examples are provided in the prompt.
  • The model uses its understanding of language, logic, and world knowledge learned during pre-training.
  • Commonly used for tasks that are intuitive or align with the model’s training data.

Example (Text Classification)

Prompt:

Classify the following sentence into “positive” or “negative”: “The new smartphone has a terrible battery life.”

Model Output:

negative
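
To make the mechanics concrete, here is a minimal sketch of a zero-shot call in Python. The OpenAI client and the gpt-4o-mini model name are illustrative assumptions; any chat-completion API works the same way, since the prompt carries only the task description and the input.

```python
# Minimal zero-shot sketch (assumes the OpenAI Python SDK and an API key
# in the OPENAI_API_KEY environment variable; the model name is illustrative).
from openai import OpenAI

client = OpenAI()

prompt = (
    'Classify the following sentence into "positive" or "negative": '
    '"The new smartphone has a terrible battery life."'
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: swap in any chat model
    messages=[{"role": "user", "content": prompt}],
    temperature=0,        # deterministic output suits classification
)
print(response.choices[0].message.content)  # expected: "negative"
```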

2. 1-Shot (One-Shot) Learning

Definition: The model is given exactly one example of the target task’s input-output pair to learn the pattern before being asked to perform the task on new data.

Key Characteristics

  • One demonstration example is included in the prompt to clarify the task’s requirements.
  • Useful when the task is specific or the model might misinterpret the zero-shot instruction.

Example (Text Classification)

Prompt:

Example:
Sentence: “I love this movie!”
Label: positive

Now classify the following sentence into “positive” or “negative”: “The new smartphone has a terrible battery life.”

Model Output:

negative
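
A minimal sketch of how the single demonstration is spliced in ahead of the query. The one_shot_prompt helper is hypothetical, not a library function; the resulting string would be sent to the model exactly as in the zero-shot call above.

```python
# Hypothetical helper (not a library API): place one demonstration pair
# ahead of the query so the model can infer the task format.
def one_shot_prompt(example_input: str, example_label: str, query: str) -> str:
    return (
        "Example:\n"
        f"Sentence: {example_input}\n"
        f"Label: {example_label}\n\n"
        'Now classify the following sentence into "positive" or "negative": '
        f"{query}"
    )

prompt = one_shot_prompt(
    '"I love this movie!"',
    "positive",
    '"The new smartphone has a terrible battery life."',
)
print(prompt)  # send this string to the model as in the zero-shot call
```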

3. N-Shot Learning (Few-Shot Learning)

Definition: A generalization where the model is given N examples (typically $N \geq 2$ and small, e.g., 2–5) of the task to learn the pattern. “Few-shot” is an umbrella term that includes 1-shot as a special case.

Key Characteristics

  • More examples help the model grasp complex patterns (e.g., nuanced classification, named entity recognition).
  • The number of examples (N) is small compared with full fine-tuning, which uses thousands to millions of samples.

Example (Named Entity Recognition, 2-shot)

Prompt:

Example 1:
Sentence: “Apple was founded by Steve Jobs in Cupertino.”
Entities: Apple (Company), Steve Jobs (Person), Cupertino (City)

Example 2:
Sentence: “Tesla’s factory is located in Austin.”
Entities: Tesla (Company), Austin (City)

Now extract entities from the sentence: “Microsoft was established by Bill Gates in Seattle.”

Model Output:

Microsoft (Company), Bill Gates (Person), Seattle (City)
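
The same pattern scales to N demonstrations by looping over example pairs. The few_shot_ner_prompt helper below is a hypothetical sketch, not a library API:

```python
# Hypothetical helper: assemble an N-shot NER prompt from (sentence, entities) pairs.
def few_shot_ner_prompt(examples: list[tuple[str, str]], query: str) -> str:
    parts = []
    for i, (sentence, entities) in enumerate(examples, start=1):
        parts.append(f"Example {i}:\nSentence: {sentence}\nEntities: {entities}")
    parts.append(f"Now extract entities from the sentence: {query}")
    return "\n\n".join(parts)

demos = [
    ('"Apple was founded by Steve Jobs in Cupertino."',
     "Apple (Company), Steve Jobs (Person), Cupertino (City)"),
    ('"Tesla\'s factory is located in Austin."',
     "Tesla (Company), Austin (City)"),
]
print(few_shot_ner_prompt(demos, '"Microsoft was established by Bill Gates in Seattle."'))
```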

4. Comparison Table

| Paradigm | Number of Examples | Core Idea | Use Case |
|-----------|--------------------|--------------------------------------------|-------------------------------------------|
| Zero-Shot | 0 | Rely on pre-trained knowledge + task description | Simple, intuitive tasks (e.g., sentiment, summarization) |
| 1-Shot | 1 | One example to clarify task rules | Tasks with ambiguous instructions |
| N-Shot (Few-Shot) | N (typically 2–5) | Multiple examples to learn complex patterns | Specialized tasks (e.g., entity extraction, code generation) |
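
As the table suggests, the three paradigms differ only in how many demonstrations are prepended to the query, so a single hypothetical builder covers every row: zero example pairs yields a zero-shot prompt, one yields one-shot, and so on.

```python
# Hypothetical unified builder (illustrative, not a library API):
# the paradigm is determined entirely by len(examples).
def build_prompt(task_description: str,
                 examples: list[tuple[str, str]],
                 query: str) -> str:
    parts = [task_description]
    for inp, out in examples:  # 0 pairs = zero-shot, 1 = one-shot, N = few-shot
        parts.append(f"Input: {inp}\nOutput: {out}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

# Zero-shot: build_prompt(task, [], query)
# One-shot:  build_prompt(task, [(x1, y1)], query)
# Few-shot:  build_prompt(task, [(x1, y1), (x2, y2)], query)
```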

Key Takeaway

These paradigms enable pre-trained models to adapt to new tasks efficiently, without costly fine-tuning on large datasets. The choice among zero-shot, one-shot, and N-shot prompting depends on task complexity and the model’s familiarity with the target domain.