SFTTrainer is a high-level training API designed to simplify and accelerate Supervised Fine-Tuning (SFT) of large language models in the Hugging Face ecosystem. It wraps the core transformers.Trainer and accelerate components and streamlines processes such as:
- Tokenization
- Formatting input-output pairs
- Applying LoRA (Low-Rank Adaptation)
- Managing precision (fp16/bf16)
- Logging, evaluation, and saving checkpoints
SFTTrainer ships with Hugging Face's TRL (Transformer Reinforcement Learning) library and is also widely used alongside open-source frameworks such as Unsloth and Axolotl, which offer lightweight alternatives to a full Hugging Face Trainer setup.
Supervised fine-tuning (SFT) is the most common step in post-training foundation models, and also one of the most effective. TRL provides a simple API to train models with SFT in a few lines of code; for a complete training script, check out trl/scripts/sft.py. Experimental support for Vision Language Models is also included in examples/scripts/sft_vlm.py.
When to Use SFTTrainer
Use SFTTrainer when:
- You want to fine-tune a base or quantized model like LLaMA, Mistral, Gemma, or Phi-2.
- You’re working with instruction-tuning datasets like Alpaca, OpenOrca, or custom Q&A pairs.
- You want simple APIs to launch fine-tuning with minimal boilerplate.
🔧 Core Features
Feature | Description |
---|---|
Model Support | Hugging Face + PEFT + QLoRA-compatible models |
Data Format | JSON, Hugging Face datasets, or Python dictionaries |
Fine-tuning Method | Full fine-tuning or LoRA |
Mixed Precision | Native bf16 and fp16, plus int4 quantization (bnb_config) |
Optimizers | AdamW, 8-bit optimizers, Adafactor |
Evaluation & Logging | Optional, can use wandb, tensorboard, or console |
Installation
Install dependencies:
```bash
pip install transformers accelerate datasets peft bitsandbytes trl
```
If you’re using Unsloth, it provides its own FastLanguageModel loader and is typically used together with TRL's SFTTrainer.
Typical Dataset Format
You need to pass a dataset where each item has at least:
- prompt: instruction or input
- response: the expected model output
```json
[
  {
    "prompt": "Explain Newton's second law of motion.",
    "response": "Newton’s second law states that F = ma..."
  }
]
```
Or, use Hugging Face’s datasets library:
```python
from datasets import load_dataset

dataset = load_dataset("tatsu-lab/alpaca")["train"]
```
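Note that tatsu-lab/alpaca stores its fields as instruction, input, and output rather than prompt and response. The sketch below shows one way to map Alpaca-style records to the prompt/response layout described above. The helper name `alpaca_to_pair` and the sample records are hypothetical and only serve as an illustration.

```python
def alpaca_to_pair(record: dict) -> dict:
    """Map an Alpaca-style record to a prompt/response pair."""
    instruction = record["instruction"].strip()
    extra_input = record.get("input", "").strip()
    # Append the optional context to the instruction when it is present.
    prompt = f"{instruction}\n\n{extra_input}" if extra_input else instruction
    return {"prompt": prompt, "response": record["output"].strip()}


# Hand-written records in the Alpaca layout (not taken from the real dataset).
records = [
    {"instruction": "Name a primary color.", "input": "", "output": "Red."},
    {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"},
]
pairs = [alpaca_to_pair(r) for r in records]
print(pairs[1]["prompt"])
```

With a Hugging Face dataset, the same function could be applied through `dataset.map(alpaca_to_pair, remove_columns=dataset.column_names)`.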
🔍 Implementation Example: Fine-Tuning LLaMA-2 with SFTTrainer
Step 1: Load Model and Tokenizer
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
# The LLaMA-2 tokenizer ships without a pad token; reuse EOS for padding.
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
```
Step 2: Prepare Dataset
```python
from datasets import Dataset

examples = [
    {"prompt": "What is the capital of France?", "response": "Paris."},
    {"prompt": "Who wrote Hamlet?", "response": "William Shakespeare."},
]
dataset = Dataset.from_list(examples)
```
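SFTTrainer trains on a single text field, so prompt/response pairs generally need to be joined into one string before training. The following is a minimal sketch of such a formatting step. The template, the `format_example` helper, and the choice of `</s>` as the end-of-sequence token are assumptions for illustration, not a format the library requires.

```python
# Hypothetical instruction template; any consistent layout can be used.
PROMPT_TEMPLATE = "### Instruction:\n{prompt}\n\n### Response:\n{response}"


def format_example(example: dict, eos_token: str = "") -> dict:
    """Join a prompt/response pair into one training string."""
    text = PROMPT_TEMPLATE.format(
        prompt=example["prompt"].strip(),
        response=example["response"].strip(),
    )
    # An end-of-sequence token marks where the answer stops.
    return {"text": text + eos_token}


examples = [
    {"prompt": "What is the capital of France?", "response": "Paris."},
    {"prompt": "Who wrote Hamlet?", "response": "William Shakespeare."},
]
formatted = [format_example(ex, eos_token="</s>") for ex in examples]
print(formatted[0]["text"])
```

Applied with `dataset.map(format_example)`, this would add a text column that can be passed as the training field.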
Step 3: Define SFTTrainer
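```python
from trl import SFTTrainer
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./llama2-finetuned",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    logging_steps=10,
    save_strategy="epoch",
    fp16=True,  # consider bf16=True on GPUs that support it
    report_to="none",  # or "wandb"
)

# SFTTrainer trains on one text column, so join prompt and response first.
train_dataset = dataset.map(
    lambda ex: {"text": f"{ex['prompt']}\n{ex['response']}"}
)

# Keyword names below follow older TRL releases; newer releases move
# dataset_text_field and the sequence length into SFTConfig and rename
# tokenizer to processing_class.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    max_seq_length=512,
    args=training_args,
)
```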
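The batch settings above interact: with gradient accumulation, the optimizer sees a larger effective batch than `per_device_train_batch_size` suggests. The sketch below works through that arithmetic. The `training_schedule` helper and the dataset size of 1,000 examples are hypothetical, and the step count is an approximation, since the exact number depends on how the trainer handles a final partial batch.

```python
import math


def training_schedule(num_examples, per_device_batch_size, grad_accum_steps,
                      num_epochs, num_devices=1):
    """Estimate effective batch size and optimizer steps for a run."""
    # Examples consumed per optimizer update across all devices.
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return {
        "effective_batch_size": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * num_epochs,
    }


# Values from the TrainingArguments above, with an assumed dataset size.
schedule = training_schedule(
    num_examples=1000,
    per_device_batch_size=2,
    grad_accum_steps=4,
    num_epochs=3,
)
print(schedule)
```

With `logging_steps=10`, a run of this size would log a few dozen times in total, which helps when choosing logging and saving intervals.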
Step 4: Train the Model
```python
trainer.train()
```
Example with LoRA (Parameter-Efficient Fine-Tuning)
```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    # Join prompt and response into one text column for training.
    train_dataset=dataset.map(
        lambda ex: {"text": f"{ex['prompt']}\n{ex['response']}"}
    ),
    dataset_text_field="text",
    peft_config=lora_config,  # apply LoRA
    max_seq_length=512,
    args=training_args,
)
```
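To get a feel for why LoRA is parameter-efficient, it helps to count the weights it adds. Each adapted weight matrix of shape d_out x d_in gains two low-rank factors with r * (d_in + d_out) parameters in total, and the update is scaled by lora_alpha / r. The sketch below applies this to the configuration above. The model dimensions (hidden size 4096, 32 layers) and the choice of two adapted attention projections per layer are assumptions for a LLaMA-2-7B-sized model, not values read from the library.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Values from the LoraConfig above.
r, lora_alpha = 16, 32

# Assumed model shape (LLaMA-2-7B-sized) and adapted modules per layer.
hidden_size = 4096
num_layers = 32
adapted_per_layer = 2

per_matrix = lora_param_count(hidden_size, hidden_size, r)
total_trainable = per_matrix * adapted_per_layer * num_layers
scaling = lora_alpha / r  # factor applied to the low-rank update

print(f"per matrix: {per_matrix:,}")
print(f"total trainable: {total_trainable:,}")
print(f"scaling: {scaling}")
```

Under these assumptions the adapters add a few million trainable parameters, a small fraction of a model with billions of weights.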
Evaluation
You can evaluate by generating responses:
```python
prompt = "List three benefits of exercise."
# Move inputs to the model's device instead of assuming CUDA.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
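For a causal language model, the decoded output contains the prompt followed by the generated continuation. When comparing responses, it is often convenient to strip the prompt first. The helper below is a small sketch of that step. The name `extract_completion` and the sample strings are hypothetical.

```python
def extract_completion(decoded: str, prompt: str) -> str:
    """Return only the generated continuation from a decoded sequence."""
    if decoded.startswith(prompt):
        return decoded[len(prompt):].strip()
    # Decoding may not reproduce the prompt exactly; fall back to full text.
    return decoded.strip()


prompt = "List three benefits of exercise."
# Stand-in for tokenizer.decode(output[0], skip_special_tokens=True).
decoded = prompt + " Better sleep, stronger muscles, and improved mood."
completion = extract_completion(decoded, prompt)
print(completion)
```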
Save Model for Later Use
```python
trainer.save_model("./llama2-alpaca-lora")
tokenizer.save_pretrained("./llama2-alpaca-lora")
```
Repeat for Other Models
This workflow works with:
- mistralai/Mistral-7B-v0.1
- unsloth/llama-2-7b-bnb-4bit
- google/gemma-7b
- microsoft/phi-2
Just change the model name and tokenizer.
Best Practices
Tip | Why it Matters |
---|---|
Use fp16 or bf16 | Faster training and less memory usage |
Combine with LoRA | Efficient tuning on consumer GPUs |
Enable gradient checkpointing | Reduces memory at the cost of speed |
Set logging_steps and eval_strategy | For regular loss monitoring and evaluation during training |
Use weight decay in TrainingArguments | Helps prevent overfitting |
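As a rough illustration of the memory tips above, the sketch below estimates how much memory the model weights alone occupy at different precisions. The `weight_memory_gib` helper and the 7-billion-parameter figure are assumptions. Gradients, optimizer states, and activations are not included, so real training needs noticeably more.

```python
# Approximate storage cost per parameter for each format.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int4": 0.5}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Estimate memory used by model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


num_params = 7e9  # assumed size of a 7B-parameter model
for dtype in ("fp32", "fp16", "int4"):
    print(f"{dtype}: {weight_memory_gib(num_params, dtype):.1f} GiB")
```

Halving the bytes per parameter halves the weight footprint, which is why lower precision and 4-bit loading make 7B-class models feasible on consumer GPUs.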