SFTTrainer is a high-level training API designed to simplify and accelerate Supervised Fine-Tuning (SFT) of large language models using Hugging Face’s ecosystem. It builds on transformers.Trainer (which in turn relies on accelerate) and streamlines processes such as:
- Tokenization
- Formatting input-output pairs
- Applying LoRA (Low-Rank Adaptation)
- Managing precision (fp16/bf16)
- Logging, evaluation, and saving checkpoints
SFTTrainer is part of Hugging Face’s TRL (Transformer Reinforcement Learning) library. It is also commonly used alongside open-source frameworks such as Unsloth and Axolotl, which offer lightweight alternatives to a hand-written Hugging Face Trainer setup.
Supervised fine-tuning (SFT) is the most common step in post-training foundation models, and also one of the most effective. TRL provides a simple API to train models with SFT in a few lines of code; for a complete training script, see trl/scripts/sft.py. Experimental support for Vision Language Models is also included in examples/scripts/sft_vlm.py.
When to Use SFTTrainer
Use SFTTrainer when:
- You want to fine-tune a base or quantized model like LLaMA, Mistral, Gemma, or Phi-2.
- You’re working with instruction-tuning datasets like Alpaca, OpenOrca, or custom Q&A pairs.
- You want simple APIs to launch fine-tuning with minimal boilerplate.
🔧 Core Features
| Feature | Description |
|---|---|
| Model Support | Hugging Face + PEFT + QLoRA-compatible models |
| Data Format | JSON, Hugging Face datasets, or Python dictionaries |
| Fine-tuning Method | Full fine-tuning or LoRA |
| Precision & Quantization | bf16 and fp16 mixed precision; 4-bit quantized loading via bitsandbytes (bnb_config) |
| Optimizers | AdamW, 8-bit optimizers, Adafactor |
| Evaluation & Logging | Optional, can use wandb, tensorboard, or console |
Installation
Install dependencies:
```bash
pip install transformers accelerate datasets peft bitsandbytes trl
```
If you’re using Unsloth, models are loaded through its own FastLanguageModel class and are typically still trained with TRL’s SFTTrainer.
Typical Dataset Format
You need to pass a dataset where each item has at least:
- prompt: the instruction or input
- response: the expected model output
```json
[
  {
    "prompt": "Explain Newton's second law of motion.",
    "response": "Newton’s second law states that F = ma..."
  }
]
```
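Before training, it can help to confirm that every record actually carries both fields. The following is a minimal sketch using only the standard library; the helper name `validate_records` and the inline sample are illustrative and not part of TRL.

```python
import json

REQUIRED_KEYS = ("prompt", "response")

def validate_records(raw_json: str) -> list:
    """Parse a JSON array and keep only records with non-empty prompt/response strings."""
    records = json.loads(raw_json)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    valid = []
    for record in records:
        if not isinstance(record, dict):
            continue  # skip anything that is not a JSON object
        if all(isinstance(record.get(k), str) and record[k].strip() for k in REQUIRED_KEYS):
            valid.append(record)
    return valid

# Hypothetical sample: the second record lacks a response and is dropped.
sample = """[
  {"prompt": "Explain Newton's second law of motion.", "response": "F = ma."},
  {"prompt": "Missing answer"}
]"""
clean = validate_records(sample)
print(len(clean))  # 1
```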
Or, use Hugging Face’s datasets library:
```python
from datasets import load_dataset

dataset = load_dataset("tatsu-lab/alpaca")["train"]
```
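Alpaca records use `instruction`, `input`, and `output` columns rather than `prompt` and `response`, so a small mapping step is usually needed. Here is a sketch of one way to do it in plain Python; `alpaca_to_pair` is a hypothetical helper, and the way the optional `input` is appended to the instruction is an assumption rather than the official Alpaca template.

```python
def alpaca_to_pair(record: dict) -> dict:
    """Convert one Alpaca-style record into a prompt/response pair."""
    instruction = record["instruction"].strip()
    extra = (record.get("input") or "").strip()  # "input" is often empty
    prompt = f"{instruction}\n\n{extra}" if extra else instruction
    return {"prompt": prompt, "response": record["output"].strip()}

# Illustrative record, not taken from the real dataset.
row = {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"}
print(alpaca_to_pair(row))
```

With the datasets library, the same function could be applied to every row through `dataset.map(alpaca_to_pair, remove_columns=dataset.column_names)`.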
🔍 Implementation Example: Fine-Tuning LLaMA-2 with SFTTrainer
Step 1: Load Model and Tokenizer
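```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
# Llama tokenizers ship without a padding token; reuse EOS so batches can be padded.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
```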
Step 2: Prepare Dataset
```python
from datasets import Dataset

examples = [
    {"prompt": "What is the capital of France?", "response": "Paris."},
    {"prompt": "Who wrote Hamlet?", "response": "William Shakespeare."},
]
dataset = Dataset.from_list(examples)
```
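A causal language model trains on a single text sequence, so each prompt/response pair has to be joined into one string before tokenization. The sketch below shows that step in plain Python. The `### Question` / `### Answer` template and the helper name `format_example` are assumptions chosen for illustration; replace the template and the end-of-sequence marker with whatever your model expects.

```python
# Hypothetical prompt template; any consistent layout can work.
TEMPLATE = "### Question:\n{prompt}\n\n### Answer:\n{response}{eos}"

def format_example(example: dict, eos_token: str = "</s>") -> str:
    """Join one prompt/response pair into a single training string."""
    return TEMPLATE.format(
        prompt=example["prompt"].strip(),
        response=example["response"].strip(),
        eos=eos_token,  # marks where the answer ends so the model learns to stop
    )

examples = [
    {"prompt": "What is the capital of France?", "response": "Paris."},
    {"prompt": "Who wrote Hamlet?", "response": "William Shakespeare."},
]
texts = [format_example(e) for e in examples]
print(texts[0])
```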
Step 3: Define SFTTrainer
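```python
from trl import SFTTrainer
from transformers import TrainingArguments

# SFTTrainer trains on exactly the field named by dataset_text_field and does
# not merge columns itself, so join prompt and response into one "text" column.
# The newline separator is a simple choice for illustration.
def add_text(example):
    return {"text": f"{example['prompt']}\n{example['response']}{tokenizer.eos_token}"}

train_dataset = dataset.map(add_text)

training_args = TrainingArguments(
    output_dir="./llama2-finetuned",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size of 8 per device
    num_train_epochs=3,
    logging_steps=10,
    save_strategy="epoch",
    fp16=True,  # or bf16=True on GPUs that support it
    report_to="none",  # or "wandb"
)

# These keyword arguments follow older TRL releases. Newer releases move
# dataset_text_field and the sequence-length limit to SFTConfig and accept the
# tokenizer as processing_class, so check your installed version.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    max_seq_length=512,
    args=training_args,
)
```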
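The settings per_device_train_batch_size=2 and gradient_accumulation_steps=4 together determine how many examples contribute to each optimizer update. The sketch below works through that arithmetic; `training_schedule` is a hypothetical helper, the 52,000-example dataset size is a round illustrative number, and the Trainer's own rounding of the last partial batch may differ slightly.

```python
import math

def training_schedule(num_examples, per_device_batch_size, grad_accum_steps,
                      num_epochs, num_devices=1):
    """Rough estimate of effective batch size and optimizer steps."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return {
        "effective_batch_size": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * num_epochs,
    }

print(training_schedule(52_000, per_device_batch_size=2, grad_accum_steps=4, num_epochs=3))
# {'effective_batch_size': 8, 'steps_per_epoch': 6500, 'total_steps': 19500}
```

With logging_steps=10, a run of this size would log roughly 1,950 times, which is one way to sanity-check the logging interval before launching.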
Step 4: Train the Model
```python
trainer.train()
```
Example with LoRA (Parameter-Efficient Fine-Tuning)
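```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,            # rank of the low-rank update matrices
    lora_alpha=32,   # updates are scaled by lora_alpha / r
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Train on prompt and response together rather than on the prompt alone.
lora_dataset = dataset.map(
    lambda ex: {"text": f"{ex['prompt']}\n{ex['response']}{tokenizer.eos_token}"}
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=lora_dataset,
    dataset_text_field="text",
    peft_config=lora_config,  # apply LoRA
    max_seq_length=512,
    args=training_args,
)
```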
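To get a feel for why LoRA is called parameter-efficient, the number of trainable parameters it adds can be estimated by hand. For a weight matrix of shape d_out × d_in, LoRA adds two matrices of shapes r × d_in and d_out × r. The sketch below applies this to r=16 and lora_alpha=32, assuming a Llama-2-7B-like model with hidden size 4096, 32 layers, and adapters on two square projection matrices per layer; those model details and the helper names are assumptions for illustration.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

def lora_scaling(lora_alpha: int, r: int) -> float:
    """Factor applied to the low-rank update before it is added to the frozen weight."""
    return lora_alpha / r

# Assumed architecture: hidden size 4096, 32 layers, 2 adapted matrices per layer.
hidden_size, num_layers, adapted_per_layer = 4096, 32, 2

per_matrix = lora_param_count(hidden_size, hidden_size, r=16)
total = per_matrix * adapted_per_layer * num_layers

print(per_matrix)            # 131072
print(total)                 # 8388608
print(lora_scaling(32, 16))  # 2.0
```

Under these assumptions the adapters hold about 8.4 million trainable parameters, a small fraction of the roughly 7 billion frozen base weights.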
Evaluation
You can spot-check the fine-tuned model by generating responses:
```python
prompt = "List three benefits of exercise."
# Move the inputs to the model's device instead of assuming CUDA.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
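For decoder-only models, the sequence returned by generate starts with the prompt tokens and continues with the newly generated ones, so the decoded text repeats the prompt. The sketch below shows how to keep only the new part, using plain lists of made-up token IDs as a stand-in for real tensors; `strip_prompt` is a hypothetical helper.

```python
def strip_prompt(output_ids: list, prompt_length: int) -> list:
    """Drop the leading prompt tokens and keep only the generated continuation."""
    return output_ids[prompt_length:]

# Made-up token IDs for illustration only.
prompt_ids = [1, 1234, 567]
generated = prompt_ids + [89, 10, 2]

print(strip_prompt(generated, len(prompt_ids)))  # [89, 10, 2]
```

With real tensors the same idea applies: slice the output sequence at the prompt's token length before decoding.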
Save Model for Later Use
```python
trainer.save_model("./llama2-alpaca-lora")
tokenizer.save_pretrained("./llama2-alpaca-lora")
```
Repeat for Other Models
This workflow works with:
- mistralai/Mistral-7B-v0.1
- unsloth/llama-2-7b-bnb-4bit
- google/gemma-7b
- microsoft/phi-2
Just change the model name and tokenizer.
Best Practices
| Tip | Why it Matters |
|---|---|
| Use fp16 or bf16 | Faster training and less memory usage |
| Combine with LoRA | Efficient tuning on consumer GPUs |
| Enable gradient checkpointing | Reduces memory at the cost of speed |
| Set logging_steps and eval_strategy | For regular loss monitoring and evaluation (save_strategy controls checkpointing) |
| Use weight decay in TrainingArguments | Helps prevent overfitting |
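To make the memory argument for reduced precision and 4-bit loading concrete, here is a back-of-the-envelope estimate of how much memory the weights alone occupy at different storage formats. It is a rough sketch: the 7-billion parameter count is an approximation, `weight_memory_gb` is a hypothetical helper, and gradients, optimizer state, and activations (which often dominate during training) are deliberately left out.

```python
# Bytes needed to store one parameter in each format.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int4": 0.5}

def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

for dtype in ("fp32", "fp16", "bf16", "int4"):
    print(f"{dtype}: {weight_memory_gb(7e9, dtype):.1f} GB")
# fp32: 28.0 GB
# fp16: 14.0 GB
# bf16: 14.0 GB
# int4: 3.5 GB
```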