Llama API
Log in

Fine-tuning and evaluation (Preview)

Fine-tuning is available as a limited preview. Register your interest by creating a support ticket in the Support hub.
Fine-tuning improves the performance of a pre-trained model for a particular use case, by training it on a specialized dataset. This helps in adapting general-purpose models to industry-specific or task-specific needs.Llama API offers a range of pre-trained models with different parameter sizes. Large-parameter models often perform better but take longer to generate responses and cost more than smaller models.With fine-tuning, you can optimize a smaller model to match or exceed the performance of a larger model.

How it works

Fine-tuning involves training a pre-trained base model on a domain-specific dataset that you supply, then verifying the performance of the fine-tuned model by running evaluations. Once the fine-tuned model meets your performance expectations, you can use it directly on Llama API, or download your model and deploy it on your own infrastructure.
For a full example, including details on dataset formats and multi-turn examples, see the fine-tuning guide.

When to fine-tune

Fine-tuning an 8 billion parameter Llama model has many advantages over using a larger model, including:
  • •Reducing inference costs while maintaining quality
  • •Increasing inference speeds for low-latency or high-throughput applications
  • •Improving performance on domain-specific knowledge (e.g., legal, medical, finance), particularly if you have data that was not included in the original model training set.

Datasets

Preparing a dataset

A fine-tuning dataset comprises a set of prompts and responses that show the model how it should respond to requests.Your dataset should consist of several hundred examples of high-quality questions and responses in the same format used in the chat completions endpoint, which is a list of messages with each message containing role and content key-values. The examples in your dataset should closely align with your targeted use case.Depending on your use case, you may want to provide single-turn or multi-turn examples in your dataset.

Managing datasets

Under the Datasets tab, you can manage the uploaded datasets used in fine-tuning and evaluation jobs. You can view dataset details, delete and download the dataset file there.

Dataset format

There are some restrictions on the format of a dataset which you should follow:
  • •Each example is a dictionary with a key of messages and a list of messages as the value.
  • •Your fine-tuning dataset should have at least 10 examples, but you will likely get better results with at least several hundred examples. Data quality is more important than quantity for fine-tuning.
  • •The list of messages should interleave user messages and assistant messages.
  • •A system message is optional, but it must be the first message if it exists.
  • •The list of messages must end with an assistant message.
  • •The maximum sequence length for tokenized input list of messages is 8192 tokens, after which the input will be truncated.
  • •The maximum dataset size supported for fine-tuning is 1GB.

Dataset examples

Single-turn datasets

A single-turn dataset can be used to train models for tasks such as a Q\&A chatbot or a classifier. The desired answer from the model is based on a system prompt and a single previous message.The input prompts will consist of a system message and a user message, and the expected output (label) will be the subsequent assistant message.
Single-turn Q&A dataset
Single-turn classifier dataset
You can find a complete example single-turn fine-tuning dataset for a real-world tax preparation use case here: example_finetune_dataset.jsonl

Multi-turn datasets

A multi-turn dataset can be used to fine-tune a model that acts as a conversational chatbot, where the desired answer from the model is based on a complete conversation history.In a multi-turn dataset, you should include the entire conversation history, and the final assistant message will be used as the expected output (label).
Multi-turn chatbot dataset

Fine-tuning a model

Choosing a base model

To fine-tune a model, you start by selecting a pre-trained base model. Currently one base model is available for fine-tuning: Llama 3.3 8B Instruct.

Configuring hyperparameters

Hyperparameters are variables that are used to manage model training. You can customize the hyperparameters to tailor the training process to suit your needsBelow are the hyperparameters that you can customize for your fine-tuning job:
  1. Epochs: The number of times to loop through the whole training dataset (the training data will be reshuffled after each epoch).
  2. Batch size: The number of training examples within a batch to update model parameters. A larger batch size means the model parameters are updated less frequently but with lower variance.
  3. Learning rate multiplier: The scaling factor for learning rate (base value is 3e-4). A larger learning rate may lead to faster convergence but can overshoot the optimal point, causing the loss to increase significantly. A smaller learning rate may lead to more stable convergence but can result in slow training or getting stuck in local minima.

Creating a fine-tuning job

Follow the steps below to fine-tune your model:
  1. In the API dashboard, go to the Fine-tuning tab to create a new fine-tuning job, or to view all the existing fine-tuning jobs under your team.
  2. Click the Create button to create a new fine-tuning job. Upload your prepared dataset, or select an existing dataset for your team from the dropdown list.
  3. Optionally check the Split data checkbox to automatically extract a portion of your dataset to be used for evaluation later. Selecting this option means that you don’t have to worry about separately generating an evaluation dataset.
  4. Configure your fine-tuning job by specifying the base model to train, naming your job and configuring hyperparameters.
  5. Start your job. You can view the job metadata and job progress in the Job tab. You can also navigate to the Metrics tab to see the learning curve, and the Logs tab for job event logs.
  6. After the job finishes, you can try out your fine-tuned model in the Playground, or evaluate it with the evaluation flow.

Using a fine-tuned model

Using your model with Llama API

Your fine-tuned model will be available for use with Llama API, in the same way that base models are available. Simply change the model name in your API call to the name of your fine-tuned model and call the API in the same way you would for any other model.Fine-tuned models are deployed and hosted on dedicated servers which may not have the same performance optimization as the base models. You may find slightly slower response times using fine-tuned models.
If you haven’t used your model before, or haven’t used it in a while, your first API call could take up to 10 seconds while the model loads, or during periods of heavy traffic on the platform.

Downloading a model

Your fine-tuned model can be downloaded from the fine-tuning job page and used on your own infrastructure or a cloud inference service that supports model uploads. The downloaded model will be in Hugging Face format.Follow these steps to download your model:
  1. Click the three dots at the top right of the fine-tuned job page. It will show two dropdown options: Download model and Delete.
  2. Click the Download model option - the Download fine-tuned model pop-up will appear with details about the file size.
  3. Click the Download button to download the model.

Deleting a model

Fine-tune jobs and their associated data and model files can be deleted from the job detail page as follows:
  1. Click the three dots at the top right of the fine-tuned job page. It will show two dropdown options: Download model and Delete.
  2. Click the Delete option.
  3. Confirm the deletion.

Evaluation

Evaluating a model allows you to see how well it performs against a set of evaluation criteria.
  1. Click the Evaluation tab to create a new evaluation run.
  2. Name your job, choose the model you want to evaluate, and select an uploaded dataset to evaluate the model against.
  3. Add one or more graders that will score your model’s outputs against specific criteria, such as string-matching or semantic similarity.
  4. Kick off your job, which will run batch inference to generate model outputs and grade-score them.
  5. When the job completes, check out the summary of your graders. You can inspect the results in more detail.
If you have already run an evaluation job, and want to re-run new graders against the same candidate model responses. You can select a dataset tagged with (Existing Responses). This will skip batch inference on your candidate model, and re-run grading with your new grader configurations.
Was this page helpful?
How it works
When to fine-tune
Datasets
Fine-tuning a model
Using a fine-tuned model
Evaluation

Get started

Overview
Quickstart

Essentials

Models
API keys
SDKs & libraries
Rate limits

Features

Chat completion
Image understanding
Structured output
Tool calling
OpenAI compatibility
Moderation
Fine-tuning & evaluation

Guides

Chat & conversation
Tool calling
Moderation & security
Best practices

API reference

Chat completion
Models
Moderations

Resources

Data commitments
Legal