The Good Tech Companies - Fine-Tuning LLMs: A Comprehensive Tutorial
Episode Date: February 2, 2026. This story was originally published on HackerNoon at: https://hackernoon.com/fine-tuning-llms-a-comprehensive-tutorial. A hands-on guide to fine-tuning large language models, covering SFT, DPO, RLHF, and a full Python training pipeline. Check more stories related to machine-learning at: https://hackernoon.com/c/machine-learning. You can also check exclusive content about #llm-fine-tuning-tutorial, #supervised-fine-tuning-sft, #qwen-llm-fine-tuning, #llm-training-pipeline, #hugging-face-transformers, #fine-tuning-lora, #preference-optimization-dpo, #good-company, and more. This story was written by: @oxylabs. Learn more about this writer by checking @oxylabs's about page, and for more stories, please visit hackernoon.com. Training an LLM from scratch is expensive and usually unnecessary. This hands-on tutorial shows how to fine-tune pre-trained models using SFT, DPO, and RLHF, with a full Python pipeline built on Hugging Face Transformers. Learn how to prepare data, tune hyperparameters, avoid overfitting, and turn base models into production-ready specialists.
Transcript
This audio is presented by Hacker Noon, where anyone can learn anything about any technology.
Fine-tuning LLMs: a comprehensive tutorial, by Oxylabs.
It costs millions of dollars and months of computing time to train a large language model from the ground up.
You most likely never need to do it.
Fine tuning lets you adapt pre-trained language models to your needs in hours or days, not months,
with a fraction of the resources.
This tutorial takes you from theory to practice:
you'll learn the four core fine-tuning techniques, code a complete training pipeline in Python,
and pick up the practices that separate production-ready models from expensive experiments.
What is LLM fine-tuning?
Fine-tuning trains an existing language model on your data to enhance its performance on specific tasks.
Pre-trained models are powerful generalists, but exposing them to focused examples can transform
them into specialists for your use case.
Instead of building a model from scratch, which requires massive compute and
data, you're giving an already capable model a crash course in what matters to you, whether
that's medical diagnosis, customer support automation, sentiment analysis, or any other particular
task. How does LLM fine-tuning work? Fine-tuning continues the training process on pre-trained
language models using your specific dataset. The model processes your provided examples, compares
its own outputs to the expected results, and updates internal weights to adapt and minimize loss.
This approach can vary based on your goals, available data, and computational resources.
Some projects require full fine-tuning, where you update all model parameters,
while others work better with parameter-efficient methods like LoRA that modify only a small subset.
LLM fine-tuning methods.
Supervised fine-tuning (SFT) teaches the model to learn the patterns of correct question-answer pairs
and adjusts model weights to match those answers exactly.
You need a dataset of question-answer pairs.
Use this when you want consistent outputs, like making the model always respond in JSON format,
following your customer service script or writing emails in your company's tone.
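As a rough illustration (not taken from the article), an SFT dataset can be as simple as a list of question-answer records; the field names below are assumptions, so match whatever your training script expects.

```python
# Hypothetical SFT examples: plain question-answer pairs that always answer in JSON.
# The "question"/"answer" field names are assumptions, not the article's schema.
sft_pairs = [
    {
        "question": "Classify this ticket: 'My order #1234 arrived damaged.'",
        "answer": '{"intent": "damaged_item", "order_id": "1234"}',
    },
    {
        "question": "Classify this ticket: 'I was charged twice for one order.'",
        "answer": '{"intent": "duplicate_charge", "order_id": null}',
    },
]
```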
Unsupervised fine-tuning feeds the model tons of raw text, no questions or labeled data needed,
so it learns the vocabulary and patterns of a particular domain.
While this is technically a pre-training process known as continued pre-training (CPT),
it's usually done after the initial pre-training phase. Use this first when your model needs to
understand specialized content it wasn't originally trained on, like medical terminology, legal contracts,
or a new language. Direct preference optimization (DPO) teaches the model to prefer
better responses by showing examples of good versus bad answers to the same question and adjusting
it to favor the good ones. It needs triplets: a prompt, a preferred answer, and a rejected answer. Use DPO after basic training to fix annoying
behaviors like stopping the model from making things up, being too wordy, or giving unsafe answers.
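For reference, a DPO triplet bundles a prompt with a preferred and a rejected answer. The field names below follow the common prompt/chosen/rejected convention and are an assumption, not the article's exact schema.

```python
# Hypothetical DPO triplet: one prompt, a preferred answer, and a rejected one.
dpo_example = {
    "prompt": "What is the capital of Australia?",
    "chosen": "The capital of Australia is Canberra.",
    "rejected": "The capital of Australia is Sydney, its largest city.",
}
```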
Reinforcement fine-tuning. In RLHF, you first train a reward model on prompts with multiple
responses ranked by humans, teaching it to predict which responses people prefer. Then, you use
reinforcement learning to optimize and fine-tune a model that generates responses, which the reward model judges.
This helps the model learn over time to produce higher-scoring outputs. This process requires
datasets with a prompt and several human-ranked responses, as sketched below. It's best for tasks where judging quality is easier than creating perfect
examples, like medical diagnoses, legal research, and other complex domain-specific reasoning.
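As a sketch of what reward-model training data can look like (the exact schema is an assumption, so adapt it to your reward-model trainer), each record pairs one prompt with several human-ranked responses.

```python
# Hypothetical RLHF reward-model example: one prompt with several responses
# ranked by human annotators (1 = best). The schema here is an assumption.
rlhf_example = {
    "prompt": "Explain the most likely cause of these lab results.",
    "responses": [
        {"text": "A thorough, well-reasoned explanation.", "rank": 1},
        {"text": "A plausible but incomplete explanation.", "rank": 2},
        {"text": "A confident but incorrect explanation.", "rank": 3},
    ],
}
```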
Step-by-step LLM fine-tuning tutorial. We'll walk you through every step of fine-tuning a small
pre-trained model to solve word-based math problems, something it struggles with out of the box.
We'll use the Qwen 2.5 base model with 0.5B parameters, which already has natural language processing
capabilities. The approach works for virtually any use case of fine-tuning LLMs, teaching a model
specialized terminology, improving the model's performance on specific tasks, or adapting it to your domain.
Prerequisites. Install a few Python packages that we'll use throughout this tutorial. In a new project
folder, create and activate a Python virtual environment, and then install these libraries using pip
or your preferred package manager. 1. Get and load the dataset. The fine-tuning process starts with
choosing the dataset, which is arguably the most important decision.
The dataset should directly reflect the task you want your model to perform.
Simple tasks like sentiment analysis need basic input-output pairs.
Complex tasks like instruction following or question answering require richer
datasets with context, examples, and varied formats.
Fine-tuning data quality and size directly impact training time and your model's performance.
The easiest starting point is the Hugging Face datasets library, which hosts
thousands of open-source datasets for different domains and tasks. Need something specific and high
quality? Purchase specialized datasets or build your own by scraping publicly available data. For example,
if you want to build a sentiment analysis model for Amazon product reviews, you may want to collect
data from real reviews using a web scraping tool. Here's a simple example that uses Oxylabs'
Web Scraper API.
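The snippet below is a rough sketch of such a request rather than the article's exact code; the endpoint, the "source" value, and the payload fields are assumptions based on Oxylabs' public documentation, so check the current docs and use your own credentials.

```python
import requests

# Sketch of an Oxylabs Web Scraper API call; the endpoint, source name, and
# payload fields are assumptions, not the article's exact code.
payload = {
    "source": "amazon_reviews",   # assumed source name for Amazon review pages
    "query": "B07FZ8S74R",        # hypothetical product ASIN
    "parse": True,
}

response = requests.post(
    "https://realtime.oxylabs.io/v1/queries",
    auth=("YOUR_USERNAME", "YOUR_PASSWORD"),
    json=payload,
    timeout=180,
)
print(response.json())
```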
2. Tokenize the data for processing. Models don't understand text directly; they work with numbers.
Tokenization converts your text into tokens, numerical representations,
that the model can process. Every model has its own tokenizer trained alongside it,
so use the one that matches your base model. How we tokenize our data shapes what the model learns.
For math problems, we want to fine-tune the model to learn how to answer questions,
not generate them. Here's the trick: tokenize questions and answers separately,
then use a masking technique, setting the labels of the question tokens to -100 to tell
the training process to ignore them when calculating loss. The model only learns from the answers,
making training more focused and efficient.
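Here is a minimal sketch of such a function, assuming a Hugging Face tokenizer and a dataset with "question" and "answer" fields; the checkpoint id and field names are assumptions rather than the article's exact code.

```python
from transformers import AutoTokenizer

# Assumed checkpoint id for the small Qwen 2.5 base model used in this tutorial.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")

def tokenize_example(example):
    # Tokenize the question and the answer separately so the question can be masked.
    question_ids = tokenizer(example["question"] + "\n").input_ids
    answer_ids = tokenizer(example["answer"] + tokenizer.eos_token).input_ids

    input_ids = question_ids + answer_ids
    # Labels of -100 are ignored by Hugging Face loss computation,
    # so the model only learns to produce the answer tokens.
    labels = [-100] * len(question_ids) + answer_ids

    return {
        "input_ids": input_ids,
        "attention_mask": [1] * len(input_ids),
        "labels": labels,
    }
```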
Apply this tokenization function to both training and testing datasets. We filter out examples longer than 512 tokens to keep memory usage manageable and ensure the
model processes complete information without truncation. Shuffling the training data helps the model
learn more effectively. Optional: want to test the entire pipeline quickly before committing to a full
training run? You can train the model on a subset of the dataset. So, instead of using the full 8.5K
dataset, you can reduce it to 3K in total, making the process much faster (see the sketch below). Keep in mind,
smaller datasets increase overfitting risk, where the model memorizes training data rather than
learning general patterns. For production, aim for at least 5K+ training samples and carefully
tune your hyperparameters.
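A one-line sketch of that optional subsetting step, assuming tokenized_train is the Hugging Face Dataset produced by the tokenization step:

```python
# Optional quick run: shuffle, then keep a 3K subset of the tokenized training split.
small_train = tokenized_train.shuffle(seed=42).select(range(3000))
```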
3. Initialize the base model. Next, load the pre-trained base model to fine-tune it by improving its math problem-solving abilities.
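A minimal sketch of this step; the checkpoint id is an assumption matching the small Qwen 2.5 base model described above.

```python
from transformers import AutoModelForCausalLM

# Load the pre-trained base model (assumed checkpoint id).
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
```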
4. Fine-tune using the Trainer method. This is where the magic happens. Training arguments control how your model learns;
think of them as the recipe determining the quality of your final results. These settings and hyperparameters
can make or break your fine-tuning, so experiment with different values to find what works for your use
case. Key parameters explained:
• Epochs. More epochs equal more learning opportunities, but too many cause overfitting.
• Batch size affects memory usage and training speed. Adjust it based on your hardware.
• Learning rate controls how quickly the model adjusts. Too high and it might miss the optimal solution; too low and training takes forever.
• Weight decay can help to prevent overfitting by deterring the model from leaning too much on any single pattern. If weight decay is too large, it can lead to underfitting by preventing the model from learning the necessary patterns.
The optimal configuration below is specialized for CPU training. Remove use_cpu=True if you have a GPU.
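The exact configuration wasn't captured here, so the sketch below uses placeholder hyperparameters (the batch sizes and gradient accumulation match the values reported later in this tutorial); it assumes the model, tokenizer, and tokenized splits from the previous steps.

```python
from transformers import Trainer, TrainingArguments, DataCollatorForSeq2Seq

# Placeholder hyperparameters; tune them for your hardware and dataset.
training_args = TrainingArguments(
    output_dir="qwen-math-sft",
    num_train_epochs=3,              # more epochs = more learning, but higher overfitting risk
    per_device_train_batch_size=7,
    per_device_eval_batch_size=7,
    gradient_accumulation_steps=5,
    learning_rate=2e-5,
    weight_decay=0.01,
    logging_steps=50,
    use_cpu=True,                    # remove this line if you have a GPU
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,
    # Pads input_ids and labels (with -100) so the question mask is preserved.
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```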
5. Evaluate the model. After fine-tuning, measure how well your model performs using two common metrics:
• Loss measures how far off the model's predictions are from the target outputs, where lower values indicate better performance.
• Perplexity, the exponential of loss, shows the same information on a more intuitive scale, where lower values mean the model is more confident in its predictions.
For production environments, consider adding metrics like BLEU or ROUGE to measure how closely
generated responses match reference answers. You can also include other metrics like F1,
which measures how good your model is at catching what matters while staying accurate.
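A short sketch of computing both metrics with the trainer from the previous step; perplexity is simply the exponential of the evaluation loss.

```python
import math

# Evaluate on the held-out split, then convert loss to perplexity (lower is better).
metrics = trainer.evaluate(eval_dataset=tokenized_test)
eval_loss = metrics["eval_loss"]
perplexity = math.exp(eval_loss)

print(f"eval loss: {eval_loss:.3f}, perplexity: {perplexity:.2f}")
```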
This Hugging Face lecture is a good starting point to learn the essentials of
using the Transformers library.
Complete fine-tuning code example. After these five steps,
you should have the following code combined into a single Python file.
Before executing, take a moment to adjust your trainer configuration and hyperparameters based on what your machine can actually handle.
To give you a real-world reference, here's what worked smoothly for us on a MacBook Air with the M4 chip and 16 gigabytes RAM.
With this setup, it took around 6.5 hours to complete fine-tuning:
• batch size for training: 7
• batch size for eval: 7
• gradient accumulation steps: 5
As your model trains, keep an eye on the evaluation loss.
If it increases while training loss drops, the model is overfitting.
In that case, adjust epochs, lower the learning rate, modify weight decay, and tune other hyperparameters.
In the example below, we see healthy results, with eval loss decreasing from
0.496 to 0.469 and a final perplexity of about 1.60.
Test the fine-tuned model. Now for the moment of truth: was our fine-tuning actually successful?
You can manually test the fine-tuned model by prompting it with this Python code.
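A sketch along those lines; the word problem below is a made-up example rather than the article's test question, and greedy decoding (do_sample=False) makes the model always pick its highest-probability answer.

```python
import torch

# Prompt the fine-tuned model with a hypothetical word problem (answer: 10).
question = "Anna has 3 boxes with 4 apples each. She gives away 2 apples. How many apples are left?\n"

inputs = tokenizer(question, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=100, do_sample=False)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```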
In this side-by-side comparison, you can see how the before-and-after models respond to the same
question. The correct answer is 10. With sampling enabled, both models occasionally get it right
or wrong due to randomness. But disabling sampling reveals their true confidence: the model always
picks its highest-probability answer. The base model confidently outputs the wrong answer, while the fine-tuned
model confidently outputs the correct one. That's fine-tuning at work. Fine-tuning best practices.
Model selection.
• Choose the right base model. Domain-specific models and appropriate context windows save you from fighting against the model's existing knowledge.
• Understand the model architecture. Encoder-only models like BERT excel at classification tasks, decoder-only models like GPT at text generation, and encoder-decoder models like T5 at transformation tasks like translation or summarization.
• Match your model's input format. If your base model was trained with specific prompt templates, use the same format in fine-tuning. Mismatched formats confuse the model and tank performance.
Data preparation.
• Prioritize data quality over quantity. Clean and accurate examples beat massive and noisy datasets every time.
• Split training and evaluation samples. Never let your model see evaluation data during training. This lets you catch overfitting before it ruins your model.
• Establish a golden set for evaluation. Automated metrics like perplexity don't tell you if the model actually follows instructions or just predicts words statistically.
Training strategy.
• Start with a lower learning rate. You're making minor adjustments, not teaching it from scratch, so aggressive rates may erase what it learned during pre-training.
• Use parameter-efficient fine-tuning (LoRA, PEFT). Train only 1% of parameters to get 90%+ performance while using far less memory and time (see the sketch after this list).
• Target all linear layers in LoRA. Targeting all linear layers, not just a few, yields models that reason significantly better, not just mimic style.
• Use NEFTune (noisy embedding fine-tuning). Random noise in embeddings acts as regularization, which can prevent memorization and boost conversational quality by 35+ percentage points.
• After SFT, run DPO. Don't just stop after SFT: SFT teaches how to talk; DPO teaches what is good by learning from preference pairs.
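Here is a minimal LoRA sketch for the two PEFT-related tips above; it assumes the peft package is installed (recent versions support the "all-linear" shortcut), and the rank, alpha, and dropout values are illustrative.

```python
from peft import LoraConfig, get_peft_model

# Illustrative LoRA settings; r, alpha, and dropout are placeholders to tune.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",  # target every linear layer, not just attention
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # typically around 1% of all parameters
```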
What are the limitations of LLM fine-tuning?
• Catastrophic forgetting. Fine-tuning overrides existing neural patterns, which can erase valuable general knowledge the model learned during pre-training. Multitask learning, where you train on your specialized task alongside general examples, can help preserve broader capabilities.
• Overfitting on small datasets. The model may memorize your training examples instead of learning patterns, causing it to fail on slightly different inputs.
• High computational cost. Fine-tuning billions of parameters requires expensive GPUs, significant memory, and hours to days or weeks of training time.
• Bias amplification. Pre-trained models already carry biases from their training data, and fine-tuning can intensify these biases if your dataset isn't carefully curated.
• Manual knowledge updates. New and external knowledge may require retraining the entire model or implementing retrieval-augmented generation (RAG), while repeated fine-tuning often degrades performance.
Conclusion. Fine-tuning works, but only if your data is clean and your hyperparameters are
dialed in.
Combine it with prompt engineering for the best results, where fine tuning handles the task
specialization while prompt engineering guides the model's behavior at inference time.
Continue by grabbing a model from Hugging Face that fits your use case for domain-specific
fine-tuning, scrape or build a quality dataset for your task, and run your first fine-tuning session
on a small subset. Once you see promising results, scale up and experiment with LoRA, DPO, or
NEFTune to squeeze out better performance. The gap between reading this tutorial and having a
working specialized model is smaller than you think. Thank you for listening to this Hackernoon story,
read by artificial intelligence. Visit hackernoon.com to read, write, learn and publish.
