§01
Abstract
LumiChats v1.1 is a parameter-efficient conversational language model fine-tuned from Meta's Llama 3.2 3B Instruct using Low-Rank Adaptation (LoRA). Trained on the FineTome-100k curated dialogue dataset using response-only supervision, the model achieves strong natural conversation quality at 3.2 billion parameters, making it suitable for deployment on edge devices and consumer-grade hardware. Fine-tuning was performed in 8.54 minutes on a Tesla T4 GPU using the Unsloth framework, consuming only 2.35 GB of additional GPU memory. The model supports eight languages and is released under the Llama 3.2 Community License.
§02
Architecture & Configuration
LumiChats v1.1 is built on unsloth/Llama-3.2-3B-Instruct using Low-Rank Adaptation (LoRA) — a parameter-efficient fine-tuning technique. Only 0.75% of parameters are updated.
Architecture
Auto-regressive transformer (LlamaForCausalLM) with grouped-query attention
Total Parameters
3,237,063,680 (3.21B)
Trainable Parameters
24,313,856 (24.3M) (0.75%)
Context Length
128,000 tokens (trained at max_seq_length 2,048)
Quantization
4-bit NF4 during training (load_in_4bit=True)
LoRA Rank (r)
16
LoRA Alpha (α)
16
LoRA Target Modules
q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
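The 24.3M trainable-parameter figure above can be reproduced arithmetically. The sketch below assumes the published Llama 3.2 3B dimensions (28 layers, hidden size 3072, grouped-query KV projection width 1024, MLP intermediate size 8192) and the standard LoRA parameterisation, where each adapted module gains two low-rank matrices totalling r × (d_in + d_out) weights:

```python
# Reproduce the 24,313,856 trainable-parameter count for LoRA r=16
# applied to the seven target modules listed above.
# Dimensions are the published Llama 3.2 3B config: 28 layers,
# hidden size 3072, grouped-query KV width 1024 (8 KV heads x
# head_dim 128), MLP intermediate size 8192.

HIDDEN, KV, FFN, LAYERS, RANK = 3072, 1024, 8192, 28, 16

# (in_features, out_features) of each LoRA target module
modules = {
    "q_proj":    (HIDDEN, HIDDEN),
    "k_proj":    (HIDDEN, KV),
    "v_proj":    (HIDDEN, KV),
    "o_proj":    (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, FFN),
    "up_proj":   (HIDDEN, FFN),
    "down_proj": (FFN, HIDDEN),
}

# Each adapter adds A (r x d_in) + B (d_out x r) = r * (d_in + d_out) weights.
per_layer = sum(RANK * (d_in + d_out) for d_in, d_out in modules.values())
trainable = per_layer * LAYERS

print(f"{trainable:,} trainable LoRA parameters")  # 24,313,856
pct = 100 * trainable / 3_237_063_680
print(f"{pct:.2f}% of total parameters")           # 0.75% of total parameters
```

The result matches the table exactly, which also confirms that LoRA touches only the attention and MLP projections, not the embeddings or the LM head.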
Languages
English, German, French, Italian, Portuguese, Hindi, Spanish, Thai
§03
Training Details
Dataset
mlabonne/FineTome-100k
Dataset Size
100,000 curated dialogue samples (ShareGPT → Hugging Face chat format)
Objective
Response-only causal language modelling (user inputs masked with label −100)
Framework
Unsloth 2026.1.4 + TRL
Hardware
Tesla T4 (14.741 GB VRAM)
Training Time
8.54 minutes (512 seconds, 60 steps)
Peak Memory
2.35 GB additional GPU memory
Max Steps
60
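The response-only objective above works by copying the input token ids into the labels and overwriting every position that belongs to a user turn (or padding) with −100, the index the cross-entropy loss ignores. The function and token ids below are illustrative, not the actual TRL/Unsloth implementation:

```python
# Sketch of response-only supervision: labels mirror the input ids,
# but non-assistant positions are set to -100 so the loss ignores them
# and only assistant tokens contribute gradient.
IGNORE_INDEX = -100

def mask_user_turns(input_ids, is_assistant_token):
    """Return labels with every non-assistant position masked to -100."""
    return [
        tok if assistant else IGNORE_INDEX
        for tok, assistant in zip(input_ids, is_assistant_token)
    ]

# Toy 8-token conversation (made-up ids): the first 5 tokens are the
# user prompt, the last 3 are the assistant response.
input_ids = [101, 72, 845, 13, 9, 4021, 310, 102]
is_assistant = [False] * 5 + [True] * 3

labels = mask_user_turns(input_ids, is_assistant)
print(labels)  # [-100, -100, -100, -100, -100, 4021, 310, 102]
```

This is why the model learns to produce assistant-style responses without being trained to imitate user phrasing.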
Hyperparameters
Learning Rate
2e-4
Batch Size
2
Gradient Accum.
4
Effective Batch
8
Optimizer
AdamW 8-bit
LR Scheduler
Linear
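The hyperparameters above combine into the effective batch size, and with the 60-step cap they also fix how much of FineTome-100k the run actually saw, which is worth stating explicitly:

```python
# Effective batch size and data coverage for the run described above.
micro_batch = 2    # per-device batch size
grad_accum = 4     # gradient accumulation steps
max_steps = 60

effective_batch = micro_batch * grad_accum    # 8, as listed above
samples_seen = effective_batch * max_steps    # 480 dialogues
coverage = 100 * samples_seen / 100_000       # fraction of FineTome-100k

print(effective_batch)                 # 8
print(samples_seen)                    # 480
print(f"{coverage:.2f}% of dataset")   # 0.48% of dataset
```

Gradient accumulation lets the 14.7 GB T4 simulate a batch of 8 while only ever holding micro-batches of 2 in memory; the sub-1% dataset coverage is the basis of the short-SFT caveat in §07.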
§04
Evaluation & Benchmarks
| Metric | Value | Baseline | Description |
|---|---|---|---|
| Inference speed (T4 + Unsloth) | 40–80 tokens/s | 20–40 tokens/s (standard PyTorch) | Throughput improvement from Unsloth kernel optimisations |
| Inference speed (RTX 4090) | 60–100+ tokens/s | — | Consumer flagship GPU performance |
| Inference speed (CPU) | 5–15 tokens/s | — | High-end CPU with Q4_K_M GGUF |
| Minimum RAM (Q4_K_M) | 4 GB | — | Practical edge deployment threshold |
| GGUF Q4_K_M size | ~2.0 GB | — | Recommended format — best size/quality tradeoff |
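The ~2.0 GB Q4_K_M figure in the table can be sanity-checked with a back-of-envelope estimate. Q4_K_M mixes quantisation types per tensor, so the ~4.85 bits-per-weight average used below is an assumption (a commonly quoted ballpark, not an exact specification):

```python
# Back-of-envelope check of the ~2.0 GB Q4_K_M file size.
# ~4.85 bits/weight is an assumed average for Q4_K_M, which mixes
# quantisation types across tensors; actual file size varies slightly.
total_params = 3_237_063_680
bits_per_weight = 4.85

size_gb = total_params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.1f} GB")  # ~2.0 GB
```

The estimate lands at roughly 1.96 GB, consistent with the ~2.0 GB file size and the 4 GB minimum-RAM threshold listed above (weights plus KV cache and runtime overhead).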
§05
Base Model vs Fine-Tuned
Key improvements from fine-tuning on the mlabonne/FineTome-100k dataset versus the Llama-3.2-3B-Instruct base model.
| Dimension | Base (Llama-3.2-3B-Instruct) | LumiChats v1.1 |
|---|---|---|
| Instruction following quality | Moderate (pretrain behaviour) | Reinforced via response-only SFT |
| Multi-turn coherence | Good | Significantly improved |
| Chat template support | Manual configuration needed | Llama 3.1 template pre-applied |
| Multilingual dialogue | Supported | Consistent across 8 languages |
| Inference speed | Baseline | 2x faster (Unsloth) |
§06
Use Cases
Conversational AI and chat applications
Personal assistants for task management and information retrieval
On-device mobile AI assistants with no cloud dependency
Writing assistance and content generation
Document and conversation summarisation
Question-answering systems
Basic code explanation and assistance
§07
Limitations & Disclaimers
LumiChats v1.1 inherits limitations of its base architecture and training data.
Context understanding may degrade beyond 2,048 tokens despite 128K architectural capacity
Can generate plausible-sounding but incorrect information (hallucination risk)
Not optimised for highly technical or specialised domain tasks
No access to real-time information; training data has a fixed cutoff
LoRA parameters limited to attention and MLP layers only
60 training steps (480 samples at effective batch 8) cover well under one epoch of the 100k dataset; a longer production fine-tune is ongoing
§08
Citation
If you use LumiChats v1.1 in research or products, please cite:
@misc{lumichats2025,
author = {Aditya Kumar Jha},
title = {LumiChats v1.1: A Fine-Tuned Conversational AI Model on Llama 3.2 3B},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/adityakum667388/lumichats-v1.1},
note = {Fine-tuned with LoRA on FineTome-100k via Unsloth}
}

License: Llama 3.2 Community License (Meta) — View full license on Hugging Face