§01

Abstract

LumiChats v1.1 is a parameter-efficient conversational language model fine-tuned from Meta's Llama 3.2 3B Instruct using Low-Rank Adaptation (LoRA). Trained on the FineTome-100k curated dialogue dataset using response-only supervision, the model achieves strong natural conversation quality at 3.2 billion parameters — suitable for deployment on edge devices and consumer-grade hardware. Fine-tuning was performed in 8.54 minutes on a Tesla T4 GPU using the Unsloth framework, consuming only 2.35 GB of additional GPU memory. The model supports 8+ languages and is released under the Llama 3.2 Community License.

§02

Architecture & Configuration

LumiChats v1.1 is built on unsloth/Llama-3.2-3B-Instruct using Low-Rank Adaptation (LoRA) — a parameter-efficient fine-tuning technique. Only 0.75% of parameters are updated.

Architecture

Auto-regressive transformer (LlamaForCausalLM) with grouped-query attention

Total Parameters

3,237,063,680 (3.21B)

Trainable Parameters

24,313,856 (24.3M) (0.75%)

Context Length

128,000 tokens (trained at max_seq_length 2,048)

Quantization

4-bit NF4 during training (load_in_4bit=True)

LoRA Rank (r)

LoRA Alpha (α)

LoRA Target Modules

q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj

Languages

EnglishGermanFrenchItalianPortugueseHindiSpanishThai

§03

Training Details

Dataset

mlabonne/FineTome-100k

Dataset Size

100,000 curated dialogue samples (ShareGPT → HuggingFace chat format)

Objective

Response-only causal language modelling (user inputs masked with label −100)

Framework

Unsloth 2026.1.4 + TRL

Hardware

Tesla T4 (14.741 GB VRAM)

Training Time

8.54 minutes (512 seconds, 60 steps)

Peak Memory

2.35 GB additional GPU memory

Max Steps

Hyperparameters

Learning Rate

2e-4

Batch Size

2

Gradient Accum.

4

Effective Batch

8

Optimizer

AdamW 8-bit

LR Scheduler

Linear

Dataset: https://huggingface.co/datasets/mlabonne/FineTome-100k

§04

Evaluation & Benchmarks

Metric	Value	Baseline	Description
Inference speed (T4 + Unsloth)	40–80 tokens/s	20–40 tokens/s (standard PyTorch)	Throughput improvement from Unsloth kernel optimisations
Inference speed (RTX 4090)	60–100+ tokens/s	—	Consumer flagship GPU performance
Inference speed (CPU)	5–15 tokens/s	—	High-end CPU with Q4_K_M GGUF
Minimum RAM (Q4_K_M)	4 GB	—	Practical edge deployment threshold
GGUF Q4_K_M size	~2.0 GB	—	Recommended format — best size/quality tradeoff

§05

Base Model vs Fine-Tuned

Key improvements from fine-tuning on the mlabonne/FineTome-100k dataset versus the Llama-3.2-3B-Instruct base model.

Dimension	Base (Llama-3.2-3B-Instruct)	LumiChats v1.1
Instruction following quality	Moderate (pretrain behaviour)	Reinforced via response-only SFT
Multi-turn coherence	Good	Significantly improved
Chat template support	Manual configuration needed	Llama 3.1 template pre-applied
Multilingual dialogue	Supported	Consistent across 8 languages
Inference speed	Baseline	2x faster (Unsloth)

§06

Use Cases

Conversational AI and chat applications

Personal assistants for task management and information retrieval

On-device mobile AI assistants with no cloud dependency

Writing assistance and content generation

Document and conversation summarisation

Question-answering systems

Basic code explanation and assistance

§07

Limitations & Disclaimers

LumiChats v1.1 inherits limitations of its base architecture and training data.

Context understanding may degrade beyond 2,048 tokens despite 128K architectural capacity

Can generate plausible-sounding but incorrect information (hallucination risk)

Not optimised for highly technical or specialised domain tasks

No access to real-time information; training data has a fixed cutoff

LoRA parameters limited to attention and MLP layers only

60-step training means shorter SFT than a full epoch — production fine-tune is ongoing

§08

Citation

If you use LumiChats v1.1 in research or products, please cite:

@misc{lumichats2025,
  author    = {Aditya Kumar Jha},
  title     = {LumiChats v1.1: A Fine-Tuned Conversational AI Model on Llama 3.2 3B},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/adityakum667388/lumichats-v1.1},
  note      = {Fine-tuned with LoRA on FineTome-100k via Unsloth}
}

License: Llama 3.2 Community License (Meta) — View full license on Hugging Face

LumiChats v1.1