Conversationalv1.1June 1, 2025

LumiChats v1.1

Efficient multilingual conversational AI on Llama 3.2 3B

Parameters:3,237,063,680
Trainable:0.75%
Training time:8.54 minutes (512 seconds, 60 steps)
Dataset:100,000 curated dialogue
Llama 3.2 Community License (Meta)
Only 0.75% of parameters trained via LoRA — 24.3M of 3.21B2x faster inference with Unsloth optimizations2.35 GB peak additional memory during training on Tesla T4Response-only training prevents prompt memorisation8+ languages: English, German, French, Italian, Portuguese, Hindi, Spanish, ThaiAvailable in SafeTensors (FP16), GGUF (Q4_K_M, Q5_K_M, Q8_0), and LoRA adapter formats
§01

Abstract

LumiChats v1.1 is a parameter-efficient conversational language model fine-tuned from Meta's Llama 3.2 3B Instruct using Low-Rank Adaptation (LoRA). Trained on the FineTome-100k curated dialogue dataset using response-only supervision, the model achieves strong natural conversation quality at 3.2 billion parameters — suitable for deployment on edge devices and consumer-grade hardware. Fine-tuning was performed in 8.54 minutes on a Tesla T4 GPU using the Unsloth framework, consuming only 2.35 GB of additional GPU memory. The model supports 8+ languages and is released under the Llama 3.2 Community License.
§02

Architecture & Configuration

LumiChats v1.1 is built on unsloth/Llama-3.2-3B-Instruct using Low-Rank Adaptation (LoRA) — a parameter-efficient fine-tuning technique. Only 0.75% of parameters are updated.

Architecture
Auto-regressive transformer (LlamaForCausalLM) with grouped-query attention
Total Parameters
3,237,063,680 (3.21B)
Trainable Parameters
24,313,856 (24.3M) (0.75%)
Context Length
128,000 tokens (trained at max_seq_length 2,048)
Quantization
4-bit NF4 during training (load_in_4bit=True)
LoRA Rank (r)
16
LoRA Alpha (α)
16
LoRA Target Modules
q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Languages
EnglishGermanFrenchItalianPortugueseHindiSpanishThai
§03

Training Details

Dataset
mlabonne/FineTome-100k
Dataset Size
100,000 curated dialogue samples (ShareGPT → HuggingFace chat format)
Objective
Response-only causal language modelling (user inputs masked with label −100)
Framework
Unsloth 2026.1.4 + TRL
Hardware
Tesla T4 (14.741 GB VRAM)
Training Time
8.54 minutes (512 seconds, 60 steps)
Peak Memory
2.35 GB additional GPU memory
Max Steps
60
Hyperparameters
Learning Rate
2e-4
Batch Size
2
Gradient Accum.
4
Effective Batch
8
Optimizer
AdamW 8-bit
LR Scheduler
Linear
§04

Evaluation & Benchmarks

MetricValueBaselineDescription
Inference speed (T4 + Unsloth)40–80 tokens/s20–40 tokens/s (standard PyTorch)Throughput improvement from Unsloth kernel optimisations
Inference speed (RTX 4090)60–100+ tokens/sConsumer flagship GPU performance
Inference speed (CPU)5–15 tokens/sHigh-end CPU with Q4_K_M GGUF
Minimum RAM (Q4_K_M)4 GBPractical edge deployment threshold
GGUF Q4_K_M size~2.0 GBRecommended format — best size/quality tradeoff
§05

Base Model vs Fine-Tuned

Key improvements from fine-tuning on the mlabonne/FineTome-100k dataset versus the Llama-3.2-3B-Instruct base model.

DimensionBase (Llama-3.2-3B-Instruct)LumiChats v1.1
Instruction following qualityModerate (pretrain behaviour)Reinforced via response-only SFT
Multi-turn coherenceGoodSignificantly improved
Chat template supportManual configuration neededLlama 3.1 template pre-applied
Multilingual dialogueSupportedConsistent across 8 languages
Inference speedBaseline2x faster (Unsloth)
§06

Use Cases

Conversational AI and chat applications
Personal assistants for task management and information retrieval
On-device mobile AI assistants with no cloud dependency
Writing assistance and content generation
Document and conversation summarisation
Question-answering systems
Basic code explanation and assistance
§07

Limitations & Disclaimers

LumiChats v1.1 inherits limitations of its base architecture and training data.

Context understanding may degrade beyond 2,048 tokens despite 128K architectural capacity
Can generate plausible-sounding but incorrect information (hallucination risk)
Not optimised for highly technical or specialised domain tasks
No access to real-time information; training data has a fixed cutoff
LoRA parameters limited to attention and MLP layers only
60-step training means shorter SFT than a full epoch — production fine-tune is ongoing
§08

Citation

If you use LumiChats v1.1 in research or products, please cite:

@misc{lumichats2025,
  author    = {Aditya Kumar Jha},
  title     = {LumiChats v1.1: A Fine-Tuned Conversational AI Model on Llama 3.2 3B},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/adityakum667388/lumichats-v1.1},
  note      = {Fine-tuned with LoRA on FineTome-100k via Unsloth}
}
License: Llama 3.2 Community License (Meta) View full license on Hugging Face

Related Models