§01
Abstract
LumiChats Instruct 14B is a conversational AI model fine-tuned from Microsoft's Phi-4, a 14B-parameter language model. Phi-4, the fourth generation of Microsoft's Phi series (the "4" is a version number, not a parameter count), achieves remarkable benchmark performance for its size: 56.1% on GPQA graduate-level science questions (outperforming GPT-4o) and 80.4% on competition mathematics. LumiChats Instruct 14B applies LoRA fine-tuning to the 99,990 curated dialogues of FineTome-100k using response-only training, turning the base model's pretraining behaviour into purpose-built conversational structure: reinforced instruction following, consistent chat-template application, and more coherent multi-turn dialogue, while leaving 99.55% of the base model's parameters untouched.
§02
Architecture & Configuration
LumiChats Instruct 14B is built on microsoft/phi-4 using Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning technique that updates only 0.45% of the model's parameters.
Architecture
Dense decoder-only transformer — 40 layers, 14.7B parameters
Total Parameters
14,725,043,200 (14.7B)
Trainable Parameters
65,536,000 (65.5M) (0.45%)
Context Length
2,048 tokens (trained); base supports 16,384 tokens
Quantization
4-bit NF4 (bitsandbytes) — ~8 GB VRAM inference
LoRA Rank (r)
16
LoRA Alpha (α)
16
LoRA Target Modules
q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Languages
English (primary); multilingual via Phi-4 base
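For reference, the configuration above corresponds roughly to the following bitsandbytes + PEFT setup. This is a minimal sketch rather than the exact training code (the run itself used Unsloth's wrappers); the dropout value and variable names are illustrative assumptions.

```python
# Sketch: load microsoft/phi-4 in 4-bit NF4 and attach a LoRA adapter
# matching the configuration listed above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # 4-bit NF4 quantization (~8 GB VRAM at inference)
    bnb_4bit_compute_dtype=torch.float16,  # fp16 compute, suitable for a T4
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-4",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-4")

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.0,                      # assumption: dropout not reported above
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # ~65.5M trainable of ~14.7B total (~0.45%)
```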
§03
Training Details
Dataset
mlabonne/FineTome-100k
Dataset Size
99,990 samples (ShareGPT → HuggingFace multi-turn format)
Objective
Response-only causal LM — loss on assistant turns only
Framework
Unsloth 2026.2.1 + TRL 0.22.2
Hardware
Tesla T4 — 14.563 GB VRAM (Google Colab)
Training Time
19.28 minutes (30 steps)
Peak Memory
13.242 GB (90.9% of T4 VRAM)
Max Steps
30
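The ShareGPT-to-multi-turn conversion and chat templating can be sketched as follows, assuming Unsloth's chat-template helpers (standardize_sharegpt, get_chat_template) and the tokenizer from the earlier sketch; column names follow the FineTome-100k schema, and the exact preprocessing in the original notebook may differ.

```python
# Sketch: convert FineTome-100k from ShareGPT format to the HF "messages"
# format and render each conversation with the Phi-4 chat template.
from datasets import load_dataset
from unsloth.chat_templates import get_chat_template, standardize_sharegpt

dataset = load_dataset("mlabonne/FineTome-100k", split="train")   # 99,990 conversations
dataset = standardize_sharegpt(dataset)                           # {"from": "human"} -> {"role": "user"}

tokenizer = get_chat_template(tokenizer, chat_template="phi-4")   # tokenizer from the previous sketch

def to_text(batch):
    # Render each multi-turn conversation into a single training string.
    return {"text": [tokenizer.apply_chat_template(conv, tokenize=False)
                     for conv in batch["conversations"]]}

dataset = dataset.map(to_text, batched=True)
```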
Hyperparameters
Learning Rate
2e-4
Batch Size
2
Gradient Accum.
4
Effective Batch
8
Optimizer
AdamW 8-bit
LR Scheduler
Linear
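Put together, the hyperparameters above map onto roughly the following TRL SFTTrainer setup, with Unsloth's train_on_responses_only providing the response-only loss masking. The warmup value and output directory are assumptions, and the instruction/response markers reflect the Phi-4 template; treat this as an approximation of the reported run, not the authors' script.

```python
# Sketch: supervised fine-tuning run matching the hyperparameters above,
# with the loss computed on assistant turns only.
from trl import SFTConfig, SFTTrainer
from unsloth.chat_templates import train_on_responses_only

trainer = SFTTrainer(
    model=model,                        # PEFT-wrapped model from the earlier sketch
    processing_class=tokenizer,         # "tokenizer=" in older TRL releases
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size of 8
        max_steps=30,
        learning_rate=2e-4,
        optim="adamw_8bit",
        lr_scheduler_type="linear",
        warmup_steps=5,                 # assumption: warmup not reported above
        logging_steps=1,
        output_dir="outputs",           # assumption: output path not reported
    ),
)

# Restrict the loss to assistant responses; the marker strings assume the
# Phi-4 <|im_start|>/<|im_sep|> chat template.
trainer = train_on_responses_only(
    trainer,
    instruction_part="<|im_start|>user<|im_sep|>",
    response_part="<|im_start|>assistant<|im_sep|>",
)
trainer.train()
```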
§04
Evaluation & Benchmarks
| Metric | Value | Baselines | Description |
|---|---|---|---|
| MMLU (General Knowledge) | 84.8% | GPT-4o-mini: 81.8%; Llama 3.3 70B: 86.3% | Massive Multitask Language Understanding (5-shot) |
| GPQA (Graduate Science) | 56.1% 🏆 | GPT-4o: 50.6%; GPT-4o-mini: 40.9% | Graduate-level science reasoning; Phi-4 outperforms GPT-4o despite far fewer parameters |
| MATH (Competition Mathematics) | 80.4% | GPT-4o-mini: 73.0%; Llama 3.3 70B: 66.3% | Competition math problems |
| HumanEval (Code Generation) | 82.6% | GPT-4o: 90.6%; GPT-4o-mini: 86.2% | Python function generation from docstrings |
| MGSM (Multilingual Math) | 80.6% | — | Multi-language grade-school math reasoning |
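For readers who want to spot-check one of these numbers against the fine-tuned checkpoint, a rough 5-shot MMLU run with EleutherAI's lm-evaluation-harness might look like the sketch below; the model id, 4-bit flag, and batch size are illustrative assumptions rather than the setup used for the table.

```python
# Sketch: spot-check MMLU (5-shot) with lm-evaluation-harness.
# Swap in the fine-tuned checkpoint to compare against the base model.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=microsoft/phi-4,load_in_4bit=True",
    tasks=["mmlu"],
    num_fewshot=5,
    batch_size=4,
)
print(results["results"])
```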
§05
Base Model vs Fine-Tuned
Key improvements from fine-tuning on the mlabonne/FineTome-100k dataset versus the phi-4 base model.
| Dimension | Base (phi-4) | LumiChats Instruct 14B |
|---|---|---|
| Multi-turn conversation quality | Generic, unoptimised pretrain behaviour | ✅ Purpose-built structured dialogue |
| Instruction following | Moderate (raw pretrain) | ✅ Reinforced via response-only SFT |
| Chat template | Manual configuration required | ✅ Phi-4 <|im_start|> template pre-applied |
| Training data | ~9.8T tokens of filtered web, book, code, and synthetic data | ✅ 99,990 curated conversational samples |
| Base knowledge preserved | N/A | ✅ 99.55% of weights untouched |
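Because the chat template is already applied, the adapter can be prompted directly through tokenizer.apply_chat_template. The sketch below loads the adapter with peft's AutoPeftModelForCausalLM using the repo id from the citation section; it assumes the adapter repo also ships the tokenizer (otherwise load it from microsoft/phi-4), and the rendered <|im_start|>/<|im_sep|> markers reflect the Phi-4 template.

```python
# Sketch: multi-turn prompting through the pre-applied chat template.
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

repo_id = "adityakum667388/LumiChats-Instruct-4B_lora"   # from the citation section
model = AutoPeftModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

messages = [
    {"role": "user", "content": "Explain low-rank adaptation in two sentences."},
]
# Renders to the Phi-4 format:
# <|im_start|>user<|im_sep|>...<|im_end|><|im_start|>assistant<|im_sep|>
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```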
§06
Use Cases
High-quality conversational AI requiring strong scientific and mathematical reasoning
Code generation and technical assistance
Research and educational dialogue systems
Enterprise Q&A applications requiring reliable instruction following
Applications where strong reasoning is more important than context length
§07
Limitations & Disclaimers
LumiChats Instruct 14B inherits limitations of its base architecture and training data.
30-step demonstration fine-tune; a full epoch (~12,500 steps at an effective batch size of 8) would be expected to yield substantially stronger alignment
Context window limited to 2,048 tokens in this release; base Phi-4 supports 16K
English-primary — multilingual support inherited from base (~8% multilingual training data)
Factual hallucination possible on knowledge-heavy queries
Peak training memory of 13+ GB on the T4; even 4-bit inference needs roughly 8 GB of VRAM, which limits CPU-only and low-VRAM deployment
§08
Citation
If you use LumiChats Instruct 14B in research or products, please cite:
@misc{lumichats-instruct-14b-lora-2026,
author = {LumiChats},
title = {LumiChats Instruct 14B LoRA: Fine-Tuned Microsoft Phi-4 for Conversational AI},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/adityakum667388/LumiChats-Instruct-4B_lora}
}

License: MIT. View the full license on Hugging Face.