§01
Abstract
LumiChats Instruct 14B is a conversational AI model fine-tuned from Microsoft's Phi-4, a 14B-parameter language model. Phi-4, the fourth generation of Microsoft's Phi series (the "4" is a version number, not a parameter count), achieves remarkable benchmark performance for its size: 56.1% on GPQA graduate-level science questions (outperforming GPT-4o) and 80.4% on competition mathematics. LumiChats Instruct 14B applies LoRA fine-tuning to the 99,990 curated dialogues of FineTome-100k using response-only training, turning the base model's pretraining behaviour into purpose-built conversational structure: reinforced instruction following, consistent chat-template application, and more coherent multi-turn dialogue, while leaving 99.55% of the base model's parameters untouched.
§02
Architecture & Configuration
LumiChats Instruct 14B is built on microsoft/phi-4 using Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning technique that updates only 0.45% of the model's parameters.
Architecture
Dense decoder-only transformer — 40 layers, 14.7B parameters
Total Parameters
14,725,043,200 (14.7B)
Trainable Parameters
65,536,000 (65.5M) (0.45%)
Context Length
2,048 tokens (trained); base supports 16,384 tokens
Quantization
4-bit NF4 (bitsandbytes) — ~8 GB VRAM inference
LoRA Rank (r)
16
LoRA Alpha (α)
16
LoRA Target Modules
q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Languages
English (primary); multilingual via Phi-4 base
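For reference, the configuration above corresponds roughly to the following bitsandbytes + PEFT setup. This is a minimal sketch rather than the exact training code (the run itself used Unsloth's wrappers); the dropout value and variable names are illustrative assumptions.

```python
# Sketch: load microsoft/phi-4 in 4-bit NF4 and attach a LoRA adapter
# matching the configuration listed above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # 4-bit NF4 quantization (~8 GB VRAM at inference)
    bnb_4bit_compute_dtype=torch.float16,  # fp16 compute, suitable for a T4
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-4",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-4")

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.0,                      # assumption: dropout not reported above
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # ~65.5M trainable of ~14.7B total (~0.45%)
```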
§03
Training Details
Dataset
mlabonne/FineTome-100k
Dataset Size
99,990 samples (ShareGPT → HuggingFace multi-turn format)
Objective
Response-only causal LM — loss on assistant turns only
Framework
Unsloth 2026.2.1 + TRL 0.22.2
Hardware
Tesla T4 — 14.563 GB VRAM (Google Colab)
Training Time
19.28 minutes (30 steps)
Peak Memory
13.242 GB (90.9% of T4 VRAM)
Max Steps
30
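The ShareGPT-to-multi-turn conversion and chat templating can be sketched as follows, assuming Unsloth's chat-template helpers (standardize_sharegpt, get_chat_template) and the tokenizer from the earlier sketch; column names follow the FineTome-100k schema, and the exact preprocessing in the original notebook may differ.

```python
# Sketch: convert FineTome-100k from ShareGPT format to the HF "messages"
# format and render each conversation with the Phi-4 chat template.
from datasets import load_dataset
from unsloth.chat_templates import get_chat_template, standardize_sharegpt

dataset = load_dataset("mlabonne/FineTome-100k", split="train")   # 99,990 conversations
dataset = standardize_sharegpt(dataset)                           # {"from": "human"} -> {"role": "user"}

tokenizer = get_chat_template(tokenizer, chat_template="phi-4")   # tokenizer from the previous sketch

def to_text(batch):
    # Render each multi-turn conversation into a single training string.
    return {"text": [tokenizer.apply_chat_template(conv, tokenize=False)
                     for conv in batch["conversations"]]}

dataset = dataset.map(to_text, batched=True)
```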
Hyperparameters
Learning Rate
2e-4
Batch Size
2
Gradient Accum.
4
Effective Batch
8
Optimizer
AdamW 8-bit
LR Scheduler
Linear
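Put together, the hyperparameters above map onto roughly the following TRL SFTTrainer setup, with Unsloth's train_on_responses_only providing the response-only loss masking. The warmup value and output directory are assumptions, and the instruction/response markers reflect the Phi-4 template; treat this as an approximation of the reported run, not the authors' script.

```python
# Sketch: supervised fine-tuning run matching the hyperparameters above,
# with the loss computed on assistant turns only.
from trl import SFTConfig, SFTTrainer
from unsloth.chat_templates import train_on_responses_only

trainer = SFTTrainer(
    model=model,                        # PEFT-wrapped model from the earlier sketch
    processing_class=tokenizer,         # "tokenizer=" in older TRL releases
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size of 8
        max_steps=30,
        learning_rate=2e-4,
        optim="adamw_8bit",
        lr_scheduler_type="linear",
        warmup_steps=5,                 # assumption: warmup not reported above
        logging_steps=1,
        output_dir="outputs",           # assumption: output path not reported
    ),
)

# Restrict the loss to assistant responses; the marker strings assume the
# Phi-4 <|im_start|>/<|im_sep|> chat template.
trainer = train_on_responses_only(
    trainer,
    instruction_part="<|im_start|>user<|im_sep|>",
    response_part="<|im_start|>assistant<|im_sep|>",
)
trainer.train()
```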
§04
Evaluation & Benchmarks
| Metric | Value | Baselines | Description |
|---|---|---|---|
| MMLU (General Knowledge) | 84.8% | GPT-4o-mini: 81.8%; Llama 3.3 70B: 86.3% | Massive Multitask Language Understanding (5-shot) |
| GPQA (Graduate Science) | 56.1% 🏆 | GPT-4o: 50.6%; GPT-4o-mini: 40.9% | Graduate-level science reasoning; Phi-4 outperforms GPT-4o despite far fewer parameters |
| MATH (Competition Mathematics) | 80.4% | GPT-4o-mini: 73.0%; Llama 3.3 70B: 66.3% | Competition math problems |
| HumanEval (Code Generation) | 82.6% | GPT-4o: 90.6%; GPT-4o-mini: 86.2% | Python function generation from docstrings |
| MGSM (Multilingual Math) | 80.6% | — | Multi-language grade-school math reasoning |
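For readers who want to spot-check one of these numbers against the fine-tuned checkpoint, a rough 5-shot MMLU run with EleutherAI's lm-evaluation-harness might look like the sketch below; the model id, 4-bit flag, and batch size are illustrative assumptions rather than the setup used for the table.

```python
# Sketch: spot-check MMLU (5-shot) with lm-evaluation-harness.
# Swap in the fine-tuned checkpoint to compare against the base model.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=microsoft/phi-4,load_in_4bit=True",
    tasks=["mmlu"],
    num_fewshot=5,
    batch_size=4,
)
print(results["results"])
```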
§05
Base Model vs Fine-Tuned
Key improvements from fine-tuning on the mlabonne/FineTome-100k dataset versus the phi-4 base model.
| Dimension | Base (phi-4) | LumiChats Instruct 14B |
|---|---|---|
| Multi-turn conversation quality | Generic, unoptimised pretrain behaviour | ✅ Purpose-built structured dialogue |
| Instruction following | Moderate (raw pretrain) | ✅ Reinforced via response-only SFT |
| Chat template | Manual configuration required | ✅ Phi-4 <|im_start|> template pre-applied |
| Training data | ~9.8T tokens of filtered web, book, code, and synthetic data | ✅ 99,990 curated conversational samples |
| Base knowledge preserved | N/A | ✅ 99.55% of weights untouched |
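Because the chat template is already applied, the adapter can be prompted directly through tokenizer.apply_chat_template. The sketch below loads the adapter with peft's AutoPeftModelForCausalLM using the repo id from the citation section; it assumes the adapter repo also ships the tokenizer (otherwise load it from microsoft/phi-4), and the rendered <|im_start|>/<|im_sep|> markers reflect the Phi-4 template.

```python
# Sketch: multi-turn prompting through the pre-applied chat template.
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

repo_id = "adityakum667388/LumiChats-Instruct-4B_lora"   # from the citation section
model = AutoPeftModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

messages = [
    {"role": "user", "content": "Explain low-rank adaptation in two sentences."},
]
# Renders to the Phi-4 format:
# <|im_start|>user<|im_sep|>...<|im_end|><|im_start|>assistant<|im_sep|>
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```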
§06
Use Cases
High-quality conversational AI requiring strong scientific and mathematical reasoning
Code generation and technical assistance
Research and educational dialogue systems
Enterprise Q&A applications requiring reliable instruction following
Applications where strong reasoning is more important than context length
§07
Limitations & Disclaimers
LumiChats Instruct 14B inherits limitations of its base architecture and training data.
30-step demonstration fine-tune; a full epoch (~12,500 steps at an effective batch size of 8) would be expected to yield substantially stronger alignment
Context window limited to 2,048 tokens in this release; base Phi-4 supports 16K
English-primary — multilingual support inherited from base (~8% multilingual training data)
Factual hallucination possible on knowledge-heavy queries
Peak training memory of 13+ GB on the T4; even 4-bit inference needs roughly 8 GB of VRAM, which limits CPU-only and low-VRAM deployment
§08
Citation
If you use LumiChats Instruct 14B in research or products, please cite:
@misc{lumichats-instruct-14b-lora-2026,
author = {LumiChats},
title = {LumiChats Instruct 14B LoRA: Fine-Tuned Microsoft Phi-4 for Conversational AI},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/adityakum667388/LumiChats-Instruct-4B_lora}
}

License: MIT. View the full license on Hugging Face.