Medical Vision · v1.3 · December 15, 2024

LumiChats v1.3 11B Vision

Radiology-specialised vision model for clinical image description

Parameters: ~11B
Trainable: < 1%
Training time: Short 30-step fine-tuning run on a radiological dataset
Dataset: Radiology-specific image-caption pairs
License: Llama 3.2 Community License (Meta) + Apache 2.0 (fine-tune)
Substantially reduced hallucination rate vs. Llama 3.2 11B Vision Instruct
Trained on expert-annotated radiological image captions
60% memory reduction via 4-bit quantisation (Unsloth + bitsandbytes)
Generates professional clinical language with correct anatomical terminology
Fine-tunes all four component groups: vision, language, attention, and MLP layers
Compatible with vLLM, Transformers, and local llama.cpp deployment
§01

Abstract

LumiChats v1.3 11B Vision is a domain-adapted vision-language model fine-tuned from Meta's Llama 3.2 11B Vision Instruct for the specialised task of radiological image interpretation. The model is trained to generate structured, clinically accurate descriptions of panoramic radiographs, X-rays, and CT scans using precise medical terminology. Domain adaptation significantly reduces hallucinations compared to the general-purpose base model: fabricated pathologies such as nonexistent fractures and misalignments are replaced with focused, evidence-based descriptions of findings actually present in the image. The model is released in 4-bit quantised format, reducing GPU memory requirements by approximately 60% while maintaining high accuracy.
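As a sketch of how such a structured request might be issued, the snippet below builds a single-turn multimodal message in the chat format Hugging Face Transformers uses for Llama 3.2 Vision models (an image placeholder followed by a text instruction). The helper name and the instruction wording are illustrative assumptions, not part of the released model's API.

```python
def build_radiology_prompt(instruction: str) -> list:
    """Build a single-turn multimodal message in the chat format used by
    Llama 3.2 Vision in Hugging Face Transformers: one image placeholder
    followed by the text instruction. Pass the result to the processor's
    apply_chat_template together with the actual image."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": instruction},
            ],
        }
    ]

messages = build_radiology_prompt(
    "Describe this panoramic radiograph, noting only pathologies visible in the image."
)
```

The message list itself carries no pixel data; the image is supplied separately to the processor at inference time.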
§02

Architecture & Configuration

LumiChats v1.3 11B Vision is built on unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit using Low-Rank Adaptation (LoRA) — a parameter-efficient fine-tuning technique. Only < 1% of parameters are updated.

Architecture
Auto-regressive multimodal transformer with vision encoder (MLLaMA)
Total Parameters
~11B total
Trainable Parameters
LoRA-adapted subset (< 1%)
Context Length
Extended for image-text alignment
Quantization
4-bit bitsandbytes NF4
LoRA Rank (r)
16
LoRA Alpha (α)
16
LoRA Target Modules
Vision layers, Language layers, Attention modules, MLP layers
Languages
English medical terminology
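The "< 1%" trainable figure follows directly from the LoRA construction: each adapted weight matrix of shape (d_out, d_in) stays frozen, and only the low-rank factors B (d_out × r) and A (r × d_in) are trained. A rough sketch of the arithmetic, using an assumed 4096×4096 projection rather than the model's actual layer dimensions:

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    # LoRA learns the update dW = B @ A, where A has shape (r, d_in)
    # and B has shape (d_out, r); only these two factors are trained.
    return r * d_in + d_out * r

# Illustrative numbers only (assumed, not this model's real dims):
# one 4096x4096 projection adapted with r = 16, as in the table above.
full = 4096 * 4096                             # 16,777,216 frozen weights
lora = lora_trainable_params(4096, 4096, 16)   # 131,072 trainable weights
fraction = lora / full                         # ~0.78%, consistent with "< 1%"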
§03

Training Details

Dataset
Medical radiological image captioning dataset
Dataset Size
Radiology-specific image-caption pairs
Objective
Supervised fine-tuning: image → structured clinical description
Framework
Unsloth FastVisionModel + TRL
Hardware
Tesla T4 / A100-class GPU
Training Time
Short 30-step run on the radiological dataset
Peak Memory
Significantly reduced via 4-bit
Max Steps
30
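The 4-bit NF4 setup listed above would typically be expressed through a `BitsAndBytesConfig` when loading with Transformers. The exact flags used for this release are not published, so this is a plausible configuration fragment, not the verbatim one:

```python
import torch
from transformers import BitsAndBytesConfig

# Assumed NF4 settings: double quantisation and fp16 compute are common
# choices for Unsloth-style 4-bit loading, but are not confirmed here.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
```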
Hyperparameters
Learning Rate
2e-4
Batch Size
2
Gradient Accum.
4
Effective Batch
8
Optimizer
AdamW 8-bit
LR Scheduler
Linear
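Under these hyperparameters the effective batch size is simply the per-device batch times the gradient-accumulation steps, and the linear scheduler decays the learning rate from 2e-4 toward zero over the 30 steps. A minimal sketch (assuming no warmup, which the card does not specify):

```python
def linear_lr(step: int, max_steps: int, base_lr: float = 2e-4) -> float:
    """Linear decay from base_lr at step 0 down to 0 at max_steps."""
    return base_lr * max(0.0, 1.0 - step / max_steps)

batch_size, grad_accum = 2, 4
effective_batch = batch_size * grad_accum      # 8, matching the table

# Learning rate at each of the 30 optimiser steps.
lrs = [linear_lr(s, 30) for s in range(30)]
```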
§04

Evaluation & Benchmarks

Metric: Hallucination rate on panoramic radiographs
Value: Substantially reduced
Baseline: Base model fabricates fractures, misalignments, and absent findings
Description: Fine-tuned model reports only findings present in the image

Metric: Medical terminology accuracy
Value: Improved
Baseline: Base model uses general terms with occasional inaccuracies
Description: Uses precise clinical language (osteolytic, resorption, maxillary sinus floor)

Metric: Output conciseness
Value: Concise, actionable
Baseline: Base model produces long, speculative descriptions
Description: Reports are focused and clinically useful

Metric: Min VRAM (4-bit)
Value: ~4.2 GB
Description: 4-bit model size on disk / approximate VRAM requirement
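The quoted ~60% memory reduction is consistent with most, but not all, weights being stored in NF4. A hedged back-of-envelope follows: the 80% quantised fraction is an assumption chosen to reproduce the 60% figure, and the ~4.2 GB minimum-VRAM number additionally reflects on-disk size rather than this calculation.

```python
def quantised_weight_gb(n_params: float, frac_4bit: float) -> float:
    """Approximate weight-memory footprint in GB when frac_4bit of the
    parameters are stored as 4-bit NF4 (0.5 bytes each) and the rest stay
    in 16-bit (2 bytes each). Ignores activations, KV cache, and
    quantisation metadata."""
    bytes_total = n_params * (frac_4bit * 0.5 + (1 - frac_4bit) * 2.0)
    return bytes_total / 1e9

fp16 = quantised_weight_gb(11e9, 0.0)    # 22.0 GB in pure 16-bit
mixed = quantised_weight_gb(11e9, 0.8)   # ~8.8 GB if ~80% of weights are NF4
reduction = 1 - mixed / fp16             # ~0.60, matching the quoted 60%
```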
§05

Base Model vs Fine-Tuned

Key improvements from fine-tuning on the medical radiological image-captioning dataset versus the Llama-3.2-11B-Vision-Instruct-bnb-4bit base model.

Image type identification
Base (Llama-3.2-11B-Vision-Instruct-bnb-4bit): ✅ Identifies correctly
LumiChats v1.3 11B Vision: ✅ Exact and pathology-focused

Hallucinated pathologies
Base: ❌ Fabricates fractures and misalignments
LumiChats: ✅ None; ground-truth findings only

Medical terminology
Base: ⚠️ General terms, some inaccuracies
LumiChats: ✅ Professional clinical language

Output length
Base: 📝 Long and speculative
LumiChats: 📝 Concise, actionable reports

Clinical relevance
Base: ❌ Includes irrelevant details
LumiChats: ✅ Pathology-focused analysis
§06

Use Cases

Preliminary analysis of panoramic radiographs and dental X-rays
Medical education and radiology training
Clinical documentation draft generation for radiologist review
Teleradiology support for initial triage of imaging studies
Research in AI-assisted medical image interpretation
§07

Limitations & Disclaimers

LumiChats v1.3 11B Vision inherits limitations of its base architecture and training data.

Research tool only — not a diagnostic device; always consult qualified radiologists
Optimised for panoramic radiographs; performance on other imaging modalities varies
30-step training run; extended training may improve generalisation
Should not be used for final diagnostic or treatment decisions
Requires compliance with local healthcare AI regulations (HIPAA, GDPR, etc.)
§08

Citation

If you use LumiChats v1.3 11B Vision in research or products, please cite:

@misc{lumichats-llama32-vision-11b-4bit,
  author    = {LumiChats Team},
  title     = {LumiChats v1.3 11B: Radiology-Specialised Vision-Language Model},
  year      = {2024},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/adityakum667388/lumichats_v1.3_11b_vision}
}
License: Llama 3.2 Community License (Meta) + Apache 2.0 (fine-tune). View the full license on Hugging Face.
