Federated learning is a machine learning paradigm where a model is trained across multiple decentralised devices or servers — each holding local data — without the raw data ever leaving the device. Instead of sending data to a central server, each participant trains a local model on its own data and sends only the model updates (gradients or weights) to a central aggregator, which combines them into a global model update. Federated learning enables training on sensitive data (medical records, personal messages, financial transactions) while preserving privacy and complying with data localisation regulations.
How federated learning works
- Initialisation: A central server distributes the current global model to all participating clients (devices or institutions).
- Local training: Each client trains the model on its local data for E epochs, producing updated local weights.
- Upload gradients/weights: Each client sends only the model update (difference between local weights and global weights) to the server — not the raw training data.
- Aggregation: The server aggregates updates from all clients, typically using FedAvg (weighted average of updates proportional to local dataset sizes).
- Distribution: The aggregated global model is redistributed and the cycle repeats.
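The cycle above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production system: `local_train` is a hypothetical stand-in for E epochs of local SGD, using a toy quadratic objective whose minimum is each client's data mean.

```python
import numpy as np

def local_train(global_w, data, epochs=1, lr=0.1):
    """Stand-in for local training: pull weights towards the client's data mean."""
    w = global_w.copy()
    for _ in range(epochs):
        grad = w - data.mean(axis=0)  # gradient of a toy quadratic loss
        w -= lr * grad
    return w

def fedavg_round(global_w, client_datasets):
    """One FedAvg round: collect local updates, average them weighted by dataset size."""
    sizes = np.array([len(d) for d in client_datasets], dtype=float)
    updates = [local_train(global_w, d) - global_w for d in client_datasets]
    weights = sizes / sizes.sum()  # weight proportional to local dataset size
    aggregated = sum(p * u for p, u in zip(weights, updates))
    return global_w + aggregated

# Two clients with different data distributions and very different sizes.
rng = np.random.default_rng(0)
clients = [rng.normal(loc=m, size=(n, 2)) for m, n in [(0.0, 1000), (5.0, 100)]]
w = np.zeros(2)
for _ in range(50):
    w = fedavg_round(w, clients)
```

With this toy objective the global model converges towards the size-weighted average of the clients' optima, which is exactly the behaviour FedAvg's weighting is designed to produce.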
| Property | Traditional centralised training | Federated learning |
|---|---|---|
| Data location | All data sent to central server | Data stays on device — never transmitted |
| Privacy | Raw data exposed to and trusted with the central server | Raw data private by design; gradient leakage still possible |
| Communication cost | Data upload once | Model updates sent each round — potentially many rounds |
| Data heterogeneity | IID data (shuffled from one pool) | Non-IID: each device has different data distribution |
| Stragglers | Not relevant — central GPU cluster | Slow devices delay training; need asynchronous strategies |
Real deployments in 2026
Google uses federated learning for keyboard next-word prediction, autocorrect, and voice recognition on Android, training across large fleets of phones without ever seeing individual users' typed words. Apple uses it for Safari suggestions and Siri improvements on iOS. In healthcare, federated learning lets hospital networks collaboratively train diagnostic models on patient data without any hospital sharing records with the others. Federated learning is also frequently cited as a privacy-preserving technique in compliance discussions around regulations such as the EU AI Act and India's DPDP Act.
Limitations and active research areas
- Gradient leakage: Gradients sent to the aggregator can be used to reconstruct training data samples with surprising fidelity. Differential privacy (adding calibrated noise to gradients) mitigates this at the cost of model quality.
- Non-IID data: In real deployments, each device's data distribution can differ substantially from the others'. Standard FedAvg converges poorly on highly non-IID data, which remains an active research problem.
- Communication overhead: Frontier models have billions of parameters. Sending full gradient updates each round is bandwidth-prohibitive. Gradient compression, quantisation, and sparse updates are active research areas.
- Byzantine robustness: A malicious participant can send adversarial gradients to poison the global model. Robust aggregation algorithms (Krum, FLTrust) detect and exclude outlier updates.
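The differential-privacy mitigation mentioned above can be sketched as client-side clipping plus Gaussian noise. The values of `clip_norm` and `noise_multiplier` here are illustrative assumptions, not calibrated to any real privacy budget.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip the update's L2 norm, then add calibrated Gaussian noise."""
    if rng is None:
        rng = np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

u = np.array([3.0, 4.0])                    # raw update, L2 norm 5
priv = privatize_update(u, rng=np.random.default_rng(42))
```

Clipping bounds any single example's influence on the update; the noise then masks individual contributions at some cost to model quality, as noted above.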
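The simplest of the robust aggregation ideas above, the coordinate-wise median, is easy to demonstrate: a single poisoned update can drag a plain mean arbitrarily far, but barely moves the median. This is a toy sketch, not a full Krum or FLTrust implementation.

```python
import numpy as np

def robust_aggregate(updates):
    """Coordinate-wise median: resistant to a minority of outlier updates."""
    return np.median(np.stack(updates), axis=0)

honest = [np.array([1.0, 1.0]), np.array([1.1, 0.9]), np.array([0.9, 1.1])]
poisoned = np.array([1e6, -1e6])            # a malicious client's update
agg = robust_aggregate(honest + [poisoned])
mean = np.mean(np.stack(honest + [poisoned]), axis=0)
```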
Practice questions
- In FedAvg, client A has 1000 examples and client B has 100 examples. How are their updates weighted in aggregation? (Answer: FedAvg weights updates proportionally to dataset size. Client A's update gets weight 1000/(1000+100) ≈ 0.909; client B's gets weight 100/1100 ≈ 0.091. With a single local gradient step per round, this weighting makes the aggregated update equal to the gradient over the pooled dataset; with multiple local epochs it is only an approximation.)
- What is the non-IID problem in federated learning and why does it matter? (Answer: Non-IID (non-independently-identically-distributed): different clients have different data distributions. A keyboard FL client in Paris has French text; one in Tokyo has Japanese text. Their local gradients point in very different directions. FedAvg with non-IID data can diverge or converge to a poor global minimum. Strategies: FedProx (proximal term keeps local model close to global), SCAFFOLD (variance reduction), MOON (contrastive learning).)
- Why can gradient-only transmission still leak private information? (Answer: Gradient inversion attacks (Zhu et al. 2019): an adversarial server can reconstruct the original training data from gradients, especially for small batch sizes. The gradients of the loss with respect to the model weights encode enough information about the inputs to approximately reconstruct them. Defences: gradient compression (reducing information in updates), differential-privacy noise addition, and secure aggregation (the server never sees individual updates, only the aggregate).)
- Google uses federated learning for keyboard autocorrect on Android. Why not just collect keystroke data centrally? (Answer: Keystroke data is extremely private — it captures everything users type including passwords, medical searches, financial information, and personal messages. User privacy expectations and legal requirements (GDPR Article 5 data minimisation) make centralised collection problematic. FL allows Google to improve autocorrect quality from billions of devices while never transmitting typed text to Google servers.)
- What is the Byzantine fault tolerance problem in federated learning? (Answer: In federated settings, some clients may be adversarial (Byzantine clients), sending malicious gradients designed to poison the global model. Since the server cannot verify the integrity of updates from untrusted devices, a small fraction of malicious clients can corrupt training. Defences: robust aggregation methods (median, trimmed mean, Krum) that resist outlier updates, rather than simple averaging.)
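The arithmetic in the first practice question is worth checking explicitly: FedAvg weights each client's update by its share of the total example count.

```python
# Worked arithmetic for the FedAvg weighting question.
n_a, n_b = 1000, 100
w_a = n_a / (n_a + n_b)   # client A's aggregation weight
w_b = n_b / (n_a + n_b)   # client B's aggregation weight
```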
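The secure-aggregation defence mentioned in the gradient-leakage answer can be illustrated with pairwise masking. This is a toy sketch of the core cancellation idea only; real protocols (e.g. Bonawitz et al.'s secure aggregation) add key agreement, secret sharing, and dropout handling.

```python
import numpy as np

# Two clients agree on a shared random mask out-of-band. One adds it,
# the other subtracts it, so the masks cancel in the server's sum and
# the server only ever sees masked individual vectors.
rng = np.random.default_rng(7)
u1, u2 = np.array([1.0, 2.0]), np.array([3.0, 4.0])
mask = rng.normal(size=2)      # shared secret between clients 1 and 2
sent1 = u1 + mask              # what the server receives from client 1
sent2 = u2 - mask              # what the server receives from client 2
aggregate = sent1 + sent2      # masks cancel: equals u1 + u2 exactly
```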