
AI Bias & Fairness

Why AI systems can discriminate — and what we can do about it.


Definition

AI bias refers to systematic errors in AI model outputs that create unfair outcomes for certain groups of people — often related to race, gender, age, disability, or socioeconomic status. Bias enters through training data (reflecting historical inequalities), model architecture choices, evaluation metrics, and deployment decisions. Fairness in AI means designing and auditing systems to ensure their outputs are equitable across demographic groups.

Where bias comes from: the pipeline

Bias is not a single problem with a single fix — it enters at every stage of the AI development pipeline, often in hard-to-detect ways:

| Stage | Source of bias | Real-world example | Detection method |
|---|---|---|---|
| Data collection | Non-representative training data | Facial recognition trained mostly on light-skinned faces; error rate on dark skin was 34% vs 0.8% (Buolamwini & Gebru, 2018) | Demographic breakdown of dataset; representation audits |
| Label collection | Human annotator bias | Sentiment labelers rated the same text as more negative when written in African American English | Inter-annotator agreement per demographic; bias in annotation guidelines |
| Feature engineering | Proxy variables encode protected attributes | ZIP code encodes race; using it in a loan model discriminates indirectly | Correlation analysis between features and protected attributes |
| Model training | Class imbalance; optimization for average accuracy | High overall accuracy masks a 40% error rate on the minority class | Disaggregated evaluation metrics per subgroup |
| Evaluation | Benchmark datasets under-represent minority groups | A model scores 94% on a benchmark where 90% of test examples come from one group | Stratified evaluation; held-out subgroup test sets |
| Deployment | Distribution shift; feedback loops | A biased hiring model rejects minority candidates → less diverse training data → more bias next cycle | Monitoring production outputs; disparate impact audits |
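Disaggregated evaluation, the detection method for training-stage bias above, is simple to implement: compute the error rate separately for each group instead of one aggregate number. A minimal sketch with invented toy data (the groups and labels below are hypothetical):

```python
from collections import defaultdict

def disaggregated_error_rates(y_true, y_pred, groups):
    """Return the error rate for each demographic group separately."""
    totals, errors = defaultdict(int), defaultdict(int)
    for yt, yp, g in zip(y_true, y_pred, groups):
        totals[g] += 1
        errors[g] += int(yt != yp)
    return {g: errors[g] / totals[g] for g in totals}

# Hypothetical toy data: overall accuracy is 70%, but group "B" fares far worse.
y_true = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
groups = ["A"] * 6 + ["B"] * 4

print(disaggregated_error_rates(y_true, y_pred, groups))
# group A errs on ~17% of samples, group B on 50% — the average hides the gap
```

The same pattern extends to any metric (precision, recall, calibration): slice first, then aggregate.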

Definitions of fairness — and why they conflict

One of the most important (and counterintuitive) results in algorithmic fairness is that many common definitions of fairness are mathematically incompatible — you can't satisfy all of them simultaneously.

| Fairness definition | What it requires | Mathematical condition | Problem with it |
|---|---|---|---|
| Demographic parity | Equal positive prediction rates across groups | P(Ŷ=1 \| A=0) = P(Ŷ=1 \| A=1) | Ignores actual base rates; can force unequal error rates |
| Equal opportunity | Equal true positive rates (recall) across groups | P(Ŷ=1 \| Y=1, A=0) = P(Ŷ=1 \| Y=1, A=1) | Can allow very different false positive rates |
| Equalized odds | Equal TPR and FPR across groups | Both TPR and FPR equal across A | Mathematically incompatible with calibration when base rates differ |
| Calibration | Predicted probabilities match actual outcomes equally for all groups | P(Y=1 \| score=s, A=0) = P(Y=1 \| score=s, A=1) | Incompatible with equalized odds when base rates differ |
| Individual fairness | Similar individuals receive similar predictions | If d(x, x') is small, \|f(x) − f(x')\| should be small | Requires defining "similar" without bias |
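The conflict between these criteria is easy to exhibit numerically. In the sketch below (labels and predictions are invented), two groups get identical TPR and FPR, so equalized odds holds, yet their selection rates differ, so demographic parity fails:

```python
def rates(y_true, y_pred):
    """Selection rate, TPR, and FPR for one group's labels and predictions."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    return {
        "selection_rate": sum(y_pred) / len(y_pred),
        "tpr": sum(pos) / len(pos),
        "fpr": sum(neg) / len(neg),
    }

# Hypothetical groups with different base rates (0.5 vs 2/3).
g0_true, g0_pred = [1, 1, 0, 0], [1, 0, 0, 0]
g1_true, g1_pred = [1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 0, 0]

r0, r1 = rates(g0_true, g0_pred), rates(g1_true, g1_pred)
print(r0, r1)
# TPR = 0.5 and FPR = 0.0 in both groups, but selection rates are 0.25 vs 0.33
```

Because the base rates differ, equalizing the error rates forces the selection rates apart; equalizing selection rates would force the error rates apart.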

The impossibility theorem

Chouldechova (2017) and Kleinberg et al. (2016) proved that when base rates differ across groups, calibration, false positive rate parity, and false negative rate parity cannot all be achieved simultaneously. Any real system must choose which fairness criteria matter most for the specific application — there is no mathematically perfect solution.
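Chouldechova's argument rests on an algebraic identity: FPR = p/(1 − p) · (1 − PPV)/PPV · (1 − FNR), where p is the group's base rate. Hold PPV (a calibration-style criterion) and FNR fixed across groups, and unequal base rates force unequal FPRs. A quick check with hypothetical numbers:

```python
def implied_fpr(prevalence, ppv, fnr):
    """FPR implied by Chouldechova's identity given prevalence, PPV, and FNR."""
    return (prevalence / (1 - prevalence)) * ((1 - ppv) / ppv) * (1 - fnr)

# Same PPV and FNR in both groups, but different base rates (invented values).
fpr_a = implied_fpr(prevalence=0.3, ppv=0.8, fnr=0.2)
fpr_b = implied_fpr(prevalence=0.5, ppv=0.8, fnr=0.2)
print(fpr_a, fpr_b)  # ≈0.086 vs 0.2 — FPR parity is impossible here
```

No retraining can escape this: it is arithmetic, not a modeling limitation.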

Practical mitigation techniques

| Technique | When applied | How it works | Tradeoff |
|---|---|---|---|
| Data resampling / reweighting | Pre-processing | Oversample underrepresented groups; assign higher loss weights to minority samples | Can improve parity but may reduce overall accuracy |
| Adversarial debiasing | In-training | Train a classifier to predict the target AND an adversary to predict the protected attribute from representations; penalize the adversary | Adds training complexity; can be unstable |
| Reranking / post-processing | Post-processing | Adjust decision thresholds per group to equalize specified metrics | Requires group labels at inference; legally sensitive in some jurisdictions |
| Counterfactual data augmentation | Pre-processing | Generate versions of training examples with protected attributes swapped; train on both | Effective for text/NLP; harder for structured data |
| RLHF with fairness constraints | LLM fine-tuning | Include fairness criteria in human feedback; penalize biased outputs in reward model | Expensive; hard to define "fair" consistently across annotators |
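Of these, post-processing is the simplest to illustrate: pick a separate score threshold per group so that all groups end up with the same selection rate. A minimal sketch with invented scores (not any particular library's API):

```python
def group_thresholds(scores, groups, target_rate):
    """Choose a per-group threshold so each group selects ~target_rate
    of its members (a crude post-processing sketch)."""
    thresholds = {}
    for g in set(groups):
        g_scores = sorted((s for s, gg in zip(scores, groups) if gg == g),
                          reverse=True)
        k = max(1, round(target_rate * len(g_scores)))
        thresholds[g] = g_scores[k - 1]  # k-th highest score becomes the cutoff
    return thresholds

# Hypothetical model scores: group B scores lower overall.
scores = [0.9, 0.7, 0.6, 0.2, 0.8, 0.5, 0.4, 0.3]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

print(group_thresholds(scores, groups, target_rate=0.5))
# a single global threshold of 0.7 would select 2 As but only 1 B;
# per-group cutoffs (A: 0.7, B: 0.5) select half of each group
```

This is the tradeoff the table notes: the fix needs group membership at inference time, and explicitly group-specific thresholds can raise legal questions in some jurisdictions.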

Fairness auditing tools

Open-source libraries include Fairlearn (Microsoft), AI Fairness 360 (IBM's AIF360), the What-If Tool (Google), and Aequitas (University of Chicago). For LLMs specifically, the BOLD and WinoBias benchmarks measure representation and stereotype bias in generated text.

Practice questions

  1. What is the difference between disparate treatment and disparate impact in AI systems? (Answer: Disparate treatment: the AI explicitly uses a protected characteristic (race, gender, age) as an input to make decisions — intentional discrimination. Disparate impact: the AI uses neutral-seeming variables (zip code, name, education institution) that correlate with protected characteristics, producing discriminatory outcomes without explicit use of those characteristics. Both can be illegal under anti-discrimination law. Most AI bias cases involve disparate impact since training on historical data automatically captures proxy correlations.)
  2. What is the word embedding test for gender bias (WEAT) and what did studies find? (Answer: Word Embedding Association Test (WEAT): measures whether gendered words (he/she) are more similar to certain career or attribute words in embedding space. Caliskan et al. (2017) found: word2vec and GloVe associate 'programmer, engineer, scientist' more closely with male pronouns; 'nurse, teacher, librarian' more closely with female pronouns — mirroring US labour market statistics. These biases reflect historical data but are problematic when used in hiring/recommendation systems.)
  3. COMPAS is a recidivism prediction tool used in US courts. What bias issue did ProPublica identify? (Answer: ProPublica (2016) found COMPAS predicted Black defendants would reoffend at nearly twice the false positive rate of White defendants — Black defendants who did NOT reoffend were labelled high risk more often. Northpointe (COMPAS developer) argued the tool was 'fair' by calibration metric (equal accuracy across groups). This exemplifies the fairness impossibility theorem: ProPublica's definition (equal FPR) and Northpointe's definition (calibration) are mathematically incompatible when base rates differ.)
  4. What is 'technical debt' in AI fairness and why is it hard to address retroactively? (Answer: Technical debt: deploying a biased model creates a record of biased decisions (denied loans, failed interviews) that becomes the next round of training data if not carefully managed. The biased model's outputs may influence real-world distributions (denying loans to a community reduces economic activity, making future loan applications from that community look riskier). Retroactive debiasing requires: identifying the source of bias, retraining on corrected data, addressing real-world impacts of past decisions — none of which are technically straightforward.)
  5. What is 'intersectional fairness' and why is standard demographic fairness analysis insufficient? (Answer: Intersectional fairness (Crenshaw's intersectionality applied to ML): a model may be fair for Black individuals AND fair for women when evaluated separately, but unfair specifically for Black women. Standard fairness analysis evaluates one dimension at a time. Intersectional analysis evaluates all combinations of demographic groups. Practical challenge: small group sizes at intersections (e.g., 'non-binary Hispanic individuals') make statistical analysis unreliable. But ignoring intersections misses systematic harm to specific communities.)

