Question Answering (QA) is an NLP task where a system reads a context passage and produces a direct answer to a natural-language question. There are three main types: <strong>Extractive QA</strong> (span extraction from context — the answer is a substring of the passage), <strong>Generative QA</strong> (generates free-form answers), and <strong>Open-Domain QA</strong> (no given context — the system must retrieve relevant documents first, then answer). SQuAD (the Stanford Question Answering Dataset) is the benchmark that drove modern QA research, and RAG (Retrieval-Augmented Generation) is the modern production architecture.
Real-life analogy: The open-book vs closed-book exam
Extractive QA is like an open-book exam where you must find and quote the exact sentence from the textbook that answers the question. Generative QA is like explaining the answer in your own words. Open-domain QA is like a closed-book exam — you must recall (or retrieve) relevant knowledge first, then reason about it. LLMs like GPT-4 do a mix: they have knowledge memorised in weights, but RAG gives them an open book.
Extractive QA — span prediction with BERT
Extractive QA models predict two token positions in the context: the start and end of the answer span. BERT-based models fine-tuned on SQuAD achieve near-human F1 scores by leveraging bidirectional context.
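Under the hood, the model emits one start logit and one end logit per token, and decoding picks the valid span (start ≤ end, length capped) with the highest combined score. A minimal sketch of that decoding step with toy logits — the numbers are made up for illustration:

```python
import numpy as np

def best_span(start_logits, end_logits, max_len=30):
    """Pick the highest-scoring valid span (start <= end, length capped)."""
    best_score, best = -np.inf, (0, 0)
    for s in range(len(start_logits)):
        for e in range(s, min(s + max_len, len(end_logits))):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best

# Toy logits over 6 context tokens: the model is most confident
# that the answer starts at token 2 and ends at token 3.
start_logits = np.array([0.1, 0.2, 5.0, 0.3, 0.1, 0.0])
end_logits   = np.array([0.0, 0.1, 0.2, 4.5, 0.3, 0.1])
print(best_span(start_logits, end_logits))  # (2, 3)
```

Production implementations vectorise this search and also compare the best span score against a "no answer" score (needed for SQuAD 2.0), but the core idea is exactly this argmax over start/end pairs.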
Extractive QA with Hugging Face
from transformers import pipeline
# RoBERTa fine-tuned on SQuAD 2.0
qa = pipeline("question-answering",
              model="deepset/roberta-base-squad2")
context = """
The transformer architecture was introduced in the paper "Attention Is All
You Need" by Vaswani et al. in 2017. It replaced recurrent neural networks
with a self-attention mechanism, enabling parallelisation and better
modelling of long-range dependencies. The encoder processes the input
sequence while the decoder generates the output sequence.
"""
questions = [
"Who introduced the transformer architecture?",
"What did transformers replace?",
"What year was the transformer introduced?",
"What does the encoder do?",
]
for q in questions:
    result = qa(question=q, context=context)
    print(f"Q: {q}")
    print(f"A: {result['answer']} (score: {result['score']:.2%})")
    print()
# Output:
# Q: Who introduced the transformer architecture?
# A: Vaswani et al. (score: 89.23%)
# Q: What year was the transformer introduced?
# A: 2017 (score: 96.41%)
# ...

Open-Domain QA and RAG
Open-domain QA requires retrieving relevant passages before answering — the system does not have a given context. The retrieval-augmented generation (RAG) pipeline:
- Query encoding: Convert the question to a dense vector using a bi-encoder (e.g., DPR — Dense Passage Retrieval).
- Retrieval: Search a vector database (FAISS, Pinecone, Chroma) for the top-k most similar document chunks using approximate nearest-neighbour search.
- Reading / Generation: Pass the retrieved chunks + question to a reader model (BERT for extractive, GPT/BART for generative) to produce the final answer.
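The three steps above can be sketched end-to-end with toy components. The `embed` function below is a hashed bag-of-words stand-in for a trained bi-encoder like DPR, and the final generation step is represented by the assembled prompt — a real pipeline would swap in a dense neural encoder, a vector database, and an LLM call:

```python
import numpy as np

def embed(text, dim=64):
    """Toy stand-in for a trained bi-encoder (e.g. DPR): hashes words
    into a normalised bag-of-words vector."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        word = word.strip(".,?!\"")
        vec[sum(ord(c) for c in word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# 1. Query encoding + indexing: embed each document chunk
#    (a vector DB like FAISS would store and search these)
chunks = [
    "The transformer was introduced in 2017 by Vaswani et al.",
    "Paris is the capital of France.",
    "FAISS is a library for efficient similarity search.",
]
index = np.stack([embed(c) for c in chunks])

# 2. Retrieval: top-k chunks by dot-product similarity to the query
query = "Who introduced the transformer?"
scores = index @ embed(query)
top_k = np.argsort(scores)[::-1][:2]

# 3. Reading / generation: pass retrieved chunks + question to the reader
prompt = "Answer using only the context below.\n"
prompt += "\n".join(chunks[i] for i in top_k)
prompt += f"\nQuestion: {query}"
print(chunks[top_k[0]])  # the most relevant chunk
```

The key design point is that retrieval and generation are decoupled: you can improve answers by improving the retriever (better embeddings, better chunking) without touching the generator at all.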
| QA type | Context given? | Retrieval needed? | Answer type | Model |
|---|---|---|---|---|
| Extractive | Yes | No | Span from context | BERT-SQuAD, RoBERTa |
| Abstractive | Yes | No | Free-form generated | T5, BART, GPT-4 |
| Open-Domain (RAG) | No (retrieved) | Yes | Free-form generated | DPR + GPT-4, Llama |
| Closed-Book | No | No (LLM memory) | Free-form (may hallucinate) | GPT-4, Claude, Gemini |
SQuAD and SQuAD 2.0
SQuAD (Stanford QA Dataset) has 100k+ Q&A pairs from Wikipedia. SQuAD 2.0 added 50k unanswerable questions (the answer is not in the passage) — models must also learn to say "I don't know" instead of always extracting a span. This tests reading comprehension more rigorously. EM (Exact Match) and F1 over answer tokens are the standard metrics.
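EM and F1 are simple to compute. A sketch of SQuAD-style scoring — the official evaluation script follows the same normalisation (lowercase, strip punctuation and articles) but differs in some details, such as averaging over multiple gold answers:

```python
import re
import string
from collections import Counter

def normalize(s):
    """SQuAD-style normalisation: lowercase, drop punctuation and articles."""
    s = "".join(ch for ch in s.lower() if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(pred, gold):
    """1 if the normalised strings are identical, else 0 — no partial credit."""
    return int(normalize(pred) == normalize(gold))

def f1(pred, gold):
    """Token-level F1: rewards partial overlap between prediction and gold."""
    p, g = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Vaswani et al.", "Vaswani et al"))  # 1 (normalisation)
print(round(f1("in the year 2017", "2017"), 2))            # 0.5 (partial credit)
```

Note how normalisation makes EM forgiving about articles and punctuation but nothing else, while F1 gives the verbose prediction half credit for containing the gold answer.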
Practice questions
- What are the two positions that an extractive QA model predicts? (Answer: Start token index and end token index of the answer span within the context passage.)
- Why does RAG reduce hallucination compared to closed-book LLM QA? (Answer: RAG grounds the answer in retrieved documents — the model is conditioned on actual retrieved text, not solely on memorised training weights that may be outdated or incorrect.)
- What does EM (Exact Match) measure in QA evaluation? (Answer: The percentage of predictions that exactly match the ground-truth answer string after normalisation (lowercasing, removing punctuation and articles). It is a strict metric with no partial credit; token-level F1 complements it by rewarding overlap.)
- DPR (Dense Passage Retrieval) uses a bi-encoder. What are the two encoders? (Answer: A question encoder and a passage encoder. Both trained so that relevant question-passage pairs have high dot-product similarity in embedding space.)
- What makes SQuAD 2.0 harder than SQuAD 1.1? (Answer: SQuAD 2.0 includes unanswerable questions. Models must detect when no answer exists in the context instead of always extracting a span — requires reasoning about absence of evidence.)
On LumiChats
LumiChats uses a RAG pipeline for document QA: paste a PDF or document, and the system retrieves the most relevant chunks and generates a grounded answer with citations. This is extractive + generative QA in production.
Try it free