
POS Tagging, Chunking & Dependency Parsing

Labelling every word with its grammatical role and mapping sentence structure.


Definition

Part-of-speech (POS) tagging assigns a grammatical category (noun, verb, adjective, etc.) to each token. Chunking groups consecutive POS-tagged tokens into phrase-level units (noun phrases, verb phrases). Dependency parsing maps the syntactic relationships between words, identifying which word is the subject, object, or modifier of another. Together these three tasks make up the syntactic analysis layer of a classical NLP pipeline, and their output feeds information extraction, semantic analysis, and many other downstream applications.

Real-life analogy: The grammar teacher

POS tagging is like a grammar teacher marking parts of speech in a sentence: "The quick (adjective) brown (adjective) fox (noun) jumps (verb) over the lazy (adjective) dog (noun)." Dependency parsing goes further — the teacher also draws arrows showing "jumps" is the root, "fox" is its subject, "over the dog" is its location modifier. Chunking groups "the quick brown fox" into a single noun phrase box.

POS Tagging — Penn Treebank tagset

| POS Tag | Meaning | Example |
|---|---|---|
| NN | Noun, singular | dog, city, model |
| NNS | Noun, plural | dogs, cities, models |
| VB | Verb, base form | run, eat, train |
| VBZ | Verb, 3rd person singular present | runs, eats, trains |
| JJ | Adjective | quick, large, neural |
| RB | Adverb | quickly, very, never |
| DT | Determiner | the, a, an, this |
| IN | Preposition or subordinating conjunction | in, on, of, because |
| PRP | Personal pronoun | I, he, she, they |
| NNP | Proper noun, singular | London, Google, Ravi |
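The fine-grained Penn tags above are often collapsed into coarser categories when only the broad word class matters. The following is a minimal sketch of that idea, not a complete mapping: the lookup `PENN_TO_COARSE` and the helper `coarse_tag` are hypothetical names, the target labels follow the Universal POS naming (NOUN, VERB, and so on), and only the ten tags listed in the table are covered.

```python
# Hypothetical lookup covering only the ten Penn tags listed in the table.
PENN_TO_COARSE = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "PROPN",
    "VB": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "RB": "ADV", "DT": "DET",
    "IN": "ADP",   # simplification: IN also covers subordinating conjunctions
    "PRP": "PRON",
}

def coarse_tag(penn_tag: str) -> str:
    """Return a coarse category for a Penn tag, or 'X' if the tag is not in the lookup."""
    return PENN_TO_COARSE.get(penn_tag, "X")

# Hand-tagged input, so no tagger model is needed to run this sketch.
tagged = [("The", "DT"), ("dog", "NN"), ("barks", "VBZ"), ("loudly", "RB")]
coarse = [(word, coarse_tag(tag)) for word, tag in tagged]
print(coarse)
# [('The', 'DET'), ('dog', 'NOUN'), ('barks', 'VERB'), ('loudly', 'ADV')]
```

A real mapping would need every tag in the tagset and a way to tell prepositions from subordinating conjunctions, which the single IN tag does not distinguish.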

POS tagging with NLTK and spaCy

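import nltk
nltk.download('punkt_tab', quiet=True)                       # tokenizer data used by word_tokenize
nltk.download('averaged_perceptron_tagger_eng', quiet=True)  # POS tagger model
from nltk.tokenize import word_tokenize
from nltk import pos_tag

sentence = "The quick brown fox jumps over the lazy dog"
tokens = word_tokenize(sentence)   # split the sentence into word tokens
tagged = pos_tag(tokens)           # list of (token, Penn Treebank tag) pairs
print(tagged)
# Illustrative output; exact tags depend on the tagger model version:
# [('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'),
#  ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]

# spaCy: one pipeline call yields coarse tags (pos_), Penn tags (tag_) and dependency labels (dep_)
# Requires the model to be installed first: python -m spacy download en_core_web_sm
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
for token in doc:
    print(f"{token.text:<15} {token.pos_:<8} {token.tag_:<6} {token.dep_}")
# Apple           PROPN    NNP    nsubj
# is              AUX      VBZ    aux
# looking         VERB     VBG    ROOT
# at              ADP      IN     prep
# buying          VERB     VBG    pcomp
# ... (remaining tokens omitted)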

Chunking — shallow parsing

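Chunking (shallow parsing) groups tagged tokens into multi-word phrases without building a full parse tree. The most common chunk type is the noun phrase (NP): an optional determiner, any number of adjectives, and one or more nouns. A simple rule-based chunker matches regular expressions over the sequence of POS tags; chunkers can also be trained as sequence labellers.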

Noun phrase chunking with NLTK

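import nltk
from nltk import RegexpParser
from nltk.tokenize import word_tokenize
from nltk import pos_tag

# Needs the same tokenizer and tagger data as the POS tagging example
sentence = "He bought a brand new electric car from a local dealer"
tokens   = word_tokenize(sentence)
tagged   = pos_tag(tokens)

# Grammar, applied in order:
#   NP = optional DT, any number of JJ, one or more noun tags (NN, NNS, NNP, NNPS)
#   VP = a verb tag followed by an NP chunk found by the first rule
grammar = r"""
  NP: {<DT>?<JJ>*<NN.*>+}
  VP: {<VB.*><NP>}
"""
parser = RegexpParser(grammar)
tree   = parser.parse(tagged)
print(tree)   # bracketed form; tree.pretty_print() draws an ASCII-art tree instead

# Illustrative output, assuming the tagger labels "brand" as JJ:
# (S
#   He/PRP
#   (VP bought/VBD (NP a/DT brand/JJ new/JJ electric/JJ car/NN))
#   from/IN
#   (NP a/DT local/JJ dealer/NN))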
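To show what "regular expressions over POS tag sequences" means in practice, here is a small standard-library sketch of the NP rule. It is not how NLTK implements `RegexpParser`; the function name `np_chunks` is hypothetical, and the input is tagged by hand (with "brand" assumed to be JJ) so that no tagger model is required.

```python
import re

def np_chunks(tagged):
    """Return noun-phrase strings matching DT? JJ* NN+ over the tag sequence."""
    # Encode the tags as one string, e.g. "<PRP><VBD><DT><JJ>...".
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)

    # Record the character offset at which each token's tag starts.
    starts = []
    offset = 0
    for _, tag in tagged:
        starts.append(offset)
        offset += len(tag) + 2  # tag plus the two angle brackets

    # Optional determiner, any adjectives, one or more noun tags (NN, NNS, NNP, ...).
    pattern = re.compile(r"(?:<DT>)?(?:<JJ>)*(?:<NN[^>]*>)+")

    chunks = []
    for match in pattern.finditer(tag_string):
        first = starts.index(match.start())  # a match always begins at a tag boundary
        last = max(i for i, s in enumerate(starts) if s < match.end())
        chunks.append(" ".join(word for word, _ in tagged[first:last + 1]))
    return chunks

# Hand-tagged sentence (assumed tags, not tagger output).
tagged = [
    ("He", "PRP"), ("bought", "VBD"), ("a", "DT"), ("brand", "JJ"),
    ("new", "JJ"), ("electric", "JJ"), ("car", "NN"), ("from", "IN"),
    ("a", "DT"), ("local", "JJ"), ("dealer", "NN"),
]
print(np_chunks(tagged))
# ['a brand new electric car', 'a local dealer']
```

Because the pattern is matched left to right over the whole tag string, each chunk is the longest run that fits the rule, and the chunks never overlap.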

Dependency Parsing

Dependency parsing builds a tree in which each word points to its head (the word it modifies or depends on). Each sentence has exactly one root, usually the main verb, which has no head of its own. The tree captures grammatical relations such as subject (nsubj), direct object (dobj), adjectival modifier (amod), and prepositional modifier (prep).

Example: "She enjoys reading books." The root is enjoys. She attaches to enjoys as nsubj, reading attaches to enjoys as xcomp, and books attaches to reading as dobj.
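A dependency tree like this one can be stored as a flat list in which every token records the index of its head. The sketch below writes the example parse by hand rather than calling a parser; the helper names `root_index` and `children` and the use of -1 to mark the root are assumptions made for illustration.

```python
# Each token: (word, index of its head, relation label). -1 marks the root.
tokens = [
    ("She",      1, "nsubj"),
    ("enjoys",  -1, "ROOT"),
    ("reading",  1, "xcomp"),
    ("books",    2, "dobj"),
]

def root_index(tokens):
    """Return the index of the single root token, or raise if there is not exactly one."""
    roots = [i for i, (_, head, _) in enumerate(tokens) if head == -1]
    if len(roots) != 1:
        raise ValueError("a dependency tree needs exactly one root")
    return roots[0]

def children(tokens, head_index, relation=None):
    """Return the words attached to head_index, optionally filtered by relation label."""
    return [
        word
        for word, head, rel in tokens
        if head == head_index and (relation is None or rel == relation)
    ]

root = root_index(tokens)
print(tokens[root][0])                    # enjoys
print(children(tokens, root, "nsubj"))    # ['She']
print(children(tokens, 2, "dobj"))        # ['books']
```

Storing only the head index per token is enough to recover the whole tree, which is why parsers typically expose a head pointer and a relation label on each token.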

Universal Dependencies

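The Universal Dependencies (UD) project defines a consistent set of dependency relations across 100+ languages, enabling cross-lingual NLP. Stanza's models are trained on UD treebanks; spaCy's English models use a closely related label set (for example dobj, prep, pobj). Key relations in UD v2: nsubj (nominal subject), obj (direct object, called dobj in UD v1 and in spaCy's English models), amod (adjectival modifier), advmod (adverbial modifier), case (the marker linking a preposition to its noun, where spaCy's English labels use prep and pobj), cc (coordinating conjunction).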
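UD treebanks are distributed in the CoNLL-U format, where each token is one line of ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC) and a HEAD of 0 marks the root. The sketch below reads a hand-written three-token sample; the function name `read_conllu` is hypothetical, the sample uses the UD v2 label obj for the direct object, and only a few of the ten columns are kept.

```python
# Hand-written CoNLL-U sample for "She enjoys music" (unused columns are "_").
CONLLU = (
    "# text = She enjoys music\n"
    "1\tShe\tshe\tPRON\tPRP\t_\t2\tnsubj\t_\t_\n"
    "2\tenjoys\tenjoy\tVERB\tVBZ\t_\t0\troot\t_\t_\n"
    "3\tmusic\tmusic\tNOUN\tNN\t_\t2\tobj\t_\t_\n"
)

def read_conllu(text):
    """Parse token lines into dicts, keeping ID, FORM, UPOS, HEAD and DEPREL."""
    rows = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        rows.append({
            "id": int(cols[0]),
            "form": cols[1],
            "upos": cols[3],
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    return rows

rows = read_conllu(CONLLU)
form_by_id = {row["id"]: row["form"] for row in rows}
for row in rows:
    head = form_by_id.get(row["head"], "ROOT")  # HEAD 0 has no token, so it prints as ROOT
    print(f'{row["deprel"]}({head}, {row["form"]})')
# nsubj(enjoys, She)
# root(ROOT, enjoys)
# obj(enjoys, music)
```

For real treebanks a dedicated CoNLL-U reader is the safer choice, since this sketch ignores morphological features, enhanced dependencies, and sentence boundaries.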

| Analysis type | Output | Use case |
|---|---|---|
| POS tagging | Token → grammatical tag | Feature for NER, chunking, parsing |
| Chunking | Token spans → phrase type | Information extraction, shallow syntax for fast pipelines |
| Constituency parsing | Full phrase-structure tree | Grammar checking, formal syntax analysis |
| Dependency parsing | Word → head + relation | NLU, semantic role labelling, QA, coreference |

Practice questions

  1. POS tag the sentence "The dog barks loudly." (Answer: The/DT dog/NN barks/VBZ loudly/RB ./.)
  2. What is the difference between constituency parsing and dependency parsing? (Answer: Constituency builds a phrase-structure tree (NP, VP). Dependency builds a word-relation tree showing which word governs which.)
  3. In the NP chunk grammar {<DT>?<JJ>*<NN.*>+}, what does the ? mean? (Answer: Optional. The determiner DT may appear zero or one time.)
  4. Why is POS tagging considered a sequence labelling problem? (Answer: The correct tag for a word depends on its context: "run" is NN in "a run" but VBP, a present-tense verb, in "I run". Models must therefore consider the surrounding sequence rather than each word in isolation.)
  5. Which dependency relation connects "She" to "enjoys" in "She enjoys music"? (Answer: nsubj — nominal subject. "She" is the subject of the root verb "enjoys".)

On LumiChats

When you paste a document into LumiChats and ask it to extract key entities or summarise the main actions, the underlying model has to recover subject-verb-object relationships, the same structure a dependency parse makes explicit, in order to answer questions like who did what to whom.
