Part-of-Speech (POS) tagging assigns a grammatical category (noun, verb, adjective, etc.) to each token. Chunking groups consecutive POS-tagged tokens into phrase-level units such as noun phrases and verb phrases. Dependency parsing maps the syntactic relationships between words — identifying which word is the subject, object, or modifier of another. Together, these three tasks form the syntactic analysis layer of the NLP pipeline and are prerequisites for information extraction, semantic analysis, and many downstream applications.
Real-life analogy: The grammar teacher
POS tagging is like a grammar teacher marking parts of speech in a sentence: "The quick (adjective) brown (adjective) fox (noun) jumps (verb) over the lazy (adjective) dog (noun)." Dependency parsing goes further — the teacher also draws arrows showing "jumps" is the root, "fox" is its subject, "over the dog" is its location modifier. Chunking groups "the quick brown fox" into a single noun phrase box.
POS Tagging — Penn Treebank tagset
| POS Tag | Meaning | Example |
|---|---|---|
| NN | Noun, singular | dog, city, model |
| NNS | Noun, plural | dogs, cities, models |
| VB | Verb, base form | run, eat, train |
| VBZ | Verb, 3rd person singular | runs, eats, trains |
| JJ | Adjective | quick, large, neural |
| RB | Adverb | quickly, very, never |
| DT | Determiner | the, a, an, this |
| IN | Preposition/conjunction | in, on, of, because |
| PRP | Personal pronoun | I, he, she, they |
| NNP | Proper noun singular | London, Google, Ravi |
POS tagging with NLTK and spaCy
```python
import nltk
nltk.download('punkt_tab', quiet=True)
nltk.download('averaged_perceptron_tagger_eng', quiet=True)
from nltk.tokenize import word_tokenize
from nltk import pos_tag

sentence = "The quick brown fox jumps over the lazy dog"
tokens = word_tokenize(sentence)
tagged = pos_tag(tokens)
print(tagged)
# [('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'),
#  ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]
```
```python
# Modern approach: spaCy (faster, more accurate)
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
for token in doc:
    print(f"{token.text:<15} {token.pos_:<8} {token.tag_:<6} {token.dep_}")
# Apple           PROPN    NNP    nsubj
# is              AUX      VBZ    aux
# looking         VERB     VBG    ROOT
# at              ADP      IN     prep
# buying          VERB     VBG    pcomp
```
Chunking — shallow parsing
Chunking (shallow parsing) groups tagged tokens into multi-word phrases without building a full parse tree. The most common chunk type is Noun Phrase (NP): a determiner + adjectives + noun. Chunking uses regular expressions over POS tag sequences.
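The mechanics can be illustrated without NLTK: encode the tag sequence as a string and run an ordinary regex over it. This is a minimal sketch of the idea, not NLTK's actual internals — the `np_chunks` helper and its tag encoding are illustrative:

```python
import re

def np_chunks(tagged):
    """Find NP spans (optional DT + any JJs + one or more nouns) in a tagged sentence."""
    # Encode the tag sequence as a string like "<PRP><VBD><DT><JJ><NN>"
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    spans = []
    for m in re.finditer(r"(<DT>)?(<JJ>)*(<NN[A-Z]*>)+", tag_string):
        # Map character offsets back to token indices by counting opening brackets
        start = tag_string[:m.start()].count("<")
        end = start + m.group(0).count("<")
        spans.append([word for word, _ in tagged[start:end]])
    return spans

tagged = [("He", "PRP"), ("bought", "VBD"), ("a", "DT"), ("new", "JJ"),
          ("car", "NN"), ("from", "IN"), ("a", "DT"), ("local", "JJ"),
          ("dealer", "NN")]
print(np_chunks(tagged))
# [['a', 'new', 'car'], ['a', 'local', 'dealer']]
```

The `<NN[A-Z]*>` part plays the role of NLTK's `<NN.*>`, matching NN, NNS, NNP, and NNPS alike.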
Noun phrase chunking with NLTK
```python
import nltk
nltk.download('punkt_tab', quiet=True)
nltk.download('averaged_perceptron_tagger_eng', quiet=True)
from nltk import RegexpParser, pos_tag
from nltk.tokenize import word_tokenize

sentence = "He bought a brand new electric car from a local dealer"
tokens = word_tokenize(sentence)
tagged = pos_tag(tokens)

# Grammar: NP = optional determiner + any number of adjectives + one or more nouns
grammar = r"NP: {<DT>?<JJ>*<NN.*>+}"
parser = RegexpParser(grammar)
tree = parser.parse(tagged)
print(tree)
# (S
#   He/PRP
#   bought/VBD
#   (NP a/DT brand/JJ new/JJ electric/JJ car/NN)
#   from/IN
#   (NP a/DT local/JJ dealer/NN))
```
Dependency Parsing
Dependency parsing builds a tree where each word points to its head (the word it modifies or depends on). Every sentence has exactly one root word (the main verb). The tree captures grammatical relations: subject (nsubj), direct object (dobj), modifier (amod), preposition (prep).
Example: She enjoys reading books. Dependency tree: enjoys(root) ← She(nsubj), enjoys → reading(xcomp), reading → books(dobj).
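Structurally, a dependency parse is just a head-pointer table: each word records the index of its head and the relation label. The tree above can be sketched in plain Python — the tuple layout here is illustrative, loosely mirroring spaCy's `token.head` and `token.dep_`:

```python
# Each entry: (word, head index, relation); the root points to itself.
parse = [
    ("She",     1, "nsubj"),   # She ← enjoys
    ("enjoys",  1, "ROOT"),    # root of the sentence
    ("reading", 1, "xcomp"),   # enjoys → reading
    ("books",   2, "dobj"),    # reading → books
]

def root(parse):
    """The root is the one word that is its own head."""
    return next(w for i, (w, h, _) in enumerate(parse) if h == i)

def children(parse, idx):
    """All words whose head is the word at idx."""
    return [(w, rel) for i, (w, h, rel) in enumerate(parse) if h == idx and i != idx]

print(root(parse))          # enjoys
print(children(parse, 1))   # [('She', 'nsubj'), ('reading', 'xcomp')]
print(children(parse, 2))   # [('books', 'dobj')]
```

Walking head pointers like this is exactly how downstream code extracts subject–verb–object triples from a parsed sentence.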
Universal Dependencies
The Universal Dependencies (UD) project defines a consistent set of dependency relations across 100+ languages, enabling cross-lingual NLP. spaCy, Stanza, and other modern parsers all support UD. Key relations: nsubj (nominal subject), dobj/obj (direct object), amod (adjectival modifier), advmod (adverbial modifier), prep (prepositional modifier), cc (coordinating conjunction).
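These labels can be looked up programmatically: spaCy's `spacy.explain` utility returns the glossary description for a tag or relation, and it needs no language model to be loaded. A quick sketch:

```python
import spacy

# spacy.explain looks up a label in spaCy's built-in glossary
for rel in ["nsubj", "dobj", "amod", "advmod", "cc"]:
    print(f"{rel:<8} {spacy.explain(rel)}")
# nsubj    nominal subject
# dobj     direct object
# amod     adjectival modifier
# advmod   adverbial modifier
# cc       coordinating conjunction
```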
| Analysis type | Output | Use case |
|---|---|---|
| POS tagging | Token → grammatical tag | Feature for NER, chunking, parsing |
| Chunking | Token spans → phrase type | IE, shallow syntax for fast pipelines |
| Constituency parsing | Full phrase-structure tree | Grammar checking, formal syntax analysis |
| Dependency parsing | Word → head + relation | NLU, semantic role labelling, QA, coreference |
Practice questions
- POS tag the sentence "The dog barks loudly." (Answer: The/DT dog/NN barks/VBZ loudly/RB ./.)
- What is the difference between constituency parsing and dependency parsing? (Answer: Constituency builds a phrase-structure tree (NP, VP). Dependency builds a word-relation tree showing which word governs which.)
- In the NP chunk grammar {<DT>?<JJ>*<NN.*>+}, what does the ? mean? (Answer: Optional — the determiner DT may appear zero or one time.)
- Why is POS tagging considered a sequence labelling problem? (Answer: The correct tag for a word depends on its context — "run" is NN in "a run" but VB in "I run". Models must consider the entire sequence.)
- Which dependency relation connects "She" to "enjoys" in "She enjoys music"? (Answer: nsubj — nominal subject. "She" is the subject of the root verb "enjoys".)
On LumiChats
When you paste a document into LumiChats and ask it to extract key entities or summarise the main actions, the underlying model applies dependency parsing to understand subject-verb-object relationships — enabling it to answer questions like who did what to whom.