
Text Classification & Sentiment Analysis

Teaching machines to assign categories to text — from spam detection to emotion detection.


Definition

Text classification assigns a predefined label to a piece of text. Sentiment analysis is the most common special case: classifying text as positive, negative, or neutral. Classification powers spam filters, news categorisation, intent detection in chatbots, hate speech detection, medical triage, and product review analysis. Methods range from traditional Naive Bayes and SVM to fine-tuned transformer models (BERT) that achieve human-level accuracy on many benchmarks.

Real-life analogy: The email triage clerk

Imagine a company receives thousands of emails daily. A clerk reads each one and sorts it into folders: Sales Inquiry, Technical Support, Billing, Spam, Complaint. This is multi-class text classification. The clerk learns patterns from experience: emails containing 'refund' and 'angry' go to Complaints; emails with 'free money' and 'click here' go to Spam. Machine learning models learn exactly these patterns — automatically, from labelled examples.

Naive Bayes for text classification

Naive Bayes is the classic text classifier. It applies Bayes' theorem with the naive conditional independence assumption: given the class, all words are independent of each other. Despite this unrealistic assumption, it works well for spam filtering and topic classification.

For a document d with words w_1 ... w_n, Naive Bayes picks the class with the highest posterior:

    c* = argmax_c  P(c) · Π_i P(w_i | c)

where P(c) is the class prior and P(w_i | c) is the likelihood of word w_i in class c.
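This posterior can be computed by hand on a toy corpus. A minimal pure-Python sketch (the word counts and priors below are illustrative, not from any real dataset), using log probabilities to avoid underflow:

```python
import math
from collections import Counter

# Toy training statistics: word counts per class (illustrative numbers)
counts = {
    "spam": Counter({"free": 4, "click": 3}),
    "ham":  Counter({"click": 1, "meeting": 5}),
}
priors = {"spam": 0.5, "ham": 0.5}
vocab = {"free", "click", "meeting"}

def log_posterior(words, c, alpha=1.0):
    """log P(c) + sum_i log P(w_i | c), with add-alpha smoothing."""
    total = sum(counts[c].values()) + alpha * len(vocab)
    score = math.log(priors[c])
    for w in words:
        score += math.log((counts[c][w] + alpha) / total)
    return score

doc = ["free", "click"]
pred = max(priors, key=lambda c: log_posterior(doc, c))
print(pred)  # "free click" scores higher under the spam counts -> spam
```

Summing logs instead of multiplying raw probabilities is the standard trick: a product of many small likelihoods underflows to 0.0 in floating point.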

Spam classifier with Naive Bayes and TF-IDF

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Sample data (in practice: 10k+ examples)
texts  = [
    "Win free iPhone now click here",
    "Meeting at 3pm tomorrow in conference room",
    "Claim your prize you have been selected",
    "Can you review the quarterly report",
    "FREE CASH no credit check apply now",
    "Hi please find attached the project update",
]
labels = ["spam", "ham", "spam", "ham", "spam", "ham"]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, random_state=42, stratify=labels)

# Pipeline: TF-IDF vectorisation + Naive Bayes
clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),  # unigrams + bigrams
    ("nb",    MultinomialNB(alpha=0.1)),               # additive (Lidstone) smoothing
])
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))

# Predict new email
print(clf.predict(["Congratulations! You won $1000 click to claim"]))  # ['spam']

Sentiment analysis — beyond binary

Binary sentiment: positive / negative. Fine-grained sentiment: 1-5 star rating prediction. Aspect-based sentiment analysis (ABSA): 'The food was great but the service was terrible' — positive on food aspect, negative on service aspect. ABSA requires identifying both the aspect term and its polarity.

Sentiment analysis with Hugging Face transformers

from transformers import pipeline

# Off-the-shelf pretrained model (DistilBERT fine-tuned on SST-2),
# so no fine-tuning is needed on your side
sentiment = pipeline("sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english")

texts = [
    "This product is absolutely amazing, I love it!",
    "Worst purchase I have ever made. Do not buy.",
    "It was okay, nothing special but does the job.",
]
for text in texts:
    result = sentiment(text)[0]
    print(f"{result['label']:<10} ({result['score']:.2%})  {text[:50]}")

# Output:
# POSITIVE   (99.98%)  This product is absolutely amazing...
# NEGATIVE   (99.91%)  Worst purchase I have ever made...
# NEGATIVE   (52.38%)  It was okay, nothing special...

# For ABSA: use aspect-specific models, prompt an LLM, or reuse a
# zero-shot NLI model with one hypothesis per aspect:
absa = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
print(absa("The food was great but the service was terrible",
           candidate_labels=["positive", "negative"],
           hypothesis_template="The sentiment toward the service is {}."))

Method comparison

Method                 Data needed         Accuracy           Training time   Best for
Naive Bayes + TF-IDF   Small (100s)        Medium (~85%)      Seconds         Fast prototyping, spam
SVM + TF-IDF           Medium (1k+)        Good (~88%)        Minutes         Short text, interpretable
LSTM/RNN               Large (10k+)        Better (~91%)      Hours           Sequential patterns
BERT fine-tuned        Medium (1k+ often)  SOTA (~95%)        Hours (GPU)     Production NLP tasks
GPT-4 zero-shot        None                Very good (~92%)   Instant         When no labelled data

Evaluation metrics for classification

Precision: of all predicted positives, how many are actually positive. Recall: of all actual positives, how many were found. F1: the harmonic mean of precision and recall, useful when the class distribution is imbalanced. Use macro-F1 for multi-class problems when all classes matter equally, since it averages per-class F1 without weighting by class frequency.
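These definitions are easy to verify by hand. A small pure-Python sketch (the label sequences are illustrative) that computes per-class precision, recall, and F1, then macro-F1:

```python
y_true = ["spam", "ham", "spam", "ham", "ham", "spam", "ham", "ham"]
y_pred = ["spam", "ham", "ham",  "ham", "spam", "spam", "ham", "ham"]

def prf1(y_true, y_pred, positive):
    """Precision, recall, F1 for one class treated as 'positive'."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall    = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Per-class scores, then macro-F1 (unweighted mean across classes)
scores = {c: prf1(y_true, y_pred, c) for c in set(y_true)}
macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(scores)
for c, (p, r, f1) in sorted(scores.items()):
    print(f"{c}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"macro-F1 = {macro_f1:.2f}")
```

In practice sklearn's classification_report (used in the spam example above) computes the same numbers per class.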

Accuracy is misleading for imbalanced data

If 99% of emails are ham and 1% are spam, a classifier that always predicts ham gets 99% accuracy but 0% recall on spam. Always report precision, recall, and F1 per class. For medical diagnosis, recall (sensitivity) is critical — missing a disease (false negative) is worse than a false alarm.
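The always-ham baseline is easy to simulate (class counts below are illustrative):

```python
# 990 ham + 10 spam; a "classifier" that always predicts ham
y_true = ["ham"] * 990 + ["spam"] * 10
y_pred = ["ham"] * 1000

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

tp = sum(t == "spam" and p == "spam" for t, p in zip(y_true, y_pred))
fn = sum(t == "spam" and p != "spam" for t, p in zip(y_true, y_pred))
spam_recall = tp / (tp + fn)

print(f"accuracy    = {accuracy:.1%}")     # 99.0% -- looks great
print(f"spam recall = {spam_recall:.1%}")  # 0.0%  -- catches no spam at all
```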

Practice questions

  1. Naive Bayes classifies "free money" as spam. The word "free" never appeared in ham training data. What problem occurs? (Answer: Zero probability — P("free"|ham) = 0, making the entire product 0. Solution: Laplace (add-1) smoothing adds 1 to all word counts.)
  2. A spam filter has precision 0.95, recall 0.70. What does this mean? (Answer: Of predicted spam, 95% are actually spam (few false positives), but it misses 30% of actual spam (low recall). Whether to favour precision or recall depends on cost: for spam filtering, a false positive (a legitimate email sent to the spam folder) is usually worse than letting some spam through, so high precision is typically preferred.)
  3. Why is BERT better than TF-IDF + SVM for sentiment on "I am not unhappy"? (Answer: TF-IDF misses double negation. BERT reads the full contextual sequence bidirectionally and understands "not unhappy" ≈ positive.)
  4. What is aspect-based sentiment analysis? Give an example. (Answer: Identifying sentiment at the entity level. "The battery life is poor but the screen is gorgeous" → battery: NEGATIVE, screen: POSITIVE.)
  5. Name two evaluation metrics besides accuracy for text classification. (Answer: F1 score (harmonic mean of precision and recall) and AUROC (area under the ROC curve — measures discrimination ability at all thresholds).)
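Question 1's zero-probability problem takes two lines to demonstrate (the counts below are illustrative):

```python
# "free" appears 0 times in ham; ham has 20 word tokens, vocabulary of 10 words
count_free_in_ham, ham_tokens, vocab_size = 0, 20, 10

unsmoothed = count_free_in_ham / ham_tokens
smoothed   = (count_free_in_ham + 1) / (ham_tokens + vocab_size)  # Laplace add-1

print(unsmoothed)  # 0.0 -- zeroes out the whole product P(d|ham)
print(smoothed)    # ~0.033 -- small but non-zero
```

This is what MultinomialNB's alpha parameter controls in the spam example above (alpha=1 is Laplace add-1; fractional alpha is Lidstone smoothing).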

On LumiChats

LumiChats uses sentiment-aware context when generating responses — it detects frustration or confusion in your messages and adjusts its tone accordingly. The same classification models power intent detection: are you asking a question, giving a command, or providing feedback?
