Text classification assigns a predefined label to a piece of text. Sentiment analysis is the most common special case: classifying text as positive, negative, or neutral. Classification powers spam filters, news categorisation, intent detection in chatbots, hate speech detection, medical triage, and product review analysis. Methods range from traditional Naive Bayes and SVM classifiers to fine-tuned transformer models such as BERT, which reach human-level accuracy on many benchmarks.
Real-life analogy: The email triage clerk
Imagine a company receives thousands of emails daily. A clerk reads each one and sorts it into folders: Sales Inquiry, Technical Support, Billing, Spam, Complaint. This is multi-class text classification. The clerk learns patterns from experience: emails containing 'refund' and 'angry' go to Complaints; emails with 'free money' and 'click here' go to Spam. Machine learning models learn exactly these patterns — automatically, from labelled examples.
Naive Bayes for text classification
Naive Bayes is the classic text classifier. It applies Bayes' theorem with the naive conditional independence assumption: given the class, all words are independent. Despite this unrealistic assumption, it works well for spam filtering and topic classification.
Naive Bayes picks the class with the highest posterior for a document d = (w_1, ..., w_n): c* = argmax_c P(c) · P(w_1|c) · ... · P(w_n|c), where P(c) is the class prior and P(w_i|c) is the likelihood of word w_i under class c. In practice the product is computed as a sum of log-probabilities to avoid floating-point underflow.
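To make the arithmetic concrete, here is a minimal from-scratch sketch; the toy corpus counts are invented for illustration. It scores a document against each class using log-probabilities and add-one (Laplace) smoothing, so no unseen word ever zeroes out the whole product:

```python
import math

# Toy training counts (invented for illustration)
class_docs = {"spam": 3, "ham": 3}  # documents per class
word_counts = {
    "spam": {"free": 3, "win": 2, "meeting": 0},
    "ham":  {"free": 0, "win": 0, "meeting": 2},
}
vocab = {"free", "win", "meeting"}

def log_posterior(doc_words, c):
    total_docs = sum(class_docs.values())
    score = math.log(class_docs[c] / total_docs)  # log P(c), the class prior
    total_words = sum(word_counts[c].values())
    for w in doc_words:
        # Add-one smoothing: every word keeps a non-zero probability
        p = (word_counts[c].get(w, 0) + 1) / (total_words + len(vocab))
        score += math.log(p)  # accumulate log P(w|c)
    return score

doc = ["free", "win"]
best = max(class_docs, key=lambda c: log_posterior(doc, c))
print(best)  # spam
```

The same smoothing idea is what the alpha parameter of scikit-learn's MultinomialNB controls in the pipeline example that follows.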
Spam classifier with Naive Bayes and TF-IDF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
# Sample data (in practice: 10k+ examples)
texts = [
    "Win free iPhone now click here",
    "Meeting at 3pm tomorrow in conference room",
    "Claim your prize you have been selected",
    "Can you review the quarterly report",
    "FREE CASH no credit check apply now",
    "Hi please find attached the project update",
]
labels = ["spam", "ham", "spam", "ham", "spam", "ham"]
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, random_state=42)
# Pipeline: TF-IDF vectorisation + Naive Bayes
clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),  # unigrams + bigrams
    ("nb", MultinomialNB(alpha=0.1)),  # additive (Lidstone) smoothing
])
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
# Predict new email
print(clf.predict(["Congratulations! You won $1000 click to claim"]))  # ['spam']
Sentiment analysis — beyond binary
Binary sentiment: positive / negative. Fine-grained sentiment: 1-5 star rating prediction. Aspect-based sentiment analysis (ABSA): 'The food was great but the service was terrible' — positive on food aspect, negative on service aspect. ABSA requires identifying both the aspect term and its polarity.
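Production ABSA uses aspect-specific fine-tuned models or LLM prompting, but the core idea (find the aspect term, then judge polarity from the words around it) can be sketched with a deliberately tiny polarity lexicon; every word weight and the context window below are invented for illustration:

```python
# Toy aspect-based sentiment: judge each aspect from words near it.
# The lexicon and window size are invented for illustration only.
POLARITY = {"great": 1, "gorgeous": 1, "terrible": -1, "poor": -1}

def aspect_sentiment(text, aspects, window=2):
    tokens = text.lower().replace(".", "").split()
    results = {}
    for aspect in aspects:
        if aspect not in tokens:
            continue
        i = tokens.index(aspect)
        # Look at a small window of tokens around the aspect term
        nearby = tokens[max(0, i - window): i + window + 1]
        score = sum(POLARITY.get(t, 0) for t in nearby)
        results[aspect] = ("POSITIVE" if score > 0
                           else "NEGATIVE" if score < 0
                           else "NEUTRAL")
    return results

print(aspect_sentiment("The food was great but the service was terrible",
                       aspects=["food", "service"]))
# {'food': 'POSITIVE', 'service': 'NEGATIVE'}
```

Real systems replace the lexicon and fixed window with a model that learns which opinion words attach to which aspect, which is exactly what the fine-tuned approaches below provide.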
Sentiment analysis with Hugging Face transformers
from transformers import pipeline
# Off-the-shelf sentiment model (already fine-tuned on SST-2)
sentiment = pipeline("sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english")
texts = [
"This product is absolutely amazing, I love it!",
"Worst purchase I have ever made. Do not buy.",
"It was okay, nothing special but does the job.",
]
for text in texts:
    result = sentiment(text)[0]
    print(f"{result['label']:<10} ({result['score']:.2%}) {text[:50]}")
# Output:
# POSITIVE (99.98%) This product is absolutely amazing...
# NEGATIVE (99.91%) Worst purchase I have ever made...
# NEGATIVE (52.38%) It was okay, nothing special...
# For ABSA: use aspect-specific models or prompt an LLM, e.g.
# "Classify the sentiment toward [food] in: 'food great service terrible'"

| Method | Data needed | Accuracy | Training time | Best for |
|---|---|---|---|---|
| Naive Bayes + TF-IDF | Small (100s) | Medium (~85%) | Seconds | Fast prototyping, spam |
| SVM + TF-IDF | Medium (1k+) | Good (~88%) | Minutes | Short text, interpretable |
| LSTM/RNN | Large (10k+) | Better (~91%) | Hours | Sequential patterns |
| BERT fine-tuned | Medium (often 1k+) | SOTA (~95%) | Hours (GPU) | Production NLP tasks |
| GPT-4 zero-shot | None | Very good (~92%) | Instant | When no labelled data |
Evaluation metrics for classification
Precision: of all predicted positives, how many are actually positive. Recall: of all actual positives, how many were found. F1: the harmonic mean of precision and recall — useful when the class distribution is imbalanced. Use macro-F1 (the unweighted mean of per-class F1 scores) for multi-class problems when classes are equally important.
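These definitions can be checked by hand. Below is a small pure-Python sketch (the label lists are invented) that computes per-class precision, recall, and F1 from raw counts, then macro-F1 as the unweighted mean of the per-class F1 scores:

```python
def prf1(y_true, y_pred, cls):
    # True positives, false positives, false negatives for one class
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Invented predictions over three classes
y_true = ["spam", "ham", "ham", "spam", "ham", "promo"]
y_pred = ["spam", "ham", "spam", "spam", "ham", "ham"]

classes = sorted(set(y_true))
scores = {c: prf1(y_true, y_pred, c) for c in classes}
macro_f1 = sum(f for _, _, f in scores.values()) / len(classes)
for c, (p, r, f) in scores.items():
    print(f"{c:<6} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"macro-F1 = {macro_f1:.2f}")
```

Note how "promo" is never predicted, so its F1 of 0 drags macro-F1 down; a weighted average would largely hide that failure.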
Accuracy is misleading for imbalanced data
If 99% of emails are ham and 1% are spam, a classifier that always predicts ham gets 99% accuracy but 0% recall on spam. Always report precision, recall, and F1 per class. For medical diagnosis, recall (sensitivity) is critical — missing a disease (false negative) is worse than a false alarm.
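The trap is easy to reproduce. The following sketch uses a synthetic 99:1 ham/spam split (the counts are invented to match the example) and a degenerate classifier that always predicts ham:

```python
# 990 ham, 10 spam: always predicting "ham" looks great on accuracy
y_true = ["ham"] * 990 + ["spam"] * 10
y_pred = ["ham"] * 1000

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the spam class: how many real spam emails were caught?
spam_caught = sum(t == "spam" and p == "spam" for t, p in zip(y_true, y_pred))
spam_total = sum(t == "spam" for t in y_true)
spam_recall = spam_caught / spam_total

print(f"accuracy    = {accuracy:.1%}")    # 99.0%
print(f"spam recall = {spam_recall:.1%}")  # 0.0%
```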
Practice questions
- Naive Bayes classifies "free money" as spam. The word "free" never appeared in ham training data. What problem occurs? (Answer: Zero probability — P("free"|ham) = 0, making the entire product 0. Solution: Laplace (add-1) smoothing adds 1 to all word counts.)
- A spam filter has precision 0.95, recall 0.70. What does this mean? (Answer: Of predicted spam, 95% is actually spam (few false positives), but 30% of actual spam slips through (low recall). The right trade-off depends on costs: letting spam through is annoying, but sending legitimate mail to the spam folder is usually worse, so spam filters often deliberately favour precision.)
- Why is BERT better than TF-IDF + SVM for sentiment on "I am not unhappy"? (Answer: TF-IDF treats the text as an unordered bag of words, so "not" and "unhappy" become independent features and the negation is lost. BERT reads the full sequence bidirectionally in context and resolves "not unhappy" ≈ mildly positive.)
- What is aspect-based sentiment analysis? Give an example. (Answer: Identifying sentiment at the entity level. "The battery life is poor but the screen is gorgeous" → battery: NEGATIVE, screen: POSITIVE.)
- Name two evaluation metrics besides accuracy for text classification. (Answer: F1 score (harmonic mean of precision and recall) and AUROC (area under the ROC curve — measures discrimination ability at all thresholds).)
On LumiChats
LumiChats uses sentiment-aware context when generating responses — it detects frustration or confusion in your messages and adjusts its tone accordingly. The same classification models power intent detection: are you asking a question, giving a command, or providing feedback?