Deep Learning Applications — NLP, Speech Recognition & Recommendation Systems
Deep learning transforms raw data into intelligent systems across three major domains: Natural Language Processing (understanding and generating text — chatbots, translation, summarization), Speech Recognition (converting audio to text and text to speech — voice assistants, transcription), and Recommendation Systems (personalizing content — Netflix, Spotify, Amazon, YouTube). Each domain has evolved from hand-crafted features to end-to-end deep learning pipelines that match or exceed human performance on narrow, well-defined tasks.
Where deep learning creates real-world impact — language, voice, and personalization.
Category: Deep Learning & Neural Networks
NLP applications of deep learning
| NLP Application | Deep Learning Model | Input | Output |
|---|---|---|---|
| Machine Translation | Transformer (MarianMT, NLLB) | Source language text | Target language text |
| Text Summarization | BART, T5, Pegasus | Long document | Short summary |
| Question Answering | BERT-SQuAD, RoBERTa | Context + question | Answer span or generated text |
| Sentiment Analysis | Fine-tuned BERT/DistilBERT | Review/tweet text | Positive/Negative/Neutral |
| Named Entity Recognition | BERT with token classification | Text | Token labels (PER, ORG, LOC) |
| Text Generation | GPT-4, Claude, Llama, Mistral | Prompt | Continuation / response |
| Text Classification | BERT, FastText | Document | Category label |
from transformers import pipeline
# ── Sentiment Analysis ──
sentiment_clf = pipeline("sentiment-analysis",
model="distilbert-base-uncased-finetuned-sst-2-english")
results = sentiment_clf([
"This product is absolutely amazing, I love it!",
"Worst purchase I have ever made. Do not buy.",
])
for r in results:
print(f"{r['label']:<10} ({r['score']:.2%})")
# ── Named Entity Recognition ──
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")
text = "Apple CEO Tim Cook announced new products at the iPhone 16 event in Cupertino."
entities = ner(text)
for e in entities:
print(f"{e['entity_group']}: {e['word']} ({e['score']:.2%})")
# ORG: Apple, PER: Tim Cook, MISC: iPhone 16, LOC: Cupertino
# ── Text Summarization ──
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
article = """The transformer architecture revolutionised natural language processing
by replacing recurrent networks with self-attention mechanisms. This allows
parallel processing of entire sequences simultaneously, dramatically reducing
training time and enabling the development of much larger language models..."""
summary = summarizer(article, max_length=80, min_length=30)[0]['summary_text']
print(f"Summary: {summary}")
# ── Zero-Shot Classification ──
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier(
"The new iPhone has a longer battery life and better camera.",
candidate_labels=["technology", "sports", "politics", "finance"]
)
print(f"Label: {result['labels'][0]} ({result['scores'][0]:.2%})")
Speech recognition and synthesis
Automatic Speech Recognition (ASR) converts audio waveforms to text. The pipeline: audio → mel spectrogram (feature extraction) → deep model (CNN+RNN or Transformer) → text via a CTC or attention decoder. A prominent modern model is Whisper (OpenAI): an encoder-decoder transformer trained on 680,000 hours of multilingual audio that supports transcription in roughly 99 languages and approaches human-level accuracy on English benchmarks.
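The CTC decoding mentioned above is easy to illustrate. Below is a minimal sketch of greedy CTC decoding; the toy vocabulary and frame probabilities are invented for the example (Whisper itself uses an attention decoder instead). A CTC model emits one token per audio frame, so decoding collapses consecutive repeats and then drops the blank token.
import torch

def ctc_greedy_decode(log_probs, blank=0):
    """Greedy CTC decode. log_probs: (T, vocab) frame-level log-probabilities."""
    best = log_probs.argmax(dim=-1).tolist()  # most likely token per frame
    decoded, prev = [], blank
    for tok in best:
        if tok != prev and tok != blank:      # collapse repeats, skip blanks
            decoded.append(tok)
        prev = tok
    return decoded

# Toy example: 6 frames, vocabulary {0: blank, 1: 'H', 2: 'I'}
frames = torch.log(torch.tensor([
    [0.1, 0.8, 0.1],   # 'H'
    [0.1, 0.8, 0.1],   # 'H' again (collapsed as a repeat)
    [0.8, 0.1, 0.1],   # blank
    [0.1, 0.1, 0.8],   # 'I'
    [0.8, 0.1, 0.1],   # blank
    [0.8, 0.1, 0.1],   # blank
]))
print(ctc_greedy_decode(frames))  # [1, 2] → "HI"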
# OpenAI Whisper: end-to-end speech recognition
# Architecture: CNN feature extractor + Transformer encoder + decoder
import whisper
import numpy as np
# Load model (tiny/base/small/medium/large)
model = whisper.load_model("base")  # 74M params, ~1 GB VRAM, ~16x speed relative to large
# Transcribe audio file
result = model.transcribe("audio.mp3")
print(f"Text: {result['text']}")
print(f"Language: {result['language']}")
# With timestamps
result_ts = model.transcribe("audio.mp3", word_timestamps=True)
for segment in result_ts['segments']:
print(f"[{segment['start']:.1f}s - {segment['end']:.1f}s]: {segment['text']}")
# Lower-level API: manual audio loading and feature extraction
audio = whisper.load_audio("audio.mp3") # Load and resample to 16kHz
audio = whisper.pad_or_trim(audio) # Pad/trim to 30s (max segment)
mel = whisper.log_mel_spectrogram(audio).to(model.device) # 80-band mel
# Whisper architecture for speech:
# Audio → 80-channel mel spectrogram → CNN → Transformer Encoder
# ↓
# Transformer Decoder → tokens → text
# Text-to-Speech (TTS) with deep learning, e.g. Coqui TTS:
# from TTS.api import TTS
# tts = TTS("tts_models/en/ljspeech/tacotron2-DDC")
# tts.tts_to_file(text="Hello, this is deep learning generated speech!", file_path="output.wav")
Recommendation systems with deep learning
Modern recommendation systems use deep learning embeddings to represent users and items in a shared latent space — items similar to what a user likes are nearby in embedding space. Collaborative filtering with neural embeddings (a learned generalization of matrix factorization) powers services like Netflix and Spotify. Two-Tower models encode queries and candidates into the same embedding space for fast retrieval (a minimal sketch follows the NeuralCF example below). Transformers for recommendation (BERT4Rec, SASRec) model sequential user behavior.
import torch
import torch.nn as nn
class NeuralCF(nn.Module):
"""Neural Collaborative Filtering: learns user-item interactions via embeddings."""
    def __init__(self, n_users, n_items, embed_dim=64, hidden_dims=(128, 64, 32)):
super().__init__()
# Separate embeddings for users and items
self.user_embed = nn.Embedding(n_users, embed_dim)
self.item_embed = nn.Embedding(n_items, embed_dim)
# MLP layers for learning complex interactions
layers = []
input_dim = embed_dim * 2 # Concat user + item embeddings
for h_dim in hidden_dims:
layers += [nn.Linear(input_dim, h_dim), nn.ReLU(), nn.Dropout(0.2)]
input_dim = h_dim
layers.append(nn.Linear(input_dim, 1))
layers.append(nn.Sigmoid()) # Output: probability of interaction
self.mlp = nn.Sequential(*layers)
def forward(self, user_ids, item_ids):
user_emb = self.user_embed(user_ids) # (batch, embed_dim)
item_emb = self.item_embed(item_ids) # (batch, embed_dim)
x = torch.cat([user_emb, item_emb], dim=1) # (batch, 2×embed_dim)
return self.mlp(x).squeeze(1) # (batch,) — interaction probability
# Training
n_users, n_items = 10000, 50000
model = NeuralCF(n_users, n_items)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
loss_fn = nn.BCELoss()
# Sample: user 42 interacted with items 100, 200, 300 (positive)
# Negative sampling: user 42 did NOT interact with 400, 500, 600
users_pos = torch.tensor([42, 42, 42])
items_pos = torch.tensor([100, 200, 300])
users_neg = torch.tensor([42, 42, 42])
items_neg = torch.tensor([400, 500, 600])
users = torch.cat([users_pos, users_neg])
items = torch.cat([items_pos, items_neg])
labels = torch.cat([torch.ones(3), torch.zeros(3)])
optimizer.zero_grad()                # clear gradients from any previous step
preds = model(users, items)
loss = loss_fn(preds, labels)
loss.backward(); optimizer.step()
print(f"Batch loss: {loss.item():.4f}")
# Inference: get top-N recommendations for user 42
user_tensor = torch.tensor([42] * n_items)
item_tensor = torch.arange(n_items)
model.eval()                         # disable dropout for deterministic scoring
with torch.no_grad():
scores = model(user_tensor, item_tensor)
top_items = torch.topk(scores, k=10).indices
print(f"Top 10 recommendations for user 42: {top_items.tolist()}")
Practice questions
- Named Entity Recognition uses token classification. What does this mean? (Answer: Instead of one label per sentence (like sentiment analysis), NER assigns a label to EVERY token: "Tim/B-PER Cook/I-PER is/O the/O CEO/O of/O Apple/B-ORG". B- = beginning of an entity, I- = inside an entity, O = not an entity. It is a sequence labeling task: classification at the token level, not the sentence level.)
- Whisper uses a mel spectrogram as input. What is a mel spectrogram? (Answer: A mel spectrogram converts audio into a 2D array (frequency × time) using the mel scale — a perceptual scale that matches human hearing (logarithmic). 80 frequency bands, sampled every ~10ms, gives an (80, T) matrix. CNNs can extract features from this representation just like from images.)
- Why do recommendation systems use embeddings instead of one-hot encoding for users and items? (Answer: With 10M users and 50M items, one-hot vectors are 60M-dimensional — intractable. Embeddings map each entity to a dense 64-512 dimensional vector that captures latent characteristics. Similar users/items have similar embeddings. Also enables learning complex non-linear interactions via MLP layers.)
- BERT-based NLP models are "fine-tuned" for downstream tasks. What does this mean? (Answer: BERT is pre-trained with masked language modeling (self-supervised). For a downstream task (sentiment, NER, QA), you add a task-specific head (a linear layer) and continue training on the labeled task data with a small learning rate. The entire model updates, but it starts from pre-trained weights — much better than random initialization. See the sketch after these questions.)
- Two-Tower architecture in recommendations: what are the two towers and why is it fast? (Answer: Tower 1: the user encoder maps user context to an embedding. Tower 2: the item encoder maps item features to an embedding. Similarity = dot product of the two embeddings. Speed: item embeddings are pre-computed and indexed (e.g. FAISS). At query time only the user tower runs online, followed by a fast approximate nearest-neighbor search: one forward pass plus a sub-linear lookup per query, instead of scoring all n_items with a full network.)
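As a companion to the fine-tuning question above, here is a minimal sketch using Hugging Face Transformers. AutoModelForSequenceClassification loads pre-trained BERT weights and adds a randomly initialized classification head; the two-example batch below is placeholder data for illustration only.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
# One fine-tuning step on a toy labeled batch (placeholder data)
batch = tokenizer(["great movie", "terrible movie"], padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])                               # 1 = positive, 0 = negative
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # small LR, as the answer notes
loss = model(**batch, labels=labels).loss                   # cross-entropy from the new head
loss.backward(); optimizer.step()
print(f"Fine-tuning step loss: {loss.item():.4f}")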
LumiChats is built on all three application domains: NLP (transformer-based text generation), speech (Whisper-powered voice input), and recommendations (surfacing relevant features and responses based on your context). Understanding these applications explains the capabilities and limitations of every AI system you use.