What Are AI Tokens? Explained Simply

Aditya Kumar Jha · June 27, 2026 · 9 min read

Tokens are the chunks of text AI reads and writes, and they decide your cost, speed, and limits. Here is what a token is, in plain English.

A token is a small chunk of text that an AI model reads and writes, usually a word, part of a word, or a punctuation mark. As a rule of thumb in English, one token is about four characters, and 100 tokens is about 75 words. Tokens matter to you for three practical reasons: AI is priced per token, every model can hold only so many tokens at once (its context window), and higher token counts mean slower, costlier responses. Once you understand tokens, the rest of AI billing and limits makes sense.

Models do not read letters or whole sentences the way we do. They break text into tokens first, then work with those. This one idea quietly explains why a long document costs more to process, why a model 'forgets' the start of a very long chat, and why your API bill is shaped the way it is. Here it is in plain English.

What a Token Actually Is

When you send text to a model, a tokenizer splits it into tokens. Common short words are often a single token, while longer or rarer words get split into pieces. 'Cat' is one token. 'Unbelievable' might be two or three. A space or a comma can be its own token. Numbers and code split in their own ways. The exact split varies by model, but the rough English conversion holds: about four characters per token, about 75 words per 100 tokens.

Text                  | Approx. tokens | Note
'Hello'               | 1 token        | Common short word
'unbelievable'        | 2 to 3 tokens  | Longer words split into pieces
A 75-word paragraph   | ~100 tokens    | The everyday rule of thumb
A 4-page document     | ~2,000 tokens  | Why long inputs cost more
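
If you want to apply the rule of thumb yourself, here is a minimal Python sketch. It only encodes the two conversions above (about four characters per token, about 75 words per 100 tokens). The function names are made up for illustration, and the result is an estimate, not what any real tokenizer would return. It is least accurate on very short strings, numbers, and code.

```python
import math

CHARS_PER_TOKEN = 4        # rule of thumb for English text
WORDS_PER_100_TOKENS = 75  # rule of thumb for English text


def estimate_tokens_from_chars(text: str) -> int:
    """Estimate tokens as characters / 4, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate tokens as words * 100 / 75, rounded up."""
    return math.ceil(word_count * 100 / WORDS_PER_100_TOKENS)


def estimate_words_from_tokens(token_count: int) -> int:
    """Estimate words as tokens * 75 / 100, rounded to the nearest word."""
    return round(token_count * WORDS_PER_100_TOKENS / 100)


print(estimate_tokens_from_words(75))    # 100
print(estimate_words_from_tokens(1000))  # 750
```

For real billing numbers, use the tokenizer or token counter your provider publishes, since the exact split varies by model.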

Why Tokens Decide Your Cost

AI providers price by the token, and they usually charge separately for input tokens (what you send) and output tokens (what the model writes back), with output often more expensive. So a request is billed on the full conversation it has to read plus everything it generates. This is why a short question with a huge pasted document can cost more than a long question with no attachment, and why trimming what you send is the simplest way to lower your bill on paid APIs.
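
To make the billing arithmetic concrete, here is a small sketch of a per-request cost calculation. The two prices are hypothetical placeholders chosen for easy math, not any provider's real rates, and the token counts are illustrative.

```python
# Hypothetical prices in US dollars per one million tokens.
# These are placeholders, not any provider's real rates.
PRICE_PER_M_INPUT = 1.00
PRICE_PER_M_OUTPUT = 4.00


def request_cost(input_tokens: int, output_tokens: int,
                 price_in: float = PRICE_PER_M_INPUT,
                 price_out: float = PRICE_PER_M_OUTPUT) -> float:
    """Cost of one request: input and output are billed at separate rates."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Short question (20 tokens) plus a pasted 4-page document (~2,000 tokens).
short_question_with_doc = request_cost(input_tokens=20 + 2_000, output_tokens=300)

# Long question (200 tokens) with no attachment and the same answer length.
long_question_no_doc = request_cost(input_tokens=200, output_tokens=300)

print(f"{short_question_with_doc:.5f}")  # 0.00322
print(f"{long_question_no_doc:.5f}")     # 0.00140
```

Under these assumed prices, the short question with the pasted document costs more than twice as much as the long question on its own, because the attachment dominates the input.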

Why Tokens Decide the Limits (Context Windows)

Every model has a context window, the maximum number of tokens it can consider at once, counting both your input and its output. When a conversation or document exceeds that window, the oldest content falls out of view, which is why a very long chat can seem to 'forget' what you said at the start. Bigger context windows let a model hold entire books or codebases at once, but more tokens in the window also mean slower and more expensive responses, so bigger is not automatically better for every task.
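
Here is a rough sketch of how the oldest content can fall out of view. It shows one simple policy, keeping the newest messages that fit, and it assumes the four-characters-per-token estimate rather than a real tokenizer. The function names and window sizes are hypothetical, and real apps may handle overflow differently.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough estimate: about four characters per token (English)."""
    return math.ceil(len(text) / 4)


def trim_to_window(messages: list[str], context_window: int,
                   reserved_for_output: int) -> list[str]:
    """Keep the newest messages that fit; the oldest fall out of view."""
    # The window covers input and output, so set aside room for the reply.
    budget = context_window - reserved_for_output
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest so recent turns are kept first.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


# Three messages of about 100 tokens each, with a tiny 300-token window.
chat = ["a" * 400, "b" * 400, "c" * 400]
visible = trim_to_window(chat, context_window=300, reserved_for_output=100)
print(len(visible))  # 2
```

With 100 tokens reserved for the reply, only the two newest messages fit, so the first message is the one the model no longer sees.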

Insight

Three things you care about all trace back to one unit: cost is per token, the memory limit is a token count, and speed drops as token counts rise. Master tokens and AI pricing stops being mysterious.

Input vs Output Tokens

It helps to separate the two. Input tokens are everything the model reads: your prompt, any pasted text or files, and the earlier turns of the conversation it is shown. Output tokens are what it writes in reply. On most paid APIs, output costs more per token than input, which means asking for a shorter answer can save money even when your input is large. If you only use a consumer chat app on a flat monthly plan, you do not pay per token directly, but the same limits still shape how much the model can handle at once.
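
Because earlier turns count as input, the input side grows as a chat gets longer. The sketch below tallies that growth. It assumes the full history is shown to the model on every turn, and the turn sizes are made-up numbers for illustration.

```python
def input_tokens_per_turn(turns: list[tuple[int, int]]) -> list[int]:
    """Return the input tokens the model reads on each turn.

    Each turn is (user_tokens, reply_tokens). Assumes the whole earlier
    conversation is shown to the model again on every turn.
    """
    history = 0
    per_turn_input = []
    for user_tokens, reply_tokens in turns:
        input_tokens = history + user_tokens
        per_turn_input.append(input_tokens)
        # The reply joins the history that the next turn has to read.
        history = input_tokens + reply_tokens
    return per_turn_input


# Four turns: each question is 50 tokens and each reply is 200 tokens.
chat = [(50, 200)] * 4
print(input_tokens_per_turn(chat))       # [50, 300, 550, 800]
print(sum(input_tokens_per_turn(chat)))  # 1700
```

The four questions add up to only 200 tokens, but under this assumption the model reads 1,700 input tokens in total, which is why long threads get expensive.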

How to Use Fewer Tokens (and Pay Less)

  • Send only what is needed: paste the relevant section of a document, not the entire file, when the rest is irrelevant.
  • Ask for the length you want: 'answer in 3 bullet points' produces fewer output tokens than an open-ended request.
  • Start fresh for new topics: a brand-new chat does not carry the token weight of a long, unrelated history.
  • Summarize long threads: ask the model to summarize a long conversation, then continue from the summary to shed old tokens.
  • Check before you scale: if you build on an API, run sample text through a tokenizer so you can estimate cost before going live.
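
For the last tip, a back-of-the-envelope projection might look like the sketch below. It averages the estimated input size of a few sample texts and scales it to a request volume. Everything here is an assumption: the function name, the prices, the request count, and the four-characters-per-token estimate standing in for a real tokenizer.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough estimate: about four characters per token (English)."""
    return math.ceil(len(text) / 4)


def projected_cost(sample_inputs: list[str], expected_output_tokens: int,
                   requests: int, price_in_per_m: float,
                   price_out_per_m: float) -> float:
    """Project total cost from the average size of some sample inputs."""
    avg_input = sum(estimate_tokens(s) for s in sample_inputs) / len(sample_inputs)
    per_request = (avg_input * price_in_per_m
                   + expected_output_tokens * price_out_per_m) / 1_000_000
    return per_request * requests


# Two sample inputs of about 1,000 and 500 tokens, and 250-token answers.
samples = ["x" * 4000, "y" * 2000]
total = projected_cost(samples, expected_output_tokens=250, requests=10_000,
                       price_in_per_m=1.00, price_out_per_m=4.00)
print(f"{total:.2f}")  # 17.50
```

Swap in your provider's real tokenizer and published prices before trusting the number. The point of the sketch is only that a few samples and one multiplication give you a first estimate.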

Do You Need to Count Tokens Yourself?

For everyday chatting, no. The rough conversion, about 75 words per 100 tokens, is all most people ever need, and consumer apps handle the rest invisibly. Token counting becomes worth your attention only when you are building on an API at scale, where input and output token costs add up across thousands of requests, or when you are deciding whether a long document will fit a model's context window. A tool like LumiChats, which lets you use 40-plus models in one place, also makes it easy to see how different models handle the same long input without managing each provider's billing yourself.

Frequently Asked Questions

01. What is a token in AI?

A token is a small chunk of text, typically a word, part of a word, or a punctuation mark, that an AI model reads and writes. Models break all text into tokens before processing it. In English, a token averages about four characters, and roughly 100 tokens equals about 75 words.

02. How many words is 1,000 tokens?

Roughly 750 words in English, using the common rule of about 75 words per 100 tokens. The exact number varies because longer and rarer words split into more tokens, but 750 words is a reliable estimate for planning cost and length.

03. Why do AI tools charge by the token?

Tokens are the unit of work a model processes, so providers price by them. Most charge separately for input tokens (what you send) and output tokens (what the model writes), with output often more expensive. Sending less and requesting shorter answers lowers cost on paid APIs.

04. What is the difference between tokens and a context window?

Tokens are the unit; the context window is how many tokens a model can consider at once, counting both your input and its output. When content exceeds the window, the oldest tokens drop out of view, which is why long chats can seem to forget earlier messages.

05. Do I need to count tokens as a regular user?

No. For everyday chatting the rough conversion of about 75 words per 100 tokens is enough, and apps handle it invisibly. Token counting matters mainly when building on an API at scale or checking whether a long document fits a model's context window.

In short, tokens are the hidden unit behind everything you notice about AI: the price, the memory limit, and the speed. Remember about 75 words per 100 tokens, that input and output are billed separately, and that bigger context windows cost more, and you will understand AI pricing and limits better than most people who use these tools every day.

Written by Aditya Kumar Jha

Published author of six books and founder of LumiChats. Writes about AI tools, model comparisons, and how AI is reshaping work and education.
