AI Decoded

How large language models (LLMs) actually work

By Dean Stanberry

5.25.2026

Technology Emerging Topics

How large language models (LLMs) actually work

Every time someone types a question into an AI tool and hits send, something remarkable happens inside the software. In less than a second, those words are broken apart, mapped onto a vast network of meaning, cross-referenced against themselves and transformed into a response — one word at a time.

Most people have no idea how this works; and for a long time, that was fine. But AI is showing up in lease abstraction tools, preventive maintenance systems, capital planning platforms and building analytics dashboards. Facility managers and real estate professionals who understand what is happening under the hood will make better decisions about when to trust it, when to verify it and when to build with it.

Understanding the four stages that every major AI language model — Claude, ChatGPT, Gemini — uses to produce a response does not require math or a coding background. The mechanics are explained in terms that make sense for facility professionals who spend their days thinking about buildings, systems and leases.

The most important thing to know before reading further: AI does not look things up. It predicts.

That single fact explains almost everything about what AI can do well, what it gets wrong and why it sometimes sounds completely confident while being completely wrong. It is worth keeping in mind throughout the four stages that follow.

LLM-4Stages

The four stages every AI language model uses to turn user input into a response.

Stage 1: Tokenization — The model does not read words

When a user sends a message to an AI tool, the first thing it does is break the text into small fragments. These fragments are called tokens. Each token is actually a number that maps to a word, part of a word or punctuation mark.

The word "extraordinary" might become two tokens: extra and ordinary. A technical term the model has rarely seen in its training data might fragment into even smaller pieces — chunks it does recognize even if the full word is unfamiliar. A carefully worded lease clause or maintenance specification gets disassembled before the model processes a single bit of meaning from it.

This matters for practical reasons. AI tools charge by the token, not by the word. A long building condition report pasted into a prompt is a significant cost. Different languages produce different token counts for the same content: Korean, Arabic and Chinese typically require more tokens per concept than English, which is why AI tools tend to perform better and cost less in English. Also, highly specialized terminology — OSCRE taxonomy codes, BOMA area definitions, proprietary system model numbers — may tokenize in ways that produce unexpected results.

LLM-Stage 1 Text as written by the user, versus the token fragments the AI actually processes.

Stage 2: Embeddings — Mapping meaning across a vast terrain

Tokens are just numbers. On their own, they carry no meaning. Stage 2 converts each token into something richer: an embedding. An embedding is a set of thousands of numbers that describes where a word sits on a vast, multidimensional map of meaning.

The way this map works is intuitive once the concept is clear. Words with similar meanings end up close together. "Boiler" and "chiller" are neighbors. "Preventive maintenance" and "PM schedule" are nearby. "Boiler" and "rent escalation" are far apart. The model learned these relationships by reading enormous amounts of text, and it encoded them as positions in this meaning space.

This is why AI tools can find relevant documents even when the search terms differ from those in the document. Search for "cooling system maintenance," and the model can surface records about "chiller plant servicing" because the positions of those phrases on the meaning map are close. This is also why data quality and terminology consistency matter so much for AI-powered search. If an asset database uses "RTU" in some records, "rooftop unit" in others and "packaged HVAC unit" in others, those entries land in slightly different places on the meaning map — and retrieval misses documents it should find.

Standardized data taxonomy is not just good data hygiene. It is a prerequisite for AI retrieval to work accurately.

LLM-Stage 2

The AI's meaning map. Similar concepts cluster together, just like similar spaces cluster on a floor plan.

Stage 3: Attention — Holding the whole document in mind

The third stage is where the transformer architecture earns its name. Attention is the mechanism that allows an AI model to connect any word in the input to any other word in the input, regardless of how far apart they are.

For every token in the sequence, the model calculates a weight for every other token: essentially asking, "How much should this word influence my understanding of every other word in this document?" These weights let the model understand that "it" in the 15th paragraph refers to the building system introduced on page one. They let the model track that a constraint mentioned in the opening section applies to the recommendation being asked about at the end.

This is powerful. It is also expensive. Attention computation grows roughly with the square of the context size, which is why context windows (the total amount of text a model can hold at once) are finite and why larger context windows cost more. It is also why very long documents can cause the model to lose track of information buried in the middle. When an AI tool is asked to analyze a lengthy document, the most important instructions and context should be at the beginning or the end, not buried in the middle.

There is one more thing worth understanding about attention. The model has no persistent memory between conversations. Every new conversation starts fresh. Multiturn conversations work because the tool automatically resends the entire prior exchange with every new message. The model rereads the full conversation history from the beginning every single time. Session state management is an application responsibility, not something the model handles automatically.

LLM-Stage 3 How attention works across a long document — connecting findings from different sections the way an experienced FM connects observations in a condition assessment.

Stage 4: Generation — One word at a time, by probability

After the input has been tokenized, embedded and processed through attention, the model does something that surprises most people when they learn about it. It does not retrieve a stored answer. It generates a response — one token at a time — by calculating the probability of every possible next word and picking one.

"The" might have a 14 percent chance of being the right first word. "Based" might have 11 percent. "Typically" might have 9 percent. The model selects one, then does the entire calculation again for the word after that. And again. And again. Until the response is complete. The final answer the reader sees is the accumulation of hundreds or thousands of these individual probability selections.

This is why a user can send the same question twice and get slightly different answers. The model is sampling from probability distributions, not reading from a fixed lookup table. A setting called temperature controls how much randomness is in that sampling. Low temperature makes the model choose the most likely option almost every time, producing consistent, predictable output. Higher temperature introduces more variety, which can be useful for creative or exploratory work, and counterproductive for tasks that require precision.

High probability does not mean factually correct. The model has no mechanism to verify its own output.

LLM-Stage 4 How the AI selects each next word from a probability distribution. Right: the FM equivalent — a veteran PM's ROM confidence varies with familiarity. AI generation confidence is constant whether the output is right or wrong.

The confidence problem: Why AI gets things wrong

There is a term for when an AI tool produces a confident, detailed, completely wrong answer: hallucination. It sounds dramatic, but it is the natural consequence of probability-based generation. When the most statistically likely sequence of words happens to be factually wrong, the model generates it anyway because it has no mechanism to check.

For FM and CRE professionals, this shows up in predictable patterns. The model might cite a building code section that sounds exactly right but does not exist. It might quote a lease obligation that reflects standard triple-net practice rather than the actual lease on file. It might produce a benchmark statistic — maintenance cost per square foot, occupancy ratio — that is plausible but fabricated. It might blend details from several real equipment specifications into one product that never existed.

None of this is deception. The model genuinely does not know it is wrong. It selected the most likely tokens at each step and accumulated into a plausible-sounding but inaccurate answer. LLM-Analogy 5 V2

The practical response to hallucination risk is not distrust; it is structured verification. Any AI output that involves specific regulatory requirements, contract terms, cost data or safety-related information should be treated as a first draft that requires review, not a finding that can go directly into a report or decision.

The most durable mitigation is architectural. Retrieval-augmented generation (RAG) addresses the problem at the source: instead of asking the model what it knows about a lease or a standard, the practitioner provides the actual document as part of the prompt and instructs the model to work from that source. The model's behavior does not change but the quality of the input material improves dramatically, and the model's responses are grounded in verifiable content rather than training patterns.

What this means for FM & CRE practice

The four stages are not just theory. Each one maps directly to a practical decision that FM and CRE professionals face when evaluating, purchasing or building AI tools for their operations.

Tokenization tells practitioners that AI operating costs are driven by the volume of text fed into it. Long building condition reports, full lease documents and pasted email threads are all significant token loads. Efficient prompting and session design are operational expense decisions, not just technical preferences.

Embeddings tell practitioners that retrieval accuracy depends on terminology consistency. When evaluating or building an AI tool that searches asset data, lease records or maintenance history, standardized taxonomy is a prerequisite — not something an organization can retrofit after the fact. The meaning map only works when similar concepts have consistent names.

Attention tells practitioners that context windows are finite and that the model has no persistent memory. For portfolio-scale applications that need to process many documents, the architecture must account for document chunking, session management and context limits. An organization cannot simply dump 500 leases into a single prompt and expect coherent analysis.

Generation tells practitioners that any AI output involving regulatory requirements, contract terms, cost data or safety information requires professional verification. The model generates the most statistically likely answer, which is often correct and always confident. The practitioner’s job is to confirm that this particular answer is accurate for this particular situation.

AI should be treated the way a team treats a smart, experienced, occasionally overconfident junior colleague. Brief them well. Give them the right documents. Review the work before it goes out.

That is not a limitation to apologize for. It is an accurate description of a very useful tool. Junior colleagues who are smart, fast and well-briefed produce enormous amounts of value. The professional who manages them well — who gives clear instructions, provides the right source materials and reviews the output before it has consequences — gets far more done than one who either dismisses the help or accepts every output uncritically.

The same posture serves practitioners well with AI. Understanding the mechanics, setting up the inputs correctly and verifying what matters — these four stages are the foundation of that understanding.

Register for World Workplace

AI Decoded

How large language models (LLMs) actually work

Stage 1: Tokenization — The model does not read words

Stage 2: Embeddings — Mapping meaning across a vast terrain

Stage 3: Attention — Holding the whole document in mind

Stage 4: Generation — One word at a time, by probability

The confidence problem: Why AI gets things wrong

What this means for FM & CRE practice

References

Let Us Hear From You

Register for World Workplace

AI Decoded

How large language models (LLMs) actually work

Stage 1: Tokenization — The model does not read words

Stage 2: Embeddings — Mapping meaning across a vast terrain

Stage 3: Attention — Holding the whole document in mind

Stage 4: Generation — One word at a time, by probability

The confidence problem: Why AI gets things wrong

What this means for FM & CRE practice

References

Related articles

Artificial Intelligence in FM

Don’t Drop the Ball

Beware of Growing Cyberthreats

Let Us Hear From You