## Inference
Inference is the process by which an AI model makes predictions on new data that wasn’t included in its training.
During inference, the model:
1. Takes in new data (like an image, text, or numbers)
2. Processes it through its learned parameters and architecture
3. Produces an output (like a classification, prediction, or generated content)
For example, when you ask an AI a question, it’s performing inference: using its trained parameters to process your input and generate an appropriate response.
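As a concrete (and heavily simplified) sketch, here is what steps 2 and 3 can look like for a tiny classifier. The weights, labels, and input below are made up for illustration; a real model has millions or billions of learned parameters.

```python
import numpy as np

# Hypothetical "learned" parameters (a real model learns these during training).
weights = np.array([[0.9, -0.4],
                    [-0.2, 0.8],
                    [0.1, 0.3]])          # 3 classes x 2 input features
bias = np.array([0.05, -0.1, 0.0])
classes = ["cat", "dog", "bird"]          # made-up labels for illustration

def infer(features):
    """One forward pass: new data in, classification out."""
    logits = weights @ features + bias            # process input through parameters
    probs = np.exp(logits) / np.exp(logits).sum() # softmax -> probabilities
    return classes[int(np.argmax(probs))], probs

label, probs = infer(np.array([0.7, 0.2]))  # step 1: take in new data
print(label, probs)                         # step 3: produce an output
```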
## Tokens and Vectors
Large Language Models (LLMs) don’t read text like humans—they process it as tokens and vectors. **Tokens** are the building blocks of text, which can be words, subwords, or characters. Each token is converted into a **vector**—a list of numbers that represents its meaning in a high-dimensional space. Similar words have similar vectors, helping AI understand relationships between concepts.
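A toy sketch of this pipeline, assuming an invented four-word vocabulary and 3-dimensional vectors (real models use tens of thousands of tokens and hundreds or thousands of dimensions):

```python
# Toy tokenizer + embedding table; all values are made up for illustration.
vocab = {"the": 0, "king": 1, "queen": 2, "apple": 3}

embeddings = [            # one vector (list of numbers) per token ID
    [0.1, 0.2, 0.1],      # "the"
    [0.8, 0.1, 0.7],      # "king"
    [0.78, 0.12, 0.72],   # "queen"
    [0.2, 0.9, 0.3],      # "apple"
]

def to_vectors(text):
    """Split text into tokens, map tokens to IDs, then IDs to vectors."""
    token_ids = [vocab[word] for word in text.lower().split()]
    return [embeddings[i] for i in token_ids]

print(to_vectors("the king"))  # [[0.1, 0.2, 0.1], [0.8, 0.1, 0.7]]
```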
LLMs are trained on large amounts of text (books, articles, websites). The model starts with **random vectors** for each token and learns their meaning by predicting missing words in sentences.
During training, the model **adjusts the vectors** so that words with similar meanings are mathematically closer. For example:
- “King” → `[0.8, 0.1, 0.7]`
- “Queen” → `[0.78, 0.12, 0.72]`
- “Apple” → `[0.2, 0.9, 0.3]`
Since “King” and “Queen” have similar vectors, the AI understands they are related, while “Apple” is in a different area of the space.
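“Closer” is typically measured with cosine similarity, the cosine of the angle between two vectors. Using the illustrative vectors above:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

king = [0.8, 0.1, 0.7]
queen = [0.78, 0.12, 0.72]
apple = [0.2, 0.9, 0.3]

print(cosine_similarity(king, queen))  # ~0.9995 -> nearly identical direction
print(cosine_similarity(king, apple))  # ~0.44   -> much less related
```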
When you ask an AI a question, it converts your words into vectors, searches for similar meanings, and generates a response based on patterns it has learned—without storing exact text. This is how LLMs can answer questions, complete sentences, or summarize content intelligently. See [How does AI answer my question?](How%20does%20AI%20answer%20my%20question?.md) for more details.
## RAG
RAG stands for Retrieval-Augmented Generation. It is a technique that enhances the performance of generative AI models by incorporating external knowledge retrieval during the response generation process.
The RAG process has three steps (a minimal code sketch follows the list):
1. Retrieval: When the model receives a query, it searches an external knowledge base (e.g., a document database, vector store, or the web) to find relevant information.
2. Augmentation: The retrieved information is then fed into the generative AI model as additional context.
3. Generation: The AI model generates a response based on both its internal knowledge and the retrieved data.
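Here is a self-contained sketch of those three steps. The knowledge base, the `score()` function, and `generate()` are toy placeholders; a real system would use a vector store for retrieval and a real LLM for generation.

```python
# Minimal RAG sketch; everything here is a simplified stand-in.
knowledge_base = [
    "RAG retrieves relevant documents before generating an answer.",
    "Vectors place words with similar meanings close together.",
]

def score(query, doc):
    """Toy relevance score: word overlap (real systems compare vectors)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def generate(prompt):
    """Stand-in for a call to a real generative model."""
    return f"(model response based on prompt: {prompt!r})"

def rag_answer(query, top_k=1):
    # 1. Retrieval: search the knowledge base for relevant information.
    ranked = sorted(knowledge_base, key=lambda d: score(query, d), reverse=True)
    context = "\n".join(ranked[:top_k])
    # 2. Augmentation: feed the retrieved text to the model as extra context.
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    # 3. Generation: the model answers using its knowledge plus the context.
    return generate(prompt)

print(rag_answer("How does RAG generate an answer?"))
```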