
Building AI Agents? The AI Is the Easy Part.

The models are smart enough. That realization changed how I think about AI agents. It's natural to focus on the model - which one is smartest, which one is fastest, which one just topped the benchmarks. But after spending real time building and configuring agents, I've come to believe that model intelligence is essentially a solved problem. The hard part - and the part that actually determines whether an agent is useful - is everything you build around it. I think it comes down to four fundamental dimensions. If you miss one, the whole thing falls apart.

I. Identity & Grounding - Who the agent is and what it knows

This is the foundation. Without it, you have a generic AI. With it, you have your agent.

Personality is the consistent voice that makes the difference between talking to a colleague and talking to a generic AI. It remembers your preferences, adapts its style to the situation, and strikes the tone that feels right for you. Think of it as the agent's identity layer - the SOUL.md that defines who this agent is, not just what it does.
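As a concrete sketch, an identity file might look like this. The sections and their contents are illustrative assumptions, not a standard format - only the idea of a SOUL.md comes from the text above:

```markdown
# SOUL.md - who this agent is

## Voice
Direct, warm, no corporate filler. Short sentences over long ones.

## Preferences it remembers
- Summaries first, details on request
- Never schedule meetings before 9am

## Relationship
Acts as a chief of staff, not a search engine.
```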

Domain knowledge is where the real value is generated. It enhances the vast world knowledge the AI already has with specific knowledge about the domain in which it operates. But domain knowledge is not one thing - it splits into two very different design problems:

  • Static knowledge - the things you tell the agent upfront. Your company's product catalog, your brand guidelines, your preferences, the rules of your industry. This is curated, stable, and deliberate.
  • Dynamic context - the things the agent discovers at runtime. A live conversation thread, an incoming email, the current state of a project. This is fluid, situational, and often messy.

The agent needs different strategies for each. Static knowledge can be carefully structured and pre-loaded. Dynamic context requires the agent to fetch, filter, and prioritize - deciding what's relevant to this specific task right now. This is the difference between an agent that retrieves everything and one that retrieves the right thing.
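The two strategies can be sketched in a few lines. This is a minimal illustration, not a real retrieval system: the static dictionary, the keyword-overlap relevance score, and the one-day recency cutoff are all assumptions chosen to make the contrast visible.

```python
from datetime import datetime, timedelta

# Static knowledge: curated upfront and pre-loaded into the agent's context.
STATIC_KNOWLEDGE = {
    "brand_voice": "Friendly but precise; no exclamation marks.",
    "product_catalog": ["Starter", "Pro", "Enterprise"],
}

def fetch_dynamic_context(source, task, max_items=3):
    """Dynamic context: fetch at runtime, filter for recency, rank by relevance.

    `source` is any iterable of (timestamp, text) records, e.g. a live
    conversation thread or an inbox feed.
    """
    cutoff = datetime.now() - timedelta(days=1)
    recent = [(ts, text) for ts, text in source if ts >= cutoff]
    # Crude relevance filter: keyword overlap with the task description.
    task_words = set(task.lower().split())
    scored = sorted(
        recent,
        key=lambda item: len(task_words & set(item[1].lower().split())),
        reverse=True,
    )
    return [text for _, text in scored[:max_items]]
```

The point of the split: `STATIC_KNOWLEDGE` is loaded once and deliberately, while `fetch_dynamic_context` has to decide, per task, what is relevant right now.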

And here's the part most people underestimate: getting context connected to the agent is often the hardest problem. Let's say your company produces meeting transcripts in Microsoft Teams - super relevant domain knowledge. But if there's no automated way to pipe those transcripts into an agent running on your machine, that knowledge is locked away, out of the agent's reach.

Context infrastructure is frequently the actual bottleneck, not the AI itself. Solving this often means changing your tooling so that a real connection exists between the agent and the context it needs.

Memory is the foundation of learning - for humans and agents alike. The ability to have short-term and long-term memory is what makes an agent feel like an intelligent, capable entity rather than a stateless function being called over and over.

  • Short-term memory can be as simple as a daily log of what happened, including successes and failures.
  • Long-term memory is synthesized from those daily logs - learnings that persist across sessions and get applied to future work.

Past learnings are applied. Mistakes are avoided. Success compounds. Although agents start with all the knowledge the world has to offer, they still need to be trained on the actual job they are supposed to do - like any human would.
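The two memory layers can be sketched as a pair of functions - a hypothetical minimal version, assuming memory lives in a simple list of log records rather than a real store:

```python
from datetime import date

def append_daily_log(log, entry, success):
    """Short-term memory: a daily record of what happened, failures included."""
    log.append({"day": str(date.today()), "entry": entry, "success": success})

def synthesize_learnings(log):
    """Long-term memory: distill the daily log into lessons that persist
    across sessions and get applied to future work."""
    return [f"Avoid repeating: {e['entry']}" for e in log if not e["success"]]
```

The synthesis step is where compounding happens: the daily log is raw, the learnings are durable.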

II. Cognition - How the agent thinks

Knowledge without reasoning is just a database. This dimension is about what the agent does between receiving input and producing output.

Planning and goal decomposition. This is arguably the single most important capability that separates a useful agent from a glorified script. A genuinely useful agent can take a high-level goal, break it into sub-goals, sequence them, and adapt the plan as conditions change. "Prepare me for tomorrow's board meeting" isn't one task - it's a dozen: check the agenda, pull relevant metrics, review recent decisions, draft talking points, flag open questions. An agent that can use tools but can't plan is just an automation with natural language input.
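The board-meeting example can be made concrete. A real agent would ask the model to do the decomposition; the hard-coded playbook below is a stand-in assumption that makes the structure visible:

```python
def decompose(goal):
    """Break a high-level goal into an ordered plan of sub-goals."""
    playbook = {
        "prepare me for tomorrow's board meeting": [
            "check the agenda",
            "pull relevant metrics",
            "review recent decisions",
            "draft talking points",
            "flag open questions",
        ],
    }
    return playbook.get(goal.lower(), [goal])  # unknown goals stay a single task

def execute_plan(steps, run_step):
    """Run sub-goals in sequence and stop to re-plan when one fails."""
    done = []
    for step in steps:
        ok = run_step(step)
        done.append(step)
        if not ok:
            break  # conditions changed: the plan needs revisiting
    return done
```

An automation would run one fixed script; the planner's value is that `execute_plan` can abandon the sequence mid-way and hand control back to re-planning.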

Judgement. An agent will happily execute workflows, produce content, and never grow tired. But how do you make sure that what it's doing is actually useful? This requires judgement - and for an agent, judgement must be made explicit in a way that human judgement never is. We operate on intuition. Agents need that intuition decoded into concrete terms:

  • Values - what principles should the agent inherit from you?
  • Rules - what are the non-negotiable guidelines it should follow?
  • Expectations - what does "good" look like for a specific task?
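Decoded into code, judgement stops being intuition and becomes checkable data. A minimal sketch, in which the spec contents and the single word-count check are illustrative assumptions:

```python
# A hypothetical explicit-judgement spec: values, rules, and expectations as data.
JUDGEMENT = {
    "values": ["be honest about uncertainty"],
    "rules": ["no external messages without approval"],
    "expectations": {"summary": {"max_words": 200}},
}

def check_expectations(task, output, judgement=JUDGEMENT):
    """Return concrete objections instead of a vague 'not good enough'."""
    issues = []
    spec = judgement["expectations"].get(task, {})
    if "max_words" in spec and len(output.split()) > spec["max_words"]:
        issues.append(f"{task} exceeds {spec['max_words']} words")
    return issues
```

The design choice worth noticing: "good" for a task is defined before the work happens, so the agent can evaluate its own output against something explicit.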

This is the hardest design problem in the entire framework. Everything is technically just context for an AI model - input in, output out. But how to structure that input, how to provide the right steering signals that shape the output - that's the art.

But knowing what good looks like is only half the picture.

Guardrails and boundaries. Judgement tells the agent what it should do. Guardrails define what it should never do. This is the negative space of judgement, and it becomes critical the moment you give an agent real autonomy. An agent that confidently sends the wrong message to a customer is worse than one that does nothing. Guardrails are how you make autonomy safe.
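The negative space can be enforced mechanically: check the "never do" set before anything runs. A minimal sketch, assuming actions are dicts with a `kind` field (an illustrative shape, not a fixed API):

```python
def guarded_execute(action, forbidden, execute):
    """Refuse anything inside the 'never do' space before it runs.

    `forbidden` is the set of action kinds the agent must never take
    autonomously, e.g. sending customer-facing messages.
    """
    if action["kind"] in forbidden:
        return {"status": "blocked", "reason": f"{action['kind']} needs a human"}
    return {"status": "done", "result": execute(action)}
```

Blocking happens before execution, not after - the agent that does nothing beats the one that confidently sends the wrong message.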

III. Agency - How the agent acts

Knowing and thinking are useless without the ability to do. This is about turning decisions into action.

Tools. Agents need to be able to act - through integrations with third-party services, access to a local file system, access to a terminal to run applications. But the real power isn't in any single tool call. It's in chaining tool calls together into full workflows, where the output of one action feeds the input of the next.
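Chaining is just output-to-input plumbing. In this sketch the three tools are hypothetical lambdas standing in for real integrations:

```python
def run_workflow(tools, steps, initial_input):
    """Chain tool calls: each tool's output feeds the next tool's input."""
    data = initial_input
    for name in steps:
        data = tools[name](data)
    return data

# Hypothetical tools standing in for real integrations.
tools = {
    "fetch_email": lambda query: f"email about {query}",
    "summarize": lambda text: f"summary of ({text})",
    "draft_reply": lambda summary: f"reply to: {summary}",
}
```

No single call here is interesting; the workflow only becomes valuable because the steps compose.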

Autonomy. The ability to act without being manually triggered every time. Scheduled execution (cron jobs, recurring tasks) is one dimension of this - and an important one. Giving an agent the ability to create its own schedule allows it to think through how to act like a fully autonomous entity. But autonomy is broader than scheduling. It includes:

  • Decision-making scope - what the agent is allowed to do without asking you
  • Error recovery - what happens when a step fails mid-workflow
  • Delegation - can the agent enlist other agents or sub-processes to help?

The real design question behind autonomy is the trust boundary: how much latitude do you give the agent to act, decide, and recover on its own? An agent with perfect scheduling but no ability to handle the unexpected is still brittle.
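The trust boundary and error recovery fit in one small function. A sketch under stated assumptions: the `allowed` set plays the role of decision-making scope, and a `RuntimeError` stands in for any transient failure.

```python
def act(step, run, allowed, max_retries=2):
    """Act inside a trust boundary and recover from transient failures.

    `allowed` is the decision-making scope: anything outside it is
    escalated to the human rather than attempted.
    """
    if step not in allowed:
        return ("escalated", step)
    for _ in range(max_retries + 1):
        try:
            return ("done", run(step))
        except RuntimeError:
            continue  # transient failure: retry
    return ("failed", step)  # recovery exhausted; surface it, don't hide it
```

Note the three exits: done, escalated, failed. An agent that can only report "done" is the brittle one.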

IV. Interface - How the agent interacts

The final dimension is about how the agent connects to the humans it works with.

Communication channels. If your agent is limited to a single application on your computer, it won't be of much use in everyday life. It needs to reach you wherever you are - WhatsApp on your phone, a smart speaker in your living room, a keyboard shortcut on your computer. Voice input matters just as much: being able to simply speak to your agent removes most of the friction.

Human collaboration. This is the deeper and more interesting interface problem. It's not just about reaching the human - it's about knowing when to involve the human. Remember the decision-making scope from the previous section - the boundary of what the agent is allowed to do on its own? Human collaboration is the other side of that coin. The scope defines where the boundary is. Collaboration defines what happens at the boundary. A genuinely useful agent knows when to pause and ask, how to present options rather than just outputs, and how to incorporate human feedback mid-workflow. The best agents aren't fully autonomous. They're fluent collaborators that make the human feel in control even when the agent is doing most of the work.

Observability. There's a related but distinct need: the ability to see what the agent is doing, not just its final output. Not always - you don't want to supervise every email draft or calendar move. But when the stakes are higher than usual, or when the agent's behavior feels slightly off, you need the option to look under the hood. What steps did it take? What did it consider and discard? Why did it choose this approach over another? An agent without observability is a black box that does magic tricks - impressive when it works, terrifying when it doesn't. The best agents make their reasoning accessible on demand, so you can dial your involvement up or down depending on how much trust the moment requires.
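A minimal trace is enough to open the black box. This sketch assumes each event records the step, the options considered and discarded, the choice, and the reason - the field names are illustrative:

```python
class Trace:
    """Record what the agent did and why, inspectable on demand."""

    def __init__(self):
        self.events = []

    def log(self, step, considered, chosen, why):
        self.events.append(
            {"step": step, "considered": considered, "chosen": chosen, "why": why}
        )

    def explain(self):
        """The 'look under the hood' view: one line per decision."""
        return [
            f"{e['step']}: chose {e['chosen']} because {e['why']}"
            for e in self.events
        ]
```

Crucially, `explain()` is on demand: the trace is always written, but you only read it when the stakes warrant it.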

The Framework at a Glance

| Dimension | Core question | Ingredients |
| --- | --- | --- |
| Identity & Grounding | What does the agent know? | Domain knowledge (static + dynamic), Memory (short-term + long-term), Personality |
| Cognition | How does the agent think? | Planning & goal decomposition, Judgement, Self-evaluation, Guardrails |
| Agency | How does the agent act? | Tools & tool chaining, Autonomy (scheduling, decision scope, error recovery, delegation) |
| Interface | How does the agent interact? | Communication channels, Human collaboration (when to ask, how to present, how to incorporate feedback), Observability (on-demand reasoning transparency) |

Think of It as a Tiny Company

If the framework above feels abstract, try this: picture your agent as a tiny company with a single employee - a brilliant generalist who showed up to an empty office. No files, no phone, no address book, no memory of yesterday. That's an LLM.

Your job isn't to make the employee smarter. It's to build a company around them:

  • Identity & Grounding is the onboarding - the culture doc, the wiki, the notebook. Here's who you are, here's what we know, here's what we've learned.
  • Cognition is the work at the desk - the employee sits down, reads the brief, makes a plan, exercises judgement, and knows what lines not to cross.
  • Agency is the tools and the initiative - the software on the laptop, the badge that opens doors, and the alarm clock that says check in every morning without being asked.
  • Interface is the office door - how the employee communicates with you, when to knock and ask, and the glass wall that lets you watch the work happen when you need to.

The model is the employee. Everything else is the company you build around it. And like any company, the bottleneck is almost never the talent - it's the infrastructure.

When Things Break, Look at the Framework

The real gift of thinking in dimensions is that it tells you where to look when your agent isn't working. Agent problems are almost never intelligence problems. They're structural problems - and each one maps to a specific part of the framework:

  • Agent gives inconsistent or off-brand answers? → Identity & Grounding. Your personality file is vague or missing.
  • Agent doesn't know something it should? → Identity & Grounding. The knowledge exists somewhere, but it's not connected to the agent.
  • Agent does something dumb despite having the right information? → Cognition. Your judgement criteria or guardrails aren't explicit enough.
  • Agent can't complete a task end to end? → Agency. It lacks the tools or the permission to chain actions together.
  • Agent only works when you poke it? → Agency. You haven't given it autonomy - no schedule, no heartbeat, no initiative.
  • Agent does the right thing but you don't trust it? → Interface. You need better observability - the ability to see why it made a decision, not just the result.

This diagnostic lens is what turns the framework from a thinking tool into a building tool. When something feels off, don't start from scratch. Ask: which dimension is broken?

If you're building the agent, you're the CEO

The models are smart enough. That's not the constraint anymore. The constraint is whether we can design systems that turn that intelligence into dependable work: grounded in reality, guided by values, bounded by rules, able to act, and easy to supervise when it matters. Once you see it, you can't unsee it: the person building the agent isn't selecting a model, they're founding a tiny company. And most agent failures are just company failures in disguise.