From Agents to Coordination Cells: A Practical Agent/Skill Framework for Episode-Driven AI Systems
Subtitle: Artifact Contracts, Deficit-Led Wake-Up, Semantic Bosons, and Dual-Ledger Control
0. Reader Contract and Scope
0.1 Who this article is for
This article is written for AI engineers who already know the practical reality of building agent systems: once a workflow grows beyond a single prompt, the system starts to feel harder to reason about than it should. You may have multiple tools, a planner, a verifier, a retrieval stage, a few specialist prompts, maybe a critic, maybe a memory layer, and yet the whole system still behaves like a pile of half-visible heuristics. The goal here is to offer a cleaner runtime language for that situation. It is aimed at engineers who want systems that are more modular, more inspectable, and more stable than “just add another agent” architectures. The starting point is not metaphysics but runtime design: coordination episodes, skill cells, artifact contracts, deficit-led wake-up, and explicit state accounting.
0.2 What this framework is and is not
This framework is a proposal for how to organize advanced Agent/Skill systems. It is not a claim that we already possess a final theory of AGI. It is not a claim about consciousness. It is not a claim that every useful runtime must literally contain “Bosons” or “semantic fields” as ontological objects. Instead, it is an engineering proposal: use a better unit of decomposition, use a better clock, and use a better state ledger. The decomposition unit is the skill cell. The clock is the coordination episode. The ledger is a dual runtime ledger that tracks maintained structure, active drive, health gap, structural work, and environmental drift. The article therefore stays at the level of operational architecture and measurable control, not speculative ontology.
0.3 Why the article starts from zero
A common failure mode in technical writing about agent systems is that the vocabulary is introduced too late. People say “agent,” “router,” “memory,” “tool policy,” or “planner” as if those units were already natural. In practice they are often not. Different systems use the same word for different things, and different words for the same thing. So this article starts from zero on purpose. It first asks a simpler question: what is the smallest useful runtime unit, what is the natural time variable for coordination, and what should count as the state of the system? Only after those questions are fixed does it become meaningful to talk about agents, skills, wake-up logic, or stability control. The article therefore builds the framework from the bottom up rather than from existing product labels.
0.4 The three claims of the paper
The framework rests on three claims.
First, capability should be decomposed into skill cells defined by bounded transformation responsibility rather than vague persona labels. A “research agent” or “debugging agent” is too broad to be the atomic runtime unit. What matters is a repeatable local transformation under a clear contract. The proposed skill-cell schema therefore centers on regime scope, phase role, input artifact contract, output artifact contract, wake mode, deficit conditions, Boson emission/reception, and failure states.
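The skill-cell schema above can be made concrete as a small record type. The field names below mirror the schema in the text, but the specific types and the example cell are illustrative assumptions, not a normative API:

```python
from dataclasses import dataclass, field

@dataclass
class SkillCell:
    """A bounded transformation with an explicit runtime contract.

    Field names follow the skill-cell schema in the text; the concrete
    shapes here are illustrative, not a normative definition.
    """
    name: str                        # identifier, not a persona label
    regime_scope: list               # regimes in which the cell is valid
    phase_role: str                  # which phase of an episode it serves
    input_contract: list             # artifact types it requires
    output_contract: list            # artifact types it must produce
    wake_mode: str                   # e.g. "deficit", "scheduled", "explicit"
    deficit_conditions: list         # missing-artifact / residue triggers
    boson_emit: list = field(default_factory=list)     # signals it emits
    boson_receive: list = field(default_factory=list)  # signals it listens for
    failure_states: list = field(default_factory=list)

# Example: a narrow evidence-extraction cell instead of a broad "Research Agent".
extract_claims = SkillCell(
    name="extract_claims",
    regime_scope=["research"],
    phase_role="evidence",
    input_contract=["retrieved_documents"],
    output_contract=["claim_list"],
    wake_mode="deficit",
    deficit_conditions=["missing:claim_list"],
    failure_states=["no_extractable_claims"],
)
```

The point of the record is not the particular fields but that every cell carries its contract with it, so the orchestrator never has to guess what a cell needs or produces.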
Second, higher-order reasoning should be indexed not primarily by token count or wall-clock time, but by coordination episodes. A coordination episode is a variable-duration semantic unit that begins when a meaningful trigger activates local processes and ends when a stable, transferable output is formed. This is a better clock for advanced agent systems because equal numbers of tokens do not correspond to equal amounts of semantic progress, while completed bounded closures often do.
Third, serious agent runtimes need not only orchestration but also accounting. The runtime should declare a maintained structure, a drive that is trying to shape that structure, a health gap between the two, a notion of inertia or mass, a notion of structural work, and an explicit environmental baseline. The dual-ledger view gives exactly that language: body as maintained structure s, soul as drive λ, health as gap G, mass as curvature-derived inertia, work as W_s, and environment as declared baseline q with declared features φ.
These three claims can be compressed into the following backbone:
x_(n+1) = F(x_n) (0.1)
S_(k+1) = G(S_k, Π_k, Ω_k) (0.2)
ΔW_s(k) = λ_k · (s_k − s_(k−1)) (0.3)
Equation (0.1) is the ordinary micro-step picture.
Equation (0.2) is the coordination-episode picture.
Equation (0.3) is the per-episode accounting picture.
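The difference between the micro-step clock (0.1) and the episode clock (0.2) can be sketched as a loop. The helper names (`assemble_program`, `observe`, `closed`) are hypothetical stand-ins supplied by the caller, not part of the framework:

```python
def run_episode(S_k, assemble_program, observe, G, closed):
    """One coordination episode: S_(k+1) = G(S_k, Pi_k, Omega_k).

    assemble_program, observe, G, and closed are caller-supplied stand-ins;
    their names are illustrative, not a fixed API.
    """
    Pi_k = assemble_program(S_k)             # active skill cells, routes, policies
    Omega_k = []                             # observations gathered this episode
    while not closed(S_k, Pi_k, Omega_k):    # episode ends at a stable closure
        Omega_k.append(observe(S_k, Pi_k))   # tool outputs, retrieval, signals
    return G(S_k, Pi_k, Omega_k)             # next effective runtime state
```

Note that the episode takes a variable number of micro-steps: the loop runs until the closure predicate fires, not until a token or step budget is exhausted.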
0.5 The core equations and notation
To keep the article coherent, we fix a minimal notation at the start.
Let n denote a low-level computational step index and k denote a coordination-episode index. Let S_k denote the effective runtime state before episode k. Let Π_k denote the coordination program assembled during that episode: the active skill cells, routes, constraints, and policies. Let Ω_k denote observations encountered during the episode: retrieved evidence, tool outputs, memory fragments, or environment signals. This gives the episode-time update:
S_(k+1) = G(S_k, Π_k, Ω_k) (0.4)
At the control layer, let s_k denote the maintained runtime structure after episode k, λ_k the active coordination drive during that episode, q the declared baseline environment, and φ the declared feature map that specifies what counts as structure. Following the dual-ledger formulation:
System = (X, μ, q, φ) (0.5)
s(λ) = E_(p_λ)[φ(X)] (0.6)
G(λ,s) = Φ(s) + ψ(λ) − λ·s ≥ 0 (0.7)
W_s = ∫ λ · ds (0.8)
The interpretation is deliberately practical. s is not “the soul of the machine.” It is the maintained runtime structure. λ is not mystical intention. It is the active coordination pressure or drive. G is the measurable misalignment between active drive and maintained structure. W_s is the structural work performed while changing that structure.
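Equations (0.6)-(0.8) can be checked numerically on a toy model. The sketch below assumes an exponential-family reading of the ledger (uniform baseline q, Bernoulli feature φ(x) = x, ψ as log-partition, Φ as its convex conjugate); that interpretation is consistent with (0.7) but is an added assumption, not stated in the text:

```python
import math

def psi(lam):
    # Log-partition for a Bernoulli feature phi(x) = x under a uniform baseline q.
    return math.log((1.0 + math.exp(lam)) / 2.0)

def s_of(lam):
    # Maintained structure s(lambda) = E_{p_lambda}[phi(X)]  (eq. 0.6).
    return math.exp(lam) / (1.0 + math.exp(lam))

def Phi(s):
    # Convex conjugate of psi: relative negative entropy against the baseline.
    return s * math.log(s) + (1 - s) * math.log(1 - s) + math.log(2.0)

def gap(lam, s):
    # Health gap G(lambda, s) = Phi(s) + psi(lambda) - lambda*s >= 0  (eq. 0.7).
    return Phi(s) + psi(lam) - lam * s

def structural_work(lams):
    # W_s = integral of lambda ds along a drive schedule (eq. 0.8), trapezoid rule.
    W = 0.0
    for a, b in zip(lams, lams[1:]):
        W += 0.5 * (a + b) * (s_of(b) - s_of(a))
    return W

aligned = gap(1.0, s_of(1.0))   # ~0: structure has equilibrated to the drive
misaligned = gap(1.0, 0.5)      # > 0: structure lags the active drive
```

In this toy model the gap G vanishes exactly when the maintained structure matches the one induced by the drive, and is strictly positive otherwise, which is the intended reading of "health as gap".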
The rest of the article explains why these abstractions are worth using in agent engineering.
1. Why Today’s Agent Stacks Still Feel Ad Hoc
1.1 The hidden cost of “just add another agent”
When an AI workflow fails, one common response is to add another specialized agent. Add a critic. Add a planner. Add a verifier. Add a summarizer. Add a retrieval judge. Add a tool selector. In the short term this often works. In the long term it creates a system whose behavior is increasingly hard to inspect. The surface vocabulary improves faster than the runtime semantics. One ends up with more role names, more prompts, and more edges in the orchestration graph, but not necessarily a better account of why the system moved from one local state to another.
The hidden cost is not just complexity. It is loss of runtime legibility. When a system produces a good answer, it becomes difficult to say which bounded process actually produced the crucial closure. When it fails, it becomes difficult to say whether the failure was wrong routing, missing artifact production, unresolved contradiction, unstable local closure, drift in the operating environment, or simple over-triggering. The result is “agent theater”: a system that appears architecturally rich but remains operationally blurry.
1.2 Why role names are too vague
Human naming convenience is not the same thing as runtime factorization. A label like “Research Agent” may hide many fundamentally different sub-capabilities: formulating search queries, running retrieval, extracting claims from sources, reconciling contradictory evidence, and synthesizing a final summary.
Those are not one capability. They are several different transformations, each with different entry conditions, exit conditions, and failure modes. The same is true for labels like “Debugging Agent,” “Writer Agent,” or “Planner Agent.” These names are useful at the product or team level, but they are too coarse to serve as the atomic units of runtime coordination.
The new framework therefore treats a role name as, at most, a coordination shell. The true reusable unit is the skill cell: a bounded transformation whose inputs, outputs, triggers, and failures are explicit. This follows the design direction in the skill-cell and semantic-Boson material, which argues that skills should be decomposed by recurrently stable factorization under regime constraints rather than by naming convenience or arbitrary tool grouping.
1.3 Why prompt-only routing becomes brittle
Most current agent systems route by some mixture of semantic similarity, handcrafted rules, or an LLM-based planner that decides what should happen next. All of these are useful, but they often miss the most important question: what is missing right now?
Pure relevance is not enough. A skill can be relevant yet unnecessary. Another skill can be only moderately relevant in semantic space yet absolutely necessary because the current episode cannot advance without the artifact it produces. The semantic-Boson framework makes exactly this point: many current systems look only at relevance and ignore deficit, while wake-up should depend heavily on what the current episode still lacks. Missing required artifacts, high contradiction residue, unresolved uncertainty, blocked phase advancement, and unmet export conditions are stronger wake signals than topical similarity alone.
This distinction matters because routing failure usually happens in one of two forms:
wake_too_early(skill_i) (1.1)
wake_too_late(skill_j) (1.2)
In the first case a skill is triggered because it is semantically nearby, even though the episode is not ready for it. In the second case a skill is not triggered because the system fails to represent deficit pressure explicitly. Relevance-only routing therefore creates both noise and blindness.
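A minimal sketch of deficit-led wake-up, combining topical relevance with explicit deficit pressure. The scoring weights, threshold, and dict shapes are illustrative assumptions, not the framework's prescribed mechanism:

```python
def deficit_pressure(cell, episode):
    """How strongly the current episode lacks what this cell produces.

    'cell' and 'episode' are plain dicts for illustration; a real runtime
    would use structured state objects.
    """
    missing = [a for a in cell["outputs"] if a not in episode["artifacts"]]
    blocked = episode["blocked_on"] & set(cell["outputs"])
    return len(missing) / max(len(cell["outputs"]), 1) + (1.0 if blocked else 0.0)

def wake(cell, episode, relevance, w_rel=0.3, w_def=0.7, threshold=0.5):
    """Deficit-led wake-up: relevance alone neither wakes nor silences a cell."""
    score = w_rel * relevance + w_def * deficit_pressure(cell, episode)
    return score >= threshold

cell = {"outputs": ["claim_list"]}

# Episode blocked on the cell's artifact: low relevance, but it must wake (avoids 1.2).
needy = {"artifacts": set(), "blocked_on": {"claim_list"}}

# Artifact already present: high relevance, but it should stay asleep (avoids 1.1).
done = {"artifacts": {"claim_list"}, "blocked_on": set()}
```

Weighting deficit above relevance is the design choice the section argues for: a semantically nearby cell stays asleep when nothing is missing, and a moderately relevant cell wakes when the episode cannot advance without its artifact.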
1.4 Why chat history is a poor state model
Many production systems implicitly treat chat history as the main state of the runtime. That is convenient, but weak. A long message log mixes together several different categories of information: settled conclusions, superseded hypotheses, raw tool output, user intent, and transient scratch reasoning, all flattened into a single sequence.
This is a poor substitute for an explicit runtime state. A history is a record of what happened. A state should say what is currently maintained, what is still unresolved, what phase the system is in, what artifacts are available, and what drives or pressures are active.
The coordination-episode view improves this by defining advancement in terms of bounded semantic closures rather than raw textual continuation. The dual-ledger view improves it further by making the maintained structure explicit as s, the drive explicit as λ, and the environment explicit as q with declared features φ. Together they imply that a serious runtime should not rely on history alone. It should maintain a structured state object, even if that object is only approximate.
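The structured state object argued for here can be sketched as a record that separates what a message log conflates. The field names track the questions in the text (what is maintained, what is unresolved, what phase, what artifacts, what drives); the concrete shape is a guess at a minimal form, not a normative schema:

```python
from dataclasses import dataclass, field

@dataclass
class RuntimeState:
    """Explicit runtime state, as opposed to a raw message log.

    The fields mirror the text's questions about state; their concrete
    types are illustrative assumptions.
    """
    phase: str                                      # current episode phase
    artifacts: dict = field(default_factory=dict)   # available, typed outputs
    unresolved: list = field(default_factory=list)  # contradictions, open questions
    maintained: dict = field(default_factory=dict)  # structure s: what is held stable
    drives: dict = field(default_factory=dict)      # active pressures (lambda)

    def deficits(self, required):
        """Artifacts still missing before the current phase can advance."""
        return [a for a in required if a not in self.artifacts]
```

Even an approximate object like this makes deficit-led wake-up and episode closure checkable, whereas a raw history forces every such question back through an LLM call.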
1.5 The missing unit of coordination
All the above problems point to one deeper absence: the missing unit of coordination. If the runtime unit is too large, everything becomes a vague “agent.” If the time unit is too small, everything becomes token churn. If the state unit is too loose, everything becomes message history. The framework proposed here replaces these three weak defaults with three stronger units:
skill cell instead of vague role (1.3)
coordination episode instead of token count (1.4)
maintained structure instead of raw history (1.5)
This is the central reason the article starts by rebuilding vocabulary. Without the correct units, even good tools end up inside an ad hoc operating model.
2. The Core Shift: From Roles to Skill Cells