01 · Starting Point

We Don't Start from Zero

Every other document in this research series draws on external sources: published tabletop modules, academic beat theory, AI narrative research, improvisation pedagogy. This one is different. EV2090 is a production system. It runs continuously. It generates story arcs every five and a half hours. It executes beats in real time, handles live NPC conversations, processes economy signals, and persists world state across sessions. It is not a proof-of-concept.

That distinction matters enormously. Theoretical research tells us what should work. Production code tells us what actually does · and, just as importantly, what the cost of each architectural decision turns out to be once the system has been running long enough to reveal its own limits. A design that looks elegant in a whiteboard diagram often develops friction points that only emerge under load, under time pressure, or when the problem domain turns out to be slightly larger than the original scope assumed.

The EV2090 narrative engine was built to solve a specific problem for a specific game. It was not designed as a general-purpose narrative platform. Reading its source is therefore not primarily an exercise in critique · it is an exercise in archaeology. We are reading the evidence of real decisions made under real constraints, and extracting from that evidence both the patterns that proved themselves and the limitations that now define the scope of what we need to build differently.

The core files are narrative-engine.ts (the Durable Object managing arc lifecycle and beat execution), narrative/prompts.ts (every LLM prompt), narrative/types.ts (the type system), narrative/validation.ts (output validation), bounty-generator.ts (the separate mission pipeline), and bounty-npc-chat.ts (the interactive character system). The architecture reads as a system that was built in layers, where each layer solved the most urgent problem in front of it at the time.

The most valuable thing a production system teaches you is not what works · it is what the cost of each architectural decision turns out to be once the problem grows larger than you originally imagined.

02 · What It Got Right

The Five-Phase Pipeline

The single most important structural decision in EV2090's narrative engine is the multi-phase pipeline. Rather than generating a complete arc in one call, the system decomposes narrative generation into five distinct cognitive operations, each with its own model configuration, token budget, and temperature setting. This is not a performance optimization. It is a quality architecture.

The insight behind it is that different stages of narrative construction require fundamentally different cognitive modes. Generating a story idea from scratch (high temperature, unconstrained, creative) is a completely different task from auditing that idea for continuity against prior arcs (lower temperature, analytical, comparative). Mixing these two modes into a single prompt produces outputs that are neither particularly creative nor particularly rigorous · the model averages between the two pressures and satisfies neither. Separating them lets each phase do exactly one thing well.

The Five Phases: Concept (Sonnet, temp 0.85, 2000 tokens) → Audit (Sonnet, temp 0.7, 2500 tokens) → Choreography + Temporal Audit (Sonnet, temp 0.8, 5000 tokens) → Per-Beat Writers (Haiku × N, temp 0.8, 400–800 tokens per beat). Each phase has a distinct cognitive role, receives only the context it needs, and produces a validated output that gates the next phase.
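The phase table above can be sketched as a configuration map. This is an illustrative shape, not EV2090's actual code · the field and constant names are hypothetical, and the values are the ones stated in this document:

```typescript
// Hypothetical per-phase configuration mirroring the pipeline described
// above. Names are illustrative; the real config lives in narrative-engine.ts.
type Phase = "concept" | "audit" | "choreography" | "beat";

interface PhaseConfig {
  model: "sonnet" | "haiku";
  temperature: number;
  maxTokens: number;
}

const PHASES: Record<Phase, PhaseConfig> = {
  concept:      { model: "sonnet", temperature: 0.85, maxTokens: 2000 },
  audit:        { model: "sonnet", temperature: 0.7,  maxTokens: 2500 },
  choreography: { model: "sonnet", temperature: 0.8,  maxTokens: 5000 },
  // Per-beat budget actually varies 400-800 tokens; 800 is the ceiling.
  beat:         { model: "haiku",  temperature: 0.8,  maxTokens: 800 },
};
```

Keeping the gradient in one table makes the cognitive-mode argument auditable: the creative phase runs hottest, the analytical phase coolest, the structural phase in between.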

The Concept phase at temperature 0.85 is the engine's creative imagination. It receives the richest context · an economy snapshot, the arc history, world state notes, temporal pacing signals, and the recurring character registry · and from that produces a raw story idea: title, theme, premise, characters (three or four, with name, role, faction, location, voice, and motivation), affected planets and commodities, and a world change statement. The high temperature is deliberate. This is where novelty lives. Constraining it too tightly produces repetition; loosening it too far produces incoherence. 0.85 is a tuned value, not an arbitrary one.

The Audit phase at temperature 0.7 is the engine's continuity conscience. It receives the concept output alongside the full arc history and runs a structured review: does this theme repeat something recently resolved? Are the economy references grounded in the actual game state? Are the character stakes specific enough to matter? The slightly lower temperature reflects the analytical nature of the task · the model is not inventing, it is evaluating. This phase functions as an inline quality gate that catches the most common failure mode in LLM narrative generation: thematic repetition that erodes the world's sense of consequence.

The Choreography phase at temperature 0.8 and 5000 tokens is where the arc becomes concrete. It designs ten to fourteen beats distributed across 330 minutes, assigns each beat a delivery channel, establishes causality links between beats (each beat must declare what it reacts to and what it foreshadows), and specifies data injection requirements · which economy fields the beat writer will need at render time. The combined Temporal Audit in this phase checks that the beat sequence makes sense as a timeline: that causes precede effects, that pacing varies appropriately, that the arc has a real beginning, middle, and end rather than a flat sequence of events.

The Per-Beat Writers are Haiku at execution time. Not generation time · execution time. The arc is choreographed in advance, but the beat content is rendered when the beat fires. This means each Haiku call receives live data: current economy values, current world state, the beats that have already fired, and the beat's full rendering specification. The result is content that is narratively coherent with the arc design but economically grounded in the present moment.

The infrastructure choice underpinning all of this is Cloudflare Durable Objects with alarm-based scheduling. The NarrativeEngine Durable Object is a single globally-unique instance with its own persistent SQLite-backed storage. It wakes itself on a five-minute alarm tick to check whether a beat needs to fire, advance the arc timeline, or generate a new arc when the current one expires. The alarm system provides guaranteed at-least-once execution with exponential backoff retry · meaning a beat that fails to render does not silently disappear from the timeline, it retries automatically until it succeeds or exhausts its attempts.

The five-minute tick interval is a considered design choice. It is short enough to maintain narrative momentum · a player checking in regularly will see the world moving · but long enough to absorb the latency of an LLM call without creating pressure to skip phases. It also means the system can handle beat windows gracefully: if a beat was scheduled for minute 47 and the alarm fires at minute 50, the beat still fires rather than being missed. The arc duration of 5h30 (330 minutes) across 10–14 beats produces an average beat spacing of roughly 24–33 minutes, which aligns with the pacing rhythm that human DMs naturally adopt in live sessions.
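The windowing behavior described above · a beat scheduled for minute 47 still firing at the minute-50 tick · reduces to a simple "due at or before now" filter. A minimal sketch, with the Durable Object alarm surface stripped away and only plain data left (all names hypothetical):

```typescript
// Simplified sketch of the five-minute alarm loop. The real engine runs
// inside a Cloudflare Durable Object; here scheduling is reduced to plain
// data so the beat-window logic is visible.
interface ScheduledBeat {
  id: string;
  fireAtMinute: number; // offset from arc start
  fired: boolean;
}

const TICK_MINUTES = 5;

// A beat fires on the first tick at or after its scheduled minute, so a
// beat scheduled for minute 47 still fires at the minute-50 tick.
function beatsDueAtTick(beats: ScheduledBeat[], elapsedMinutes: number): ScheduledBeat[] {
  return beats.filter(b => !b.fired && b.fireAtMinute <= elapsedMinutes);
}

const beats: ScheduledBeat[] = [
  { id: "b1", fireAtMinute: 47, fired: false },
  { id: "b2", fireAtMinute: 60, fired: false },
];
const due = beatsDueAtTick(beats, 50); // → the b1 beat only
```

In the production system, the at-least-once guarantee and retry/backoff come from the alarm API itself; this sketch only shows why a late tick cannot silently skip a beat.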

03 · Channel Design

The Three-Channel Model and Why It Works

EV2090's rendering layer uses three distinct delivery channels, each instantiated as a separate writer: the Feed Writer (station broadcasts, 150–400 characters, institutional voice), the Chat Writer (NPC radio COMMS, 3–6 lines, personal in-character voice), and the Board Writer (cork board notices, 60–180 characters, handwritten feel). These are not stylistic variations of the same prompt. They are genuinely separate writer instances with separate system prompts, separate anti-pattern lists, and separate success criteria.

The quality difference this produces is not subtle. A single general-purpose writer given channel guidance as a parameter produces content that averages the styles · slightly too formal for chat, slightly too personal for broadcast. The cognitive load of maintaining one style while avoiding another creates a kind of prompt interference. Separate instances eliminate that interference entirely. The Feed Writer does not know the Chat Writer exists. It only knows what a station broadcast sounds like and what it must never sound like.

Feed Writer

Station Broadcast

150–400 characters. Institutional, impersonal, declarative. The voice of systems and infrastructure · not people. Anti-patterns: no first-person, no emotional register, no speculation. Success: reads as official communication from an authority that does not care about your feelings.

Chat Writer

NPC Radio COMMS

3–6 lines. Personal, in-character, emotionally present. The voice of a specific human being with a specific situation. Anti-patterns: no omniscience, no clean exposition, no information the NPC would not actually have. Success: you feel like you overheard a real transmission.

Board Writer

Bulletin Notice

60–180 characters. Terse, practical, written by hand under some pressure. Anti-patterns: no complete sentences with subjects and verbs, no formal grammar, no narrator framing. Success: reads like someone tacked this up and walked away.
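The three channel cards above can be expressed as fully separate writer specs · each with its own length rule, voice, and NEVER list. The structure below is a hypothetical sketch (the length rules and anti-patterns are taken from the descriptions above; the types and function are not EV2090's actual code):

```typescript
// Hypothetical shape for per-channel writer specs. The point of the
// design is isolation: each writer is built from only its own spec.
interface ChannelSpec {
  lengthRule: string;
  voice: string;
  antiPatterns: string[]; // each channel has its own NEVER list
}

const CHANNELS: Record<"feed" | "chat" | "board", ChannelSpec> = {
  feed: {
    lengthRule: "150-400 characters",
    voice: "institutional, impersonal, declarative",
    antiPatterns: ["no first-person", "no emotional register", "no speculation"],
  },
  chat: {
    lengthRule: "3-6 lines",
    voice: "personal, in-character, emotionally present",
    antiPatterns: ["no omniscience", "no clean exposition", "no unknowable information"],
  },
  board: {
    lengthRule: "60-180 characters",
    voice: "terse, practical, handwritten under pressure",
    antiPatterns: ["no complete sentences", "no formal grammar", "no narrator framing"],
  },
};

// The Feed Writer's prompt never mentions the other channels' styles.
function writerSystemPrompt(channel: keyof typeof CHANNELS): string {
  const spec = CHANNELS[channel];
  return `VOICE: ${spec.voice}\nLENGTH: ${spec.lengthRule}\nNEVER:\n- ${spec.antiPatterns.join("\n- ")}`;
}
```

The design choice is visible in the builder: a channel's prompt is assembled from one spec, so there is no cross-channel "averaging" pressure at all.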

TTS Constraint

Audio-Aware Writing

ElevenLabs v3 integration imposes a discipline that improves all channels: short sentences, no visual formatting, emotion tags where needed. The constraint of writing for ears rather than eyes produces tighter, more active prose. A design rule adopted for audio quality ends up benefiting text quality as a side effect.

The bounty system's 4-pass generation pipeline extends this logic into audio. After the core bounty narrative is generated by Sonnet (temperature 0.9, 4096 tokens · higher creativity for mission scenarios), a Haiku pass generates atmospheric feed entries for the game world, and then ElevenLabs synthesizes audio briefing and conclusion files that are stored in R2 alongside the structured bounty data. The pipeline treats audio as a first-class delivery channel, not a post-processing step. The TTS-aware writing constraint · short sentences, no nested clauses, emotion tags embedded in the text · is enforced at generation time, not edited in afterward.

This is a meaningful architectural lesson. When a rendering constraint is applied at the point of generation rather than the point of delivery, it improves the underlying content rather than patching its presentation. Writing for ElevenLabs v3's emotion-tag system produces prose that is naturally more rhythmic and expressive, because it forces the writer (in this case, the model) to be explicit about emotional register rather than relying on the reader to infer it.

04 · Prompt Architecture

The Prompt Philosophy We Are Keeping

EV2090's prompt design reflects a set of hard-won principles that are not immediately obvious from first principles. They emerged from running the system, observing failure modes, and correcting them. Each principle addresses a specific class of output degradation. They are worth examining individually.

No Examples in Prompts

The single most counterintuitive rule in the system. Examples are pedagogically effective for human learners. In LLM prompts, they function differently: they become the floor and the ceiling simultaneously. The model anchors on the provided example and produces variations of it rather than generating freely within the defined constraints. The result is a kind of narrative monoculture · outputs that are technically acceptable but structurally identical. Removing examples and replacing them with constraints (what must be present) and anti-patterns (what must never appear) produces outputs with higher variance, which in a story generation system is the same as saying higher quality.

Anti-Pattern Lists Over Style Guides

Every writer prompt in EV2090 contains a "NEVER do this" section. The list is specific and empirically derived: not "avoid clichés" but "never begin a station broadcast with a person's name." Not "write naturally" but "never use the word 'suddenly' in a board notice." This specificity matters. Vague negative guidance ("avoid being too formal") gives the model little actionable constraint. Specific anti-patterns ("never begin a COMMS line with a character explaining their current situation to someone who would already know it") close precise loopholes. The anti-pattern list is a diagnostic artifact · each entry represents a failure mode that was observed in production and corrected.

Dynamic Context Injection

No prompt in the system is static. Every call assembles its system prompt at generation time from current world state, current economy data, the arc history, and the relevant entity registry. This means the model never operates on stale context. It also means the prompts themselves are functions · their output is a property of their input state, not a fixed artifact. This is architecturally sound for a long-running system where the world is continuously changing. Hardcoded system prompts that reference specific game facts become incorrect as the world evolves; dynamic injection ensures that the model's context of "what is true right now" is always accurate.
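"Prompts are functions" can be made literal. A minimal sketch of a context-injecting prompt builder · the snapshot shape and field names are assumptions for illustration, not EV2090's actual types:

```typescript
// Hypothetical world snapshot; in EV2090 this data comes from the
// Durable Object's persisted state and the economy service.
interface WorldSnapshot {
  economyNotes: string[];
  arcHistory: string[];      // summaries of prior arcs
  persistentFacts: string[]; // facts carried over from resolved arcs
}

// The system prompt is assembled at call time: its output is a property
// of current state, never a fixed artifact that can go stale.
function buildConceptPrompt(base: string, world: WorldSnapshot): string {
  return [
    base,
    "CURRENT ECONOMY:", ...world.economyNotes,
    "PRIOR ARCS (do not repeat resolved themes):", ...world.arcHistory,
    "ESTABLISHED FACTS:", ...world.persistentFacts,
  ].join("\n");
}
```

Because the builder runs on every call, a fact established by the previous arc is present in the very next generation's context without any prompt-file edit.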

Causality Enforcement

Every beat in the choreography must declare both a reactsTo value (the beat ID or event it is responding to) and a foreshadows value (the beat or consequence it is setting up). This is not enforced by a validator · it is enforced by the prompt structure itself, which treats these fields as required rather than optional. The practical effect is that the model is forced to think relationally rather than sequentially. It cannot generate a beat that exists in isolation. Every narrative unit must be woven into the causal fabric of the arc. The result is arcs that feel like stories rather than event logs.
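The causality requirement can be sketched at the type level · required fields plus a structural check that every beat reacts to something that already exists. This is an illustrative reconstruction, not the actual schema in narrative/types.ts:

```typescript
// Sketch: causality links are required fields, not optional metadata.
interface BeatPlan {
  id: string;
  reactsTo: string;    // beat ID or event this beat responds to
  foreshadows: string; // beat or consequence this beat sets up
}

// Structural check: every beat must react to the arc opening or to a
// beat that appears earlier in the sequence. (Foreshadows points forward,
// so it is not checked against prior beats here.)
function causallyLinked(beats: BeatPlan[]): boolean {
  const seen = new Set<string>(["arc_start"]);
  for (const b of beats) {
    if (!seen.has(b.reactsTo)) return false;
    seen.add(b.id);
  }
  return true;
}
```

In EV2090 the enforcement lives in the prompt structure rather than a validator; a check like this is the validator-side complement the document later argues for.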

Data Injection: Choreography Specifies, Writers Receive

During choreography, the Sonnet model does not generate economy content · it specifies which economy fields each beat will need when it fires. The beat writer (Haiku) then receives those actual live values at execution time. This separation means that narrative structure (planned well in advance) and economic grounding (necessarily current) never conflict. A beat about a commodities price spike is designed as a narrative moment during arc generation, but the specific prices in the rendered content reflect what the market actually says when the beat fires, not what the market said 5 hours earlier when the arc was planned.
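The specify-then-resolve split can be sketched in a few lines. The field names and shapes below are hypothetical; the point is that choreography stores *names* and the beat writer receives *values* resolved at fire time:

```typescript
// Choreography output names the economy fields a beat will need;
// values are resolved when the beat fires, so rendered prices reflect
// the market now, not the market at planning time.
interface BeatSpec {
  id: string;
  injectFields: string[]; // e.g. ["ore.price", "fuel.stock"]
}

type Economy = Record<string, number>;

function resolveInjection(spec: BeatSpec, live: Economy): Record<string, number> {
  return Object.fromEntries(spec.injectFields.map(f => [f, live[f]]));
}

const spec: BeatSpec = { id: "b7", injectFields: ["ore.price"] };
// Planned hours ago when ore was at 90; rendered when it is at 142.
const injected = resolveInjection(spec, { "ore.price": 142, "fuel.stock": 12 });
```

Note that the writer receives only the fields the choreography asked for · the resolution step is also a context filter, which keeps the cheap Haiku call's prompt small.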

05 · Unexpected Depth

The NPC Chat System: The Best Part of the Engine

The most sophisticated component of the EV2090 narrative system is also the least visible: the NPC informant chat handler in bounty-npc-chat.ts. It is sophisticated enough that its architecture is worth examining in detail, because it solves a problem that the broader system does not · the problem of sustained character consistency across a multi-turn dialogue with a live player.

The system uses Haiku at temperature 0.8 with a maximum of 120 tokens per response. The token limit is deliberate: it prevents NPCs from becoming expository, forces responses to stay conversational and in-character, and keeps per-exchange costs low enough that a player can have a full investigation dialogue without the cost model becoming prohibitive. Rate limiting at 10 requests per minute and a 40-exchange cap per session prevents abuse while giving genuine investigators enough room to work.

The memory architecture is a rolling 8-exchange window stored in R2. This is a pragmatic choice that solves a real problem: the context window needed to hold a full conversation history grows unboundedly, but the last 8 exchanges contain virtually all the information an NPC needs to maintain conversational coherence. Earlier exchanges are compressed into the NPC's personality card and fact registry rather than retained verbatim. The rolling window is the difference between a system where NPCs remember the last thing you said and a system where they remember everything, which is both more expensive and not necessarily more coherent.

Each NPC informant chat session has a personality card (trait, voice, faction, knowledge scope, what they want from the player, what they will never reveal), a rolling memory window (last 8 exchanges), a fact extraction pass (after each exchange, key statements are extracted to the world state), and a knowledge boundary (the model is explicitly told what the NPC does not know, not just what they do). This is the correct architecture for character consistency at scale.
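The rolling window itself is a one-line policy. A minimal sketch (names hypothetical; in production the history lives in R2, not in memory):

```typescript
const WINDOW = 8; // exchanges retained verbatim

interface Exchange {
  player: string;
  npc: string;
}

// Append the newest exchange and drop anything beyond the window.
// Exchanges that fall off the window are assumed to have already been
// compressed into the personality card / fact registry.
function pushExchange(history: Exchange[], next: Exchange): Exchange[] {
  return [...history, next].slice(-WINDOW);
}
```

The bounded window is what makes per-exchange cost flat: the prompt size for exchange 40 is the same as for exchange 9.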

The fact extraction pass is the most architecturally interesting element. After each exchange, the system runs a second Haiku call that reads the NPC's response and extracts any factual statements the NPC made · prices mentioned, locations revealed, names dropped · and stores them in the world state. This means NPC chat is not just interactive content; it is world state generation. A player who correctly interrogates an informant changes the state of the world in a small but meaningful way. The engine knows what was revealed, to whom, and when.
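The extraction pass can be sketched as a second model call feeding the world state. Here the Haiku extractor is stubbed as a plain function and every name is hypothetical · the shape, not the identifiers, is the point:

```typescript
// A fact carries provenance: what was said, by whom, to whom, and when.
interface ExtractedFact {
  statement: string;
  source: string;     // NPC id
  revealedTo: string; // player id
  at: number;         // timestamp (ms)
}

// Stub for the second Haiku call; in production this is an LLM pass
// over the NPC's response text.
type Extractor = (npcResponse: string) => string[];

function recordFacts(
  extract: Extractor,
  npcId: string,
  playerId: string,
  response: string,
  now: number,
  worldState: ExtractedFact[],
): void {
  for (const statement of extract(response)) {
    worldState.push({ statement, source: npcId, revealedTo: playerId, at: now });
  }
}
```

The provenance fields are what make "the engine knows what was revealed, to whom, and when" true: the fact store is queryable by NPC, by player, and by time.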

What makes this design genuinely impressive is that it was built to solve a bounty system problem · getting players to clues · and ended up producing a character interaction model sophisticated enough to power the broader engine we are now designing. The tragedy is that it is buried in the bounty system. Arc characters in EV2090 cannot be conversed with. The NPC chat infrastructure exists only for bounty informants, who are purpose-built for a single mission and discarded afterward. The generalization of this architecture to all arc characters is one of the most valuable things the Narrative Engine can do.

The best architecture in EV2090 is locked inside the bounty system. The character interaction model that the whole engine should use already exists · it just was never connected to anything larger than a mission.

06 · Diagnostic

The Eight Structural Gaps

A structural gap is not a bug. A bug is an implementation error · something the system was supposed to do but does not. A structural gap is a design boundary · a place where the architecture as conceived cannot do something, not because it was implemented incorrectly, but because it was never designed to. Understanding the root cause of each gap is more important than cataloguing the symptom, because root causes reveal what needs to be rethought at the architectural level rather than patched at the implementation level.

Gap 1: Arcs and Bounties Are Disconnected. Root cause: they were designed as entirely separate Durable Object instances with no shared state, no shared entity registry, and no mechanism for one to signal the other. The NarrativeEngine DO and the BountyRegistry DO are peers, not collaborators. An arc character who witnesses a crime cannot become the target of a bounty. A bounty outcome cannot advance an arc front. The systems that should be the most tightly coupled are completely isolated from each other.
Gap 2: No Player Agency in Narrative. Root cause: the arc timeline is a fixed sequence with no input channels. The NarrativeEngine alarm tick advances the timeline based on elapsed minutes, not on player actions. There is no API surface through which player behavior enters the narrative system. Players observe the story; they cannot modify it. The architecture has no concept of a player decision that changes a beat's outcome, advances a front prematurely, or branches the arc.
Gap 3: Stateless Beat Rendering. Root cause: the Haiku beat writers receive the arc premise and the list of beats that have already fired, but they never receive player state. They do not know whether players engaged with the previous beat, ignored it, or actively responded to it. The rendered content is therefore the same whether ten players are actively investigating the arc or nobody is watching. Engagement-aware rendering · content that acknowledges what the player did, not just what the world did · is architecturally absent.
Gap 4: Limited Bounty Types. Root cause: the bounty schema was designed around the locate_* pattern · find a ship, find a person. The clue structure, the NPC informant model, and the resolution logic all assume that the player's task is discovery rather than action. Mission types that require doing something (intercept a convoy, retrieve cargo, escort a target, win a fight) require fundamentally different schema structures. The current schema cannot represent them, so they were never implemented.
Gap 5: Single System Lock. Root cause: planetary descriptions are hardcoded strings in system-context.ts. The Sol system with four planets is not a configuration · it is a constant. Despite infrastructure for multi-system play being partially built elsewhere in the codebase, the narrative engine defaults to Sol because the context that shapes arc generation is not configurable. The planet names, their economies, their political character · all of it is baked into a file, not into a schema that could be populated with any universe's geography.
Gap 6: No Story Recap. Root cause: world state persistence extracts three to five facts from each completed arc into a persistent_facts store, but these facts are never surfaced to players. They exist to inform the next arc's Concept phase · to give Sonnet continuity context when generating the next story. The mechanism for keeping players who missed beats caught up on the narrative exists nowhere in the system. The facts are there; the rendering layer for those facts is not.
Gap 7: Shallow Economy Integration. Root cause: beat economy actions are one-shot side-effect calls · trigger_disruption, adjust_stock, post_board_note · that fire and are forgotten. They produce transient economic events but not persistent economic mutations. A beat about a trade route collapse does not actually modify the underlying economy model in a durable way that future arcs inherit. The narrative and the economy run in parallel rather than influencing each other bidirectionally.
Gap 8: NPC Chat Is Bounty-Only. Root cause: the chat handler in bounty-npc-chat.ts is instantiated per-bounty. It is a bounty feature, not a narrative feature. Arc characters · the protagonists, antagonists, and witnesses of the main story · are one-shot rendered content. The rich personality card + rolling memory + fact extraction architecture exists, but it has no equivalent at the arc level. The system that could power the most interesting player interactions is wired only to the mission system that generates the most self-contained content.

Reading the gaps together, a pattern emerges. Most of them share the same underlying cause: the system was built as two separate pipelines (arcs and bounties) that share a runtime environment but not an architecture. The decisions that made each pipeline work well in isolation · separate Durable Objects, separate state, separate entity models · are exactly the decisions that prevent them from working together. The Narrative Engine does not need to fix these gaps individually. It needs to address the architectural separation that produced them.

07 · Economics

What EV2090 Teaches Us About Cost

The cost model of the EV2090 pipeline is straightforward in principle: Sonnet calls are expensive and rare (roughly 4–5 per 5.5-hour arc, covering the full planning pipeline), and Haiku calls are cheap and frequent (one per beat, 10–14 per arc). The economics favor this architecture strongly. A full arc generation cycle costs a fraction of what a naive approach · one Sonnet call per beat · would cost, while producing higher-quality output because the planning and rendering responsibilities are cleanly separated.
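The ratio is easy to make concrete. The per-call costs below are hypothetical placeholders · not actual Anthropic pricing · chosen only to show the shape of the comparison:

```typescript
// Illustrative only: per-call costs are hypothetical placeholders,
// not real pricing. What matters is the ratio, not the numbers.
const COST_SONNET = 0.05;  // $/call, hypothetical
const COST_HAIKU  = 0.005; // $/call, hypothetical

// Planned arc: 5 Sonnet planning calls + 14 Haiku beat renders.
const pipelineCost = 5 * COST_SONNET + 14 * COST_HAIKU;

// Naive alternative: one Sonnet call per beat, no planning phase.
const naiveCost = 14 * COST_SONNET;

// Under these placeholders: pipeline ≈ 0.32 vs naive ≈ 0.70 per arc,
// and the pipeline also gets the planning/rendering separation for free.
```

The gap widens as beat count grows, because the expensive planning calls are amortized over more cheap renders.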

The bounty system adds ElevenLabs synthesis to this model, which introduces a different cost structure: audio generation is priced per character of text, making the length of briefing and conclusion content a real cost driver. This is why the TTS-aware writing constraint · short sentences, no padding · is not merely an aesthetic preference. It is a cost discipline. Every word that does not need to exist in a briefing text costs money to synthesize. The constraint that makes audio sound better also makes it cheaper.

4–5 · Sonnet calls per arc
10–14 · Haiku renders per arc
5h30 · Arc duration
8 · Exchange memory window

The claude-client.ts wrapper reveals something important about production reliability. It implements retry logic with exponential backoff and maintains generation attempt counts per phase. The retry pattern is not defensive programming · it is a reflection of observed reality. LLM API calls fail. They time out, return malformed JSON, or produce outputs that fail validation. In a system where a failed arc generation means five hours of silence in the game world, this is not an academic concern. The validation layer in validation.ts is the real cost driver when generation fails: a validation failure at the Choreography phase means the three Sonnet calls that preceded it are sunk cost, and the phase must be regenerated.

This makes validation architecture a cost architecture. The more precisely the validation layer can identify and reject partial failures · keeping the good parts of a generation while re-running only the phase that failed · the lower the cost of coherence recovery. The current system does not have partial recovery; a validation failure at phase 4 restarts from phase 1. This is acceptable at current arc volumes but would become expensive at scale.

The real cost insight from EV2090 is that the unit of cost is not the API call · it is the coherent arc. A system that generates five cheap arcs with high failure and retry rates may cost more than a system that generates three more expensive arcs with near-zero failure rates. Designing for generation quality is also designing for generation cost, because quality reduces the failure modes that require expensive recovery.

08 · Surprises

What We Discovered

The Temperature Gradient Is Intentional Design, Not Tuning

Moving from 0.85 (Concept) to 0.7 (Audit) to 0.8 (Choreography) is not a set of arbitrary tuning choices · it is a model of how narrative construction should feel at each phase. High temperature for the creative leap, lower temperature for the critical review, mid-range for the structural design that must be both inventive and coherent. This gradient is worth formalizing in the Narrative Engine's design specification rather than leaving it as implicit knowledge in a config file.

The 120-Token NPC Response Cap Is Load-Bearing

The decision to cap NPC chat responses at 120 tokens looks like a cost constraint. It is actually a character design constraint. At 120 tokens, an NPC response is approximately 2–4 sentences. That is enough to be in character, convey one piece of information, and create intrigue about the next exchange. At 400 tokens, NPCs become monologues. They over-explain, they break character through sheer volume of words, and they give players everything in one turn rather than rewarding continued engagement. The cap makes the characters better, not just cheaper.

The Fact Extraction Pass Changes What the Engine Is

When player exchanges with NPC informants produce extracted facts that enter the world state, the NPC chat system stops being a content delivery mechanism and becomes a world-building mechanism. The player is not just receiving information · they are generating it, in the sense that their specific investigative choices determine which facts become canonical in the world state. This is a significant capability that the EV2090 system has but does not fully exploit, because the extracted facts are scoped to the bounty rather than promoted to the broader arc and world context.

The Biggest Architecture Lesson Is About Coupling

EV2090 is a story of two systems built in parallel that needed to be one system built from a shared core. The sophistication of the bounty NPC chat system and the sophistication of the arc generation pipeline are roughly comparable · but they share nothing. No entity registry, no world state protocol, no shared causality model. The most expensive thing about the Narrative Engine project is not the LLM costs. It is the cost of designing a unified architecture from the start, rather than paying the refactoring cost of merging two mature systems that were never designed to speak to each other.

09 · Application

How This Shapes the Engine We Build

The value of this analysis is not in cataloguing what EV2090 does wrong. It is in translating what we learned into concrete architectural decisions for the Narrative Engine. Every conclusion below is directly traceable to evidence from the production code.

The 5-phase pipeline structure is proven · we evolve it, not replace it. The Concept → Audit → Choreography → Per-Beat Rendering sequence is validated by production. The Narrative Engine extends this structure (adding a Consequence Processing phase and a Context Assembly phase before Concept) but does not redesign it. The cognitive separation of phases, the temperature gradient, and the Sonnet/Haiku split are all retained.

The three-channel model becomes five delivery channels. Feed, Chat, and Board are proven. The Narrative Engine adds NPC Dialogue (direct in-character conversation, the generalization of the bounty informant model) and Environmental (diegetic documents, intercepted transmissions, found objects). Each channel retains its own writer instance, anti-pattern list, and rendering specification. No general-purpose writer. No channel parameters.

The NPC chat architecture becomes the standard pattern for all NPC interactions. Personality card + rolling memory window + fact extraction is not a bounty feature · it is the foundational NPC interaction model. Every arc character that a player can interact with gets this architecture. The only variation is scope: bounty informants have narrow knowledge, arc characters have arc-scoped knowledge, recurring world characters have world-state knowledge. The mechanism is identical.

Arc and bounty generation share the same pipeline · they are modes, not separate systems. There is one entity registry, one world state, one causality model. Arcs and bounties are both narrative structures that reference entities, fire beats, modify world state, and produce player-facing content. The distinction between them is a matter of scope and duration, not architecture. Building them on the same pipeline eliminates Gap 1, Gap 3, Gap 8, and most of Gap 6 simultaneously.

The world state becomes the only source of truth. Planet descriptions, economy state, entity positions, arc history, player state · all of it lives in the world state, versioned and queryable. Nothing is hardcoded in a context file. Context assembly at the start of each generation pipeline reads from the world state; consequence processing at the end writes to it. The loop is closed.

Validation gates are explicit pipeline stages, not error handlers. The current system treats validation as a try-catch around generation. The Narrative Engine treats validation as a named pipeline phase with defined inputs, outputs, and partial-recovery semantics. A choreography that passes concept validation but fails temporal audit triggers a partial re-run from Phase 3, not a full restart from Phase 1. This reduces both latency and cost for the most common failure modes.
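The partial-recovery semantics can be sketched as an explicit restart map · each validation failure names the earliest phase whose work must be redone. All names are hypothetical; this is the proposed design, not existing EV2090 code:

```typescript
type PhaseName = "concept" | "audit" | "choreography" | "temporalAudit";

const ORDER: PhaseName[] = ["concept", "audit", "choreography", "temporalAudit"];

// Each failure maps to the earliest phase that must be regenerated.
// A temporal-audit failure keeps the validated concept and re-runs
// only the choreography-era work (the "partial re-run from Phase 3").
const RESTART_FROM: Record<PhaseName, PhaseName> = {
  concept: "concept",
  audit: "concept",              // the audit rejected the concept itself
  choreography: "choreography",
  temporalAudit: "choreography",
};

function phasesToRerun(failedAt: PhaseName): PhaseName[] {
  return ORDER.slice(ORDER.indexOf(RESTART_FROM[failedAt]));
}

const rerun = phasesToRerun("temporalAudit"); // → ["choreography", "temporalAudit"]
```

Compare this with the current behavior, where any failure at phase 4 restarts from phase 1 and sinks all prior Sonnet calls: the map is precisely the "cost architecture" the validation layer was missing.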

10 · Connections

Connections to Other Research

Research 01 & 02 · D&D + Warhammer

Best Practice Discovered Empirically

EV2090 converged on fixed spine / variable flesh, NPC personality cards, and clue node networks without studying tabletop design — it discovered them by building and failing. That convergence validates the research: EV2090 is not just a codebase to improve; it is independent confirmation that these structural principles are necessary.

Research 03 · Narrative Beat Theory

The Missing Type Signal

The EV2090 choreography phase generates a typed sequence of beats (arc_start, development, climax, resolution) but the type signal never reaches the renderer. Haiku receives the beat brief but not the beat type — so a character beat and a story beat are rendered with the same register. Beat theory names the gap and provides the fix: the type field belongs in the rendering brief.

Research 04 · Improv DM Techniques

The Three Absent Mechanisms

EV2090 has no Fronts, no Three-Clue enforcement, and no passive timeline. The arc fires on a timer; the world does not move. The improv DM research describes exactly these three mechanisms as non-negotiable for narrative coherence. Their absence explains why EV2090 arcs feel like scheduled broadcasts rather than living stories.

Research 05 · Procedural Narrative AI

Failure Modes in Production

The 5 failure modes catalogued in the ACL survey — coherence collapse, character inconsistency, pacing failure, context blindness, agency illusion — map directly to the 8 structural gaps in EV2090. Context blindness is the rendering layer problem. Character inconsistency is the missing personality model. Agency illusion is the absence of real player feedback loops.

The Narrative Engine is not a replacement for what EV2090 built. It is the version that EV2090 was always implying · designed from the start with the scope its best components were already demanding.