OFFICINA: AI-native operational runtime for cost-aware, multi-model workflows

Cost-aware

Use premium models only where consequence or complexity demands it.

Operational

Keep the minimum stabilized context needed to reconstruct work.

Human-controlled

Irreversible and high-risk actions stay behind explicit approval.

No lock-in

Runs across local, open-source, low-cost, and premium models.

The problem

AI is powerful, but the way we work with it is fragmented.

Whether you are shipping code or running a business, useful AI work today is spread across chats, local and premium models, agents, vector search, databases, CRMs, inboxes, and scattered notes. The result is the same everywhere: expensive, fragile, and hard to reconstruct from one day to the next.

Uncontrolled model cost

Premium models get used for routine work that a cheaper or local model could have handled safely. Across a team, that spend adds up fast and quietly.

Lost operational context

Decisions, constraints, and state disappear across sessions, tools, repos, and providers. Every new session re-explains what was already settled.

Unvalidated output as truth

Exploratory or weakly checked AI output slips into systems of record, and into actions that touch real customers, money, or code, with nothing in the way.

The solution

An operational runtime, not just a chatbot.

OFFICINA turns one conversational interface into an operational layer. You ask for work in plain language; the runtime classifies the task, pulls the right context, picks a model by cost and risk, calls tools when needed, validates the result, and asks for approval before anything irreversible.

The thesis is simple: chat becomes the place where real work gets executed across connected systems, with cost control, validation, operational memory, and multi-model escalation built in.

Cost-aware routing

Each task is matched to the cheapest model that can do it safely; premium only when it earns its cost.

Universal chat interface

One conversational surface that reaches into the tools and systems you connect, not ten consoles.

Operational ledger

Only stabilized state is kept: the minimum needed to reconstruct work across sessions.

Human-in-the-loop

Irreversible, external, or high-impact actions pause for an explicit human decision.

KISS: one surface, every tool

CRM console · ERP screens · DB client · Email app · Repo UI · Admin panels

collapse into

One OFFICINA chat: every scenario and tool driven from one conversational surface, with nothing specialized to build, deploy, or operate.

The core

One graph: nodes and edges.

This is the heart of OFFICINA. Every decision, document, conversation, and configuration value is stored as a node in one graph, which gives the system two properties most tools never get for free: universal capture (anything can be recorded the same way) and universal retrieval (anything can be found the same way).

Unlike a traditional relational schema, where relationships are defined across separate domain-specific tables through foreign keys and joins, OFFICINA uses a graph structure in which entities are stored as nodes and their relationships are expressed directly as typed edges. In other words, the system does not depend on table-to-table structure to understand how data is connected; the connections are represented within the graph itself.

Because the graph carries those two advantages, OFFICINA needs no specialized tables or bespoke queries per data type: one mechanism registers and retrieves everything. That is exactly what keeps cost-aware routing, context reconstruction, and the operational ledger simple enough to actually work.

Two relationship types are enough to represent how real work connects: belonging (this task is part of that project) and dependency (this draft depends on that contract). That is enough to stay useful without the graph becoming its own maintenance problem.

PostgreSQL + pgvector

One store for both the graph and the vectors, structured records and semantic search together.

Belonging edges

“Part of” relationships: this task belongs to that project, this note to that contract.

Dependency edges

“Depends on” relationships: this draft depends on that decision, this answer on that source.
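A minimal in-memory sketch of the two edge types, assuming hypothetical `Node`, `Graph`, and `EdgeType` names (not OFFICINA's actual schema, which lives in PostgreSQL). It shows that "belongs to" and "depends on" are enough to answer both "what is part of this project?" and "what does this draft ultimately rest on?".

```python
from dataclasses import dataclass, field
from enum import Enum


class EdgeType(Enum):
    BELONGS_TO = "belongs_to"  # "part of": task -> project, note -> contract
    DEPENDS_ON = "depends_on"  # "depends on": draft -> decision, answer -> source


@dataclass
class Node:
    id: str
    kind: str  # hypothetical free-form label: "task", "project", "decision", ...
    content: str = ""


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # node id -> Node
    edges: list = field(default_factory=list)  # (source id, EdgeType, target id)

    def add(self, node: Node) -> None:
        self.nodes[node.id] = node

    def link(self, src: str, edge: EdgeType, dst: str) -> None:
        # Both endpoints must already be registered as nodes.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"unknown endpoint in {src!r} -> {dst!r}")
        self.edges.append((src, edge, dst))

    def parts_of(self, parent: str) -> list:
        """Direct BELONGS_TO children of a node."""
        return [s for s, e, d in self.edges if e is EdgeType.BELONGS_TO and d == parent]

    def depends_on(self, node_id: str) -> list:
        """Transitive DEPENDS_ON targets, depth-first; `seen` stops cycles."""
        seen, stack = [], [node_id]
        while stack:
            current = stack.pop()
            for s, e, d in self.edges:
                if s == current and e is EdgeType.DEPENDS_ON and d not in seen:
                    seen.append(d)
                    stack.append(d)
        return seen
```

For example, linking `draft → decision → contract` with `DEPENDS_ON` makes `g.depends_on("draft")` return `["decision", "contract"]`, while `parts_of` answers the belonging question with a single scan of the same edge list.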

KISS: one store, not a dozen tables

users · documents · sessions · configs · decisions · embeddings · bespoke queries

collapse into

nodes + edges + pgvector: one registration and retrieval mechanism for everything, with no specialized tables or bespoke queries to maintain.

Universal capture · universal retrieval

Any input: documents & policy, decisions & state, messages & config, code & records.

→ universal capture →

nodes + edges (PostgreSQL · pgvector): one store, no specialized tables.

→ universal retrieval →

Any output: relevant context, semantic search, work reconstruction, graph traversal.
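Semantic search over the store can be illustrated with a small ranking function. pgvector exposes cosine distance through its `<=>` operator, so a query shaped like `ORDER BY embedding <=> $1 LIMIT k` ranks nodes nearest first; the sketch below reproduces that ranking in plain Python. The `(node_id, embedding)` row shape and the toy 2-dimensional vectors are assumptions for illustration only.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # this sketch treats zero vectors as unrelated instead of dividing by zero
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, rows, k=3):
    """rows: iterable of (node_id, embedding); returns node ids nearest first."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query, row[1]))
    return [node_id for node_id, _ in ranked[:k]]
```

Because every node type shares the same embedding column in this model, one ranking function serves documents, decisions, and messages alike.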

Persistent

Identity, configuration, and policy: the rules that hold across every session.

Situational

What's true right now: the active task and the conversation in progress.

Episodic

What happened: past sessions, crystallized into memory the system can recall.

Documental

What you fed it: source documents and policy, parsed and embedded for retrieval.
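One way the four layers could combine into a model's context is sketched below, assuming hypothetical `Layer`, `MemoryItem`, and `build_context` names and a character budget standing in for a token budget. Persistent and situational items go in first; episodic and documental items are added in order of a relevance score, such as one produced by vector search.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    PERSISTENT = 1   # identity, configuration, policy
    SITUATIONAL = 2  # active task, conversation in progress
    EPISODIC = 3     # crystallized past sessions
    DOCUMENTAL = 4   # parsed and embedded source documents


@dataclass
class MemoryItem:
    layer: Layer
    text: str
    relevance: float = 0.0  # e.g. a similarity score from vector search


def build_context(items, budget_chars):
    """Fill a character budget: persistent, then situational, then best-scoring recall."""
    always = sorted(
        (i for i in items if i.layer in (Layer.PERSISTENT, Layer.SITUATIONAL)),
        key=lambda i: i.layer.value,
    )
    recalled = sorted(
        (i for i in items if i.layer in (Layer.EPISODIC, Layer.DOCUMENTAL)),
        key=lambda i: i.relevance,
        reverse=True,
    )
    context, used = [], 0
    for item in always + recalled:
        if used + len(item.text) > budget_chars:
            continue  # skip what does not fit; a smaller item later may still fit
        context.append(item.text)
        used += len(item.text)
    return context
```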

How it works

From a request to a controlled result.

Every request moves through the same operational path. The model-routing and context layers run today; connector and approval coverage is in active development.

Ask → Classify → Retrieve → Route model → Call tools → Validate → Approve → Ledger

Ask in plain language

The operator states the task through a single chat interface, with no jumping between consoles.

Classify the task

The runtime reads what kind of work it is and how much it matters: routine, ambiguous, or high-consequence.

Retrieve context

Relevant project state, prior decisions, and documents are pulled from the knowledge graph and retrieval layer.

Select model & provider

It picks the cheapest sufficient model (local, open-source, or low-cost) and reserves premium models for tasks that need them.

Call tools when needed

Where the task requires it, the runtime invokes connected tools and systems through a controlled integration layer.

Validate the output

Results are checked against cost, risk, and consequence before they are treated as usable; nothing is accepted blindly.

Approve if risk is high

Irreversible, external, or high-impact actions pause for explicit human sign-off. Nothing destructive happens on its own.

Write stabilized state

Only decisions worth keeping are persisted to an operational ledger, so the next session reconstructs work instead of restarting it.
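The path above can be sketched as one function. Every collaborator (`retrieve`, `route`, `generate`, `validate`, `approve`) is a hypothetical injected callable, and `classify` is a toy keyword rule rather than OFFICINA's classifier; the point is only the ordering of the steps and the two early exits, failed validation and withheld approval.

```python
from typing import Callable, List, Optional

IRREVERSIBLE_VERBS = {"send", "delete", "pay", "merge"}  # hypothetical trigger words


def classify(request: str) -> str:
    """Toy classifier: a real one would weigh ambiguity and consequence, not keywords."""
    words = set(request.lower().split())
    return "high-consequence" if words & IRREVERSIBLE_VERBS else "routine"


def handle(
    request: str,                                   # Ask
    retrieve: Callable[[str], List[str]],
    route: Callable[[str], str],
    generate: Callable[[str, str, List[str]], str],
    validate: Callable[[str], bool],
    approve: Callable[[str, str], bool],
    ledger: List[dict],
) -> Optional[str]:
    risk = classify(request)                        # Classify
    context = retrieve(request)                     # Retrieve
    model = route(risk)                             # Route model
    output = generate(model, request, context)      # Call model (and tools)
    if not validate(output):                        # Validate
        return None
    if risk == "high-consequence" and not approve(request, output):  # Approve
        return None
    ledger.append({"request": request, "model": model, "result": output})  # Ledger
    return output
```

Only requests that pass both gates reach the ledger, which mirrors the rule that only stabilized, approved state is persisted.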

Universal chat interface

Connect a system. Operate it from chat.

Instead of logging into ten different tools, the goal is to work from one conversational surface that reaches into the systems you connect. Connector support is in active development; the examples below show the workflows OFFICINA is being built to run.

CRM

"Summarize the accounts at risk this week": pull customer state and surface what needs attention.

ERP

"Review open invoices and prepare follow-ups": read records, draft, and hold for approval before sending.

Email

"Reply to the supplier using the contract": draft a reply grounded in the right documents, not guesswork.

Database

"Query sales by product, then analyze": run the read and return the result with a summary.

Documents / RAG

"Use the internal policy to answer this": retrieve from your own material and answer from it.

GitHub

"Analyze the issue, change code, open a PR": read the repo, propose changes, and leave the merge to a human.

Integration approach: MCP-oriented tool connectors plus provider abstraction, so adding a new system means adding a connector, not a rewrite.
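A sketch of the "a connector, not a rewrite" idea, assuming a hypothetical `Connector` protocol and `Registry` class. This is not the Model Context Protocol SDK's API; it only shows how the runtime can address any connected system by name through one interface, so supporting a new system means registering one more object.

```python
from typing import Dict, List, Protocol


class Connector(Protocol):
    name: str

    def read(self, query: str) -> List[dict]: ...


class Registry:
    """Routes read requests to whichever connector is registered for a system."""

    def __init__(self) -> None:
        self._connectors: Dict[str, Connector] = {}

    def register(self, connector: Connector) -> None:
        self._connectors[connector.name] = connector

    def read(self, system: str, query: str) -> List[dict]:
        if system not in self._connectors:
            raise KeyError(f"no connector registered for {system!r}")
        return self._connectors[system].read(query)


class FakeCRM:
    """Hypothetical stand-in for a real CRM connector."""

    name = "crm"

    def read(self, query: str) -> List[dict]:
        return [{"account": "ACME", "at_risk": True}] if "risk" in query else []
```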

The operational core

Three things that make AI work reliable.

Cost control, continuity, and controlled actions: the parts most "AI apps" leave out.

Cost-aware model routing

Designed to use the cheapest model that's safe for a task and escalate to premium only when consequence or complexity calls for it. A cascading fallback chain, from cloud providers down to a fully local model, already runs today.

Live: fallback chain · Building: cost policy
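Since the cost policy is still being built, the following is only one possible shape for it: each task class maps to a minimum capability tier, and the router picks the cheapest model at or above that tier. The tier scale, prices, model labels, and the `REQUIRED_TIER` table are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    tier: int           # hypothetical capability scale: 0 local ... 3 premium
    cost_per_1k: float  # hypothetical price per 1k tokens


# Hypothetical policy: minimum capability tier per task class.
REQUIRED_TIER = {"routine": 0, "ambiguous": 2, "high-consequence": 3}


def pick_model(options, task_class):
    """Cheapest model whose tier meets the policy for this task class."""
    needed = REQUIRED_TIER[task_class]
    eligible = [m for m in options if m.tier >= needed]
    if not eligible:
        raise LookupError(f"no model meets tier {needed}")
    return min(eligible, key=lambda m: m.cost_per_1k)
```

Keeping the policy in a table rather than in per-model code is what lets one mechanism cover every model.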

Operational continuity

An operational ledger keeps only stabilized state (what's needed to reconstruct work) and deliberately forgets transient thinking. It is stored as nodes and edges in the same graph, so long-running workflows survive across sessions.

Live: ledger continuity
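The "keep stabilized state, forget transient thinking" rule can be sketched as an append-only log that rejects unstabilized entries and replays the rest to rebuild current state. The `Ledger` class and its `stabilized` flag are hypothetical; in OFFICINA the entries live as graph nodes, not a Python list.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    entries: list = field(default_factory=list)  # (key, value) pairs, append-only

    def record(self, key: str, value: str, stabilized: bool) -> bool:
        """Persist only stabilized decisions; transient thinking is dropped."""
        if not stabilized:
            return False
        self.entries.append((key, value))
        return True

    def reconstruct(self) -> dict:
        """Replay entries in order; a later decision on the same key wins."""
        state = {}
        for key, value in self.entries:
            state[key] = value
        return state
```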

Tool orchestration

OFFICINA doesn't just call tools; it coordinates permissions, validation, risk, and consequence. Actions on real systems are controlled, not automatic chaos.

Building: permissions & approvals
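A sketch of a permission and approval gate, assuming a hypothetical `Action` record and `execute` helper. Tools outside an allow-list are refused outright, and writes that are external or irreversible raise until a human approves. "High-impact" is not modeled separately here; a real policy would need its own signal for it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Action:
    tool: str
    writes: bool      # changes state in some system
    external: bool    # leaves the building (email, payment, public PR, ...)
    reversible: bool


class ApprovalRequired(Exception):
    """Raised when an action must wait for explicit human sign-off."""


def execute(action: Action, run: Callable[[], str], allowed_tools: set, approved: bool = False) -> str:
    if action.tool not in allowed_tools:
        raise PermissionError(f"tool {action.tool!r} is not permitted")
    needs_human = action.writes and (action.external or not action.reversible)
    if needs_human and not approved:
        raise ApprovalRequired(f"{action.tool}: waiting for human approval")
    return run()
```

Raising an exception, rather than returning a flag, means an unapproved action cannot proceed by accident if a caller forgets to check.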

Governing principles

Simple on purpose, so it adapts.

KISS here isn't minimalism for its own sake. Reducing each problem to one basic mechanism is exactly what lets one small system adapt to many scenarios, without building, deploying, and operating something specialized for each.

One interface, not many menus

A single conversational surface drives every tool and scenario, instead of specialized menus that are hard to develop, deploy, and operate.

One schema, not many tables

A single nodes-and-edges store handles all registration and retrieval, instead of many specialized tables and the bespoke queries that come with them.

One model mechanism, not per-model wiring

One routing mechanism resolves model choice automatically, instead of implementing and selecting each model by hand for every task.

MDI+: maximum information density, structured

Every node stores the smallest operationally complete unit of information, in a structured, consistent form. Less noise to process means cheaper inference and more precise retrieval; density and structure are what make a deliberately simple design efficient at scale.

Use cases

What it's built for.

Built for technical builders and operating teams that need advanced AI workflows without enterprise budgets or vendor lock-in.

AI-assisted software

Read issues, change code, prepare PRs, and review, with project memory across the work, not one prompt at a time.

Small-business ops

Drive CRM, email, and admin workflows from chat, with human approval on anything that leaves the building.

ERP & back-office

Read records, draft follow-ups, and prepare actions for a person to confirm, instead of manual data shuffling.

Document & RAG

Answer from internal policy, contracts, and source material: grounded retrieval, not hallucinated guesses.

Founder & product ops

Keep decisions, context, and state coherent across many small projects without a dedicated ops team.

Cost-controlled adoption

Get advanced AI into a small team's workflow while keeping premium-model spend deliberate and bounded.

Differentiation

Known ideas, composed in a new way.

None of the building blocks are exotic. The contribution is the composition: assembling proven ideas into one coherent runtime where each reinforces the others, instead of living in separate tools that never share state.

Knowledge graphs → as the only substrate

Graphs are well understood. OFFICINA makes one nodes-and-edges store the single substrate for memory and live configuration, so remembering and configuring become the same operation.

Model routing → governed by cost and consequence

Routing exists everywhere. Here it is driven by an explicit cost/risk policy, with a fallback chain that ends in a fully local model: resilience and spend control in one mechanism.

Human-in-the-loop → as a structural boundary

Approval gates are common. OFFICINA makes them structural: a hard read/write line between an operational surface and a configuration plane, so high-impact actions cannot slip through.

Information density → as an operating discipline

Summarization is routine. MDI+ turns it into a rule: every node is the smallest operationally complete, structured unit, cutting inference cost and sharpening retrieval at the same time.
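Two of these ideas, memory and configuration held in one store and a hard read/write line between the operational surface and the configuration plane, can be sketched together. The `NodeStore` class, the `kind` field, and the `config:`/`decision:` id prefixes are hypothetical; Python's `types.MappingProxyType` stands in for whatever enforcement the real runtime uses.

```python
from types import MappingProxyType


class NodeStore:
    """One store for memory and live configuration (hypothetical sketch)."""

    def __init__(self) -> None:
        self._nodes = {}  # node id -> {"kind": ..., plus payload fields}

    def write(self, node_id: str, kind: str, **payload) -> None:
        # Configuration-plane path: the only method that mutates the store.
        self._nodes[node_id] = {"kind": kind, **payload}

    def read_view(self):
        # Operational-surface path: a live, read-only mapping of the same nodes.
        # MappingProxyType is shallow: it blocks adding or replacing nodes,
        # but a real system would also need to protect the nested payloads.
        return MappingProxyType(self._nodes)
```

A configuration value and a recorded decision are read the same way, and the view reflects new writes immediately, which is what "live configuration" means in this model.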

Typical approach → With OFFICINA

A chatbot answers questions → Executes work across connected systems and keeps the state
Isolated agents act without oversight → High-risk and irreversible actions wait for human approval
Single-provider apps lock you to one model → Provider abstraction across local, open-source, low-cost, and premium models
Many specialized tables and queries → One nodes-and-edges store for all registration and retrieval
Manual workflow across tools loses context → One interface plus a ledger that reconstructs long-running work

Architecture & technology

A small, portable, cloud-ready stack.

Built on proven open components and a provider-abstraction layer, packaged to run anywhere from a single machine to cloud and GPU infrastructure.

Framework map

Interface

Operational surface (React): where the work happens, read-only on system state

Control plane: where the system is configured

↓

Runtime

FastAPI cognitive engine: classify · route · validate · orchestrate · inject context

↓

Core substrate

PostgreSQL (nodes + edges + pgvector): universal capture & universal retrieval · one store for memory and live configuration

↑↓

Resources

Models: cascading fallback chain, cloud to local

Tools: MCP-oriented connectors

Agents: episodic memory (planned)

Stack

FastAPI: cognitive engine, context injection, routing, API.
PostgreSQL + pgvector: structured persistence and vector retrieval in one store.
React: operational interface for AI-native work.
Ollama: local models for offline and low-cost inference.
Cloud model providers: premium and hosted inference via provider abstraction.
MCP-oriented integrations: tools, repositories, and systems as connectors.
RAG: retrieval over your own documents and policy.

Cascading inference fallback: if a provider is down, rate-limited, or unreachable, the runtime steps down to the next one, ending at a model that needs no network at all.

01 Vertex · Gemini: primary hosted inference (Primary)
02 Groq: low-latency alternate (Alternate)
03 NVIDIA: hosted alternate (Alternate)
04 OpenRouter: broad model access (Alternate)
05 Ollama: local, runs with no network (Local)
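The cascade above can be sketched as a loop over an ordered list of providers. The provider callables and the `ProviderError` exception are hypothetical stand-ins for real SDK clients and their failure signals; the labels only echo the order of the chain.

```python
from typing import Callable, Sequence, Tuple


class ProviderError(Exception):
    """Raised by a provider that is down, rate-limited, or unreachable."""


def complete_with_fallback(
    prompt: str, chain: Sequence[Tuple[str, Callable[[str], str]]]
) -> Tuple[str, str]:
    """Try providers in order; return (provider name, completion) from the first that answers."""
    failures = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            failures.append(f"{name}: {exc}")  # remember why, then step down
    raise RuntimeError("every provider failed: " + "; ".join(failures))
```

Placing the local model last means the chain only fails outright when the machine itself cannot run inference.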

Live now

Document pipeline end to end (upload → parse → chunk → embed → retrieve → chat); single graph holding memory + live config; multi-provider fallback; operational-ledger continuity.

In development

Cost-aware routing policy, tool permissions and human-approval flow, model-initiated retrieval, observability surfaces.

Planned

Connectors for CRM, ERP, email, and external systems; episodic-memory agents; permanent low-cost production deployment.
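The chunk step of the live document pipeline (upload → parse → chunk → embed → retrieve → chat) can be sketched as fixed-size character windows with overlap, so text near a boundary appears in both neighboring chunks. Character-based sizing and the default `size` and `overlap` values are assumptions; the source does not say how OFFICINA chunks.

```python
def chunk(text: str, size: int = 200, overlap: int = 40) -> list:
    """Split text into windows of `size` characters, each sharing `overlap` with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `chunk("abcdefghij", size=4, overlap=1)` yields `["abcd", "defg", "ghij"]`; each chunk would then be embedded and stored as its own node.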

Portable by design, packaged to run on a single host, in the cloud, or on GPU infrastructure.

Packaged deploy — one bundle, many targets

The whole runtime packages into a single bundle that stands up the same way on a laptop, one server, or cloud and GPU infrastructure. There is no per-environment rebuild: the same system serves many scenarios, which is what makes adoption (and evaluation on a cloud partner) low-friction.

Cloud infrastructure

Where cloud support goes.

OFFICINA is built to run on standard cloud primitives. Credits and infrastructure support directly accelerate the runtime; here's concretely where they would be used.

What we run on the cloud

Managed PostgreSQL + pgvector: the graph and vector store at production scale.
GPU inference: evaluating local and open-source models against premium baselines.
Container hosting: the FastAPI runtime and React interface, always on.
Object storage: document tiers feeding the RAG pipeline.
CDN + edge: serving the operational interface to small teams anywhere.

Why a cloud partner benefits

Encourages responsible, deliberate use of premium and cloud AI rather than blanket spend.
Drives real evaluation workloads across inference, databases, storage, and GPU.
Supports an AI-native developer ecosystem built on open components.
Helps small teams, a large and underserved segment, adopt AI safely and cost-effectively.

Current stage

Early, open, and actively built.

OFFICINA is in early bootstrap development, with a public open-source repository, a defined architecture and roadmap, operational documentation, and working design around model routing, tool orchestration, and operational continuity.

The next step is to expand the runtime, evaluate model-routing policies, build a practical developer interface, and test workflows across local, open-source, low-cost, premium, cloud, and GPU-accelerated models. We're seeking cloud and startup-program support to accelerate that.


Who's building it

OFFICINA is designed and built by its founders, working session by session with a disciplined, documented method; the public open-source repository is the running record of that work.

Who it's for

Solo developers · Technical founders · Startups · SMBs · Software teams · Open-source builders

Open-source · AI-native infrastructure

Infrastructure for the next generation of AI-native work.

OFFICINA is being built to help small teams run reliable human-AI systems that remember what matters, forget what shouldn't persist, and escalate only when needed. If you're a cloud or startup-program partner, a builder, or just want to compare notes, get in touch.

Contact: officina@ioblu.com

PP Consultoria · Panama City, Panamá

The right model, the right tool, the right level of validation, for every task.
