Open-source AI-native runtime

The right model, the right tool, the right level of validation — for every task.

OFFICINA is AI-native infrastructure for cost-aware, multi-model operational work — for developers and operating teams alike. One chat interface routes each task across local and cloud models by cost and risk, orchestrates your tools and systems, preserves operational context, and holds high-risk actions for human approval.

Open-source · Multi-model · Cost-aware · Human-in-the-loop
officina · Illustrative preview
Review this week's open invoices and draft follow-ups.
Classified: routine + external action
Context: 3 invoices · vendor terms
Model: gemini-flash · low-cost · $0.002
Tool: ERP read · 3 records
Validated: drafts checked
3 follow-up drafts ready. Sending touches an external system, so it needs your sign-off.
Awaiting human approval: Approve & send · Edit
Cost-aware: Use premium models only where consequence or complexity demands it.
Operational: Keep the minimum stabilized context needed to reconstruct work.
Human-controlled: Irreversible and high-risk actions stay behind explicit approval.
No lock-in: Runs across local, open-source, low-cost, and premium models.
The problem

AI is powerful — but the way we work with it is fragmented.

Whether you are shipping code or running a business, useful AI work today is spread across chats, local and premium models, agents, vector search, databases, CRMs, inboxes, and scattered notes. The result is the same everywhere: expensive, fragile, and hard to reconstruct from one day to the next.

Uncontrolled model cost

Premium models get used for routine work a cheaper or local model could have handled safely — across a team, that spend adds up fast and quietly.

Lost operational context

Decisions, constraints, and state disappear across sessions, tools, repos, and providers. Every new session re-explains what was already settled.

Unvalidated output as truth

Exploratory or weakly checked AI output slips into systems of record — and into actions that touch real customers, money, or code — with nothing in the way.

The solution

An operational runtime — not just a chatbot.

OFFICINA turns one conversational interface into an operational layer. You ask for work in plain language; the runtime classifies the task, pulls the right context, picks a model by cost and risk, calls tools when needed, validates the result, and asks for approval before anything irreversible.

The thesis is simple: chat becomes the place where real work gets executed across connected systems — with cost control, validation, operational memory, and multi-model escalation built in.

Cost-aware routing

Each task is matched to the cheapest model that can do it safely; premium only when it earns its cost.

Universal chat interface

One conversational surface that reaches into the tools and systems you connect — not ten consoles.

Operational ledger

Only stabilized state is kept — the minimum needed to reconstruct work across sessions.

Human-in-the-loop

Irreversible, external, or high-impact actions pause for an explicit human decision.

KISS — one surface, every tool
CRM console · ERP screens · DB client · Email app · Repo UI · Admin panels
collapse into
One officina chat: Every scenario and tool driven from one conversational surface, with nothing specialized to build, deploy, or operate.
The core

One graph: nodes and edges.

This is the heart of OFFICINA. Every decision, document, conversation, and configuration value is stored as a node in one graph — which gives the system two properties most tools never get for free: universal capture (anything can be recorded the same way) and universal retrieval (anything can be found the same way).

Unlike a traditional relational schema, where relationships are spread across domain-specific tables and recovered through foreign keys and joins, OFFICINA stores entities as nodes and expresses their relationships directly as typed edges. The system does not depend on table-to-table structure to understand how data is connected; the connections live in the graph itself.

Because the graph carries those two advantages, OFFICINA needs no specialized tables or bespoke queries per data type — one mechanism registers and retrieves everything, and that is exactly what keeps cost-aware routing, context reconstruction, and the operational ledger simple enough to actually work.

Two relationship types are enough to represent how real work connects: belonging — this task is part of that project — and dependency — this draft depends on that contract. Enough to stay useful without the graph becoming its own maintenance problem.
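As a rough illustration of the two-edge-type idea, the sketch below stores every record in one `nodes` table and every relationship in one `edges` table, then walks the graph with a recursive query. It is a minimal sketch under stated assumptions: `sqlite3` stands in for PostgreSQL so the example runs anywhere, and the table names, column names, and the `belongs_to` / `depends_on` labels are hypothetical rather than OFFICINA's actual schema.

```python
import sqlite3

# Assumption: sqlite3 stands in for PostgreSQL; all names here are hypothetical.
SCHEMA = """
CREATE TABLE nodes (
    id   INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,   -- e.g. 'project', 'contract', 'draft'
    body TEXT NOT NULL    -- the node's content
);
CREATE TABLE edges (
    src  INTEGER NOT NULL REFERENCES nodes(id),
    dst  INTEGER NOT NULL REFERENCES nodes(id),
    type TEXT NOT NULL CHECK (type IN ('belongs_to', 'depends_on'))
);
"""

def open_graph() -> sqlite3.Connection:
    """Create an empty in-memory graph store."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def add_node(conn: sqlite3.Connection, kind: str, body: str) -> int:
    """Register any kind of record the same way and return its id."""
    cur = conn.execute("INSERT INTO nodes (kind, body) VALUES (?, ?)", (kind, body))
    return cur.lastrowid

def add_edge(conn: sqlite3.Connection, src: int, dst: int, type_: str) -> None:
    """Link two nodes with one of the two allowed relationship types."""
    conn.execute("INSERT INTO edges (src, dst, type) VALUES (?, ?, ?)", (src, dst, type_))

def reachable(conn: sqlite3.Connection, start: int) -> list[str]:
    """Bodies of every node reachable from `start` by following edges outward."""
    rows = conn.execute(
        """
        WITH RECURSIVE walk(id) AS (
            SELECT dst FROM edges WHERE src = ?
            UNION
            SELECT e.dst FROM edges e JOIN walk w ON e.src = w.id
        )
        SELECT n.body FROM nodes n JOIN walk w ON n.id = w.id ORDER BY n.id
        """,
        (start,),
    ).fetchall()
    return [body for (body,) in rows]

conn = open_graph()
project = add_node(conn, "project", "Supplier renewal")
contract = add_node(conn, "contract", "Supply contract")
draft = add_node(conn, "draft", "Reply to supplier")
add_edge(conn, draft, project, "belongs_to")     # this draft is part of that project
add_edge(conn, draft, contract, "depends_on")    # this draft depends on that contract
add_edge(conn, contract, project, "belongs_to")

print(reachable(conn, draft))  # ['Supplier renewal', 'Supply contract']
```

The point of the sketch is that adding a new kind of record needs no new table: it is another row in `nodes`, and the same traversal finds it.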

PostgreSQL + pgvector

One store for both the graph and the vectors — structured records and semantic search together.

Belonging edges

“Part of” relationships — this task belongs to that project, this note to that contract.

Dependency edges

“Depends on” relationships — this draft depends on that decision, this answer on that source.

KISS: one store, not a dozen tables
users · documents · sessions · configs · decisions · embeddings · bespoke queries
collapse into
nodes + edges + pgvector: One registration and retrieval mechanism for everything, with no specialized tables or bespoke queries to maintain.
Universal capture · universal retrieval
Any input
  • documents & policy
  • decisions & state
  • messages & config
  • code & records
universal capture
nodes + edges · PostgreSQL · pgvector

One store. No specialized tables.

universal retrieval
Any output
  • relevant context
  • semantic search
  • work reconstruction
  • graph traversal

Persistent

Identity, configuration, and policy — the rules that hold across every session.

Situational

What's true right now — the active task, the conversation in progress.

Episodic

What happened — past sessions, crystallized into memory the system can recall.

Documental

What you fed it — source documents and policy, parsed and embedded for retrieval.

How it works

From a request to a controlled result.

Every request moves through the same operational path. Multi-provider fallback and the context layers run today; the cost-aware routing policy and connector and approval coverage are in active development.

Ask → Classify → Retrieve → Route model → Call tools → Validate → Approve → Ledger

Ask in plain language

The operator states the task through a single chat interface — no jumping between consoles.

Classify the task

The runtime reads what kind of work it is and how much it matters: routine, ambiguous, or high-consequence.

Retrieve context

Relevant project state, prior decisions, and documents are pulled from the knowledge graph and retrieval layer.

Select model & provider

It picks the cheapest sufficient model — local, open-source, or low-cost — and reserves premium models for tasks that need them.

Call tools when needed

Where the task requires it, the runtime invokes connected tools and systems through a controlled integration layer.

Validate the output

Results are checked against cost, risk, and consequence before they are treated as usable — not accepted blindly.

Approve if risk is high

Irreversible, external, or high-impact actions pause for explicit human sign-off. Nothing destructive happens on its own.

Write stabilized state

Only decisions worth keeping are persisted to an operational ledger, so the next session reconstructs work instead of restarting it.
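The eight steps above can be sketched as an ordered list of stages with a single pause point. This is only an illustration: the stage names mirror the path described here, but the `Run` class, the `advance` function, and the keyword-based `classify` are assumptions made for the example (a real classifier would likely call a model), and the stages do no real work.

```python
from dataclasses import dataclass, field

STAGES = ["ask", "classify", "retrieve", "route", "tools", "validate", "approve", "ledger"]

@dataclass
class Run:
    task: str
    risk: str = "unknown"                      # filled in by the classify stage
    done: list[str] = field(default_factory=list)
    status: str = "running"

def classify(task: str) -> str:
    """Toy classifier (assumption): keyword match instead of a model call."""
    lowered = task.lower()
    if any(word in lowered for word in ("send", "delete", "pay", "merge")):
        return "high-consequence"
    return "routine"

def advance(run: Run, approved: bool = False) -> Run:
    """Walk the remaining stages in order, pausing at the approval gate."""
    for stage in STAGES[len(run.done):]:
        if stage == "classify":
            run.risk = classify(run.task)
        if stage == "approve" and run.risk == "high-consequence" and not approved:
            run.status = "awaiting_approval"
            return run                         # nothing after the gate executes
        run.done.append(stage)
    run.status = "complete"
    return run

run = advance(Run("Draft follow-ups and send them"))
print(run.status, run.done[-1])                # awaiting_approval validate
run = advance(run, approved=True)
print(run.status, run.done[-1])                # complete ledger
```

A routine task passes straight through all eight stages in one call; a high-consequence one stops before `approve` and resumes from the same point once a person signs off.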

Universal chat interface

Connect a system. Operate it from chat.

Instead of logging into ten different tools, the goal is to work from one conversational surface that reaches into the systems you connect. Connector support is in active development; the examples below show the workflows OFFICINA is being built to run.

CRM

"Summarize the accounts at risk this week" — pull customer state and surface what needs attention.

ERP

"Review open invoices and prepare follow-ups" — read records, draft, hold for approval before sending.

Email

"Reply to the supplier using the contract" — draft grounded in the right documents, not guesswork.

Database

"Query sales by product, then analyze" — run the read, return the result and a summary.

Documents / RAG

"Use the internal policy to answer this" — retrieve from your own material and answer from it.

GitHub

"Analyze the issue, change code, open a PR" — read the repo, propose changes, leave merge to a human.

Integration approach — MCP-oriented tool connectors plus provider abstraction, so adding a new system is a connector, not a rewrite.
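One way to picture "a connector, not a rewrite" is a small interface that every connected system implements, plus a registry the runtime calls through. The sketch below is hypothetical throughout: `Connector`, `Registry`, `InMemoryERP`, and the `list_open_invoices` action are illustrative names, the records are made-up sample data, and none of it is the MCP wire protocol or OFFICINA's actual connector API.

```python
from typing import Protocol

class Connector(Protocol):
    """Shape every connected system is assumed to expose (hypothetical)."""
    name: str
    def call(self, action: str, params: dict) -> dict: ...

class InMemoryERP:
    """Stand-in connector with canned records; a real one would talk to the ERP."""
    name = "erp"

    def __init__(self, invoices: list[dict]) -> None:
        self._invoices = invoices

    def call(self, action: str, params: dict) -> dict:
        if action == "list_open_invoices":
            return {"records": [i for i in self._invoices if i["status"] == "open"]}
        raise ValueError(f"unsupported action: {action}")

class Registry:
    """The runtime only knows the registry, never a specific system."""
    def __init__(self) -> None:
        self._connectors: dict[str, Connector] = {}

    def register(self, connector: Connector) -> None:
        self._connectors[connector.name] = connector

    def call(self, system: str, action: str, **params) -> dict:
        return self._connectors[system].call(action, params)

registry = Registry()
registry.register(InMemoryERP([
    {"id": 1, "status": "open"},
    {"id": 2, "status": "paid"},
    {"id": 3, "status": "open"},
]))
result = registry.call("erp", "list_open_invoices")
print(len(result["records"]))  # 2
```

Adding a CRM or an inbox in this picture means writing another class with the same two members and registering it; the calling code does not change.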

The operational core

Three things that make AI work reliable.

Cost control, continuity, and controlled actions — the parts most "AI apps" leave out.

Cost-aware model routing

Designed to use the cheapest model that's safe for a task and escalate to premium only when consequence or complexity calls for it. A cascading fallback chain — cloud through to a fully local model — runs today.

Live: fallback chain · Building: cost policy
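A cost policy of this kind could, in its simplest form, filter the model catalog by what the task needs and then take the cheapest survivor. The following is a sketch under assumptions, not the policy OFFICINA ships: the model names, prices, and the 1-to-3 capability scale are invented for illustration, and the three task classes are the ones named in the classification step (routine, ambiguous, high-consequence).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float   # illustrative numbers, not real prices
    capability: int             # hypothetical scale: 1 = basic, 3 = strongest

CATALOG = [
    Model("local-small", 0.0, 1),
    Model("hosted-low-cost", 0.002, 2),
    Model("hosted-premium", 0.03, 3),
]

# Minimum capability each task class is assumed to need.
REQUIRED = {"routine": 1, "ambiguous": 2, "high-consequence": 3}

def pick_model(task_class: str, catalog: list[Model] = CATALOG) -> Model:
    """Return the cheapest model whose capability is sufficient for the task."""
    needed = REQUIRED[task_class]
    candidates = [m for m in catalog if m.capability >= needed]
    if not candidates:
        raise LookupError(f"no model is sufficient for {task_class!r}")
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)

for task_class in REQUIRED:
    print(task_class, "->", pick_model(task_class).name)
```

With this catalog, routine work lands on the free local model and only high-consequence work reaches the premium one, which is the behavior the paragraph above describes.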

Operational continuity

An operational ledger keeps only stabilized state (what's needed to reconstruct work) and deliberately forgets transient thinking. That state is stored as nodes and edges in the same graph, so long-running workflows survive across sessions.

Live: ledger continuity
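The "keep stabilized state, forget the rest" rule can be illustrated with a ledger that filters session notes at commit time. This is a hedged sketch: the `Ledger` class, the `stabilized` flag, and the note fields are assumptions for the example, and a plain list stands in for the graph store.

```python
from dataclasses import dataclass, field

@dataclass
class Ledger:
    entries: list[dict] = field(default_factory=list)

    def commit(self, session_notes: list[dict]) -> int:
        """Persist only notes marked stabilized; everything else is dropped."""
        kept = [n for n in session_notes if n.get("stabilized")]
        self.entries.extend({"kind": n["kind"], "text": n["text"]} for n in kept)
        return len(kept)

    def reconstruct(self) -> str:
        """Rebuild a short briefing for the next session from what was kept."""
        return "\n".join(f"[{e['kind']}] {e['text']}" for e in self.entries)

ledger = Ledger()
kept = ledger.commit([
    {"kind": "decision", "text": "Follow-ups go out on Mondays", "stabilized": True},
    {"kind": "thought", "text": "Maybe try a different tone?"},
    {"kind": "constraint", "text": "Never email vendor X directly", "stabilized": True},
])
print(kept)                  # 2
print(ledger.reconstruct())
```

The exploratory note never reaches the ledger, so the next session starts from two settled facts instead of a transcript.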

Tool orchestration

OFFICINA doesn't just call tools; it coordinates permissions, validation, risk, and consequence. Actions on real systems stay controlled rather than firing automatically.

Building: permissions & approvals
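Since permissions and approvals are still being built, the following is only a sketch of how the two checks might combine for a single tool call: first whether the operator holds the needed grant, then whether a write has been signed off. `ToolCall`, `GRANTS`, `decide`, and the three outcome strings are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ToolCall:
    tool: str
    action: str
    writes: bool   # True if the call changes the connected system

# Hypothetical grants: what this operator may do on each connected tool.
GRANTS = {"erp": {"read"}, "email": {"read", "write"}}

def decide(call: ToolCall, approved_by: Optional[str] = None) -> str:
    """Return 'denied', 'needs_approval', or 'allowed' for one tool call."""
    needed = "write" if call.writes else "read"
    if needed not in GRANTS.get(call.tool, set()):
        return "denied"                # no grant: approval cannot override this
    if call.writes and approved_by is None:
        return "needs_approval"        # permitted, but a person must sign off
    return "allowed"

print(decide(ToolCall("erp", "list_invoices", writes=False)))           # allowed
print(decide(ToolCall("erp", "update_invoice", writes=True)))           # denied
print(decide(ToolCall("email", "send", writes=True)))                   # needs_approval
print(decide(ToolCall("email", "send", writes=True), approved_by="ana"))  # allowed
```

Checking the grant before the approval keeps the two concerns separate: a sign-off can release a permitted write, but it cannot create a permission that was never given.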
Governing principles

Simple on purpose — so it adapts.

KISS here isn't minimalism for its own sake. Reducing each problem to one basic mechanism is exactly what lets one small system adapt to many scenarios — without building, deploying, and operating something specialized for each.

One interface, not many menus

A single conversational surface drives every tool and scenario — instead of specialized menus that are hard to develop, deploy, and operate.

One schema, not many tables

A single nodes-and-edges store handles all registration and retrieval — instead of many specialized tables and the bespoke queries that come with them.

One model mechanism, not per-model wiring

One routing mechanism resolves model choice automatically — instead of implementing and selecting each model by hand for every task.

MDI+ — maximum information density, structured

Every node stores the smallest operationally-complete unit of information, in a structured, consistent form. Less noise to process means cheaper inference and far more precise retrieval — density and structure are what make a deliberately simple design efficient at scale.

Use cases

What it's built for.

Built for technical builders and operating teams that need advanced AI workflows without enterprise budgets or vendor lock-in.

AI-assisted software

Read issues, change code, prepare PRs, and review — with project memory across the work, not one prompt at a time.

Small-business ops

Drive CRM, email, and admin workflows from chat, with human approval on anything that leaves the building.

ERP & back-office

Read records, draft follow-ups, and prepare actions for a person to confirm — instead of manual data shuffling.

Document & RAG

Answer from internal policy, contracts, and source material — grounded retrieval, not hallucinated guesses.

Founder & product ops

Keep decisions, context, and state coherent across many small projects without a dedicated ops team.

Cost-controlled adoption

Get advanced AI into a small team's workflow while keeping premium-model spend deliberate and bounded.

Differentiation

Known ideas — composed in a way that is new.

None of the building blocks are exotic. The contribution is the composition: assembling proven ideas into one coherent runtime where each reinforces the others, instead of living in separate tools that never share state.

Knowledge graphs → as the only substrate

Graphs are well understood. OFFICINA makes one nodes-and-edges store the single substrate for memory and live configuration — so remembering and configuring become the same operation.

Model routing → governed by cost and consequence

Routing exists everywhere. Here it is driven by an explicit cost/risk policy, with a fallback chain that ends in a fully local model — resilience and spend control in one mechanism.

Human-in-the-loop → as a structural boundary

Approval gates are common. OFFICINA makes them structural: a hard read/write line between an operational surface and a configuration plane, so high-impact actions cannot slip through.

Information density → as an operating discipline

Summarization is routine. MDI+ turns it into a rule: every node is the smallest operationally-complete, structured unit — cutting inference cost and sharpening retrieval at the same time.

Typical approach → With OFFICINA
A chatbot answers questions → Executes work across connected systems and keeps the state
Isolated agents act without oversight → High-risk and irreversible actions wait for human approval
Single-provider apps lock you to one model → Provider abstraction across local, open-source, low-cost, premium
Many specialized tables and queries → One nodes-and-edges store for all registration and retrieval
Manual workflow across tools loses context → One interface plus a ledger that reconstructs long-running work
Architecture & technology

A small, portable, cloud-ready stack.

Built on proven open components and a provider-abstraction layer, packaged to run anywhere from a single machine to cloud and GPU infrastructure.

Framework map
Interface
  • Operational surface: React, where the work happens (read-only on system state)
  • Control plane: where the system is configured
Runtime
  • FastAPI cognitive engine: classify · route · validate · orchestrate · inject context
Core substrate
  • PostgreSQL (nodes + edges + pgvector): universal capture & universal retrieval · one store for memory and live configuration
↑↓
Resources
  • Models: cascading fallback chain, cloud to local
  • Tools: MCP-oriented connectors
  • Agents: episodic memory (planned)

Stack

  • FastAPI — cognitive engine, context injection, routing, API.
  • PostgreSQL + pgvector — structured persistence and vector retrieval in one store.
  • React — operational interface for AI-native work.
  • Ollama — local models for offline and low-cost inference.
  • Cloud model providers — premium and hosted inference via provider abstraction.
  • MCP-oriented integrations — tools, repositories, and systems as connectors.
  • RAG — retrieval over your own documents and policy.

Cascading inference fallback — if a provider is down, rate-limited, or unreachable, the runtime steps down to the next, ending at a model that needs no network at all.

  • 01 · Vertex · Gemini (Primary): Primary hosted inference
  • 02 · Groq (Alternate): Low-latency alternate
  • 03 · NVIDIA (Alternate): Hosted alternate
  • 04 · OpenRouter (Alternate): Broad model access
  • 05 · Ollama (Local): Local, runs with no network
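The step-down behavior of this chain can be sketched as a loop that tries each provider in order and moves on when one fails. The provider labels below echo the chain above, but the callables are stubs: no real SDK is called, and `ProviderError`, `complete`, `unavailable`, and `local_model` are names assumed for the illustration.

```python
from typing import Callable

class ProviderError(Exception):
    """A provider is down, rate-limited, or unreachable."""

Provider = tuple[str, Callable[[str], str]]   # (label, completion function)

def complete(prompt: str, chain: list[Provider]) -> tuple[str, str]:
    """Try each provider in order; return (label, answer) from the first that works."""
    failures = []
    for label, call in chain:
        try:
            return label, call(prompt)
        except ProviderError as exc:
            failures.append(f"{label}: {exc}")   # remember why, then step down
    raise RuntimeError("all providers failed: " + "; ".join(failures))

def unavailable(reason: str) -> Callable[[str], str]:
    """Build a stub provider that always fails, to simulate an outage."""
    def call(prompt: str) -> str:
        raise ProviderError(reason)
    return call

def local_model(prompt: str) -> str:
    """Stub for the last link in the chain, which needs no network."""
    return f"[local] {prompt}"

chain = [
    ("vertex-gemini", unavailable("rate limited")),
    ("groq", unavailable("unreachable")),
    ("ollama", local_model),
]
label, answer = complete("Summarize open invoices", chain)
print(label, answer)  # ollama [local] Summarize open invoices
```

Only `ProviderError` triggers a step-down in this sketch; any other exception would surface immediately, which seems the safer default when a failure is not clearly an availability problem.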
Live now
Document pipeline end to end (upload → parse → chunk → embed → retrieve → chat); single graph holding memory + live config; multi-provider fallback; operational-ledger continuity.
In development
Cost-aware routing policy, tool permissions and human-approval flow, model-initiated retrieval, observability surfaces.
Planned
Connectors for CRM, ERP, email, and external systems; episodic-memory agents; permanent low-cost production deployment.
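The chunk → embed → retrieve part of the document pipeline listed under "Live now" can be shown in miniature. This sketch is deliberately a toy: a bag-of-words count stands in for a real embedding model, and an in-memory sort stands in for a vector column queried by distance in pgvector. The function names and sample sentences are assumptions for the example.

```python
import math
from collections import Counter

def chunk(text: str, size: int = 12) -> list[str]:
    """Split text into consecutive pieces of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts, punctuation stripped."""
    return Counter(w.strip(".,?!").lower() for w in text.split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

policy = [
    "Invoices are due within 30 days of issue.",
    "Refunds require manager approval.",
]
print(retrieve("When are invoices due?", policy))
# ['Invoices are due within 30 days of issue.']
```

The retrieved chunk is what would be handed to the model as context for the chat step, so the answer comes from the stored material rather than from the model's own guess.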

Portable by design — packaged to run on a single host, in the cloud, or on GPU infrastructure.

Packaged deploy — one bundle, many targets

The whole runtime packages into a single bundle that stands up the same way on a laptop, one server, or cloud and GPU infrastructure. No per-environment rebuild — the same system, many scenarios, which is what makes adoption (and evaluation on a cloud partner) low-friction.

Cloud infrastructure

Where cloud support goes.

OFFICINA is built to run on standard cloud primitives. Credits and infrastructure support directly accelerate the runtime — here's concretely where they'd be used.

What we run on the cloud

  • Managed PostgreSQL + pgvector — the graph and vector store at production scale.
  • GPU inference — evaluating local and open-source models against premium baselines.
  • Container hosting — the FastAPI runtime and React interface, always-on.
  • Object storage — document tiers feeding the RAG pipeline.
  • CDN + edge — serving the operational interface to small teams anywhere.

Why a cloud partner benefits

  • Encourages responsible, deliberate use of premium and cloud AI rather than blanket spend.
  • Drives real evaluation workloads across inference, databases, storage, and GPU.
  • Supports an AI-native developer ecosystem built on open components.
  • Helps small teams adopt AI safely and cost-effectively — a large, underserved segment.
Current stage

Early, open, and actively built.

OFFICINA is in early bootstrap development, with a public open-source repository, a defined architecture and roadmap, operational documentation, and working design around model routing, tool orchestration, and operational continuity.

The next step is to expand the runtime, evaluate model-routing policies, build a practical developer interface, and test workflows across local, open-source, low-cost, premium, cloud, and GPU-accelerated models. We're seeking cloud and startup-program support to accelerate that.

Who's building it

OFFICINA is designed and built by its founders, working session by session with a disciplined, documented method — the public open-source repository is the running record of that work.

Who it's for

Solo developers · Technical founders · Startups · SMBs · Software teams · Open-source builders
Open-source · AI-native infrastructure

Infrastructure for the next generation of AI-native work.

OFFICINA is being built to help small teams run reliable human-AI systems that remember what matters, forget what shouldn't persist, and escalate only when needed. If you're a cloud or startup-program partner, a builder, or just want to compare notes — get in touch.

PP Consultoria · Panama City, Panamá