Engineering

Building an AI tutor with Gemini on Google Cloud: our architecture

MiaCortex Engineering · 8 min read

MiaCortex is an AI-powered Socratic tutor for Python data structures and algorithms. From day one we optimized for three constraints: low infrastructure cost, fast model response times, and a zero-install experience for students on any device.

The request path

A student types a question into the tutor chat. The browser sends the message, along with recent context (the current problem, their code, their attempt history), to our Next.js server running on Cloud Run. The server constructs a prompt for Google Gemini that includes:

  • A system instruction describing the Socratic persona and guardrails ("never give the solution directly").
  • The problem statement and test cases.
  • The student's latest code.
  • Recent dialogue turns.

Gemini returns a response. Our server streams tokens back to the browser, which renders them in the chat UI as they arrive. End-to-end p95 latency to first token is well under a second.
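Concretely, the prompt assembly might look like the following sketch. The context shape, `build_prompt`, and the message format are our illustration, not production code; in the real Gemini API the system instruction is passed separately from the message history.

```python
# Illustrative sketch of server-side prompt assembly; names are hypothetical.
from dataclasses import dataclass, field

SYSTEM_INSTRUCTION = (
    "You are a Socratic tutor for Python data structures and algorithms. "
    "Guide the student with questions. Never give the solution directly."
)

@dataclass
class TutorContext:
    problem: str       # problem statement and test cases
    student_code: str  # the student's latest attempt
    dialogue: list[tuple[str, str]] = field(default_factory=list)  # (role, text)

def build_prompt(ctx: TutorContext, question: str) -> list[dict]:
    """Flatten the tutoring context into a Gemini-style message list."""
    context_block = "\n\n".join([
        f"Problem:\n{ctx.problem}",
        f"Student's current code:\n{ctx.student_code}",
    ])
    messages = [{"role": "user", "text": context_block}]
    for role, text in ctx.dialogue[-6:]:  # keep only recent turns
        messages.append({"role": role, "text": text})
    messages.append({"role": "user", "text": question})
    return messages
```

Capping the dialogue window keeps the token count (and therefore cost and latency) roughly constant per request.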

Why Pyodide for execution

Every student who writes code wants to run it immediately. The naive architecture sends that code to a server-side sandboxed execution service, but in our experience the sandbox is the most expensive and most fragile component in an ed-tech stack.

Pyodide runs CPython compiled to WebAssembly directly in the student's browser. This moves the cost, latency, and security perimeter to where it belongs: the student's device. Our server never executes untrusted code. We pay no compute cost per run. Students get instant feedback with no round-trip.
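Because Pyodide is just CPython, the check harness the browser hands it is ordinary Python and runs identically under Pyodide or on a desktop interpreter. A minimal sketch, with the function name and result shape as our illustration rather than the actual MiaCortex harness:

```python
# Illustrative in-browser check harness: exec the student's code,
# then run each (args, expected) case against the named entry point.
import contextlib
import io
import traceback

def run_checks(student_source: str, entry: str,
               cases: list[tuple[tuple, object]]) -> dict:
    namespace: dict = {}
    stdout = io.StringIO()
    try:
        with contextlib.redirect_stdout(stdout):
            exec(student_source, namespace)  # define the student's functions
            fn = namespace[entry]
            results = [fn(*args) == expected for args, expected in cases]
    except Exception:
        # Surface the traceback so the tutor can reason about the failure.
        return {"ok": False, "error": traceback.format_exc(),
                "stdout": stdout.getvalue()}
    return {"ok": all(results), "results": results,
            "stdout": stdout.getvalue()}
```

The returned dict is what the chat UI (and the next Gemini prompt) would consume: pass/fail per case plus captured output.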

Visualization

For algorithm visualization, we generate Mermaid diagrams server-side using Gemini, then render them in the browser. This works surprisingly well for trees, graphs, and state machines. For list/array animations we use lightweight custom SVG animations.
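In production the diagram text comes from Gemini, but the target format is simple enough to show directly. Here is a sketch of the kind of Mermaid output we render, generated deterministically from an edge list (the helper function is hypothetical):

```python
def tree_to_mermaid(edges: list[tuple[str, str]]) -> str:
    """Render parent -> child edges as a top-down Mermaid flowchart."""
    lines = ["graph TD"]
    for parent, child in edges:
        lines.append(f"    {parent} --> {child}")
    return "\n".join(lines)

# A small binary tree becomes a three-line diagram definition:
diagram = tree_to_mermaid([("A", "B"), ("A", "C"), ("B", "D")])
```

Because Mermaid definitions are plain text, they stream and cache as cheaply as any other model output.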

Persistence

Progress tracking, saved problems, and auth go through Prisma against a managed SQL database. The schema is intentionally small — users, sessions, progress, saved problems — because we've found that keeping the schema narrow makes the product easier to reason about.
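As an illustration of that narrow shape only (these are not our exact models or fields), a Prisma schema along those lines looks roughly like:

```prisma
// Hypothetical sketch: four models, no more.
model User {
  id       String         @id @default(cuid())
  email    String         @unique
  sessions Session[]
  progress Progress[]
  saved    SavedProblem[]
}

model Session {
  id        String   @id @default(cuid())
  userId    String
  user      User     @relation(fields: [userId], references: [id])
  expiresAt DateTime
}

model Progress {
  id        String   @id @default(cuid())
  userId    String
  user      User     @relation(fields: [userId], references: [id])
  problemId String
  solved    Boolean  @default(false)
  updatedAt DateTime @updatedAt
}

model SavedProblem {
  id        String @id @default(cuid())
  userId    String
  user      User   @relation(fields: [userId], references: [id])
  problemId String
}
```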

What we'd change

If we were starting today, we would put an event stream in front of the database from day one. Learning analytics is an append-only workload, and forcing it through a relational schema slows down later analysis. A BigQuery pipeline from Cloud Run via Pub/Sub is the target architecture.
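As a sketch of what the append-only side would look like, here is a hypothetical learning-event record; the real pipeline would publish the serialized bytes to a Pub/Sub topic feeding BigQuery (names and fields are illustrative):

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field

@dataclass
class LearningEvent:
    """One append-only analytics record; field names are illustrative."""
    user_id: str
    problem_id: str
    kind: str      # e.g. "run", "hint_requested", "solved"
    payload: dict = field(default_factory=dict)
    event_id: str = ""
    ts: float = 0.0

    def to_message(self) -> bytes:
        """Serialize for publishing; BigQuery can ingest the JSON as-is."""
        record = asdict(self)
        record["event_id"] = record["event_id"] or str(uuid.uuid4())
        record["ts"] = record["ts"] or time.time()
        return json.dumps(record).encode("utf-8")
```

Writing events in this shape from day one means the relational schema never has to absorb high-volume, never-updated rows.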

Want to discuss? Reach us at [email protected].