
A 3D chatbot of myself answering portfolio questions — project cards float in whenever the model needs to point at specific work.

Visitors land on a portfolio and don't know where to start. They might want a specific project, my background, or just to poke around — and a static gallery makes them dig. Generic chatbots don't help: they don't actually know my work.

So the chat itself becomes the project. A 3D scene where I'm answering questions from inside the page, with project cards floating into view whenever the model needs to point at something specific. Asking "show me your AR work" returns clickable cards, not paragraphs — the chat doubles as a navigation surface.

Audience: visitors scanning for AR/MR or design engineer work, designers reading process, friends being curious. The design choices below try to land the visitor case without burying the process detail, and to keep the playful side intact for the curious.


Design exploration

The whole design lives in a brainstorming transcript with Claude Code's superpowers:brainstorming skill — one question at a time, with a visual companion that opens HTML mockups in the browser as the conversation evolves.

Two layout sketches I drew first — a stage + screen composition (left), and a head and shoulders bust with project cards floating in a fan above (right):

Two hand-drawn layout sketches side by side. Left: a stick-figure Yao on a stage with an angled screen behind her, a speech bubble to the side, and an input pill at the bottom. Right: a stick-figure bust with three rectangular cards floating around the head, a speech bubble to the side, and an input pill at the bottom.

The brainstorm worked through the stage version in browser mockups. Streaming text on the left, project card tool results on the right, no chat history accumulating between turns:

Stage + screen composition mockup: streaming text response on the left, project card tool result on the right

Then the empty state — what's on screen before anyone asks. Three options. I picked C (minimal blank, prompts cycling in the input placeholder) for the most cinematic read:

Three empty state options: A welcome chip row above the input, B prompts rendered as 3D cards on screen, C minimal blank with prompts cycling in the input placeholder

Visual references locked the look — pastel podium for the color palette, theatrical stage for the neon rim treatment, an illustration with stuff flowing from the head for the "project cards as thoughts" pattern, and a 3D chat bubble vignette for the "chat as 3D object" framing:

Pastel podium reference — clean staging vignette with arched circular screen and floating cloud props
Theatrical stage reference — tiered platform with neon rim lighting, speakers, curtained proscenium
Illustration reference — character with creative tools and cards spilling out of the back of her head, color saturated
3D chat reference — blue toned 3D scene with a speech bubble, a profile card, and an envelope notification floating around a laptop

Visual iteration

The bust's style is determined upstream by whatever reference image feeds the image-to-3D pipeline — picking the right reference matters more than tuning the 3D model afterward, because each tool's stylistic bias propagates straight through.

I cross-tested several AI image generators on the same brief (cartoon-illustration head and shoulders, friendly designer aesthetic):

Midjourney pass — cartoon-illustration bust, painterly soft shading, warm color palette
Gemini Nano Banana 2 pass — cleaner cel-shaded look, flatter color fields
ChatGPT image pass — more photoreal, warmer skin tones, less stylized
Image-gen iteration — early pass with a more cartoonish facial structure
Image-gen iteration — later pass with sharper edge work and a more stylized read
Image-gen iteration — softer hoodie + headband look, single bust on a saturated background

Midjourney leaned painterly, Gemini's Nano Banana 2 leaned toward clean cel shading, and ChatGPT leaned the most photoreal. The Nano Banana 2 result sat closest to the toon-shaded aesthetic I'd already locked for the project cards, so its output became the reference for the 3D step that follows.

Image-gen tools are stylistically opinionated even when the prompt is identical. The choice of upstream tool is a design decision, not a tooling decision — Midjourney's bust would have shifted the whole chat's vibe toward painterly illustration; ChatGPT's would have nudged it toward semi-realistic. Picking Nano Banana 2 was about coherence with the rest of the page, not "which one looks best."

Prototyping

Built the stage version first. Full-body Yao on a tiered neon-rim stage, presenting a curved screen behind her:

It worked, but felt wrong. The full-body figure was so small at any sane camera distance that you couldn't read her face — and a chat is mostly about reading the face of the person talking. The neon rim and curtained stage were doing a lot of theatrical lifting for an interaction that's actually quiet (a one-line question, a one-paragraph answer).

So I tried the second sketch — closer camera, head and shoulders bust at conversational distance instead of full body across a stage. Fed the chosen reference (Nano Banana 2, see Visual iteration above) into Tripo AI 3D's image-to-3D mode, then exported to Blender for cleanup — texture paint for the cartoon-illustration look that matches the cards, plus a topology pass so the rig stays clean.

Blender screenshot: texture paint view of the bust on the left, the rigged 3D viewport on the right showing the head and shoulders model in a hoodie and headband — generated with Tripo AI 3D, refined in Blender

I generated multiple bust variants from different references during this step. Try the picker chips at the top-left of the chat above to swap between them live — drei's GLB cache keys by URL so all three stay warm after first load and the swap is instant.
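The instant swap leans on drei's loader cache, which keys GLBs by URL. A minimal sketch of the preload-then-swap pattern (the URLs and the component are illustrative, not the real source):

```tsx
import { useGLTF } from '@react-three/drei';

// Hypothetical variant URLs; drei caches each loaded GLB under its URL.
const BUSTS = ['/models/bust-a.glb', '/models/bust-b.glb', '/models/bust-c.glb'];

// Warm all three up front so a picker-chip swap never waits on the network.
BUSTS.forEach((url) => useGLTF.preload(url));

function Bust({ variant }: { variant: number }) {
  // Same hook, different URL: after preload this is a cache hit, so the
  // swap is a re-render, not a re-fetch.
  const { scene } = useGLTF(BUSTS[variant]);
  return <primitive object={scene} />;
}
```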

Everything else collapsed into place from there. Pastel sunrise gradient instead of the dark neon stage. Toon-shaded materials with cel-banded lighting and a chunky inverted-hull outline so the bust reads against the colorful backdrop. Project cards animate out from the bust's position when a tool result arrives — they look like thoughts being pulled out of her head. Status-driven pose modulation (lean in on loading, sway on streaming, slump on rate-limit) so the bust isn't just sitting there.
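For reference, inverted-hull outlining draws a second copy of the mesh, flipped to back faces and slightly inflated, so only a rim of it shows around the silhouette. A minimal three.js sketch of both ingredients (not the production code; colors and thickness are placeholders):

```ts
import * as THREE from 'three';

// Cel-banded lighting: MeshToonMaterial quantizes shading through a tiny
// gradient map (three bands here) instead of a smooth falloff.
const gradientMap = new THREE.DataTexture(
  new Uint8Array([64, 144, 255]), 3, 1, THREE.RedFormat
);
gradientMap.needsUpdate = true;
const toonMaterial = new THREE.MeshToonMaterial({ color: '#f2b5a0', gradientMap });

// Chunky outline via inverted hull: a back-face-only black copy, scaled up
// a touch (uniform scale as the cheap stand-in for true normal extrusion).
function addOutline(mesh: THREE.Mesh, thickness = 0.03) {
  const hull = new THREE.Mesh(
    mesh.geometry,
    new THREE.MeshBasicMaterial({ color: 'black', side: THREE.BackSide })
  );
  hull.scale.setScalar(1 + thickness);
  mesh.add(hull);
}
```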

Final design: head and shoulders bust against a pastel sunrise gradient, with a glassmorphic input pill at the bottom and a speech bubble next to the head

Responsive states

Each state has a deliberate UI mapping so the visitor always knows what's happening:

| State (trigger) | Input | What Yao + the scene do |
| --- | --- | --- |
| empty (page load) | enabled, placeholder cycles | gentle bob, greeting in the bubble |
| loading (user submits) | disabled | leans forward, faster bob; "Hmm, let me think…" in the bubble |
| streaming (first token arrives) | disabled | head sways; bubble fades; text or cards animate out from the bust |
| ready (response complete) | re-enabled | back to idle |
| error (upstream throws) | enabled | mild downward tilt; error message in the bubble |
| rate_limited (API returns 429) | disabled until timer | slumps down, animation slows; live countdown in the bubble; scene dims |
| refused (safety/moderation block) | enabled | mild tilt; polite refusal in the bubble |
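One way to encode that mapping so the input and the scene can't drift apart: a TypeScript sketch with assumed names, not the actual source:

```ts
// Status as a discriminated union; rate_limited carries its retry deadline
// so the bubble's live countdown has something to tick against.
type ChatStatus =
  | { kind: 'empty' }
  | { kind: 'loading' }
  | { kind: 'streaming' }
  | { kind: 'ready' }
  | { kind: 'error'; message: string }
  | { kind: 'rate_limited'; retryAt: number } // epoch ms
  | { kind: 'refused'; message: string };

// The table's Input column as one predicate: only the three busy states lock it.
const inputEnabled = (s: ChatStatus) =>
  s.kind !== 'loading' && s.kind !== 'streaming' && s.kind !== 'rate_limited';
```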

A few decisions worth calling out:

  • Disable input during loading/streaming. Prevents the visitor from queuing a new question over an in-flight one — two answers stomping each other on the screen would be confusing. The "Ask" button greys to 20% opacity so the disabled state is unmistakable, not just unresponsive.
  • Status lives on the bust, not just in the input. Pose modulation (lean in on loading, sway on streaming, slump on rate_limited) communicates system state through the character. Amplitudes are deliberately small — reads as personality, not loading spinner desperation.
  • Cards animate out from where Yao is standing. They look like thoughts being pulled out of her head. Without that motion they'd just appear, feeling disconnected from the speaker.
  • Rate limit countdown ticks every second. The bubble updates live ("back in 4 min 32 sec"). The visitor sees the timer move, knows the system isn't broken, knows when to come back.
  • Failed states never trap the user. error and refused keep the input enabled — visitor can immediately retry or ask something else. Only loading/streaming/rate_limited actually lock interaction.
  • Status transitions are tweened, not snapped. Pose changes lerp with a ~250ms half-life, and phase-integrated bob/sway means switching states never produces a visual jump even mid-cycle (see the sketch below).
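A minimal sketch of that tween, with illustrative pose targets (the real values live in the scene code):

```ts
// Each status maps to a pose target; the live pose eases toward it with a
// fixed exponential half-life, so transitions are frame-rate independent.
type Pose = { lean: number; bobAmp: number; bobFreq: number };

const TARGETS: Record<string, Pose> = {
  empty:        { lean: 0.0,  bobAmp: 0.02, bobFreq: 1.0 },
  loading:      { lean: 0.15, bobAmp: 0.03, bobFreq: 1.6 },
  streaming:    { lean: 0.05, bobAmp: 0.04, bobFreq: 1.2 },
  rate_limited: { lean: -0.2, bobAmp: 0.01, bobFreq: 0.5 },
};

const HALF_LIFE = 0.25; // seconds; the ~250ms half-life above
let pose: Pose = { ...TARGETS.empty };
let phase = 0;

function tick(status: string, dt: number) {
  const target = TARGETS[status] ?? TARGETS.empty;
  const k = 1 - Math.pow(0.5, dt / HALF_LIFE); // fraction of the gap closed this frame
  pose.lean    += (target.lean    - pose.lean)    * k;
  pose.bobAmp  += (target.bobAmp  - pose.bobAmp)  * k;
  pose.bobFreq += (target.bobFreq - pose.bobFreq) * k;
  // Integrating phase with the *current* frequency is what keeps the bob
  // continuous: a frequency change bends the wave instead of teleporting it.
  phase += pose.bobFreq * dt * 2 * Math.PI;
  return { lean: pose.lean, bobY: Math.sin(phase) * pose.bobAmp };
}
```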

Under the hood

Pipeline

Browser (3D scene + input) → /api/chat (Vercel Function) → Tools (CARDS · projects · BIO) ↔ Google AI Studio (Gemini 2.5 Flash-Lite)

Browser POSTs to /api/chat (a Vercel Function). The function calls Google AI Studio's Gemini 2.5 Flash-Lite via the Vercel AI SDK. When the model decides to call a tool, the function runs it locally against my own portfolio data and feeds the result back to the model — the request can loop tools↔model up to twice before the final answer streams back to the browser as Server-Sent Events. The 3D scene watches the event stream and updates accordingly.
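A minimal sketch of that route, assuming the Vercel AI SDK's v4-style streamText API (the tool shown is one of the real ones from the table below; its schema and the helpers are illustrative):

```ts
import { streamText, tool } from 'ai';
import { google } from '@ai-sdk/google';
import { z } from 'zod';

// Assumed helpers: the prompt text and the local portfolio-data lookup.
declare const SYSTEM_PROMPT: string;
declare function lookupProjects(category?: string): Promise<object[]>;

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: google('gemini-2.5-flash-lite'),
    system: SYSTEM_PROMPT, // voice + ground rules + inline "About Yao" facts
    messages,
    maxSteps: 2, // how I'd express the MAX_TOOL_ITERATIONS cap here (mapping assumed)
    tools: {
      listProjects: tool({
        description: 'List projects, optionally filtered by category',
        parameters: z.object({ category: z.string().optional() }),
        // Tools run locally in the function, against existing portfolio data.
        execute: async ({ category }) => lookupProjects(category),
      }),
      // …getProject, searchProjects, and the bio-depth tools omitted.
    },
  });

  // The final answer streams back to the browser as Server-Sent Events.
  return result.toDataStreamResponse();
}
```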

System prompt

You are Yao, a design engineer answering portfolio visitors in first person —
direct, specific, slightly dry, occasionally self-deprecating. Don't oversell.
Talk like a thoughtful colleague.

For project questions, ALWAYS use a tool — don't invent project names or facts.
For background / approach / philosophy questions, answer directly from the
"About Yao" facts below; only call an about tool when the visitor wants depth
beyond what's there. (...)

The prompt does voice + ground rules + the load-bearing "About Yao" facts that get asked most often (bio, education, design+tech grounding, AI tooling thesis). Keeping those facts inline means most identity questions resolve without any tool call — saving a round-trip's worth of latency.

Tools

Project lookup (run-time data) and "About Yao" depth (for when the system prompt facts aren't enough), all pure functions running against existing portfolio data:

| Tool | Purpose |
| --- | --- |
| listProjects(category?) | List my projects, optionally filtered to AR/MR / WEB / INSTALLATION / SELECTEDWORK. |
| getProject(slug) | Full detail for one project — role, collaborator, year, platform, etc. |
| searchProjects(query) | Substring search, top 5. |
| getBio / getSnapShipped / getSnapMCP / getPreviousWork / getAIPractice / getRecognition | Topic-split bio depth. The model picks the smallest one. |

When a tool returns project summaries, the chat scene renders them as clickable 3D cards floating around me — not text. They animate out from where I'm standing, and click-through opens the project page in a new tab so the chat survives the navigation.
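A sketch of the scene-side consumption, assuming the AI SDK's useChat message shape (Card3D and the result type are illustrative):

```tsx
import { useChat } from 'ai/react';

// Hypothetical R3F component that renders one floating, clickable card.
declare function Card3D(props: { slug: string; title: string; index: number }): JSX.Element;

function ProjectCards() {
  const { messages } = useChat({ api: '/api/chat' });
  const last = messages[messages.length - 1];

  // Completed tool calls show up as toolInvocations on the assistant message.
  const cards = (last?.toolInvocations ?? []).flatMap((inv) =>
    inv.state === 'result' && inv.toolName === 'listProjects'
      ? (inv.result as { slug: string; title: string }[])
      : []
  );

  return (
    <>
      {cards.map((p, i) => (
        <Card3D key={p.slug} slug={p.slug} title={p.title} index={i} />
      ))}
    </>
  );
}
```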

What I learned

The build surfaced a few things that didn't fit my original mental model:

  • First-token latency mattered way more than I expected. I tuned token throughput first, then realized nobody actually notices that — what they notice is the gap between hitting Send and seeing the first character. Cut that from ~6s to ~1.5s by trimming the about context bundle (5k input → ~1k), splitting one mega getAbout tool into 6 focused slices, then inlining the most-asked facts into the system prompt so identity questions answer with zero tool calls. Capped MAX_TOOL_ITERATIONS at 2 so the model can't chain three tools "to be thorough" before answering.
  • The chat ended up working as a navigation surface. I designed it to answer questions, but in testing I kept catching myself using it to find projects ("show me your AR work" → cards → click). The clickable 3D cards pattern turned out to carry more weight than the conversational answers.
  • Cartoon style fixed the "AI uncanny" problem. First prototype was photorealish — felt creepy when the figure spoke. Toon shaded bust with a chunky outline reads as illustration, which gave permission for the bust to be expressive (lean, sway, slump) without falling into uncanny valley territory.
  • Trying to cache against a model that doesn't support it cost me half a day. Implemented Gemini context caching with Gemma 4 before discovering Gemma doesn't support the endpoint (despite the pricing page suggesting otherwise — confirmed by a 404 from the API). The fix was to switch to Gemini 2.5 Flash-Lite, which does support context caching, and to pair it with a smaller context bundle, finer tools, and a tighter iteration cap.
  • Tying status to character behavior felt more readable than a spinner. When Yao leans in on loading and slumps on rate_limited, system state shows up as personality. Felt more honest than a generic spinner, and didn't need a separate UI surface for "what's happening right now."
  • First-time interaction needed explicit cues. A pulsing input pill plus two example chips above the input transformed the empty state from "what is this thing?" into "type or tap." Both auto-dismiss after first interaction so they don't litter the steady state.
  • The model picker IS the visual-iteration narrative made interactive. This article describes cross-testing image-gen tools and the 3D variants downstream — the picker chips at the top-left of the chat let a reader live-swap between three Tripo-generated busts, so the iteration argument is something they experience, not just read.

What I'd change

Things I deliberately didn't ship, gaps I'd close, and stretch goals if this gets traction:

  • No history scrollback. Each new question replaces the screen.
  • Abuse protection is the platform's default. No app-level rate limit. AI Studio's 429 surfaces as "Yao's taking a break — back in 5 min" and the input disables until the timer's up.
  • Accessibility is partial. Keyboard nav works for the input and prefers-reduced-motion now disables idle bust animations + collapses the fullscreen-toggle blackout to instant — but the 3D card hover/click is still mouse-first. A keyboard-traversable card list (arrow keys to focus, Enter to open) is on the list.
  • Mobile bundle size is the biggest open trade-off. Three.js + R3F + the 4.6MB GLB load isn't trivial on a phone. The 3D scene's loader chunk is dynamically imported so the page shell ships without it (see the sketch after this list), but once the chunk evaluates the GLB starts fetching immediately. A lighter-weight 2D fallback for low-power devices is on the list.
  • A "show me your work in [year]" tool would be easy to add since frontmatter.date is already in the model.
  • Streaming the project card layout server side so cards can render before the full tool result arrives.
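For the loader split in the bundle-size item, the pattern is roughly this (a sketch assuming Next.js dynamic imports; the path is hypothetical, and the fallback copy echoes the page's actual loading line):

```tsx
import dynamic from 'next/dynamic';

// The 3D scene ships as its own chunk so the page shell loads without
// three.js / R3F; the 4.6MB GLB only starts fetching once this chunk runs.
const ChatScene = dynamic(() => import('./ChatScene'), {
  ssr: false, // no server-side render path for the WebGL scene
  loading: () => <p>Loading the stage…</p>,
});
```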