Ambient context is the binding constraint

Intuit Expert Platform is where tax and bookkeeping experts help customers in live sessions. After ChatGPT launched, we started shipping AI features to support expert workflows, starting with Q&A and meeting summary.
Summary turned out to be more than a convenience feature. The live conversation — what the customer clarified, what the expert tried, what changed in the moment — was the richest context the platform had. We called that ambient context.
This case study is about one specific application: using ambient context to preserve continuity when a human handoff happens. But the same layer also made AI answers more specific, and eventually became a shared service other product teams could build on.
I designed the handoff system that made ambient context portable at the moment it was needed most.
Ambient context is what turns a capable model into a useful one.
The platform already had customer data: structured history, stored and queryable.
What it did not have was the session itself, the ambient context unfolding live.
Without it, AI could reason from history but not from the live situation. When a lead stepped in, they started from scratch: experts spent an average of 6 minutes re-explaining a case the system had already witnessed.
The design problem was clear: make ambient context portable.
[Diagram: the missing layer. Ambient context: the live conversation unfolding in real time. Customer data: structured history already stored in the platform.]
Making ambient context portable
Previously, I had launched meeting summary on the platform. From the start of every call, AI transcribed the session and assembled ambient context in the background — the conversation was no longer ephemeral. It was being preserved, continuously, in case it was needed.
To make that visible to the expert, the platform surfaced a single cue: "Intuit Assist is taking notes." A quiet signal that the system was listening and preparing.

Capturing context was only half the problem. At the moment of handoff, ambient context had to be distilled.
/lead is a perfect summary trigger: the expert's gesture carries clear intent. We tuned the prompt to produce a targeted distillation of why this case needed a human.
I designed a handoff package that carried the live conversation forward: the distilled summary up front, with the full transcript and customer record one click away.
At launch, models were less mature, context windows were smaller, and our prompts less refined. Hallucination was a real concern. So the package included an edit option: a deliberate human-in-the-loop moment at the point of highest risk, giving the expert a chance to correct the summary before it reached the lead.
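A minimal sketch of that shape, in TypeScript. Everything here is illustrative: the field names, the prompt text, and the llm interface are stand-ins, not the production schema.

```typescript
// Hypothetical shape of the handoff package. Field names are
// illustrative, not the production schema.
interface HandoffPackage {
  summary: string;            // AI-distilled: why this case needs a human
  transcriptUrl: string;      // full session transcript, one click away
  customerRecordUrl: string;  // structured customer history, one click away
  editable: boolean;          // human-in-the-loop: expert may correct before send
}

// Illustrative distillation prompt, tuned for the handoff moment rather
// than a generic recap.
const HANDOFF_PROMPT = `
Summarize this live session for a lead expert who was not present.
Focus on: what the customer clarified, what the expert already tried,
and why this case now needs a human. Be specific; omit pleasantries.
`;

async function buildHandoffPackage(
  session: { transcript: string; transcriptUrl: string; customerRecordUrl: string },
  llm: { complete: (prompt: string) => Promise<string> },
): Promise<HandoffPackage> {
  const summary = await llm.complete(`${HANDOFF_PROMPT}\n\n${session.transcript}`);
  return {
    summary,
    transcriptUrl: session.transcriptUrl,
    customerRecordUrl: session.customerRecordUrl,
    editable: true, // at launch, the expert could edit before the lead saw it
  };
}
```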
This turned handoff from a reset into a continuation.

An example of the distilled summary:
"Freelance videographer with overlapping 1099-K (~$30K, Stripe) and 1099-NEC ($10K, same client). TIN mismatch — EIN on 1099-K, SSN on 1099-NEC. Customer needs guidance on double-counting and whether a formal explanatory statement is required for audit protection."
Experts can initiate a handoff in two ways. Natural language — "Talk to a lead," "I need help," "Can someone else take this?" — or the slash command /lead, fast, precise, and unambiguous.
Both resolve to the same action.
Two years ago, natural language recognition still had stochastic errors — /lead was the more reliable path. But I wanted both to work. Natural language should be the new interaction model: you say what you mean, the system figures out the rest.
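A sketch of that resolution, with a hypothetical regex standing in for whatever intent recognition the platform actually ran:

```typescript
// Two entry points, one action. The slash command is deterministic;
// natural language goes through an intent check (hypothetical stand-in).
type ExpertMessage = { text: string };

function isEscalationIntent(text: string): boolean {
  // Stand-in for a real NLU call; two years ago this path was the
  // less reliable of the two, which is why /lead existed at all.
  return /talk to a lead|i need help|someone else take this/i.test(text);
}

function resolveTrigger(msg: ExpertMessage): "escalate" | "none" {
  if (msg.text.trim().startsWith("/lead")) return "escalate"; // precise, unambiguous
  if (isEscalationIntent(msg.text)) return "escalate";        // say what you mean
  return "none";
}
```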


When an expert initiates escalation, they may or may not have already tried AI.
Through observation, we found that relying on a lead expert had become an easy way out — even for cases AI could have handled. Lead experts are an expensive resource, especially during peak season.
So we asked: since AI already has the full context at the trigger point, what if it evaluated the situation before routing to a human? If the issue is simple and solvable, AI gets a chance to answer. If it resolves, the expert reaches resolution faster than any human handoff could. If not, the same context package carries forward to the lead — no extra cost, no lost time.
If the expert had already tried AI and it failed, the system honored that judgment and handed off directly. No second attempt.
If they had not, the system gave AI one grounded pass with both the customer record and the full conversation, context it had never seen before.
This was not about maximizing automation. It was about avoiding unnecessary handoffs when the system finally had enough to be useful.
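In code terms, the routing looked something like this sketch, reusing the HandoffPackage shape from earlier; the names and interfaces are hypothetical:

```typescript
// One grounded AI pass before a human handoff, unless the expert has
// already tried AI. Either branch reuses the same context package.
interface EscalationRequest {
  alreadyTriedAI: boolean;
  pkg: HandoffPackage; // the same package either way: no extra cost
}

async function routeEscalation(
  req: EscalationRequest,
  ai: { answer: (pkg: HandoffPackage) => Promise<{ resolved: boolean }> },
  leads: { assign: (pkg: HandoffPackage) => Promise<void> },
): Promise<void> {
  if (!req.alreadyTriedAI) {
    // First grounded pass: customer record plus full conversation.
    const attempt = await ai.answer(req.pkg);
    if (attempt.resolved) return; // faster than any human handoff
  }
  // Honor the expert's judgment, or AI's failure: route to a lead.
  await leads.assign(req.pkg);
}
```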
Result: customer resolution +15%. Lead handoffs −60%.
The second number mattered most. Many handoffs were not truly necessary. The system had simply never been given enough context to help.
We also discovered an unexpected benefit: this increased AI usage overall. Once experts experienced AI answering their questions accurately and quickly, they came back to it more often.
As the model matured and prompts were fine-tuned, hallucination risk decreased. Our DS team ran two eval loops: offline human evaluation per model iteration, and online edit rate as a continuous accuracy proxy. Edit rate hovered around 6% — half were phrasing preference, leaving roughly 3% genuine accuracy failures.
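The online proxy itself is simple to state; the phrasing-versus-accuracy split came from sampled human labeling, which this sketch only notes in a comment:

```typescript
// Online proxy: of all summaries sent, what fraction did the expert edit
// before sending? Edits flag potential accuracy failures continuously,
// without waiting for an offline eval cycle.
function editRate(sent: number, edited: number): number {
  return sent === 0 ? 0 : edited / sent;
}

// Observed: editRate ~0.06. Sampled human labeling attributed about half
// of those edits to phrasing preference, leaving ~3% genuine accuracy failures.
```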
The review-and-send flow was also expensive. Experts were spending roughly a minute reviewing the summary before approving it, almost always unchanged. Verification had become ceremony.
We moved it out of the critical path. Default to send, review on demand. The experiment: would speed-to-benefit improve without accuracy regressing?
It did. 85% of experts never opened the summary. Handoff time dropped by a minute, and resolution held steady.
The trade-off, stated honestly: in ~3% of cases, an imperfect summary would reach the lead unedited. The mitigation was structural — one click to the full transcript and customer record, plus the lead could always ask questions.
We shipped the summary collapsed by default, always available, never in the critical path.
The sending expert and the receiving lead were looking at the same package — but doing different cognitive work.
The sending expert had lived through the conversation. They needed low friction and fast action — so the package collapsed by default.
The receiving lead had not been there. They needed rapid orientation: what happened, what was tried, what mattered, and where to go deeper — so the package opened fully on arrival.
Same context layer. Opposite design logic.
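A sketch of that branch, nothing more than a role check deciding the default state:

```typescript
// One context package, two default presentations, decided by who is looking.
type Role = "sending-expert" | "receiving-lead";

function summaryDefaultState(role: Role): "collapsed" | "expanded" {
  // The sender lived the conversation: keep the summary out of their way.
  // The lead was not there: open it fully so they can orient fast.
  return role === "sending-expert" ? "collapsed" : "expanded";
}
```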
[UI mockups: sending expert vs. receiving lead. The sending expert's view shows the AI's grounded answer ("If the 1099-K was issued to her EIN and the 1099-NEC to her SSN, both should flow to the same Schedule C — but confirm this is set up correctly in TurboTax. Mixed TINs on the same Schedule C are a common source of IRS notices."), a "Does this help?" prompt, the "No, talk to lead" escalation ("Got it. Let me find you a lead."), and the handoff summary collapsed beneath a generative-AI disclosure. The receiving lead's view opens with "You are now connected to: Asha, CPA, 2 years" and the same summary expanded on arrival.]
Once ambient context worked at the handoff, the same shape kept reappearing. Transfers needed it. Multi-session conversations needed it. Downstream product teams kept asking for the same capability, scoped to their moment.
That pattern argued for treating summary not as a feature but as a service.
I joined Intuit's Global Engineering Day hackathon and built Summary as a Service with a team of engineers in a week — a shared capability other product teams could call into, parameterized by moment: handoff, transfer, session resumption. The handoff was the proof of concept. The service was the consequence.
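A sketch of what "parameterized by moment" means at the call site. The endpoint, request shape, and field names are illustrative, not the real internal contract:

```typescript
// A shared summarization capability, parameterized by the product moment
// that needs it. Each moment tunes what the distillation emphasizes.
type Moment = "handoff" | "transfer" | "session-resumption";

interface SummaryRequest {
  sessionId: string;
  moment: Moment; // drives prompt selection and output shape
}

async function requestSummary(req: SummaryRequest): Promise<string> {
  // Hypothetical internal endpoint; the point is the shape of the
  // contract, not the URL.
  const res = await fetch("https://summary.internal/v1/summaries", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  return (await res.json()).summary;
}
```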
Ambient context stopped being a concept. It became infrastructure.
The handoff started as a workflow problem. It exposed a system constraint.
AI can only act on the context the platform can actually unify. Three layers had to come together: customer data for history, system knowledge for grounded domain reasoning, and ambient context for what was actually unfolding in the live session. Without all three, the system could answer from the record but not from the situation.
That made the handoff more than a feature. It became a proof point for a broader argument: the quality of the AI experience depends on the platform's ability to capture, preserve, and connect context across time. Summary as a Service made that argument concrete — ambient context stopped being a layer in a diagram and became a primitive other teams consumed.
I used key-moment prototypes and an end-to-end vision film to help senior leadership see this not as infrastructure work, but as product quality. Every system boundary that context could not cross was a limit on what the model could do. Fixing the handoff without fixing the foundation would leave that ceiling in place.
That argument helped drive investment in the underlying data foundation.
[Diagram: the context stack. AI experience: the response, guidance, and handoff the user experiences. Ambient context: the live session, what was said, tried, and clarified in real time. System knowledge: the domain rules and tools that ground the response. Customer data: the structured records that connect history across systems.]
Two natural extensions remain.
We observed that some experts waited too long to escalate. We hypothesized that a time-based reminder would improve resolution — and it did, by 10%.
But timing is a blunt signal. A stronger system would combine session duration, issue complexity, and customer sentiment to suggest earlier escalation while keeping the decision with the human.
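A sketch of what that composite signal could look like. The weights and threshold are invented for illustration; a real version would tune or learn them:

```typescript
// Combine session duration, issue complexity, and customer sentiment into
// an escalation suggestion. The decision stays with the expert; this only
// surfaces a nudge earlier than a blunt timer would.
interface SessionSignals {
  minutesElapsed: number;   // the time-based reminder, the blunt version
  complexity: number;       // 0..1, e.g. from issue classification
  sentiment: number;        // -1..1, customer sentiment trend
}

function shouldSuggestEscalation(s: SessionSignals): boolean {
  const score =
    0.4 * Math.min(s.minutesElapsed / 30, 1) +
    0.4 * s.complexity +
    0.2 * (1 - (s.sentiment + 1) / 2); // worse sentiment raises the score
  return score > 0.6; // suggest only; the human keeps the decision
}
```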
Once a lead joins, the next challenge is seeing what is still missing. With conversation context and customer data already packaged, AI could suggest likely gaps and next questions to speed orientation — not a script, but a thinking aid.
The limiting factor in high-stakes AI is rarely the model.
It is whether the system can preserve the context well enough for either AI or a human to act with judgment.
That was the problem here — context continuity under transfer of responsibility.