Ambient context is the binding constraint

Intuit Expert Platform is where tax and bookkeeping experts help customers in live sessions. After ChatGPT launched, we started shipping AI features to support expert workflows, starting with Q&A and meeting summary.
Summary turned out to be more than a convenience feature. The live conversation — what the customer clarified, what the expert tried, what changed in the moment — was the richest context the platform had. We called that ambient context.
This case study is about one specific application: using ambient context to preserve continuity when a human handoff happens. But the same layer also made AI answers more specific, and eventually became a shared service other product teams could build on.
I designed the handoff system that made ambient context portable at the moment it was needed most.
Ambient context is what turns a capable model into a useful one.
The platform already had customer data: structured history, stored and queryable.
What it did not have was the session itself, the ambient context unfolding live.
Without it, AI could reason from history but not from the live situation. When a lead stepped in, they started from scratch: experts spent an average of 6 minutes re-explaining a case the system had already witnessed.
The design problem was clear: make ambient context portable.
[Diagram: the missing layer. Ambient context: the live conversation unfolding in real time. Customer data: structured history already stored in the platform.]
Making ambient context portable
Previously, I had launched meeting summary on the platform. From the start of every call, AI transcribed the session and assembled ambient context in the background — the conversation was no longer ephemeral. It was being preserved, continuously, in case it was needed.
To make that visible to the expert, the platform surfaced a single cue: "Intuit Assist is taking notes." A quiet signal that the system was listening and preparing.

Capturing context was only half the problem. At the moment of handoff, ambient context had to be distilled.
/lead is a perfect summary trigger: the expert's gesture carries clear intent. We tuned the prompt to produce a targeted distillation of why this case needed a human.
I designed a handoff package that carried the live conversation forward: the distilled summary up front, with the full transcript and customer record one click away.
At launch, models were less mature, context windows were smaller, and our prompts less refined. Hallucination was a real concern. So the package included an edit option: a deliberate human-in-the-loop moment at the point of highest risk, giving the expert a chance to correct the summary before it reached the lead.
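A minimal sketch of that shape, in TypeScript. Everything here is illustrative: the field names, the prompt text, and the llm interface are stand-ins, not the production schema.

```typescript
// Hypothetical shape of the handoff package. Field names are
// illustrative, not the production schema.
interface HandoffPackage {
  summary: string;            // AI-distilled: why this case needs a human
  transcriptUrl: string;      // full session transcript, one click away
  customerRecordUrl: string;  // structured customer history, one click away
  editable: boolean;          // human-in-the-loop: expert may correct before send
}

// Illustrative distillation prompt, tuned for the handoff moment rather
// than a generic recap.
const HANDOFF_PROMPT = `
Summarize this live session for a lead expert who was not present.
Focus on: what the customer clarified, what the expert already tried,
and why this case now needs a human. Be specific; omit pleasantries.
`;

async function buildHandoffPackage(
  session: { transcript: string; transcriptUrl: string; customerRecordUrl: string },
  llm: { complete: (prompt: string) => Promise<string> },
): Promise<HandoffPackage> {
  const summary = await llm.complete(`${HANDOFF_PROMPT}\n\n${session.transcript}`);
  return {
    summary,
    transcriptUrl: session.transcriptUrl,
    customerRecordUrl: session.customerRecordUrl,
    editable: true, // at launch, the expert could edit before the lead saw it
  };
}
```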
This turned handoff from a reset into a continuation.

An example of the distilled summary:
"Freelance videographer with overlapping 1099-K (~$30K, Stripe) and 1099-NEC ($10K, same client). TIN mismatch — EIN on 1099-K, SSN on 1099-NEC. Customer needs guidance on double-counting and whether a formal explanatory statement is required for audit protection."
Experts can initiate a handoff in two ways. Natural language — "Talk to a lead," "I need help," "Can someone else take this?" — or the slash command /lead, fast, precise, and unambiguous.
Both resolve to the same action.
Two years ago, natural language recognition still had stochastic errors — /lead was the more reliable path. But I wanted both to work. Natural language should be the new interaction model: you say what you mean, the system figures out the rest.
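A sketch of that resolution, with a hypothetical regex standing in for whatever intent recognition the platform actually ran:

```typescript
// Two entry points, one action. The slash command is deterministic;
// natural language goes through an intent check (hypothetical stand-in).
type ExpertMessage = { text: string };

function isEscalationIntent(text: string): boolean {
  // Stand-in for a real NLU call; two years ago this path was the
  // less reliable of the two, which is why /lead existed at all.
  return /talk to a lead|i need help|someone else take this/i.test(text);
}

function resolveTrigger(msg: ExpertMessage): "escalate" | "none" {
  if (msg.text.trim().startsWith("/lead")) return "escalate"; // precise, unambiguous
  if (isEscalationIntent(msg.text)) return "escalate";        // say what you mean
  return "none";
}
```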


When an expert initiates escalation, they may or may not have already tried AI.
Through observation, we found that relying on a lead expert had become an easy way out — even for cases AI could have handled. Lead experts are an expensive resource, especially during peak season.
So we asked: since AI already has the full context at the trigger point, what if it evaluated the situation before routing to a human? If the issue is simple and solvable, AI gets a chance to answer. If it resolves, the expert reaches resolution faster than any human handoff could. If not, the same context package carries forward to the lead — no extra cost, no lost time.
If the expert had already tried AI and it failed, the system honored that judgment and handed off directly. No second attempt.
If they had not, the system gave AI one grounded pass with both the customer record and the full conversation, context it had never seen before.
This was not about maximizing automation. It was about avoiding unnecessary handoffs when the system finally had enough to be useful.
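In code terms, the routing looked something like this sketch, reusing the HandoffPackage shape from earlier; the names and interfaces are hypothetical:

```typescript
// One grounded AI pass before a human handoff, unless the expert has
// already tried AI. Either branch reuses the same context package.
interface EscalationRequest {
  alreadyTriedAI: boolean;
  pkg: HandoffPackage; // the same package either way: no extra cost
}

async function routeEscalation(
  req: EscalationRequest,
  ai: { answer: (pkg: HandoffPackage) => Promise<{ resolved: boolean }> },
  leads: { assign: (pkg: HandoffPackage) => Promise<void> },
): Promise<void> {
  if (!req.alreadyTriedAI) {
    // First grounded pass: customer record plus full conversation.
    const attempt = await ai.answer(req.pkg);
    if (attempt.resolved) return; // faster than any human handoff
  }
  // Honor the expert's judgment, or AI's failure: route to a lead.
  await leads.assign(req.pkg);
}
```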
Result: customer resolution +15%. Lead handoffs −60%.
The second number mattered most. Many handoffs were not truly necessary. The system had simply never been given enough context to help.
We also discovered an unexpected benefit: this increased AI usage overall. Once experts experienced AI answering their questions accurately and quickly, they came back to it more often.
As the model matured and prompts were fine-tuned, hallucination risk decreased. Our DS team ran two eval loops: offline human evaluation per model iteration, and online edit rate as a continuous accuracy proxy. Edit rate hovered around 6% — half were phrasing preference, leaving roughly 3% genuine accuracy failures.
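The online proxy itself is simple to state; the phrasing-versus-accuracy split came from sampled human labeling, which this sketch only notes in a comment:

```typescript
// Online proxy: of all summaries sent, what fraction did the expert edit
// before sending? Edits flag potential accuracy failures continuously,
// without waiting for an offline eval cycle.
function editRate(sent: number, edited: number): number {
  return sent === 0 ? 0 : edited / sent;
}

// Observed: editRate ~0.06. Sampled human labeling attributed about half
// of those edits to phrasing preference, leaving ~3% genuine accuracy failures.
```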
The review-and-send flow was also expensive. Experts were spending roughly a minute reviewing the summary before approving it, almost always unchanged. Verification had become ceremony.
We moved it out of the critical path. Default to send, review on demand. The experiment: would speed-to-benefit improve without accuracy regressing?
It did. 85% of experts never opened the summary. Handoff time dropped by a minute, and resolution held steady.
The trade-off, stated honestly: in ~3% of cases, an imperfect summary would reach the lead unedited. The mitigation was structural — one click to the full transcript and customer record, plus the lead could always ask questions.
We shipped the summary collapsed by default, always available, never in the critical path.
The sending expert and the receiving lead were looking at the same package — but doing different cognitive work.
The sending expert had lived through the conversation. They needed low friction and fast action — so the package collapsed by default.
The receiving lead had not been there. They needed rapid orientation: what happened, what was tried, what mattered, and where to go deeper — so the package opened fully on arrival.
Same context layer. Opposite design logic.
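A sketch of that branch, nothing more than a role check deciding the default state:

```typescript
// One context package, two default presentations, decided by who is looking.
type Role = "sending-expert" | "receiving-lead";

function summaryDefaultState(role: Role): "collapsed" | "expanded" {
  // The sender lived the conversation: keep the summary out of their way.
  // The lead was not there: open it fully so they can orient fast.
  return role === "sending-expert" ? "collapsed" : "expanded";
}
```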
[UI mockups: sending expert vs. receiving lead. The sending expert's view shows the AI's grounded answer ("If the 1099-K was issued to her EIN and the 1099-NEC to her SSN, both should flow to the same Schedule C — but confirm this is set up correctly in TurboTax. Mixed TINs on the same Schedule C are a common source of IRS notices."), a "Does this help?" prompt, the "No, talk to lead" escalation ("Got it. Let me find you a lead."), and the handoff summary collapsed beneath a generative-AI disclosure. The receiving lead's view opens with "You are now connected to: Asha, CPA, 2 years" and the same summary expanded on arrival.]
Once ambient context worked at the handoff, the same shape kept reappearing. Transfers needed it. Multi-session conversations needed it. Downstream product teams kept asking for the same capability, scoped to their moment.
That pattern argued for treating summary not as a feature but as a service.
I joined Intuit's Global Engineering Day hackathon and built Summary as a Service with a team of engineers in a week — a shared capability other product teams could call into, parameterized by moment: handoff, transfer, session resumption. The handoff was the proof of concept. The service was the consequence.
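A sketch of what "parameterized by moment" means at the call site. The endpoint, request shape, and field names are illustrative, not the real internal contract:

```typescript
// A shared summarization capability, parameterized by the product moment
// that needs it. Each moment tunes what the distillation emphasizes.
type Moment = "handoff" | "transfer" | "session-resumption";

interface SummaryRequest {
  sessionId: string;
  moment: Moment; // drives prompt selection and output shape
}

async function requestSummary(req: SummaryRequest): Promise<string> {
  // Hypothetical internal endpoint; the point is the shape of the
  // contract, not the URL.
  const res = await fetch("https://summary.internal/v1/summaries", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  return (await res.json()).summary;
}
```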
Ambient context stopped being a concept. It became infrastructure.
The handoff started as a workflow problem. It exposed a system constraint.
AI can only act on the context the platform can actually unify. Three layers had to come together: customer data for history, system knowledge for grounded domain reasoning, and ambient context for what was actually unfolding in the live session. Without all three, the system could answer from the record but not from the situation.
That made the handoff more than a feature. It became a proof point for a broader argument: the quality of the AI experience depends on the platform's ability to capture, preserve, and connect context across time. Summary as a Service made that argument concrete — ambient context stopped being a layer in a diagram and became a primitive other teams consumed.
I used key-moment prototypes and an end-to-end vision film to help senior leadership see this not as infrastructure work, but as product quality. Every system boundary that context could not cross was a limit on what the model could do. Fixing the handoff without fixing the foundation would leave that ceiling in place.
That argument helped drive investment in the underlying data foundation.
[Diagram: the context stack. AI experience: the response, guidance, and handoff the user experiences. Ambient context: the live session, what was said, tried, and clarified in real time. System knowledge: the domain rules and tools that ground the response. Customer data: the structured records that connect history across systems.]
Two natural extensions remain.
We observed that some experts waited too long to escalate. We hypothesized that a time-based reminder would improve resolution — and it did, by 10%.
But timing is a blunt signal. A stronger system would combine session duration, issue complexity, and customer sentiment to suggest earlier escalation while keeping the decision with the human.
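A sketch of what that composite signal could look like. The weights and threshold are invented for illustration; a real version would tune or learn them:

```typescript
// Combine session duration, issue complexity, and customer sentiment into
// an escalation suggestion. The decision stays with the expert; this only
// surfaces a nudge earlier than a blunt timer would.
interface SessionSignals {
  minutesElapsed: number;   // the time-based reminder, the blunt version
  complexity: number;       // 0..1, e.g. from issue classification
  sentiment: number;        // -1..1, customer sentiment trend
}

function shouldSuggestEscalation(s: SessionSignals): boolean {
  const score =
    0.4 * Math.min(s.minutesElapsed / 30, 1) +
    0.4 * s.complexity +
    0.2 * (1 - (s.sentiment + 1) / 2); // worse sentiment raises the score
  return score > 0.6; // suggest only; the human keeps the decision
}
```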
Once a lead joins, the next challenge is seeing what is still missing. With conversation context and customer data already packaged, AI could suggest likely gaps and next questions to speed orientation — not a script, but a thinking aid.
The limiting factor in high-stakes AI is rarely the model.
It is whether the system can preserve the context well enough for either AI or a human to act with judgment.
That was the problem here — context continuity under transfer of responsibility.