Designing AI for Healthcare Where the Cost of Being Wrong Is Too High
How I turned pressure to “have AI” into a trust-first product roadmap built on constraint, verification, and staged autonomy.
Project Snapshot
Impact
Because the three Ask iQueue phases launched at different times, I separated outcomes by maturity instead of blending them into one success story. Metrics Navigation, which launched first, gave us the clearest proof points. Smart Booking and Auto-Assign shipped in March 2026, so I treat them as roadmap and product-design proof points rather than overstating them as business wins.
80–90%
first-pass interpretation accuracy
10–15%
required one clarifying follow-up
~0%
hallucination by architecture
60+
metrics accessible via natural language
KEY TAKEAWAY
The outcome I’m most proud of isn’t a usage number. It’s that we built a system whose worst likely failure mode was a graceful refusal, not a persuasive hallucination.
“Ask iQueue pulls it so much faster than me clicking through all my locations… It pinpointed exactly what I needed. I didn’t expect it to be that helpful and it was amazing.”
Operations Leader at UC San Francisco
"Ask iQueue was pretty quick and accurate. If I want to know what was our monthly drop in the census, I could see that without having to go and navigate through metrics. I put the thumbs up on every question and answer that I received."
Operations Manager at Stanford
Customer feedback
My Role
Helped define the phased roadmap for Ask iQueue
Designed the human-AI interaction model
Defined supported vs unsupported behaviors
Designed fallback logic and clarification patterns
Shaped trust, safety, and human-in-the-loop principles
Helped determine what AI should do first, and what it shouldn't do
Team & Timeline
This was a secondary project for design and engineering, completed part-time alongside our core commitments.
1 Product Manager
1 Product Designer (me)
2 Engineers
Q3 2025 & Q1 2026
Scope & Models
Metrics Navigation → Smart Booking → Auto-Assign
Phase 1: 60+ metrics across 150–200 units
Phase 2 & 3: 300–500 units
GPT-4o — metrics navigation
Claude Sonnet — operational tools
LeanTaaS helps hospitals optimize operations, but many workflows still fall short of helping users complete the job

Executives could see metrics but couldn’t quickly find the right answer.

Schedulers could see the scheduling system but got no help deciding where to place appointments.

Charge nurses could use staffing tools but still had to perform operational actions manually.
KEY TAKEAWAY
Ask iQueue was my attempt to close workflow gaps with AI — but only in places where usefulness could be earned without sacrificing trust.
The company wanted an AI assistant. I thought that was the wrong starting point
Ask iQueue began during the AI boom, when every SaaS company was under pressure to prove it had an AI story.
Leadership envisioned something broad and impressive, a “ChatGPT for healthcare operations” that could:
Explain why volumes were low.
Recommend how to improve staffing productivity.
Suggest what operational changes to make.
But our platform wasn't built to support open-ended operational reasoning.
And the cost of a persuasive wrong answer was too high:
Incorrect staffing guidance.
Misinterpreted performance metrics.
Erosion of trust in the product.
So I pushed for a different principle: the first AI capability shouldn't be the most ambitious one. It should be the safest useful one.
Something narrow enough to be reliable, observable, and easy for users to verify.
KEY TAKEAWAY
Instead of designing for maximum capability, I designed for calibrated trust: start with the safest useful capability, prove reliability, then expand autonomy only when the product had earned it.
The strategic question wasn’t “What sounds most impressive?” It was “What can users trust first?”
That decision shaped the entire Ask iQueue roadmap — from Metrics Navigation, to Smart Booking, to Auto-Assign.
What leadership wanted:
Broad AI assistant
High apparent value
Open-ended questions
High hallucination risk
What I argued for:
Constrained AI capability
Lower apparent value
Observable, verifiable outcomes
Low hallucination risk
Fastest path to earned trust
General AI assistant: a reasoning engine that generates answers.
Intent parser and routing layer: language models translate user intent into deterministic product actions (sketched below).
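To make that contract concrete, here is a minimal TypeScript sketch of the routing-layer idea. Every name here is hypothetical rather than the shipped implementation; the point is that the model's only output is a structured intent, and everything the product then does is a deterministic handler keyed off that intent:

```typescript
// Hypothetical, simplified sketch of the routing layer. The language
// model's only job is to emit one of these structured intents; it never
// writes answers of its own.
type Intent =
  | { kind: "metrics_query"; metric: string; unit: string; timeframe: string }
  | { kind: "booking_request"; durationMin: number; date: string }
  | { kind: "clarify"; question: string }    // a required field was missing
  | { kind: "unsupported"; reason: string }; // graceful refusal, by design

// Deterministic product actions (stubbed here). The model routes;
// the product executes.
const navigateToDashboard = (metric: string, unit: string, timeframe: string) =>
  console.log(`Opening dashboard: ${metric} / ${unit} / ${timeframe}`);
const askUser = (question: string) => console.log(`Clarify: ${question}`);
const refuse = (reason: string) => console.log(`Can't help with that: ${reason}`);

function route(intent: Intent): void {
  switch (intent.kind) {
    case "metrics_query":
      navigateToDashboard(intent.metric, intent.unit, intent.timeframe);
      break;
    case "booking_request":
      console.log(`Running scheduling model: ${intent.durationMin}min on ${intent.date}`);
      break;
    case "clarify":
      askUser(intent.question);
      break;
    case "unsupported":
      refuse(intent.reason); // worst failure mode: a refusal, not a hallucination
      break;
  }
}
```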
I used risk, not hype, to decide what AI should do first
To decide where AI belonged in the product, I created a simple framework: weigh value against hallucination risk.
Some ideas sounded powerful but required open-ended reasoning the platform couldn't support.
Others were narrower, but much safer and more verifiable.
That led to a phased roadmap built around increasing autonomy only when reliability and trust had already been established.
AI Navigation Assistance: medium value / low risk
Recommendation Explanations: high value / medium risk
Operational Recommendations: very high value / high risk
Autonomous Decision-Making: extreme value / critical risk
Value + Risk + Autonomy
KEY TAKEAWAY
I shaped the roadmap around what AI could do safely enough to deserve user trust — not just what it could do in theory.
Ask iQueue expanded from read-only assistance to bounded recommendation and then bounded action — always with a human in control
READ
AI Metrics Navigation
AI retrieves the right information and user verifies.
RECOMMEND
Smart Booking
AI surfaces scheduling options and user decides.
ACT
Auto-Assign
AI performs bounded actions in draft mode and user reviews before publishing.
Phase 1: Start where hallucinations can't hurt people
Leadership initially thought Metrics Navigation was too small to matter. I thought it was the smartest possible starting point.
I deliberately chose a capability where correctness was visible. Users could see exactly where the AI took them and adjust anything manually.
“What were my volumes last year in LT Cancer Center?”
Example user question
“Working on finding the best response. This may take a minute.”
System
“Let's look at the volumes for the past year at the LT Cancer Center.”
AI Response
The AI doesn't analyze the data, explain why a metric changed, or invent recommendations. It simply gets the user to the verified source of truth faster. That constraint was the point.
Interpret the request
Identify the metric
Extract the unit
Apply filters
Navigate to the correct dashboard
Because users could see exactly where the system took them, this was a rare AI interaction where correctness was immediately legible.
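A minimal sketch of that pipeline, with hypothetical names and a toy catalog: because every request must resolve against a fixed set of known metrics, and the output is a navigation target rather than generated text, there is nothing for the model to hallucinate.

```typescript
// Hypothetical sketch: every parsed request must resolve against a fixed
// metric catalog before anything is shown. The output is a dashboard
// route, never model-generated numbers.
const METRIC_CATALOG = new Set(["volumes", "utilization", "census"]);

interface ParsedQuery {
  metric: string;     // e.g. "volumes"
  unit: string;       // e.g. "LT Cancer Center"
  timeframe: string;  // e.g. "past-year"
}

function resolveQuery(q: ParsedQuery): string | null {
  if (!METRIC_CATALOG.has(q.metric)) {
    return null; // unknown metric: acknowledge the gap, suggest related charts
  }
  // Deterministic: the same parsed query always lands on the same chart.
  const params = new URLSearchParams({ unit: q.unit, range: q.timeframe });
  return `/dashboards/${q.metric}?${params.toString()}`;
}
```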
What it did:
Parsed natural-language data questions
Found the right chart
Applied the right filters
Asked clarifying questions when needed
What it didn't do:
Explain why performance changed
Answer unsupported operational questions
Invent missing metrics
Make recommendations
What shipped:
80–90% first-pass interpretation accuracy
10–15% clarification rate
~0% hallucination by architecture
60+ metrics accessible via natural language
KEY TAKEAWAY
The first AI capability needed to be useful, observable, easy to verify, and safe if imperfect. Metrics Navigation met all four. It helped users reach answers faster without asking them to trust AI-generated reasoning.
I treated unsupported cases as part of the product, not edge cases
In AI products, unsupported cases aren't a side detail. They are part of the trust model.
I designed Ask iQueue to be direct about what it couldn’t do, so that every answer it did give taught users it stayed in bounds.
If a user asked for a timeframe the system couldn't support, it said so:
“What were my volumes in East 2 in January 2021?”
“The system currently supports viewing data from March 14, 2022, onwards. Unfortunately, data for January 2021 is not available. Would you like to see the volumes for a more recent period instead?”
If a user asked for a metric that didn't exist, it acknowledged the gap and redirected to something related instead:
“What was my block time in East 2 last month?”
“It seems like you're asking about "block time," but none of the available charts track this metric. I can help explore related metrics such as appointment volumes or utilization.”
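Guardrails like these can be encoded as checks that run before any retrieval happens; the cutoff date and message copy below are illustrative, not the production rules. The pattern is that every rejection carries a redirect.

```typescript
// Illustrative guardrails: check hard data bounds first, and make every
// refusal offer an alternative. Dates and copy are placeholders.
const DATA_AVAILABLE_FROM = new Date("2022-03-14");

function checkTimeframe(requested: Date): string | null {
  if (requested < DATA_AVAILABLE_FROM) {
    return (
      `The system currently supports viewing data from ` +
      `${DATA_AVAILABLE_FROM.toDateString()} onwards. ` +
      `Would you like to see a more recent period instead?`
    );
  }
  return null; // in bounds: proceed to retrieval
}

function checkMetric(name: string, catalog: Set<string>): string | null {
  if (!catalog.has(name)) {
    return (
      `None of the available charts track "${name}". ` +
      `I can help explore related metrics such as appointment volumes or utilization.`
    );
  }
  return null;
}
```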
KEY TAKEAWAY
A graceful refusal was a better product outcome than a broad answer users could not verify.
Phase 2: Expand from navigation to recommendations
Once Ask iQueue had earned trust as a read-only assistant, we let it move one step closer to decision-making.
I designed Smart Booking as assistance, not authority: the AI translated intent, but the recommendation logic stayed deterministic and the final decision stayed human.
This distinction mattered to me. I didn't want the LLM “reasoning” about scheduling. I wanted it parsing intent and translating it into a structured request, while the underlying optimization logic remained deterministic and verifiable.
“Help me find a 60min appointment slot for today”
Example user question
“What time range and locations did you have in mind?”
Intent clarification if needed
Run data science model
System
System returns ranked options and navigates to the Scheduling page in iQueue.
AI Response
User reviews options and chooses.
Result
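In code, that division of labor might look like the following sketch. The request shape, field names, and `rankSlots` engine are assumptions, not the shipped API: the model fills in a structured request, a clarifying question fires whenever required fields are missing, and ranking stays deterministic.

```typescript
// Hypothetical sketch of the Smart Booking split: the LLM fills in a
// structured request; the deterministic engine ranks slots; the human picks.
interface BookingRequest {
  durationMin: number;
  date: string;
  timeRange?: string;  // optional until the user confirms
  locations?: string[];
}

interface Slot { location: string; start: string; score: number }

function handleBooking(
  req: BookingRequest,
  rankSlots: (req: BookingRequest) => Slot[] // the existing data science model
): { clarify?: string; options?: Slot[] } {
  // If required context is missing, ask -- don't guess.
  if (!req.timeRange || !req.locations?.length) {
    return { clarify: "What time range and locations did you have in mind?" };
  }
  // Deterministic ranking; the AI never invents or reorders these options.
  return { options: rankSlots(req) };
}
```

The clarifying question in the sketch is exactly the step shown in the flow above: the model never proceeds on a guess.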
KEY TAKEAWAY
The model understands the request. The system makes the recommendation. The human makes the decision. That let us increase value without giving the AI false authority.
Where I pushed back: the shipped scope didn’t match the real workflow
This is where I disagreed most with what shipped.
Schedulers often manage queues of 150+ pending orders, so a one-patient-at-a-time workflow only solved a fraction of the real job.
I pushed for broader support because the value of AI depends on whether it matches the actual shape of the workflow, not whether it technically works.
We shipped the narrower version first because the underlying data science capability couldn't yet support multi-patient scheduling.
Real workflow:
High volume (150+ pending orders)
Repetitive scheduling decisions
Need batch help
Shipped scope:
1 patient at a time
Ask → wait → review → repeat
Useful but too narrow
KEY TAKEAWAY
It was a reasonable first step, but not the end state I believed users needed. AI can make a workflow look futuristic while still not solving enough of the job.
Phase 3: Increase autonomy without giving up control
Auto-Assign was the biggest increase in AI agency — and the phase where human control mattered most.
Ask iQueue interprets the request, turns it into structured parameters, and triggers the existing assignment engine. Users then review the draft result before publishing.
This was a meaningful shift: for the first time, Ask iQueue was no longer only retrieving information or surfacing options. It was helping perform work. That made the safety model much more important.
“Assign patients with 60min duration or less today in LT Cancer Center.”
Example user question
Ask iQueue generates the draft assignment changes.
System
User reviews, edits if needed, and publishes manually.
Result
What made this safe:
Explicit instruction only.
Visible draft before publish.
Deterministic, reversible execution.
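A sketch of that draft-and-publish contract, with hypothetical types: every proposed change keeps its previous state, nothing takes effect without an explicit human publish, and anything published can be mechanically rolled back.

```typescript
// Hypothetical sketch of the draft-mode contract: the assignment engine
// proposes; only an explicit human action publishes; every change keeps
// its previous state so publishing is reversible.
interface AssignmentChange {
  patientId: string;
  previous: string | null; // prior assignment, kept for rollback
  proposed: string;        // e.g. a chair or room ID
}

interface Draft {
  changes: AssignmentChange[];
  status: "draft" | "published";
}

function publish(draft: Draft, approvedByHuman: boolean): Draft {
  if (!approvedByHuman) return draft; // nothing happens without review
  return { ...draft, status: "published" };
}

function rollback(draft: Draft): Draft {
  // Reversible by construction: swap each change back to its previous value.
  const reverted = draft.changes.map((c) => ({
    ...c,
    proposed: c.previous ?? c.proposed,
    previous: c.proposed,
  }));
  return { changes: reverted, status: "draft" };
}
```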
KEY TAKEAWAY
More autonomy required more visible safeguards, not more magic. The AI translated intent into action, but it never outran human authority.
The roadmap was a progression of earned autonomy
Ask iQueue wasn't designed to jump from answering questions to acting independently. I treated autonomy as something the product had to earn. That meant moving through three stages:
Help users access truth.
Help users evaluate options.
Help users execute intent inside clear safeguards.
That progression is the central product idea behind Ask iQueue.
READ
AI Metrics Navigation
AI retrieves the right information
User verifies
RECOMMEND
Smart Booking
AI surfaces options
User decides
ACT
Auto-Assign
AI performs bounded actions in draft mode
User reviews before publish
Value + Risk + Autonomy
KEY TAKEAWAY
I expanded capability only when the previous level of trust had already been earned.
This wasn’t a chat UI problem. It was a behavior design problem
Ask iQueue changed the unit of design. I wasn’t just designing screens or prompts. I was designing:
What the AI was allowed to do.
When it should ask for clarification.
How much trust it had earned.
That was the real product: not the chat interface, but the contract between human judgment and machine behavior.
KEY TAKEAWAY
I wasn’t designing a chatbot. I was designing the boundaries of trust.
What this taught me about designing AI products
Ask iQueue left me with a simple belief: the best AI products don't start with autonomy. They earn it.
Start with what users can verify.
Make the limits obvious.
Let the model interpret, but not pretend to know.
Never ask people to trust the system more than it deserves.
KEY TAKEAWAY
The best AI products are not the ones that feel smartest. They’re the ones that make trust feel rational.
