What you'll learn: How to move from successful pilot to production deployment without catastrophic failures.
Key takeaways:
- Launch is not a single event. It's a graduated ramp with gates at every stage. Internal dogfooding, single location, limited expansion, then gradual percentage increase.
- Each ramp stage needs explicit gate criteria. Completion rate, error rate, satisfaction scores, and operational stability must meet thresholds before advancing.
- Kill switches must be granular. Per-location, per-queue, or per-feature controls let you disable problem areas without shutting down everything.
- Practice the rollback before launch. An untested rollback takes hours under pressure. A rehearsed rollback takes minutes.
- Communicate the deployment plan to affected staff before launch. Surprised employees become resistant employees.
The pilot had succeeded at one location. The scheduling agent booked appointments accurately, patients reported positive experiences, and the no-show rate matched human-booked appointments. Leadership wanted to move fast.
The healthcare provider launched to all twelve locations simultaneously.
Within 48 hours, the operations team discovered that three locations had different EHR configurations. A required field that existed at the pilot location didn't exist at these three. The booking tool received the data and returned a success message, but the appointment never actually appeared in the EHR. The agent confirmed appointments that were never created.
They didn't have a kill switch per location. The only option was to shut down the entire deployment. The choice was to let phantom appointments continue accumulating or take down a working system at nine locations to fix three.
They shut it down. Six hundred patients had appointments that didn't exist. The rollback took four hours. The patient outreach took a week. The internal credibility damage took months to repair.
Grace, the engineering director who inherited the cleanup, rebuilt the launch process from scratch. Never again would a deployment go from zero to one hundred without stops along the way.
One location at a time
Grace designed a launch sequence with gates at every stage.
Internal dogfooding came first. The engineering and product teams used the agent to book their own appointments. They caught issues that testing missed because they used it like real patients, not like testers following scripts. A week of internal use before any external exposure.
A single location came next. One clinic, chosen for its standard EHR configuration and cooperative staff. Full production traffic, full monitoring, real patients. Two weeks at a single location with daily review of every metric.
Limited expansion added two more locations. Still small enough to monitor closely, diverse enough to surface configuration differences. Another two weeks with the expanded set.
A gradual ramp then increased coverage in stages: 25% of locations, then 50%, then 75%, then 100%. Each stage lasted at least a week. Each stage required gate criteria to be met before advancing.
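One way to keep a sequence like this honest is to write it down as data rather than as a slide. The following is a minimal sketch, not the team's actual tooling: the `RampStage` type, the `percent_stage` helper, and the choice to round percentages up to whole clinics are all assumptions made for illustration. The stage names, durations, and the twelve-location total come from the story.

```python
import math
from dataclasses import dataclass

TOTAL_LOCATIONS = 12  # twelve clinics, as in the story


@dataclass(frozen=True)
class RampStage:
    name: str
    locations: int  # clinics live at this stage (0 means staff only)
    min_days: int   # minimum time at this stage before the gate review


def percent_stage(percent: int, min_days: int = 7) -> RampStage:
    # Assumption: round up, so a percentage never maps to fewer clinics than intended.
    count = math.ceil(TOTAL_LOCATIONS * percent / 100)
    return RampStage(f"{percent}% of locations", count, min_days)


RAMP = [
    RampStage("internal dogfooding", 0, 7),
    RampStage("single location", 1, 14),
    RampStage("limited expansion", 3, 14),
    *[percent_stage(p) for p in (25, 50, 75, 100)],
]

for stage in RAMP:
    print(f"{stage.name:22} {stage.locations:2d} clinics, at least {stage.min_days} days")
```

Keeping the plan in one list means the same definition can drive dashboards, status updates, and the gate review, so nobody argues about which stage the rollout is in.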
The full rollout that had taken two days now took eight weeks. But eight weeks without incident was better than two days followed by six months of credibility repair.
What has to be true before you advance
Each stage had explicit criteria for advancement.
Completion rate had to exceed 85% of the pilot baseline. A drop suggested something was wrong at the new locations.
The critical error rate had to stay below 0.5%. Five or more critical errors per thousand interactions triggered an investigation before advancing.
Patient feedback had to maintain satisfaction scores within two points of pilot levels. Patients at new locations should have comparable experiences.
Operational stability meant no unplanned incidents requiring rollback. Any incident paused the ramp until the root cause was identified and fixed.
Grace wrote the criteria before launch and posted them for the team to see. Advancing to the next stage wasn't a judgment call. Either the numbers met the criteria, or they didn't.
When the second expansion stage showed a drop in completion rate at one location, they didn't advance. They investigated and found a timezone configuration issue causing appointment times to display incorrectly. Fixed it, monitored for a week, confirmed the completion rate recovered, then advanced.
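Criteria like these are easy to encode so that the gate review produces a list of failures rather than a discussion. The sketch below is illustrative only: the `StageMetrics` type and `gate_failures` function are hypothetical names, the sample numbers are made up, and it assumes that "within two points" means a drop of no more than two points below the pilot score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageMetrics:
    completion_rate: float      # fraction of interactions completed, 0..1
    critical_error_rate: float  # fraction of interactions with a critical error
    satisfaction: float         # satisfaction score on the same scale as the pilot
    rollback_incidents: int     # unplanned incidents that required rollback


def gate_failures(current: StageMetrics, pilot: StageMetrics) -> list[str]:
    """Return the unmet criteria; an empty list means the stage may advance."""
    failures = []
    if current.completion_rate <= 0.85 * pilot.completion_rate:
        failures.append("completion rate not above 85% of pilot baseline")
    if current.critical_error_rate >= 0.005:
        failures.append("critical error rate not below 0.5%")
    # Assumption: only a drop in satisfaction counts against the gate.
    if pilot.satisfaction - current.satisfaction > 2:
        failures.append("satisfaction more than two points below pilot")
    if current.rollback_incidents > 0:
        failures.append("unplanned incident required rollback")
    return failures


# Made-up numbers, for illustration only.
pilot = StageMetrics(0.80, 0.002, 88.0, 0)
stage = StageMetrics(0.60, 0.003, 87.0, 0)
print(gate_failures(stage, pilot))
```

With these sample numbers the only failure reported is the completion rate, which mirrors the situation in the story: the ramp pauses, the team investigates, and the check is run again after the fix.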
Kill switches
The original deployment had one switch: on or off for everything. Grace built granular controls.
Per-location switches let them disable the agent at specific clinics without affecting others. If three locations had EHR issues, they could disable those three while nine continued operating.
Per-queue switches let them disable specific call types. If the agent handled scheduling and prescription refills, they could disable refills without affecting scheduling.
Per-feature switches let them disable specific capabilities. If a new confirmation flow caused problems, they could revert to the previous flow without rolling back the entire agent.
Each switch had a clear owner and a documented activation procedure. Anyone on the operations team could flip a location switch. Queue and feature switches required engineering approval. The escalation path was defined before it was needed.
Grace tested the switches monthly. A kill switch that had never been tested might not work when needed. They practiced disabling and re-enabling locations during low-traffic windows. The muscle memory mattered.
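A rough sketch of how granular switches with different approval rules might look in code follows. Everything here is hypothetical: the `KillSwitches` class, its method names, the `engineering_approved` flag, and the clinic and queue identifiers are invented for illustration, and a real system would persist this state and record who flipped each switch. The sketch also assumes that a disabled feature falls back to the previous flow rather than turning the agent off.

```python
SCOPES = ("location", "queue", "feature")


class KillSwitches:
    """In-memory registry of disabled locations, queues, and features."""

    def __init__(self) -> None:
        self._disabled: dict[str, set[str]] = {scope: set() for scope in SCOPES}

    def disable(self, scope: str, name: str, engineering_approved: bool = False) -> None:
        if scope not in self._disabled:
            raise ValueError(f"unknown scope: {scope}")
        # Operations can flip location switches; queue and feature switches
        # need engineering approval, as described in the text.
        if scope != "location" and not engineering_approved:
            raise PermissionError(f"{scope} switches require engineering approval")
        self._disabled[scope].add(name)

    def enable(self, scope: str, name: str) -> None:
        if scope not in self._disabled:
            raise ValueError(f"unknown scope: {scope}")
        self._disabled[scope].discard(name)

    def agent_enabled(self, location: str, queue: str) -> bool:
        # The agent handles a call only if neither its clinic nor its queue is off.
        return (location not in self._disabled["location"]
                and queue not in self._disabled["queue"])

    def feature_enabled(self, feature: str) -> bool:
        # A disabled feature means "use the previous flow", not "turn the agent off".
        return feature not in self._disabled["feature"]


switches = KillSwitches()
switches.disable("location", "clinic-03")
print(switches.agent_enabled("clinic-03", "scheduling"))  # False
print(switches.agent_enabled("clinic-01", "scheduling"))  # True
```

The monthly test described above amounts to calling `disable` and `enable` on a real location during a quiet window and confirming that call routing actually changes.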
Rollback planning
The original rollback had taken four hours because no one had practiced it.
Grace built a rollback plan and rehearsed it.
The plan specified exactly what steps to take. Disable the agent through the kill switch. Route calls to the backup queue. Notify the contact center team. Update the status page. The steps were documented in a runbook that anyone on call could follow.
The rehearsal happened before launch. During a maintenance window, the team executed the full rollback procedure as if an incident had occurred. They timed it. Identified three steps that took longer than expected. Optimized those steps. Rehearsed again.
The rehearsed rollback took eighteen minutes. The unrehearsed rollback had taken four hours. The difference was preparation.
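Timing a rehearsal step by step is what makes it possible to find the slow steps. The sketch below shows one simple way to do that, assuming each runbook step can be wrapped in a function. The `rehearse` and `slowest` helpers are hypothetical, and the step bodies are placeholders that do nothing; only the step names come from the runbook described above.

```python
import time
from typing import Callable

Step = tuple[str, Callable[[], None]]


def rehearse(steps: list[Step], clock: Callable[[], float] = time.perf_counter) -> list[tuple[str, float]]:
    """Run each step in order and record how long it took, in seconds."""
    timings = []
    for name, action in steps:
        start = clock()
        action()
        timings.append((name, clock() - start))
    return timings


def slowest(timings: list[tuple[str, float]], n: int = 3) -> list[tuple[str, float]]:
    """Return the n steps that took longest, slowest first."""
    return sorted(timings, key=lambda item: item[1], reverse=True)[:n]


# Placeholder actions: in a real rehearsal each would perform the runbook step.
RUNBOOK: list[Step] = [
    ("disable agent via kill switch", lambda: None),
    ("route calls to backup queue", lambda: None),
    ("notify contact center team", lambda: None),
    ("update status page", lambda: None),
]

for name, seconds in slowest(rehearse(RUNBOOK)):
    print(f"{name}: {seconds:.3f}s")
```

Running the same timing on every rehearsal gives a record of whether the procedure is getting faster or quietly degrading.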
Grace also defined rollback triggers. What conditions would warrant an immediate rollback rather than investigation first? A critical error rate exceeding 2% triggered an immediate rollback. A completion rate below 70% triggered an investigation, with authority to roll back if needed. Patient complaints about phantom appointments triggered immediate rollback at affected locations.
The triggers were written down before launch. No one had to decide in the moment whether things were bad enough to roll back. The criteria made the decision.
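Written-down triggers translate naturally into a small decision function. This is a sketch under stated assumptions, not the team's implementation: the `Action` enum and `decide` function are hypothetical, the order of precedence among triggers is an assumption, and the text does not say whether the error-rate rollback is deployment-wide, so the sketch treats it that way.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    INVESTIGATE = "investigate, with authority to roll back"
    ROLLBACK_LOCATIONS = "roll back affected locations"
    ROLLBACK = "roll back immediately"


def decide(critical_error_rate: float,
           completion_rate: float,
           phantom_complaint_locations: set[str]) -> tuple[Action, set[str]]:
    """Map current readings to an action and the locations it applies to.

    Rates are fractions (0.02 means 2%). An empty location set means the
    action is not scoped to particular clinics.
    """
    # Assumed precedence: the broadest rollback wins.
    if critical_error_rate > 0.02:
        return Action.ROLLBACK, set()
    if phantom_complaint_locations:
        return Action.ROLLBACK_LOCATIONS, set(phantom_complaint_locations)
    if completion_rate < 0.70:
        return Action.INVESTIGATE, set()
    return Action.CONTINUE, set()


action, where = decide(0.004, 0.81, {"clinic-07"})
print(action.value, sorted(where))
```

Because the function takes only numbers and a set of locations, the on-call person can run it against the dashboard values and get the same answer as anyone else would.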
Who gets told what
The original launch caught the contact center staff by surprise. Calls suddenly routed differently, and no one had told them why.
Grace built a communication sequence.
Two weeks before launch, the contact center managers received a briefing. What the agent would do, how it would affect call routing, and what to expect from patient feedback. Managers needed time to prepare their teams.
One week before launch, frontline staff received training. How to handle patients who mentioned the automated system. How to escalate if the agent made errors. What the rollback procedure looked like from their end.
On launch day, everyone received a status update: the rollout was beginning, what to watch for, and who to contact with issues.
Daily during the ramp, the team sent brief status updates. Metrics summary, any incidents, next stage timeline. No surprises.
Grace learned that communication failures caused as much damage as technical failures. Staff who felt blindsided became resistant. Staff who felt informed became allies who surfaced issues early.
Day-one monitoring
The first 24 hours required dedicated attention.
Grace assigned specific people to watch specific metrics. One engineer watched system health: latency, error rates, and uptime. One analyst watched business metrics: completion rates, transfer rates, and patient feedback. One operations lead watched the support queue for emerging issues.
They defined escalation triggers. Latency above 1.5 seconds for more than five minutes paged the engineer. A 10-point drop in completion rate triggered a war room. Patient complaints about incorrect appointments triggered an immediate check of the booking tool.
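The latency trigger is a sustained-breach check rather than a single-sample threshold, which is worth getting right so that one slow request does not page anyone. The sketch below is one possible reading of these rules: the function names and the `(timestamp, latency)` sample format are assumptions, and the baseline used for the completion-rate drop is assumed to be supplied by the caller, since the text does not define it.

```python
LATENCY_LIMIT_S = 1.5      # seconds
SUSTAINED_FOR_S = 5 * 60   # five minutes
COMPLETION_DROP_POINTS = 10


def should_page(samples: list[tuple[float, float]]) -> bool:
    """samples: (timestamp_seconds, latency_seconds) pairs in time order.

    Returns True once latency has stayed above the limit for more than
    five minutes without a single sample at or below it.
    """
    breach_start = None
    for timestamp, latency in samples:
        if latency > LATENCY_LIMIT_S:
            if breach_start is None:
                breach_start = timestamp
            if timestamp - breach_start > SUSTAINED_FOR_S:
                return True
        else:
            breach_start = None  # any healthy sample resets the window
    return False


def needs_war_room(baseline_pct: float, current_pct: float) -> bool:
    """True when completion rate has fallen 10 or more percentage points."""
    return baseline_pct - current_pct >= COMPLETION_DROP_POINTS


# One sample per minute for seven minutes, all slow.
slow = [(minute * 60.0, 2.0) for minute in range(7)]
print(should_page(slow))           # True
print(needs_war_room(82.0, 71.0))  # True
```

Resetting the window on any healthy sample is a deliberately strict choice; a noisier system might instead page when most samples in a five-minute window exceed the limit.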
The team stayed in a shared channel for the first 24 hours. Real-time communication, quick decisions. After 24 hours without incident, they shifted to a regular monitoring cadence.
Grace also scheduled a 24-hour retrospective. What had they seen? What had surprised them? What needed adjustment before the next stage of the ramp? The retrospective captured learning while it was fresh.
The second deployment took eight weeks instead of two days. It reached all twelve locations without a single phantom appointment.
Grace kept a photo on her desk from the first deployment. Six hundred sticky notes on a whiteboard, each representing a patient who needed to be called about an appointment that didn't exist. The photo reminded her why the process mattered.
Eight weeks felt slow when leadership wanted to move fast. But eight weeks without the sticky notes was faster than two days followed by six months of repair.

