The Algorithm Was Supposed to Save Lives. Deaths Rose Instead.

Published March 2025

In 2019, UPMC quietly rolled out what it called a breakthrough. The Pittsburgh-based health giant had spent years building a machine learning algorithm, trained on records from more than 1.25 million patients, designed to predict which surgical patients were most likely to die or suffer a serious complication. Deployed across 20 of its hospitals, the tool would scan patient charts each morning and flag the highest-risk cases before surgeons ever made the first incision. The goal, according to Dr. Aman Mahajan, the algorithm's lead architect, was to give clinicians "a more objective, data-driven assessment of who might run into trouble after surgery."

The data suggests that assessment was not enough.

Surgical team in operating room

The Rise

Federal records show that in the years following the algorithm's deployment, the rate of deaths among surgical patients who experienced serious but treatable complications climbed sharply across UPMC's hospitals. At UPMC Altoona, the rate rose from 161 deaths per 1,000 such patients in the 2019–2021 period to 259 in 2022–2024, a 61 percent increase. That leaves Altoona first in the country on this measure, with the highest PSI-04 mortality rate of any of the 1,524 hospitals reporting to CMS in the most recent data. UPMC Hamot rose from 158 to 234, and UPMC Mercy from 169 to 234; both rank in the top 15 nationally.

Across the UPMC system, the rate climbed from 149 to 195, well above both the Pennsylvania state average and the national benchmark, each of which rose only modestly over the same period.

The measure, PSI-04, tracked by the Centers for Medicare and Medicaid Services, is designed specifically to capture deaths that should not have happened. It counts surgical patients who develop a serious complication and then die, with the underlying assumption that these are deaths medicine had a chance to prevent. Dr. Mahajan, in an interview, acknowledged the weight of that framing. "PSI-04 focuses on patients who developed serious complications and then died, cases where medicine, theoretically, had an opportunity to intervene," he said. "When that number increases, it raises a fundamental question: Were high-risk patients identified but not effectively managed? Or were the interventions insufficient? Prediction is only the first step. What matters is what happens after the risk is identified."


In theory, the tool was designed to trigger a chain of action. Each morning, surgical teams would receive a list of flagged patients, prompting, as Dr. Mahajan described it, "heightened vigilance, perhaps additional monitoring, consultations, or changes to surgical planning." But he acknowledged the gap between alert and outcome is not automatic. "An algorithm can demonstrate strong predictive accuracy, but that does not automatically improve outcomes. There has to be a clear workflow: Who responds to the alert? What resources are mobilized? Is there adequate staffing to act on those warnings? Without an effective response system, prediction alone does not save lives."

The Gap

UPMC has not responded to questions about the divergence between the algorithm's promised outcomes and its hospitals' PSI-04 trajectory. One defense the health system might offer is that its patients are simply becoming sicker and more complex over time. Dr. Mahajan pushed back on that framing. "PSI-04 is risk-adjusted," he said. "It's designed to account for underlying illness. If rates are rising substantially above state and national benchmarks, it suggests there may be structural factors at play beyond patient mix alone."

Waffle chart: excess deaths by healthcare system compared to the US national benchmark. Each square represents one excess death per 100,000 patients.

His broader conclusion was pointed. "AI is a tool, not a solution," he said. "If a hospital system promotes AI as a safety breakthrough, it also assumes responsibility for demonstrating that it improves real-world outcomes. Otherwise, we risk mistaking predictive sophistication for clinical effectiveness."

For the patients counted in rising mortality rates, that distinction is not academic.