Resolve AI says the AI coding boom is breaking production systems. It wants to fix that.

Resolve AI, the production-operations startup backed by Greylock and Lightspeed Venture Partners, today announced a sweeping expansion of its platform that introduces always-on background agents, a redesigned investigation architecture, and a shared workspace where engineers and AI agents collaborate in real time on live incidents.

The centerpiece of the release is a new multi-agent investigation system developed by Resolve AI's in-house research lab. Instead of deploying a single AI agent to diagnose a production failure — analogous to a lone engineer pulling an on-call shift — the platform now dispatches a coordinated team of specialized agents that pursue multiple hypotheses in parallel, independently verify each other's conclusions, and construct complete causal chains from root cause to symptom. The company says the architecture delivers more than a twofold improvement in root cause accuracy on its internal evaluation benchmarks compared to earlier versions of its platform.

"Think of a single agent being on call, the way a human would be," Resolve AI CEO and co-founder Spiros Xanthos told VentureBeat in an exclusive interview ahead of the announcement. "We now have a team of agents that all work together, almost like a team of humans debugging an issue, and that has improved quality by 2x."

The announcement arrives at a moment of acute tension in the software industry. AI-powered code generation has exploded in adoption, enabling engineering teams to ship dramatically more software than they could two years ago. But keeping that software running in production — debugging it when it breaks, monitoring it after deployment, auditing its health — remains overwhelmingly manual. For a company that raised a $125 million Series A at a $1 billion valuation earlier this year, Resolve AI is making a direct bet that the operational side of the software lifecycle is the next major frontier for AI investment.

What hundreds of real-world test cases reveal about the accuracy claim

Any accuracy claim from a startup warrants scrutiny, and Xanthos was candid about both the scale and limitations of the evaluation. The 2x figure comes from internal benchmarks, not a third-party audit, though the evaluation set was built to mirror the complexity that Resolve AI's enterprise customers encounter daily.

"These are very hard, complex evals that we built over time to represent real-world examples," Xanthos explained. "This is not customer data, but these evals represent difficult cases similar to what we've seen at some of the largest tech companies we work with." He described the set as comprising hundreds of cases that reflect the kinds of production failures encountered at companies like Coinbase, Salesforce, DoorDash, and Zscaler — all named Resolve AI customers.

The accuracy gain matters in practice because Resolve AI's agents now act as first responders for every on-call alert, typically triaging within five minutes, before a human engineer even becomes involved. In previous public disclosures, the company has cited DoorDash reducing time to root cause by up to 87 percent. When asked to contextualize that figure, Xanthos described the typical baseline.

"When something goes wrong, it might take five to 10 minutes for a human to even get their laptop and connect," he said. "The typical MTTR is in the tens of minutes, sometimes hours, depending on severity. So an improvement of 80-plus percent — four to five times faster — is actually huge. It's something we've never achieved before with AI, tools, data, or observability."

How AI agents fact-check each other to prevent hallucinated root causes

One of the core challenges in applying large language models to high-stakes production environments is their tendency to generate plausible-sounding but incorrect answers — a failure mode that, in the context of a live outage, could send an engineering team chasing the wrong fix while a service stays down.

Xanthos acknowledged this directly. "This is a very common issue with models out of the box," he said. "They always try to give you an answer, and if they don't have enough evidence, they'll give you the best possible answer — which is likely to be wrong."

Resolve AI's countermeasure is a system of layered verification among its agents. Each agent investigating a hypothesis must cite every piece of evidence it relies on and present that evidence to another agent for independent review. The investigating agent must construct the full causal chain — from root cause to symptom — and peer agents actively attempt to disprove the theory by identifying gaps in the logic.

"Often, agents actually disprove those theories because they find gaps," Xanthos said. "There are many layers of defense and agentic checks that allow Resolve to be very accurate and not mislead."

Equally important, he said, is the system's willingness to say it does not know. "The bar to actually saying 'I have the answer' is very high. In those cases, it will say, 'This is the evidence I found. Here are three or four paths you can take from here, but I wasn't able to fully prove that this is the problem.' A system like this that operates in production cannot be a black box." In domains where wrong answers carry operational consequences, calibrated uncertainty can be more valuable than confident outputs. For an AI system integrated into an incident-response workflow, confidently pointing engineers in the wrong direction during a customer-facing outage could compound the very harm it was designed to prevent.
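The verification pattern Xanthos describes can be illustrated with a small sketch. This is not Resolve AI's implementation, which the company has not published; every name below (Hypothesis, missing_evidence, make_symptom_check, verify) is hypothetical, and the two reviewers are deliberately simple stand-ins for peer agents. The sketch assumes a hypothesis is accepted only when no reviewer finds a gap, and otherwise the system reports the evidence it has along with the open gaps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hypothesis:
    root_cause: str
    causal_chain: List[str]   # ordered steps, root cause first, symptom last
    evidence: Dict[str, str]  # step -> cited evidence for that step


# A reviewer returns the gaps it found; an empty list means no objection.
Reviewer = Callable[[Hypothesis], List[str]]


def missing_evidence(h: Hypothesis) -> List[str]:
    """Flag every step in the chain that has no cited evidence."""
    return [f"no evidence for step: {s}" for s in h.causal_chain if not h.evidence.get(s)]


def make_symptom_check(symptom: str) -> Reviewer:
    """Build a reviewer that checks the chain ends at the observed symptom."""
    def check(h: Hypothesis) -> List[str]:
        if not h.causal_chain or h.causal_chain[-1] != symptom:
            return [f"chain does not reach symptom: {symptom}"]
        return []
    return check


def verify(h: Hypothesis, reviewers: List[Reviewer]) -> dict:
    """Accept a root cause only if no reviewer finds a gap; otherwise report what is known."""
    gaps = [gap for reviewer in reviewers for gap in reviewer(h)]
    if gaps:
        return {"status": "inconclusive", "evidence": h.evidence, "gaps": gaps}
    return {"status": "root_cause_found", "root_cause": h.root_cause}


symptom = "checkout latency spike"
reviewers = [missing_evidence, make_symptom_check(symptom)]

# Only the first step is backed by evidence, so the reviewers should object.
weak = Hypothesis(
    root_cause="bad deploy",
    causal_chain=["bad deploy", "connection pool exhausted", symptom],
    evidence={"bad deploy": "deploy log entry"},
)
print(verify(weak, reviewers))
```

The design choice worth noting is the default: when any reviewer objects, the result carries the evidence and the gaps rather than a best guess.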

Inside the new background agents that never go off-call

Beyond incident response, Resolve AI is introducing a new class of background agents designed to handle the continuous, often invisible operational work that engineering teams are expected to perform but struggle to sustain at scale.

These agents run on schedules or wake automatically in response to events — a new deployment, a fired alert, a merged pull request — and accumulate institutional knowledge from every investigation and human interaction over time. When an engineer opens the Resolve AI interface, agents have already been working: pre-investigating priority issues, monitoring deployments, auditing alert hygiene, flagging configuration drift, and surfacing cost anomalies.
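One way to picture the event-driven half of that design is a registry that maps event types to agent handlers. The sketch below is an assumption-laden illustration, not Resolve AI's architecture: the BackgroundAgents class, the event names, and the handler functions are all hypothetical, and scheduled runs are left out for brevity.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], str]


class BackgroundAgents:
    """Hypothetical registry that wakes handlers when a matching event arrives."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}
        self.notes: List[str] = []  # findings accumulated across all events

    def on(self, event_type: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for one event type."""
        def register(fn: Handler) -> Handler:
            self._handlers.setdefault(event_type, []).append(fn)
            return fn
        return register

    def emit(self, event_type: str, payload: dict) -> List[str]:
        """Run every handler registered for this event and keep its findings."""
        results = [handler(payload) for handler in self._handlers.get(event_type, [])]
        self.notes.extend(results)
        return results


agents = BackgroundAgents()


@agents.on("deployment")
def watch_deploy(event: dict) -> str:
    return f"monitoring deployment of {event['service']}"


@agents.on("pull_request_merged")
def check_pr(event: dict) -> str:
    return f"checking production impact of merge in {event['repo']}"


print(agents.emit("deployment", {"service": "checkout"}))
print(agents.emit("pull_request_merged", {"repo": "payments"}))
```

Nothing runs until an event arrives, which is consistent with the idea that agents wake in response to triggers rather than doing intensive work continuously.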

Xanthos drew a distinction between background agents and the incident-response agents that have been Resolve AI's primary offering. "You can now have these agents run in the background at all times — not only when a human asks an agent to debug a problem or when an alert fires," he said. "A lot of our customers are now monitoring changes that land in production before they cause an issue. There's an agent that monitors those all the time."

He described these background agents as "general-purpose SRE agents that are available to every developer," capable of handling tasks that range from monitoring infrastructure changes that might increase cloud costs to performing post-incident follow-up work like generating code fixes based on incident learnings. The concept addresses a structural problem in software operations: the daily tasks required to keep production systems healthy — monitoring deployments, investigating alerts, tracking changes across complex environments — are critical but reactive and manual. Engineering organizations know this work needs to happen, but it competes for attention with feature development. Automated agents that perform this work continuously could shift teams from reactive firefighting to proactive operational management.

The shared workspace where engineers and AI agents investigate together

The third major component of the release is what the company calls a shared investigation surface — a workspace where engineers and AI agents work from the same live evidence during an active incident. Reports update dynamically as investigations evolve. Every finding is inspectable. Engineers can explore side investigations without interrupting the primary workflow. Source queries are pullable and modifiable in place, evidence is embedded directly into the workspace, and remediation actions can be triggered from the same interface without switching tools.

"Think of it as an interface to all the production tools, but also an interface where humans and agents can collaborate with each other — or agents with agents," Xanthos said. "That's what gradually leads to more trust and more automation, because you work with the agent, you teach it, you see the results."

The company is also making its platform available as a REST API and an MCP (Model Context Protocol) server, enabling engineering teams to integrate Resolve AI into broader agentic workflows and infrastructure. According to Xanthos, this is already happening in practice. "A general-purpose agent that a company has built — when it comes to debugging, that agent could invoke Resolve," he said. "Or somebody works on their coding agent on the laptop, and Resolve shows up there as an MCP. If there is some production-related activity, the coding agent can invoke it." The interoperability play signals that Resolve AI sees itself not as a closed system but as a specialized node in a broader ecosystem of AI agents that will increasingly hand off tasks to one another — a pattern Xanthos compared to the open architecture of the web rather than the walled-garden model of an app store.

Why Resolve AI says it can outperform Datadog, PagerDuty, and the cloud giants

The agentic operations space has become crowded in the past year. Datadog, PagerDuty, and major cloud providers have all announced AI-augmented operations capabilities. When asked what separates Resolve AI from these incumbents, Xanthos pointed to the depth of the company's technical foundation.

"We're operating at the frontier here. There's no blueprint for how you build a system like Resolve," he said. He noted that he and co-founder Mayank Agarwal co-created OpenTelemetry, the most widely adopted open-source project in observability, which now serves as the de facto standard for collecting metrics, logs, and traces from modern software systems.

Xanthos also highlighted the company's recently formed AI Lab, led by a researcher he described as the former post-training lead for Meta's Llama models. "He managed to combine deep expertise of production observability with AI and models, and I think that's very unique," Xanthos said. "I don't believe any other company, whether it comes from an observability background or it's a startup, has all of that together."

The company's structural defenses, according to Xanthos, include a full environment model that Resolve builds for each customer, a memory system that learns within the customer's specific production environment, and its multi-agent architecture. The lab is now post-training frontier models on production-specific data — the kind of procedural knowledge that experienced engineers use to debug production issues but that does not appear in standard model training sets. This approach reflects an increasingly common pattern among AI application companies: using frontier foundation models as a base layer but investing heavily in domain-specific fine-tuning, retrieval, and agent architectures to achieve accuracy levels that general-purpose models cannot reach alone.

How outcome-based pricing changes the economics of AI in production

Resolve AI's pricing model departs from traditional enterprise software licensing. The company sells credits that are consumed when agents perform work — an outcome-based approach that ties cost directly to value delivered.

"We're not selling software," Xanthos said. "The way you buy and use Resolve is by buying credits that are consumed when Resolve performs an action. It's outcome-based. Only when Resolve troubleshoots an alert — that's the only time that it consumes credits."

He addressed the cost question head-on, arguing that Resolve AI is actually cheaper than the alternative of building a similar system from scratch using frontier models and MCP integrations. "If you were to take Opus or GPT-5.4 and try to build a solution like Resolve with MCPs, we measured — you actually end up consuming a lot more in tokens than what you have to pay Resolve, because our system is very optimized in terms of context, in terms of how it reads time-series data."

As for the always-on background agents, Xanthos said their continuous nature does not inherently add to cost. "The background agent doesn't mean it does intensive work all the time. It means that it can be there; you can give it any task you want. A lot of these tasks are triggered based on some action: an alert happens, somebody merges a PR, and you want to see if it has an impact on production."

For enterprise customers in regulated industries, the Coinbases and Zscalers of the world, data residency and security are non-negotiable. Resolve AI accommodates this with a flexible deployment model: the data plane sits wherever the customer's existing tools already live, while the inference layer can run as a standard SaaS deployment or inside a customer-specific VPC. "We designed Resolve to work with the large enterprises where security standards are the highest," Xanthos said. "There are many measures we take to ensure Resolve is secure, including not retaining data."

Why engineering leaders are slowly learning to trust AI agents with production systems

The question of whether engineering teams will trust AI agents to take autonomous action in production — rolling back a deployment, adding capacity, generating a pull request — is one of the defining cultural challenges of this technology wave. Xanthos drew an analogy to autonomous vehicles.

"For us to allow a car to drive on its own on the street, we have to prove that it's safer than a human. Agents in production is a very similar concept," he said. He acknowledged that not every customer is comfortable with agents taking automated action, but described a gradient of trust that he expects to evolve rapidly.

"There is a set of actions that are relatively risk-free that most tech companies probably are comfortable having an agent take, and probably there is another set of actions for which the human has to approve," he said. "But as quality keeps climbing the way we see at Resolve, I would say we're going to cross the threshold this year where most of the actions will be taken by an agent automatically."

He described the typical adoption arc: companies begin with agents providing recommendations, then a human decides whether to press the button. Over weeks or months, trust builds incrementally. "I don't think this is a problem where we just let the agents run wild from the beginning," Xanthos said. The incremental approach mirrors how enterprise technology adoption has always worked — from cloud migration to container orchestration, organizations move at the speed of trust, not the speed of capability.
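The gradient of trust Xanthos describes resembles a familiar approval-gate pattern, sketched below. The risk labels assigned to each action are assumptions made for illustration, not classifications from Resolve AI, and the decide function is hypothetical. The idea is that a team widens the set of auto-approved risk levels as confidence grows.

```python
from enum import Enum
from typing import Set


class Risk(Enum):
    LOW = "low"
    HIGH = "high"


# Hypothetical classification; each team would set its own.
ACTION_RISK = {
    "open_pull_request": Risk.LOW,
    "add_capacity": Risk.HIGH,
    "rollback_deployment": Risk.HIGH,
}


def decide(action: str, auto_approved: Set[Risk], human_approved: bool = False) -> str:
    """Execute automatically only when the action's risk level is auto-approved."""
    risk = ACTION_RISK.get(action, Risk.HIGH)  # unknown actions are treated as high risk
    if risk in auto_approved or human_approved:
        return "execute"
    return "await_approval"


# Early in adoption: only low-risk actions run without a human.
early_policy = {Risk.LOW}
print(decide("open_pull_request", early_policy))
print(decide("rollback_deployment", early_policy))
print(decide("rollback_deployment", early_policy, human_approved=True))
```

Treating unrecognized actions as high risk keeps the default conservative, which matches the recommendation-first adoption arc described above.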

The argument that AI-generated code is making the production crisis worse, not better

Perhaps the most provocative argument in Resolve AI's thesis is that the explosion of AI-generated code is actually intensifying the production-operations problem. In a recent LinkedIn post, Xanthos framed the dynamic in stark terms, arguing that engineering leaders who celebrate faster code shipping without investing in production operations are effectively having their senior engineers "subsidize velocity" through increased incident-response burden.

In his interview with VentureBeat, he returned to this theme. "Now that coding agents are producing code, we produce a lot more code that we're less familiar with — humans are less familiar with — so you need the AI to be the defense," he said.

This framing positions Resolve AI not merely as a productivity tool but as a necessary counterweight to the AI coding revolution. As organizations deploy more code, written by tools that their engineers may not fully understand, running against production systems those engineers did not build, the argument is that the operational complexity — and the consequences of failure — will grow proportionally. On the Stack Overflow Podcast last October, Xanthos put numbers to this claim, estimating that engineers spend upwards of 70 percent of their time maintaining and troubleshooting production systems rather than building new features. "We're facing a new crisis where we're building faster than we can operate," he said in that conversation.

Resolve AI was founded in early 2024 by Xanthos and Agarwal, who first met during their PhD programs at the University of Illinois and have worked together for more than a decade. Xanthos previously co-founded Pattern Insight (acquired by VMware) and Omnition (acquired by Splunk), where the pair helped create OpenTelemetry. The company raised a $35 million seed round from Greylock in 2024, followed by the $125 million Series A led by Lightspeed at a $1 billion valuation earlier this year. Named customers include Coinbase, DoorDash, MSCI, Salesforce, MongoDB, and Zscaler.

Xanthos's long-term vision is expansive. "Over the long run, once agent ability surpasses that of a human software engineer, the end result is a lot more technology and a lot more software," he said. "It's not actually fewer people working on it. It's technology becoming cheaper, becoming more accessible, producing a lot more technology for the benefit of the world."

That vision will take years to realize. But the more immediate promise of today's announcement comes down to something every on-call engineer understands viscerally: the 2 a.m. page, the scramble for a laptop, the frantic search through dashboards and logs for an answer that might take minutes or might take hours. Resolve AI is betting that the next time that alert fires, a team of agents will have already investigated, verified, and documented the root cause before the engineer's phone even lights up. For a profession that has long measured its nights by mean time to resolution, the question is no longer whether AI can help — it is whether engineers will let it.
