Data mining 2.0: From finding errors to preventing them

Published: May 15, 2026


If your payment integrity program includes data mining, you're likely to uncover a steady stream of overpayments. From duplicate payments and modifier errors to overlooked readmissions and billing discrepancies that violate contract terms, the findings pour in month after month. 

The uncomfortable truth is that most data mining programs aren’t designed to get smarter. They’re designed to find the same errors over and over, and in many cases, the vendors running them have every financial reason to keep it that way. Add IT bottlenecks, stale rules, and limited visibility into what’s actually driving the findings, and you have a program that keeps producing familiar findings while quietly losing ground.

This is the state of data mining today. And it doesn’t have to be.

The contingency trap: Built to recycle, not improve

For decades, legacy data mining vendors have dominated the market with a contingency-based compensation model: they take a percentage of the savings they find for you. On the surface, the incentives seem aligned, since they get paid only when you save money. A closer look tells a different story.

Contingency-based vendors have a built-in conflict of interest: they profit from the errors they are hired to catch. Their revenue depends on billing errors recurring, because each recovery generates a fee. Consequently, the better your payment program becomes, and the more errors are prevented at the source, the less these vendors earn. Activities that create long-term value for you, such as identifying new error patterns, educating providers, or implementing new pre-payment rules, directly undermine their business model. As a result, they have little incentive to prioritize fixing root causes.

The result is an outcome many payment integrity (PI) leaders privately acknowledge: high-level monthly summaries with limited detail on what drove the findings and no clear path to preventing recurrence. They know what was found, but not why it keeps happening.

This isn't an accident; it's a structural feature of the contingency model. Health plans are left with a program optimized for vendor revenue rather than long-term payment accuracy.

The hidden cost of opacity

Even for health plans running data mining in-house, the visibility problem persists. Most internal programs rely on SQL developers and IT resources to build and run rules. When a claims analyst identifies a potential overpayment pattern, the workflow looks like this: identify the issue, submit an IT ticket, wait weeks for a rule to come back, iterate through multiple rounds of revision, and, in many cases, reprocess historical claims to uncover prior exposure. By the time the rule is live, the billing pattern may have shifted.

The same bottleneck makes results harder to interpret, which compounds the problem. When a claim is flagged, can your team immediately determine why? Can they identify the policy behind the rule? Can they tell whether it’s a one-off error or part of a larger pattern? For most plans, the honest answer is no, at least not without a slow, painstaking manual review.

Health plans have always known what to look for; the challenge has been acting on that knowledge quickly enough. Vendor reporting often stops at high-level error categories, without surfacing the provider-level patterns or policy-level insights needed for meaningful, proactive change.

This lack of visibility makes three things impossible:

  1. Educating providers effectively and improving payment accuracy
  2. Implementing upstream interventions
  3. Validating vendor performance

The maintenance problem no one talks about

Almost every data mining program suffers from a slow deterioration that rarely gets flagged until savings noticeably slip. It’s rule maintenance, or rather, the lack of it. Data mining rules are built on foundations that constantly shift: CMS guidance, payment policies, fee schedules, contract terms, code sets, and more. When that foundation changes but the rule doesn't, the rule grows stale. It may still generate results that look normal, but its accuracy silently erodes, causing it to miss new patterns or flag issues that are no longer relevant.

The challenge is especially pressing for plans that manage multi-state Medicaid programs, where state-by-state policy variation makes rule maintenance a near-continuous requirement, but it affects any plan with a significant rule library. The business consequences are real, yet often invisible. A program that performed well two years ago, when its rules were well maintained and aligned with the policies then in effect, may operate with much lower fidelity today simply due to neglect. As rules become outdated, the program's exposure grows while its effectiveness stagnates.

For many health plans, rule maintenance can account for 25–40% of a data mining program’s resource allocation. That’s a significant operational burden that most plans either underestimate, under-resource, or defer until performance starts to slip.
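The staleness problem described in this section can be made visible with very little machinery. The Python sketch below assumes, hypothetically, that each rule records the policy it is based on and the policy version it was last validated against; the `Rule` class, the `hold_stale_rules` function, and the version strings are illustrative only and do not describe any particular product. When a policy's current version differs from the one a rule was built on, the rule is put on hold for review instead of continuing to run on outdated logic.

```python
from dataclasses import dataclass
from typing import Dict, List


# Hypothetical rule record; the field names are illustrative only.
@dataclass
class Rule:
    rule_id: str
    policy_id: str         # policy the rule's logic is based on
    built_on_version: str  # policy version the rule was last validated against
    status: str = "active"


def hold_stale_rules(rules: List[Rule], current_versions: Dict[str, str]) -> List[str]:
    """Put active rules on hold when their underlying policy has changed.

    current_versions maps policy_id -> latest known policy version.
    A policy missing from the map (for example, a retired policy) is
    treated as changed, so its rules are held for review as well.
    Returns the IDs of the rules placed on hold by this call.
    """
    held = []
    for rule in rules:
        if rule.status != "active":
            continue  # already on hold or retired; nothing to do
        latest = current_versions.get(rule.policy_id)
        if latest != rule.built_on_version:
            # The foundation moved but the rule did not: stop running it
            # on outdated logic until a reviewer re-validates it.
            rule.status = "on_hold_pending_review"
            held.append(rule.rule_id)
    return held


if __name__ == "__main__":
    rules = [
        Rule("R1", "modifier-policy", "v3"),
        Rule("R2", "readmission-policy", "v1"),
        Rule("R3", "retired-policy", "v2"),
    ]
    current = {"modifier-policy": "v3", "readmission-policy": "v2"}
    print(hold_stale_rules(rules, current))  # ['R2', 'R3']
```

A version comparison only tells you that a rule's foundation moved, not whether its logic is still correct; a reviewer still has to decide whether to update, re-approve, or retire each held rule.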

What a smarter program actually looks like

Solving these problems requires rethinking the model from the ground up rather than patching a legacy approach. A genuinely better data mining program is built on a unified intelligence layer that connects claims data, policy logic, and decisioning, so that three capabilities can work in unison:

  1. Self-service control for PI and claims teams
    The IT dependency problem doesn’t go away by hiring more SQL developers. It goes away when the people who understand the patterns (your claims analysts, reimbursement analysts, and senior rule authors) can build and iterate on rules themselves, without writing code. Self-service also dramatically accelerates how quickly new opportunities are identified and acted on. Instead of waiting weeks on ticket queues, development cycles, and multiple rounds of revisions, teams can move from hypothesis to live rule in hours.
  2. Transparency that enables action
    Every finding should answer three key questions: Why was the claim flagged? Which policy or rule was applied? Is this an isolated error or a systemic pattern? This level of transparency elevates data mining from a simple recovery operation to a powerful intelligence tool.
  3. Automated maintenance that keeps the program current
    The maintenance burden doesn’t disappear, but it can be automated. Policies are monitored continuously, reference tables are updated on schedule, and rules are flagged automatically when the policy they’re based on changes, then put on hold pending review rather than quietly running on outdated logic.
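To make the three transparency questions concrete, here is a minimal Python sketch of a finding record that carries its own explanation. Everything in it is an assumption for illustration: the `Finding` fields, the `explain_findings` function, and the rule of thumb that a provider triggering the same rule three or more times in one batch counts as "systemic." A real program would need a more careful definition of a systemic pattern than a simple count.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


# Hypothetical finding record; field names are illustrative, not a product schema.
@dataclass(frozen=True)
class Finding:
    claim_id: str
    provider_id: str
    rule_id: str
    policy_ref: str  # the policy the rule enforces
    reason: str      # plain-language explanation of why the claim was flagged


def explain_findings(findings: List[Finding], systemic_threshold: int = 3) -> List[Dict]:
    """Answer, for each finding: why flagged, which policy, isolated or systemic?

    A finding is labeled 'systemic' when the same provider triggered the
    same rule at least `systemic_threshold` times in this batch; the
    default of 3 is an arbitrary assumption for illustration.
    """
    counts = Counter((f.provider_id, f.rule_id) for f in findings)
    report = []
    for f in findings:
        n = counts[(f.provider_id, f.rule_id)]
        report.append({
            "claim_id": f.claim_id,
            "why_flagged": f.reason,
            "policy": f.policy_ref,
            "pattern": "systemic" if n >= systemic_threshold else "isolated",
            "occurrences": n,
        })
    return report


if __name__ == "__main__":
    batch = [
        Finding(f"C{i}", "P1", "DUP-01", "duplicate-claims-policy", "Same service billed twice")
        for i in range(3)
    ] + [Finding("C9", "P2", "DUP-01", "duplicate-claims-policy", "Same service billed twice")]
    for row in explain_findings(batch):
        print(row["claim_id"], row["pattern"], row["policy"])
```

Grouping by provider and rule is what turns a flagged claim into something actionable: a systemic label points toward provider education or a new pre-payment rule, while an isolated one may only call for recovery.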

Agentic AI takes this further: error-category-specific agents that identify patterns, share results, and automatically update supporting documentation and rules. The vision is a program that requires minimal human touch to maintain and gets more accurate over time, not one that degrades in the background until someone notices the savings are slipping.

From recovery engine to learning system

Health plans shouldn’t ask, “Are we finding overpayments?” Nearly every plan is. The better question is: “Are we getting smarter with every finding?”

Real progress comes from understanding why errors occur, keeping rules aligned with evolving policies, and building the institutional knowledge needed to reduce future exposure, rather than only recovering from past mistakes.

Asked whether they are getting smarter with every finding, most plans would honestly have to answer no. Traditional data mining models weren’t built to learn. They were built to find overpayments, generate savings, and repeat the cycle.

But the future of payment integrity belongs to programs that do more than detect errors. It belongs to programs that surface root causes, adapt as policies evolve, and turn every finding into an opportunity to improve. The plans that build lasting payment accuracy won’t be the ones that recover the most; they’ll be the ones that learn the fastest.

About Cohere Surface™

Cohere Surface is a next-generation, rules-based claims intelligence solution designed to give health plans complete visibility and control over their data mining programs. It offers self-service tools, agentic AI, and total transparency into every finding. With flexible deployment models (in-house, hybrid, or services-supported), Surface meets health plans wherever they are in their payment integrity journey.

If your data mining program is still finding the same errors year after year, it may be time for a smarter model. Explore how Cohere Surface helps health plans turn every finding into lasting payment accuracy.


Written by

Lalithya Yerramilli, Cohere Health

Lalithya Yerramilli is a healthcare technology leader with over 20 years of experience driving innovation at the intersections of analytics, payment integrity, and care optimization. She founded Zigna AI, where she focused on bringing transparency and efficiency to healthcare reimbursement. Previously, she was a member of the executive leadership team at SCIO Health Analytics, leading a global analytics organization and driving growth to support a $1B pipeline. Lalithya holds a master’s in Engineering, Quantitative Finance, and Statistics from Rutgers University, and an undergraduate degree in Engineering from Osmania University.
