Smarter Data Mining for Modern Payment Integrity

Published: May 15, 2026

If your payment integrity program includes data mining, you're likely to uncover a steady stream of overpayments. From duplicate payments and modifier errors to overlooked readmissions and billing discrepancies that violate contract terms, the findings pour in month after month.

The uncomfortable truth is that most data mining programs aren’t designed to get smarter. They’re designed to find the same errors over and over, and in many cases, the vendors running them have every financial reason to keep it that way. Add IT bottlenecks, stale rules, and limited visibility into what’s actually driving the findings, and you have a program that keeps generating familiar findings while quietly losing ground.

This is the state of data mining today. And it doesn’t have to be.

The contingency trap: Built to recycle, not improve

For decades, legacy data mining vendors have dominated the market using a contingency-based compensation model: they take a percentage of the savings they find for you. On the surface, the incentives seem aligned, since they only get paid when you save money. A closer look reveals a different story.

Contingency-based vendors have a built-in conflict of interest: they profit from the errors they are hired to catch. Their revenue model relies on recurring billing errors, as each recovery incurs a fee. Consequently, the better your payment program becomes, and the more errors are prevented at the source, the less these vendors earn. Activities that create long-term value for you–such as identifying new error patterns, educating providers, or implementing new pre-payment rules–directly undermine their business model. As a result, they have little incentive to prioritize fixing the root cause of the problem.

This leads to an outcome many payment integrity (PI) leaders privately acknowledge: receiving high-level monthly summaries with limited detail on what drove the findings and no clear path to preventing recurrence. They know what was found, but not why it keeps happening.

This isn't an accident; it's a structural feature of the contingency model. As a result, health plans are left with a program optimized for vendor revenue, rather than long-term payment accuracy.

The hidden cost of opacity

Even for health plans running data mining in-house, the visibility problem persists. Most internal programs rely on SQL developers and IT resources to build and run rules. When a claims analyst identifies a potential overpayment pattern, the workflow looks like this: identify the issue, submit an IT ticket, wait weeks for a rule to come back, iterate through multiple rounds of revision, and, in many cases, reprocess historical claims to uncover prior exposure. By the time the rule is live, the billing pattern may have shifted.

The bottleneck compounds a second problem: understanding the results. When a claim is flagged, can your team determine why immediately? Can they identify the policy behind the rule? Can they tell if it’s a one-off error or part of a larger pattern? For most plans, the honest answer is no, at least not without a slow, painstaking manual review.

Health plans have always known what to look for; the challenge has been to act quickly enough on that knowledge. Vendor reporting often provides only high-level error categories, without surfacing provider-level patterns or the policy-level insights needed for meaningful, proactive change.

This lack of visibility makes three things impossible:

Educating providers effectively and improving payment accuracy
Implementing upstream interventions
Validating vendor performance

The maintenance problem no one talks about

There’s a slow deterioration in almost every data mining program, and it rarely gets flagged until savings start to slip noticeably. It’s rule maintenance, or rather, the lack of it. Data mining rules are built on foundations that constantly shift–CMS guidance, payment policies, fee schedules, contract terms, code sets, and more. When this foundation changes, but the rule doesn't, the rule grows stale. It might still generate results that appear normal, but its accuracy silently erodes, causing it to miss new patterns or to flag issues that are no longer relevant.

This challenge is especially pressing for plans that manage multi-state Medicaid programs, where state-by-state policy variations make rule maintenance a near-continuous requirement. But it affects any plan with a significant rule library. The business consequences are real, yet often invisible. A program that performed well two years ago, with well-maintained rules aligned with the policies then in force, may operate with much lower fidelity today simply due to neglect. As rules become outdated, the program's exposure grows while its effectiveness erodes.

For many health plans, rule maintenance can account for 25–40% of a data mining program’s resource allocation. That’s a significant operational burden that most plans either underestimate, under-resource, or defer until performance starts to slip.

What a smarter program actually looks like

To solve these problems, we need to rethink the model from the ground up, rather than patching a legacy approach.

A smarter data mining program is a claims intelligence system built on a connected data fabric, one that unifies structured and unstructured data across claims, contracts, benefits, policies, provider behavior, and clinical documentation to improve the identification and prevention of overpayments. Instead of working from fragmented raw data and static reports, PI and claims teams operate from a continuously updated, transparent view of payment behavior, where findings are already tied to the underlying policy, contract, and claim context that produced them.

This shift enables faster, more accurate action on claims—not just better reporting or more efficient workflows. It allows teams to move from identifying an issue to understanding and acting on it in the same environment, without relying on separate IT-driven interpretation layers or disconnected systems.

Built on that foundation, three capabilities can work in unison:

Self-service control for PI and claims teams
The IT dependency problem doesn’t go away by hiring more SQL developers. It goes away when people who understand payment patterns–your claims analysts, reimbursement analysts, and senior rule authors–can build and refine rules directly.

To achieve this level of self-service, you need a connected data fabric that turns raw claims, policy, contract, and clinical data into business-ready intelligence that teams can understand and act on without technical help. When the underlying data is already linked and contextualized, business users can go straight from finding an overpayment pattern to creating and deploying a rule that impacts claims adjudication, cutting the process from weeks to hours.

Transparency that enables action
Every finding should answer three key questions: Why was the claim flagged? Which policy or rule was applied? Is this an isolated error or a systemic pattern? This turns data mining from a retrospective recovery tool into a decision-making system for improving payment accuracy.
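
As a rough illustration of those three questions, here is a minimal Python sketch of a finding record that carries its own reason and policy reference, plus a simple count-based check for systemic patterns. Everything in it is an assumption made for illustration: the `Finding` fields, the `classify_patterns` helper, the threshold of three, and the sample data are hypothetical and do not describe any particular product.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One flagged claim, carrying the context needed to act on it."""
    claim_id: str
    provider_id: str
    rule_id: str     # which rule fired
    policy_ref: str  # policy or contract clause behind the rule
    reason: str      # plain-language explanation of the flag


def classify_patterns(findings, systemic_threshold=3):
    """Label each (provider, rule) pair as 'isolated' or 'systemic'.

    systemic_threshold is an illustrative cutoff, not an industry standard.
    """
    counts = Counter((f.provider_id, f.rule_id) for f in findings)
    return {
        pair: "systemic" if n >= systemic_threshold else "isolated"
        for pair, n in counts.items()
    }


# Made-up sample data for illustration only.
findings = [
    Finding("C-001", "PRV-A", "DUP-01", "Policy 4.2", "Same service billed twice for one date"),
    Finding("C-002", "PRV-A", "DUP-01", "Policy 4.2", "Same service billed twice for one date"),
    Finding("C-003", "PRV-A", "DUP-01", "Policy 4.2", "Same service billed twice for one date"),
    Finding("C-004", "PRV-B", "MOD-07", "Policy 9.1", "Modifier not payable with this code"),
]

patterns = classify_patterns(findings)
for f in findings:
    label = patterns[(f.provider_id, f.rule_id)]
    print(f"{f.claim_id}: {f.reason} [{f.rule_id}, {f.policy_ref}] -> {label}")
```

A real program would likely weigh dollar amounts, time windows, and claim volume rather than a raw count, but the point stands: when each finding carries its rule, policy, and reason, the isolated-versus-systemic question becomes a simple aggregation.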

Automated maintenance that keeps the program current
The maintenance burden doesn’t disappear, but it can be managed continuously. Policies are monitored in real time, reference data stays current, and rules are automatically flagged when underlying policies change–preventing outdated logic from driving incorrect adjudication decisions.

Automation becomes even more powerful when combined with agentic AI: error-category-specific agents that identify patterns, surface insights, and update supporting logic and documentation. The result is a system that improves over time rather than degrading between review cycles.
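
The idea of flagging rules when their underlying policies change can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the `Rule` fields, integer policy versions, and the `find_stale_rules` helper are hypothetical, and a production system would track far richer policy metadata.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    rule_id: str
    policy_id: str         # policy the rule's logic depends on
    reviewed_version: int  # policy version the rule was last validated against


def find_stale_rules(rules, current_versions):
    """Return rules whose policy has changed since the rule was last reviewed."""
    stale = []
    for rule in rules:
        current = current_versions.get(rule.policy_id)
        # A policy missing from the feed is also flagged, so a retired
        # policy does not silently keep driving decisions.
        if current is None or current > rule.reviewed_version:
            stale.append(rule)
    return stale


# Made-up sample data for illustration only.
rules = [
    Rule("DUP-01", "POL-DUPLICATES", reviewed_version=3),
    Rule("MOD-07", "POL-MODIFIERS", reviewed_version=5),
    Rule("RDM-02", "POL-READMISSIONS", reviewed_version=1),
]
current_versions = {"POL-DUPLICATES": 3, "POL-MODIFIERS": 6}

for rule in find_stale_rules(rules, current_versions):
    print(f"Review needed: {rule.rule_id} (depends on {rule.policy_id})")
```

Here `MOD-07` is flagged because its policy moved from version 5 to 6, and `RDM-02` is flagged because its policy no longer appears in the current feed.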

From recovery engine to learning system

Health plans shouldn’t ask, “Are we finding overpayments?” Nearly every plan is. The better question is: “Are we getting smarter with every finding?”

Real progress comes from understanding why errors occur, keeping rules aligned with evolving policies, and building the institutional knowledge needed to reduce future exposure–not just to recover from past mistakes.

For most plans, the honest answer to that question is no. Traditional data mining models weren’t built to learn. They were built to find overpayments, generate savings, and repeat the cycle.

But the future of payment integrity belongs to programs that do more than detect errors. It belongs to programs that surface root causes, adapt as policies evolve, and turn every finding into an opportunity to improve. The plans that build lasting payment accuracy won’t be the ones that recover the most–they’ll be the ones that learn the fastest.

About Cohere Surface™

Cohere Surface is a next-generation, rules-based claims intelligence solution designed to give health plans complete visibility and control over their data mining programs. It offers self-service tools, agentic AI, and total transparency into every finding. With flexible deployment models–whether in-house, hybrid, or services-supported–Surface meets health plans wherever they are in their payment integrity journey.

If your data mining program is still finding the same errors year after year, it may be time for a smarter model. Explore how Cohere Surface helps health plans turn every finding into lasting payment accuracy.

Written by Lalithya Yerramilli

Lalithya Yerramilli is a healthcare technology leader with over 20 years of experience driving innovation at the intersection of analytics, payment integrity, and care optimization. She founded Zigna AI, where she focused on bringing transparency and efficiency to healthcare reimbursement. Previously, she was a member of the executive leadership team at SCIO Health Analytics, leading a global analytics organization and driving growth to support a $1B pipeline. Lalithya holds a master’s in Engineering, Quantitative Finance, and Statistics from Rutgers University, and an undergraduate degree in Engineering from Osmania University.