Why We Built Opelyx: Structured Data for Health Insurance
The first time I tried to look up health insurance plan pricing programmatically, I spent three hours downloading ZIP files from CMS.gov, wrote a script to parse the CSV, hit three different column naming inconsistencies between states, and ended up with data I still wasn’t sure was correct. That was 2024. The situation hasn’t materially improved.
Health insurance is one of the largest consumer expenditures in the United States. The average family premium for employer-sponsored coverage topped $25,000 in 2024. Marketplace plans aren’t cheaper. And yet, if you’re a developer trying to build anything with this data (a benefits comparison tool, a broker dashboard, a cost estimator), you’re essentially starting from scratch every time.
The Data Exists, But It’s Not Usable
The Centers for Medicare and Medicaid Services (CMS) publishes something called Public Use Files (PUFs) for ACA marketplace plans. These cover plan attributes, rates, benefits, service areas, and machine-readable URLs. Technically, everything you need is in there.
In practice, working with the raw PUFs is genuinely painful:
- They’re released once per year, usually in May, for the following plan year
- The format is a collection of multiple CSVs that need to be joined on different key fields
- Column names are inconsistent across file types: PlanId in one file, plan_id in another, StandardComponentId in a third
- Rate records use age ranges as column headers in one format and row-per-age in another, depending on the year
- The tobacco preference variants double or triple your row count and have to be deduplicated correctly
- Most critically: CMS PUF data only covers the 30 FFM (Federally Facilitated Marketplace) states. The 21 state-based marketplaces (California, New York, Colorado, etc.) publish their own data in their own formats, with their own schemas, on their own schedules
That last point is the one that kills most projects. You can process the FFM data and think you have a national health plan API. Then a user in California or New York tries it and gets nothing.
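To make the column-naming problem from the list above concrete, here is a minimal sketch of one way to handle it: an alias table that maps every spelling to a single canonical name before any joins happen. The alias table and function names are hypothetical, and treating the three plan-ID spellings as the same key is an assumption; in real files the ID values themselves may also differ in format, so check before joining on them.

```python
import csv
import io
from typing import Dict, List

# Hypothetical alias table: each spelling seen in the raw files maps to one
# canonical name. The three plan-ID spellings come from the list above;
# treating them as the same join key is an assumption to verify.
COLUMN_ALIASES = {
    "planid": "plan_id",
    "plan_id": "plan_id",
    "standardcomponentid": "plan_id",
}


def canonical_name(raw: str) -> str:
    """Return the canonical column name, or a lowercased copy if unknown."""
    key = raw.strip().lower()
    return COLUMN_ALIASES.get(key, key)


def read_normalized(text: str) -> List[Dict[str, str]]:
    """Parse CSV text and rename every column to its canonical form."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {canonical_name(col): value for col, value in row.items()}
        for row in reader
    ]


# Two files that spell the plan key differently end up with the same column.
attributes = read_normalized("PlanId,MetalLevel\n123,Silver\n")
rates = read_normalized("StandardComponentId,Rate\n123,400.0\n")
```

Keeping the aliases in one table means a new spelling in next year’s files is a one-line change rather than a fix scattered across every join.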
The Vendor Landscape
If you don’t want to deal with raw PUFs, there are vendors. Ideon (formerly Vericred) is the main one. They have a well-structured API with good documentation. Their plans start at around $10,000 per year and require a sales call to get started. Ribbon Health is similar — excellent data quality, enterprise contracts.
These are the right product for the right customer. If you’re building software for a large insurance carrier or a major benefits administration platform, the pricing makes sense. The problem is that a lot of the people who need this data aren’t at those companies. They’re:
- Individual developers building side projects or early-stage startups
- Researchers who need programmatic access for analysis
- Small brokerages that want to build internal tooling
- Healthcare consultants writing one-off scripts to answer client questions
For all of those use cases, a $10K/year floor with a sales call gate is effectively “no.” There’s no self-serve trial, no free tier, no way to evaluate the data quality without going through procurement.
What We Actually Built
Opelyx’s Health Plans API is our answer to this. The core product is a REST API that exposes:
- Plan search by ZIP code, plan type (HMO, PPO, EPO, POS, Indemnity), metal tier, and price range
- Age-rated premiums (we handle the lookup so you don’t have to)
- Plan detail with deductibles, OOP maximums, copays, coinsurance
- Side-by-side plan comparison for 2-4 plans
- Out-of-pocket cost estimation using SBC scenario data
- ZIP-to-rating-area resolution
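As a rough illustration of what the ZIP-to-rating-area and age-rated premium lookups involve, the sketch below chains the two steps. It is not the API’s actual implementation: the sample data, the function names, and the age bands ("0-14", single years, "64 and over") are all assumptions, and it pretends each ZIP maps to exactly one rating area, which will not hold for every ZIP.

```python
from typing import Optional

# Hypothetical sample data; real tables would come from the ingested files.
# Assumes one rating area per ZIP, which is a simplification.
ZIP_TO_RATING_AREA = {"99501": ("AK", "Rating Area 1")}

# Monthly premiums keyed by (plan_id, rating_area, age_band). Values are made up.
RATES = {
    ("PLAN-A", "Rating Area 1", "0-14"): 250.00,
    ("PLAN-A", "Rating Area 1", "40"): 480.00,
    ("PLAN-A", "Rating Area 1", "64 and over"): 1100.00,
}


def age_band(age: int) -> str:
    """Map an age to a rate-table band (assumed bands, see lead-in)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= 14:
        return "0-14"
    if age >= 64:
        return "64 and over"
    return str(age)


def monthly_premium(plan_id: str, zip_code: str, age: int) -> Optional[float]:
    """Resolve ZIP -> rating area, then look up the age-rated premium."""
    area = ZIP_TO_RATING_AREA.get(zip_code)
    if area is None:
        return None  # unknown ZIP
    _state, rating_area = area
    return RATES.get((plan_id, rating_area, age_band(age)))
```

Returning None for an unknown ZIP or a missing rate keeps "no data" distinct from a zero premium, which matters when coverage differs by state.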
The data pipeline ingests CMS PUF CSVs, normalizes the rate records, joins across file types, deduplicates tobacco variants, and loads into Cloudflare D1. We handle the state-based marketplace (SBM) states separately: some through state exchange PUFs where they exist, some through SERFF filing data for states that publish it publicly.
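The tobacco deduplication and load steps can be sketched roughly as follows. This is an illustration under stated assumptions, not our production pipeline: the row shape, the table schema, and the function names are hypothetical, and an in-memory SQLite database stands in for D1 (which is SQLite-based).

```python
import sqlite3

# Hypothetical normalized rows: one per (plan, rating area, age, tobacco flag).
raw_rows = [
    {"plan_id": "PLAN-A", "rating_area": "RA1", "age": "40", "tobacco": False, "rate": 480.0},
    {"plan_id": "PLAN-A", "rating_area": "RA1", "age": "40", "tobacco": True, "rate": 560.0},
    # Exact duplicate of the first row, as can happen after joining files.
    {"plan_id": "PLAN-A", "rating_area": "RA1", "age": "40", "tobacco": False, "rate": 480.0},
]


def collapse_tobacco(rows):
    """Merge tobacco and non-tobacco variants into one record per key."""
    merged = {}
    for row in rows:
        key = (row["plan_id"], row["rating_area"], row["age"])
        record = merged.setdefault(key, {"rate": None, "tobacco_rate": None})
        field = "tobacco_rate" if row["tobacco"] else "rate"
        record[field] = row["rate"]  # a later duplicate overwrites an earlier one
    return [key + (rec["rate"], rec["tobacco_rate"]) for key, rec in merged.items()]


def load(rows):
    """Load collapsed rows into an in-memory SQLite table (stand-in for D1)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE rates (
               plan_id TEXT, rating_area TEXT, age TEXT,
               rate REAL, tobacco_rate REAL,
               PRIMARY KEY (plan_id, rating_area, age))"""
    )
    conn.executemany("INSERT INTO rates VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


conn = load(collapse_tobacco(raw_rows))
```

The primary key on (plan_id, rating_area, age) acts as a safety net: if the collapse step ever lets a duplicate through, the insert fails loudly instead of silently doubling the row count.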
The current database has 22,000+ plans, 1.4 million rate records, and covers all 51 jurisdictions (50 states plus D.C.). Getting from 30 FFM states to 51 required significant per-state work — California alone runs on Covered California’s XLSX exports, which have a completely different schema from the FFM CSVs.
The Pricing Philosophy
We start at $0. There’s a free tier with 100 API calls per day. No sales call, no trial request form, no waiting for a rep to get back to you. You sign up, get an API key, and start making requests.
The reason we can do this where Ideon can’t is infrastructure costs. Running this on Cloudflare Workers + D1 at the edge means our marginal cost per request is measured in fractions of a cent. We don’t need to charge enterprise prices to cover a team of account executives and a data center lease.
The data processing pipeline — downloading CSVs, normalizing, loading into D1 — is compute-intensive but it runs once per plan year. After that, serving requests is essentially free at our scale.
What’s Next
We’re planning to expand into additional financial data verticals. The MCP server we’ve already shipped has tools for SEC EDGAR filings and prediction markets. But health insurance is where we’re investing the most deeply, because it’s where the data gap is largest and the need is most acute.
If you’ve ever tried to build something with health plan data and hit the same walls I did, the API is live at api.opelyx.com. Documentation is there, along with a Scalar UI where you can make live requests without writing any code.
The problem is worth solving. We’re solving it.