Building a Health Plan Comparison API: Architecture Decisions

When I started building the Health Plans API, I had a choice to make upfront: traditional server infrastructure (a VPS or container cluster running a Node.js app backed by PostgreSQL), or something built for the edge. I went edge-first with Cloudflare Workers, D1, and Hono. Eighteen months later, I’d make the same choice again.

Here’s the architecture and why each piece is there.

Why Cloudflare Workers

The health plan data use case has an interesting access pattern. Usage spikes hard during ACA open enrollment (November 1 through January 15) and is relatively low the rest of the year. A traditional server setup means you’re either over-provisioned for most of the year or scrambling to scale up when enrollment opens.

Workers handles this natively. There are no instances to provision, no auto-scaling groups to configure, no cold start penalties beyond a few milliseconds on the first request to a new isolate. The Cloudflare network runs in 300+ data centers globally, so a request from a developer in London gets routed to a nearby edge node rather than bouncing to a US-east data center.

The cost model also works better for our pricing structure. We offer a free tier with 100 requests per day. On traditional infrastructure, serving free-tier traffic at scale has a real cost. On Workers, the cost per request is so low that free tier traffic is genuinely sustainable.

D1 for the Data Layer

Cloudflare D1 is SQLite at the edge, replicated globally and queryable from Workers. The 275MB database with 1.4M rate records and 22K plans lives in D1.

SQLite’s query planner handles our access patterns well. The primary query — plans by rating area + age + filters — uses a composite index on (state_code, rating_area_id, age, tobacco) in the rates table and a separate index on (state_code, metal_level, plan_type) in the plans table. On D1, this query runs in under 10ms.

The tradeoff D1 makes is that it’s read-optimized. Writes are slower and more constrained than PostgreSQL. For a dataset that updates once per year and is read millions of times, that’s a good tradeoff. The data pipeline that generates SQL files and imports them is a separate offline process; the API itself is purely reads.

One D1 quirk worth knowing: concurrent writes fail with a “currently processing a long-running import” error. For our use case, this doesn’t matter — the API is read-only and data loads happen in a separate offline pipeline. But if you’re building something that needs real-time writes, D1 is not the right choice today.

Hono for Routing

Hono is a TypeScript-first, edge-native HTTP framework. It runs in Workers without any compatibility shims, which matters because some Node.js-oriented frameworks have significant startup overhead when loaded into a V8 isolate.

The thing I value most about Hono is the type-safe routing. Route handlers get typed request and response objects derived from the route definition. When a route parameter is defined, the handler knows it’s a string and TypeScript enforces it:

app.get("/v1/health/plans/:id", async (c) => {
  const { id } = c.req.param(); // typed as string, not unknown
  const year = parseYear(c.req.query("year"));
  const db = createDb(getHealthDb(c.env, year));
  const plan = await getPlanById(db, id);
  if (!plan) return notFoundError(c, `Plan ${id} not found.`);
  return c.json(plan);
});

Middleware stacks cleanly in Hono. CORS, Bearer auth validation, and rate limiting all compose as middleware applied to the /v1/* route group, with the public documentation routes (/docs, /openapi.json, /health) excluded.

The Plan Search Endpoint

The search endpoint at GET /v1/health/plans is the most used endpoint and the most complex query. The parameters:

zip (required) — 5-digit ZIP code
age (required) — applicant age for premium rating
metal_level (optional) — bronze, silver, gold, platinum, catastrophic
plan_type (optional) — hmo, ppo, epo, pos, indemnity
max_premium (optional) — filter by monthly premium ceiling
per_page (optional, default 25, max 100) — page size
cursor (optional) — cursor for pagination

The execution path:

Validate all parameters with Zod at the route boundary
Look up ZIP in the zip_rating_area table to get state + rating area ID
Query plans joined to rates filtered by rating area, age, tobacco preference
Apply any additional filters (metal tier, plan type, max premium)
Apply cursor-based pagination
Return results with next cursor if more pages exist

Cursor pagination deserves some explanation. We don’t use LIMIT/OFFSET because at large offsets, SQLite has to scan all skipped rows. For a result set of 500 plans, OFFSET 400 LIMIT 25 scans 425 rows. Cursor pagination uses a stable sort key (we sort by the active sort_by column ASC, then plan_id ASC as a tiebreaker) and the cursor encodes the last seen values:

const json = JSON.stringify({ v: sortValue, id: planId });
const cursor = btoa(json)
  .replace(/\+/g, "-")
  .replace(/\//g, "_")
  .replace(/=+$/, "");

This uses web-standard btoa rather than Node.js Buffer since Workers runs on the V8 runtime. The v field holds the sort column value (premium, deductible, max OOP, or name depending on sort_by), and id is the plan ID tiebreaker.

The next query uses a compound OR condition because SQLite does not support tuple comparison syntax:

WHERE (sort_column > :cursor_value)
   OR (sort_column = :cursor_value AND plan_id > :cursor_plan_id)

D1 can satisfy this with an index scan starting at the cursor position. Constant-time pagination regardless of how deep you are in the result set.

Plan Comparison

The comparison endpoint (GET /v1/health/plans/compare) takes 2-4 plan IDs, a ZIP, and an age, and returns a structured side-by-side comparison. All endpoints in the API are GET requests since this is a read-only data service. The response includes for each plan:

Monthly premium for the requested age and rating area
Deductible (individual and family)
Out-of-pocket maximum (individual and family)
HSA eligibility and national network status
SBC scenario costs (having a baby, managing diabetes)
External links (benefits summary, brochure, formulary, provider directory)

The response also includes a comparison_summary with the lowest_premium_plan_id, lowest_deductible_plan_id, lowest_moop_plan_id, and hsa_eligible_count across the compared plans.

The data for this comes from joining the plan’s rate record (for the premium at the ZIP’s rating area) with the plan cost sharing data (for deductible and MOOP fields) and SBC scenario data stored on the plan record itself.

One implementation decision: we cap comparison at 4 plans. The comparison response is already complex. Adding a 5th or 6th plan makes the response unwieldy for most callers, and the data joins get proportionally more expensive. Four is the right number for a UI comparison table; if you need more, paginate through search results.

Cost Estimation

The estimate-oop endpoint uses Summary of Benefits and Coverage (SBC) scenario data. CMS requires every plan to publish cost estimates for standard medical scenarios. We store these estimates per plan and use them to answer “what would this plan cost if I had a baby?” or “what would it cost to manage type 2 diabetes for a year?”

The core function is calculateOopEstimate(scenario, inputs), where inputs is a PlanCostInputs object carrying the plan’s metal level, annual premium, deductible, MOOP, and all SBC scenario fields (deductible, copayment, coinsurance, limit for both having-baby and having-diabetes). The return type is an OopEstimate with estimated_annual_cost, sbc_data_available, a breakdown (annual premium, estimated OOP, deductible applied, coinsurance applied), and notes explaining the calculation.

For SBC scenarios (having_baby, having_diabetes), the function returns CMS-provided cost data directly when available. When SBC data is absent, it falls back to a heuristic model:

const afterDeductible = Math.max(0, totalAllowed - deductibleIndividual);
const deductibleApplied = Math.min(totalAllowed, deductibleIndividual);
const coinsuranceAmount = afterDeductible * coinsuranceRate;
const rawOop = deductibleApplied + coinsuranceAmount;
const cappedOop = Math.min(rawOop, moopIndividual);

The heuristic uses metal-level actuarial values for the coinsurance rate (Bronze 40%, Silver 30%, Gold 20%, Platinum 10%) and scenario-appropriate total allowed charges ($16,500 for having a baby, $10,000 for managing diabetes). The same model also powers low, medium, and high utilization scenarios with different allowed-amount assumptions. Every estimate is capped at the plan’s out-of-pocket maximum and includes the sbc_data_available flag so callers know whether they’re looking at CMS data or a heuristic.

Response Latency

In production, the p50 response time for plan search is around 35ms measured at the Cloudflare edge. The p99 is under 120ms. There are no cold starts in the traditional sense — Workers isolates stay warm as long as traffic is flowing, and even a cold isolate initializes in under 10ms.

This latency profile is why we chose this architecture. A health plan comparison UI should feel instant. If every plan search takes 300-400ms, the product feels slow regardless of how good the data is. Edge-native deployment with a local D1 replica is what makes 35ms achievable at global scale without custom CDN configuration or multi-region database replication headaches.