mcp trading ai-agents

Real-Time Financial Data via MCP: Prediction Markets, SEC EDGAR, and More

Mark at Opelyx

The Model Context Protocol didn’t exist eighteen months ago. Now it’s how AI agents consume external data, and building an MCP server is meaningfully different from building a REST API — different enough that the architecture decisions don’t map cleanly from one to the other.

Here’s how we built Opelyx’s MCP server: what the 13 tools do, how the infrastructure works, the specific problems we ran into with rate limiting from Cloudflare IPs, and why MCP is the right interface for agent consumption even though REST would have been easier to build.

What the 13 Tools Actually Do

The tools group into three categories.

SEC EDGAR tools. EDGAR is the SEC’s public filing database — every public company’s 10-K, 10-Q, 8-K, proxy statement, and more, going back decades. We expose five tools here:

- ticker-to-CIK lookup: resolves a ticker symbol to EDGAR’s internal company identifier
- full-text filing search: query by company name, keywords, form type, and date range
- company filing history: retrieves all recent filings for a company
- XBRL financial data: structured balance sheet, income statement, and cash flow data pulled from the company’s XBRL tags
- insider transactions: Form 4 filings that disclose buys and sells by executives and directors

The XBRL tool deserves a note. XBRL standardization is imperfect — the “Revenue” concept, for example, is filed under different tags depending on the company and sometimes the year. We defaulted to NetIncomeLoss as the primary concept for income data after finding that Revenues was missing for a substantial portion of companies. We document the concept used in the tool response.
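A minimal sketch of that fallback logic, as an ordered lookup over the company's reported concepts. The helper name and the candidate tag list are illustrative, not our exact implementation; only the NetIncomeLoss default comes from the behavior described above.

```typescript
// Illustrative helper: pick the first XBRL concept actually present in a
// company's facts, falling back through common us-gaap revenue variants.
type Facts = Record<string, unknown>;

// Revenue is tagged inconsistently across filers and years; these are
// common variants, tried in order. (List is illustrative.)
const REVENUE_CONCEPTS = [
  "Revenues",
  "RevenueFromContractWithCustomerExcludingAssessedTax",
  "SalesRevenueNet",
];

function pickConcept(facts: Facts, candidates: string[], fallback: string): string {
  for (const c of candidates) {
    if (c in facts) return c; // first concept the filer actually reported
  }
  return fallback; // e.g. "NetIncomeLoss" as the default for income data
}
```

Whichever concept wins, the tool response names it, so the agent knows which tag the numbers came from.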

Prediction market tools. Three markets, five tools: Kalshi, Polymarket, and Manifold. Polymarket has two tools — market search and an events tool for grouped collections of related markets. Kalshi and Manifold each have a search tool. A fifth cross-market tool, compare_prediction_markets, queries all three sources simultaneously for a given topic and returns the results side-by-side for easy consensus or divergence analysis. Prediction markets are the most time-sensitive data we serve — prices on active contracts move continuously. Our fresh TTL is 60 seconds; we serve stale data up to 1 hour if the source is unavailable.
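The cross-market tool's fan-out can be sketched with Promise.allSettled, so one failing source doesn't sink the whole comparison. The fetcher signatures here are illustrative stand-ins, not the actual per-market clients.

```typescript
// Sketch of compare_prediction_markets: query all sources in parallel,
// keep whatever succeeds, and report per-source failures.
type MarketResult = { source: string; markets: unknown[] };
type Fetcher = (topic: string) => Promise<unknown[]>;

async function comparePredictionMarkets(
  topic: string,
  fetchers: Record<string, Fetcher>, // e.g. { kalshi, polymarket, manifold }
): Promise<{ results: MarketResult[]; errors: string[] }> {
  const entries = Object.entries(fetchers);
  const settled = await Promise.allSettled(entries.map(([, f]) => f(topic)));
  const results: MarketResult[] = [];
  const errors: string[] = [];
  settled.forEach((s, i) => {
    const source = entries[i][0];
    if (s.status === "fulfilled") results.push({ source, markets: s.value });
    else errors.push(`${source}: ${String(s.reason)}`); // degrade, don't fail
  });
  return { results, errors };
}
```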

Government spending tools. The DOGE (Department of Government Efficiency) API tracks federal spending — contracts and grants data. We provide three tools: contract terminations, grant terminations, and a savings summary tool that aggregates across both endpoints to produce a combined total with a breakdown by agency. Fresh TTL is 600 seconds; this data doesn’t move fast.
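The savings-summary aggregation is simple enough to show in full. The field names below are assumptions about the DOGE API's response shape, not its documented schema.

```typescript
// Illustrative aggregation for the savings-summary tool: merge contract
// and grant terminations into a combined total plus a per-agency breakdown.
type Termination = { agency: string; savings: number }; // assumed field names

function summarizeSavings(contracts: Termination[], grants: Termination[]) {
  const byAgency = new Map<string, number>();
  let total = 0;
  for (const t of [...contracts, ...grants]) {
    total += t.savings;
    byAgency.set(t.agency, (byAgency.get(t.agency) ?? 0) + t.savings);
  }
  return { total, byAgency: Object.fromEntries(byAgency) };
}
```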

Why Durable Objects

The MCP protocol uses a persistent connection model. Clients connect, exchange messages, and tools are called within that session. This is fundamentally different from stateless HTTP — you can’t just spin up a random Worker instance per request, because the protocol state lives on the connection.

Cloudflare Durable Objects are the right solution: each MCP session gets a Durable Object instance, which provides a single-threaded JavaScript environment with its own SQLite storage (for session state if needed) and a stable address. The gateway Worker (index.ts) routes /mcp and /sse requests to the appropriate DO instance.
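The routing step boils down to deriving a stable session key and asking the Durable Object namespace for the instance at that key. This is a sketch: the namespace interface below is trimmed to the two methods the sketch needs, and the binding and header names are assumptions, not our actual ones.

```typescript
// Gateway routing sketch: same session key -> same Durable Object
// instance, so per-connection protocol state survives across messages.
interface SessionNamespace {
  idFromName(name: string): { toString(): string };
  get(id: { toString(): string }): { fetch(req: Request): Promise<Response> };
}

function sessionKey(req: Request): string {
  // One DO per session; fall back to a fresh key if the client didn't
  // supply a session header (header name is an assumption).
  return req.headers.get("mcp-session-id") ?? crypto.randomUUID();
}

async function routeToSession(ns: SessionNamespace, req: Request): Promise<Response> {
  const id = ns.idFromName(sessionKey(req));
  return ns.get(id).fetch(req); // forward into the session's DO
}
```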

The gateway also handles everything that should happen before the session starts: CORS (explicit origin allowlist, not wildcard), Bearer token auth, and rate limiting. The Durable Object sees only authenticated, rate-limited connections. This keeps the MCP server clean — it registers tools and handles calls, it doesn’t deal with auth logic.
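The pre-session checks reduce to a small pure function. The allowlist contents and the token-validation callback below are illustrative placeholders for the real configuration.

```typescript
// Gateway pre-checks sketch: origin allowlist, then Bearer token auth.
// Only requests that pass both ever reach a Durable Object.
const ALLOWED_ORIGINS = new Set(["https://app.example.com"]); // explicit, no wildcard

function gatewayCheck(
  origin: string | null, // null for non-browser clients with no Origin header
  authHeader: string | null,
  isValidToken: (t: string) => boolean,
): { ok: boolean; status?: number } {
  if (origin !== null && !ALLOWED_ORIGINS.has(origin)) return { ok: false, status: 403 };
  const token = authHeader?.startsWith("Bearer ") ? authHeader.slice(7) : null;
  if (!token || !isValidToken(token)) return { ok: false, status: 401 };
  return { ok: true };
}
```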

The Kalshi 429 Problem

Kalshi rate-limits aggressively from Cloudflare IP ranges. This makes sense — Cloudflare Workers share egress IPs across thousands of customers, so a rate-limit bucket that should apply to one origin ends up applying to many simultaneous unrelated callers.

Our mitigation is layered:

First, the Cache API. Workers have access to named cache stores via caches.open(), scoped to the Cloudflare datacenter. We cache successful Kalshi responses for 60 seconds in a named cache. Most requests within a datacenter hit the cache, not Kalshi’s servers.

Second, exponential backoff on 429s. If the cache misses and the live request gets rate-limited, we retry with exponential delays starting at 500ms (500ms, 1s, 2s). We cap at 3 retries before returning a stale cache hit (up to 1 hour old) if one exists.

Third, we treat stale data as better than an error. A prediction market price that’s 45 minutes old is not ideal, but it’s better than a 429 error reaching the agent. We mark stale responses in the tool output so the agent knows the data age and can surface that to the user if relevant.

Polymarket and Manifold don’t rate-limit Cloudflare IPs as aggressively, but they get the same caching treatment. The cache implementation is shared across all data sources via cachedFetch() in cache.ts.
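The layered flow can be sketched as one function: fresh cache first, then the live request with capped exponential backoff on 429s, then the stale fallback. The cache is abstracted behind a tiny interface here; the real cache.ts uses the Workers Cache API (caches.open), and this cachedFetch is a simplified stand-in for the shared implementation.

```typescript
// Sketch of the layered 429 mitigation described above.
interface TtlCache {
  get(key: string, maxAgeSec: number): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

async function cachedFetch(
  key: string,
  cache: TtlCache,
  fetchLive: () => Promise<{ status: number; body: string }>,
): Promise<{ body: string; stale: boolean }> {
  const fresh = await cache.get(key, 60); // 60s fresh TTL
  if (fresh !== null) return { body: fresh, stale: false };

  let delay = 500;
  for (let attempt = 0; attempt <= 3; attempt++) { // initial try + 3 retries
    const res = await fetchLive();
    if (res.status === 200) {
      await cache.put(key, res.body);
      return { body: res.body, stale: false };
    }
    if (res.status !== 429 || attempt === 3) break; // only 429s are retried
    await sleep(delay); // 500ms, 1s, 2s
    delay *= 2;
  }

  const stale = await cache.get(key, 3600); // serve up to 1h-old data
  if (stale !== null) return { body: stale, stale: true }; // flagged so the agent knows
  throw new Error("upstream unavailable and no cached copy");
}
```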

MCP vs REST for AI Agents

The question we get most is: why MCP at all? You could build a REST API and tell the AI to use it.

The answer is in how agents consume data. A REST API requires the agent to know the URL scheme, construct HTTP requests, parse response bodies, handle errors, and understand pagination. An LLM can do all of these things, but doing them correctly requires extensive prompt engineering, and the agent still makes mistakes: wrong URLs, missing required parameters, misinterpreted error responses.

MCP tools have typed parameters and structured return values. The tool schema tells the agent exactly what inputs are accepted and what type each one is. When the agent calls search_filings with company_id: "AAPL" and form_type: "10-K", the protocol validates the call before it reaches our code. The agent doesn’t have to guess the URL format.
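For concreteness, here is roughly what a tool declaration looks like: MCP tools advertise a JSON Schema for their inputs, and clients validate calls against it. The description strings and the exact property set below are illustrative, not our production schema.

```typescript
// Illustrative MCP tool declaration for search_filings. The inputSchema
// is standard JSON Schema; the client checks calls against it before
// anything reaches server code.
const searchFilingsTool = {
  name: "search_filings",
  description: "Full-text search over SEC EDGAR filings",
  inputSchema: {
    type: "object",
    properties: {
      company_id: { type: "string", description: "Ticker symbol or CIK" },
      form_type: { type: "string", description: "e.g. 10-K, 10-Q, 8-K" },
    },
    required: ["company_id"],
  },
} as const;
```

A call missing company_id, or passing a number where a string is expected, is rejected as a protocol error rather than surfacing as a confusing upstream failure.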

The return types are also structured. An XBRL financial tool returns a typed object with labeled fields, not a blob of JSON that the agent has to parse and interpret. The agent can reason about revenue: 391000000000 more reliably than it can parse "$.financialData[0].items[?(@.label=='Revenue')].value".

The practical result: agents using the MCP tools make fewer invalid calls, handle errors more gracefully, and produce more accurate responses than the same agents hitting a REST API. We measured this informally during development by comparing Claude’s output with MCP tool access versus web browsing + parsing for the same financial questions. The MCP path was faster and more accurate on structured data retrieval.

Rate Limiting for AI Traffic

AI agent traffic is bursty in a way that human API traffic isn’t. A user researching a company might ask three related questions in quick succession, each triggering 4-5 tool calls. That’s 15 tool calls in a few seconds.

Our rate limits are daily buckets: 100 calls/day for free tier, 10,000 for pro, 100,000 for enterprise. KV stores the counter per API key, incremented on each authenticated request. The KV key uses 16 hex chars from the SHA-256 of the API key — enough for collision resistance without bloating the KV namespace.

We deliberately chose daily buckets over per-minute or per-hour buckets for the MCP server. Per-minute limits punish bursty agent sessions. A developer building a portfolio analysis agent shouldn’t get throttled because they ran a deep research session at 11pm, even if they’d normally stay well under daily limits.

What We’d Build Differently

One thing: we’d add streaming tool results from the start. Some EDGAR searches return large result sets — dozens of filing records. The current tools return complete responses, which works, but streaming would let agents start processing results before the full response arrives. The MCP protocol supports streaming; we just haven’t implemented it yet. For prediction markets where we’re returning small payloads, it doesn’t matter. For EDGAR filing search on a company with 20 years of 10-K history, it would.

The other thing is observability granularity. We log every authenticated MCP connection and every tool call as structured JSON to Workers Logs. What we don’t log yet is the time breakdown: how long auth took, how long the cache check took, how long the upstream API call took. This would make it much easier to diagnose latency complaints for specific tools without reproducing the exact call sequence.
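The breakdown would likely take the shape of a small phase-timing wrapper like the sketch below (the helper name is illustrative). One Workers-specific wrinkle: Date.now() in Workers only advances across I/O, but since auth lookup, cache check, and upstream call are all I/O-bound, that's the right clock for these phases.

```typescript
// Illustrative phase timer: run a phase, record its duration into a
// shared timings object, and attach that object to the structured log line.
async function timed<T>(
  timings: Record<string, number>,
  phase: string,
  fn: () => Promise<T>,
): Promise<T> {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    timings[phase] = Date.now() - start; // recorded even when the phase throws
  }
}
```

Usage would look like: build one timings object per tool call, wrap each phase (`timed(timings, "auth", ...)`, `timed(timings, "cache", ...)`, `timed(timings, "upstream", ...)`), then include timings in the JSON log entry.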

Both of these are solvable without architectural changes. The foundation — Durable Objects for session state, shared cache layer, three-tier auth, explicit CORS — is holding up well.