Deep dive: two Irish MCPs under the hood
How Irish Rail MCP (live XML API) and HSE Service Finder MCP (curated dataset) are implemented on Cloudflare Workers — two contrasting patterns behind the same MCP contract.
The kid-friendly explainer is a good mental model, but here is what is actually going on under the hood for two contrasting examples — one that wraps a live upstream API (Irish Rail) and one that serves a curated static dataset (HSE). Both share the same transport and protocol layer.
Shared skeleton
All MCPs in the irish-mcps repo follow the same shape:
- Runtime: Cloudflare Workers (V8 isolates, no cold-start warmup, free tier).
- Transport: MCP Streamable HTTP — plain JSON-RPC 2.0 over POST to `/mcp`. No Durable Objects, no SSE, no WebSockets, no session state. Every call is stateless.
- Handler surface: `initialize`, `notifications/initialized` (returns 204), `ping`, `tools/list`, `tools/call`. Unknown methods return JSON-RPC error `-32601`.
- CORS: `Access-Control-Allow-Origin: *` so the in-browser playground at irishmcp.ie can call the worker directly.
- Entrypoint: `export default { fetch(req) }` with routes `/`, `/health`, `/mcp`.
- Registration: a row in the Supabase `mcps` table with `endpoint_url` pointing at `https://<subdomain>.irishmcp.ie/mcp`. The Next.js `/api/playground` route proxies browser calls through to the worker.
This is why both workers fit in a single src/index.ts file — the MCP plumbing is ~80 lines of JSON-RPC switch statement, and the rest is domain logic.
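The shared plumbing boils down to a dispatch like the following. This is a condensed sketch with illustrative names (`TOOLS`, `handleRpc`, the example tool), not the repo's actual code; only the method names and error codes come from the description above.

```typescript
// Sketch of the shared JSON-RPC switch. Each worker pairs this pure
// dispatcher with its own TOOLS map; the fetch() entrypoint just parses
// the POST body, calls handleRpc, and maps null to an HTTP 204.
type Rpc = { jsonrpc: "2.0"; id?: number | string | null; method: string; params?: any };

type ToolHandler = (args: Record<string, unknown>) =>
  Promise<{ content: { type: "text"; text: string }[] }>;

const TOOLS: Record<string, { description: string; handler: ToolHandler }> = {
  // illustrative tool, not from the repo
  ping_tool: {
    description: "Example tool",
    handler: async () => ({ content: [{ type: "text", text: "pong" }] }),
  },
};

export async function handleRpc(req: Rpc): Promise<object | null> {
  const reply = (result: unknown) => ({ jsonrpc: "2.0", id: req.id ?? null, result });
  const fail = (code: number, message: string) =>
    ({ jsonrpc: "2.0", id: req.id ?? null, error: { code, message } });

  switch (req.method) {
    case "initialize":
      return reply({ protocolVersion: "2024-11-05", capabilities: { tools: {} },
                     serverInfo: { name: "sketch", version: "0.0.1" } });
    case "notifications/initialized":
      return null; // caller turns this into a 204 No Content
    case "ping":
      return reply({});
    case "tools/list":
      return reply({ tools: Object.entries(TOOLS)
        .map(([name, t]) => ({ name, description: t.description })) });
    case "tools/call": {
      const tool = TOOLS[req.params?.name];
      if (!tool) return fail(-32602, `unknown tool: ${req.params?.name}`);
      try {
        return reply(await tool.handler(req.params?.arguments ?? {}));
      } catch (e) {
        return fail(-32000, String(e)); // thin error surface, no retries
      }
    }
    default:
      return fail(-32601, `method not found: ${req.method}`);
  }
}
```

Keeping the dispatcher pure (JSON in, JSON out) is what makes the whole worker testable without a Workers runtime.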
Example 1 — Irish Rail MCP (live upstream API)
Source: irish-mcps/irish-rail-mcp/src/index.ts
Upstream: http://api.irishrail.ie/realtime/realtime.asmx — a legacy ASP.NET SOAP-ish endpoint that returns XML (no JSON, no auth, no rate-limit docs). Endpoints used: getAllStationsXML, getStationDataByCodeXML, getCurrentTrainsXML, getTrainMovementsXML.
Data flow per tool call
- Worker receives a `tools/call` JSON-RPC POST.
- Dispatches to a tool handler (e.g. `getStationTrains`).
- Handler builds a URL with query params and `fetch()`es the Irish Rail XML endpoint with a `User-Agent: IrishMCP/1.0` header.
- Response body is parsed with a hand-rolled regex XML parser (`parseObjects(xml, tag, fields)`) — no `DOMParser` because Cloudflare Workers do not ship one, and pulling in a full XML lib for this shape is overkill. The parser loops over `<tag>...</tag>` blocks and extracts named child elements into a `Record<string, string>`.
- The parsed rows are formatted into a human-readable multi-line string and returned as `{ content: [{ type: "text", text }] }`.
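The parser's core idea fits in a dozen lines. This is a reconstruction of what a `parseObjects(xml, tag, fields)` shaped like the description above could look like, not the repo's actual implementation:

```typescript
// Regex "XML parser": scan <tag>...</tag> blocks, then pull the named
// child elements out of each block into a flat string record. Assumes
// the well-behaved, non-nested shape the Irish Rail API actually emits.
export function parseObjects(
  xml: string,
  tag: string,
  fields: string[],
): Record<string, string>[] {
  const out: Record<string, string>[] = [];
  const blockRe = new RegExp(`<${tag}[^>]*>([\\s\\S]*?)</${tag}>`, "gi");
  for (const [, block] of xml.matchAll(blockRe)) {
    const row: Record<string, string> = {};
    for (const f of fields) {
      const m = block.match(new RegExp(`<${f}[^>]*>([\\s\\S]*?)</${f}>`, "i"));
      row[f] = m ? m[1].trim() : ""; // missing element -> empty string
    }
    out.push(row);
  }
  return out;
}
```

The non-greedy `[\s\S]*?` is what keeps each match inside its own block; the trade-off is that nested same-name tags would break it, which this upstream never produces.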
Tools exposed
| Tool | Purpose | Notable params |
|---|---|---|
| `query` | Natural-language router — pattern-matches keywords to one of the specific tools | `query: string` |
| `get_all_stations` | Dumps ~145 stations with codes + GPS | none |
| `get_station_trains` | Arrivals/departures at a station | `station_code`, `mins_ahead` (capped at 90) |
| `get_current_trains` | All trains live on the network | `train_type`: A/M/D/S (all/mainline/DART/suburban) |
| `get_train_movements` | Full schedule + realtime status for one train | `train_id`, `train_date` ("DD MMM YYYY") |
Interesting bits worth stealing
- Station resolution is two-tier. `findStationCode()` first hits a hardcoded dictionary of ~60 common station names → 5-letter codes (fast, no network). On miss, `lookupStationCode()` falls back to `getAllStationsXML` and does a whole-phrase match against `StationDesc`/`StationAlias` with a custom `(?<![a-z])phrase(?![a-z])` boundary regex (plain `\b` does not work reliably around apostrophes and spaces in station names).
- The `query` tool is a deliberate ergonomics layer. The MCP spec wants discrete typed tools, but LLMs often generate sloppy free-form queries. The `query` tool accepts a string, pattern-matches intent ("movement" / "all station" / "current" / fallthrough to station lookup), and delegates. This cuts down on tool-picking mistakes by the model.
- No caching. Train positions change every few seconds, so every call hits upstream. Irish Rail's API has been stable under this load; if it were not, `caches.default` (the Workers Cache API) with a 10–30 s TTL would be the drop-in fix.
- Error surface is thin. `railGet` throws on non-2xx; the top-level `tools/call` catches and returns JSON-RPC `-32000` with the message. No retries — LLMs retry naturally if the response looks wrong.
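The two-tier lookup and the boundary regex can be sketched as follows. The dictionary entries, station codes, and helper names here are illustrative; only the `(?<![a-z])…(?![a-z])` trick mirrors the actual implementation described above:

```typescript
// Tier 1: a small hardcoded dictionary, zero network round-trips.
// (Sample entries; the real table has ~60.)
const COMMON_STATIONS: Record<string, string> = {
  heuston: "HSTON",
  connolly: "CNLLY",
  pearse: "PERSE",
};

// Whole-phrase match using lookarounds instead of \b, because \b misfires
// around apostrophes and spaces in names like "St. James's".
export function phraseMatches(haystack: string, phrase: string): boolean {
  const escaped = phrase.toLowerCase().replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`(?<![a-z])${escaped}(?![a-z])`).test(haystack.toLowerCase());
}

// Tier 2: scan the full station list (fetched via getAllStationsXML in the
// real worker; passed in here so the function stays pure and testable).
export function findStationCode(
  name: string,
  allStations: { StationDesc: string; StationCode: string }[],
): string | null {
  const key = name.trim().toLowerCase();
  if (COMMON_STATIONS[key]) return COMMON_STATIONS[key];
  const hit = allStations.find((s) => phraseMatches(s.StationDesc, key));
  return hit ? hit.StationCode : null;
}
```

Note how the lookbehind/lookahead pair rejects "rock" inside "Blackrock" (a `\b` would too) while still matching across the punctuation that `\b` trips over.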
Failure modes you will hit: Irish Rail occasionally returns HTML error pages instead of XML (the regex parser just returns `[]`), and the date format for `getTrainMovementsXML` is unforgiving — "8 Mar 2026" fails, "08 Mar 2026" works. The tool description spells this out so the model formats correctly.
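If you are generating the date yourself rather than relying on the model, a tiny helper sidesteps the zero-padding trap. The helper name is made up for this sketch; only the "DD MMM YYYY" requirement comes from the upstream API:

```typescript
// Build the zero-padded "DD MMM YYYY" string getTrainMovementsXML expects
// ("08 Mar 2026", never "8 Mar 2026"). Uses UTC so the result does not
// shift depending on the runtime's local timezone.
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

export function railDate(d: Date): string {
  const day = String(d.getUTCDate()).padStart(2, "0");
  return `${day} ${MONTHS[d.getUTCMonth()]} ${d.getUTCFullYear()}`;
}
```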
Example 2 — HSE Service Finder MCP (curated static dataset)
Source: irish-mcps/hse-service-finder-mcp/src/index.ts
Upstream: none at runtime. The HSE publishes facility listings at https://www.hse.ie/eng/services/list/ but there is no public API — it is HTML pages intended for humans. Scraping them at request time would be slow, fragile, and legally iffy. So this MCP is built around an in-code reference dataset: a Facility[] literal with ~40 entries covering every public acute hospital, 24/7 ED, local injury unit, maternity and paediatric hospital in Ireland.
Shape of each facility
```ts
type Facility = {
  name: string;
  slug: string;
  type: "acute-hospital" | "injury-unit" | "maternity" | "paediatric";
  county: string;
  region: string; // Hospital Group
  address: string;
  phone: string;
  has_ed: boolean; // 24/7 Emergency Department
  trauma_level?: "major-trauma" | "trauma-unit" | "injury-unit";
  url: string;
};
```
Data flow per tool call
- Worker receives `tools/call`.
- Handler filters/sorts the in-memory `FACILITIES` array.
- Formats matches with `formatFacility()` into multi-line text.
- Returns — no network I/O at all, so p50 latency is effectively the TLS handshake + worker startup (~5–15 ms from Europe).
Tools exposed
| Tool | Purpose |
|---|---|
| `list_hospitals` | Filter by `county`, `ed_only`, `type` |
| `search_hospitals` | Weighted substring search across name/county/address (name match scores 2, county/address score 1) |
| `list_counties` | Coverage summary: facilities per county, ED count per county |
| `get_facility` | Deep-link lookup by `slug` |
| `query` | NL router (same pattern as Irish Rail) |
Interesting bits worth stealing
- `norm()` is the whole matching story. `s.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim()` normalises `"St. James's"`, `"St James's"`, and `"st james"` so they substring-match one another. Good enough, no fuzzy-search library required.
- Weighted scoring instead of boolean match. `search_hospitals` gives 2 points for a name hit and 1 for county/address, then sorts descending. This handles "cork" correctly — "Cork University Hospital" ranks above other Cork-county facilities whose address merely contains "Cork".
- Dataset lives in Git, not a DB. Updates are a PR + redeploy. For ~40 rows that change a few times a year, a Supabase table or KV would be over-engineered. It also means the dataset is versioned and reviewable.
- Why not scrape on demand? (a) HSE HTML structure drifts; (b) the latency budget for an MCP call is low; (c) the dataset is small and enumerable. If the shape were "thousands of entries with frequent updates", this would flip to a scheduled scraper → KV/D1 → worker reads.
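The `norm()` one-liner and the weighted scorer, reconstructed from the description above (the real implementation may differ in detail; `searchScore` and the `Row` type are names invented for this sketch):

```typescript
// Canonicalise for matching: lowercase, collapse all non-alphanumerics
// to single spaces, trim. "St. James's" and "st james" then substring-match.
export const norm = (s: string): string =>
  s.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();

type Row = { name: string; county: string; address: string };

// Weighted substring search: name hit = 2 points, county/address hit = 1,
// sort descending, drop zero-score rows.
export function searchScore(q: string, rows: Row[]): Row[] {
  const nq = norm(q);
  return rows
    .map((r) => ({
      r,
      score:
        (norm(r.name).includes(nq) ? 2 : 0) +
        (norm(r.county).includes(nq) || norm(r.address).includes(nq) ? 1 : 0),
    }))
    .filter((x) => x.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((x) => x.r);
}
```

With this weighting, "cork" scores 3 for "Cork University Hospital" (name + county) but only 1 for a facility that merely sits in County Cork, which produces the ranking described above.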
Failure modes: the dataset can go stale (phone number changes, a new injury unit opens). Mitigation is social — each facility's `url` field points back at the HSE listing so users can verify, and the Supabase row for the MCP carries a `source_url` too.
Side-by-side
| | Irish Rail | HSE Service Finder |
|---|---|---|
| Data freshness | Real-time (every call) | Static (redeploy to update) |
| Upstream dependency | Irish Rail SOAP API | None at runtime |
| Parsing | Regex XML parser | None — native TS objects |
| Tail latency | Bounded by upstream (100–800 ms) | Bounded by Worker cold path (~10 ms) |
| Failure surface | Upstream down, XML shape drift | Dataset staleness |
| Right call when | Source has a stable API and data changes faster than you can redeploy | Source has no API, data is small, updates are rare |
Both converge on the same MCP contract, which is the point: the protocol hides the difference between "I am scraping XML from a 2008 SOAP endpoint" and "I am filtering an in-memory array", and the LLM calling the tool does not need to care.