Deep dive: two Irish MCPs under the hood
How Irish Rail MCP (live XML API) and HSE Service Finder MCP (curated dataset) are implemented on Cloudflare Workers — two contrasting patterns behind the same MCP contract.
The kid-friendly explainer is a good mental model, but here is what is actually going on under the hood for two contrasting examples — one that wraps a live upstream API (Irish Rail) and one that serves a curated static dataset (HSE). Both share the same transport and protocol layer.
Shared skeleton
All MCPs in the irish-mcps repo follow the same shape:
- Runtime: Cloudflare Workers (V8 isolates, no cold-start warmup, free tier).
- Transport: MCP Streamable HTTP — plain JSON-RPC 2.0 over POST to `/mcp`. No Durable Objects, no SSE, no WebSockets, no session state. Every call is stateless.
- Handler surface: `initialize`, `notifications/initialized` (returns 204), `ping`, `tools/list`, `tools/call`. Unknown methods return JSON-RPC error `-32601`.
- CORS: `Access-Control-Allow-Origin: *` so the in-browser playground at irishmcp.ie can call the worker directly.
- Entrypoint: `export default { fetch(req) }` with routes `/`, `/health`, `/mcp`.
- Registration: a row in the Supabase `mcps` table with `endpoint_url` pointing at `https://<subdomain>.irishmcp.ie/mcp`. The Next.js `/api/playground` route proxies browser calls through to the worker.
This is why both workers fit in a single src/index.ts file — the MCP plumbing is ~80 lines of JSON-RPC switch statement, and the rest is domain logic.
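The shared plumbing boils down to a dispatch like the following. This is a condensed sketch with illustrative names (`TOOLS`, `handleRpc`, the example tool), not the repo's actual code; only the method names and error codes come from the description above.

```typescript
// Sketch of the shared JSON-RPC switch. Each worker pairs this pure
// dispatcher with its own TOOLS map; the fetch() entrypoint just parses
// the POST body, calls handleRpc, and maps null to an HTTP 204.
type Rpc = { jsonrpc: "2.0"; id?: number | string | null; method: string; params?: any };

type ToolHandler = (args: Record<string, unknown>) =>
  Promise<{ content: { type: "text"; text: string }[] }>;

const TOOLS: Record<string, { description: string; handler: ToolHandler }> = {
  // illustrative tool, not from the repo
  ping_tool: {
    description: "Example tool",
    handler: async () => ({ content: [{ type: "text", text: "pong" }] }),
  },
};

export async function handleRpc(req: Rpc): Promise<object | null> {
  const reply = (result: unknown) => ({ jsonrpc: "2.0", id: req.id ?? null, result });
  const fail = (code: number, message: string) =>
    ({ jsonrpc: "2.0", id: req.id ?? null, error: { code, message } });

  switch (req.method) {
    case "initialize":
      return reply({ protocolVersion: "2024-11-05", capabilities: { tools: {} },
                     serverInfo: { name: "sketch", version: "0.0.1" } });
    case "notifications/initialized":
      return null; // caller turns this into a 204 No Content
    case "ping":
      return reply({});
    case "tools/list":
      return reply({ tools: Object.entries(TOOLS)
        .map(([name, t]) => ({ name, description: t.description })) });
    case "tools/call": {
      const tool = TOOLS[req.params?.name];
      if (!tool) return fail(-32602, `unknown tool: ${req.params?.name}`);
      try {
        return reply(await tool.handler(req.params?.arguments ?? {}));
      } catch (e) {
        return fail(-32000, String(e)); // thin error surface, no retries
      }
    }
    default:
      return fail(-32601, `method not found: ${req.method}`);
  }
}
```

Keeping the dispatcher pure (JSON in, JSON out) is what makes the whole worker testable without a Workers runtime.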
Example 1 — Irish Rail MCP (live upstream API)
Source: irish-mcps/irish-rail-mcp/src/index.ts
Upstream: http://api.irishrail.ie/realtime/realtime.asmx — a legacy ASP.NET SOAP-ish endpoint that returns XML (no JSON, no auth, no rate-limit docs). Endpoints used: getAllStationsXML, getStationDataByCodeXML, getCurrentTrainsXML, getTrainMovementsXML.
Data flow per tool call
- Worker receives a `tools/call` JSON-RPC POST.
- Dispatches to a tool handler (e.g. `getStationTrains`).
- Handler builds a URL with query params and `fetch()`es the Irish Rail XML endpoint with a `User-Agent: IrishMCP/1.0` header.
- Response body is parsed with a hand-rolled regex XML parser (`parseObjects(xml, tag, fields)`) — no `DOMParser` because Cloudflare Workers do not ship one, and pulling in a full XML lib for this shape is overkill. The parser loops over `<tag>...</tag>` blocks and extracts named child elements into a `Record<string, string>`.
- The parsed rows are formatted into a human-readable multi-line string and returned as `{ content: [{ type: "text", text }] }`.
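The parser's core idea fits in a dozen lines. This is a reconstruction of what a `parseObjects(xml, tag, fields)` shaped like the description above could look like, not the repo's actual implementation:

```typescript
// Regex "XML parser": scan <tag>...</tag> blocks, then pull the named
// child elements out of each block into a flat string record. Assumes
// the well-behaved, non-nested shape the Irish Rail API actually emits.
export function parseObjects(
  xml: string,
  tag: string,
  fields: string[],
): Record<string, string>[] {
  const out: Record<string, string>[] = [];
  const blockRe = new RegExp(`<${tag}[^>]*>([\\s\\S]*?)</${tag}>`, "gi");
  for (const [, block] of xml.matchAll(blockRe)) {
    const row: Record<string, string> = {};
    for (const f of fields) {
      const m = block.match(new RegExp(`<${f}[^>]*>([\\s\\S]*?)</${f}>`, "i"));
      row[f] = m ? m[1].trim() : ""; // missing element -> empty string
    }
    out.push(row);
  }
  return out;
}
```

The non-greedy `[\s\S]*?` is what keeps each match inside its own block; the trade-off is that nested same-name tags would break it, which this upstream never produces.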
Tools exposed
| Tool | Purpose | Notable params |
|---|---|---|
| `query` | Natural-language router — pattern-matches keywords to one of the specific tools | `query: string` |
| `get_all_stations` | Dumps ~145 stations with codes + GPS | none |
| `get_station_trains` | Arrivals/departures at a station | `station_code`, `mins_ahead` (capped at 90) |
| `get_current_trains` | All trains live on the network | `train_type`: A/M/D/S (all/mainline/DART/suburban) |
| `get_train_movements` | Full schedule + realtime status for one train | `train_id`, `train_date` ("DD MMM YYYY") |
Interesting bits worth stealing
- Station resolution is two-tier. `findStationCode()` first hits a hardcoded dictionary of ~60 common station names → 5-letter codes (fast, no network). On miss, `lookupStationCode()` falls back to `getAllStationsXML` and does a whole-phrase match against `StationDesc`/`StationAlias` with a custom `(?<![a-z])phrase(?![a-z])` boundary regex (plain `\b` does not work reliably around apostrophes and spaces in station names).
- The `query` tool is a deliberate ergonomics layer. The MCP spec wants discrete typed tools, but LLMs often generate sloppy free-form queries. The `query` tool accepts a string, pattern-matches intent ("movement" / "all station" / "current" / fallthrough to station lookup), and delegates. This cuts down on tool-picking mistakes by the model.
- No caching. Train positions change every few seconds, so every call hits upstream. Irish Rail's API has been stable under this load; if it were not, `caches.default` (the Workers Cache API) with a 10–30 s TTL would be the drop-in fix.
- Error surface is thin. `railGet` throws on non-2xx; the top-level `tools/call` catches and returns JSON-RPC `-32000` with the message. No retries — LLMs retry naturally if the response looks wrong.
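The two-tier lookup and the boundary regex can be sketched as follows. The dictionary entries, station codes, and helper names here are illustrative; only the `(?<![a-z])…(?![a-z])` trick mirrors the actual implementation described above:

```typescript
// Tier 1: a small hardcoded dictionary, zero network round-trips.
// (Sample entries; the real table has ~60.)
const COMMON_STATIONS: Record<string, string> = {
  heuston: "HSTON",
  connolly: "CNLLY",
  pearse: "PERSE",
};

// Whole-phrase match using lookarounds instead of \b, because \b misfires
// around apostrophes and spaces in names like "St. James's".
export function phraseMatches(haystack: string, phrase: string): boolean {
  const escaped = phrase.toLowerCase().replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`(?<![a-z])${escaped}(?![a-z])`).test(haystack.toLowerCase());
}

// Tier 2: scan the full station list (fetched via getAllStationsXML in the
// real worker; passed in here so the function stays pure and testable).
export function findStationCode(
  name: string,
  allStations: { StationDesc: string; StationCode: string }[],
): string | null {
  const key = name.trim().toLowerCase();
  if (COMMON_STATIONS[key]) return COMMON_STATIONS[key];
  const hit = allStations.find((s) => phraseMatches(s.StationDesc, key));
  return hit ? hit.StationCode : null;
}
```

Note how the lookbehind/lookahead pair rejects "rock" inside "Blackrock" (a `\b` would too) while still matching across the punctuation that `\b` trips over.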
Failure modes you will hit: Irish Rail occasionally returns HTML error pages instead of XML (the regex parser just returns `[]`), and the date format for `getTrainMovementsXML` is unforgiving — "8 Mar 2026" fails, "08 Mar 2026" works. The tool description spells this out so the model formats correctly.
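If you are generating the date yourself rather than relying on the model, a tiny helper sidesteps the zero-padding trap. The helper name is made up for this sketch; only the "DD MMM YYYY" requirement comes from the upstream API:

```typescript
// Build the zero-padded "DD MMM YYYY" string getTrainMovementsXML expects
// ("08 Mar 2026", never "8 Mar 2026"). Uses UTC so the result does not
// shift depending on the runtime's local timezone.
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

export function railDate(d: Date): string {
  const day = String(d.getUTCDate()).padStart(2, "0");
  return `${day} ${MONTHS[d.getUTCMonth()]} ${d.getUTCFullYear()}`;
}
```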
Example 2 — HSE Service Finder MCP (curated static dataset)
Source: irish-mcps/hse-service-finder-mcp/src/index.ts
Upstream: none at runtime. The HSE publishes facility listings at https://www.hse.ie/eng/services/list/ but there is no public API — it is HTML pages intended for humans. Scraping them at request time would be slow, fragile, and legally iffy. So this MCP is built around an in-code reference dataset: a Facility[] literal with ~40 entries covering every public acute hospital, 24/7 ED, local injury unit, maternity and paediatric hospital in Ireland.
Shape of each facility
```ts
type Facility = {
  name: string;
  slug: string;
  type: "acute-hospital" | "injury-unit" | "maternity" | "paediatric";
  county: string;
  region: string; // Hospital Group
  address: string;
  phone: string;
  has_ed: boolean; // 24/7 Emergency Department
  trauma_level?: "major-trauma" | "trauma-unit" | "injury-unit";
  url: string;
};
```
Data flow per tool call
- Worker receives `tools/call`.
- Handler filters/sorts the in-memory `FACILITIES` array.
- Formats matches with `formatFacility()` into multi-line text.
- Returns — no network I/O at all, so p50 latency is effectively the TLS handshake + worker startup (~5–15 ms from Europe).
Tools exposed
| Tool | Purpose |
|---|---|
| `list_hospitals` | Filter by `county`, `ed_only`, `type` |
| `search_hospitals` | Weighted substring search across name/county/address (name match scores 2, county/address score 1) |
| `list_counties` | Coverage summary: facilities per county, ED count per county |
| `get_facility` | Deep-link lookup by `slug` |
| `query` | NL router (same pattern as Irish Rail) |
Interesting bits worth stealing
- `norm()` is the whole matching story. `s.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim()` normalises `"St. James's"`, `"St James's"`, and `"st james"` so they substring-match one another. Good enough, no fuzzy-search library required.
- Weighted scoring instead of boolean match. `search_hospitals` gives 2 points for a name hit and 1 for county/address, then sorts descending. This handles "cork" correctly — "Cork University Hospital" ranks above other Cork-county facilities whose address merely contains "Cork".
- Dataset lives in Git, not a DB. Updates are a PR + redeploy. For ~40 rows that change a few times a year, a Supabase table or KV would be over-engineered. It also means the dataset is versioned and reviewable.
- Why not scrape on demand? (a) HSE HTML structure drifts; (b) the latency budget for an MCP call is low; (c) the dataset is small and enumerable. If the shape were "thousands of entries with frequent updates", this would flip to a scheduled scraper → KV/D1 → worker reads.
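The `norm()` one-liner and the weighted scorer, reconstructed from the description above (the real implementation may differ in detail; `searchScore` and the `Row` type are names invented for this sketch):

```typescript
// Canonicalise for matching: lowercase, collapse all non-alphanumerics
// to single spaces, trim. "St. James's" and "st james" then substring-match.
export const norm = (s: string): string =>
  s.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();

type Row = { name: string; county: string; address: string };

// Weighted substring search: name hit = 2 points, county/address hit = 1,
// sort descending, drop zero-score rows.
export function searchScore(q: string, rows: Row[]): Row[] {
  const nq = norm(q);
  return rows
    .map((r) => ({
      r,
      score:
        (norm(r.name).includes(nq) ? 2 : 0) +
        (norm(r.county).includes(nq) || norm(r.address).includes(nq) ? 1 : 0),
    }))
    .filter((x) => x.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((x) => x.r);
}
```

With this weighting, "cork" scores 3 for "Cork University Hospital" (name + county) but only 1 for a facility that merely sits in County Cork, which produces the ranking described above.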
Failure modes: the dataset can go stale (phone number changes, a new injury unit opens). Mitigation is social — each facility's `url` field points back at the HSE listing so users can verify, and the Supabase row for the MCP carries a `source_url` too.
Side-by-side
| | Irish Rail | HSE Service Finder |
|---|---|---|
| Data freshness | Real-time (every call) | Static (redeploy to update) |
| Upstream dependency | Irish Rail SOAP API | None at runtime |
| Parsing | Regex XML parser | None — native TS objects |
| Tail latency | Bounded by upstream (100–800 ms) | Bounded by Worker cold path (~10 ms) |
| Failure surface | Upstream down, XML shape drift | Dataset staleness |
| Right call when | Source has a stable API and data changes faster than you can redeploy | Source has no API, data is small, updates are rare |
Both converge on the same MCP contract, which is the point: the protocol hides the difference between "I am scraping XML from a 2008 SOAP endpoint" and "I am filtering an in-memory array", and the LLM calling the tool does not need to care.