Deep dive: two Irish MCPs under the hood

How Irish Rail MCP (live XML API) and HSE Service Finder MCP (curated dataset) are implemented on Cloudflare Workers — two contrasting patterns behind the same MCP contract.

1 day ago

The kid-friendly explainer is a good mental model, but here is what is actually going on under the hood for two contrasting examples — one that wraps a live upstream API (Irish Rail) and one that serves a curated static dataset (HSE). Both share the same transport and protocol layer.

Shared skeleton

All MCPs in the irish-mcps repo follow the same shape:

  • Runtime: Cloudflare Workers (V8 isolates rather than containers, so cold starts are negligible; runs on the free tier).
  • Transport: MCP Streamable HTTP — plain JSON-RPC 2.0 over POST to /mcp. No Durable Objects, no SSE, no WebSockets, no session state. Every call is stateless.
  • Handler surface: initialize, notifications/initialized (returns 204), ping, tools/list, tools/call. Unknown methods return JSON-RPC error -32601.
  • CORS: Access-Control-Allow-Origin: * so the in-browser playground at irishmcp.ie can call the worker directly.
  • Entrypoint: export default { fetch(req) } with routes /, /health, /mcp.
  • Registration: a row in the Supabase mcps table with endpoint_url pointing at https://<subdomain>.irishmcp.ie/mcp. The Next.js /api/playground route proxies browser calls through to the worker.

This is why both workers fit in a single src/index.ts file — the MCP plumbing is ~80 lines of JSON-RPC switch statement, and the rest is domain logic.
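The handler surface listed above can be sketched as a pure function over an already-parsed JSON-RPC message. This is a minimal illustration, not the repo's code: the handleRpc name, the placeholder tool entry, the serverInfo values and the protocolVersion string are all assumptions, and the HTTP layer (routing, CORS headers, turning a null result into an empty 204) is left out.

```typescript
type RpcId = number | string | null;
type RpcRequest = { jsonrpc: "2.0"; id?: RpcId; method: string; params?: unknown };
type RpcResponse =
  | { jsonrpc: "2.0"; id: RpcId; result: unknown }
  | { jsonrpc: "2.0"; id: RpcId; error: { code: number; message: string } };

// Hypothetical tool registry; a real worker would list its own tools here.
const TOOLS = [
  { name: "example_tool", description: "Illustrative placeholder", inputSchema: { type: "object" } },
];

// Returns null for notifications, which carry no id and expect no JSON-RPC body.
function handleRpc(req: RpcRequest): RpcResponse | null {
  const id: RpcId = req.id === undefined ? null : req.id;
  switch (req.method) {
    case "initialize":
      return {
        jsonrpc: "2.0",
        id,
        result: {
          protocolVersion: "2025-03-26", // assumed version string
          capabilities: { tools: {} },
          serverInfo: { name: "example-mcp", version: "0.0.0" },
        },
      };
    case "notifications/initialized":
      return null;
    case "ping":
      return { jsonrpc: "2.0", id, result: {} };
    case "tools/list":
      return { jsonrpc: "2.0", id, result: { tools: TOOLS } };
    case "tools/call":
      // Domain logic would be dispatched here; this stub only shows the result shape.
      return { jsonrpc: "2.0", id, result: { content: [{ type: "text", text: "stub" }] } };
    default:
      return { jsonrpc: "2.0", id, error: { code: -32601, message: "Method not found: " + req.method } };
  }
}
```

Keeping the dispatcher free of Request and Response objects makes it easy to unit-test without a Workers runtime; the fetch handler then only has to parse the body and serialize whatever comes back.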


Example 1 — Irish Rail MCP (live upstream API)

Source: irish-mcps/irish-rail-mcp/src/index.ts

Upstream: http://api.irishrail.ie/realtime/realtime.asmx — a legacy ASP.NET SOAP-ish endpoint that returns XML (no JSON, no auth, no rate-limit docs). Endpoints used: getAllStationsXML, getStationDataByCodeXML, getCurrentTrainsXML, getTrainMovementsXML.

Data flow per tool call

  1. Worker receives tools/call JSON-RPC POST.
  2. Dispatches to a tool handler (e.g. getStationTrains).
  3. Handler builds a URL with query params and fetch()es the Irish Rail XML endpoint with a User-Agent: IrishMCP/1.0 header.
  4. Response body is parsed with a hand-rolled regex XML parser (parseObjects(xml, tag, fields)) — no DOMParser because Cloudflare Workers do not ship one, and pulling in a full XML lib for this shape is overkill. The parser loops <tag>...</tag> blocks and extracts named child elements into a Record<string,string>.
  5. The parsed rows are formatted into a human-readable multi-line string and returned as { content: [{ type: "text", text }] }.
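Step 4 can be sketched roughly as follows. Only the parseObjects(xml, tag, fields) signature comes from the post; the body is an assumption about how such a regex parser might look, and the escapeRe helper is hypothetical. It only handles flat, non-nested records, which is the shape described above.

```typescript
// Escape regex metacharacters so tag and field names are matched literally.
function escapeRe(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Loops over <tag>...</tag> blocks and pulls the named child elements out of each one.
// Missing fields become empty strings; input with no matching blocks yields [].
function parseObjects(xml: string, tag: string, fields: string[]): Record<string, string>[] {
  const rows: Record<string, string>[] = [];
  const t = escapeRe(tag);
  const blockRe = new RegExp("<" + t + "(?:\\s[^>]*)?>([\\s\\S]*?)</" + t + ">", "g");
  let block: RegExpExecArray | null;
  while ((block = blockRe.exec(xml)) !== null) {
    const row: Record<string, string> = {};
    for (const field of fields) {
      const f = escapeRe(field);
      const m = new RegExp("<" + f + ">([\\s\\S]*?)</" + f + ">").exec(block[1]);
      row[field] = m ? m[1].trim() : "";
    }
    rows.push(row);
  }
  return rows;
}
```

A parser like this does not decode XML entities or CDATA sections, so it is only a reasonable trade-off when the upstream payload is known to be simple.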

Tools exposed

| Tool | Purpose | Notable params |
|---|---|---|
| query | Natural-language router: pattern-matches keywords to one of the specific tools | query: string |
| get_all_stations | Dumps ~145 stations with codes + GPS | none |
| get_station_trains | Arrivals/departures at a station | station_code, mins_ahead (capped at 90) |
| get_current_trains | All trains live on the network | train_type: A/M/D/S (all/mainline/DART/suburban) |
| get_train_movements | Full schedule + realtime status for one train | train_id, train_date ("DD MMM YYYY") |
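As a rough illustration of the query router, the sketch below maps a free-form string to one of the four specific tools using the keywords the post gives for it ("movement", "all station", "current", otherwise station lookup). The routeQuery name and the order of the checks are assumptions; the real tool also has to extract arguments such as the train ID or station name, which is omitted here.

```typescript
type Route =
  | "get_train_movements"
  | "get_all_stations"
  | "get_current_trains"
  | "get_station_trains";

// Picks a target tool by keyword; anything unrecognised falls through to station lookup.
function routeQuery(query: string): Route {
  const q = query.toLowerCase();
  if (q.indexOf("movement") !== -1) return "get_train_movements";
  if (q.indexOf("all station") !== -1) return "get_all_stations";
  if (q.indexOf("current") !== -1) return "get_current_trains";
  return "get_station_trains";
}
```

Because the checks are ordered, a query that mentions both "current" and "movement" resolves to the movements tool; a real router would need to decide which precedence makes sense.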

Interesting bits worth stealing

  • Station resolution is two-tier. findStationCode() first hits a hardcoded dictionary of ~60 common station names → short station codes of up to 5 letters (fast, no network). On a miss, lookupStationCode() falls back to getAllStationsXML and does a whole-phrase match against StationDesc/StationAlias with a custom (?<![a-z])phrase(?![a-z]) boundary regex (plain \b does not work reliably around apostrophes and spaces in station names).
  • query tool is a deliberate ergonomics layer. The MCP spec wants discrete typed tools, but LLMs often generate sloppy free-form queries. The query tool accepts a string, pattern-matches intent ("movement" / "all station" / "current" / fallthrough to station lookup), and delegates. This cuts down on tool-picking mistakes by the model.
  • No caching. Train positions change every few seconds, so every call hits upstream. Irish Rail's API has been stable under this load; if it were not, caches.default (the Workers Cache API) with a 10–30s TTL would be the drop-in fix.
  • Error surface is thin. railGet throws on non-2xx; the top-level tools/call catches and returns JSON-RPC -32000 with the message. No retries — LLMs retry naturally if the response looks wrong.
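The whole-phrase boundary match from the station-resolution bullet might look something like this. It is a sketch under assumptions: the containsPhrase and escapeRe helpers are hypothetical, the Station shape is trimmed to three fields, and the direction of the match (station name searched for inside the user's query) is a guess at how lookupStationCode() works.

```typescript
type Station = { StationDesc: string; StationAlias: string; StationCode: string };

// Escape regex metacharacters so station names are matched literally.
function escapeRe(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True when `phrase` occurs in `text` without a letter directly before or after it.
function containsPhrase(text: string, phrase: string): boolean {
  const re = new RegExp("(?<![a-z])" + escapeRe(phrase.toLowerCase()) + "(?![a-z])");
  return re.test(text.toLowerCase());
}

// Second-tier lookup: scan the full station list for a name or alias found in the query.
function lookupStationCode(query: string, stations: Station[]): string | null {
  for (const s of stations) {
    for (const name of [s.StationDesc, s.StationAlias]) {
      if (name && containsPhrase(query, name)) return s.StationCode;
    }
  }
  return null;
}
```

The lookbehind and lookahead only reject adjacent letters, so names containing apostrophes or spaces match as a unit while a short name embedded in a longer word does not.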

Failure modes you will hit: Irish Rail occasionally returns HTML error pages instead of XML (the regex parser just returns []), and the date format for getTrainMovementsXML is unforgiving — "8 Mar 2026" fails, "08 Mar 2026" works. The tool description spells this out so the model formats correctly.
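Both failure modes can be guarded against in a few lines. The helpers below are hypothetical (formatTrainDate and looksLikeXml are not names from the repo) and show one way to zero-pad the day for the "DD MMM YYYY" format and to spot an HTML error page before it reaches the regex parser. Using UTC date parts is an assumption made for simplicity; a real implementation would likely want Irish local time.

```typescript
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Formats a date as "DD MMM YYYY" with a zero-padded day, e.g. "08 Mar 2026".
function formatTrainDate(d: Date): string {
  const day = d.getUTCDate();
  const dd = day < 10 ? "0" + day : String(day);
  return dd + " " + MONTHS[d.getUTCMonth()] + " " + d.getUTCFullYear();
}

// Cheap sniff test: rejects bodies that start like an HTML page or are not markup at all.
function looksLikeXml(body: string): boolean {
  const head = body.replace(/^\s+/, "").slice(0, 100).toLowerCase();
  if (head.indexOf("<!doctype html") === 0 || head.indexOf("<html") === 0) return false;
  return head.charAt(0) === "<";
}
```

With a check like looksLikeXml in place, the handler could raise an explicit "upstream returned an error page" message instead of silently reporting zero trains.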


Example 2 — HSE Service Finder MCP (curated static dataset)

Source: irish-mcps/hse-service-finder-mcp/src/index.ts

Upstream: none at runtime. The HSE publishes facility listings at https://www.hse.ie/eng/services/list/ but there is no public API — it is HTML pages intended for humans. Scraping them at request time would be slow, fragile, and legally iffy. So this MCP is built around an in-code reference dataset: a Facility[] literal with ~40 entries covering every public acute hospital, 24/7 ED, local injury unit, maternity and paediatric hospital in Ireland.

Shape of each facility

type Facility = {
  name: string;
  slug: string;
  type: "acute-hospital" | "injury-unit" | "maternity" | "paediatric";
  county: string;
  region: string;         // Hospital Group
  address: string;
  phone: string;
  has_ed: boolean;        // 24/7 Emergency Department
  trauma_level?: "major-trauma" | "trauma-unit" | "injury-unit";
  url: string;
};

Data flow per tool call

  1. Worker receives tools/call.
  2. Handler filters/sorts the in-memory FACILITIES array.
  3. Formats matches with formatFacility() into multi-line text.
  4. Returns — no network I/O at all, so p50 latency is effectively the TLS handshake + worker startup (~5–15 ms from Europe).
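Steps 2 and 3 amount to an array filter and sort. A minimal sketch of what list_hospitals might do, assuming the three filters are combined with AND and results are sorted by name; the listHospitals name, the ListArgs shape and the trimmed-down Facility type are illustrative only.

```typescript
type FacilityType = "acute-hospital" | "injury-unit" | "maternity" | "paediatric";
type Facility = { name: string; type: FacilityType; county: string; has_ed: boolean };
type ListArgs = { county?: string; ed_only?: boolean; type?: FacilityType };

// Applies each filter only when its argument is present, then sorts by name.
function listHospitals(facilities: Facility[], args: ListArgs): Facility[] {
  const county = args.county ? args.county.trim().toLowerCase() : "";
  return facilities
    .filter((f) => {
      if (county && f.county.toLowerCase() !== county) return false;
      if (args.ed_only && !f.has_ed) return false;
      if (args.type && f.type !== args.type) return false;
      return true;
    })
    .sort((a, b) => a.name.localeCompare(b.name));
}
```

Since filter() returns a new array, sorting the result leaves the shared in-memory dataset untouched between requests.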

Tools exposed

| Tool | Purpose |
|---|---|
| list_hospitals | Filter by county, ed_only, type |
| search_hospitals | Weighted substring search across name/county/address (name match scores 2, county/address score 1) |
| list_counties | Coverage summary: facilities per county, ED count per county |
| get_facility | Deep-link lookup by slug |
| query | NL router (same pattern as Irish Rail) |

Interesting bits worth stealing

  • norm() is the whole matching story. s.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim() turns both "St. James's" and "St James's" into "st james s", so a normalized query such as "st james" matches either spelling as a substring. Good enough, no fuzzy-search library required.
  • Weighted scoring instead of boolean match. search_hospitals gives 2 points for a name hit and 1 for county/address, then sorts descending. This handles "cork" correctly — "Cork University Hospital" ranks above other Cork-county facilities whose address merely contains "Cork".
  • Dataset lives in Git, not a DB. Updates are a PR + redeploy. For ~40 rows that change a few times a year, a Supabase table or KV would be over-engineered. It also means the dataset is versioned and reviewable.
  • Why not scrape on demand? (a) HSE HTML structure drifts; (b) latency budget for an MCP call is low; (c) the dataset is small and enumerable. If the shape were "thousands of entries with frequent updates", this would flip to a scheduled scraper → KV/D1 → worker reads.
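The normalization and scoring bullets above can be sketched together. The norm() body is taken from the post; everything else is an assumption, in particular that the name and county/address points are summed and that zero-score facilities are dropped. The searchHospitals function name and the reduced Facility type are illustrative.

```typescript
type Facility = { name: string; county: string; address: string };
type Scored = { facility: Facility; score: number };

// Lowercases and collapses every run of non-alphanumeric characters into one space.
function norm(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// 2 points for a name hit, 1 for a county or address hit; best matches first.
function searchHospitals(facilities: Facility[], query: string): Scored[] {
  const q = norm(query);
  if (!q) return [];
  const scored: Scored[] = [];
  for (const f of facilities) {
    let score = 0;
    if (norm(f.name).indexOf(q) !== -1) score += 2;
    if (norm(f.county).indexOf(q) !== -1 || norm(f.address).indexOf(q) !== -1) score += 1;
    if (score > 0) scored.push({ facility: f, score });
  }
  scored.sort((a, b) => b.score - a.score);
  return scored;
}
```

Under these assumptions a facility whose name contains the query always outranks one that only matches on county or address, which is the behaviour the "cork" example relies on.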

Failure modes: the dataset can go stale (a phone number changes, a new injury unit opens). Mitigation is social: each facility's url field points back at the HSE listing so users can verify, and the Supabase row for the MCP carries a source_url too.


Side-by-side

| | Irish Rail | HSE Service Finder |
|---|---|---|
| Data freshness | Real-time (every call) | Static (redeploy to update) |
| Upstream dependency | Irish Rail SOAP API | None at runtime |
| Parsing | Regex XML parser | None (native TS objects) |
| Tail latency | Bounded by upstream (100–800 ms) | Bounded by Worker cold path (~10 ms) |
| Failure surface | Upstream down, XML shape drift | Dataset staleness |
| Right call when | Source has a stable API and data changes faster than you can redeploy | Source has no API, data is small, updates are rare |

Both converge on the same MCP contract, which is the point: the protocol hides the difference between "I am scraping XML from a 2008 SOAP endpoint" and "I am filtering an in-memory array", and the LLM calling the tool does not need to care.