Gauging on-time performance of commuter-rail lines and trains

Sharon Machlis grabbed a couple of months of commuter-rail performance data and created a site that lets you see on-time performance for lines and trains.

Cool site and nice to have access to the data like this.

One quibble:

(Excludes cancelled trains with no arrival times)

There needs to be some sort of graph or metric that at least tells us how many trips were just shit-canned completely. Traditionally, that is how commuter-rail operators have upped their on-time statistics: by simply dropping cancelled trips from the analysis. "We weren't late - we just never showed up!" From the riders' perspective, if the train arriving after your cancelled trip is on time, you're still really friggin late relative to what you planned based on their schedule.

Yes, a cancelled train should be assigned the following train's arrival time, and the following train should still be counted separately, since that is the end result. For example, if train '123' was supposed to arrive at 8am but was cancelled, and the next train '456' arrived on time at 9am, train 123 was 60 minutes late, and 456 was on time. That illustrates the actual results, which is what we should want to know.
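To make that rule concrete, here is a minimal Python sketch of it. The function name `rider_delays`, the tuple layout, and the calendar date are all made up for illustration; only the 123/456 example comes from the comment above, and nothing here reflects how the site or the rail operator actually computes its numbers.

```python
from datetime import datetime

def rider_delays(trains):
    """Return {train_id: delay in minutes} from the rider's point of view.

    `trains` is a list of (train_id, scheduled, actual) tuples sorted by
    scheduled arrival; `actual` is None for a cancelled train.
    (Hypothetical layout, not the site's real data format.)
    """
    delays = {}
    for i, (train_id, scheduled, actual) in enumerate(trains):
        if actual is None:
            # Cancelled: riders end up on the next train that really arrives.
            actual = next((a for _, _, a in trains[i + 1:] if a is not None), None)
        # Stays None if no later train arrived at all.
        delays[train_id] = None if actual is None else (actual - scheduled).total_seconds() / 60
    return delays

day = datetime(2024, 1, 8)  # arbitrary date, for illustration only
trains = [
    ("123", day.replace(hour=8), None),                 # due 8am, cancelled
    ("456", day.replace(hour=9), day.replace(hour=9)),  # due 9am, on time
]
print(rider_delays(trains))  # {'123': 60.0, '456': 0.0}
```

Train 456 keeps its own on-time record, while 123 inherits 456's arrival and shows up as 60 minutes late.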

A cancelled train should be in the more-than-15-minutes-late category, since in reality that run (say, train 0613) would be (at least) 24 hours late.

Or they could track cancelled trains. That's probably the better metric. There might be a good reason to cancel a train, but in the end it was cancelled, which has an effect.

I would love more pedantic specifics on what they count as "arrived". I'm on an awful lot of trains - maybe a majority of my inbounds - that pull out of Back Bay and then sit on the tracks right at the curve before South Station for a while, waiting for dispatch to clear a platform. We have not "arrived", as I can't get off the train. The doors open 10 minutes after the scheduled arrival time, delaying me from my next destination by that long. Is this on time? The data says that "only" ~17% of inbounds on that line are >5min late, but that seems low.

I'm suspicious that they're counting arrivals as trains that pass the sensors at the curve (or radio pending arrival to dispatch) rather than physical unboardings.

Do you ride at rush hour? There could be a lot of empty off-peak trains which face less terminal congestion and pull up the average.
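A quick back-of-the-envelope sketch of how that could happen. The counts below are invented purely to show the arithmetic (they are not from the site's data): a line can look ~17% late overall while rush-hour trains are late about twice as often.

```python
# Hypothetical (late, total) counts per period; NOT real data.
groups = {"peak": (14, 40), "off_peak": (3, 60)}

# Late share within each period.
shares = {name: late / total for name, (late, total) in groups.items()}

# Late share across all trains, which is what a line-level figure reports.
overall = sum(late for late, _ in groups.values()) / sum(total for _, total in groups.values())

for name, share in shares.items():
    print(f"{name}: {share:.0%} late")
print(f"overall: {overall:.0%} late")
```

With these made-up numbers the overall figure is 17%, but a rush-hour rider would see 35% of their trains late.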

The site shows the raw data, so you can check the stats on the specific train you ride.

No show?

Same for the Old Colony lines (Kingston, Middleboro, Greenbush). Maybe the data for these lines isn't available?

