
State blames balky firewall for 911 outage

State 911 officials report that a firewall aimed at keeping hackers out of the system instead started blocking calls from the statewide 911 system to local emergency dispatch centers around 1:15 p.m. yesterday.

Officials say Comtech, the company that runs the statewide system - which routes 911 calls to the appropriate local dispatchers or State Police - has determined that the problem was not due to any hackers but that "the exact reason the firewall stopped calls from reaching dispatch centers remains under review."

Still, the state adds, Comtech did some sort of fiddling to "ensure that this does not happen again," whatever it was.

Also, the state "has not received any reports of emergencies impacted during the interruption" during the two hours 911 wasn't working in Massachusetts.


Comments

How can the failure of a single network element bring down the entire 911 network? Prudent network design provides alternative traffic paths instead of an apparent single point of failure. One should also question why the system isn't regionalized to protect against statewide outages.

Overall, an inexcusable failure of network design by the vendor and of oversight by the commonwealth.
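
To make "alternative traffic paths" concrete: the routing layer should be able to try more than one way to reach a dispatch center, so one balky firewall can't take everything down. A toy sketch in Python, with made-up names and nothing to do with Comtech's actual architecture:

```python
# Toy illustration of avoiding a single point of failure: try each
# configured path to the dispatch center instead of depending on one.
from typing import Callable, List

def route_call(call_id: str, paths: List[Callable[[str], bool]]) -> bool:
    """Attempt delivery over the primary path, then each backup path in turn."""
    for deliver in paths:
        try:
            if deliver(call_id):
                return True          # call reached a dispatch center
        except ConnectionError:
            continue                 # this path is down; try the next one
    return False                     # every path failed -- the statewide-outage case

def primary(call_id: str) -> bool:
    raise ConnectionError("firewall blocking traffic")   # the failure in the article

def backup(call_id: str) -> bool:
    return True                                          # an alternate path that still works

print(route_call("call-001", [primary, backup]))         # True: the backup saves the call
```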


By far, most widespread network outages at the enterprise level are caused by a misconfiguration of a firewall or switch. ("Guy on a backhoe" used to be #1, but that's less common now.)

I don't disagree that This Should Never Happen, but even the biggest tech companies have hours-long outages for the same reason.
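
Part of why misconfigurations bite so hard is that nothing necessarily checks a rule change against the traffic the system actually needs before it goes live. A hypothetical pre-push sanity check, purely illustrative:

```python
# Hypothetical pre-deployment check: before pushing a new firewall rule set,
# verify that every critical flow (e.g. 911 core -> dispatch center) is still allowed.
CRITICAL_FLOWS = [
    ("911-core", "dispatch-boston", 5060),     # SIP signaling (illustrative values)
    ("911-core", "dispatch-worcester", 5060),
]

def is_allowed(rules, src, dst, port):
    """Return True if some rule permits the flow; treat anything else as denied."""
    return any(r["src"] == src and r["dst"] == dst and r["port"] == port
               and r["action"] == "allow" for r in rules)

def validate(rules):
    blocked = [f for f in CRITICAL_FLOWS if not is_allowed(rules, *f)]
    if blocked:
        raise SystemExit(f"Refusing to deploy: these flows would break: {blocked}")

# A "bad" rule set that forgot the Worcester flow gets rejected before it ships.
validate([{"src": "911-core", "dst": "dispatch-boston", "port": 5060, "action": "allow"}])
```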


Firewalls or switches are often the cause of major outages. In my experience, you can design in additional levels of redundancy, at the cost of additional management complexity and money. Cost is usually the driver.

I maintain that segmenting the network might provide additional redundancy. Upon reflection, I don't understand call flows well enough to know for sure, especially how call referral works; i.e., rerouting calls between sites is an interesting problem if all nodes in the network are no longer peers.
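
For what the segmenting idea might look like: each region handles its own calls and only refers a call to a neighbor when its own path is dead, so a failure stays local. A rough sketch with made-up region names, not how the real call flows work:

```python
# Rough sketch of regional segmentation: each region routes its own calls,
# and only refers a call to a neighboring region if its own path is down.
REGIONS = {
    "west":    {"up": False, "neighbors": ["central"]},    # this region's path has failed
    "central": {"up": True,  "neighbors": ["west", "east"]},
    "east":    {"up": True,  "neighbors": ["central"]},
}

def route(region: str) -> str:
    """Return the region that will actually handle the call."""
    if REGIONS[region]["up"]:
        return region                               # normal case: handled locally
    for neighbor in REGIONS[region]["neighbors"]:
        if REGIONS[neighbor]["up"]:
            return neighbor                         # call referral to a working neighbor
    raise RuntimeError("no reachable region")       # only if every segment is down

print(route("west"))    # 'central' -- a failure in one segment stays local to that segment
```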


Yup. The driver here is usually cost.

I think... THINK... the state uses a Cisco Unified phone system, yes, even for 911 operators. And doing HA with firewalls while maintaining the phone connections (UC or SIP) can be very costly, because once something fails, the HA has to kick over to another FW AND remember the states of calls (meaning if a FW goes down mid-call, does it pass the call and its data to the other firewall?). Without expensive firewall add-ons, this doesn't work. It also doesn't work too well if you are using non-Cisco firewalls (my company has this issue).

It kinda irks me that this might be a cost thing. How much money is in the state budget, and how important are EMS and 911 to everyone? You'd think cost would not be a barrier here.

Please, I used to work in IT for porn and we had immediate HA switchovers... including to an offshore site. I think at the time we were very high-tech, as this was uncommon back then. Management's thought process was that every minute viewers couldn't view our stuff (and pay for subscriptions) was money lost, so we were told "make it happen" and were given a blank check.
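
For anyone wondering what "remember the states of calls" means in practice: the standby firewall has to already hold the session state for every in-flight call before the active one dies. Not any vendor's actual mechanism, just the shape of it:

```python
# The shape of stateful failover: the active firewall replicates its table of
# in-flight call sessions to a standby, so a failover doesn't drop live calls.
# (Not Cisco's or anyone's real mechanism -- just the concept.)
class Firewall:
    def __init__(self, name: str):
        self.name = name
        self.sessions: dict[str, dict] = {}    # call-id -> SIP/RTP session state

    def add_call(self, call_id: str, state: dict, standby: "Firewall") -> None:
        self.sessions[call_id] = state
        standby.sessions[call_id] = state      # replicate state as the call is set up

active, standby = Firewall("fw-1"), Firewall("fw-2")
active.add_call("call-001", {"sip_dialog": "abc", "rtp_ports": (10000, 10001)}, standby)

# fw-1 dies mid-call; fw-2 already holds the session, so the call can survive.
print("call-001" in standby.sessions)   # True -- without replication this would be False
```

The replication is the part that costs real money at scale, which is the point about expensive add-ons above.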


First, this is the state, not a profit-seeking entity.

Next, IT is viewed as an expense, and something like the 911 system is not going to have investments, er, expenses, in redundancy. Just not going to happen.


It would take just ONE lawsuit over someone who died during the outage because they could not dial 911 for the cost to exceed the budget for redundancy.

We also PAY a tax for 911 service, and many providers charge for this service (VoIP), so outside of the general fund, this system is funded nicely and should be built for redundancy.

I'm going to assume you are not 'in the know' so you might be wrong here.


When Verizon did Mass E911, they had four-switch redundancy...


Yes, IT is not my field and I am not "in the know", as you say. I am/was a software engineer.

There is virtually no limit to how much one can spend to make the system bomb-proof. Where does it end? No matter how much you spend, there are always more things one can do to make it better. This is a decision made every day by IT groups. You do this, right? Can you absolutely guarantee that the systems you protect(ed) will never fail? I seriously doubt it. There is always a non-zero chance that something can happen. That's all I'm saying.

This is like software bugs. There are always bugs, regardless of the amount of testing done and the amount of time the software is with customers.
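
To put rough numbers on the "non-zero chance" point: redundancy helps a lot at first and less with each added layer, which is exactly why this ends up being a spending judgment call. Toy math with a made-up availability figure:

```python
# Diminishing returns on redundancy: with n independent components that are
# each up 99% of the time (made-up figure), the chance that all n are down at
# once shrinks fast, but each extra component buys less than the one before it.
availability = 0.99
for n in range(1, 5):
    downtime_fraction = (1 - availability) ** n
    minutes_per_year = downtime_fraction * 365 * 24 * 60
    print(f"{n} component(s): ~{minutes_per_year:,.3f} min of expected downtime/year")

# Prints roughly 5256, 52.6, 0.5, and 0.005 minutes/year for n = 1 through 4.
```

Note the catch: this assumes failures are independent. A bad config pushed to every box at once breaks that assumption, which is the firewall story above.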


The big multi-state service outage a while back was reportedly caused by some work that involved one light pole being installed.


I just have to look at other state agencies and how their IT is run to know... I am not surprised at all. Not surprised at all.


Typo-b-gone.


Shawty fire burning on the dance floor, whoa


Shorter Comtech: “We don’t know why this happened, AND we’ve ensured it won’t happen again.”

Uh-huh.


At work we often do root-cause analysis for outages, and we might identify a number of contributing issues for the same incident. So, for instance, there might be a failure to test a change in a staging environment first (due to a missing step in a runbook, a lax culture, a broken staging environment, etc.) and then the "actual" bug (a code error). You can fix the first part even if you haven't tracked down the exact details of the second part yet.

(No particular opinion on whether this is what Comtech is talking about, or whether they're just bullshitting.)
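
Concretely, "a number of contributing issues" just means the incident record carries more than one cause, each with its own remediation and status, so one fix can be closed out while another cause is still being chased down. A made-up example of what such a record might look like:

```python
# A made-up incident record with more than one contributing factor:
# the process fix can be "done" while the technical root cause is still open.
incident = {
    "id": "INC-0001",
    "summary": "firewall change blocked 911 call delivery",
    "contributing_factors": [
        {"cause": "change not exercised in a staging environment first",
         "remediation": "add staging verification step to the runbook",
         "status": "done"},
        {"cause": "exact firewall behavior that dropped the calls",
         "remediation": "TBD -- still under review",
         "status": "open"},
    ],
}

open_items = [f for f in incident["contributing_factors"] if f["status"] != "done"]
print(f"{len(open_items)} contributing factor(s) still under review")   # 1
```

That pattern would at least be consistent with Comtech having "ensured that this does not happen again" on one factor while the exact reason "remains under review."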
