No excuse for derailing station IT

Marie Clutterbuck, CMO at Tectrade, discusses the rail sector’s IT vulnerabilities, particularly at stations, and outlines the benefits of zero-day recovery IT architecture in Rail Technology Magazine.

Are you truly a commuter until you’ve reached a mainline station at 6pm on a muggy summer Friday evening and experienced the dreaded digital signage failure? Major stations – which are already an uncomfortable place to be when surrounded by thousands of irate commuters wanting to get home at the end of the week – are made even worse when nobody knows what is going on. Situations like this might be a boon to the pub around the corner, but for passengers and staff alike, these kinds of outages can cause all manner of nightmares.

With rail fares making Britain the second-most expensive rail network in Europe (we Brits pay 54p per mile), and costs rising by 36% since 2010 (2.6 times more than the increase in average earnings), operators are under ever-increasing pressure to ensure a smooth and efficient service. However, with 14% of trains missing the industry measure of punctuality between mid-2017 and mid-2018, that is evidently not being delivered.

IT is obviously not responsible for all aspects of rail operations, but when it comes to the station itself almost all aspects are defined by interconnectivity. Some people may romanticise the clackety wooden panels and hand-punched tickets of yesteryear, but today’s technological innovations have undoubtedly improved the efficiency of station operations. From the ability to use smartphones as tickets to the digital departure boards informing passengers of platforms or delays, technology has radically changed the way that we travel for the better. However, there is a trade-off to this. While theoretically these systems will mean greater efficiency for the station, problems will emerge should the IT systems responsible suffer outages.

Causes of outages

Ransomware wrought havoc on businesses and consumers alike in 2017 and 2018. The WannaCry cyber attack alone affected more than 200,000 victims and spread to over 150 countries. Ransomware, which takes computers and their data hostage and demands payment in cryptocurrency, has the capability to grind operations to a halt. This was the case for the NHS, where at least 80 out of 236 hospital trusts in England, as well as 603 primary care and affiliate NHS organisations, suffered disruption in the highest-profile UK ransomware attack.

More relevant to the rail industry was the May 2017 attack on Germany’s Deutsche Bahn (DB), which saw 450 computers infected with WannaCry. This brought down passenger information systems, ticket machines, and CCTV networks. The DB case is evidence that the rail industry was overly confident about its defences, against ransomware in particular. Other transit system outages tell the same story: the Mamba ransomware attack in San Francisco in 2016, which forced the city to allow passengers to ride for free, and a ransomware attack on Sacramento’s transit system, which deleted an estimated 30 million files.

This is an issue affecting industries of all shapes and sizes, and a significant reason is that the attention paid to cybersecurity has not always matched the attention paid to day-to-day IT. That is to say, spending on exciting new applications has historically taken priority over spending on protecting operations and maintaining existing investments. This is not necessarily wilful. Many people do not associate an Industrial Control System (ICS) environment with IT, and as such there is a lack of awareness of how critical cybersecurity is. Preventing ransomware at rail stations is as much about spending appropriately on cybersecurity as it is about raising awareness.

However, outages are not always caused by malicious actors. “It’s probably leaves on the tracks” is the catch-all line of thinking that many people use when it comes to unexplained delays. While leaves on the ground of a train station will not do much to alter service, uncontrollable natural forces can have a serious effect on IT infrastructure. Areas that experience severe weather are likely to suffer power and, consequently, network outages. The UK is seldom subject to the sort of weather conditions that cause blackouts, but accidents do happen. A burst pipe, or workers drilling through electrical or network cables, for example, could cause a power outage at a station for an indeterminate period of time. With these types of situations having the potential to cause significant issues, operators must consider the effect that the environment can have on their operations.

Unlike in an office, where any outages, as costly as they might be to the business, ultimately won’t stop employees from going about their day-to-day lives, downtime at rail stations can have serious knock-on effects. Millions of people rely on trains daily, and while they might groan at the prospect of delays, it is cancellations that leave travellers – the customers – most disappointed. Tom Hanks might have had an interesting time in ‘The Terminal,’ but I don’t think thousands of commuters unable to get to and from work would be quite so accommodating. And, ultimately, it is the staff at the stations who will suffer as a result.

Arriving at the destination on time

In the event of a station going offline, operators need a way to get their system back online – and fast. But before they can do that, they must first understand the IT systems and data that are driving the ICS environment.

Shockingly, the majority of organisations that do not have a fully managed system with external support see almost 25% of their nightly backups fail. Most businesses will have no idea what is in that lost or unavailable data. If that data were, for example, records of ticket purchases, you’d be forced to go back potentially 48 hours or more, depending on when the last backup succeeded.
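To make that arithmetic concrete, here is a minimal Python sketch of how far back an operator would have to go after an outage, given a log of nightly backup results. The function name, the log format and the dates are all hypothetical and chosen purely for illustration; a real backup product would report this in its own way.

```python
from datetime import datetime, timedelta

def data_loss_window(backup_log, outage_time):
    """Return the time between the last successful backup and the outage.

    backup_log is a list of (timestamp, succeeded) pairs. This format is an
    assumption for illustration, not the output of any particular tool.
    Returns None if no successful backup exists before the outage.
    """
    successes = [ts for ts, ok in backup_log if ok and ts <= outage_time]
    if not successes:
        return None
    return outage_time - max(successes)

# Hypothetical nightly backups at 01:00; the last two nights failed.
backup_log = [
    (datetime(2019, 3, 4, 1, 0), True),
    (datetime(2019, 3, 5, 1, 0), False),
    (datetime(2019, 3, 6, 1, 0), False),
]
outage = datetime(2019, 3, 6, 9, 0)

window = data_loss_window(backup_log, outage)
print(window)  # 2 days, 8:00:00
```

In this made-up case two failed nights leave 56 hours of ticket records that cannot be restored, which is why unnoticed backup failures matter as much as the outage itself.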

With this in mind, operators need to perform disaster recovery testing on their data. Without testing in a controlled and simulated environment, it is impossible for IT and security teams to fully understand their system’s integrity. It’s exactly the same reason why we’re always told to test fire alarms regularly. You don’t want to discover your fire alarm doesn’t work when you most need it, just as you don’t want to find out your disaster recovery system is ineffective in the event of an outage.

With this knowledge, station operators should look to a zero-day recovery architecture that allows them to prioritise workloads and quickly bring them back online without paying a ransom or worrying about whether the workload is compromised. An evolution of the 3-2-1 backup rule (three copies of your data stored on two different media and one backup kept offsite), zero-day recovery enables an IT department to partner with the cyber team and create a set of policies that define what should happen to the data backups stored offsite, normally in the cloud. This policy could, for example, mean that a particular workload – such as ticketing machines – needs to be brought back into the system within 20 minutes, while another workload can wait a couple of days.
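As a rough illustration of what such a policy set might look like, the Python sketch below checks a set of backup copies against the 3-2-1 rule and orders workloads by how quickly each must be restored. It is not a description of any vendor’s product: the class, the function names, the media labels and the second workload are assumptions made for the example. Only the 20-minute target for ticketing machines and the “couple of days” for a lower-priority workload come from the text above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BackupCopy:
    media: str      # e.g. "disk", "tape", "cloud" (illustrative labels)
    offsite: bool

@dataclass(frozen=True)
class RecoveryPolicy:
    workload: str
    restore_within_minutes: int  # recovery time target for this workload

def satisfies_3_2_1(copies):
    """True if there are 3+ copies, on 2+ media types, with 1+ offsite."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )

def recovery_order(policies):
    """Workloads with the tightest recovery target are restored first."""
    return sorted(policies, key=lambda p: p.restore_within_minutes)

# Hypothetical station setup.
copies = [
    BackupCopy("disk", offsite=False),
    BackupCopy("tape", offsite=False),
    BackupCopy("cloud", offsite=True),
]
policies = [
    RecoveryPolicy("archive_reports", 2 * 24 * 60),  # can wait a couple of days
    RecoveryPolicy("ticketing_machines", 20),        # back within 20 minutes
]

print(satisfies_3_2_1(copies))
print([p.workload for p in recovery_order(policies)])
```

Writing the targets down as data, rather than leaving them in people’s heads, is what lets the IT and cyber teams agree in advance which systems come back first.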

Whether it’s because of malware or a power surge, outages in 2019 are a question of when, not if. With a great deal of the infrastructure at a train station relying on IT, it is vital that operators look to a solution that can minimise outages to keep the station moving along and the rat race undisturbed.

See the full article in Rail Technology Magazine.