- Trams Downunder

Re: Sydney train commuters to get free transport day after rail network outage causes chaos | Sydney | The Guardian
bblunt3473

TfNSW's social media was telling us to "check the screens for updated information".
Yeh, right?
Brian
On Friday, 10 March 2023 at 11:57:16 am AEDT, Tony Galloway arg@...> wrote:

One of the reasons it took so long for solid state systems to replace relays in seriously vital signalling and interlocking applications was building in sufficient redundancy to guarantee the same level of safety. A system like US&S Microlok, used in NSW, generates 3 different algorithms that have to agree before it clears a route set by the signaller. And the systems also have to fail safe and not give false clears - wrong side failures.
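To make the "everything has to agree, otherwise stay at stop" idea concrete, here is a toy sketch. It is not how Microlok or any real interlocking is coded. The route model (`occupied`, `points_locked`) and all the function names are made up for illustration. The only point is that three independently written checks must all say yes, and any disagreement or fault leaves the route uncleared.

```python
# Hypothetical route state: a dict with
#   "occupied":      list of booleans, one per track section in the route
#   "points_locked": True when the points are set and locked for the route

def check_a(state):
    # Clear only if no section is occupied and the points are locked.
    return not any(state["occupied"]) and state["points_locked"]

def check_b(state):
    # Same rule written a different way: count the occupied sections.
    occupied_count = sum(1 for o in state["occupied"] if o)
    return occupied_count == 0 and state["points_locked"] is True

def check_c(state):
    # Same rule again, this time looking for a reason to refuse.
    for o in state["occupied"]:
        if o:
            return False
    return bool(state["points_locked"])

def may_clear(state, checks=(check_a, check_b, check_c)):
    """Return True only if every check independently returns True."""
    for check in checks:
        try:
            if check(state) is not True:
                return False   # one channel disagrees: stay at stop
        except Exception:
            return False       # a fault in any channel: stay at stop
    return True
```

The design choice worth noticing is the default: there is no path through `may_clear` that returns True by accident, so a crash or a bad input behaves like a right side failure rather than a false clear.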

The relay based systems have many proven ways to achieve this. Vital relays in NSW (my experience) have metal on carbon contacts so the contacts can't weld together. Other relay systems that have all metal vital contacts have every top contact (normally open) proven by a bottom contact that is normally closed (normal means the relay isn’t energised), which energises “proofing” circuits. At signals with train stops, other proving circuits go through train stop contacts to ensure the train stop corresponds to the signal indication and that it has operated properly: a train stop contact operates a stick relay when the arm rises, which enables the signal to clear once the section it protects is unoccupied. If the arm doesn’t rise because the train stop is defective, the signal won’t clear.
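The train stop proving described above can be sketched as a small state model. This is a simplification under my own assumptions, not a real circuit: the class name, the method names and the exact sequencing (the stick relay being re-proved each time a train passes) are hypothetical.

```python
class SignalWithTrainStop:
    """Toy model: the signal may only clear if the trip arm was proved raised."""

    def __init__(self):
        self.section_occupied = False
        self.stick_relay_up = False   # latched when the arm contact proves the arm raised
        self.proceed = False

    def train_passes(self, arm_rises):
        # The signal returns to stop and the section it protects becomes occupied.
        self.proceed = False
        self.section_occupied = True
        # The stick relay only picks up if the arm contact actually makes.
        self.stick_relay_up = bool(arm_rises)

    def section_clears(self):
        self.section_occupied = False

    def try_to_clear(self):
        # Both conditions must hold; anything else leaves the signal at stop.
        self.proceed = self.stick_relay_up and not self.section_occupied
        return self.proceed
```

With a healthy train stop, `train_passes(arm_rises=True)` followed by `section_clears()` lets `try_to_clear()` succeed. With a defective one (`arm_rises=False`) the signal stays at stop even after the section is unoccupied, which is the fail-safe behaviour the paragraph describes.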

The difference between vital and non-vital components of signalling is that the non-vital components control the vital functions but cannot override any interlocking that is protecting against conflicting moves being set by a signaller. That is a real definition. Declaring something as external and peripheral as a radio comms system to be as “vital” as this basic signalling principle is a false equivalence: while it is useful, it’s not vital, and not having it doesn’t render train operation unsafe.

Tony

> On 10 Mar 2023, at 10:18, Matthew Geier matthew@...> wrote:

>

> On 10/3/23 09:32, TP wrote:

>> Typically, with a failure of automated tech, there's a human being or three in the chain that led to the failure. So, yes, technology is great, until humans start interfering in it, from setting up through to operation.

>

> It was also designed and built by failure prone humans.

>

> But Wednesday's problem in Sydney was entirely avoidable. The shutdown happened only because someone defined the train radio system as 'vital' (as it's an important part of the incident management system).

>

> A 'vital' system had crashed, so they stopped everything without any real understanding of where this system sat in the safety system chain. Traffic control was working, all interlockings were working. Everything we consider 'traditional' safety systems was operating properly.

>

> There is also an expectation that such systems will be 'redundant' and 'fail over' to backups when there are problems. And when the fail over doesn't happen, people sit there dazed and confused as they were told it's 'fail safe' and they never practiced this scenario of an actual systems failure.

>

> My personal experience with corporate IT systems and 'fail over redundancy' is that the redundancy system components introduce additional complexity and fail more often than the underlying systems they are supposed to protect.

>

> But true redundant system design and implementation is fiendishly difficult. (and expensive.) In particular you really need to do all development twice with different teams with each system component being able to swap in for a component developed by the other team. (And all this is exhaustively tested at all stages of the development process). If you just take the primary system and build a duplicate using the same hardware and software, more often than not a system bug that takes out the primary will also take out the secondary when it takes over and gets to the same scenario that killed the primary. Seen that happen and cripple my work organization when the central 'redundant data store' went down. The primary controller hit a situation it couldn't handle and crashed - the secondary took over, hit the SAME situation and also crashed. The secondary controller was just a copy of the primary, so it had exactly the same bugs.

>

