by Dimitris Gizopoulos on Sep 16, 2024 | Tags: fault tolerance, Reliability, silent data corruptions
Over the last four years, data center hyperscalers (Meta, Google, Alibaba) have disclosed that an unexpectedly high number of CPUs (~1 in 1000) produce Silent Data Corruptions (SDCs), i.e., program executions that produce wrong results without any observable indication....