Beacon Chain Incident #1 — The Case for SSV Based Validator Client Diversity
One of the first major incidents on the Beacon Chain was a Prysm bug that caused more than 70% of blocks to be missed. See how SSV can solve this problem.
*update: Prysmatic Labs has released an official incident retrospective; check it out for an in-depth discussion of the cause and remediation of the event.
Over the weekend, starting at epoch 32302, the Beacon Chain experienced some turbulence as block proposals started to be missed across the entire network due to an edge-case issue with Prysm.
This is the first incident to send shockwaves across all of mainnet: within a short period more than 70% of blocks were missed, and the participation rate fell sharply from its average of 99%.
After 18 epochs, at epoch 32320, the issue resolved itself and the Beacon Chain began operating seemingly as normal. The problem then returned the following day for another 18 epochs before resolving itself again.
In short, the issue involved the way Prysm viewed the state of the Eth1 deposit contract; if this state is not viewed correctly, blocks cannot be proposed. The issue was previously known, as it occurred in an isolated event back in late January, but it was thought to be a one-off edge case.
A fix has now been pushed, but stakers across the network experienced delayed and missed attestations and, in some cases, missed block proposals. The chain continued to reach finality throughout the event, but rewards across the entire network were slightly affected.
This comes as a stark warning against heavy network reliance on a single validator client, as estimates suggest that roughly 70% of the network currently uses Prysm. Prysm is doing an excellent job facilitating ETH staking, but it would be naive to assume that any system can exist without at least some trials and tribulations along the way.
The ‘simplest’ solution would be for validators to leverage other clients or, in an ideal world, to implement active-active redundancy that switches between clients when something goes awry. Unfortunately, at the client level, active failover between nodes operating on different clients is currently impossible, as specifications dictate that validator clients must communicate with client-specific beacon node instances.
Active-passive redundancy configurations are by definition complicated to achieve, which makes them somewhat inaccessible for at-home validators and introduces pain points for infrastructure providers. The plain truth is that it is difficult to achieve optimal client-level redundancy in a single staking setup.
SSV (Secret Shared Validators) is the first secure and robust way to split a validator key into ‘shares’ for ETH staking between non-trusting machines or operators. It can be thought of as a sophisticated multisig governed by a consensus algorithm. Each machine does not need to trust the others to operate, and no machine can recreate a validator key signature on its own.
A simple use case is a network of trusted operators providing trustless staking as a group. Several staking services can get together and share the burden of a user’s stake. This is not the same mechanism as a staking pool, which socializes losses if something goes wrong for a single operator and those they are responsible for. In an SSV setup, redundancy is built by leveraging several different operators, and the network can tolerate a certain number of them being faulty while still attesting correctly and avoiding losses.
Imagine 7 of the most prominent staking services (n) joining forces in an SSV setup. A certain number of faulty operators (f) can be tolerated, as defined by n≥3f+1:
7≥3f+1
6≥3f
2≥f
In this example, it is still possible to reach consensus with 2 faulty operators. This alone significantly lightens the burden for staking services, as they can confidently go offline (for maintenance or whatever reason), with the added value of offering decentralized staking to clients regardless of their level of private key custody. The result is great both for the staker who does not want to give up control of their keys (they can opt in, opt out, and change providers at any time) and for the network as a whole, by allowing for decentralization and infrastructure resilience at scale.
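The threshold arithmetic above can be sketched in a few lines of Python. This is only an illustration of the n≥3f+1 bound; the function names `max_faulty` and `min_healthy` are made up for this example and are not part of any SSV implementation.

```python
def max_faulty(n: int) -> int:
    """Largest f satisfying n >= 3f + 1, i.e. f = floor((n - 1) / 3)."""
    if n < 1:
        raise ValueError("a cluster needs at least one operator")
    return (n - 1) // 3


def min_healthy(n: int) -> int:
    """Operators that must stay healthy: everyone except the tolerated f."""
    return n - max_faulty(n)


# Print the tolerance for a few cluster sizes, including the 7-operator example.
for n in (4, 7, 10, 13):
    print(f"n={n}: tolerates f={max_faulty(n)}, needs {min_healthy(n)} healthy")
```

For the 7-operator example this gives f=2, with 5 operators that must remain healthy.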
But this alone doesn’t immediately solve the problem of validator client diversity: what if ALL of the operators are using the same client? If that one client fails, the entire SSV network could fail with it until at least 5 of the 7 operators have achieved some level of active-passive redundancy. What’s worse, such a setup can’t protect against the ripple effect of all validators suffering when one dominant client experiences issues, as we saw during the latest event.
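To make the single-client risk concrete, here is a rough sketch that checks whether a cluster keeps working if every operator running a given client goes down at once. The operator names and client assignments are invented for illustration, and the check simply applies the n≥3f+1 bound; it does not model how a real SSV network behaves.

```python
from collections import Counter


def max_faulty(n: int) -> int:
    """Largest f satisfying n >= 3f + 1."""
    return (n - 1) // 3


def survives_client_outage(operator_clients: dict, failed_client: str) -> bool:
    """True if the operators lost to `failed_client` stay within the tolerated f."""
    down = sum(1 for c in operator_clients.values() if c == failed_client)
    return down <= max_faulty(len(operator_clients))


def risky_clients(operator_clients: dict) -> list:
    """Clients whose failure alone would take out more than f operators."""
    f = max_faulty(len(operator_clients))
    counts = Counter(operator_clients.values())
    return sorted(c for c, k in counts.items() if k > f)


# Hypothetical 7-operator clusters: one homogeneous, one diversified.
all_prysm = {f"op{i}": "prysm" for i in range(1, 8)}
diverse = {
    "op1": "prysm", "op2": "prysm",
    "op3": "lighthouse", "op4": "lighthouse",
    "op5": "teku", "op6": "teku",
    "op7": "nimbus",
}

print(survives_client_outage(all_prysm, "prysm"))  # False: all 7 operators go down
print(survives_client_outage(diverse, "prysm"))    # True: only 2 of 7 go down
print(risky_clients(all_prysm))                    # ['prysm']
print(risky_clients(diverse))                      # []
```

In the diversified cluster no single client is run by more than 2 operators, so any one client can fail without exceeding the tolerated number of faulty operators.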
Looking past the obvious use case above, SSV will allow for diverse (and customizable) staking setups across the board:
“SSV configurations allow for active-active cluster redundancy across all layers of the infrastructure’s sub-components including active redundancy across different validator clients enabled through the SSV API. The main purpose of an active-active cluster is to achieve minimal service disruptions and enhanced resiliency against various types of node failures” — Mara Shmiedt & Collin Meyers
If a single operator wishes to find an active-active solution at the client level for their staking service, they can do so by creating separate setups for each validator client (following client-specific specifications) and then joining them together with SSV! If 1 validator client setup is not operating properly, the others keep the SSV network running as usual while the faulty setup is attended to.
Now imagine, as a staker, a customizable SSV network made up of the same 7 staking services in the example above, with the addition of client diversification for each! If each of the 7 services offers 3 different validator clients, there will be 21 different setups to choose from and an enormous number of possible configurations.
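As a quick sanity check on those numbers, the sketch below enumerates the 21 setups and counts configurations under two possible readings. The text does not say how configurations should be counted, so both readings are assumptions, and the service and client names are placeholders.

```python
from itertools import product
from math import comb

# Placeholder names; the article does not name the services or clients.
services = [f"service-{i}" for i in range(1, 8)]
clients = ["client-A", "client-B", "client-C"]

# Every (service, client) pair is one distinct setup.
setups = list(product(services, clients))
print(len(setups))  # 21

# Reading 1 (assumption): all 7 services take part, each running one of its 3 clients.
one_client_per_service = len(clients) ** len(services)
print(one_client_per_service)  # 2187

# Reading 2 (assumption): the staker picks any 7 of the 21 setups.
any_seven_setups = comb(len(setups), 7)
print(any_seven_setups)  # 116280
```

Either way, the number of possible configurations grows quickly as services add client options.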
Looking forward, SSV configurations don’t stop at the validator client level: operators can offer many different staking architectures, including different hardware options, location-based solutions, and cloud providers!
The implications are far-reaching and a total win for the entire network by providing fault tolerance, infrastructure diversification & resilience, and seemingly infinite staking options for all who wish to secure Ethereum.