The Impact of Alan on Mainnet: Post-Fork Analysis
Discover the impact of the Alan Fork! Learn about performance optimization, resource efficiency, and positive impact on the SSV nodes.
The SSV Network DAO is happy to see SSV Labs release the Alan fork. This article was originally posted by the SSV Labs team. The SSV Network DAO will always highlight the community’s helpful contributions.
The SSV Labs team has been working around the clock over the last couple of months to make significant improvements at both the client and protocol levels. The main focus was on reducing resource usage and optimizing the overall network. Credit goes to the brilliant researchers and developers at SSV Labs!
As part of the Observability team here at SSV Labs, we aim to provide insights into network performance, stability, efficiency, and everything in between. In this post, we’ll dive into our initial findings, as well as go over the upcoming improvements you can expect in the SSV ecosystem.
Refer to the official Alan fork announcement for details on how Alan works under the hood.
First and foremost, the SSV Labs team is happy to report that the Alan Fork has been a monumental success! As SSV Labs continues to iron out a few edge cases, the performance improvements we’ve seen have been nothing short of impressive.
The Alan Fork has been a significant milestone in SSV’s journey to make DVT infrastructure more scalable and performant. As the SSV Network ecosystem continues to grow, these improvements are essential to support it, and this is precisely what the Alan fork has done.
At a high level, the SSV Labs team has seen dramatic improvements in node CPU usage. Combined with reduced bandwidth usage, this means SSV Network nodes run better and faster with fewer resources.
Performance aside, the other side of the coin is that by reducing resource requirements, Alan has also lowered the barrier to running a node on the SSV Network. This enables more participants to join the decentralized backbone supporting Ethereum’s validation layer.
The following measurements may vary between operators as many factors can influence a node’s performance, such as validator count, hardware specifications, and potential virtualization overhead.
Since the fork introduced various changes, the team has not yet provided ready-to-use Grafana dashboards. The SSV Labs team is addressing this and will update the community accordingly.
The following is a 24-hour snapshot of the CPU time used by a mainnet SSV node with ~150 validators before the fork.
This is the same node after the fork:
Between the two graphs, the mean CPU time has dropped from 1.05 to 0.126, a reduction of roughly 88%.
In the previous post, we discussed CPU profiling and how it enables us to see CPU time spent on each process and function running within the stack over a specific period. The first thing to notice is that over a ~7-hour period, the baseline (pre-fork) used 3.86 hours of CPU time, whereas post-fork used 32.1 minutes.
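These reductions are easy to double-check. A quick sanity calculation, using the rounded figures quoted in this post:

```python
# Sanity check of the reported CPU-time reductions,
# using the rounded figures quoted in this post.

def pct_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100

# Mean CPU time (24-hour snapshot): 1.05 -> 0.126
print(f"mean CPU time: {pct_reduction(1.05, 0.126):.1f}% lower")          # 88.0% lower

# CPU time over the ~7-hour profiling window: 3.86 h -> 32.1 min
print(f"profiled CPU time: {pct_reduction(3.86, 32.1 / 60):.1f}% lower")  # 86.1% lower
```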
The following flame graph shows which parts of the code take a bigger or smaller share of CPU time compared to pre-fork.
The larger the span, the larger the share of CPU time. As you can see, the most significant wins come from the libp2p libraries. This aligns with the general assumption: the fork significantly reduces the number of network messages, and therefore, the CPU spends much less time running that particular function over a long period.
But what about the red? In the graph above, the most prominent red span belongs to the process responsible for updating validator metadata. Between pre-fork and post-fork, total CPU time has decreased so dramatically that any part of the code whose absolute cost stays the same now accounts for a much larger share of the total, and therefore shows up in bright red.
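This effect is easy to reproduce with toy numbers. In the sketch below, the totals come from the profiles above, while the metadata-update cost is a hypothetical figure purely for illustration:

```python
# Why an unchanged component turns "red" in a differential flame graph:
# its *share* of CPU time grows as the total shrinks.

total_before = 3.86    # CPU-hours pre-fork (from the profiles above)
total_after = 0.535    # CPU-hours post-fork (32.1 minutes)
metadata = 0.10        # hypothetical fixed cost of metadata updates, CPU-hours

share_before = metadata / total_before * 100
share_after = metadata / total_after * 100

# The absolute cost is identical, but the share roughly septuples.
print(f"share before: {share_before:.1f}%, after: {share_after:.1f}%")  # 2.6% -> 18.7%
```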
The following example applies to a subset of operators, since these metrics represent what would generally be considered a best case. Here, we highlight an operator that is part of only a single SSV cluster (committee), which makes the benefits of the fork even more pronounced.
Pre-fork:
Post-fork:
SSV Labs is cautious when interpreting these numbers, since this setup represents a best case. Over the coming weeks, some changes will be released, and we expect this setup to stabilize at ~450 kB/s, an 84% improvement compared to pre-fork.
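Working backwards from those projections, the implied pre-fork baseline can be recovered with one line of arithmetic:

```python
# Back out the implied pre-fork bandwidth from the projected figures:
# ~450 kB/s after a change described as an 84% improvement.

post_kbs = 450
improvement = 0.84

pre_kbs = post_kbs / (1 - improvement)
print(f"implied pre-fork bandwidth: ~{pre_kbs:.0f} kB/s")  # ~2812 kB/s, i.e. ~2.8 MB/s
```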
The improvements are tangible for all operators, which we will discuss later with some community examples.
SSV Labs runs SSV Node Exporters, a mode of the SSV node that collects and inspects network events; they provide the information that fuels the SSV explorer. As a result of the fork, SSV Labs noticed some unforeseen benefits (it’s a feature, not a bug!). Because the new aggregation dynamics produce far fewer events on the network, there are far fewer events to track, store, and log.
IOPS, or Input/Output Operations Per Second, measures how many read and write operations a storage device (like a hard drive or SSD) can perform each second. It’s a key indicator of storage performance, especially for applications requiring rapid data access.
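As a minimal sketch of the metric itself, IOPS is simply the change in completed operations divided by the sampling interval. The counter values below are made up for illustration:

```python
# Minimal IOPS calculation from two snapshots of a disk's
# completed-operation counters (the counter values are made up).

reads_t0, writes_t0 = 1_000_000, 2_500_000   # ops completed at time t0
reads_t1, writes_t1 = 1_000_450, 2_500_750   # ops completed at time t1
interval_s = 10                               # seconds between snapshots

iops = ((reads_t1 - reads_t0) + (writes_t1 - writes_t0)) / interval_s
print(f"{iops:.0f} IOPS over the interval")  # 120 IOPS
```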
Pre-fork:
Post-fork:
Throughput measures the rate at which data is processed, in this case, the amount of data read from and written to the filesystem by the SSV node running in exporter mode.
Pre-fork:
Post-fork:
For consumers of the SSV Node Exporter, such as explorers or data-analysis services, these improvements translate into more reliable reporting with far less pressure on the disk, which in turn means more dependable read/write operations.
SSV Network’s permissionless, globally distributed set of node operators ranges from enterprise-level operations to solo node operators. Following the fork, each of these participants has seen improvements in resource utilization and bandwidth, in some cases further improving validator performance.
Since Alan has gone live, positive feedback from the ecosystem has been streaming in, and the ssv.network DAO and SSV Labs are happy to share it with the community.
Improved resource utilization leads to increased performance for solo node operators:
Node Operators in the APAC region see dramatically improved resource utilization:
Node Operators in the EU region:
Statistics from Node Operators in US regions:
SSV Labs is redesigning observability from the ground up. Looking ahead, it is clear that OpenTelemetry is the way to go. It is an open-source standard for observability that provides a single set of APIs, libraries, agents, and instrumentation to capture distributed traces, metrics, and logs without tying you to a specific vendor, a historical problem in the observability space.
SSV Labs expects to deliver the first iteration of OpenTelemetry instrumentation, which covers metrics, in the upcoming weeks. The team aims to make an SSV node operator’s life easier by providing the tools to monitor their node’s performance and health.
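For operators unfamiliar with the standard, a typical setup routes a node’s metrics through an OpenTelemetry Collector. The fragment below is an illustrative Collector configuration, not an official SSV one: it receives OTLP metrics and exposes them for Prometheus scraping.

```yaml
# Illustrative OpenTelemetry Collector config (not an official SSV config):
# receive OTLP metrics from a node and expose them on a Prometheus endpoint.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
```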
SSV Pulse is a lightweight, open-source tool that has two main features:
– Benchmarking: This automated checklist helps ensure your SSV node and infrastructure meet the requirements for running a node. The tool is heavily influenced by lessons the ssv.network DAO community has learned in the staking space over the years.
– Analyze: Debugging is never easy – the SSV network is a complex distributed system, often with no offline coordination available. This tool primarily looks at the logs emitted by the SSV node and provides insights into what might be going wrong. SSV Pulse analyzes logs from multiple nodes in the same cluster, helping to spot issues that might otherwise be hard to find.
Now that SSV Labs has validated the improvements brought by the Alan fork, the team is working on increasing the validator limit per operator from 500 to 1000. While this is not available yet, anyone can monitor the progress of this initiative on the SIPs repository and in the SSV spec repository.
Increasing the number of peers improves consensus performance by reducing the RTT (Round Trip Time) required for SSV consensus. With more peers, operators can communicate across a wider range of regions, boosting the chances of connecting with geographically diverse peers. To achieve this, the team has increased the Max Peers setting to 150, though this adjustment is only necessary for popular public operators managing 300+ validators across various clusters.
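For operators who want to apply this, the change amounts to a single setting in the node configuration. The fragment below is only a sketch: the exact section and key names are assumptions and may differ between SSV node versions, so consult the official node documentation before applying it.

```yaml
# Illustrative node configuration fragment (key names are assumptions;
# check the official SSV node docs for your version before applying).
p2p:
  MaxPeers: 150   # only needed for popular public operators managing
                  # 300+ validators across various clusters
```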
Lastly, further research is being conducted on network topology and operators’ subnet participation. The team is carefully reviewing the trade-offs of this optimization, but it promises significant scalability improvements.
The Alan Fork marks a pivotal step forward for the SSV Network, delivering essential improvements in performance, resource efficiency, and accessibility. By dramatically reducing CPU usage, bandwidth demands, and disk activity, the fork has made it easier for operators to run nodes more effectively. These upgrades underpin the network’s scalability, cementing SSV’s role as critical infrastructure for decentralized ETH staking. Building on this success, the integration of OpenTelemetry, the launch of tools like SSV Pulse, and plans to double the validator limit per operator demonstrate the commitment to continual innovation.
Website | Builders Hub | Network Hub | Discord | Dev Center | Documentation | GitHub