03.04.2024 | Storm Slivkoff, Georgios Konstantopoulos
State growth and its relationship to Ethereum’s gas limit are widely misunderstood. It is commonly believed that state growth is Ethereum’s primary scaling bottleneck. However, discussions of state growth are often held back by imprecise terminology and a lack of detailed quantitative evidence.
Embracing a data-driven approach brings significant clarity to the state growth issue. In this article we leverage high-resolution datasets to understand the size and shape of state growth. In doing so, we reach the surprising conclusion that modern consumer hardware can sustain current rates of state growth for at least a decade. Furthermore, this runway is likely to be extended indefinitely by upcoming improvements to software and hardware.
We believe that Ethereum has a clear roadmap toward 1) complete elimination of state growth as a scaling bottleneck and 2) raising the gas limit to a level that supports a global-scale decentralized financial system. The goal of this blogpost series is to develop a scientific approach for understanding and enacting this scaling roadmap.
This article is part 1 in a blogpost series about Ethereum scaling. Part 1 is about state growth, part 2 is about history growth, part 3 is about state access, and part 4 is about the gas limit.
The term “state growth” is commonly used as a catch-all for any Ethereum scaling bottleneck where data size exceeds the capacity of Ethereum node hardware. However, state growth should not be thought of in this monolithic way. There are multiple types of Ethereum data, each with its own relationship to a node’s underlying hardware components. It is thus crucial to use precise terminology that disentangles each distinct scaling bottleneck.
State is the set of data necessary for building and validating new Ethereum blocks. State is composed of contract bytecode, contract storage, account balances, and account nonces. History is the set of data necessary for syncing a node from Genesis to the latest block. History is composed of blocks and transactions. State and history are non-overlapping datasets. From these definitions there are at least 3 distinct phenomena that put significant stress on a node’s hardware:
Each of these bottlenecks has a unique relationship to a node’s hardware constraints. The four most relevant hardware constraints are:
The relationships between these bottlenecks and hardware constraints are illustrated in Figure 1.
Figure 1: Ethereum Scaling Bottlenecks
Starting at the top of the diagram, every time Ethereum executes a transaction, all resources used by that transaction are priced in terms of gas. Ethereum’s gas limit is thus a single-dimensional quantity that rate-limits all forms of on-chain activity. Downstream of the gas limit are block size and operations per block. The more bytes per block, the faster history will grow. The more IO operations per block, the greater the rate of state access and (usually) the greater the rate of state growth.
Thus, the scaling bottlenecks are connected to the node’s hardware constraints as follows:
For state growth in particular, the main challenge is ensuring that state size does not grow at a faster rate than can be sustained by ongoing improvements to consumer hardware. Node memory and storage are finite resources, and so they will eventually reach capacity unless either the state stops growing or hardware is periodically upgraded. It is fortunate that memory and storage hardware have been improving for many years. Even so, the exact forecast of these improvements is not certain, and it should not be taken as a given that their rapid growth will continue indefinitely.
Note that data blobs introduced by the upcoming EIP-4844 will bring some changes to these scaling relationships. After EIP-4844, it is expected that much less history will accumulate on disk, while network IO might increase significantly in order to transmit large amounts of blob data.
In this article we will focus mainly on the state size and state growth rate rather than memory size and state access patterns. We will investigate these or other topics in future work.
The next step in understanding state growth is examining the total size of state along with the relative size of each state contribution. Currently, Ethereum state takes up about 245.5 GiB on disk. These numbers were measured using a reth node, but the numbers for each node client are roughly comparable as shown in this comparison spreadsheet. Accounts, contract bytecodes, and contract storage occupy 14.1%, 4.3% and 81.7% of state respectively.
Figure 2 shows how much state is occupied by various categories of smart contract protocols. In this visualization, the size of each contract category represents the number of bytes occupied by its storage slots and bytecodes. Contract categories are hierarchical and can be navigated using mouse clicks. A category is also included to represent the total amount of state taken up by account balances and nonces.
Figure 2: Distribution of Ethereum State
The numbers in Figure 2 represent the total bytes that a node client must store on disk. This includes data used by indexes and other types of storage overhead. Numbers were taken from a reth node, but values are roughly comparable across clients. The average sizes required to store each account and each storage slot were 133.6 bytes and 191.3 bytes, respectively.
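As a rough cross-check, the figures quoted above can be combined to estimate how many accounts and storage slots the state contains. This is only a back-of-the-envelope sketch: it assumes the reported percentages and per-item averages describe the same on-disk measurement, and the function and variable names are ours, not taken from the article's code.

```python
GIB = 2**30  # bytes per GiB

STATE_SIZE_GIB = 245.5  # total on-disk state size reported in the article

# Shares of state reported in the article (they sum to 100.1% because of rounding).
SHARES = {
    "accounts": 0.141,
    "contract bytecode": 0.043,
    "contract storage": 0.817,
}

# Average on-disk bytes per item reported in the article.
BYTES_PER_ACCOUNT = 133.6
BYTES_PER_SLOT = 191.3


def component_sizes_gib(total_gib, shares):
    """Split the total state size into per-component sizes, in GiB."""
    return {name: total_gib * share for name, share in shares.items()}


def implied_item_count(size_gib, bytes_per_item):
    """Estimate how many items occupy size_gib at the given average item size."""
    return size_gib * GIB / bytes_per_item


sizes = component_sizes_gib(STATE_SIZE_GIB, SHARES)
accounts = implied_item_count(sizes["accounts"], BYTES_PER_ACCOUNT)
slots = implied_item_count(sizes["contract storage"], BYTES_PER_SLOT)

for name, gib in sizes.items():
    print(f"{name}: {gib:.1f} GiB")
print(f"implied accounts: {accounts / 1e6:.0f} million")
print(f"implied storage slots: {slots / 1e6:.0f} million")
```

Under these assumptions the estimate comes out to a few hundred million accounts and a little over a billion storage slots. These counts are derived here for illustration and are not stated in the article.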
There are many interesting patterns within Figure 2, but here are some of the most important takeaways:
The most important aspect of state growth is the evolution of state growth rates over time. These rates reveal the severity of the problem and whether the severity is trending upward or downward.
Figure 3 shows state growth rates since Ethereum’s inception in 2015. These rates are computed by summing the contract bytecodes and contract storage within each contract category. Subsets of these categories can be visualized by clicking and double clicking items in the figure legend.
Figure 3: Ethereum state growth over time
There are many interesting patterns within Figure 3, but here are some of the most important takeaways:
We now know the Ethereum state’s 1) size, 2) composition, and 3) growth rate. How do we determine the range of acceptable state growth values? This question is complicated because it depends on both unpredictable market forces and philosophical choices about which tradeoffs Ethereum should make.
Let’s start with the simplest possible model of how long the current levels of state growth are sustainable on common consumer hardware, assuming no future hardware improvements. As shown in Figure 3, in recent years the state has been growing at an annualized rate of between 31 GiB/year and 72 GiB/year. Common consumer hardware currently tops out around 4 TiB of storage and around 64 GiB of memory. From this we can create a simple forecast of storage and memory requirements:
This is a simplified model with many assumptions. Possible extensions to this model include 1) history growth, 2) nonlinear scaling of memory requirements, 3) decreasing hardware costs, 4) increases to gas limit, 5) opcode gas repricing, and 6) future Ethereum architecture improvements. Each of these factors can interact nonlinearly and evolve over time. We will explore these model extensions in future work.
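The arithmetic behind the simplest version of this forecast can be sketched as follows. This is our own minimal illustration rather than the authors' model: it assumes strictly linear growth, counts state only (a real node's disk also holds history and other data), and uses hypothetical names such as `years_until_full`.

```python
GIB_PER_TIB = 1024

STATE_GIB = 245.5                        # current state size from the article
STORAGE_CAPACITY_GIB = 4 * GIB_PER_TIB   # ~4 TiB consumer storage
MEMORY_CAPACITY_GIB = 64                 # ~64 GiB consumer memory
GROWTH_RATES_GIB_PER_YEAR = (31, 72)     # recent low and high annualized rates


def years_until_full(current_gib, capacity_gib, growth_gib_per_year):
    """Years of linear growth before current_gib reaches capacity_gib.

    Returns 0.0 if the capacity is already exceeded.
    """
    if growth_gib_per_year <= 0:
        raise ValueError("growth rate must be positive")
    return max(0.0, (capacity_gib - current_gib) / growth_gib_per_year)


for rate in GROWTH_RATES_GIB_PER_YEAR:
    storage_years = years_until_full(STATE_GIB, STORAGE_CAPACITY_GIB, rate)
    memory_years = years_until_full(STATE_GIB, MEMORY_CAPACITY_GIB, rate)
    print(f"{rate} GiB/year: storage full in {storage_years:.1f} years, "
          f"memory headroom {memory_years:.1f} years")
```

Because the full state is already larger than 64 GiB, the memory headroom in this sketch is zero at any growth rate: the whole state cannot be held in memory on such a machine. The storage figure is an upper bound for a state-only view and shrinks once history and other disk usage are included.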
It must be emphasized that long-term sustainability is a good thing. Even if modern hardware can support many years of runway, shortening this runway should never be taken lightly. Any plan that accelerates state growth should include a significant buffer for unforeseen changes to the hardware or software landscape.
Many different solutions have been proposed for addressing state growth. Three improvements to Ethereum architecture stand out: rollups, Verkle tries, and state expiry. Taken together, these form a comprehensive roadmap for solving state growth in the short, medium, and long term.
Short term: Rollups do not solve state growth, but they do ease the burden. As shown in Figure 2 and Figure 3, rollups are able to use state more efficiently than mainnet. Offloading activity to L2s does require some amount of state to be stored on mainnet in order to support user exits. However, the state footprint of L2 transactions is much lower than the footprint of transactions on mainnet. Rollups thus make it more sustainable to increase total activity in the ecosystem. The adoption of rollups is expected to grow with the upcoming EIP-4844, which will make rollups much cheaper through use of blobs.
Medium term: Verkle tries solve state growth for validator nodes, but not for nodes that need to build new transactions. Verkle tries are a new data structure for Ethereum state. They enable more efficient light clients and “stateless” nodes. These nodes will be able to validate new blocks without any knowledge of existing state values. This eliminates the state growth problem for validator nodes. The building of new transactions will still require storing and accessing state, but this would still be a more sustainable situation than we have today, because transaction construction is a task that can be easily distributed across many machines. In terms of scope, Verkle tries represent a significant engineering effort that could take years to implement.
Long term: State expiry solves state growth for all nodes, but requires additional infrastructure. State expiry allows nodes to discard inactive portions of the state, such as the dormant state shown in Figure 2. Note that the term “state hibernation” might be a more appropriate name, as most existing proposals allow for recovering “expired” state via proofs. With regard to concerns around expired state being lost over time, as long as the history (block and transaction data) is available, the state can be reconstructed. Thus, whatever solution is developed for the history preservation problem of EIP-4444 will also solve the state preservation problem. It is possible that state expiry might be unnecessary in a world where Verkle tries succeed at their goals.
These are not the only solutions proposed to address state growth. Others include state rent and sharding, but historically these have had concerns around UX or soundness. A combination of these solutions and others may be necessary for reaching an endgame solution in the more distant future.
Although state growth is a key challenge for scaling Ethereum, we believe it is a solvable problem using known technical solutions. By our reading of the data, Ethereum can sustain current levels of state growth for many years, with a comfortable buffer for experimenting with architectural upgrades.
We believe that empirical methods will be essential for engineering Ethereum’s gas limit and steering Ethereum toward endgame scaling solutions. This article is only a single step toward that goal. There are other types of data beyond state, each imposing their own scaling burdens on an Ethereum node and on the Ethereum gas limit. We hope to explore these other bottlenecks in future work.
If you are excited about research in Ethereum scaling, reach out to storm@paradigm.xyz and georgios@paradigm.xyz. We’d love to hear about how you are thinking about the problem and potentially collaborate. The data and code used for this article can be found on GitHub here.
Thank you to Tim Beiko, Péter Szilágyi, Guillaume Ballet, Banteg, Alex Stokes, Jesse Pollak, Toni Wahrstaetter, Patrick O’Grady, lightclient, Ansgar Dietrichs, Frankie, Dan Robinson, Matt Huang, Doug Feagin, and Arjun Balaji for review and feedback.
Thank you to Achal Srinivasan for the Figure 1 graphics.
Copyright © 2024 Paradigm Operations LP All rights reserved. “Paradigm” is a trademark, and the triangular mobius symbol is a registered trademark of Paradigm Operations LP