Parallel Processing: Solana’s Key to Hardware Scalability
“The sky is falling, Moore’s Law is failing” has been going around scientific circles in recent years. This is partially true: adding more transistors no longer translates directly into better performance, especially since clock frequency has plateaued. Single-threaded blockchain systems may suffer greatly from the slowdown in single-core performance gains. However, Sealevel, Solana’s parallel runtime, allows Solana to scale throughput and minimize block time in a way that takes full advantage of increases in core count. In fact, Solana’s founder Anatoly said that “Other than Solana, all blockchains are single-threaded processors. That is, they can only make one state update at a time.” Moreover, Moore’s Law will help the Solana network become increasingly decentralized as hardware costs per transistor fall, making validator hardware more accessible. After all, at its core, Moore’s Law is an economic phenomenon as much as it is a technical one.
In 1965, Intel co-founder Gordon Moore postulated a virtuous cycle in expanding the number of transistors that can be packed onto an integrated circuit. With unit costs halving, a manufacturer could effectively double transistor count every 18 months to 2 years. Consumer demand for better performance would then absorb each new generation, providing the revenue necessary to justify the R&D and production expenditures.
The growth function of Moore’s Law is exponential: transistor count doubles over a fixed period, so taking the log of the count gives a straight line over time.
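As a sketch of that growth function, assume the transistor count starts at N_0 and doubles every T years (both symbols are introduced here for illustration and do not come from the original formula):

```latex
N(t) = N_0 \cdot 2^{t/T}, \qquad \log_2 N(t) = \log_2 N_0 + \frac{t}{T}
```

With T between 1.5 and 2 years, the log of the count grows linearly in time, which is why the trend shows up as a straight line on a logarithmic axis.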
Moore’s Law may hold true for some time as engineers innovate by either making bigger chips, or stacking multiple layers of transistors in the 3rd dimension.
What drives Moore’s Law?
- Manufacturers wishing to keep up with the law (self-fulfilling prophecy?)
- Competition between manufacturers
- Successive technologies providing better design tools
- Customer demand for better products
Dennard scaling states that as transistors get smaller, their power density stays constant. In 1974, Robert Dennard observed that when a CMOS transistor's dimensions are halved, its supply voltage, threshold voltage, and capacitance can be halved as well while its clock frequency rises, so power density stays constant. Back then, CMOS transistors were 5 microns (5,000 nm) wide. Today, transistors have been scaled down by a factor of roughly 360 to 140 angstroms (just 14 nm)!
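The arithmetic behind constant power density can be checked with a short sketch. It assumes the textbook dynamic-power model P = C * V^2 * f and ideal scaling rules; the function names and the normalized starting values are illustrative only.

```python
def dennard_scale(capacitance, voltage, frequency, area, k):
    """Shrink a transistor by a factor k (> 1) under ideal Dennard rules."""
    # Capacitance and voltage shrink by k, frequency rises by k, area shrinks by k^2.
    return capacitance / k, voltage / k, frequency * k, area / k**2


def power_density(capacitance, voltage, frequency, area):
    # Dynamic switching power P = C * V^2 * f, spread over the transistor's area.
    return capacitance * voltage**2 * frequency / area


before = (1.0, 1.0, 1.0, 1.0)        # normalized, hypothetical units
after = dennard_scale(*before, k=2)  # one "halving" of dimensions

print("power density before:", power_density(*before))
print("power density after: ", power_density(*after))
```

Power per transistor falls by k^2 while area also falls by k^2, so the two effects cancel. The scaling breaks down once voltage can no longer be reduced along with size.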
Why people are freaking out: Single-thread performance is stalling
The exponential growth in transistor count has stayed consistent with Moore’s Law, but single-thread performance, clock frequency (switching rate), and power have flattened out. Why?
As components get smaller, surface area increases disproportionately relative to their size. This causes current to leak out, meaning the chip must be juiced with more and more power to see tangible improvements. Eventually, the chip is wasting more current than it uses, at least on a marginal basis.
A more technical analysis points to threshold voltage as the problem. Clock frequency is at best proportional to V_dd - V_th (supply voltage minus threshold voltage). However, as V_th is scaled down in proportion to CMOS gate width, leakage current, and with it the power required, increases exponentially (this is very problematic!). At 90 nm, transistor gates became so thin that current started leaking out into the substrate.
The faster the clock speed, the larger the difference between supply and threshold voltage needs to be to produce a coherent signal. By the 2000s, cumulative reductions in transistor size had left too little headroom between supply voltage and threshold voltage, and voltages settled at around V_dd = 1.3V and V_th = 0.5V. High transistor densities made heat and power dissipation extremely challenging, leaving only expensive and impractical options such as liquid cooling. Moreover, ever-smaller components are increasingly hard to keep electrically stable, especially with standard silicon technology alone. Until a new material such as diamond, or molecular or quantum computing, reaches mainstream production, clock frequency is limited.
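To get a feel for how quickly leakage grows as V_th falls, here is a rough sketch using the usual subthreshold model, in which off-state current rises tenfold for every "subthreshold swing" of threshold voltage removed. The 80 mV/decade swing is an assumed, typical-looking value, not a figure from this article.

```python
def relative_leakage(v_th_mv, swing_mv_per_decade=80.0):
    """Off-state leakage relative to a hypothetical device with V_th = 0."""
    # Leakage falls by 10x for every `swing_mv_per_decade` of threshold voltage.
    return 10 ** (-v_th_mv / swing_mv_per_decade)


baseline = relative_leakage(500)  # V_th = 0.5 V, the value quoted in the text
for v_th in (500, 400, 300, 200):
    ratio = relative_leakage(v_th) / baseline
    print(f"V_th = {v_th} mV -> {ratio:,.0f}x the leakage at 500 mV")
```

Under this assumption, every 80 mV shaved off the threshold multiplies leakage by ten, which is why V_th could not keep shrinking with gate width.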
The switching rate of transistors has plateaued, indicating a stagnation in the rate of single-threaded compute. Of course, frequency is not the only factor affecting microprocessor performance. Branch prediction, execution units, cache hierarchy, and instruction sets are also large influences. However, it was clear that a ceiling had been reached from adding so many tiny transistors within a finite space.
There is a quadratic relationship between heat and clock speed when the focus is on making a single core faster. Past a certain point, each further increase in clock speed generates more additional heat than adding another core would. According to NEWBEDEV, this is the point where it is simply not worth raising the clock speed any further. It is logical, then, to increase the number of cores instead: heat goes up only linearly with core count, i.e. there is a constant ratio between the number of cores and power draw.
With these underlying technical considerations in mind, it is not surprising that, after an inflection point in the mid-2000s, Moore’s Law (i.e. increasing transistor counts) has delivered stalling single-threaded performance alongside a rapidly growing number of logical cores.
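The trade-off can be made concrete with a toy model. It takes the quadratic and linear relationships from the paragraph above at face value and assumes a perfectly parallel workload; real chips and real software are messier, so treat the numbers as illustrative.

```python
def relative_power(cores, clock_ratio):
    """Toy model: heat grows linearly with cores and quadratically with clock."""
    return cores * clock_ratio ** 2


def relative_throughput(cores, clock_ratio):
    # Assumes the workload splits perfectly across cores (an idealization).
    return cores * clock_ratio


configs = {"baseline": (1, 1.0), "2x clock": (1, 2.0), "2x cores": (2, 1.0)}
for label, (cores, clock) in configs.items():
    print(label, "throughput:", relative_throughput(cores, clock),
          "power:", relative_power(cores, clock))
```

In this model both upgrades double throughput, but doubling the clock quadruples power while doubling the cores only doubles it.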
Breaking speed barriers through parallel computation
As the number of logical cores increases and validators upgrade to new hardware, the speed of the system should increase. Why? Because Solana is not restricted by the slowdown in single-thread performance. The current roadmap envisions 750k TPS and 150ms block times, if not more. This puts Solana in a different league from other Layer 1 protocols such as Ethereum, where transactions are processed sequentially.
Reaching that terminal capacity is still some time away, as shown by the Solana network outage on Sept 14, 2021, when nodes were overwhelmed by an influx of 300k TPS arising from the GRAPE Protocol IDO.
Figure: Jump Crypto’s post-mortem forensics of slots per second
Figure: Peak transactions received by validators’ banking stages across the network
It is important to note that multithreading and multicore processors do not simply multiply processing speed. The idea that a quad-core processor running at 4 GHz delivers an overall performance of 16 GHz is not accurate. Furthermore, while many parallelized verifiers keep the system in check, state generation is run by a single machine: the leader.
However, multithreading does allow for parallel computation, which is useful for ‘multitasking’ on separate tasks. But this shouldn’t help because blockchains are single-threaded, right? Most traditional blockchains are indeed single-threaded, like Ethereum, which processes transactions one at a time to avoid conflicts. Solana, however, runs thousands of transactions in parallel. This is analogous to upgrading a single-lane road to a multi-lane highway.
How is Solana capable of this? On Ethereum, storage is contained within each smart contract. For example, if Alice transfers $x to Bob, the resulting state is stored inside the executing smart contract. If Ethereum tried to process transactions simultaneously, double-spending or conflicting states could occur.
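One standard way to quantify why four cores do not mean four times the speed is Amdahl's law, which this article does not name but which fits the point. The sketch below assumes a workload in which some fixed fraction can be parallelized and the rest must run serially; the 90% figure is purely illustrative.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


for cores in (1, 4, 12, 24):
    # 0.9 is a hypothetical parallel fraction chosen for illustration.
    print(cores, "cores ->", round(amdahl_speedup(0.9, cores), 2), "x")
```

Even with 90% of the work parallelizable, the serial remainder caps the speedup at 10x no matter how many cores are added.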
In comparison, Solana's parallel runtime, Sealevel, separates code from data (e.g. account balances): programs are stateless, and each transaction specifies beforehand which data it will read or modify. Transactions that do not conflict can therefore run in parallel. This is only possible because of Solana’s account-based VM, which requires transactions to describe all the state they will read or write while executing.
The Solana VM sorts incoming transactions so that non-overlapping transactions, those affecting mutually exclusive accounts, can be vectorized, scheduled, and executed simultaneously across multiple cores. It builds a set of parallel queues, one for each processor thread on a single node, ensuring that any account accessed multiple times is listed sequentially in only one queue. Transactions that only read the same state can execute concurrently as well. Transactions are classified as conflicting if one depends on the output of another, such as when a transaction wants to write to an account that another transaction wants to read or write, thereby affecting the same state in the ‘VM memory’.
Any transactions not processed within the leader node's block time are bundled and forwarded to the next scheduled leader to attempt to process. Only a single node (the scheduled leader) processes transactions for an individual block, and all parallelization happens on the processor threads of that individual node.
While a GPU multiprocessor can only execute a single program instruction at any moment, it can execute that instruction over many different inputs in parallel. If the incoming transactions loaded by Sealevel all call the same program instruction, such as Saber::Swap<>, Solana can execute all of those transactions concurrently over all the available CUDA cores.
SIMD (single instruction, multiple data) instructions allow a single piece of code to execute over multiple data streams. This means that Sealevel can perform an additional optimization, which is unique to Solana’s design:
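The conflict rule described above can be sketched in a few lines. This is a simplified illustration, not Sealevel's actual scheduler: the Tx type, the account names, and the greedy batching strategy are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    name: str
    reads: frozenset = frozenset()   # accounts the transaction only reads
    writes: frozenset = frozenset()  # accounts the transaction may modify


def conflicts(a, b):
    """Two transactions conflict if either writes an account the other touches."""
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & (a.reads | a.writes))


def schedule(txs):
    """Place each transaction in the earliest batch after its last conflict."""
    batches = []
    for tx in txs:
        slot = 0
        for i, batch in enumerate(batches):
            if any(conflicts(tx, other) for other in batch):
                slot = i + 1  # must run after every earlier conflicting transaction
        if slot == len(batches):
            batches.append([])
        batches[slot].append(tx)
    return batches


txs = [
    Tx("alice->bob", writes=frozenset({"alice", "bob"})),
    Tx("carol->dave", writes=frozenset({"carol", "dave"})),
    Tx("bob->erin", writes=frozenset({"bob", "erin"})),
    Tx("read-price-1", reads=frozenset({"oracle"})),
    Tx("read-price-2", reads=frozenset({"oracle"})),
]
for i, batch in enumerate(schedule(txs)):
    print("batch", i, [tx.name for tx in batch])
```

Everything inside one batch touches disjoint accounts (or only reads shared ones) and could run on separate threads, while the second transfer involving Bob waits for the first. The declared read and write sets are what make this analysis possible before any code executes.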
- Sort all the instructions by program ID.
- Run the same Program over all accounts concurrently.
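The two steps above can be sketched as a simple grouping pass. The program IDs and account names are hypothetical, and a real runtime would dispatch each group to parallel hardware rather than print it.

```python
from collections import defaultdict


def group_by_program(instructions):
    """Bucket (program_id, account) pairs so each program runs once over many accounts."""
    groups = defaultdict(list)
    # Step 1: sort by program ID (the sort is stable, so account order is kept).
    for program_id, account in sorted(instructions, key=lambda ins: ins[0]):
        groups[program_id].append(account)
    return dict(groups)


instructions = [
    ("swap", "acct_3"),
    ("transfer", "acct_1"),
    ("swap", "acct_7"),
    ("transfer", "acct_2"),
]
# Step 2: each program would now be applied to all of its accounts at once.
for program_id, accounts in group_by_program(instructions).items():
    print(program_id, "->", accounts)
```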
The Solana Foundation recommends that validators have CPUs with at least 12 cores and 24 threads. Although the terms sound synonymous, concurrency technically differs from parallelism. Concurrency occurs when a machine works on one task at a time but can start, stop, and resume different tasks across multiple threads. Parallelism is a much more powerful notion: the ability to work on multiple computations at the same time.
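A minimal sketch of concurrency without parallelism: two tasks take turns on a single thread, each pausing so the other can resume. The task names and the round-robin policy are illustrative choices, not anything specific to Solana.

```python
from collections import deque


def task(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"  # pause here so another task can take a turn


def run_concurrently(tasks):
    """Round-robin on one thread: only one task makes progress at any moment."""
    queue, trace = deque(tasks), []
    while queue:
        current = queue.popleft()
        try:
            trace.append(next(current))  # resume the task for one step
            queue.append(current)        # then send it to the back of the line
        except StopIteration:
            pass                         # task finished, drop it
    return trace


print(run_concurrently([task("A", 2), task("B", 2)]))
```

The steps of A and B interleave, but never overlap in time. Parallelism would require the two tasks to run on separate cores at the same moment.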
Solana has 1,140 active validators at the time of writing, and the largest 19 constitute the superminority (33% control of the network). Hardware costs are around $5,000, and voting costs can go up to 1.1 SOL per day. A helpful break-even calculator is found here. The Solana Foundation subsidizes staking costs for validators with a staking pool delegation baseline, except for US-based validators due to regulatory concerns.
A common claim about Moore’s Law is that adding more transistors no longer translates into increased performance, especially since clock frequency has plateaued. This may be true for single-threaded blockchain systems, but Sealevel, Solana’s parallel runtime, allows Solana to scale throughput and minimize block time in a way that takes full advantage of multithreading. Moreover, Moore’s Law will help the Solana network become increasingly decentralized as hardware costs per transistor fall, making validator hardware more accessible. After all, at its core, Moore’s Law is an economic phenomenon as much as it is a technical one.
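The superminority figure can be computed from a list of validator stakes. The sketch below assumes the superminority is the smallest group of top validators whose combined stake exceeds one third of the total; the stake values are made up for illustration.

```python
def superminority(stakes):
    """Smallest number of top validators holding more than 1/3 of total stake."""
    total = sum(stakes)
    running = 0
    for count, stake in enumerate(sorted(stakes, reverse=True), start=1):
        running += stake
        if running * 3 > total:  # integer comparison avoids float rounding
            return count
    return len(stakes)


stakes = [30, 20, 15, 10, 10, 5, 5, 5]  # hypothetical stake amounts
print("superminority size:", superminority(stakes))
```

The smaller this number is relative to the validator count, the more concentrated the stake. Cheaper hardware lowers the barrier for new validators, though it does not by itself redistribute stake.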
Credits go to Anatoly, Scott, and Jump Crypto for elucidating for me many of the topics covered in this post. We stand on the shoulders of giants.