Setting up a validator or RPC node is just the beginning: real performance comes from tuning what’s under the hood. In this guide, we’ll walk through practical system- and hardware-level optimizations for running Agave (Solana-based) nodes on Cherry Servers.
These tips are based on real-world experience from Everstake’s DevOps team, including internal benchmarks and lessons from production. If you’ve already followed the basic setup docs, this article will help you take the next step and make your infrastructure faster and more stable.
About Cherry Servers
Cherry Servers is a European bare-metal cloud provider offering high-performance, customizable infrastructure for demanding workloads. Unlike traditional virtualized environments, Cherry gives you full control over dedicated hardware, making it a strong choice for blockchain validators and RPC node operators.
At Everstake, we’ve chosen Cherry Servers as one of our key infrastructure partners for running validators, including on Agave. In production, we primarily use high-performance bare metal servers powered by AMD Threadripper PRO 7975WX and AMD EPYC 9354P. Both offer excellent multi-core performance, reliable Gen4 NVMe storage, and BIOS-level control.
We needed a provider that could offer consistent hardware performance, fast NVMe storage, and the ability to fine-tune the system at a low level, without the overhead of virtualization.
When Should You Optimize?
If your validator sometimes falls behind, takes too long to restart, or misses slots under load, it might not be a hardware problem, but a configuration one. Even powerful machines can underperform if the system isn’t tuned properly.
This guide is especially useful if:
- You’re running RPC nodes with high request volume;
- You manage multiple validators or plan long-term deployments;
- You want to reduce downtime, increase stability, and extend hardware lifespan.
Basic setup is enough to get started, but fine-tuning is what helps your node stay in sync, even under network stress or during upgrades. In the following sections, we’ll walk through real-world performance tips and configuration insights shared by our DevOps team.
Operating System Considerations
Both Ubuntu 22.04 and 24.04 are used by node operators. While there’s no strict recommendation at the time of writing, some performance features, such as the amd_pstate driver, require kernel 6.5 or higher, which is not available by default in Ubuntu 22.04.
For Ubuntu 22.04 users, the HWE (Hardware Enablement) stack can be installed to access newer kernel versions. This is often necessary to enable modern CPU scaling drivers and unlock more aggressive performance profiles.
Important note regarding HWE:
Upgrading to a kernel version >6.5 may lead to network interface renaming. If the interface name changes and your system is not properly configured to handle it, you might lose remote access after a reboot.
How to prevent issues?
- Disable Predictable Network Interface Naming by adding the following kernel parameters: net.ifnames=0 biosdevname=0
- Alternatively, update Netplan configuration to account for both old and new interface names, or configure it using MAC addresses.
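For example, a minimal sketch of the first option via GRUB (assuming GRUB is your bootloader; append the parameters to any that are already present in /etc/default/grub):
GRUB_CMDLINE_LINUX_DEFAULT="net.ifnames=0 biosdevname=0"
sudo update-grub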
Hardware Configuration
Cherry Servers typically ship with optimal BIOS/UEFI configurations for Solana nodes. However, some hardware-specific adjustments may still be necessary:
- Boost states: These may be disabled by default. Run a basic CPU benchmark to verify if your processor is boosting. If not, check your motherboard model using dmidecode -t baseboard. Then, refer to the manufacturer’s documentation to manually enable CPU boost in BIOS/UEFI settings.
- CPU scaling: In rare cases, the OS may not have access to manage CPU frequency scaling. If tools like cpupower don’t report frequencies, this likely indicates firmware restrictions. On Cherry’s Supermicro servers, this is typically not an issue, but still worth verifying.
- C-States: Disabling C-States may improve latency consistency by preventing the CPU from entering power-saving modes.
- Cooling: Manually increasing fan speeds can help maintain thermal stability, especially under sustained load.
- Overclocking: Some motherboards support CPU or memory overclocking. However, we do not recommend this for mission-critical infrastructure like Solana validators unless you fully understand the trade-offs.
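For the first two checks above, you can get a quick read from the OS before touching BIOS settings (cpupower ships in the linux-tools packages; the boost sysfs path applies to the acpi_cpufreq driver):
# Does the OS see frequency scaling at all?
cpupower frequency-info
# Boost state with acpi_cpufreq (1 = enabled)
cat /sys/devices/system/cpu/cpufreq/boost
# Identify the motherboard before consulting vendor docs
sudo dmidecode -t baseboard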
In general, BIOS/UEFI modifications should only be made if necessary or if you’re troubleshooting specific issues.
System Tuning
While the official validator setup guide already includes key system tuning parameters, some additional kernel-level configurations can help further optimize performance. These adjustments should be applied cautiously, especially on production systems.
Here are some commonly used sysctl settings:
# TCP buffer sizes (10 KiB min, ~85 KiB default, 12 MiB max)
net.ipv4.tcp_rmem=10240 87380 12582912
net.ipv4.tcp_wmem=10240 87380 12582912
# Increase UDP buffer sizes
net.core.rmem_default=134217728
net.core.rmem_max=134217728
net.core.wmem_default=134217728
net.core.wmem_max=134217728
# TCP Optimization
net.ipv4.tcp_congestion_control=westwood
net.ipv4.tcp_fastopen=3
net.ipv4.tcp_timestamps=0
net.ipv4.tcp_sack=1
net.ipv4.tcp_low_latency=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_moderate_rcvbuf=1
# Kernel Optimization
kernel.timer_migration=0
kernel.hung_task_timeout_secs=30
kernel.pid_max=49152
# Virtual Memory Tuning
vm.swappiness=0
vm.max_map_count=1000000
These values are not universally optimal. Test and benchmark changes on non-critical machines before applying them to production infrastructure.
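These are runtime settings; to keep them across reboots, place them in a drop-in file under /etc/sysctl.d/ and reload. The file name here is just an example:
# Paste the settings above into the drop-in file, then apply:
sudo nano /etc/sysctl.d/99-agave-tuning.conf
sudo sysctl --system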
Drives and Storage Layout
For best performance and data separation, a Solana node should be deployed on a system with at least three NVMe SSDs:
- System drive: OS, user directories, and log files (/, /home, etc.)
- Ledger drive: stores the blockchain ledger data.
- Accounts + Snapshots drive: holds accounts data and snapshot files. These can be placed on a single disk or split across two drives for improved I/O throughput.
The official documentation warns against hosting Accounts and Ledger on the same disk, as this may lead to I/O contention and performance degradation.
Snapshot creation places a heavy load on the accounts disk. To reduce I/O pressure, consider increasing the snapshot interval. Keep in mind that this may lead to longer restart times.
Some operators configure a RAID array for the accounts volume to avoid bottlenecks caused by a single drive maxing out under load.
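For illustration, here is roughly how such a layout maps to agave-validator flags. The mount points are placeholders, and the snapshot-interval flag is included only to illustrate the tuning mentioned above; verify flag names against agave-validator --help for your version:
agave-validator \
  --ledger /mnt/ledger \
  --accounts /mnt/accounts \
  --snapshots /mnt/snapshots \
  --full-snapshot-interval-slots 50000
# ...plus your usual identity, entrypoint, and port flags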
Filesystem choice
- ext4 is generally recommended. It’s reliable and performs well under Solana’s workload.
- XFS has been tested by some operators, but reports of inconsistent behavior and obscure errors make it less suitable unless you know exactly what you’re doing.
Recommended mount options for ext4 (these are the type, options, dump, and pass fields of an /etc/fstab entry):
ext4 defaults,noatime,nodiratime,barrier=0,data=writeback 0 0
These options reduce unnecessary disk writes and improve performance by disabling access time tracking and some journaling operations. However, using barrier=0,data=writeback may lead to data loss in case of a system crash or unexpected reboot. For validator nodes, this risk is generally acceptable, since the node can restart from a snapshot, and if that snapshot is corrupted, a fresh one can be downloaded from a trusted validator.
Always test mount options on a non-critical node first to avoid unintended side effects.
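For context, a complete /etc/fstab entry using these options might look like this; the UUID and mount point are placeholders (use blkid to find your volume’s UUID):
UUID=<your-volume-uuid> /mnt/ledger ext4 defaults,noatime,nodiratime,barrier=0,data=writeback 0 0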
CPU Governor and Scaling Drivers
Solana validators benefit from having all CPU cores run at full speed. Using the correct CPU governor ensures consistent performance and reduces latency spikes during high transaction throughput.
Kernel 5.15 and older (acpi_cpufreq)
On these systems, use cpufreq.default_governor=performance as a kernel parameter to achieve maximum frequencies. To preserve the governor setting after a reboot, pass the kernel parameter in the /etc/default/grub file and update GRUB:
GRUB_CMDLINE_LINUX_DEFAULT="cpufreq.default_governor=performance"
update-grub
You can also set it on the fly with this command (this won’t persist after a reboot):
echo "performance" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
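Either way, you can confirm the governor took effect:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
cpupower frequency-info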
Kernel 6.5+ (amd_pstate)
Newer kernels support the amd_pstate driver, which is designed for modern AMD CPUs (Zen 2 and above). It offers better responsiveness and boost behavior compared to legacy drivers, though the impact depends on the specific CPU model. On older EPYC processors, the difference is minimal, but on newer EPYC chips and Threadrippers, performance improvements can be significant.
Supported modes:
- active (CPPC autonomous)
- passive (CPPC non-autonomous)
- guided (CPPC guided autonomous)
By default, amd_pstate is enabled in active mode on supported CPUs.
However, keep in mind that amd_pstate requires CPPC (Collaborative Processor Performance Control) to function. Some systems may need this feature to be enabled in BIOS/UEFI.
As with older kernels, you’ll still need to set the governor to performance to ensure the CPU runs at maximum frequency. Both the driver mode and the governor can be set through the GRUB configuration to make the changes persistent across reboots.
To preserve the pstate settings after a reboot, choose the governor mode, pass the kernel parameter in the /etc/default/grub file, update GRUB, and reboot:
GRUB_CMDLINE_LINUX_DEFAULT="amd_pstate={active|passive|guided} cpufreq.default_governor=performance"
update-grub
reboot
If compatibility issues arise, you can disable amd_pstate entirely by setting amd_pstate=disable, which falls back to the legacy acpi_cpufreq driver.
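After a reboot, you can check which driver and mode are actually in use (the amd_pstate status file is exposed on kernels that ship the driver):
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
cat /sys/devices/system/cpu/amd_pstate/status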
Benchmarks and Performance Testing
Run Rust Benchmarks
The Agave and Jito-Solana repositories include a very versatile test and benchmarking suite.
It provides a detailed view of what happens under the hood of the validator, which can help in optimizing the hardware.
Be aware that building it may cause issues on some release branches. In general, though, the process looks like this:
1. Clone the repo (Agave or Jito, depending on which client you are using):
git clone --recurse-submodules https://github.com/jito-foundation/jito-solana.git
git clone --recurse-submodules https://github.com/anza-xyz/agave.git
2. Use the nightly version of Rust to build and run the benchmark:
./cargo nightly bench
Note that it requires a specific nightly build of Rust. The build process may take a long time, and benchmarking can take even longer, but the insights gained from the results can be worth the wait.
Run Cluster Benchmarks
To evaluate how well your Solana node performs under load, you can run synthetic benchmarks using the official bench-tps tool. This is especially useful after hardware or kernel tuning, to confirm improvements or detect regressions.
You can launch a local single-node testnet and measure transactions per second (TPS), CPU usage, and disk throughput. This setup helps simulate real-world validator workload in a controlled environment.
Detailed instructions on how to set up a benchmark are available in the official Anza documentation.
Remember that benchmark results may vary depending on hardware, storage layout, and system tuning. Use consistent test parameters when comparing changes.
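For orientation, the flow from that guide looks roughly like this when run from the repository root after building; script names and locations may change between releases, so treat this as a sketch:
./multinode-demo/setup.sh
./multinode-demo/faucet.sh
./multinode-demo/bootstrap-validator.sh
./multinode-demo/bench-tps.sh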
Compiler Optimizations
For maximum control and future compatibility, it’s recommended to compile validator binaries from source instead of relying on precompiled versions. This is especially relevant now that Agave will stop distributing them starting from version 3.0.0.
While compiler optimizations like lto=fat, lld, and target-cpu=native are often used to squeeze more performance out of the final binary, the actual performance gains may be minimal and depend heavily on your CPU, workload, and environment. These optimizations are more about giving you full control over your build than delivering guaranteed speed improvements.
Note: The release-with-lto profile is not available by default. You’ll need to manually add it to your build config (instructions below).
Required Packages
Before compiling, make sure you have Rust and all necessary system dependencies installed.
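If Rust isn’t installed yet, the standard rustup installer is the usual route:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"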
Install required system packages:
sudo apt-get install \
build-essential \
libssl-dev \
libudev-dev \
pkg-config \
zlib1g-dev \
llvm \
clang \
libclang-dev \
cmake \
libprotobuf-dev \
protobuf-compiler
If you plan to use the lld linker (recommended for faster builds):
sudo apt-get install lld
Enabling the release-with-lto Profile
In the repository’s root Cargo.toml, add the following at the end of the file:
[profile.release-with-lto]
inherits = "release"
lto = "fat"
opt-level = 3
codegen-units = 1
Then, in the scripts/cargo-install-all.sh script, add the new parameter:
elif [[ $1 = --release-with-lto ]]; then
buildProfileArg='--profile release-with-lto'
buildProfile='release-with-lto'
shift
Building the Validator Binaries
Once everything is configured, you can compile validator-only binaries using:
RUSTFLAGS="-Clink-arg=-fuse-ld=lld -C target-cpu=native" \
scripts/cargo-install-all.sh --validator-only --release-with-lto \
~/.local/share/solana/install/releases/2.2.17-agave
Even with target-cpu=native, it’s not strictly required to compile the binaries on the machine they will run on. Keep in mind, however, that this flag tunes the binary to the build host’s CPU, so a separate build machine should have the same CPU model (or at least support the same instruction set extensions), and for best compatibility it should run the same OS version as the target.
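A quick sanity check after the build, using the install path passed to the script above:
~/.local/share/solana/install/releases/2.2.17-agave/bin/agave-validator --version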
Conclusion
Running a validator or RPC node requires more than following the standard setup instructions. In this guide, we explored the deeper layers of configuration, from choosing the right kernel and CPU governor mode to tuning the filesystem, networking stack, and even compiler settings. These optimizations can make a real difference in how your node handles load, stays in sync, and recovers after restarts.
As the Agave ecosystem evolves, and with precompiled binaries becoming less available, staying in control of your infrastructure becomes even more important. Compiling from source, benchmarking regularly, and testing different system configurations are all part of building a resilient, production-grade node setup.
For a full picture, it’s worth revisiting our initial setup guide and the article on why we chose Cherry Servers.
Stake with Everstake | Follow us on X | Connect with us on Discord
***
Everstake is a software platform that provides infrastructure tools and resources for users but does not offer investment advice or investment opportunities, manage funds, facilitate collective investment schemes, provide financial services, or take custody of, or otherwise hold or manage, customer assets. Everstake does not conduct any independent diligence on or substantive review of any blockchain asset, digital currency, cryptocurrency, or associated funds. Everstake’s provision of technology services allowing a user to stake digital assets is not an endorsement or a recommendation of any digital assets by it. Users are fully and solely responsible for evaluating whether to stake digital assets.
The configurations described in this guide are intended for experienced operators. Changes to system-level parameters can lead to instability or data loss if misapplied. Always test on a non-production node first.