Running a node on Walrus might seem straightforward, but many operators — especially newcomers — encounter unexpected roadblocks. Misconfigurations, incorrect assumptions, and overlooked best practices can lead to downtime, performance issues, or even data corruption.
Everstake has been part of the Walrus Testnet since day one, closely following the Walrus Discord channel and observing multiple waves of new operators joining the network. We want to commend the excellent work of the MystenLabs team, particularly their communication with node operators. They are always responsive, providing timely answers to questions, and they proactively monitor all nodes, alerting operators in case of malfunctions.
While we haven’t encountered major issues ourselves, we’ve noticed some common pitfalls that new operators often face. This article shares our essential practices to help new node operators maintain stable and efficient Walrus nodes.
6 Best Practices For Node Maintenance
Setting up and maintaining a Walrus node requires careful attention to several technical aspects. Below, we outline the most frequent misconceptions and how to avoid them.
1. Avoid Using a Reverse Proxy
A storage node in the Walrus network handles its own TLS encryption using a self-signed certificate that is pinned on-chain. If a client (publisher, aggregator, or another node) attempts to connect through a reverse proxy with SSL termination, it will result in a certificate mismatch error. To prevent connectivity issues, always allow direct connections to the storage node.
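One quick way to verify this is to inspect the certificate actually presented on the wire. The host and port below are placeholders; substitute the public_host and public_port from your own node configuration.

```bash
# Placeholders; use the public_host and public_port from your node config.
PUBLIC_HOST="walrus.example.com"
PUBLIC_PORT=9185

# Show the certificate presented on the wire. With a correct setup this is
# the node's self-signed certificate (the one pinned on-chain); if a reverse
# proxy terminates TLS, a different certificate appears here and clients
# will reject the connection.
openssl s_client -connect "${PUBLIC_HOST}:${PUBLIC_PORT}" \
    -servername "${PUBLIC_HOST}" </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -fingerprint -sha256
```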
2. Check Firewall Configuration
Ensure that your node is accessible from the internet, not just from within your VPN, gateway, or internal network. Incorrect firewall settings may block external connections, causing disruptions in communication between your node and other network participants.
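A simple way to catch this is to test the port from a machine outside your own infrastructure. The values below are placeholders for the public_host and public_port from your node config.

```bash
# Run this from a host OUTSIDE your VPN / internal network.
# Placeholders; use the public_host and public_port from your node config.
PUBLIC_HOST="walrus.example.com"
PUBLIC_PORT=9185

# -z: only test whether the TCP port is open; -w 5: five-second timeout.
if nc -z -w 5 "$PUBLIC_HOST" "$PUBLIC_PORT"; then
  echo "OK: port is reachable from the public internet"
else
  echo "BLOCKED: check firewall, security group, and NAT rules"
fi
```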
3. Never Force Kill The Walrus-Node Process
If you need to stop your node, use systemctl stop and wait patiently for the process to shut down gracefully. This may take a few minutes, but it prevents database corruption. If the process is forcibly terminated, the database could become corrupted, requiring a full restoration, which may take several days.
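In practice, a graceful stop looks like the sketch below (assuming the unit is named walrus-node.service). The TimeoutStopSec drop-in is optional, but it keeps systemd from escalating to SIGKILL if a clean shutdown takes longer than the default stop timeout.

```bash
# Ask systemd for a graceful stop; this may take several minutes while the
# node flushes and closes its database.
sudo systemctl stop walrus-node.service

# In another terminal, follow the shutdown in the journal instead of
# reaching for kill -9.
journalctl -u walrus-node.service -f

# Optional: give the unit more time than systemd's default stop timeout,
# so a slow-but-clean shutdown is never force-killed by systemd itself.
sudo systemctl edit walrus-node.service
#   [Service]
#   TimeoutStopSec=600
```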
4. Set Up Monitoring
Regular monitoring is essential to detect failures early. The deployment documentation includes monitoring guidelines, but many operators neglect this step. Implement basic node health checks and alerting mechanisms.
We recommend setting up alerts for at least these checks:
- The server is up.
- Systemd reports that walrus-node.service is up.
- The health endpoint (/v1/health) is reachable from the public internet and returns "nodeStatus": "Active". It is essential to run this check against the public_host and public_port from your node config.
- The walrus_event_cursor_progress and event_processor_latest_downloaded_checkpoint metrics continuously increase (a minimal check for the last two items is sketched below).
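Here is a minimal sketch of the last two checks, suitable for cron or an alerting agent. The host, ports, and the exact nesting of the health response are assumptions; adjust them to the public_host, public_port, and metrics address from your node config.

```bash
#!/usr/bin/env bash
# Placeholders; use public_host/public_port and the metrics address from your node config.
PUBLIC_HOST="walrus.example.com"
PUBLIC_PORT=9185
METRICS_URL="http://127.0.0.1:9184/metrics"

# 1) Health check: the endpoint must answer publicly and report an Active node.
#    The node serves a self-signed certificate, so -k skips CA verification.
#    The exact JSON nesting may differ, so we look for nodeStatus anywhere in the response.
STATUS=$(curl -sk "https://${PUBLIC_HOST}:${PUBLIC_PORT}/v1/health" \
  | jq -r '.. | .nodeStatus? // empty' | head -n 1)
if [ "$STATUS" != "Active" ]; then
  echo "CRITICAL: nodeStatus is '${STATUS:-unreachable}'"
  exit 2
fi

# 2) Progress check: the checkpoint metric must keep increasing over time.
get_checkpoint() {
  curl -s "$METRICS_URL" \
    | grep '^event_processor_latest_downloaded_checkpoint' \
    | awk '{print $NF; exit}'
}

BEFORE=$(get_checkpoint)
sleep 60
AFTER=$(get_checkpoint)

if awk -v a="$AFTER" -v b="$BEFORE" 'BEGIN { exit !((a + 0) > (b + 0)) }'; then
  echo "OK: node is Active and checkpoints progressed (${BEFORE} -> ${AFTER})"
else
  echo "CRITICAL: no checkpoint progress in the last minute"
  exit 2
fi
```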
5. Set Up Log Rotation
Storage nodes generate large volumes of logs daily (hundreds of GB). Without proper log rotation, disk space can quickly become an issue. Set up log rotation to manage storage efficiently.
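As one possible approach, assuming the node writes its logs to files (the path below is a placeholder), a logrotate policy along these lines keeps disk usage bounded. If your node logs only to journald, capping the journal size with SystemMaxUse in journald.conf achieves the same goal.

```bash
# Placeholder log path; point this at wherever your node actually writes logs.
sudo tee /etc/logrotate.d/walrus-node > /dev/null <<'EOF'
/var/log/walrus/*.log {
    daily
    rotate 7
    maxsize 10G
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
EOF
```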
6. Understand Aggregator & Publisher Behavior
Many operators mistakenly assume that Aggregators and Publishers function as caching mechanisms for their nodes. In reality, these services interact with the entire Walrus network and do not prioritize an individual operator's node.
Bottom Line
Ensuring the stability of a Walrus node requires a thoughtful approach and adherence to best practices. Small misconfigurations can lead to significant issues, from downtime to data corruption. Proactive monitoring, proper setup, and a clear understanding of the network’s architecture are key to maintaining a reliable node.
By following these recommendations, operators can avoid common pitfalls and contribute to the Walrus network’s overall resilience.
Stake with Everstake | Follow us on X | Connect with us on Discord