Do You Want to Become a Responsible Validator? Get Advice from Everstake DevOps

31 Oct 2022

At the end of September, the Everstake team held a workshop for validators at the invitation of our friends in the Solana Foundation, Validator Relations Lead Tim Garcia and Product Manager Ella Kuzmenko. 

We centered the workshop around questions from 20+ solo validators and rookie teams and split it up into three sections: marketing, technology, and liquid staking pools. The event lasted three and a half hours and answered 37 questions. Incidentally, that’s how many people were present.

Today, we’re happy to share a recap of the second section on tech specifications for validators that was hosted by Andrii Kravets, Head of DevOps at Everstake. The final section on staking pools will be published in the coming weeks. 

Validator Checklist

The duty of a validator on any PoS blockchain is to aim for 100% uptime, not miss blocks, and keep delegators’ funds safe. According to Andrii, this is much harder than just running a node, especially if you want to validate multiple networks.

Here's what you should heed:

  1. Blockchain selection.
  2. Hosting and infrastructure.
  3. Key management.
  4. Security.
  5. Monitoring.
  6. Updates.
  7. Interaction with other validators.

Everstake works with over 70 blockchains. The team's technical specialists have faced and solved many problems, and now, in the spirit of point 7, they want to share their experience with others.

Selecting a Blockchain

The first rule of working with blockchains, especially if you risk money, is to do your own research. You need to research the technology, the experience and expertise of the core teams, the plans for developing the protocol, the costs of maintaining a node, and the economic incentives.

Everstake regularly reviews the blockchains it works with and sometimes makes the tough decision to shut down the nodes. A prime example is Terra, where one failure caused the entire network to collapse and the validators to lose potential revenue.

New blockchains are particularly difficult: the code is not quite ready, documentation is sparse, and half the features are missing. It's hard to guess what will happen to the native token in a year. There needs to be serious research to assess whether the solution is competitive and whether it can interest other participants in the blockchain ecosystem.

Hosting and Infrastructure

No hosting provider is perfect on every parameter, so when choosing one, leave yourself wiggle room in case of problems: for instance, don't sign a contract with a new provider that requires you to pay for a year in advance.

It is a good idea to test a few hosting providers first and choose the one that provides the best service. Major validators have it a little easier: it's simpler to get discounts if you spend a lot on infrastructure, and the provider is more proactive in answering questions and resolving problems. Still, rookie validators also have an advantage: they can test three or four providers in a week, whereas in our case, it takes two months and much more money.

The most common problem with hosting providers is rate-limiting. You can rent a server with a 10 Gbit/s NIC and get only 200 Mbit/s or even less.

Image: Volume of the network traffic
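
One way to catch this early is to compare measured throughput with the nominal NIC speed while a load test is running. The sketch below is only an illustration: it assumes you take two readings of an interface byte counter (on Linux, for example, /sys/class/net/<iface>/statistics/rx_bytes), that the NIC speed is quoted in bits per second, and that falling below half of the nominal rate is worth investigating. The function names and the 0.5 threshold are hypothetical.

```python
def throughput_mbit_s(bytes_start: int, bytes_end: int, seconds: float) -> float:
    """Average throughput in Mbit/s between two readings of a byte counter."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Bytes -> bits, then per second, then bits -> megabits.
    return (bytes_end - bytes_start) * 8 / seconds / 1_000_000


def is_rate_limited(measured_mbit_s: float, nominal_mbit_s: float,
                    min_fraction: float = 0.5) -> bool:
    """True if measured throughput is below min_fraction of the nominal rate.

    min_fraction is an arbitrary illustrative threshold, not a standard value.
    """
    return measured_mbit_s < nominal_mbit_s * min_fraction
```

The result is only meaningful if the link was actually saturated during the measurement; an idle node will always look "limited".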

Don't abandon the research even if you have found an excellent hosting service. Manufacturers are constantly releasing new equipment, providers are upgrading their hardware, and moving a node to a new server can significantly improve performance. Everstake has a provider that uses Intel Optane NVMes with read/write speeds slightly lower than RAM. The team cares a lot about these servers. 

You should also discuss with providers what they will do in an abnormal situation. For example, your node may get too much traffic, and the provider may shut down your server with its protection protocol, blacklist your IP address, or send you a bill for "deflecting an attack." Settle these points in advance.

Managing the Keys

Consider three main principles: minimum privileges, constant control, and mild paranoia.

  • Minimum privileges: If even a minor change would improve security, make it; if it would weaken security, don't. If something shouldn't be on a server, don't upload it there. If you can use a multi-sig, by all means, do so. If it is possible to plug in a hardware security module, be sure to consider it.
  • Always be in control: You need to control the keys and the funds in all situations. Blockchains are decentralized, and while you sleep, someone else is awake looking for vulnerabilities in your server. 

Therefore, keys on a server should always be encrypted because even a shut-down server can be attacked. You should also look into ways to protect yourself as a validator on a given blockchain: read about dispute procedures, whether you can reclaim validation rights if you are slashed, and how likely the token issuer is to put a contract on hold, freeze it, or otherwise deprive you of revenue. You need to know what could happen in theory and prepare for it.

  • Mild paranoia: Only trust yourself (and even so, not all the time). People write the code. They make mistakes and leave bugs that attackers can exploit. For example, the deb package of the Tomb encryption tool in the latest Ubuntu still contained a vulnerability that had already been fixed, and we worked around it by compiling from source. So make sure you check exactly what you're encrypting with and whether any current security issues are reported on GitHub. And store the master key on a Ledger or another HSM.

There's also a risk that the hosting provider's employees might decide to steal your keys or someone else could get physical access to a server. Everything should be encrypted and multi-sigged so that you won't lose your keys in an emergency, and your employees can access them and keep working.

Server Security

The main thing to know is that you can't do without a security professional. No matter how good your DevOps, system admin or developer is, there is no substitute for a trained information security specialist. 

You also need to make sure the teams do not get in each other's way. Sometimes security measures break the DevOps workflow, and the project cannot run with every restriction in place. In that case, weigh the pros and cons and, where possible, switch to alternative security measures.

The second obvious point, which is often forgotten, concerns binaries. It is better to build everything yourself than to use some interactive tool and later discover open network ports that should be closed. If you run into problems building a binary or are unwilling to build one yourself, use the official binaries, which reduce potential attack vectors.

Firewalls are another example. UFW is easy to use, but it's hard to spot vulnerabilities in the rules it generates. iptables, on the other hand, is harder to deploy, but it hides nothing behind abstractions and is ultimately safer.
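
Whichever firewall you choose, it helps to verify the result from the outside rather than trust the rules as written. The following is a minimal sketch, not a replacement for a proper audit: it probes a list of TCP ports and reports any open port that is missing from your allowlist. The function names and the port numbers in the usage note are hypothetical, and UDP ports, which many nodes also use, are not covered.

```python
import socket
from typing import Iterable, List, Set


def probe_open_ports(host: str, ports: Iterable[int],
                     timeout: float = 0.5) -> Set[int]:
    """Return the subset of TCP ports on host that accept a connection."""
    open_ports = set()
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception.
            if sock.connect_ex((host, port)) == 0:
                open_ports.add(port)
    return open_ports


def unexpected_ports(open_ports: Iterable[int],
                     allowed_ports: Iterable[int]) -> List[int]:
    """Open ports that are not on the allowlist, sorted for stable reporting."""
    return sorted(set(open_ports) - set(allowed_ports))
```

For example, unexpected_ports(probe_open_ports("203.0.113.10", range(1, 1025)), {22}) would list every open port below 1025 other than SSH on a placeholder address. Run the probe from a machine outside the server's network so the firewall is actually in the path.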

Third, validators also need backups. While everyone considers the risk of an attack on the validator or the network, few think about the hosting provider's uptime. It is worth developing a migration procedure and keeping backups with other providers so that if there are problems on their side, you retain the ability to switch to another one quickly.

Monitoring the Node State

There is never too much monitoring. When the infrastructure grows, it is essential to monitor all components so that you can identify problems in time and find ways to improve performance.

You should not do everything at once but make one good solution and duplicate it for other tasks. This way, you can maintain, improve, and double-check how it works. For example, you may have no trigger on a stopped validator due to some edge case. Yes, it can happen, too.
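
One way to cover the "stopped validator" edge case is a second, independent trigger that watches progress rather than process state: if the block height reported by the node stops growing, something is wrong even when the process still looks alive. The sketch below assumes you already collect the height from your node at regular intervals; the class name and the threshold are hypothetical, and this is not a description of Everstake's own triggers.

```python
from typing import Optional


class StallDetector:
    """Flags a node whose reported block height has stopped advancing."""

    def __init__(self, max_stall_seconds: float):
        self.max_stall_seconds = max_stall_seconds
        self._last_height: Optional[int] = None
        self._last_change: Optional[float] = None

    def observe(self, height: int, now: float) -> bool:
        """Record a reading taken at time now; return True if the node is stalled."""
        if self._last_height is None or height > self._last_height:
            # First reading, or the node made progress: reset the timer.
            self._last_height = height
            self._last_change = now
            return False
        return now - self._last_change >= self.max_stall_seconds
```

Passing the timestamp in explicitly keeps the check easy to test and lets the same class be reused for other counters, in line with the advice to build one good solution and duplicate it.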

Image: Solana monitoring dashboard

We use Zabbix and Grafana with duplicate triggers and cross-draw data. We've come a long way in three years, and we've seen a lot of weird and absurd things, so it's impossible to give a concrete guide on how to set up monitoring. You have to do it and refine it yourself.

You also need to know that monitoring runs through the whole team. You need to break down responsibilities and decide in advance who will be responsible for what, who to write to if someone is unavailable, and what to do in an emergency.

For example, we have a bot that transmits alerts to Telegram, and if a person in charge does not respond in due time, it writes to other messengers or another person. Again, the bot also needs to be monitored. Ideally, make sure to duplicate the alerts in another way.
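
The escalation logic described above can be sketched as a simple chain. This is an illustration under stated assumptions, not Everstake's bot: each channel is represented by a hypothetical callable that sends the alert, waits for an acknowledgement for as long as you consider "due time", and returns True if someone responded.

```python
from typing import Callable, Iterable, Optional, Tuple

# (channel name, function that sends the alert and reports whether it was acknowledged)
Channel = Tuple[str, Callable[[str], bool]]


def escalate(message: str, channels: Iterable[Channel]) -> Optional[str]:
    """Try each channel in order; return the name of the first one that acknowledged."""
    for name, send_and_wait_for_ack in channels:
        try:
            if send_and_wait_for_ack(message):
                return name
        except Exception:
            # A broken channel must not stop the escalation; move on to the next one.
            continue
    return None  # Nobody responded: this case needs its own fallback alert.
```

A return value of None is the signal that the alert has to be duplicated in another way, for instance by a separate system that does not depend on the bot at all.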

Another helpful thing is checking your data sources. You can get data about your node and other nodes from public services, but they can process data too slowly. For example, Solana Validators gather data and output an average result for the nodes’ performance. If you are a validator and run a node yourself, it makes sense to do the calculations, too: take the number of blocks you were obliged to create in a given period, subtract the number you actually created, divide the difference by the number you were obliged to create, and get the actual skip rate.

Image: Solana validator skip-rate and delinquent status
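
The skip rate is the share of assigned slots in which your node produced no block. A minimal sketch of the calculation, assuming you already have both counters for the period you care about (the function and parameter names are illustrative):

```python
def skip_rate(assigned_slots: int, produced_blocks: int) -> float:
    """Fraction of assigned slots in which no block was produced."""
    if assigned_slots <= 0:
        raise ValueError("assigned_slots must be positive")
    if not 0 <= produced_blocks <= assigned_slots:
        raise ValueError("produced_blocks must be between 0 and assigned_slots")
    return (assigned_slots - produced_blocks) / assigned_slots
```

For instance, a node that was assigned 200 slots and produced 190 blocks has a skip rate of 0.05, or 5%. Computing this yourself over a short window gives you a fresher number than an aggregated public dashboard.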

Also, don't forget to monitor testnets and other services running on your servers. This way, you can detect any faults typical for your deployment and then improve your nodes on the mainnet.

Updates

Core teams constantly update things and sometimes forget to tell you about it. On the other hand, you mustn't forget to read the documentation and change logs to see if anything useful is available in terms of security or performance.

Also, feel free to test new kernels, nodes, and systems. Do this on spare nodes but not on the main validator. Current updates often improve node performance on modern CPUs, and the newer the kernel, the fewer security issues it has (not always, of course).

Interacting with Other Network Participants

The key is to ask questions of central blockchain organizations (foundations). They are used to this and can provide valuable insights: how to set up a node in a secure way, which providers are best suited for hosting, and what problems may arise and how to solve them. 

Blockchains are decentralized and open, and validators should strive for that. If you find a bug, don't just try to fix it yourself; report it on GitHub. Maybe the developers will offer you a workaround. This way, you help yourself and the entire ecosystem, which ultimately benefits your earnings and prospects.

The same goes for communicating with other validators. Yes, we are competitors, but we are working together to make blockchain work, to share expertise, and to improve protocols. If you have something to share, publish open source, share dashboards, and discuss ideas. There will always be people who can help you improve your solution, and you won't lose your competitive edge. You'll even win. If you have done a lot of work and don't want to give it away for free, you can apply for a grant and turn it into a paid service.

Conclusion

The first thing to do is research, ask other validators and project representatives, and find out what they have and how they are doing.

Next, find a good hosting provider and discuss the details of cooperation, but don't stop testing others or thinking about alternatives.

Once the node runs, you need to secure the keys and the server, develop a threat model and think about the potential risks and losses. Security must be handled by a specialist, not an employee in another specialty that you are taking away from their direct responsibilities.

Nodes must be monitored all the time, so set up alert systems and work out response scenarios for all sorts of problems. It's better to build a good solution for monitoring the most critical indicators than to use a fancy service with an unclear methodology.

It is necessary and useful to update the node, especially if you are running on the newest hardware. But update wisely so as not to break the core validator.

And while all validators are technically competitors, they are willing to help each other for a common goal. So don't hesitate to ask questions and seek help.

Check out our Twitter for the latest updates, and see our Blog for further valuable information, news, and best industry practices.

Everstake
Content Manager
Everstake is one of the most reliable PoS validators on the market, with customer staked funds exceeding $2B and more than 735K delegators as of March 2023.
