Part 9 - Node Operator's Configuration Guide for Dummies - Monitoring & Maintenance Cont'd.
Updated: Mar 10, 2022
Checking for Updates
One of the responsibilities of a node operator is making sure your system is up-to-date with the latest security patches. Automatic updates are convenient but can interfere with your node operation, so it may be preferable to run them manually. In either case, you must make sure that your machine is regularly patched!
NOTE Generally, updating will not require your system to be down for more than a few minutes. You might be concerned that such downtime will negatively affect your Beacon Chain balance. Rest assured, the penalty for being offline for such a short period of time is completely negligible.
Each missed attestation costs you slightly less than you would have earned from a successful one. As a rule of thumb, if you're offline for an hour, you will earn it all back after being online for an hour.
Also, note that there is absolutely no chance that you will be slashed by going offline for a short time. Slashing only occurs if you attack the network, and going offline for maintenance does not count as attacking the network.
Please keep your systems up to date - don't worry about the downtime penalties!
Updating your Operating System
You should frequently check your operating system's package manager or update service so that you can quickly apply any important new security patches. The exact instructions vary by operating system and can be found in your system's documentation; the steps below are for Ubuntu.
In a terminal, type the following:
sudo apt update
This will access the package servers and check to see if any of your installed packages have new versions available. If updates are available, the output will look like this:
Fetched 3974 kB in 2s (1641 kB/s)
Reading package lists... Done
Building dependency tree
Reading state information... Done
12 packages can be upgraded. Run 'apt list --upgradable' to see them.
You can install the updates with the following command:
sudo apt dist-upgrade
This will show you the list of packages that are about to be updated, and if the total installation size is large enough, it will show you the size and prompt you to confirm that you accept:
12 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 51.3 MB of archives.
After this operation, 52.2 kB of additional disk space will be used.
Do you want to continue? [Y/n]
Ensure that you have enough space available to do this, then press y and Enter to begin the update process.
Once the progress bar is finished, and you're dropped back into the terminal prompt, run the following command to clean up any old versions of packages that were just replaced:
sudo apt autoremove
Next, check if your system needs to be rebooted:
cat /var/run/reboot-required
If the above command prints *** System restart required ***, then you should restart your machine, when you are able, to finish applying the updates:
sudo reboot
Rocket Pool will gracefully shut down and automatically start back up with the system once it reboots.
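The reboot check can also be scripted. The following is a minimal sketch that assumes the Ubuntu convention of a flag file at /var/run/reboot-required; the function name reboot_required is illustrative and not part of any Rocket Pool tooling.

```shell
#!/usr/bin/env bash
# Sketch: report whether Ubuntu has flagged a pending reboot.
# Assumption: the flag file lives at /var/run/reboot-required (Ubuntu default).
# reboot_required is a hypothetical helper name used only for illustration.

reboot_required() {
    # An optional first argument overrides the flag file path (handy for testing).
    local flag_file="${1:-/var/run/reboot-required}"
    [ -f "$flag_file" ]
}

# Example use after installing updates:
#   if reboot_required; then
#       echo "Reboot needed to finish applying updates."
#   else
#       echo "No reboot needed."
#   fi
```

The function only reads the flag file; it never reboots the machine itself, so the decision about when to restart stays with you.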
Updating the Smartnode Stack
Occasionally, Rocket Pool will release a new version of the Smartnode stack. Updates can contain new versions of the CLI or the Rocket Pool Docker containers, as well as new versions of the ETH1 and ETH2 clients.
The most reliable way to find out about new releases is to subscribe to the Rocket Pool Discord server; new releases are always posted in the Announcements channel, and you will receive a notification.
NOTE Running apt update will not update the node software. This must be done manually using the steps below.
Here are the steps to upgrade the node software on Docker/Linux:
Stop the Rocket Pool services:
rocketpool service stop
Download the new Smartnode CLI:
For x64 systems (most normal machines):
wget https://github.com/rocket-pool/smartnode-install/releases/latest/download/rocketpool-cli-linux-amd64 -O ~/bin/rocketpool
Now run the install command:
rocketpool service install -d
The -d flag tells it to ignore system dependencies like Docker, since you already have them.
Next, start Rocket Pool up again:
rocketpool service start
Finally, check the version to make sure the CLI and Smartnode stack are both up to date:
rocketpool service version
Rocket Pool client version: 1.0.0-rc3
Rocket Pool service version: 1.0.0-rc3
Selected Eth 1.0 client: Geth (rocketpool/client-go:v1.10.4)
Selected Eth 2.0 client: Nimbus (statusim/nimbus:v1.4.0)
Both the client and service should match the new release version.
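If you would rather not compare the two version strings by eye, the check can be sketched as a small shell function. It assumes the output format shown above ("Rocket Pool client version: ..." and "Rocket Pool service version: ..."), which may differ in other Smartnode releases; versions_match is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the client and service versions are present and equal.
# Assumption: the version output uses the exact line prefixes shown in this guide.
# versions_match is an illustrative name, not a Rocket Pool command.

versions_match() {
    local output client service
    output="$(cat)"   # read the version report from standard input
    client="$(printf '%s\n' "$output" | sed -n 's/^Rocket Pool client version: *//p')"
    service="$(printf '%s\n' "$output" | sed -n 's/^Rocket Pool service version: *//p')"
    [ -n "$client" ] && [ "$client" = "$service" ]
}

# Example use:
#   rocketpool service version | versions_match \
#       && echo "CLI and service versions match" \
#       || echo "Version mismatch, or unexpected output format"
```

This only confirms that the two versions agree with each other; you still need to compare them against the release announcement yourself.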
Manually Updating the ETH1 or ETH2 Client
Each new release of the Smartnode stack will come with updated references to the latest compatible versions of the ETH1 and ETH2 Docker containers. In some cases, however, you might want to upgrade one of those clients before waiting for a new Smartnode stack release.
This section will show you how to do just that.
Updating to new client versions is fairly straightforward in Docker mode. Start by shutting down the containers that you want to update. For ETH1:
docker stop rocketpool_eth1
For ETH2:
docker stop rocketpool_eth2
docker stop rocketpool_validator
Next, open the file ~/.rocketpool/config.yml in your favorite text editor. Scroll down to the section that describes the client you want to update. For example, here is an excerpt from the Geth section:
name: Geth
desc: "\tGeth is one of the three original implementations of the\n
  \t\tEthereum protocol. It is written in Go, fully open source and\n
  \t\tlicensed under the GNU LGPL v3."
image: ethereum/client-go:v1.10.4
link: https://geth.ethereum.org/
Note the image: ethereum/client-go:v1.10.4 line. This refers to the name and version of the image to download from Docker Hub. Replace the version number here with the updated version number, then save the file and exit the editor.
Finally, run rocketpool service start to automatically download the new images, and restart the containers you had shut down.
You should follow the service logs closely after the upgrade to ensure that the new client works as expected. Once you're satisfied that everything is working, then you're done. That's all there is to it!
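The edit to the image: line described above can also be made from the command line. This is a rough sketch rather than an official procedure: set_image_tag is a hypothetical helper, and it assumes the image: line has the simple "image: name:tag" form shown in the excerpt. It keeps a .bak copy of the file so you can restore it if something goes wrong.

```shell
#!/usr/bin/env bash
# Sketch: replace the tag on one "image: <name>:<tag>" line in a config file.
# Assumptions: the line format matches the Geth excerpt in this guide, and the
# image name contains no "|" characters. set_image_tag is an illustrative name.

set_image_tag() {
    local file="$1" image="$2" tag="$3"
    # Keep a backup of the original file next to it.
    cp "$file" "$file.bak"
    # Rewrite only lines of the form "image: <image>:<anything>".
    sed "s|^\([[:space:]]*image:[[:space:]]*${image}:\).*|\1${tag}|" "$file.bak" > "$file"
}

# Example use (vX.Y.Z is a placeholder; substitute the real release tag):
#   set_image_tag ~/.rocketpool/config.yml ethereum/client-go vX.Y.Z
#   grep 'image: ethereum/client-go' ~/.rocketpool/config.yml
```

Check the result with grep before restarting the service, since a mistyped tag will cause the image download to fail.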
NOTE This process is slightly different for Prysm because the Smartnode stack needs to use the DEBUG images that Prysm provides instead of the normally versioned ones. For help upgrading Prysm manually, please visit the smart-nodes channel in the Rocket Pool Discord.
Pruning Geth
If you use Geth as your primary ETH1 client, you will likely notice that your node's free disk space slowly decreases over time. Geth is by far the biggest contributor to this; depending on how much RAM you allocated to its cache during rocketpool service config, its database can grow at a rate of several gigabytes per day!
To handle this, Geth provides a special function called pruning that lets it scan and clean up its database safely to reclaim some free space. Every node operator using Geth will have to prune it eventually.
If you have a 2 TB SSD, you can usually go for months between rounds of pruning. If you have a 1 TB SSD, you will have to prune more frequently.
If you have the Grafana dashboard enabled, a good rule of thumb is to start thinking about pruning Geth when your node's used disk space exceeds 80%.
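If you don't have Grafana handy, the same 80% rule of thumb can be checked from a terminal. The sketch below uses the standard df utility; the helper names and the choice of / as the default path are assumptions, so point it at whichever filesystem holds your chain data.

```shell
#!/usr/bin/env bash
# Sketch: apply the "start thinking about pruning above 80% used" rule of thumb.
# Assumptions: df reports the filesystem on the second line of its POSIX-format
# output, and the path you pass is on the disk that stores your chain data.
# Both function names are illustrative.

disk_used_percent() {
    # Print the used-space percentage (without the % sign) for the given path.
    df -P "${1:-/}" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

should_consider_pruning() {
    # Succeeds when the used percentage exceeds the threshold (default 80).
    local used="$1" threshold="${2:-80}"
    [ "$used" -gt "$threshold" ]
}

# Example use:
#   used="$(disk_used_percent /)"
#   if should_consider_pruning "$used"; then
#       echo "Disk is ${used}% full: consider pruning Geth."
#   fi
```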
When you decide that it's time, the Smartnode comes with the ability to prune Geth for you upon request. Read below to learn how it works, and what to expect.
NOTE When using the Smartnode to prune Geth, it's assumed that Geth is your primary ETH1 client and is managed by Rocket Pool (i.e. you chose Geth as your ETH1 client in rocketpool service config).
Pruning Geth means taking the primary ETH1 client offline, so it can clean itself up. When this happens, the Smartnode (and your ETH2 client) will need some other way to access the ETH1 chain in order to function properly.
The easiest way to provide this is with a fallback ETH1 client. If you have already configured an ETH1 fallback client using rocketpool service config, then the Smartnode will automatically switch over to it when your Geth container goes down for maintenance. It will also tell your ETH2 client to use the fallback.
WARNING If you don't have an ETH1 fallback client configured, your Smartnode will stop working until Geth finishes pruning. Your ETH2 client will still attest, but it will fail any block proposals it makes.
With that in mind, the following two conditions are required to successfully prune Geth:
A working ETH1 fallback client configured
At least 50 GB of free space remaining on your SSD
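The free-space condition is easy to verify before you start. The following sketch treats the 50 GB figure as 50 GiB, which is an assumption on my part; the helper names and the default path are also illustrative, so pass the mount point of the SSD that holds your Geth data.

```shell
#!/usr/bin/env bash
# Sketch: check the "at least 50 GB free" prerequisite before pruning.
# Assumptions: 50 GB is interpreted as 50 GiB, and the path given is on the
# SSD that stores the Geth database. Function names are hypothetical.

free_kib() {
    # Print the available space, in KiB, on the filesystem containing the path.
    df -Pk "${1:-/}" | awk 'NR==2 { print $4 }'
}

enough_space_to_prune() {
    # Succeeds when the available KiB is at least min_gib GiB (default 50).
    local avail_kib="$1" min_gib="${2:-50}"
    [ "$avail_kib" -ge $(( min_gib * 1024 * 1024 )) ]
}

# Example use:
#   if enough_space_to_prune "$(free_kib /)"; then
#       echo "Enough free space to prune."
#   else
#       echo "Not enough free space; free some up before pruning."
#   fi
```

This checks only the second condition. Whether a working fallback client is configured still has to be confirmed in rocketpool service config.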
Starting a Prune
When you want to prune Geth, simply run this command:
rocketpool service prune-eth1
This will present you with something similar to the following message, depending on which ETH1 fallback client you're using:
This will shut down your main ETH1 client and prune its database, freeing up disk space.
Once pruning is complete, your ETH1 client will restart automatically.
You have a fallback ETH1 client configured (custom). Rocket Pool (and your ETH2 client) will use that while the main client is pruning.
NOTE If you're using Infura, you will see this warning message:
If you are using Infura's free tier, you may hit its rate limit if pruning takes a long time.
If this happens, you should temporarily disable the `rocketpool_node` container until pruning is complete. This will:
- Stop collecting Rocket Pool's network metrics in the Grafana dashboard
- Stop automatic operations (claiming RPL rewards and staking new minipools)
To disable the container, run: `docker stop rocketpool_node`
To re-enable the container one pruning is complete, run: `docker start rocketpool_node`
This warning lets you know that Infura's free tier may not be enough to reliably act as a fallback for everything Rocket Pool needs while Geth prunes. The Smartnode sends quite a few queries to the ETH1 client, and the number of queries scales linearly with the number of minipools you have running.
If you ever notice that the Smartnode or the Grafana dashboard stops working properly, take a look at the fallback client's logs with:
rocketpool service logs eth1-fallback
If you see many messages about rate limits being hit or exceeded, then you will likely need to stop your node container as the warning text suggests so that the Smartnode isn't trying to query Infura so much.
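Scanning the fallback logs for rate-limit messages can be roughly automated. The sketch below simply counts lines containing the phrase "rate limit" (case-insensitive); the exact wording of the messages is an assumption, so adjust the pattern to whatever your logs actually show. count_rate_limit_lines is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: count log lines that mention a rate limit.
# Assumption: the relevant messages contain the phrase "rate limit" in some
# letter case; the real log wording may differ. The function name is illustrative.

count_rate_limit_lines() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # function from failing when there are no matches (grep exits non-zero).
    grep -ci 'rate limit' || true
}

# Example use. The logs command may keep following the log, so this stops it
# after 10 seconds with the coreutils timeout utility:
#   timeout 10 rocketpool service logs eth1-fallback 2>&1 | count_rate_limit_lines
```

A count that keeps climbing between runs suggests the fallback is being throttled, which is the situation in which the warning recommends stopping the node container.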
After this, you will be prompted to confirm that you're ready to prune. If you accept, you'll see a few details as the Smartnode prepares things; it should end with a success message:
Are you sure you want to prune your main ETH1 client? [y/n]
y
Your disk has 303 GiB free, which is enough to prune.
Stopping rocketpool_eth1...
Provisioning pruning on volume rocketpool_eth1clientdata...
Restarting rocketpool_eth1...
Done! Your main ETH1 client is now pruning. You can follow its progress with `rocketpool service logs eth1`.
Once it's done, it will restart automatically and resume normal operation.
NOTE: While pruning, you **cannot** interrupt the client (e.g. by restarting) or you risk corrupting the database!
You must let it run to completion!
With that, Geth is now pruning, and you're all set! You can follow its progress with:
rocketpool service logs eth1
Once it's done pruning, it will restart automatically, and the Smartnode will resume using it instead of your fallback.