Here is the general method for running in clusters:
Terminology/Context
Default: the default way to run a node, without manually setting data workers
Cluster: a grouped set of servers that utilize a central PeerId/config (in other words, a control process) by manually defining and running data workers across each server defined in that config file
Control Process: the process that controls the data worker processes
Data Worker Process: a process that does all the heavy lifting for proving and computation
Assuming 2 Machines/Servers in a Cluster
You have at least 2 machines that are on the same network:
- Machine A
- Internal IP address: 192.168.0.200
- 4 cores
- Machine B
- Internal IP address: 192.168.0.201
- 4 cores
You want to run as many cores as possible using PeerId Qmabc123.
You will need a dedicated core for controlling the data workers, which means you only have 7 cores available for data workers.
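The core budget above can be sketched as simple shell arithmetic (the machine counts are from this example):

```shell
# Two machines with 4 cores each; one core is reserved for the control
# process, so the cluster's data-worker capacity is the total minus one.
TOTAL_CORES=$((4 + 4))            # Machine A + Machine B
DATA_WORKERS=$((TOTAL_CORES - 1)) # one core reserved for the control process
echo "available data workers: $DATA_WORKERS"
```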
Set-up on each machine for Config
You don’t need to copy your keys or store, but you will need to have a .config directory with at least the config.yml file for getting the data worker process running.
- For the machine with the control process: you need the whole .config directory per usual, with keys.yml, store directory, etc.
- For the machines with only data workers: you ONLY need the .config directory with only the config.yml file in it.
.config Directory placement
You can either place the .config directory in the default location (ceremonyclient/node/.config) or place it anywhere and pass the --config /location/of/.config/dir parameter when starting your processes.
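As a sketch, the minimal layout on a data-worker-only machine is just a .config directory containing config.yml; the ~/quil-worker path here is a hypothetical example, not a required location:

```shell
# Minimal setup on a data-worker-only machine: a .config directory with
# only config.yml inside. The ~/quil-worker location is arbitrary.
mkdir -p ~/quil-worker/.config
touch ~/quil-worker/.config/config.yml  # in reality, copy this from the control machine
ls ~/quil-worker/.config
# A worker would then be started with something like:
#   ~/ceremonyclient/node/node-1.4.21.1-linux-amd64 --core 4 --config ~/quil-worker/.config
```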
Reasoning for this
As noted below in the “Further Thoughts” section, Cassie mentions:
… the [data worker ports] are not inherently secured by any authorization, so if you leave a data worker open, anyone can send it prover tasks (and thus earn rewards from it).
This would imply you don’t need the keys or store file on the data-worker-only machines; you just need networking set up so that only you can connect to the data workers with your control process.
As far as I can tell, the config is required for two reasons: the startup process will generate one if not found and load its defaults into the application, and it defines the RPC multiaddr for the data worker on each core.
Modifying the .config/config.yml file
So find the .config/config.yml file for that Peer ID and modify the .engine.dataWorkerMultiaddrs array to include workers from each machine.
Relevant YAML syntax notation
The config files are written in YAML, so learning a bit of YAML so you can modify your config files yourself is recommended. For the tutorial's sake, as there are a lot of beginners, I will cover the relevant syntax to get you through this tutorial.
YAML array syntax
There are a couple options, single-line or multi-line.
For those interested in reading more, here is a SO link.
Single-line arrays
# Single-line notation
# Each element is comma-separated inside the array brackets.
# May span multiple lines.
engine:
  dataWorkerMultiaddrs: [
    "/ip4/127.0.0.1/tcp/40000",
    ....
  ]
# or
engine:
  dataWorkerMultiaddrs: [ "/ip4/127.0.0.1/tcp/40000", "/ip4/127.0.0.1/tcp/40001", # can all be on one line
    "/ip4/127.0.0.1/tcp/40002", # or on multiple lines
    "/ip4/127.0.0.1/tcp/40003", # or on multiple lines
    ....
  ]
Multi-line arrays
# one element per line; a dash indicates each array element
# no commas between entries
engine:
  dataWorkerMultiaddrs:
    - "/ip4/127.0.0.1/tcp/40000"
    ....
Writing the data-worker elements
Data-workers will be mapped as follows:
- Base port (by default) 40000
- --core starts at 1
What this maps out to is:
Command:
node-1.4.21.1-linux-amd64 --core 1
# starts a data worker on port 40000 on the machine it runs on
Config Mapping:
Index 0 (core index - 1) in your engine.dataWorkerMultiaddrs array must be /ip4/<ip4-address>/tcp/40000, where <ip4-address> is the internal/private IP address of the machine the above command will be run on.
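The mapping can be expressed as two tiny helper functions (a sketch, assuming the default base port of 40000):

```shell
# --core N reads index N-1 of engine.dataWorkerMultiaddrs, and with the
# default base port of 40000 that index corresponds to port 40000+(N-1).
core_to_index() { echo $(( $1 - 1 )); }
core_to_port()  { echo $(( 40000 + $1 - 1 )); }

echo "--core 1 -> index $(core_to_index 1), port $(core_to_port 1)"
echo "--core 7 -> index $(core_to_index 7), port $(core_to_port 7)"
```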
Full Example
Assuming Machine A will have the control process, that means Machine A will only have 3 data worker definitions, and Machine B will have 4.
Config
engine:
  ...
  dataWorkerMultiaddrs:
    # Machine A data workers
    - /ip4/192.168.0.200/tcp/40000
    - /ip4/192.168.0.200/tcp/40001
    - /ip4/192.168.0.200/tcp/40002
    # Machine B data workers
    - /ip4/192.168.0.201/tcp/40003
    - /ip4/192.168.0.201/tcp/40004
    - /ip4/192.168.0.201/tcp/40005
    - /ip4/192.168.0.201/tcp/40006
  difficulty: 0
  ...
Now copy this config to both Machine A and Machine B.
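Writing these entries by hand is error-prone; a small generator (a sketch, using the IPs and worker counts from the example above) keeps the ports sequential:

```shell
# Print dataWorkerMultiaddrs entries for one machine, given its IP, the
# first port it should use, and its worker count.
gen_multiaddrs() {
  local ip=$1 start_port=$2 count=$3
  for ((i = 0; i < count; i++)); do
    echo "    - /ip4/${ip}/tcp/$((start_port + i))"
  done
}

gen_multiaddrs 192.168.0.200 40000 3  # Machine A
gen_multiaddrs 192.168.0.201 40003 4  # Machine B
```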
Commands
The commands below MUST be run attached to a parent process. While you can run in detached mode, I found it to be more hassle than it’s worth trying to keep the core processes alive.
This means the commands below need to be run from a script whose process ID you can attach the core processes to.
This typically is done by something like:
# this gets the process ID of the script this is run in
PARENT_PROCESS_PID=$$
On the servers run the following:
Machine A
#!/bin/bash
PARENT_PROCESS_PID=$$
# this starts the control process, but no worker processes, because the .engine.dataWorkerMultiaddrs array is not empty
~/ceremonyclient/node/node-1.4.21.1-linux-amd64 &
~/ceremonyclient/node/node-1.4.21.1-linux-amd64 --core 1 --parent-process $PARENT_PROCESS_PID &
~/ceremonyclient/node/node-1.4.21.1-linux-amd64 --core 2 --parent-process $PARENT_PROCESS_PID &
~/ceremonyclient/node/node-1.4.21.1-linux-amd64 --core 3 --parent-process $PARENT_PROCESS_PID &
Machine B
#!/bin/bash
PARENT_PROCESS_PID=$$
~/ceremonyclient/node/node-1.4.21.1-linux-amd64 --core 4 --parent-process $PARENT_PROCESS_PID &
~/ceremonyclient/node/node-1.4.21.1-linux-amd64 --core 5 --parent-process $PARENT_PROCESS_PID &
~/ceremonyclient/node/node-1.4.21.1-linux-amd64 --core 6 --parent-process $PARENT_PROCESS_PID &
~/ceremonyclient/node/node-1.4.21.1-linux-amd64 --core 7 --parent-process $PARENT_PROCESS_PID &
You must use the incrementing core id or it will not find the right index in the .engine.dataWorkerMultiaddrs array.
The parent process ID does not have to be the control/master process; it is just the process that is calling the commands.
At this point you will just want to write a loop, installed on each server, that takes inputs for the starting index and the number of cores to start.
Borrowing from Kingcaster’s documentation:
#!/bin/bash
# start-cluster.sh
START_CORE_INDEX=1
DATA_WORKER_COUNT=$(nproc)
PARENT_PID=$$
# Some variables for paths and binaries
QUIL_NODE_PATH=$HOME/ceremonyclient/node
NODE_BINARY=node-1.4.21.1-linux-amd64 # or whatever it is
# Parse command line arguments
while [[ $# -gt 0 ]]; do
  case $1 in
    --core-index-start)
      START_CORE_INDEX="$2"
      shift 2
      ;;
    --data-worker-count)
      DATA_WORKER_COUNT="$2"
      shift 2
      ;;
    *)
      echo "Unknown option: $1"
      exit 1
      ;;
  esac
done
# Validate START_CORE_INDEX
if ! [[ "$START_CORE_INDEX" =~ ^[0-9]+$ ]]; then
  echo "Error: --core-index-start must be a non-negative integer"
  exit 1
fi
# Validate DATA_WORKER_COUNT
if ! [[ "$DATA_WORKER_COUNT" =~ ^[1-9][0-9]*$ ]]; then
  echo "Error: --data-worker-count must be a positive integer"
  exit 1
fi
# Get the maximum number of CPU cores
MAX_CORES=$(nproc)
# Adjust MAX_CORES if START_CORE_INDEX is 1, since the control process
# takes one core on this machine
if [ "$START_CORE_INDEX" -eq 1 ]; then
  echo "Adjusting max cores available to $((MAX_CORES - 1)) (from $MAX_CORES) due to starting the master node on core 0"
  MAX_CORES=$((MAX_CORES - 1))
fi
# If DATA_WORKER_COUNT is greater than MAX_CORES, set it to MAX_CORES
if [ "$DATA_WORKER_COUNT" -gt "$MAX_CORES" ]; then
  DATA_WORKER_COUNT=$MAX_CORES
  echo "DATA_WORKER_COUNT adjusted down to maximum: $DATA_WORKER_COUNT"
fi
MASTER_PID=0
# kill off any stragglers (match the binary name; an unquoted node-* glob
# would expand against files in the current directory)
pkill -f "$NODE_BINARY"
start_master() {
  $QUIL_NODE_PATH/$NODE_BINARY &
  MASTER_PID=$!
}
if [ $START_CORE_INDEX -eq 1 ]; then
  start_master
fi
# Loop through the data worker count and start each core
start_workers() {
  for ((i=0; i<DATA_WORKER_COUNT; i++)); do
    CORE=$((START_CORE_INDEX + i))
    echo "Starting core $CORE"
    $QUIL_NODE_PATH/$NODE_BINARY --core $CORE --parent-process $PARENT_PID &
  done
}
is_master_process_running() {
  ps -p $MASTER_PID > /dev/null 2>&1
  return $?
}
start_workers
# we only care about restarting the master process because the cores should be alive
# as long as this file is running (and this will only run on the machine with a start index of 1)
while true; do
  if [ $START_CORE_INDEX -eq 1 ] && ! is_master_process_running; then
    echo "Master process crashed or stopped. Restarting..."
    start_master
  fi
  sleep 440
done
I run the script above as a service (/etc/systemd/system/ceremonyclient.service):
[Unit]
Description=Quilibrium Node Service (Cluster Mode)
[Service]
Type=simple
Restart=always
RestartSec=50ms
User=<name of user>
Group=<name of user>
# this WorkingDirectory is needed to find the .config directory
# (systemd does not expand $HOME, so use an absolute path)
WorkingDirectory=/home/<name of user>/ceremonyclient/node
ExecStart=<absolute-path-to-script>/start-cluster.sh --core-index-start 1 --data-worker-count 255
[Install]
WantedBy=multi-user.target
Technical Note:
I started with --core 1 because the core param cannot be 0 (it would result in an out-of-bounds array index error). For those curious about the technical reason: node/main.go:389 in the source repo works as follows:
if you define --core it will pass in the value and attempt to find the appropriate rpcMultiaddr:
if len(nodeConfig.Engine.DataWorkerMultiaddrs) != 0 {
	rpcMultiaddr = nodeConfig.Engine.DataWorkerMultiaddrs[*core-1]
}
Firewall Rules for Clusters
You could run Machine B without any firewall, just connected to Machine A, which would/should have a firewall.
However, if Machine B is a cloud device with a public IP address, you will want the firewall anyway.
If you have these firewalls active, you need to add local network exceptions to the firewall on servers that do not have a control process. Doing so will allow your control process to communicate with your data workers.
This would look something like this:
# On Machine B (192.168.0.200 is the control machine, Machine A)
sudo ufw allow from 192.168.0.200 to any port 40003:40006 proto tcp
# On Machine C
sudo ufw allow from 192.168.0.200 to any port 40007:40016 proto tcp
...
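Since the port range for each machine follows directly from the config, the rule can be generated too (a sketch; 192.168.0.200 is the control machine in this example):

```shell
# Print the ufw rule a worker machine needs: allow only the control
# machine to reach this machine's worker port range.
CONTROL_IP=192.168.0.200
ufw_rule() {
  echo "sudo ufw allow from ${CONTROL_IP} to any port $1:$2 proto tcp"
}

ufw_rule 40003 40006  # run the printed command on Machine B
ufw_rule 40007 40016  # run the printed command on Machine C
```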
Scaling this out further
Now, introduce a new server, Machine C (internal IP address 192.168.0.203, with 10 workers). The config needs to be updated on A and B (as well as copied to C), and the commands for each of the machines would look as follows:
Config.yml (not all content is shown; comments are just for clarity’s sake, not for any config function)
engine:
  ...
  dataWorkerMultiaddrs:
    # Machine A data workers
    - /ip4/192.168.0.200/tcp/40000
    - /ip4/192.168.0.200/tcp/40001
    - /ip4/192.168.0.200/tcp/40002
    # Machine B data workers
    - /ip4/192.168.0.201/tcp/40003
    - /ip4/192.168.0.201/tcp/40004
    - /ip4/192.168.0.201/tcp/40005
    - /ip4/192.168.0.201/tcp/40006
    # Machine C data workers
    - /ip4/192.168.0.203/tcp/40007
    - /ip4/192.168.0.203/tcp/40008
    - /ip4/192.168.0.203/tcp/40009
    - /ip4/192.168.0.203/tcp/40010
    - /ip4/192.168.0.203/tcp/40011
    - /ip4/192.168.0.203/tcp/40012
    - /ip4/192.168.0.203/tcp/40013
    - /ip4/192.168.0.203/tcp/40014
    - /ip4/192.168.0.203/tcp/40015
    - /ip4/192.168.0.203/tcp/40016
  difficulty: 0
Adjusting Service Files
You would then adjust the service files on each server to start from the right index and the number of workers you want.
The same goes for any further servers you may later add to this cluster.
Restarting the Nodes
You will need to restart the nodes for each config change, so as the cluster gets bigger, scripts become more useful for automating the above commands.
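One way to script the restarts (a dry-run sketch: it only prints the commands; the hostnames and the ceremonyclient service name from the unit file above are assumptions for your setup):

```shell
# Print the restart command for each machine in the cluster. Replace
# echo with a real ssh invocation to actually execute the restarts.
restart_cmd() {
  echo "ssh $1 'sudo systemctl restart ceremonyclient'"
}

for host in machine-a machine-b machine-c; do
  restart_cmd "$host"
done
```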
Adding “Breathing Room”
It may be worth not running your machines at full capacity, especially the machine with the control process, to allow it sufficient CPU capacity.
You could technically run multiple control processes on one CPU, but I don’t see the benefit unless you are running a pool of other people's data workers, where your CPU isn’t running any data workers for yourself at all. In that case you are just managing 100% control processes on a smaller-core server and editing config files to connect to other people’s data workers.
Further thoughts
Here are some additional thoughts in regards to this topic.
Efficiency of clustering
Processing Power
The benefits of clustering become more obvious the more machines/servers you operate.
For instance, if a Node Runner has 8 Mac Minis with 8 CPU cores each and ran them all separately on their own PeerIds, they’d use 1/8 of their cores for control processes. Scale that up to 8 more and that is leaving a whole Mac Mini’s worth of data workers/processing power on the table.
Leveraging PeerId Seniority
One of the bigger upsides to this is that if a Node Runner wants to scale, they would not have to start over from scratch; rather, they could cluster and, from day one, gain the advantages of the more senior PeerId with more processing power.
I am not sure exactly how this will play out in 2.0 with the addition of prover rings, but I imagine having faster proving times means more time in the queue for another task.
Downsides
The downsides are mostly that you aren’t running as many configs, which may be useful for splitting your node’s proving across different sectors/applications.
This is my speculation, anyways, for 2.0. For all I know, there are ways to split your data workers across different areas while clustered.
Running a mining pool
I was musing about this since it has been brought up, but I’m not sure exactly how this would work for splitting rewards, as I am not aware of a way to figure out how much QUIL each data worker brings to the table. This is one reason I suspect mining pools will have limited core counts and perhaps white-listed machines, as 1000 cores from a hodge-podge of machines may not bring as much benefit as 500, relative to how the rewards are split.
There may be cases where 1000 cores would actually produce more rewards, say where you are in the inner prover rings, than if you were just starting.
Using Public IPs vs Internal
You theoretically can open your firewall for these ports and use public IPs rather than internal ones, but @cassie mentioned in this regard that,
[It would be easier] to just use internal IPs, [but if you use public IPs you should at least] secure your transport links between workers and master.
… the [data worker ports] are not inherently secured by any authorization, so if you leave a data worker open, anyone can send it prover tasks (and thus earn rewards from it).
Authorization wouldn’t be hard to add, but it needs to be intrinsically pluggable because it’s an inevitability that people will want to join up in prover pools (in the same way people joined up in mining pools for bitcoin).
The authorization loops would be very different from a privately run single owner pool versus a public pool
VPNs
A VPN could be used to connect the remote devices together, and latency of 100ms+, while slow, is not an issue (pre-2.0).
Important note:
It should be enough to just add firewall exceptions for your parent/control process server on the data-worker machines.
With a VPN (Tailscale uses WireGuard under the hood) you can secure your inter-node traffic on a private network. When creating your data worker definitions, use the IP address assigned by Tailscale or whatever VPN service you use. Your data workers can be completely firewalled except for the rule to accept traffic from your Tailscale control node’s IP address.
Tailscale makes it easy, and most people should be fine with the free plan; however, I’m sure there are tutorials on how to set something up yourself with WireGuard if you want to roll your own.
I myself could not get my first cluster working until I used Tailscale and their internal IP addresses for each machine. It just works for me, so I will continue to use them.
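As a sketch, the only change to the config is that each worker entry uses the machine's VPN-assigned IP instead of its LAN IP (100.64.0.5 here is a made-up example address; on a real machine you would substitute the output of `tailscale ip -4`):

```shell
# Build a dataWorkerMultiaddrs entry from a VPN-assigned IP and a port.
ts_multiaddr() {
  echo "    - /ip4/$1/tcp/$2"
}

# On a real machine: ts_multiaddr "$(tailscale ip -4)" 40003
ts_multiaddr 100.64.0.5 40003
```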
That’s all, folks
I welcome any input/edits for further developing this documentation if there is something I missed or should be added.