How to Run Nodes in a Cluster

Here is the general method for running nodes in a cluster:

Terminology/Context

Default: the default way to run a node, without manually setting data workers.
Cluster: a grouped set of servers that share a central PeerId/config (in other words, a control process), with data workers manually defined and run across each server listed in that config file.
Control Process: the process that controls the data worker processes.
Data Worker Process: a process that does all the heavy lifting for proving and computation.

Assuming 2 Machines/Servers in a Cluster

You have at least 2 machines that are on the same network:

  • Machine A
    • Internal IP address: 192.168.0.200
    • 4 cores
  • Machine B
    • Internal IP address: 192.168.0.201
    • 4 cores

You want to run as many cores as possible using PeerId Qmabc123.

You will need a dedicated core for controlling the data workers, which means you only have 7 available cores for data workers.

Set-up on each machine for Config

You don’t need to copy your keys or store, but you will need a .config directory with at least the config.yml file in it to get the data worker process running.

  • For the machine with the control process: you need the whole .config directory per usual, with keys.yml, store directory, etc.
  • For the machines with only data workers: you ONLY need the .config directory with only the config.yml file in it.
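As a concrete sketch, the entire on-disk footprint of a worker-only machine would look like this. This is illustrative only: it uses a temp directory so it is safe to run anywhere, and the config contents are a placeholder; in practice you would copy the cluster's shared config.yml (e.g. via scp from the control machine).

```shell
#!/bin/bash
# Illustrative: a worker-only machine needs nothing except .config/config.yml.
# A temp directory stands in for the node directory here.
NODE_DIR=$(mktemp -d)
mkdir -p "$NODE_DIR/.config"

# Placeholder contents; in practice, copy the cluster's shared config.yml here.
printf 'engine:\n  dataWorkerMultiaddrs: []\n' > "$NODE_DIR/.config/config.yml"

ls "$NODE_DIR/.config"
```

Running this prints a single entry, config.yml, which is all a data-worker-only machine requires.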

.config Directory placement

You can either place the .config directory in the default location (ceremonyclient/node/.config), or place it anywhere and pass the --config /location/of/.config/dir parameter when starting your processes.

Reasoning for this

As noted below in the “Further Thoughts” section, Cassie mentions:

… the [data worker ports] are not inherently secured by any authorization, so if you leave a data worker open, anyone can send it prover tasks (and thus earn rewards from it).

This would imply you don’t need the keys or store file on the data-worker-only machines, just a network set up so that only you can connect to the data workers from your control process.

As far as I can tell, the config is required because the startup process will generate one (with defaults) if none is found and then load it into the application, and because it defines the RPC multiaddr for the data worker on each core.

Modifying the .config/config.yml file

So find the .config/config.yml file for that Peer ID and modify the .engine.dataWorkerMultiaddrs array to include workers from each machine.

Relevant YAML syntax notation

The config files are written in YAML, so it is recommended to learn a bit of YAML so you can modify your config files yourself. For the tutorial’s sake, as there are a lot of beginners, I will cover the relevant syntax to get you through this tutorial.

YML array syntax

There are a couple options, single-line or multi-line.

For those interested in reading more, here is a SO link.

Single-line arrays

# Single-line notation
# Each element is comma-separated inside the array brackets.
# May span multiple lines.
engine:
  dataWorkerMultiaddrs: [
      "/ip4/127.0.0.1/tcp/40000",
      ....
]

# or 

engine:
  dataWorkerMultiaddrs: [ "/ip4/127.0.0.1/tcp/40000", "/ip4/127.0.0.1/tcp/40001", # can all be on one line
    "/ip4/127.0.0.1/tcp/40003", # or on multiple lines
    "/ip4/127.0.0.1/tcp/40004", # or on multiple lines
      ....
]

Multi-line arrays

# one element per line; a dash indicates each array element
# no commas between entries
engine:
  dataWorkerMultiaddrs:
      - "/ip4/127.0.0.1/tcp/40000"
      ....

Writing the data-worker elements

Data-workers will be mapped as follows:

  • Base port (by default): 40000
  • --core starts at 1

What this maps out to is:

Command:
node-1.4.21.1-linux-amd64 --core 1
# starts a data worker listening on port 40000 on the machine it runs on.
Config Mapping:

The index 0 (core index - 1) in your engine.dataWorkerMultiaddrs array must be /ip4/<ip4-address>/tcp/40000, where <ip4-address> is the internal/private IP address of the machine the above command will be run on.
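The mapping above can be sketched as a small shell helper. The base port of 40000 and the off-by-one between --core and the array index are taken from the text; the function names themselves are illustrative and not part of the node software:

```shell
#!/bin/bash
# Illustrative helpers: given a --core value, compute the
# dataWorkerMultiaddrs array index and the default listening port.
BASE_PORT=40000

core_to_index() {
    echo $(( $1 - 1 ))               # --core 1 -> array index 0
}

core_to_port() {
    echo $(( BASE_PORT + $1 - 1 ))   # --core 1 -> port 40000
}

echo "core 1 -> index $(core_to_index 1), port $(core_to_port 1)"
echo "core 7 -> index $(core_to_index 7), port $(core_to_port 7)"
```

So in the two-machine example, --core 7 maps to array index 6 and (if all workers used the default base) port 40006.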

Full Example

Assuming Machine A will have the control process, that means Machine A will only have 3 data worker definitions, and Machine B will have 4.

Config

engine:
  ...
  dataWorkerMultiaddrs: 
    # Machine A data workers
    - /ip4/192.168.0.200/tcp/40000
    - /ip4/192.168.0.200/tcp/40001
    - /ip4/192.168.0.200/tcp/40002
    # Machine B data workers
    - /ip4/192.168.0.201/tcp/40003
    - /ip4/192.168.0.201/tcp/40004
    - /ip4/192.168.0.201/tcp/40005
    - /ip4/192.168.0.201/tcp/40006
  difficulty: 0
...

Now copy this config to both Machine A and Machine B.
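If you prefer not to hand-write these entries, a short loop can print them for you. This is just a sketch: the `gen_multiaddrs` helper is my own illustration, and the IPs and port ranges mirror the example above; adapt them to your own layout, then paste the output into the dataWorkerMultiaddrs array.

```shell
#!/bin/bash
# Print dataWorkerMultiaddrs entries for one machine.
# Usage: gen_multiaddrs <ip> <first-port> <count>
gen_multiaddrs() {
    local ip=$1 first=$2 count=$3
    for ((p = first; p < first + count; p++)); do
        echo "    - /ip4/${ip}/tcp/${p}"
    done
}

# Machine A: 3 workers starting at 40000
gen_multiaddrs 192.168.0.200 40000 3
# Machine B: 4 workers starting at 40003
gen_multiaddrs 192.168.0.201 40003 4
```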

Commands

The commands for this setup MUST be run attached to a parent process. While you can run in detached mode, I found it to be more hassle than it’s worth trying to keep the core processes alive.

This means that the commands below need to be run from a script whose process ID you can attach the core processes to.

This typically is done by something like:

# this gets the process ID of the script it is run in
PARENT_PROCESS_PID=$$
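The pattern can be sketched as follows. The `start_worker` function here only echoes so the sketch is safe to run anywhere; in the real script its body would be the node binary invocation shown in the commented line (the binary path is the one used elsewhere in this tutorial):

```shell
#!/bin/bash
# Sketch: capture this script's PID once, then hand it to every worker command.
PARENT_PROCESS_PID=$$

start_worker() {
    # real version (illustrative path):
    # ~/ceremonyclient/node/node-1.4.21.1-linux-amd64 --core "$1" --parent-process "$PARENT_PROCESS_PID" &
    echo "would start core $1 with parent pid $PARENT_PROCESS_PID"
}

start_worker 1
start_worker 2
```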

On the servers run the following:

Machine A
#!/bin/bash

PARENT_PROCESS_PID=$$

# start the control process in the background; it will not start any worker
# processes itself because the .engine.dataWorkerMultiaddrs array is not empty
~/ceremonyclient/node/node-1.4.21.1-linux-amd64 &

~/ceremonyclient/node/node-1.4.21.1-linux-amd64 --core 1 --parent-process $PARENT_PROCESS_PID &
~/ceremonyclient/node/node-1.4.21.1-linux-amd64 --core 2 --parent-process $PARENT_PROCESS_PID &
~/ceremonyclient/node/node-1.4.21.1-linux-amd64 --core 3 --parent-process $PARENT_PROCESS_PID &
Machine B
#!/bin/bash

PARENT_PROCESS_PID=$$

~/ceremonyclient/node/node-1.4.21.1-linux-amd64 --core 4 --parent-process $PARENT_PROCESS_PID &
~/ceremonyclient/node/node-1.4.21.1-linux-amd64 --core 5 --parent-process $PARENT_PROCESS_PID &
~/ceremonyclient/node/node-1.4.21.1-linux-amd64 --core 6 --parent-process $PARENT_PROCESS_PID &
~/ceremonyclient/node/node-1.4.21.1-linux-amd64 --core 7 --parent-process $PARENT_PROCESS_PID &

You must use the incrementing core id or it will not find the right index in the .engine.dataWorkerMultiaddrs array.

The parent process ID does not have to be the control/master process. It just has to be the process that is calling the commands.

At this point you will just want to write a loop, installed on each server, that takes the starting index and the number of cores to start as inputs.

Borrowing from Kingcaster’s documentation:

#!/bin/bash
# start-cluster.sh

START_CORE_INDEX=1
DATA_WORKER_COUNT=$(nproc)
PARENT_PID=$$

# Some variables for paths and binaries
QUIL_NODE_PATH=$HOME/ceremonyclient/node
NODE_BINARY=node-1.4.21.1-linux-amd64 # or whatever it is

# Parse command line arguments
while [[ $# -gt 0 ]]; do
    case $1 in
        --core-index-start)
            START_CORE_INDEX="$2"
            shift 2
            ;;
        --data-worker-count)
            DATA_WORKER_COUNT="$2"
            shift 2
            ;;
        *)
            echo "Unknown option: $1"
            exit 1
            ;;
    esac
done


# Validate START_CORE_INDEX
if ! [[ "$START_CORE_INDEX" =~ ^[0-9]+$ ]]; then
    echo "Error: --core-index-start must be a non-negative integer"
    exit 1
fi

# Validate DATA_WORKER_COUNT
if ! [[ "$DATA_WORKER_COUNT" =~ ^[1-9][0-9]*$ ]]; then
    echo "Error: --data-worker-count must be a positive integer"
    exit 1
fi

# Get the maximum number of CPU cores
MAX_CORES=$(nproc)

# Reserve one core for the master node if this machine starts it
if [ "$START_CORE_INDEX" -eq 1 ]; then
    echo "Adjusting max cores available to $((MAX_CORES - 1)) (from $MAX_CORES) due to starting the master node on this machine"
    MAX_CORES=$((MAX_CORES - 1))
fi

# If DATA_WORKER_COUNT is greater than MAX_CORES, set it to MAX_CORES
if [ "$DATA_WORKER_COUNT" -gt "$MAX_CORES" ]; then
    DATA_WORKER_COUNT=$MAX_CORES
    echo "DATA_WORKER_COUNT adjusted down to maximum: $DATA_WORKER_COUNT"
fi

MASTER_PID=0

# kill off any stragglers from a previous run
# (pkill matches by regex, so match the binary name explicitly)
pkill -f "$NODE_BINARY"

start_master() {
    $QUIL_NODE_PATH/$NODE_BINARY &
    MASTER_PID=$!
}

if [ $START_CORE_INDEX -eq 1 ]; then
    start_master
fi

# Loop through the data worker count and start each core
start_workers() {
    # start each data worker, offset by the starting core index
    for ((i=0; i<DATA_WORKER_COUNT; i++)); do
        CORE=$((START_CORE_INDEX + i))
        echo "Starting core $CORE"
        $QUIL_NODE_PATH/$NODE_BINARY --core $CORE --parent-process $PARENT_PID &
    done
}

is_master_process_running() {
    ps -p $MASTER_PID > /dev/null 2>&1
    return $?
}

start_workers

while true
do
  # we only care about restarting the master process because the cores should be alive 
  # as long as this file is running (and this will only run on the machine with a start index of 1)
  if [ $START_CORE_INDEX -eq 1 ] && ! is_master_process_running; then
    echo "Process crashed or stopped. restarting..."
    start_master
  fi
  sleep 440
done

I run this script above as a service (/etc/systemd/system/ceremonyclient.service):

[Unit]
Description=Quilibrium Node Service (Cluster Mode)

[Service]
Type=simple
Restart=always
RestartSec=50ms
User=<name of user>
Group=<name of user>
# this WorkingDirectory is needed to find the .config directory
# (systemd does not expand $HOME; use the absolute path for your user)
WorkingDirectory=/home/<name of user>/ceremonyclient/node
ExecStart=<absolute-path-to-script>/start-cluster.sh --core-index-start 1 --data-worker-count 255

[Install]
WantedBy=multi-user.target

Technical Note:

I started with --core 1 because the core param cannot be 0 (it would result in an out-of-bounds array index error). For those curious about the technical reason, node/main.go:389 in the source repo works as follows:

if you define --core it will pass in the value and attempt to find the appropriate rpcMultiaddr:

if len(nodeConfig.Engine.DataWorkerMultiaddrs) != 0 {
    rpcMultiaddr = nodeConfig.Engine.DataWorkerMultiaddrs[*core-1]
}
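The same lookup can be mimicked in shell to make the off-by-one visible. The array below is a stand-in for .engine.dataWorkerMultiaddrs, and `multiaddr_for_core` is my own illustrative helper, not anything in the node software:

```shell
#!/bin/bash
# The node indexes the multiaddr array with (core - 1); a skipped or reused
# core id therefore points at the wrong (or a nonexistent) entry.
DATA_WORKER_MULTIADDRS=(
    "/ip4/192.168.0.200/tcp/40000"
    "/ip4/192.168.0.200/tcp/40001"
    "/ip4/192.168.0.201/tcp/40002"
)

multiaddr_for_core() {
    echo "${DATA_WORKER_MULTIADDRS[$(( $1 - 1 ))]}"
}

multiaddr_for_core 1   # first entry
multiaddr_for_core 3   # third entry
```

With --core 0 the index would be -1, which is exactly the out-of-bounds case the Go snippet above would hit.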

Firewall Rules for Clusters

You could run Machine B without any firewall, just connected to Machine A, which would/should have a firewall.

However, if Machine B is a cloud device with a public IP address, you will want the firewall anyway.

If you have these firewalls active, you need to add local-network exceptions on the servers that do not have a control process, so that your control process can communicate with your data workers.

This would look something like this:

# On Machine B (allow the control process on Machine A to reach the workers)
sudo ufw allow from 192.168.0.200 to any port 40003:40006 proto tcp
# On Machine C
sudo ufw allow from 192.168.0.200 to any port 40007:40016 proto tcp
...
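Port-range typos are easy to make by hand, so a loop can generate the rules instead. This sketch only prints the commands (it does not run ufw), so you can review them before executing; the control-process IP of 192.168.0.200 follows the example in this tutorial and should be adjusted for your own network:

```shell
#!/bin/bash
# Print (rather than run) the ufw rule each worker machine needs so that only
# the control process may reach its data worker ports.
CONTROL_IP=192.168.0.200

ufw_rule() {
    local first=$1 last=$2
    echo "sudo ufw allow from ${CONTROL_IP} to any port ${first}:${last} proto tcp"
}

ufw_rule 40003 40006   # run the printed command on Machine B
ufw_rule 40007 40016   # run the printed command on Machine C
```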

Scaling this out further

Now, introduce a new server, Machine C (internal IP address 192.168.0.203, with 10 workers). The config then needs to be updated on A and B (as well as copied to C), and the commands for each machine would look as follows:

Config.yml (not all content is shown; the comments are just for clarity’s sake and serve no config function)

engine:
  ...
  dataWorkerMultiaddrs: 
    # Machine A data workers
    - /ip4/192.168.0.200/tcp/40000
    - /ip4/192.168.0.200/tcp/40001
    - /ip4/192.168.0.200/tcp/40002
    # Machine B data workers
    - /ip4/192.168.0.201/tcp/40003
    - /ip4/192.168.0.201/tcp/40004
    - /ip4/192.168.0.201/tcp/40005
    - /ip4/192.168.0.201/tcp/40006
    # Machine C data workers
    - /ip4/192.168.0.203/tcp/40007
    - /ip4/192.168.0.203/tcp/40008
    - /ip4/192.168.0.203/tcp/40009
    - /ip4/192.168.0.203/tcp/40010
    - /ip4/192.168.0.203/tcp/40011
    - /ip4/192.168.0.203/tcp/40012
    - /ip4/192.168.0.203/tcp/40013
    - /ip4/192.168.0.203/tcp/40014
    - /ip4/192.168.0.203/tcp/40015
    - /ip4/192.168.0.203/tcp/40016
  difficulty: 0

Adjusting Service Files

You would then adjust the service files on each server to start from the right index and the number of workers you want.

And the same goes for any further servers you may later add to this cluster.

Restarting the Nodes

You will need to restart the nodes for each config change, so as the cluster gets bigger, scripts become more useful in automating the above commands.

Adding “Breathing Room”

It may be worth not running your machines at full capacity, especially the machine with the control process, so as to allow it sufficient CPU headroom.

You could technically run multiple control processes on one CPU, but I don’t see the benefit unless you are running a pool of other people’s machines, where your CPU isn’t running any data workers for yourself at all. In that case you are just managing control processes on a smaller-core server and editing config files to connect to other people’s data workers.

Further thoughts

Here are some additional thoughts in regards to this topic.

Efficiency of clustering

Processing Power

The benefits of clustering become more obvious the more machines/servers you operate.

For instance, if a Node Runner has 8 Mac Minis with 8 CPU cores each and ran them all separately on their own PeerIds, they’d use 1/8 of their cores for control processes. Across all 8 machines, that is leaving a whole Mac Mini’s worth of data workers/processing power on the table.

Leveraging PeerId Seniority

One of the bigger upsides to this is that if a Node Runner wants to scale, they would not have to start over from scratch; rather, they could cluster and, from day one, gain the advantages of the more senior PeerId with more processing power.

I am not sure exactly how this will play out in 2.0 with the addition of prover rings, but I imagine having faster proving times means more turns in the queue for another task.

Downsides

The downside is mainly that you aren’t running as many configs, which may be useful for splitting your node’s proving across different sectors/applications.

This is my speculation, anyways, for 2.0. For all I know, there are ways to split your data workers across different areas while clustered.

Running a mining pool

I was musing about this since it has been brought up, but I’m not sure exactly how this would work for splitting rewards, as I am not aware of a way to figure out how much QUIL each data worker brings to the table. This is one reason I suspect mining pools will have limited core counts and perhaps white-listed machines, as 1000 cores from a hodge-podge of machines may not bring as much benefit as 500, relative to how the rewards are split.

There may be cases where 1000 cores would actually produce more rewards, say in the case where you are in the inner prover rings, than say if you were just starting.

Using Public IPs vs Internal

You theoretically can open your firewall for these ports and use public IPs rather than internal ones, but @cassie mentioned in this regard that,

[It would easier] to just use internal IPs, [but if you use public IPs you should at least] secure your transport links between workers and master.

… the [data worker ports] are not inherently secured by any authorization, so if you leave a data worker open, anyone can send it prover tasks (and thus earn rewards from it).

Authorization wouldn’t be hard to add, but it needs to be intrinsically pluggable because it’s an inevitability that people will want to join up in prover pools (in the same way people joined up in mining pools for bitcoin).

The authorization loops would be very different from a privately run single owner pool versus a public pool

VPNs

A VPN could be used to connect the remote devices together, and latency of 100ms+, while slow, is not an issue (pre-2.0).

Important note:

It should be enough to just add firewall exceptions for your parent/control process server on the data-worker machines.

With a VPN (Tailscale uses WireGuard under the hood) you can secure your inter-node traffic on a private network. When creating your data worker definitions, use the IP address assigned by Tailscale or whatever VPN service you use. Your data workers can then be completely firewalled except for the rule accepting traffic from your Tailscale control node’s IP address.

Tailscale makes it easy, and most people should be fine with the free plan; however, I’m sure there are tutorials on how to set something up yourself with WireGuard if you want to roll your own.

I myself could not get my first cluster working until I used Tailscale and its internal IP addresses for each machine. It just works for me, so I will continue to use it.

That’s all, folks

I welcome any input/edits for further developing this documentation if there is something I missed or should be added.


Thanks for a great tutorial!

To secure the connection between the remote machines, couldn’t we simply set up the firewall like below?

sudo ufw allow from 192.168.0.201 to any port 40000:40032 proto tcp
sudo ufw allow from 192.168.0.203 to any port 40033:40064 proto tcp
etc.

Your clustered machines do not need to be connected to the internet, just networked to your control process, and therefore don’t need the same firewall protections. If you want to be explicit, or the machines have a public IP, then yes, you could still add the firewall and then add the rules for those ports.

I will add that note, as well as one about the whole .config directory.

Yes, in my case the machines are all remote and have public IPs. I am very new to this, and there are probably smarter ways to do this (some of which you have mentioned), but based on what I already know, using firewall rules could be the easier way to protect those ports for me.

One thing that is not clear to me is the following:
do all machines have the same keys.yml, and config.yml ?

Or does each one have its own keys and config, and only the dataWorkerMultiaddrs: setting is repeated on each machine?

Thanks for the guide Tyga!

Since many of us use rented bare-metal configurations, I was wondering if clustering would work with two or more nodes located at different server providers in different locations?


I would use Tailscale, use the IP addresses it assigns, and add the ufw rules for those ports & IP addresses.

I will add some more details tomorrow.

Only the master node/server needs the keys.yml file. All servers (master and workers) need the same config.yml.

Master:

  • Complete .config directory including store, keys.yml and config.yml.

Workers:

  • Only .config/config.yml. Nothing else.

Thanks!
I would add this specification to the main tutorial @Tyga

I think you meant

Workers:

  • Only .config/config.yml. Nothing else.

and not .config/keys.yml

Oops, thanks for spotting the typo!

Is it an option to restrict the firewall rule, which allows traffic on ports 40000:40032 on Machine B, to only the WAN IP of Machine A?

Those ufw rules mentioned account for that in the main tutorial. Is it not clear that that’s what I mean or do I need to rephrase it?

question - do we need to run the processes in a particular order. I remember some article where it was mentioned - run all the workers first then run the processes on the main server. is that a requirement?

I would start with the control process and work your way through the data workers. If that doesn’t work, try a different order.

do threads count like cores?

Threads are essentially hyper-threaded cores.

AMD and Intel have “cores” which are hyper-threaded, which essentially splits a single physical core into more than one logical processor.

So for instance, it is common for AMD to have a 1:2 core-to-thread ratio for the entire CPU, e.g. 32 cores but 64 threads.

Intel CPUs tend to have mixed cores. For example, the Core i9-14900KF processor has 24 cores, made up of 8 Performance Cores (P-Cores) and 16 Efficiency Cores (E-Cores).

The P-Cores are hyper-threaded, meaning each core is allocated two threads, resulting in 8 (P-Cores) x 2 = 16 threads, which gives the i9-14900KF 32 threads across its 24 cores.

The operating system, when delegating tasks, will delegate individual tasks to each thread, or as some like to call them, vCPUs.

thank you for your answer, but I was just asking whether I have to count up through the ports: 40001, 40002, etc.

hello @Tyga
first of all thank you very much for this tutorial.
I tried carefully to do the process unfortunately it doesn’t work… I don’t understand what’s wrong on my side

Simply running this on slave nodes gives the error
panic: parent process pid not specified

I noticed that this script Kingcaster | Quil Parallel Nodes Guide solves this by adding the script PID as parent process.

Even with that script though, I am not able to make the cluster work as I get this error on my master node:

panic: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp <MASTER-IP>:40002: connect: connection refused"

Where MASTER-IP is the IP generated by tailscale for the master node.

I tried to debug this for some time but with no luck. I even tried to disable the firewall completely.


Later on, I developed a completely custom script to be run via service.
This one seems to work and I get no errors in the node log.
BUT the slave nodes do not fire up (CPU stays at 0%), even though I see the processes listed correctly in htop.
Only the master node works at full power.

I have no idea why the slave nodes are not firing up, although the processes are running. Firewall rules seem ok, and they allow the master to communicate with the nodes on the necessary ports.

Encountered the same problem as you