How to Run Nodes in a Cluster

The only time that error occurs is if you do not specify the data workers in your DataWorkerMultiaddrs field.

This is only useful for data workers that will be running on your localhost (the same machine as the control process), since remote machines will not be able to look up the control process PID on the control process server.
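For illustration, a hedged sketch of what the local-only case looks like in config.yml (the exact default shape is an assumption, not taken from the source code):

```yaml
engine:
  # Local-only sketch: with no entries listed here, the control process
  # spawns the data workers itself on the same machine and can hand each
  # worker its own PID to monitor.
  dataWorkerMultiaddrs: []
```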

Looking further into the code, the parent process PID is used to monitor the parent process: the parent process ID input is passed to the data worker, saved on the DataWorker object for later use, and then, on Start, the monitorParent function is called.

Notably, monitorParent will just log that the worker is running in detached mode if parentProcessId is - (the default when it is not passed in).

It's the only place the PID is used at all.
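The monitoring idea itself is just liveness-checking a PID. A minimal shell sketch of the same concept (not the actual Go implementation from the node):

```shell
# Sketch of PID liveness checking, the same idea monitorParent relies on.
# `kill -0` sends no signal; it only reports whether the process exists.
PARENT_PID=$$   # use our own shell's PID here so the check succeeds
if kill -0 "$PARENT_PID" 2>/dev/null; then
  echo "parent alive"
else
  echo "parent gone, worker would shut down"
fi
# prints "parent alive"
```

On a remote slave there is no parent on the same machine to check, which is why the PID mechanism only makes sense for local workers.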

So to me, this points to your config not working on your remote (non-master) machines.

I'd check:

  1. the config not being properly formatted on your clustered machines
  2. your indexing being off when starting the node command
  3. the config not being duplicated to the non-master machines (or not findable / not in the right location)
  4. network connectivity (can your devices actually communicate?)
  5. listening ports (are your data workers actually listening?)
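For points 4 and 5, a quick reachability probe can be sketched with bash's /dev/tcp (the host and port below are placeholders; point them at a real worker's address):

```shell
# Probe whether a data worker's port accepts TCP connections.
# HOST and PORT are placeholders; substitute a real worker address.
HOST=127.0.0.1
PORT=40000
if timeout 2 bash -c "exec 3<>/dev/tcp/$HOST/$PORT" 2>/dev/null; then
  echo "worker port reachable"
else
  echo "worker port unreachable"
fi
```

Run it from the control machine against each worker entry in dataWorkerMultiaddrs; on the worker machine itself, `ss -tln` will show whether anything is listening on those ports.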

Do we need servers with the same performance for a cluster? If one server is much faster than the others, I'm wondering how quil deals with it; since we only have one "time_taken", will this fast server have to wait for the others?

Yes. You'd want similar hardware across your entire cluster, as anything else just introduces inefficiency/wasted time. You could start a new cluster with more powerful hardware that isn't dragged down by your older hardware, and you wouldn't want to add slower hardware into your existing clusters, as you would be shooting yourself in the foot.


Thank you for explaining the code :)

I was able to get this working thanks to @blacks1

  1. In the ceremonyclient.service file I had '/root' specified as the working directory, so the node binary was looking for (and generated default) config files in the /root/.config directory. I updated the working directory to '/root/ceremonyclient/node'.
  2. The config file had '- ' prefixes on the dataWorkerMultiaddrs entries, which is not valid YAML in that context. I removed those prefixes.

@Tyga, maybe you should add a note in the main tutorial to use either the square brackets or the '- ' prefixes for the dataWorkerMultiaddrs: entries, but not both.
It can be confusing for first-timers.
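For reference, both YAML notations below are equivalent; the problem arises only when they are mixed (the addresses are examples):

```yaml
# Flow style (square brackets, no '- ' prefixes):
dataWorkerMultiaddrs: [/ip4/192.168.0.200/tcp/40000, /ip4/192.168.0.200/tcp/40001]

# Block style ('- ' prefixes, no square brackets):
dataWorkerMultiaddrs:
  - /ip4/192.168.0.200/tcp/40000
  - /ip4/192.168.0.200/tcp/40001
```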


How can I leave breathing room on my master node? I tried deleting one worker from the multiaddrs, but it crashed; then I launched with --core 2 and ran fewer cores, but it crashed as well.

Now I'm thinking about cpulimit, but I'm not sure whether the control process will take resources. Any suggestions?

I removed the '- ' prefixes because you were using square brackets '[' and ']' to enclose the dataWorkerMultiaddrs entries (just like the default config file suggests).
The tutorial uses an alternative notation without square brackets, which is likely also valid.


OK. I just removed the last workers from the master's config, changed the script to use fewer cores, and it worked. Going to test whether it yields more rewards!
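What worked here can be sketched as simply listing fewer worker entries for the master than it has cores (the layout below is an illustrative example, not a recommendation):

```yaml
engine:
  dataWorkerMultiaddrs:
    # Example: the master has 8 cores, but only 6 workers are listed,
    # leaving roughly 2 cores of headroom for the control process.
    - /ip4/192.168.0.200/tcp/40000
    - /ip4/192.168.0.200/tcp/40001
    - /ip4/192.168.0.200/tcp/40002
    - /ip4/192.168.0.200/tcp/40003
    - /ip4/192.168.0.200/tcp/40004
    - /ip4/192.168.0.200/tcp/40005
```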

Does someone know if one could use Tailscale magic DNS, such as machine-a.taila7r35.ts.net in the config.yml and in the firewall rules?

This would allow for easy management when switching a machine, since AFAIU you can keep the same DNS name, which is derived from the name you assign to the machine in Tailscale.

dataWorkerMultiaddrs: [
   # Machine A
   /ip4/machine-a.taila7r35.ts.net/tcp/40000,
   /ip4/machine-a.taila7r35.ts.net/tcp/40001,
   /ip4/machine-a.taila7r35.ts.net/tcp/40002,
   ...

and
sudo ufw allow from machine-a.taila7r35.ts.net to any port 40003:40006 proto tcp

I made this thingy


/dns/<dns-address>/tcp/40000
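Applied to the MagicDNS example above, that would look something like the following (assuming the /dns multiaddr protocol resolves the Tailscale name as expected):

```yaml
dataWorkerMultiaddrs:
  # Machine A, addressed by its Tailscale MagicDNS name instead of an IP
  - /dns/machine-a.taila7r35.ts.net/tcp/40000
  - /dns/machine-a.taila7r35.ts.net/tcp/40001
  - /dns/machine-a.taila7r35.ts.net/tcp/40002
```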



I have looked around at other firewall solutions, but I can't find anything that supports DNS names in its rules.
But I haven't dug too much, TBH.

Pity, because it means that when switching machines one would have to re-edit at least the firewall rules.

Although, I see that Tailscale allows you to change the IP address of each machine. I wonder whether you could delete an old machine, add a new one, and then change its IP to that of the old machine…

Would it be enough for the slave nodes to have only this in the config file and nothing else?

engine:
  dataWorkerMultiaddrs: 
    # Machine A data workers
    - /ip4/192.168.0.200/tcp/40000
    - /ip4/192.168.0.200/tcp/40001
    - /ip4/192.168.0.200/tcp/40002
    # Machine B data workers
    - /ip4/192.168.0.201/tcp/40003
    - /ip4/192.168.0.201/tcp/40004
    - /ip4/192.168.0.201/tcp/40005
    - /ip4/192.168.0.201/tcp/40006

UPDATE:
Just tested, and I confirm that it works. This is very good for security, as it avoids keeping any sensitive data on the slave nodes.


Yes. As you found out, currently the only reason for the config on the slave node(s) is to get the addresses (and ports) of the DataWorker processes to create.

How to solve the issue where a problem with any single node in a cluster causes the entire cluster to stop working?

It should be noted that in 2.1 you need to copy your keys and config files to the slaves (the store is not needed).

On the slave:

$HOME/ceremonyclient/node/
$HOME/ceremonyclient/node/.config/
$HOME/ceremonyclient/node/.config/keys.yml
$HOME/ceremonyclient/node/.config/config.yml
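A local sketch of what must land on each slave — only the two YAML files, never the store (temp dirs stand in for the real paths here so the commands run anywhere; in practice you would scp or rsync to the slave host):

```shell
SRC=$(mktemp -d)    # stand-in for the master's .config directory
DEST=$(mktemp -d)   # stand-in for the slave's .config directory

# Master side: keys, config, and a store directory.
touch "$SRC/keys.yml" "$SRC/config.yml"
mkdir -p "$SRC/store"

# Copy only what the slave needs; the store stays behind on the master.
cp "$SRC/keys.yml" "$SRC/config.yml" "$DEST/"

ls "$DEST"   # config.yml  keys.yml -- and no store
```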

This is solved in 2.0: the cluster will just continue without that worker, even though it initially complains about it.