Master node can't connect to worker after upgrading to 1.4.20.1

Hello,
I have a problem with my master node right after the upgrade. Could you help me, please? I couldn’t find a solution. My prerequisites are OK. Thanks.

Hi,

Welcome to the forum! I edited the title a bit to make it easier to find for others. Hope you don’t mind!

Regarding your question, it looks like your master node can’t connect to one of the workers (the one running on 192.168.1.99). Has that worker been upgraded?


All workers are upgraded, but I noticed a small difference between the config.yml files.
One of them has these lines:
network: 0 in p2p, and
dataWorkerBaseListenMultiaddr: ""
dataWorkerBaseListenPort: 0
dataWorkerMemoryLimit: 0
in engine.
What are they?
The other ones don’t have them.

I need some information please.

I’m referring to this website: Kingcaster | Quil Parallel Nodes Guide
Kingcaster said: Above is the setup for a 3 x (4 cores) machine cluster (12 cores total, 1 core is reserved for task scheduling). Note 40001 - 40004; with more cores, increment the port: 40005… 40006… and so on.

But the configurator is not updated: addrs - JSFiddle - Code Playground
So I launched my nodes like this:
bash para.sh linux amd64 0 31 1.4.20.1
bash para.sh linux amd64 31 31 1.4.20.1
bash para.sh linux amd64 62 31 1.4.20.1
bash para.sh linux amd64 93 31 1.4.20.1
bash para.sh linux amd64 124 31 1.4.20.1

and like this:
bash para.sh linux amd64 0 32 1.4.20.1
bash para.sh linux amd64 32 32 1.4.20.1
bash para.sh linux amd64 64 32 1.4.20.1
bash para.sh linux amd64 96 32 1.4.20.1
bash para.sh linux amd64 128 32 1.4.20.1
as King said (slaves first, master last).
Before the upgrade I launched with 1 core less, as in the little configurator,
which generates your peers for you.

I verified my IPs and core counts but, in the end, I’m confused. I have spent a whole day on this problem.

Could you confirm which method is correct, please?

I’m not familiar with the worker set up, but this might mean that one of the workers is not set up correctly. I’ll ping the author of the guide to see if they can help.

Or should we just run ./release_autorun.sh on all workers?

I have this error on 3 nodes (nodes 2, 3 and 4):
panic: start: listen tcp4 192.168.1.56:40001: bind: cannot assign requested address

I have this error on the last node: panic: runtime error: index out of range [159] with length 159

I have this error on the master: panic: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 192.168.1.98:40001: connect: connection refused"

Or should we just run ./release_autorun.sh on all workers?

I haven’t used the guide, but my understanding is that it asks to run a custom script on all machines.

I have this error on 3 nodes (nodes 2, 3 and 4):
panic: start: listen tcp4 192.168.1.56:40001: bind: cannot assign requested address

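This usually means that the IP doesn’t belong to the node, so check that 192.168.1.56 is really assigned to the machine where this error appears. If the port were already taken by another process, the error would normally read “address already in use” instead; you can check with ps aux | grep node whether a node process is already running.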
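
To tell these two causes apart on the affected machine, a small probe can try the same bind and report the reason for a failure. This is only a sketch: the function name probe_bind and the returned messages are made up for illustration, and it relies on the standard errno codes EADDRNOTAVAIL and EADDRINUSE.

```python
import errno
import socket


def probe_bind(ip, port):
    """Try to bind a TCP socket to ip:port and report why it fails, if it does."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((ip, port))
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                # The IP is not assigned to any interface on this machine.
                return "ip not assigned to this machine"
            if exc.errno == errno.EADDRINUSE:
                # Another process already holds this ip:port.
                return "port already in use"
            return f"other error: {exc}"
    return "ok"
```

Running probe_bind("192.168.1.56", 40001) on each of the failing nodes should show which case applies there.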

I have this error on the master: panic: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 192.168.1.98:40001: connect: connection refused"

This means that the node isn’t running on 192.168.1.98 or that it’s unreachable from the master process.
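
A quick way to check this from the master machine is to attempt a plain TCP connection to the worker’s address. The helper below is a hypothetical sketch (the name can_connect and the 3-second default timeout are my own choices); it only tests reachability of the port, not whether the process behind it is a healthy worker.

```python
import socket


def can_connect(ip, port, timeout=3.0):
    """Return True if a TCP connection to ip:port succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False
```

Calling can_connect("192.168.1.98", 40001) on the master should return True once the worker on that machine is listening.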

I have this error on the last node: panic: runtime error: index out of range [159] with length 159

I got the info from the guide author that the guide had an update to fix the issues listed below. This might be related to that.

  1. the data…Addrs generator was producing one entry too few in the configuration
  2. the run params (outlined at the end of the tutorial) were incorrect: the startingCore should be the intended core minus 1 (it should be the position of the entry of the first data worker of the cluster in the data…Addrs config).
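
The arithmetic behind point 2 can be sketched as follows. This assumes, based on the guide excerpt quoted earlier in the thread, that the master reserves one core for task scheduling and every other machine contributes one worker entry per core; the function name launch_params is hypothetical and this is not taken from para.sh itself.

```python
def launch_params(cores_per_machine):
    """Return ([(starting_index, cores), ...], total_entries), master first.

    Assumption: the first machine (master) reserves one core for scheduling,
    so it contributes cores - 1 worker entries; the others contribute cores.
    """
    params = []
    start = 0
    for position, cores in enumerate(cores_per_machine):
        params.append((start, cores))
        workers = cores - 1 if position == 0 else cores
        start += workers
    # After the loop, start equals the total number of worker entries.
    return params, start


params, total_entries = launch_params([32] * 5)
for start, cores in params:
    print(f"bash para.sh linux amd64 {start} {cores} 1.4.20.1")
print("entries expected in the address list:", total_entries)
```

For five 32-core machines this gives starting values 0, 31, 63, 95 and 127, and 159 entries in total (valid positions 0 to 158). Under the same assumption, starting the last machine at 128 would make it reach position 159, which would be consistent with the "index out of range [159] with length 159" panic.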

Hope this helps!

Thank you, my errors are fixed with the updated guide (0 32, 31 32, 63 32, 95 32… etc.).
Thanks a lot, Abo and Beepboop!
