Master node can't connect to worker after upgrading to 1.4.20.1

Tinou · June 30, 2024, 1:33am

Hello,
I have a problem with my master node right after the upgrade. Could you help me, please? I couldn’t find a solution. My prerequisites are OK. Thanks.

abc · June 30, 2024, 9:29am

Hi,

Welcome to the forum! I edited the title a bit to make it easier to find for others. Hope you don’t mind!

Regarding your question, it looks like your master node can’t connect to one of the workers (the one running on 192.168.1.99). Has that worker been upgraded?

Tinou · June 30, 2024, 3:27pm

All workers are upgraded, but I noticed a small difference between the config.yml files.
One of them has these lines:
network: 0
in the p2p section, and
dataWorkerBaseListenMultiaddr: ""
dataWorkerBaseListenPort: 0
dataWorkerMemoryLimit: 0
in the engine section.
What are they?
The other ones don’t have them.

Tinou · June 30, 2024, 3:34pm

I need some information, please.

I’m referring to this website: Kingcaster | Quil Parallel Nodes Guide
Kingcaster said: “Above is the setup for an 3 x (4 cores) - 12 cores total, 1 core is reserved for task scheduling - machine clusters, note 40001 - 40004, with more cores increment the port 40005… 40006… and so on”

But the configurator is not updated: addrs - JSFiddle - Code Playground
So I launched my nodes like this:
bash para.sh linux amd64 0 31 1.4.20.1
bash para.sh linux amd64 31 31 1.4.20.1
bash para.sh linux amd64 62 31 1.4.20.1
bash para.sh linux amd64 93 31 1.4.20.1
bash para.sh linux amd64 124 31 1.4.20.1

and like this:
bash para.sh linux amd64 0 32 1.4.20.1
bash para.sh linux amd64 32 32 1.4.20.1
bash para.sh linux amd64 64 32 1.4.20.1
bash para.sh linux amd64 96 32 1.4.20.1
bash para.sh linux amd64 128 32 1.4.20.1
as King said (slaves first, master last).
Before the upgrade I launched with 1 core less, as suggested by the little configurator which generates your peers.

I verified my IPs and core counts but, in the end, I’m confused; I have been stuck on this problem for a day.

Could you confirm which method is correct, please?

abc · June 30, 2024, 3:51pm

I’m not familiar with the worker setup, but this might mean that one of the workers is not set up correctly. I’ll ping the author of the guide to see if they can help.

Tinou · June 30, 2024, 4:37pm

Or should we just run ./release_autorun.sh on all workers?

I have this error on 3 nodes (nodes 2, 3 and 4):
panic: start: listen tcp4 192.168.1.56:40001: bind: cannot assign requested address

I have this error on the last node:
panic: runtime error: index out of range [159] with length 159

I have this error on the master:
panic: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 192.168.1.98:40001: connect: connection refused"

abc · June 30, 2024, 9:45pm

Or should we just run ./release_autorun.sh on all workers?

I haven’t used the guide, but my understanding is that it asks to run a custom script on all machines.

I have this error on 3 nodes (nodes 2, 3 and 4):
panic: start: listen tcp4 192.168.1.56:40001: bind: cannot assign requested address

This error usually means that the IP doesn’t belong to the node, i.e. 192.168.1.56 is not assigned to any network interface on that machine. If the port were already taken by another process, the message would normally be “address already in use” instead; you can still check with ps aux | grep node whether a node process is already running.
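To check whether the listen IP actually belongs to a machine, something like the following sketch could be used. The helper name ip_is_local is made up for illustration, and the commented usage assumes a Linux hostname command that supports -I (print all local addresses); neither comes from the guide.

```shell
# ip_is_local WANTED ADDR...
# Succeeds if WANTED is one of the listed addresses (hypothetical helper).
ip_is_local() {
  wanted=$1
  shift
  for addr in "$@"; do
    [ "$addr" = "$wanted" ] && return 0
  done
  return 1
}

# Example with made-up local addresses: 192.168.1.56 is not among them,
# so a bind to 192.168.1.56:40001 would be expected to fail on this machine.
if ip_is_local 192.168.1.56 192.168.1.99 127.0.0.1; then
  echo "192.168.1.56 is assigned to this machine"
else
  echo "192.168.1.56 is NOT assigned to this machine"
fi

# Real usage on a worker (assumes Linux "hostname -I"):
#   ip_is_local 192.168.1.56 $(hostname -I) || echo "fix the IP in the config"
```

If the check fails on a worker, the address configured for that worker most likely belongs to a different machine.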

I have this error on the master:
panic: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 192.168.1.98:40001: connect: connection refused"

This means that nothing is accepting connections on port 40001 at 192.168.1.98: most likely the worker process isn’t running there (for example because it crashed on startup), or a firewall is rejecting the connection from the master.
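A rough way to follow up on this from the master is to pull the address out of the panic line and probe it. In the sketch below, dial_target and probe are hypothetical helper names, and probe assumes a netcat (nc) that supports -z (connect without sending data) and -w (timeout in seconds); the guide does not prescribe any of this.

```shell
# Extract "host:port" from a dial error line such as the panic quoted above.
dial_target() {
  printf '%s\n' "$1" | sed -n 's/.*dial tcp \([0-9.]*:[0-9]*\).*/\1/p'
}

# Try to open a TCP connection (assumes nc with -z and -w support).
probe() {
  nc -z -w 3 "$1" "$2"
}

line='panic: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 192.168.1.98:40001: connect: connection refused"'

target=$(dial_target "$line")
host=${target%:*}    # everything before the last colon
port=${target##*:}   # everything after the last colon
echo "worker to check: host=$host port=$port"

# On the master, run for example:
#   probe "$host" "$port" && echo "port open" || echo "closed or filtered"
```

If the probe fails while the worker machine itself answers on the network, the worker process on that port is the first thing to look at.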

I have this error on the last node:
panic: runtime error: index out of range [159] with length 159

I got word from the guide author that the guide was updated to fix the issues listed below. This might be related to that.

The data…Addrs list was being generated with 1 entry too few in the configuration.

The run params (outlined at the end of the tutorial) were incorrect: startingCore should be the intended core minus 1 (it should be the position of the entry of the first data worker of the cluster in the data…Addrs config).
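As a worked example of the startingCore rule, here is a small sketch. It assumes that every machine has the same number of cores and that the machine started with startingCore 0 contributes one data worker fewer (one core reserved for task scheduling, as in the guide quote). The function name starting_cores is made up for illustration; the para.sh argument order is copied from the commands earlier in this thread.

```shell
# starting_cores CORES_PER_MACHINE MACHINES
# Prints the startingCore of each machine, space separated.
# Assumption: the first machine fills cores-1 entries, the others fill cores.
starting_cores() {
  cores=$1
  machines=$2
  start=0
  i=0
  out=""
  while [ "$i" -lt "$machines" ]; do
    out="$out${out:+ }$start"
    if [ "$i" -eq 0 ]; then
      start=$((start + cores - 1))
    else
      start=$((start + cores))
    fi
    i=$((i + 1))
  done
  echo "$out"
}

# Print (not run) the launch command for 5 machines with 32 cores each.
for s in $(starting_cores 32 5); do
  echo "bash para.sh linux amd64 $s 32 1.4.20.1"
done
```

For 5 machines with 32 cores this gives 0, 31, 63, 95 and 127, whose first four values match the sequence reported at the end of this thread. Under the same assumption the address list holds 31 + 4 × 32 = 159 entries, so starting the last machine at 128 with 32 workers would reach index 159, which would be consistent with the "index out of range [159] with length 159" panic.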

Hope this helps!

Tinou · June 30, 2024, 11:43pm

Thank you, my errors are fixed with the updated guide (0 32, 31 32, 63 32, 95 32…etc).
Thanks a lot, Abo and Beepboop!
