Why do I get the same rewards with 64 and 128 cores?

Hi there,

Last week, a few people complained about 128-core servers doing the same job as 64-core servers.
I waited a week after the fix to see whether anything would change, but here we are on 1.4.20 and the results are still pretty similar.
This is quite frustrating because these monster 128c machines are super powerful and cost 4x the price of the 64-core ones, yet they have earned exactly the same rewards for the past 2 weeks.

Before 1.4.19, these 128-core servers were performing incredibly well compared to the 64-core servers, and that made sense: back then they earned triple the Quils. Now they earn exactly the same.

The bug fix has not changed anything; rewards for these monster servers have stayed the same since 1.4.19.
I clearly invested on the strength of “more cores, more rewards”, and now it’s a bitter pill to swallow. @cassie, is there any chance there is still a bug to fix here?
I now have 5 expensive, powerful 128-core servers that just haven’t performed well since 1.4.19.

This investment is costing a lot, and if you tell me there is nothing we can do about it since 1.4.19, I will just turn them off and cry alone. But hopefully there is something going on in the code that we can fix?

I know I’m not the only one (alban, barktem, I believe Ozgur, and a few others still have the issue).

There are many details missing about these servers that are needed to fully answer the question. While I have said “more cores, more rewards”, that has always been contingent on the other factor I’ve mentioned, which is that “not all processors are made the same”: branch prediction methodology, cache size, tiers of cache, SMT vs non-threaded cores, and even the OS and its configuration all affect a given system’s performance. Typically, though, the easiest way to assess similar CPUs comes down to the following:

  • Least likely: does the processor with fewer cores have the same scale of threading (e.g. a 12-core with SMT has 24 threads, a 24-core with SMT4 has 96 threads)? Hyperthreading performs poorly for data workers because they do not share any resources for compute: each thread works on its own bit of data.
  • Less likely: are there other processes running on the machine? Some data centers preload monitoring software on their servers, and if you’re splitting workloads between different providers, some may be inherently slower, not only because of the way they provision their hardware (set clock speed, burst enabled or disabled, voltage, etc.), but also because their aggressive monitoring tools are taking precious CPU cycles.
  • Most likely: is the clock speed of the higher-core-count CPU lower than that of its lower-core-count counterpart? This is common because in some workloads that tradeoff is preferable. It isn’t preferable for Q, which is why I have brought up clock speed in previous discussions.
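To check the threading and clock-speed questions above on a Linux box, one rough approach is to summarize `/proc/cpuinfo` on each machine and compare. This is a minimal sketch, not an official Q tool: it assumes a Linux host, and `summarize_cpuinfo` is a hypothetical helper name. Note that the `cpu MHz` field is a snapshot of the *current* frequency, so it moves with boost and load; compare machines under similar load.

```python
import os

def summarize_cpuinfo(text: str) -> dict:
    """Summarize a Linux /proc/cpuinfo dump (hypothetical helper).

    Counts logical CPUs (hardware threads), distinct physical cores
    (unique (physical id, core id) pairs) and the average 'cpu MHz'
    reading, which is the *current* frequency, not the rated maximum.
    """
    logical = 0
    cores = set()
    mhz = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue  # skip blocks that do not describe a CPU
        logical += 1
        # If core topology is not exposed (common on some VMs), fall back to
        # the processor number, so every thread counts as its own core.
        cores.add((fields.get("physical id", "0"),
                   fields.get("core id", fields["processor"])))
        if "cpu MHz" in fields:
            mhz.append(float(fields["cpu MHz"]))
    physical = len(cores)
    return {
        "logical_cpus": logical,
        "physical_cores": physical,
        "threads_per_core": logical // physical if physical else 0,
        "avg_mhz": sum(mhz) / len(mhz) if mhz else None,
    }

if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        print(summarize_cpuinfo(f.read()))
```

Running it on a 64c and a 128c box side by side gives a quick read on `threads_per_core` (SMT on or off) and the current clock; for rated base and boost clocks, the provider's spec sheet or `lscpu` is the better source.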

Diving into advanced tuning, there are a few other things you can try:

  • Core pinning: taskset is a tool that lets you specify which cores a process may use, preventing the OS scheduler from moving that process to other cores. Core switching isn’t always an issue, and ironically a naive scheduler may actually do better than something more clever, but it depends on far too many factors to list.
  • Clock boost: if your processor supports clock boosting and the provider has it disabled, check with the provider whether they will allow it. Be forewarned: if they disabled it manually, it likely won’t be allowed.
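For anyone who wants to try core pinning from a script rather than calling taskset by hand, Python’s standard library exposes the same Linux affinity call through `os.sched_setaffinity`. This is a hypothetical sketch: it assumes you already know the PIDs of your worker processes (it does not discover Q’s workers for you), and `plan_affinity` and `pin` are illustrative names, not part of any Q tooling.

```python
import os

def plan_affinity(num_workers: int, cpus: list[int]) -> dict[int, set[int]]:
    """Map each worker index to a single CPU, round-robin over `cpus`.

    With num_workers <= len(cpus), every worker gets its own CPU.
    """
    if not cpus:
        raise ValueError("no CPUs to pin to")
    return {w: {cpus[w % len(cpus)]} for w in range(num_workers)}

def pin(pid: int, cpu_set: set[int]) -> None:
    """Restrict `pid` to `cpu_set` (Linux only).

    Same effect as `taskset -cp <cpu-list> <pid>`; pid 0 means the caller.
    """
    os.sched_setaffinity(pid, cpu_set)

if __name__ == "__main__" and hasattr(os, "sched_setaffinity"):
    allowed = sorted(os.sched_getaffinity(0))
    plan = plan_affinity(len(allowed), allowed)
    # Demo only: pin this process to worker 0's CPU. For real workers you
    # would call pin(worker_pid, plan[i]) with each worker's PID (hypothetical).
    pin(0, plan[0])
    print("now running on CPUs:", os.sched_getaffinity(0))
```

One CPU per worker, round-robin, is only one possible layout; as noted above, leaving the scheduler alone can sometimes do better, so measure before and after.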

If the 128-core servers simply aren’t coping with the workload as well as the 64-core ones, and none of the above factors apply (i.e. same clock speed, same SMT basis, same L1/L2/L3 cache allocation per core, same provider, same memory bandwidth/channel usage, same everything), that would be surprising to see, but the answer would indeed be to find better-performing hardware.


Thanks for the response.

Clock boosting is already turned on for these servers.
And yes: same SMT basis, same L1/L2/L3 cache allocation per core, same provider, same memory bandwidth/channel usage. But the 64c machines run at 2.8 GHz while the 128c ones run at 2.0 GHz.
So I guess I have my answer.
I invested in these when I saw the crazy difference in 1.4.18, but I clearly would not have invested in them if I had known… :smiling_face_with_tear:

If anything changes, please update this topic. I was just about to pull the trigger on a dual 7702 or 7742, but after reading this I’m having to rethink which path I want to take, given the outcome some are getting.

@Orlando I have nothing more to say.
Clock boosting is already on.
And Q is already using all the cores on these monster machines.

So my conclusion is that I invested a lot in these monster servers for nothing. I could have comforted myself with the really high rewards during 1.4.18, which would have paid off these servers, but those were revised too, so these servers have just lost me a lot of money.


I don’t think that you are the only one to do this.

When the posted rewards did not reflect the actual amounts, because of growing pains from so many people flooding the network and causing reward-calculation issues (the initial calculation was incomplete), a lot of people went out and bought more machines (myself included).

I think this is due to a number of reasons, but ultimately, when people make decisions on limited, incomplete data, the outcomes can vary. Annie Duke puts it like this in her book Thinking in Bets (paraphrasing): sometimes, without knowing all the facts or details, a person can make good decisions that produce bad outcomes.

Unfortunately for us, our calculations were wrong because we didn’t know how to account for the network’s infancy in our models, or how each machine would actually perform in real time.

We are all figuring these things out as we go, since nobody really knows how things will play out when the theory is implemented. Some of us hit a home run or two by running nodes early, and then struck out on hardware and/or 1.4.18 rewards.

That said, those interested in the network must learn from this and move on from what is just the start of the first of, hopefully, many games.
