Will the bridge reopen at the same time?
This is what you’re asking about. Here’s the update from the status page by @cassie:
An attack was discovered that would have allowed a malicious user to create arbitrarily large transcript proofs, rapidly inflating storage state to the point where token issuance would be halted for the current generation. There was no risk of invalid fund creation. We are working towards an alternative integration for the MPC-in-the-head prover that will reduce proofs to a constant size, which will resolve the issue and enable more powerful use cases. An updated ETA will be announced in a few days once we feel confident in this adjusted approach.
To recap: a vulnerability was found at the last minute that would’ve enabled a very sophisticated attacker to grief the network by preventing new token issuance for proof of meaningful work. Fortunately, Cassie caught the vulnerability in time and is working on a fix now.
As exhibited by this issue, this is a very complex launch, and every precaution must be taken, which leads to delays. We all hope the network can launch soon, but I’m personally glad that we prioritize a safe launch.
Going to add more color on the work in progress: what’s taking time in deciding the updated ETA, where next steps are heading, and why.
TL;DR: The offline prover system was vulnerable to a specialized spam attack that could dramatically bloat storage space, which could, at a minimum, knock nodes offline from their prover rings and, at a maximum, increase stored data to the point where generational issuance for the network is effectively stopped. Several solutions are being considered, each with its own tradeoffs and impacts.
Problem: MPC-in-the-head is a really powerful form of proof, because it shares the MPC technique we use for performing compute on the network, which reduces overall dependencies and is thus a net benefit for security (fewer moving pieces = fewer attack vectors). MPC-in-the-head proofs, however, tend to be quite large. We considered the tradeoff of large proofs acceptable given that our global state is maintained with single slot finality, so we can drop proofs that are no longer needed (e.g. for A → A′ → A′′ → A′′′, we only need the latest proof, for A′′′).

Because proof sizes can run away relative to the data they cover, proofs are a factor in the overall network state size measurement, and therefore impact the issuance rate of the network; due to their size, they could not be decoupled from that measurement. For expected initial behaviors on the network (mostly QUIL movement), the impact of this inclusion in the calculation is negligible. An attacker, on the other hand, could construct a clever attack that drastically bloats proof size relative to the data impacted, such that the ratio of proof to data altered is astronomical. In traditional computer attack terms, this is the proof-size equivalent of a zip bomb.

Over the past few days, I have been cycling through and evaluating options (many thanks to the cryptographers who have lent their advice and expertise on alternative routes). I’d like to take a moment to present what those options are, what their advantages and disadvantages are, and what the most pragmatic route to wrapping up 2.0 will be.
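To make the pruning idea concrete, here is a minimal sketch (all names hypothetical, not the actual node code) of how single slot finality lets the network keep only the latest proof per object, so the intermediate proofs in a chain like A → A′ → A′′ → A′′′ drop out of stored state:

```python
# Hypothetical sketch: under single slot finality, only the latest
# proof per object contributes to stored state; superseded transition
# proofs can be discarded.

class ProofStore:
    def __init__(self):
        self.latest = {}  # object id -> (version, proof bytes)

    def record(self, obj_id, version, proof):
        # Keep only the newest proof for each object; older
        # transition proofs in the chain are pruned.
        cur = self.latest.get(obj_id)
        if cur is None or version > cur[0]:
            self.latest[obj_id] = (version, proof)

    def stored_bytes(self):
        # Proof size feeds into the network's state size measurement,
        # which in turn affects the issuance rate.
        return sum(len(p) for _, p in self.latest.values())

store = ProofStore()
for v in range(4):  # A, A', A'', A'''
    store.record("A", v, b"x" * 1000)
print(store.stored_bytes())  # 1000: only the latest proof counts
```

The attack described above exploits the other side of this accounting: a proof whose size is astronomical relative to the data it touches still counts fully toward state size.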
Option 1: MPC for online, Risc0 for proofs
Advantages:
- Battle tested prover system
- Switches target language for development from bespoke golang subset to broader rust subset
- Brings us closer, faster to supporting all EVM targets for trustless bridging
Disadvantages:
- Harder to develop for
- Most substantial change, with the greatest impact to the timeline
- Prover times are awful on commodity hardware; this would centralize provers immensely and run afoul of our intended outcome of easy and fair participation
- Instruction sets are not aligned; it would require work to adapt the MPC compiler to RV32IM
Option 2: MPC for online, SP1 for proofs
Advantages:
- Roughly the same advantages as Risc0
- Faster proof times than Risc0 (still slow)
Disadvantages:
- Roughly the same disadvantages as Risc0
- No support for continuations or parallelization
Option 3: Keep full prover system as is, find solution for reducing size of proofs through a recursive compression technique
Advantages:
- Obscenely faster than the above two.
- Fits in nicely with the tools already at our disposal
- I have found zero literature on this, solving this would be greenfield, but not impossible
Disadvantages:
- We have taken conservative stances to avoid completely novel cryptography; this is why, for example, PCAS signing is shelved until peer review
Option 4: Impose a tx fee system from day one rather than making it optional, with proof size counting towards fees, and institute a per-tx execution cap.
Advantages:
- Immediately stomps on the attack by imposing an upper bound
- Expensive compute becomes expensive outright (the data greedy model remains default, but miners get a bonus of tx fees on top)
- Is the fastest route to getting 2.0 out, and we can work towards Option 3 as a milestone to lower fees.
Disadvantages:
- Has impacts on execution (streaming MPC would necessarily be dead in the water for now, full circuit garbling must be upfront)
- Fee markets would necessarily become a part of protocol instead of out-of-band, but are constrained to shards
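A rough sketch of what Option 4's accounting could look like (all constants and names here are illustrative assumptions, not the actual protocol): proof size counts toward a mandatory fee, and a hard per-tx execution cap bounds the attack outright.

```python
# Illustrative sketch only: proof bytes count toward a mandatory fee,
# and a per-tx execution cap gives the attack a hard upper bound.
# All constants are hypothetical, not protocol values.

EXECUTION_CAP = 10_000       # hypothetical per-tx execution units
FEE_PER_PROOF_BYTE = 2       # hypothetical rate
BASE_FEE = 100               # hypothetical flat component

def admit(tx):
    # Reject anything over the execution cap outright: a proof-size
    # "zip bomb" can no longer be submitted at all.
    if tx["execution_units"] > EXECUTION_CAP:
        return None
    # Expensive compute becomes expensive outright: the fee scales
    # with proof size, paid on top of the data-greedy reward model.
    return BASE_FEE + FEE_PER_PROOF_BYTE * tx["proof_size"]

print(admit({"execution_units": 500, "proof_size": 1_000}))   # 2100
print(admit({"execution_units": 50_000, "proof_size": 10}))   # None (rejected)
```

Under this kind of scheme, bloated proofs stop being free griefing and become a direct cost to the attacker, which is why it "stomps on" the attack even before Option 3 lands.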
Concluding Remarks
At this time, options 1 and 2 have been evaluated, and they simply are not fast enough for general purpose. Option 3 remains an open question, and is left as a milestone objective following Option 4. Of these choices, Option 4 appears to be the fastest path forward, and will require the addition of three protocol changes, in order of time cost from greatest to least:
- Dynamic fee markets
- Fee market competition in prover ring precedence (to avoid squatted/monopolized rings)
- Incorporating domain separation of the proof outputs of a given prover (previously constrained only to domain separation of the choice of prover) to later support compressed outputs from the prover
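Domain separation of proof outputs can be pictured as prefixing each digest with a distinct context tag, so outputs produced in different contexts can never be confused for one another. This is a generic sketch of the standard construction (the tag names are hypothetical, not Quilibrium's actual domains):

```python
import hashlib

def tagged_hash(domain: str, payload: bytes) -> bytes:
    # Standard domain-separation construction: commit to a
    # length-prefixed context tag before the payload, so identical
    # payloads yield distinct digests in distinct domains.
    h = hashlib.sha256()
    h.update(len(domain).to_bytes(2, "big"))  # length-prefix the tag
    h.update(domain.encode())
    h.update(payload)
    return h.digest()

# Hypothetical domains: one for the choice of prover, one for the
# prover's outputs (the new separation described above).
a = tagged_hash("prover-selection", b"proof")
b = tagged_hash("prover-output", b"proof")
print(a != b)  # True: same payload, different domains, different digests
```

Separating the output domain up front is what leaves room to later swap in compressed prover outputs (Option 3) without ambiguity about which kind of output a digest commits to.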
I am scoping the work required to complete these features, and will issue an updated ETA for 2.0 rollout following this. I have received a tremendous amount of support, understanding, patience, and grace from most of the members of the community in this process, and I am eternally grateful to you all. For folks who are focused strictly on “wen”, I understand the frustration, and encourage you to invest the time constructively by investigating solutions for Option 3 so we can accelerate through future milestones more rapidly. Crypto is a deep, dark forest. As evidenced in the past two months alone, millions of dollars have been hacked from projects that had multiple audits from credible auditors; rushing carelessly to a finish line for the sake of “wen” is a surefire way to end up as another tombstone in the history of projects.
I am going to review this in deeper detail on the scheduled stream, to be simulcast on Twitch and Unlonely at 2024-09-09T01:30:00Z, to walk through the problem in depth, plus a deeper dive on developer information and integration.
If the delay continues, will there be too many additional issuances? Many people are approaching the 700k limit and will soon reach a production of 1 billion
Ran the latest stats before this evening’s dev call: we’re still within expected issuance. There also seems to be a good number of stragglers who haven’t upgraded their nodes (a weird number on .18 or earlier, no idea why, so despite appearing in the manifests they’re not even earning), and quite a few (roughly 13,000 of the ~29,000 reporting nodes) at or below 24 cores (half of those at 16 cores or less). The 48-core average is skewed by a few hundred nodes with absolute beast clusters.
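To illustrate the skew point with purely made-up numbers (not the real node data): a small number of huge clusters can drag the mean core count well above what a typical node actually runs.

```python
from statistics import mean, median

# Purely illustrative fleet (numbers invented for the example): most
# nodes are modest, while a few hundred huge clusters pull up the mean.
cores = [16] * 6500 + [24] * 6500 + [32] * 15700 + [2048] * 300

print(round(mean(cores)))   # 47: the average is pulled toward the big clusters
print(median(cores))        # 32.0: the typical node is far smaller
```

This is why a "48 core average" says little about what the median participant is running.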
Prover Rings and Seniority
The seniority topic was highlighted as of greatest interest, most likely due to the number of folks who have clustered their nodes to maximize rewards from the .19/.20/.21 changes (leaving some peer ids with a loss of seniority due to inactivity).
Seniority extends from the start of the network’s Dawn phase (late last year), evaluated by participation time in the network to today (the .19/.20/.21 series measured more precisely by iterations). In 2.0, this is realtime, based on the proof intervals within the prover rings themselves.
Quilibrium is designed around logical shards in a three-level tiering structure (global: 256 shards; core: 65,536 shards; data: 16,777,216 shards of 64-byte chunks), with coverage drawn as a combinatorial over these tiers, e.g. C(256, 3) * C(65536, 24) * 1GB, that will inevitably become wholly dedicated data shards as the network grows. It is expected that initially, in the default “data greedy” mode, nodes will opt for the largest shard coverage and thus global, and as limits are reached for the peer allocations under their coverage, they will shrink their shard commitment farther down the tree. Economically driven client forks will likely do this sooner, but the mainline node repo will likely not have this behavior implemented, or at least not as a default.
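For a sense of scale, the binomial terms in that coverage expression can be evaluated directly (this just computes the combination counts referenced above; the exact capacity formula is the post's):

```python
from math import comb

# Tier sizes from the post: 256 global shards, 65,536 core shards,
# 16,777,216 data shards of 64-byte chunks.
global_choices = comb(256, 3)    # C(256, 3): ways to pick 3 global shards
core_choices = comb(65_536, 24)  # C(65536, 24): ways to pick 24 core shards

print(global_choices)            # 2763520
print(core_choices > 10**80)     # True: astronomically many combinations
```

Even the global tier alone offers millions of distinct coverage choices, which is why coverage can keep shrinking down the tree for a long time as the network grows.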
Sequencing the prover inclusion
For a given ring, at their given tier (e.g. one of 256 of the highest level - global), a prover emits a message to declare their intent to join a prover ring. This message is sequenced by the next sequencer’s frame, and they may begin participating in proving the subsequent frame when their elected slot is requested.
Preventing malicious/unfair sequencing
RPM is not only crucial for ensuring that all messages to include in a frame are indeed included (otherwise the misbehaving mixer is identified and kicked), but also ensures frontrunning is not possible where choice is ambiguous. Say, for example, all provers wishing to declare membership in a ring are perfectly equal in seniority. Without a mixnet, the selected producer of the sequenced frame could maliciously order messages (in Ethereum parlance, reorder transactions to “frontrun”) to choose desired outcomes of who is included. With the mixnet, this is impossible, as the ordering is assuredly randomized, with deference in order only to seniority. (For those who have been following development the longest, this was one of the biggest reasons why 1.5 needed to be skipped, as RPM was not a part of that release.)
Seniority and Inactivity
Inactivity has two different contexts. Example: as of now, no prover rings exist, as they are a 2.0 feature. Measurement of seniority is based effectively on historic reward data (and iterations from .19/.20/.21). The deeper the seniority, the less impact inactivity has on prover inclusion – inactivity basically affects the “age”, in that it is no longer increasing in seniority, but is otherwise non-impactful.
Inactivity in 2.0 when a prover is explicitly in a prover ring, however, carries a penalty, in which the missed intervals degrade the seniority value of the prover by the square of the missed intervals. For example, if a prover misses three intervals, the penalty is nine times as impactful as missing one. Restoring presence on the ring will stop the penalty from continuing to accrue, but the penalty will begin anew if the trend continues. When penalties accrue to a significant degree, the provers in the outer rings of the given shard will be promoted in order of precedence. If a prover wishes to retain seniority with no penalty but needs to shut down for some time (upgrades should have no impact, but server hardware migration may take enough time for it to be worthwhile), they should emit a leave message for the ring to avoid score penalization.
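The quadratic penalty can be sketched in a couple of lines (the function name is illustrative; the exact scoring units are the protocol's, not shown here):

```python
def penalty(missed_intervals: int) -> int:
    # Seniority degradation scales with the square of consecutive
    # missed intervals: missing three intervals is nine times as
    # costly as missing one.
    return missed_intervals ** 2

print(penalty(1), penalty(3))  # 1 9

# Emitting a leave message before planned downtime means no intervals
# count as "missed", so no penalty accrues during the absence.
```

The superlinear curve is the point: brief hiccups are cheap, while sustained absence from a ring rapidly opens the slot to provers waiting in the outer rings.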
How many node software updates can we expect after 2.0?
New blog post by Cassie on prover rings and seniority answering some frequently asked questions about the release process: Quilibrium Blog | One Ring to Prove them All
If you’ve listened to the goals that Cassie talks about in her videos, then the answer to this is, “a lot.”
2.0 is just the starting point.
Quick progress report prior to a fuller detailed post: things are looking solid with the dynamic fee markets work, along with a strategy for preventing squatted/extorted prover ranges. I’ll have a full blog post up in the next 24 hours or so on what to expect re: impacts on fees and prover rewards; I’m going to try to split it into a technical part, an economics part, and as close to an ELI5 part as I can muster. Domain separation is an easy next step, and then it’s a re-run of the affected test grinds. Holding to the same ETA given on the last stream; as stated before, it may be sooner, and if it is, there will be advance notice.
Took a bit longer to condense what was going to be a much longer read, but it is now published:
2 posts were split to a new topic: What are the current daily quil emissions?
From the status page:
2.0 Upgrade: Mainnet
The affected portion of the network design has been resolved (see Quilibrium Blog | Dynamic Fee Markets for details) and is undergoing the previous suite of performance and security tests; the previous announcement’s ETA for 2.0 Mainnet of end of month remains in effect.
Some additional details regarding launch and a few more FAQs have been posted here: Quilibrium Blog | What to Expect When You're Expecting 2.0
New update on the status page by @cassie:
Certain scenarios were identified with the dynamic fee market that can impact performance on the network in ways that could operate as a griefing vector. A root cause is being identified, but release will need to be held back once more. We are setting a final ETA of 10/10, 11:00pm UTC. We greatly appreciate your patience.
To elaborate, the extensive testing that Cassie has been doing after implementing the dynamic fee market turned up a scenario that could lead to a chain halt or kicking out specific provers.
Undoubtedly many people will be disappointed by the delay, but unfortunately this is a showstopper that must be fixed and Cassie is working on it.
My personal take is that Quilibrium is a very ambitious project and it fills me with confidence that such a level of diligence is exerted at every step of the way and no shortcuts are taken that could compromise the soundness of the network.
2 posts were merged into an existing topic: Delay Venting Topic