Object size mismatch error

Hello, I encountered this error message:

panic: L6: 000280.sst: object size mismatch (.config/store/000280.sst): 3899392 (disk) !=  5344592 (MANIFEST)

The file 000280.sst shows a modification date of January 1, 1970, and the program cannot start properly:

# ll .config/store/000280.sst 
-rw-r--r-- 1 root root 3899392 Jan  1  1970 .config/store/000280.sst

Is there a way to ignore this error and let the program start normally?
The full error output is as follows:

{"level":"info","ts":1720867277.8080835,"caller":"rpc/data_worker_ipc_server.go:92","msg":"data worker listening","address":"/ip4/127.0.0.1/tcp/40000"}
panic: L6: 000280: object size mismatch (.config/store/000280.sst): 3899392 (disk) != 5344592 (MANIFEST)

goroutine 1 [running]:
source.quilibrium.com/quilibrium/monorepo/node/store.NewPebbleDB(0xc00032a0c0)
        /opt/ceremonyclient/node/store/pebble.go:18 +0x8d
source.quilibrium.com/quilibrium/monorepo/node/app.NewNode(0xc00061a550, 0xc000668000)
        /opt/ceremonyclient/node/app/wire_gen.go:72 +0x65
main.main()
        /opt/ceremonyclient/node/main.go:426 +0xae5
{"level":"error","ts":1720867289.786702,"caller":"rpc/data_worker_ipc_server.go:125","msg":"parent process not found,

Hello, it looks like a store file got corrupted somehow. Is it possible that you restored from a backup and something went wrong?

Yes, it is a file recovered from a backup. I don’t know what caused the damage. Is there any way to repair the file?

Not sure tbh, I’ve never done this, but maybe you could try to update the MANIFEST file to match the expected size for the sst file and see what happens?

Why this happens:

I believe this is caused by one of your MANIFEST files either being outdated or being included in the backup when it should have been removed.

Outdated = your backup didn’t need to delete a MANIFEST file, but it also failed to update it to reflect the SST file’s size change.

Including an unneeded MANIFEST = as your database grows, some .sst, .log, MANIFEST and OPTIONS files are removed while others are rewritten to include their data (I call this consolidation here; LSM-tree databases usually call it compaction).

If your backup doesn’t remove these unneeded files or track these file changes, then the database software attempts to load them back into the node’s database on initialization.

Order of operations

It goes something like this:

  • The database software consolidates an older SST into a newer one
  • It updates the MANIFEST/OPTIONS file(s) to reflect this change
  • Occasionally, MANIFEST/OPTIONS files will also be consolidated into newer ones, and old ones removed
  • Your backup saw it already had the MANIFEST file and either didn’t sync it or didn’t delete it from your backup after it was consolidated
  • When you restored and ran your node, it read the MANIFEST files and attempted to restore that state. However, because the file sizes changed during consolidation, any outdated or previously removed MANIFEST file references sizes that no longer match, and startup fails, as it should (either the MANIFEST shouldn’t be there or the files it references no longer have the expected content).
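To make the failure mode concrete, here is a minimal sketch of that kind of size check. It is not Pebble's actual code: `find_size_mismatches` and the `expected_sizes` dictionary are hypothetical stand-ins for whatever the MANIFEST really records, and the real MANIFEST is a binary file that this sketch does not parse.

```python
import os


def find_size_mismatches(store_dir, expected_sizes):
    """Return (name, size_on_disk, expected_size) for every mismatch.

    expected_sizes is a hypothetical stand-in for what a MANIFEST records:
    a mapping of file name -> size in bytes. A file that is missing from
    store_dir is reported with size_on_disk set to None.
    """
    mismatches = []
    for name, expected in sorted(expected_sizes.items()):
        path = os.path.join(store_dir, name)
        on_disk = os.path.getsize(path) if os.path.isfile(path) else None
        if on_disk != expected:
            mismatches.append((name, on_disk, expected))
    return mismatches
```

With the file from the original post, `find_size_mismatches(".config/store", {"000280.sst": 5344592})` would report 3899392 bytes on disk against 5344592 expected, which is the same pair of numbers as in the panic message.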

Other considerations

  • It could also be that your SST files were not updated in the backups.
  • Your backup process may be changing the files, and your restoration does not take this into account.

Solutions:

  1. If you still have your previously working node’s store available, compare it with your backup, then find a way to create a backup that actually works while you still have the working database files. (I recommend doing a few such comparisons before completely decommissioning a node, just to verify your backups work as intended; use ls -al or equivalent to check that the file sizes match.)
  2. You can attempt to remove older MANIFEST/OPTIONS files to see if that works (always work on a copy, so you can start over).
  3. Try to figure out what your backup command is doing and see if there is something you aren’t accounting for when restoring your files from it.
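For the comparison in point 1, something like the following could replace eyeballing ls -al output. It is only a sketch: the function names are made up for illustration, it assumes the store is a flat directory and that the node is stopped while you compare (a running node keeps changing its files), and matching sizes do not prove matching contents.

```python
import os


def file_sizes(directory):
    """Map each regular file name in directory to its size in bytes."""
    sizes = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
    return sizes


def compare_stores(live_dir, backup_dir):
    """Report files that are missing, extra, or differently sized in the backup."""
    live = file_sizes(live_dir)
    backup = file_sizes(backup_dir)
    common = set(live) & set(backup)
    return {
        "missing_from_backup": sorted(set(live) - set(backup)),
        "extra_in_backup": sorted(set(backup) - set(live)),
        "size_differs": sorted(n for n in common if live[n] != backup[n]),
    }
```

Anything listed under `extra_in_backup` or `size_differs` is a candidate for the stale-file problem described above. Comparing checksums instead of sizes would be a stronger check if you have the time to read every file.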

In my case, my backups were not syncing files after they were first created in the backup, meaning each file was only backed up as it was when the first backup happened and any later changes were ignored. Essentially, all store updates to existing files were lost in the backup. After realizing this by evaluating my backup command, I determined that trying point #2 would be a waste of time, and #1 was not an option because most of my nodes with faulty backups had already been decommissioned.
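For reference, a backup that avoids both problems has to re-copy files that changed and delete files that no longer exist in the source. A rough sketch of that idea is below. This is not the command I was using, `mirror_store` is a hypothetical name, and it assumes a flat store directory and a stopped node so that files are not changing mid-copy.

```python
import os
import shutil


def mirror_store(src_dir, dst_dir):
    """Make dst_dir a flat mirror of src_dir.

    Copies files that are new or whose size or modification time changed,
    and deletes files in dst_dir that no longer exist in src_dir.
    Returns (copied_names, deleted_names).
    """
    os.makedirs(dst_dir, exist_ok=True)
    src_names = {
        name for name in os.listdir(src_dir)
        if os.path.isfile(os.path.join(src_dir, name))
    }
    copied, deleted = [], []
    for name in sorted(src_names):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        src_stat = os.stat(src)
        if os.path.isfile(dst):
            dst_stat = os.stat(dst)
            unchanged = (
                dst_stat.st_size == src_stat.st_size
                and int(dst_stat.st_mtime) == int(src_stat.st_mtime)
            )
            if unchanged:
                continue
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(name)
    for name in sorted(os.listdir(dst_dir)):
        path = os.path.join(dst_dir, name)
        if os.path.isfile(path) and name not in src_names:
            os.remove(path)  # stale file, e.g. an old MANIFEST or SST
            deleted.append(name)
    return copied, deleted
```

Because the deletion step is destructive, it is worth pointing this at a scratch directory first and checking the returned lists before trusting it with a real backup.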


Okay, thank you both
