Postgres database corruption errors

Occasionally, for one reason or another, a Postgres database may report errors like the one below:

invalid page in block 1 of relation base/*/*

This indicates that one of the databases on the server is corrupted, most often because of a hardware failure such as bad sectors on a hard disk. Try to recover the data or repair the disk as soon as the error is detected, because the chances of recovering the data decrease over time. Follow the steps below to recover the databases with as little data loss as possible.
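The path in the error message has the form base/<database OID>/<relation filenode>, so it can be used to find out which database and relation are affected. The following is a minimal sketch, not an official tool: the function names are made up for illustration, the OIDs in the comment are hypothetical, and it assumes the relation lives in the database's default tablespace.

```python
import re

# Matches the relation path PostgreSQL prints for a damaged page, for example
# "invalid page in block 1 of relation base/16384/16397" (hypothetical OIDs).
INVALID_PAGE = re.compile(
    r"invalid page in block (\d+) of relation base/(\d+)/(\d+)"
)


def parse_invalid_page(message):
    """Return (block, database_oid, filenode), or None if the line does not match."""
    match = INVALID_PAGE.search(message)
    if match is None:
        return None
    block, db_oid, filenode = (int(group) for group in match.groups())
    return block, db_oid, filenode


def lookup_queries(db_oid, filenode):
    """Build the SQL that names the affected database and relation."""
    return [
        f"SELECT datname FROM pg_database WHERE oid = {db_oid};",
        # Tablespace 0 means "the default tablespace of the current database".
        f"SELECT pg_filenode_relation(0, {filenode});",
    ]
```

The first query can be run from any database. The second has to be run while connected to the database that the first query names.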

  • Log in to the PostgreSQL CLI (psql) as a superuser, since only a superuser can change zero_damaged_pages.
  • Run the commands below, one after the other:
    • SET zero_damaged_pages=on;
    • VACUUM FULL;
    • VACUUM FREEZE;
    • REINDEX database databasename;
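The steps above can also be scripted. The sketch below only builds the psql command line; it does not run anything. It assumes psql is on the PATH, that the connection is made as a superuser, and that the psql version accepts repeated -c options (PostgreSQL 9.6 or later). The helper names are illustrative.

```python
def quote_ident(name):
    """Quote a SQL identifier: wrap it in double quotes and double any inner quotes."""
    return '"' + name.replace('"', '""') + '"'


def recovery_statements(dbname):
    """Return the four recovery statements from the list above, in order."""
    return [
        "SET zero_damaged_pages = on;",
        "VACUUM FULL;",
        "VACUUM FREEZE;",
        f"REINDEX DATABASE {quote_ident(dbname)};",
    ]


def psql_command(dbname):
    """Build an argv list for one psql session that runs every statement in order."""
    argv = ["psql", "-d", dbname]
    for statement in recovery_statements(dbname):
        # One -c per statement: VACUUM cannot run inside a transaction block, and
        # several statements in a single -c string are executed as one transaction.
        argv += ["-c", statement]
    return argv
```

The resulting list can be passed to subprocess.run(). All -c options run in the same session, so the SET at the start stays in effect for the VACUUM and REINDEX commands that follow.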

When you run the VACUUM commands, messages like the ones below appear whenever a damaged page is encountered. Each page that is zeroed out loses the rows it contained.

WARNING:  invalid page in block 1 of relation base/*/*; zeroing out page
ERROR:  index "indexname" contains unexpected zero page at block 1
HINT:  Please REINDEX it.

Once the REINDEX step mentioned above has run, the index error is resolved.
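If the VACUUM output is long, the damaged indexes can be picked out of it automatically. This is a rough sketch that assumes the output was saved as text and that index names contain no double quotes. It emits unqualified names, so the statements rely on the session's search_path. The function names are illustrative.

```python
import re

# Matches the ERROR line shown above and captures the index name.
ZERO_PAGE = re.compile(
    r'index "([^"]+)" contains unexpected zero page at block \d+'
)


def indexes_to_reindex(log_text):
    """Return the distinct index names reported with a zero page, in first-seen order."""
    names = []
    for match in ZERO_PAGE.finditer(log_text):
        name = match.group(1)
        if name not in names:
            names.append(name)
    return names


def reindex_statements(log_text):
    """Build one REINDEX INDEX statement per damaged index."""
    return [f'REINDEX INDEX "{name}";' for name in indexes_to_reindex(log_text)]
```

Running REINDEX on only the reported indexes is an alternative to REINDEX DATABASE when the database is large and only a few indexes are affected.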

Repeat the process for every database on the server. Once complete, the Postgres server should be back online with minimal data loss.
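To repeat the process for every database, the list of databases can be read from pg_database. The sketch below builds the psql command and parses its output. It assumes a maintenance database named postgres exists and that no database name contains a newline. The names are illustrative.

```python
# Only databases that accept connections can be vacuumed and reindexed.
LIST_DATABASES_SQL = (
    "SELECT datname FROM pg_database WHERE datallowconn ORDER BY datname;"
)


def list_databases_command():
    """Build the psql argv that prints one bare database name per line."""
    # -A: unaligned output, -t: rows only (no header or footer).
    return ["psql", "-d", "postgres", "-At", "-c", LIST_DATABASES_SQL]


def parse_database_names(psql_output):
    """Turn the psql output into a list of database names, skipping blank lines."""
    return [line for line in psql_output.splitlines() if line.strip()]
```

The output of the command (for example, captured with subprocess.run(..., capture_output=True, text=True)) is passed to parse_database_names(), and the recovery steps are then run once per name.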

Below are a few more advanced articles that give more granular control over the recovery process.

Tracking Down Database Corruption With psql.

S6: Invalid page / page verification failed (data corrupted)

Replacing Bad Hard Drive in Linux

Anyone who maintains their own Linux systems will have to deal with this someday. A hard drive can develop unrecoverable bad sectors for many reasons. The easiest approach is described here, though it needs a little luck: the drive must not fail completely before the clone finishes.

If you want to try your luck, follow the steps below:

  • Buy a new hard drive of equal or greater specifications.
  • Create a Clonezilla live USB flash drive or USB hard drive following the instructions given here.
  • Shut down the system.
  • Connect the new hard drive to the system. Do not remove the bad hard drive yet, as we are going to use that as a source for cloning the disks/partitions to the new hard drive.
  • Now connect the Clonezilla live USB drive to the system and boot from it.
  • The system will boot into Clonezilla.
  • Follow the prompts and select the device-to-device (disk-to-disk) mode.
  • On the source disk/partition option select the old bad hard drive/partition.
  • On the destination disk/partition option select the new hard drive/partition.
  • Make selections based on your preferences in the next few steps.
  • Start the cloning process.
  • If the whole drive is readable, this will complete successfully. If Clonezilla reports bad sectors and cannot clone a particular partition, do not worry yet; let the process complete. Clonezilla will then suggest running the command with the rescue option.
  • Restart Clonezilla and follow the same process (this time you can select only the partition that failed in the previous step instead of the whole disk). At the final confirmation step, after every option has been supplied, answer NO when asked to proceed. Clonezilla will save the command it generated to a file.
  • When given the option to go to the Clonezilla command line, go there.
  • Copy the command saved in the file, append the --rescue option, and run it.
  • Clonezilla will report warnings but will complete successfully.
  • Once Clonezilla completes, shut down the system and remove the Clonezilla USB drive.
  • Remove the old bad hard drive from the system.
  • Reboot.
  • The system should now have a healthy hard drive with only a small amount of data missing. That is still a better outcome than losing everything.
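Before starting the clone, it can help to confirm that the new drive really is at least as large as the old one. The sketch below parses the output of lsblk -b -d -n -o NAME,SIZE (sizes in bytes, whole disks only, no header). The device names sda and sdb in the usage note are hypothetical, and the function names are illustrative.

```python
def parse_lsblk_sizes(lsblk_output):
    """Parse `lsblk -b -d -n -o NAME,SIZE` output into {device_name: size_in_bytes}."""
    sizes = {}
    for line in lsblk_output.splitlines():
        parts = line.split()
        # Keep only well-formed "NAME SIZE" rows.
        if len(parts) == 2 and parts[1].isdigit():
            sizes[parts[0]] = int(parts[1])
    return sizes


def destination_is_large_enough(sizes, source, destination):
    """Return True when the destination disk has at least as many bytes as the source."""
    return sizes[destination] >= sizes[source]
```

For example, with the old drive as sda and the new one as sdb, destination_is_large_enough(sizes, "sda", "sdb") should be True before cloning begins.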

One helpful article about finding bad sectors on a hard disk is linked below.

How to Check Bad Sectors or Bad Blocks on Hard Disk in Linux?
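As a small companion to that article, the sketch below summarizes a list of bad blocks. It assumes the list was produced by something like badblocks -v -o bad-blocks.txt /dev/sdX, which writes one block number per line. The file name and device are placeholders, and the function names are illustrative.

```python
def parse_badblocks_list(text):
    """Read one block number per line, ignoring anything that is not a number."""
    return [int(token) for token in text.split() if token.isdigit()]


def group_ranges(blocks):
    """Collapse block numbers into (first, last) ranges of consecutive blocks."""
    ranges = []
    for block in sorted(set(blocks)):
        if ranges and block == ranges[-1][1] + 1:
            # Extend the current range by one block.
            ranges[-1] = (ranges[-1][0], block)
        else:
            ranges.append((block, block))
    return ranges
```

A few long ranges suggest one damaged area on the disk, while many scattered single blocks suggest the drive is failing more widely.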