PostgreSQL WAL Corruption Recovery Guide
Symptoms
PostgreSQL container crashes on startup with logs showing:
LOG: unexpected pageaddr X/Y in WAL segment ...
LOG: invalid checkpoint record
PANIC: could not locate a valid checkpoint record at ...
LOG: startup process (PID N) was terminated by signal 6: Aborted
LOG: aborting startup due to startup process failure
The container restarts repeatedly, each time hitting the same error.
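To confirm that a crash loop matches this signature before going further, the log output can be scanned for the PANIC line. A minimal sketch, assuming a POSIX shell; the function name has_wal_checkpoint_panic is hypothetical and not part of PostgreSQL or Docker:

```shell
# Hypothetical helper: reads log text on stdin and succeeds (exit 0)
# only if it contains the checkpoint PANIC line listed above.
has_wal_checkpoint_panic() {
    grep -q 'PANIC:.*could not locate a valid checkpoint record'
}
```

One way to use it is to pipe the output of docker compose logs --no-color into the function and branch on its exit status.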
Cause
The Write-Ahead Log (WAL) was corrupted by an unclean shutdown (power loss, host crash, force kill, etc.). PostgreSQL cannot find a valid checkpoint to resume from.
What You Risk Losing
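- Data committed before the last completed checkpoint: Safe. It is already written to the data files.
- Uncommitted transactions from the moment of the crash: Lost. They never committed, so they would not have survived a normal restart either.
- Recent changes that were committed but not yet checkpointed: At risk. They may exist only in the WAL that pg_resetwal discards, so they can be lost or only partially applied, leaving the database inconsistent.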
Recovery Procedure
1. Stop the Crashing Container
cd /path/to/postgres/service
docker compose down
2. Run pg_resetwal
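This discards the existing WAL and writes a fresh starting point so the server can start. The -f flag forces the reset even though the server was not shut down cleanly.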
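Before forcing the reset, it may be worth previewing what pg_resetwal would write. The sketch below wraps the tool's --dry-run option, which prints the reconstructed control values and exits without modifying anything. The function name resetwal_dry_run is hypothetical, and the mount point assumes the named-volume layout used in this guide:

```shell
# Sketch: print the control values pg_resetwal would write, changing nothing.
# $1 = named Docker volume (e.g. pgdata)
# $2 = PGDATA path inside the container (e.g. /var/lib/postgresql/18/docker)
resetwal_dry_run() {
    docker run --rm \
        -v "$1:/var/lib/postgresql" \
        --user postgres \
        postgres:18 \
        pg_resetwal --dry-run "$2"
}
```

For example, resetwal_dry_run pgdata /var/lib/postgresql/18/docker previews the reset for the named-volume setup.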
If your data is in a named Docker volume (e.g., pgdata):
docker run --rm \
-v pgdata:/var/lib/postgresql \
--user postgres \
postgres:18 \
pg_resetwal -f /var/lib/postgresql/18/docker
Adjust the path /var/lib/postgresql/18/docker to match your PGDATA setting.
If your data is in a bind mount (e.g., ./data):
docker run --rm \
-v "$(pwd)/data":/var/lib/postgresql/data \
--user postgres \
postgres:18 \
pg_resetwal -f /var/lib/postgresql/data
3. Start the Database
docker compose up -d
4. Verify
docker compose logs --tail=20
docker inspect --format='{{.State.Health.Status}}' postgres
You should see:
LOG: database system is ready to accept connections
The health status should be healthy. This check assumes the container is named postgres and has a healthcheck configured.
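Because the container can take a few seconds to report its health, the check can be polled instead of run once. A minimal sketch, assuming the container defines a healthcheck; the function name wait_healthy, the 2-second interval, and the default of 30 attempts are illustrative choices, not part of the original procedure:

```shell
# Sketch: poll the container's health status until it reports "healthy".
# $1 = container name, $2 = maximum number of attempts (default 30).
# Returns 0 once healthy, 1 if the attempts run out.
wait_healthy() {
    container="$1"
    tries="${2:-30}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # An inspect failure is treated as "not healthy yet".
        status=$(docker inspect --format='{{.State.Health.Status}}' "$container" 2>/dev/null || true)
        if [ "$status" = "healthy" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep 2
    done
    return 1
}
```

For example, wait_healthy postgres 30 || docker compose logs --tail=20 prints the recent logs if the database never becomes healthy.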
Prevention
- Ensure graceful shutdowns: docker compose down instead of docker kill
- Use a UPS if running on bare metal to avoid power-loss crashes
- Keep backups of important data volumes
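- Consider setting restart: on-failure:5 instead of always to cap rapid crash loops. Note that unless-stopped restarts a crashing container just as always does; the two differ only after a manual stop.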
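For the backup item above, one option is a regular logical dump taken from the running container. This is a sketch under stated assumptions: the Compose service name and the superuser role are passed in because the guide does not name them, and dump_all_databases is a hypothetical helper:

```shell
# Sketch: write a logical dump of every database to a file on the host.
# $1 = Compose service name (assumption; not named in this guide)
# $2 = database superuser role (assumption)
# $3 = output file on the host
dump_all_databases() {
    # -T disables pseudo-TTY allocation so the redirected output stays clean.
    docker compose exec -T "$1" pg_dumpall -U "$2" > "$3"
}
```

For example, dump_all_databases postgres postgres backup.sql, run from the directory that holds the Compose file.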
When NOT to Use This Fix
Do not use pg_resetwal if:
- You have a recent base backup and WAL archive: restore from backup instead
- You suspect data file corruption (not just WAL corruption)
- You can recover by other means (e.g., starting from a replication standby)
If unsure, copy the data directory somewhere safe before running pg_resetwal.
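The safety copy can be taken with the container stopped. A minimal sketch that archives the data into a tarball on the host, reusing the postgres:18 image only for its tar binary; the function name backup_pgdata and the archive name pgdata-backup.tar.gz are hypothetical:

```shell
# Sketch: archive a stopped database's data directory into a host directory.
# $1 = named volume, or absolute host path, that holds the data
# $2 = absolute host directory that will receive the archive
backup_pgdata() {
    docker run --rm \
        -v "$1:/source:ro" \
        -v "$2:/backup" \
        postgres:18 \
        tar -czf /backup/pgdata-backup.tar.gz -C /source .
}
```

For example, backup_pgdata pgdata /tmp/pg-backup for the named volume, or backup_pgdata "$(pwd)/data" /tmp/pg-backup for the bind mount.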
One-Liner for Future Emergencies
If you're sure it's WAL corruption and you know your setup:
docker compose down && \
docker run --rm -v pgdata:/var/lib/postgresql --user postgres postgres:18 pg_resetwal -f /var/lib/postgresql/18/docker && \
docker compose up -d