2026-05-21 19:12:01 +03:00


PostgreSQL WAL Corruption Recovery Guide

Symptoms

PostgreSQL container crashes on startup with logs showing:

LOG:  unexpected pageaddr X/Y in WAL segment ...
LOG:  invalid checkpoint record
PANIC:  could not locate a valid checkpoint record at ...
LOG:  startup process (PID N) was terminated by signal 6: Aborted
LOG:  aborting startup due to startup process failure

The container restarts repeatedly, each time hitting the same error.
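Before going further, confirm that the container is failing with this specific error and not something else. The sketch below is a hypothetical helper (the function name and the Compose service name postgres are assumptions). It reads log text on stdin and succeeds only if the checkpoint PANIC appears.

```shell
# Hypothetical helper: exit 0 if the log text on stdin contains the
# "could not locate a valid checkpoint record" PANIC, non-zero otherwise.
looks_like_missing_checkpoint() {
  grep -qE 'PANIC: +could not locate a valid checkpoint record'
}

# Usage, assuming the Compose service is called "postgres":
#   docker compose logs --no-color postgres 2>&1 | looks_like_missing_checkpoint \
#     && echo "matches this guide" || echo "different failure; investigate first"
```

If the check fails, the crash has another cause, and pg_resetwal is unlikely to be the right fix.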

Cause

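The Write-Ahead Log (WAL) was most likely damaged during an unclean shutdown (power loss, host crash, force kill, etc.), and PostgreSQL cannot find a valid checkpoint record to start crash recovery from. PostgreSQL normally survives crashes by replaying the WAL. This failure therefore usually means that writes it had flushed never reached stable storage, for example because fsync was turned off or because a disk or virtualized volume acknowledged writes it had only cached.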

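What You Risk Losing

pg_resetwal discards the WAL instead of replaying it. Anything whose only durable copy was in the WAL is therefore gone:

  • Changes written to the data files by the last completed checkpoint: Generally preserved.
  • Transactions committed after the last checkpoint: May be lost or only partially present. A commit is durable because it is in the WAL, not because it has already reached the data files.
  • Transactions still uncommitted at the moment of the crash: Lost, as in any crash. Normal recovery would have rolled them back too.
  • Consistency: The server may start with inconsistent data, such as partially applied transactions or indexes that no longer match their tables.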

Recovery Procedure

1. Stop the Crashing Container

cd /path/to/postgres/service
docker compose down

2. Run pg_resetwal

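pg_resetwal clears the WAL and rewrites the control file (pg_control) so the server can start without replaying the WAL. The -f flag is required because the server was not shut down cleanly. This step cannot be undone.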

If your data is in a named Docker volume (e.g., pgdata):

docker run --rm \
  -v pgdata:/var/lib/postgresql \
  --user postgres \
  postgres:18 \
  pg_resetwal -f /var/lib/postgresql/18/docker

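Adjust the path /var/lib/postgresql/18/docker to match your PGDATA setting. It must be the directory that contains PG_VERSION. Use the same image tag as your Compose service, because pg_resetwal only works on a cluster of its own major version. Also check the real volume name with docker volume ls. Compose usually prefixes it with the project name (e.g., myproject_pgdata), and docker run -v with a name that does not exist silently creates a new, empty volume.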
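A small wrapper can guard against a wrong volume name and make it easy to start with a dry run: pg_resetwal -n prints the control values it would write and exits without changing anything. This is a minimal sketch. run_pg_resetwal and the example volume name myproject_pgdata are hypothetical.

```shell
# Hypothetical helper: run pg_resetwal against an existing named volume.
# Arguments: VOLUME IMAGE PGDATA [pg_resetwal options...]
run_pg_resetwal() {
  local volume="$1" image="$2" pgdata="$3"
  shift 3
  # Refuse to continue if the volume is missing; "docker run -v" would
  # otherwise create a new, empty volume with that name.
  if ! docker volume inspect "$volume" >/dev/null 2>&1; then
    echo "volume '$volume' not found; check 'docker volume ls'" >&2
    return 1
  fi
  docker run --rm \
    -v "$volume":/var/lib/postgresql \
    --user postgres \
    "$image" \
    pg_resetwal "$@" "$pgdata"
}

# Usage (dry run first, then the real reset):
#   run_pg_resetwal myproject_pgdata postgres:18 /var/lib/postgresql/18/docker -n
#   run_pg_resetwal myproject_pgdata postgres:18 /var/lib/postgresql/18/docker -f
```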

If your data is in a bind mount (e.g., ./data):

docker run --rm \
  -v "$(pwd)/data":/var/lib/postgresql/data \
  --user postgres \
  postgres:18 \
  pg_resetwal -f /var/lib/postgresql/data
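The bind mount may follow the same layout as the named-volume example, mounted at /var/lib/postgresql. In that case the data directory on the host is ./data/18/docker rather than ./data, and the container path changes to match. A data directory is one that contains both PG_VERSION and pg_wal, so a hypothetical helper like the one below can locate it. If the host user cannot read the directory, run the same find command with sudo.

```shell
# Hypothetical helper: print each directory under ROOT (default ./data, up to
# three levels deep) that looks like a PostgreSQL data directory.
find_pgdata() {
  local f dir
  find "${1:-./data}" -maxdepth 3 -type f -name PG_VERSION | while IFS= read -r f; do
    dir=$(dirname "$f")
    # Per-database folders under base/ also hold a PG_VERSION file, so only
    # report directories that also have a pg_wal subdirectory.
    if [ -d "$dir/pg_wal" ]; then
      printf '%s\n' "$dir"
    fi
  done
}

# Usage:
#   find_pgdata ./data   # typically prints ./data or ./data/18/docker
```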

3. Start the Database

docker compose up -d

4. Verify

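docker compose logs --tail=20
docker compose ps

You should see:

LOG:  database system is ready to accept connections

If the service defines a healthcheck, docker compose ps should also report it as healthy in the STATUS column. A clean start does not mean the data is consistent. The pg_resetwal documentation recommends dumping the data right away (for example with pg_dumpall), initializing a fresh cluster, restoring the dump, and then checking for inconsistencies.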
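pg_resetwal can leave inconsistent data behind, so the PostgreSQL documentation advises dumping the cluster as soon as it starts and restoring into a freshly initialized one. The first half of that can be scripted. This is a minimal sketch that assumes a Compose service and a database superuser both named postgres. wait_for_postgres and dump_all are hypothetical names. pg_isready and pg_dumpall are standard PostgreSQL client programs included in the official image.

```shell
# Hypothetical helpers. SERVICE is the Compose service name; the database
# superuser is assumed to be "postgres".

# Poll pg_isready inside the container until the server accepts connections.
# Arguments: SERVICE [TRIES] [INTERVAL_SECONDS]
wait_for_postgres() {
  local service="$1" tries="${2:-30}" interval="${3:-2}" i=1
  while [ "$i" -le "$tries" ]; do
    if docker compose exec -T "$service" pg_isready -U postgres >/dev/null 2>&1; then
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "$service not ready after $tries attempts" >&2
  return 1
}

# Write a plain-SQL dump of every database to a file on the host.
# Arguments: SERVICE OUTPUT_FILE
dump_all() {
  docker compose exec -T "$1" pg_dumpall -U postgres > "$2"
}

# Usage:
#   wait_for_postgres postgres && dump_all postgres after-resetwal.sql
```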

Prevention

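  • Ensure graceful shutdowns: use docker compose down instead of docker kill, and leave PostgreSQL enough time to finish its shutdown checkpoint. Compose sends SIGKILL once stop_grace_period (10 seconds by default) has passed.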
  • Use a UPS if running on bare metal to avoid power-loss crashes
  • Keep backups of important data volumes
  • To avoid endless crash loops, use a retry limit such as restart: on-failure:5. Both always and unless-stopped keep restarting a container that crashes.
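An unclean shutdown alone should not corrupt the WAL as long as PostgreSQL's durability settings are on and the storage honors flushes. It is therefore worth confirming that fsync and full_page_writes have not been turned off, which is sometimes done to speed up development setups. This is a minimal sketch. The function name and the service and user name postgres are assumptions.

```shell
# Hypothetical helper: print fsync and full_page_writes and fail if either is off.
# Argument: SERVICE (the Compose service name)
check_durability_settings() {
  local service="$1" setting value status=0
  for setting in fsync full_page_writes; do
    # -A unaligned, -t tuples only: psql prints just the value, e.g. "on".
    value=$(docker compose exec -T "$service" psql -U postgres -Atc "SHOW $setting")
    echo "$setting = $value"
    if [ "$value" != on ]; then
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   check_durability_settings postgres || echo "durability settings are off"
```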

When NOT to Use This Fix

Do not use pg_resetwal if:

  • You have a recent base backup and WAL archive — restore from backup instead
  • You suspect data file corruption (not just WAL corruption)
  • You can recover by other means (e.g., by promoting a replication standby)

Whatever you decide, copy the data directory somewhere safe (with the container stopped) before running pg_resetwal. Its changes cannot be undone.
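For a named volume, one way to make that copy is to archive the volume from a throwaway container. This is a minimal sketch. backup_volume and the example names are hypothetical, and it reuses the postgres image only because it already contains tar.

```shell
# Hypothetical helper: archive a named volume into a .tar.gz in the current
# directory. Run it with the database container stopped.
# Arguments: VOLUME IMAGE OUTPUT_FILE
backup_volume() {
  local volume="$1" image="$2" out="$3"
  # Refuse to continue if the volume is missing instead of archiving a new,
  # empty volume that "docker run -v" would create.
  if ! docker volume inspect "$volume" >/dev/null 2>&1; then
    echo "volume '$volume' not found" >&2
    return 1
  fi
  # Mount the volume read-only and let tar inside the container write the
  # archive into the current host directory.
  docker run --rm \
    -v "$volume":/source:ro \
    -v "$PWD":/backup \
    "$image" \
    tar -czf "/backup/$out" -C /source .
}

# Usage:
#   backup_volume myproject_pgdata postgres:18 pgdata-before-resetwal.tar.gz
```

The archive is created by root inside the container, so on the host it is usually owned by root.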

One-Liner for Future Emergencies

If you are sure it is WAL corruption and have confirmed the volume name, data directory path, and image tag for your setup:

docker compose down && \
docker run --rm -v pgdata:/var/lib/postgresql --user postgres postgres:18 pg_resetwal -f /var/lib/postgresql/18/docker && \
docker compose up -d
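The same three steps can be wrapped in a function with the setup-specific values passed as arguments, stopping at the first failure. This is a minimal sketch. emergency_reset and the example values are hypothetical, and it must be run from the directory that holds the compose file.

```shell
# Hypothetical wrapper around the one-liner: stop, reset the WAL, start.
# Arguments: VOLUME IMAGE PGDATA (all must match your setup; Compose usually
# prefixes volume names with the project name).
emergency_reset() {
  local volume="$1" image="$2" pgdata_dir="$3"
  docker compose down || return 1
  # Fail instead of letting "docker run -v" create a new, empty volume.
  docker volume inspect "$volume" >/dev/null || return 1
  docker run --rm -v "$volume":/var/lib/postgresql --user postgres "$image" \
    pg_resetwal -f "$pgdata_dir" || return 1
  docker compose up -d
}

# Usage:
#   emergency_reset myproject_pgdata postgres:18 /var/lib/postgresql/18/docker
```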