Compare commits

19 Commits

Author SHA1 Message Date
35c8657f13 add wal recovery guide 2026-05-21 19:12:01 +03:00
c6436d720e change to CF_DNS_API_TOKEN 2026-05-21 19:10:53 +03:00
8e38610b28 correct acme email env var 2026-05-21 19:07:12 +03:00
b1cd0e1050 add .gitignore 2026-05-14 19:45:31 +03:00
46f8b5e0a5 add .gitignore 2026-05-14 19:43:30 +03:00
ea33c80bb6 update network config 2026-05-14 19:43:20 +03:00
793e11699d add .gitignore 2026-05-14 19:43:04 +03:00
8057b502ef add .gitignore 2026-05-14 19:42:50 +03:00
994703974d add .gitignore 2026-05-14 19:42:40 +03:00
bed8333d37 update backend network 2026-05-14 19:42:30 +03:00
50eabae5ab add .gitignore 2026-05-14 19:42:18 +03:00
a2928c888f update backend network 2026-05-14 19:42:13 +03:00
dc0d3801bd add .gitignore 2026-05-14 19:41:58 +03:00
9f3339b6da update backend network 2026-05-14 19:41:50 +03:00
27d61f8ee7 add mcp network 2026-05-14 19:41:28 +03:00
be46801df6 bind traefik to mgmt network 2026-05-14 19:41:21 +03:00
65cff068d6 add .gitignore 2026-05-14 19:40:08 +03:00
f47da92f3c add .gitignore 2026-05-14 19:40:02 +03:00
c3c305ac54 update backend network 2026-05-14 19:39:48 +03:00
17 changed files with 156 additions and 17 deletions

@@ -0,0 +1,107 @@
# PostgreSQL WAL Corruption Recovery Guide
## Symptoms
PostgreSQL container crashes on startup with logs showing:
```
LOG: unexpected pageaddr X/Y in WAL segment ...
LOG: invalid checkpoint record
PANIC: could not locate a valid checkpoint record at ...
LOG: startup process (PID N) was terminated by signal 6: Aborted
LOG: aborting startup due to startup process failure
```
The container restarts repeatedly, each time hitting the same error.
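A quick way to confirm that a crash loop matches this signature is to search the container logs for the PANIC line. The sketch below assumes bash; `has_wal_panic` is a hypothetical helper name, and the service name `postgres` in the usage comment is an assumption.

```bash
# Hypothetical helper: reads log text on stdin and succeeds (exit 0)
# if it contains the checkpoint PANIC shown above.
has_wal_panic() {
  grep -q 'PANIC:.*could not locate a valid checkpoint record'
}

# Usage (assumes the Compose service is named "postgres"):
#   docker compose logs --no-color postgres 2>&1 | has_wal_panic \
#     && echo "WAL checkpoint PANIC found"
```

Matching only the PANIC line avoids false positives from the `LOG:` lines, which can also appear during an ordinary, successful crash recovery.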
## Cause
The Write-Ahead Log (WAL) was corrupted, typically by an unclean shutdown (power loss, host crash, force kill, etc.). On startup PostgreSQL begins crash recovery at the last checkpoint record, and here it cannot find a valid one.
## What You Risk Losing
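- **Changes already flushed to the data files** (everything up to the last completed checkpoint): Generally safe.
- **Uncommitted transactions** in flight at the moment of the crash: Lost, as they would be after any crash.
- **Transactions committed after the last checkpoint**: At risk. Their changes may exist only in the WAL that `pg_resetwal` discards, so they can be lost or only partially applied, which can leave the database inconsistent.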
## Recovery Procedure
### 1. Stop the Crashing Container
```bash
cd /path/to/postgres/service
docker compose down
```
### 2. Run `pg_resetwal`
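`pg_resetwal` discards the existing WAL and resets the control information in `pg_control` so that the server can start. Treat it as a last resort: the PostgreSQL documentation warns that the database may contain inconsistent data afterwards. The `-f` flag is required here because the server was not shut down cleanly.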
**If your data is in a named Docker volume (e.g., `pgdata`):**
```bash
docker run --rm \
  -v pgdata:/var/lib/postgresql \
  --user postgres \
  postgres:18 \
  pg_resetwal -f /var/lib/postgresql/18/docker
```
> Adjust the path `/var/lib/postgresql/18/docker` to match your `PGDATA` setting.
**If your data is in a bind mount (e.g., `./data`):**
```bash
docker run --rm \
  -v "$(pwd)/data:/var/lib/postgresql/data" \
  --user postgres \
  postgres:18 \
  pg_resetwal -f /var/lib/postgresql/data
```
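Before forcing the reset, it can help to preview what `pg_resetwal` would do. Its `-n` (`--dry-run`) option prints the control values it would use and exits without modifying anything. A minimal sketch, assuming the same image and mounts as above; `resetwal_dry_run` is a hypothetical helper name:

```bash
# Sketch: preview pg_resetwal without changing the data directory.
# Arguments: <volume name or host path> <mount point> <PGDATA inside the container>
resetwal_dry_run() {
  local source="$1" mount="$2" pgdata="$3"
  docker run --rm \
    -v "${source}:${mount}" \
    --user postgres \
    postgres:18 \
    pg_resetwal -n "$pgdata"
}

# Named-volume example:
#   resetwal_dry_run pgdata /var/lib/postgresql /var/lib/postgresql/18/docker
# Bind-mount example:
#   resetwal_dry_run "$(pwd)/data" /var/lib/postgresql/data /var/lib/postgresql/data
```

If the dry run fails because the path is wrong or the directory is not a PostgreSQL data directory, fix the mount or `PGDATA` path before running the real command with `-f`.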
### 3. Start the Database
```bash
docker compose up -d
```
### 4. Verify
```bash
docker compose logs --tail=20
# Assumes the container is named "postgres" and defines a healthcheck
docker inspect --format='{{.State.Health.Status}}' postgres
```
You should see:
```
LOG: database system is ready to accept connections
```
And the health status should be `healthy`.
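Because `pg_resetwal` can leave partially applied transactions behind, the PostgreSQL documentation recommends dumping the data right after a successful start, then re-initializing the cluster and restoring from that dump. A minimal sketch of the dump step, assuming the Compose service is named `postgres` and the superuser is `postgres` (both are assumptions; `dump_all` is a hypothetical helper name):

```bash
# Sketch: dump every database in the cluster to a plain SQL file on the host.
# Arguments: <compose service name> <output file>
dump_all() {
  local service="$1" outfile="$2"
  # -T disables TTY allocation so the redirected output is not mangled.
  docker compose exec -T "$service" pg_dumpall -U postgres > "$outfile"
}

# Usage:
#   dump_all postgres "dump-$(date +%Y%m%d).sql"
```

Check that the dump completes without errors before relying on it; errors during the dump are a sign that the data files are also damaged.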
## Prevention
- Ensure graceful shutdowns: `docker compose down` instead of `docker kill`
- Use a UPS if running on bare metal to avoid power-loss crashes
- Keep backups of important data volumes
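- Consider setting `restart: on-failure:5` instead of `always` to cap the number of restart attempts. Note that `unless-stopped` still restarts a crashing container indefinitely, so it does not prevent crash loops.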
## When NOT to Use This Fix
Do **not** use `pg_resetwal` if:
- You have a recent base backup and WAL archive — restore from backup instead
- You suspect data file corruption (not just WAL corruption)
- You can recover by other means (e.g., starting from a replication standby)
In every case, copy the data directory somewhere safe before running `pg_resetwal`. The reset cannot be undone.
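For a bind-mounted data directory, the safety copy can be as simple as the sketch below. It assumes the container is already stopped; `backup_data_dir` is a hypothetical helper name.

```bash
# Sketch: copy a data directory to a timestamped sibling and print the new path.
# Stop the container first so the files do not change during the copy.
backup_data_dir() {
  local src="${1%/}"   # drop a trailing slash, if any
  local dest="${src}.bak-$(date +%Y%m%d-%H%M%S)"
  cp -a "$src" "$dest" && printf '%s\n' "$dest"
}

# Usage (may need sudo, since the files belong to the container's postgres user):
#   backup_data_dir ./data
```

`cp -a` preserves ownership, permissions, and timestamps, which matters if the copy ever has to be put back in place of the original.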
## One-Liner for Future Emergencies
If you're sure it's WAL corruption and you know your setup:
```bash
docker compose down && \
docker run --rm -v pgdata:/var/lib/postgresql --user postgres postgres:18 pg_resetwal -f /var/lib/postgresql/18/docker && \
docker compose up -d
```

@@ -15,12 +15,12 @@ services:
 env_file:
 - .env
 networks:
-- db
+- backend
 volumes:
 - redis_data:/data
 volumes:
 redis_data:
 name: redis_data
 networks:
-db:
+backend:
 external: true

3
frontend/cloudflared/.gitignore vendored Normal file

@@ -0,0 +1,3 @@
.env
tunnel.json
config.yml

@@ -3,6 +3,8 @@ SUBDOMAIN=
# TRAEFIK_USER=
SSL_EMAIL_FILE=/run/secrets/CF_API_EMAIL
CF_API_EMAIL_FILE=/run/secrets/CF_API_EMAIL
CF_API_KEY_FILE=/run/secrets/CF_API_KEY
ACME_EMAIL=
CF_DNS_API_TOKEN_FILE=/run/secrets/CF_DNS_API_TOKEN
SSH_PORT=
TZ=
TZ=
TAG=

2
frontend/traefik/.gitignore vendored Normal file

@@ -0,0 +1,2 @@
.env
traefik_data/

@@ -25,7 +25,7 @@ services:
- "--entryPoints.websecure.forwardedHeaders.trustedIPs=173.245.48.0/20,103.21.244.0/22,103.22.200.0/22,103.31.4.0/22,141.101.64.0/18,108.162.192.0/18,190.93.240.0/20,188.114.96.0/20,197.234.240.0/22,198.41.128.0/17,162.158.0.0/15,104.16.0.0/13,104.24.0.0/14,172.64.0.0/13,131.0.72.0/22"
- "--certificatesresolvers.cloudflare.acme.dnschallenge=true"
- "--certificatesresolvers.cloudflare.acme.dnschallenge.provider=cloudflare"
- "--certificatesresolvers.cloudflare.acme.email=${CF_API_EMAIL_FILE}"
- "--certificatesresolvers.cloudflare.acme.email=${ACME_EMAIL}"
- "--certificatesresolvers.cloudflare.acme.storage=/letsencrypt/acme.json"
labels:
- traefik.enable=true
@@ -48,11 +48,12 @@ services:
- traefik.http.middlewares.traefik_dashboard.headers.STSIncludeSubdomains=true
- traefik.http.middlewares.traefik_dashboard.headers.STSPreload=true
- traefik.http.middlewares.traefik_dashboard.headers.frameDeny=true
- traefik.docker.network=mgmt
env_file:
- .env
secrets:
- CF_API_KEY
- CF_API_EMAIL
- CF_DNS_API_TOKEN
volumes:
- ./traefik_data:/letsencrypt
- /var/run/docker.sock:/var/run/docker.sock:ro
@@ -64,11 +65,12 @@ services:
- webapp
- mgmt
- jump
- mcp
secrets:
CF_API_KEY:
file: .secrets/CF_API_KEY
CF_API_EMAIL:
file: .secrets/CF_API_EMAIL
CF_DNS_API_TOKEN:
file: .secrets/CF_DNS_API_TOKEN
networks:
frontend:
external:
@@ -82,3 +84,6 @@ networks:
jump:
external:
true
mcp:
external:
true

3
local/adguard/.gitignore vendored Normal file

@@ -0,0 +1,3 @@
.env
conf/
work/

1
mgmt/adminer/.gitignore vendored Normal file

@@ -0,0 +1 @@
.env

@@ -25,11 +25,11 @@ services:
 - .env
 networks:
 - mgmt
-- db
+- backend
 networks:
 mgmt:
 external:
 true
-db:
+backend:
 external:
 true

3
mgmt/authentik/.gitignore vendored Normal file

@@ -0,0 +1,3 @@
certs/
media/
custom-templates/

@@ -13,7 +13,7 @@ services:
 - ./custom-templates:/templates
 networks:
 - mgmt
-- db
+- backend
 labels:
 - traefik.enable=true
 - traefik.http.routers.$SUBDOMAIN.rule=Host(`${SUBDOMAIN}.${DOMAIN_NAME}`)
@@ -55,7 +55,7 @@ services:
 - DB_PASS
 user: root
 networks:
-- db
+- backend
 volumes:
 # - /var/run/docker.sock:/var/run/docker.sock # Optional, only if using external outposts
 - ./media:/media
@@ -64,7 +64,7 @@ services:
 networks:
 mgmt:
 external: true
-db:
+backend:
 external: true
 secrets:
 SECRET_KEY:

@@ -38,14 +38,14 @@ services:
 - /etc/localtime:/etc/localtime:ro
 networks:
 - mgmt
-- db
+- backend
 volumes:
 gitea-data:
 name: gitea-data
 networks:
 mgmt:
 external: true
-db:
+backend:
 external: true
 secrets:
 DB_PASS:

2
mgmt/portainer/.gitignore vendored Normal file

@@ -0,0 +1,2 @@
data/
.env

2
mgmt/vaultwarden/.gitignore vendored Normal file

@@ -0,0 +1,2 @@
.env
.recovery_code

2
webapp/n8n/.gitignore vendored Normal file

@@ -0,0 +1,2 @@
.env
local-files

@@ -30,7 +30,8 @@ services:
 - ./local-files:/files
 networks:
 - webapp
-- db
+- backend
+- mcp
 secrets:
 DB_PASS:
 file: .secrets/DB_PASS
@@ -38,7 +39,10 @@ networks:
 webapp:
 external:
 true
-db:
+backend:
 external:
 true
+mcp:
+external:
+true
 volumes:

3
webapp/navidrome/.gitignore vendored Normal file

@@ -0,0 +1,3 @@
data/
music/
music