Backup & Infrastructure CRITICAL — 8 FEB 2026

Database backup failure discovered, emergency backup completed, infrastructure fixes applied. This page documents the incident and current state.

Database Backup Failure

CRITICAL: The daily pg_dump has been producing empty files since the backup was first set up. The database has been running without a valid SQL backup for that entire period.

Root Cause

The database is 46GB (80GB+ with indexes). pg_dump hangs because the Bingo chemical cartridge creates circular dependencies on the rxn_bingo_idx_shadow tables. The dump process stalls and leaves behind a 0-byte file. Individual table dumps work (tested: users 5KB, files 7KB).
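
One way to keep a stalled dump from going unnoticed is to run it under a hard timeout and treat a timeout, a non-zero exit, or an empty output file as a failure. The following is a minimal sketch under those assumptions, not the contents of backup.sh; the helper name run_dump and the timeout value are hypothetical.

```python
import os
import subprocess


def run_dump(cmd, out_path, timeout_s):
    """Run a dump command and write its stdout to out_path.

    Returns True only if the command exits with status 0 within
    timeout_s seconds and leaves a non-empty file behind.
    Hypothetical helper, not part of the existing backup script.
    """
    try:
        with open(out_path, "wb") as out:
            # On timeout, subprocess.run kills the child process and
            # raises TimeoutExpired.
            result = subprocess.run(cmd, stdout=out, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0 and os.path.getsize(out_path) > 0
```

A caller could pass a pg_dump argument list as cmd and alert whenever the function returns False, so that a hang surfaces as an explicit failure rather than as a silent 0-byte file.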

Mitigating Factor

The rsync file backup has worked throughout; only the SQL dump component was broken. File uploads, SDF/MOL files, and exported data were therefore always backed up. The risk was limited to database records (users, shares, activity, audit logs, molecule metadata).

Emergency Backup

RESOLVED: Emergency manual backup completed successfully — 8.7GB compressed (6.5GB gzipped) saved to USB storage.
/mnt/usb-backup/backups/discoverant/backup-20260208/postgres/data/

Backup Architecture

Component | Path / Detail
Backup script | /opt/adroit/backups/backup.sh
Cron schedule | /etc/cron.d/discoverant-backup, 3:00 AM daily
NAS storage | /mnt/nas/backups/discoverant/, 3.6TB CIFS mount, 14% used
USB storage | /mnt/usb-backup/backups/discoverant/, 1.9TB exFAT, 22% used
DB user | adroit_user (not postgres; that role doesn't exist)
Rotation policy | 7 daily, 4 weekly, 3 monthly
File method | rsync with hard links (incremental): WORKING
DB method | pg_dump: BROKEN (empty files)
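
The rotation policy in the table (7 daily, 4 weekly, 3 monthly) can be read in more than one way, and this page does not say how backup.sh implements it. The sketch below shows one plausible interpretation: keep the newest 7 backups, plus the newest backup in each of the 4 most recent ISO weeks, plus the newest backup in each of the 3 most recent calendar months. The function name backups_to_keep and this interpretation are assumptions.

```python
from datetime import date


def backups_to_keep(dates, daily=7, weekly=4, monthly=3):
    """Return the set of backup dates retained under the assumed policy."""
    ordered = sorted(set(dates), reverse=True)  # newest first
    keep = set(ordered[:daily])

    def keep_newest_per_bucket(bucket_of, limit):
        # Walk from newest to oldest and keep the first (newest) backup
        # seen in each bucket until `limit` buckets have been covered.
        seen = []
        for d in ordered:
            key = bucket_of(d)
            if key in seen:
                continue
            if len(seen) == limit:
                break
            seen.append(key)
            keep.add(d)

    keep_newest_per_bucket(lambda d: tuple(d.isocalendar())[:2], weekly)  # (ISO year, ISO week)
    keep_newest_per_bucket(lambda d: (d.year, d.month), monthly)
    return keep
```

Any backup whose date is not in the returned set would be a candidate for deletion. Because the daily, weekly, and monthly selections can overlap, the number of retained backups is at most 14 and usually lower.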

Required Fix: Exclude Bingo Shadow + Reference Tables

The backup script needs to exclude the Bingo shadow tables, which cause pg_dump to hang, and the large reference tables, which can be reloaded from public sources. This reduces the dump to the irreplaceable data: user data, project data, shares, audit logs, and configuration.

Tables to Exclude from pg_dump

Category | Tables | Reason
BLOCKER: Bingo shadow | rxn_bingo_idx_shadow and related | Circular dependency causes pg_dump to hang
SureChEMBL | surechembl_* | 24.4M compounds, reloadable from public source
ChEMBL | chembl_* (activities, molecules, compound_structures) | 2.87M molecules, reloadable
STRING | string_* | 13.7M protein interactions, reloadable
Reactome | reactome_* | Pathway data, reloadable
ORD | ord_* | 2.38M reactions, reloadable
PDB | pdb_* | Protein structures, reloadable
UniProt | uniprot_* | Protein sequences, reloadable
PubChem | pubchem_* | Reference compounds, reloadable

What MUST be backed up (irreplaceable)

users — accounts, roles, consent, deleted_at
tenants — organisations, sharing policy, domains
files + file_shares — uploads, share records, OTP verification
molecules + reactions + proteins — user project data
projects — user-created projects
guest_activity — engagement tracking
audit_log — compliance trail
security_alerts — anomaly detection records
share_otp_codes — verification codes
reference_library — admin-pushed reference files
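
The exclusions listed in this section could be passed to pg_dump as --exclude-table patterns, which follow the same wildcard rules as psql's \d commands. The sketch below only builds the argument list; it is an illustration, not the fix as applied to backup.sh. The trailing wildcard on the Bingo pattern (to cover "and related"), the custom dump format, and the function name are assumptions, and the database name is left as a parameter because this page does not state it.

```python
# Patterns taken from the exclusion table. The "*" suffix on the Bingo
# pattern is an assumption standing in for "and related".
EXCLUDE_PATTERNS = [
    "rxn_bingo_idx_shadow*",
    "surechembl_*",
    "chembl_*",
    "string_*",
    "reactome_*",
    "ord_*",
    "pdb_*",
    "uniprot_*",
    "pubchem_*",
]


def build_pg_dump_cmd(dbname, user, out_file, patterns=EXCLUDE_PATTERNS):
    """Build a pg_dump argument list that skips the excluded tables."""
    cmd = [
        "pg_dump",
        f"--username={user}",
        "--format=custom",  # assumed format; the script may use plain SQL
        f"--file={out_file}",
    ]
    for pattern in patterns:
        cmd.append(f"--exclude-table={pattern}")
    cmd.append(dbname)
    return cmd
```

Passing the list directly to a process runner (rather than through a shell) keeps the * characters from being expanded as file globs. If the table definitions should stay in the dump and only the rows be skipped, pg_dump's --exclude-table-data option may fit the reference tables better. Either way, foreign keys or views that point at excluded tables can make a restore fail, so a test restore is still needed.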

Infrastructure Fixes Applied

Fix | Problem | Resolution | Severity
discoverant.service | WorkingDirectory pointed to non-existent path; containers wouldn't survive a server reboot | Fixed path in systemd unit file | CRITICAL
Certbot post-hook | Used start instead of reload for nginx; broke nginx entirely on certificate renewal | Changed to reload | CRITICAL
Postfix exposure | Listening on all interfaces; potential open relay / security risk | Locked to localhost-only + UFW deny port 25 | HIGH
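
A WorkingDirectory problem like the one in discoverant.service only shows up when the unit is next started, so a small pre-reboot check can help. The sketch below is a simplified illustration: it scans unit file text for WorkingDirectory= lines and reports any path that is not an existing directory. The function name is hypothetical, and the parser deliberately ignores "~" and systemd specifiers such as %h.

```python
import os


def missing_working_dirs(unit_text):
    """Return WorkingDirectory= paths in a unit file that are not directories.

    Handles plain absolute paths and the optional "-" prefix (which
    systemd uses to mark a missing directory as non-fatal). Does not
    expand "~" or % specifiers.
    """
    missing = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line.startswith("WorkingDirectory="):
            continue
        path = line.split("=", 1)[1].strip().lstrip("-")
        if path and not os.path.isdir(path):
            missing.append(path)
    return missing
```

The function takes the unit file contents as a string, so it can be run against whichever file holds the unit on the server; an empty list means every listed directory exists.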

Disk & System Status

Metric | Value | Status
Free disk (system) | 26.3GB | WATCH
Docker reclaimable | 457GB (images + build cache) | CLEANUP NEEDED
Containers running | 7 | OK
Database size | 46GB (80GB+ with indexes) | INFO
Pending system updates | 26 | PENDING
API endpoints | 284 | OK

Outstanding TODO

Priority | Item | Detail
URGENT | Fix backup script pg_dump | Exclude Bingo shadow + reference tables. Verify tonight's 3am backup produces non-empty dump.
HIGH | Docker disk cleanup | 457GB reclaimable. docker system prune or selective image cleanup.
HIGH | Apply 26 system updates | Pending Ubuntu security patches. Schedule during maintenance window.
MONITOR | Verify backup after fix | Check 3am run produces valid dump. Test restore on separate container.
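
For the "verify backup" items, a first-pass sanity check could confirm that the dump file exists, is not empty, and, if it is gzip-compressed, decompresses cleanly to the end. This is a minimal sketch with a hypothetical function name and an assumed .gz naming convention; it does not replace the test restore listed in the table.

```python
import gzip
import os
import zlib


def dump_looks_valid(path, min_bytes=1):
    """Return True if path is a file of at least min_bytes and,
    for .gz files, the whole stream decompresses without error."""
    if not os.path.isfile(path) or os.path.getsize(path) < min_bytes:
        return False
    if path.endswith(".gz"):
        try:
            with gzip.open(path, "rb") as fh:
                # Read in 1 MiB chunks so large dumps are not held in memory.
                while fh.read(1024 * 1024):
                    pass
        except (OSError, EOFError, zlib.error):
            # Covers bad headers, truncated files, and corrupt data.
            return False
    return True
```

Setting min_bytes to a realistic lower bound for the expected dump size would also catch dumps that are non-empty but suspiciously small.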