PostgreSQL Scheduler Hangs and Errors, Solved: Interfacing with pg_locks and Connection Management

If you've been spending time debugging a scheduler that suddenly stops or throws errors, this post might help. I encountered an issue where a previously working scheduler would permanently hang under specific circumstances. Ultimately, the root cause was found in lock management and connection handling.

Attempts and Pitfalls

Initially, to improve the scheduler's stability, I changed the existing advisory lock mechanism to a leader election method using a lease table. I also introduced a singleton pattern to add a self-healing feature, allowing the scheduler to detect and recover from issues on its own.

However, lock-related problems still occurred intermittently, and the scheduler hangs didn't completely disappear. For deeper debugging, I isolated the connections used by the scheduler into a dedicated asyncpg connection pool to dive deeper into lock-related issues.