fix(catalog): Fix catalog refresh_state deadlock when running multiple replicas

getProcessableEntities uses SELECT ... FOR UPDATE SKIP LOCKED to prevent concurrent processors from selecting the same rows, but it was called with a raw Knex instance instead of a transaction. Outside an explicit transaction the SELECT runs in its own implicit transaction, so the row locks were released as soon as it completed, before the subsequent UPDATE executed. That made SKIP LOCKED ineffective and allowed multiple replicas to update overlapping rows, causing PostgreSQL deadlocks (error 40P01). Wrapping the call in a transaction holds the locks through the UPDATE, so concurrent replicas correctly skip already-claimed rows.

Signed-off-by: Michael Walsh <walshmichael310@gmail.com>
Michael Walsh
2026-03-20 17:19:09 +01:00
parent 9d851ea7c6
commit 375b546fa1
2 changed files with 9 additions and 2 deletions
@@ -0,0 +1,5 @@
---
'@backstage/plugin-catalog-backend': patch
---
Fixed a deadlock in the catalog processing loop that occurred when running multiple replicas. The `getProcessableEntities` method used `SELECT ... FOR UPDATE SKIP LOCKED` to prevent concurrent processors from claiming the same rows, but the call was not wrapped in a transaction, so the row locks were released before the subsequent `UPDATE` executed. This allowed multiple replicas to select and update overlapping rows, causing PostgreSQL deadlock errors (code 40P01).
@@ -143,8 +143,10 @@ export class DefaultCatalogProcessingEngine {
       loadTasks: async count => {
         try {
           const { items } =
-            await this.processingDatabase.getProcessableEntities(this.knex, {
-              processBatchSize: count,
+            await this.processingDatabase.transaction(async tx => {
+              return this.processingDatabase.getProcessableEntities(tx, {
+                processBatchSize: count,
+              });
             });
           return items;
         } catch (error) {
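
The effect described in the commit message can be illustrated with a small in-memory sketch. This is a toy model, not PostgreSQL or Knex: `ToyTable`, `run`, and `overlappingIds` are hypothetical names introduced only for this illustration, and the model reproduces the overlapping claims that come before the deadlock, not the deadlock itself. It assumes an interleaving in which both replicas issue their SELECT before either issues its UPDATE.

```typescript
// Toy in-memory model of row locks; hypothetical, not the Backstage code.
type Row = { id: number; claimedBy: string[] };

class ToyTable {
  private lockedBy = new Map<number, string>();
  constructor(public rows: Row[]) {}

  // Models SELECT ... FOR UPDATE SKIP LOCKED: lock and return up to
  // `limit` rows that no other session currently holds.
  selectForUpdateSkipLocked(session: string, limit: number): Row[] {
    const picked: Row[] = [];
    for (const row of this.rows) {
      if (picked.length === limit) break;
      if (this.lockedBy.has(row.id)) continue; // the SKIP LOCKED part
      this.lockedBy.set(row.id, session);
      picked.push(row);
    }
    return picked;
  }

  // Models the end of a transaction: release every lock the session holds.
  commit(session: string): void {
    this.lockedBy.forEach((owner, id) => {
      if (owner === session) this.lockedBy.delete(id);
    });
  }
}

// Two replicas, A and B, each claim a batch of two rows out of four.
function run(useTransaction: boolean): Row[] {
  const table = new ToyTable([1, 2, 3, 4].map(id => ({ id, claimedBy: [] })));

  const a = table.selectForUpdateSkipLocked("A", 2);
  // Without a transaction the SELECT commits on its own, dropping A's locks.
  if (!useTransaction) table.commit("A");

  const b = table.selectForUpdateSkipLocked("B", 2);
  if (!useTransaction) table.commit("B");

  // UPDATE phase: each replica marks the rows it believes it owns.
  a.forEach(row => row.claimedBy.push("A"));
  b.forEach(row => row.claimedBy.push("B"));

  // With a transaction, the locks are only released here, after the UPDATE.
  table.commit("A");
  table.commit("B");
  return table.rows;
}

// Ids of rows that more than one replica updated.
function overlappingIds(rows: Row[]): number[] {
  return rows.filter(row => row.claimedBy.length > 1).map(row => row.id);
}

console.log("no transaction:", overlappingIds(run(false)));
console.log("transaction:", overlappingIds(run(true)));
```

In this model the run without a transaction has both replicas updating rows 1 and 2, while the transactional run gives replica A rows 1 and 2 and replica B rows 3 and 4.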