fix(catalog): Fix catalog refresh_state deadlock when running multiple replicas
getProcessableEntities uses SELECT ... FOR UPDATE SKIP LOCKED to prevent concurrent processors from selecting the same rows, but it was called with a raw Knex instance instead of a transaction. The row locks were therefore released immediately after the SELECT, before the subsequent UPDATE executed. That made SKIP LOCKED ineffective and allowed multiple replicas to update overlapping rows, causing PostgreSQL deadlocks (error 40P01).

Wrapping the call in a transaction ensures the locks are held through the UPDATE, so concurrent replicas correctly skip already-claimed rows.

Signed-off-by: Michael Walsh <walshmichael310@gmail.com>
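To illustrate the locking behaviour described in the commit message, here is a minimal in-memory sketch. It is a toy model only: it is not PostgreSQL, not Knex, and not the Backstage implementation. `ToyTable`, `selectSkipLocked`, `release`, `update`, and `doublyClaimed` are hypothetical names introduced for this example. The sketch assumes a fixed interleaving in which two replicas both run their SELECT before either runs its UPDATE.

```typescript
// Toy model of row locks. NOT PostgreSQL and NOT the Backstage code;
// every name here is hypothetical and exists only for illustration.
type Row = { id: number; claimedBy: string[] };

class ToyTable {
  readonly rows: Row[];
  private locked = new Set<number>();

  constructor(size: number) {
    this.rows = Array.from({ length: size }, (_, i) => ({ id: i, claimedBy: [] }));
  }

  // Models SELECT ... FOR UPDATE SKIP LOCKED: lock and return up to
  // `limit` rows that are neither locked nor already claimed.
  selectSkipLocked(limit: number): number[] {
    const ids: number[] = [];
    for (const row of this.rows) {
      if (ids.length === limit) break;
      if (!this.locked.has(row.id) && row.claimedBy.length === 0) {
        this.locked.add(row.id);
        ids.push(row.id);
      }
    }
    return ids;
  }

  // Models the end of a transaction (or of a single autocommit
  // statement): the row locks are released.
  release(ids: number[]): void {
    for (const id of ids) this.locked.delete(id);
  }

  // Models the follow-up UPDATE that marks the selected rows as claimed.
  update(ids: number[], worker: string): void {
    for (const id of ids) {
      const row = this.rows[id];
      if (row) row.claimedBy.push(worker);
    }
  }
}

// Runs two replicas with both SELECTs happening before either UPDATE and
// returns how many rows ended up claimed by more than one replica.
function doublyClaimed(holdLocksUntilUpdate: boolean): number {
  const table = new ToyTable(4);

  const a = table.selectSkipLocked(2);
  if (!holdLocksUntilUpdate) table.release(a); // no transaction: locks gone at once

  const b = table.selectSkipLocked(2);
  if (!holdLocksUntilUpdate) table.release(b);

  table.update(a, 'replica-a');
  table.update(b, 'replica-b');

  // With a transaction, the locks are released only here, after the UPDATE.
  table.release(a);
  table.release(b);

  return table.rows.filter(row => row.claimedBy.length > 1).length;
}
```

In this model `doublyClaimed(false)` returns 2, because both replicas select rows 0 and 1, while `doublyClaimed(true)` returns 0, because the second replica skips the locked rows and takes rows 2 and 3 instead. The model only shows the overlap; in PostgreSQL, overlapping multi-row updates issued in different orders are what can then produce the deadlock.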
@@ -0,0 +1,5 @@
+---
+'@backstage/plugin-catalog-backend': patch
+---
+
+Fixed a deadlock in the catalog processing loop that occurred when running multiple replicas. The `getProcessableEntities` method used `SELECT ... FOR UPDATE SKIP LOCKED` to prevent concurrent processors from claiming the same rows, but the call was not wrapped in a transaction, so the row locks were released before the subsequent `UPDATE` executed. This allowed multiple replicas to select and update overlapping rows, causing PostgreSQL deadlock errors (code 40P01).
@@ -143,8 +143,10 @@ export class DefaultCatalogProcessingEngine {
       loadTasks: async count => {
         try {
           const { items } =
-            await this.processingDatabase.getProcessableEntities(this.knex, {
-              processBatchSize: count,
+            await this.processingDatabase.transaction(async tx => {
+              return this.processingDatabase.getProcessableEntities(tx, {
+                processBatchSize: count,
+              });
             });
           return items;
         } catch (error) {