catalog-backend: drive paginated entity ordering from search-by-key (#34162)

* draft: restructure entities() ordering to drive from search-by-key

When a sort field is specified, build the query so the search table
filtered by that key is the driving relation, instead of left-joining
search onto final_entities and sorting after. The planner can then walk
the (key, value, entity_id) index in already-sorted order and short
circuit on LIMIT.

Measured against a production replica, the catalog UI's default
"first page of components ordered by metadata.name" query goes from
~940 ms to ~8 ms. See PR description for the full numbers.

Known caveats this draft does not yet address:
- Entities lacking the order field are excluded; the previous shape
  put them at the end with NULLS LAST. A UNION ALL pattern can
  preserve the old semantics.
- Multi-field order falls back to the OLD shape (only the first field
  is taken when present today; subsequent fields acted as
  tie-breakers). Restoring tie-breakers needs additional joins or a
  CTE.
- queryEntities (/entities/by-query) is left untouched; the same
  optimization applies but the CTE/cursor structure makes the rewrite
  more involved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Fredrik Adelöw <freben@spotify.com>

* two-phase entities() ordering: fast path + NULLs-last fallback

When an order field is specified, run a fast path first that drives
from the search-by-key index (excluding entities without the field).
If that path doesn't produce enough rows to cover offset+limit+1, run
a fallback that picks up the no-field entities in entity_id order.
This preserves NULLs-LAST semantics while keeping the fast plan for
the common case where every entity has the order field.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Fredrik Adelöw <freben@spotify.com>

* fix: multi-field order fallback + MySQL quoting in entities()

Fall back to the original LEFT JOIN shape when multiple order fields
are specified, since tie-breaking on secondary fields inherently
requires materialization. The fast INNER-JOIN-driven path is used only
for single-field order (the typical UI case).

Fix the phase 2 NOT EXISTS clause to use knex's ?? identifier escaping
instead of hardcoded double-quotes, which broke on MySQL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Fredrik Adelöw <freben@spotify.com>

* catalog-backend: fix phase 2 ordering and update changeset

Fix two issues raised in review:

- Phase 2 (entities lacking the sort field) now always sorts by
  entity_id ASC regardless of the primary sort direction, matching the
  original NULLS-LAST behaviour where the NULL group was always
  entity_id ASC.

- Update changeset to describe the actual two-phase behaviour (NULLS
  LAST preserved) instead of the stale description that said entities
  without the field are excluded.

Signed-off-by: Fredrik Adelöw <freben@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

* catalog-backend: apply offset when limit is undefined in runOrderedEntitiesQuery

Previously the fast-path returned the full combined array when no limit
was specified, silently ignoring any pagination offset. The old
implementation always pushed offset to SQL independently of limit.
Restore parity by slicing the combined array by the offset even when
limit is absent.

Signed-off-by: Fredrik Adelöw <freben@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

* catalog-backend: add pagination and phase-boundary tests for entities() ordering

Add two new test cases that cover the parts of runOrderedEntitiesQuery not
exercised before:

- "paginates correctly through single-field ordering": exercises limit, offset,
  and hasNextPage through the fast path (Phase 1 only) across all DB engines.

- "paginates across the Phase 1 / Phase 2 boundary": exercises the case where
  the requested page straddles entities that have the sort field (Phase 1) and
  those that do not (Phase 2), and verifies that Phase 2 entities are always
  ordered ASC by entity_id regardless of the primary sort direction.

Also fixes a double-space caught by prettier in DefaultEntitiesCatalog.ts.

Signed-off-by: Fredrik Adelöw <freben@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

* catalog-backend: treat null sort-field value the same as a missing sort field

buildEntitySearch stores value=NULL for entity fields that are explicitly
null or exceed MAX_VALUE_LENGTH. Previously, Phase 1 included those rows via
the INNER JOIN (key matches, value IS NULL), causing them to sort ahead of
entities that have no row for the key at all — changing semantics vs the old
LEFT JOIN shape where both cases landed in the same NULLS-LAST bucket.

Fix Phase 1 to require order_0.value IS NOT NULL, and fix Phase 2's NOT EXISTS
to check for no non-null value (value IS NOT NULL) so that both null-valued
and missing-key entities are collected in Phase 2 and ordered together by
entity_id ASC.

Adds a regression test covering an entity with spec.b=null alongside one with
no spec.b, asserting both appear after sorted entities regardless of direction.

Signed-off-by: Fredrik Adelöw <freben@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Signed-off-by: Fredrik Adelöw <freben@spotify.com>
Signed-off-by: Fredrik Adelöw <freben@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Fredrik Adelöw
2026-05-12 20:20:14 +02:00
committed by GitHub
parent ab690eea30
commit add5d1ace7
3 changed files with 363 additions and 53 deletions
+5
View File
@@ -0,0 +1,5 @@
---
'@backstage/plugin-catalog-backend': patch
---
Restructured the entity listing endpoint so that, when a sort field is specified, the search-by-key index drives the query rather than being side-joined onto `final_entities`. This lets PostgreSQL walk the `(key, value, entity_id)` index in already-sorted order and short-circuit on `LIMIT`, reducing typical broad-filter paginated list times from seconds to milliseconds. Entities that lack the sort field still appear at the end of sorted results (NULLS LAST semantics preserved), ordered by `entity_id`.
@@ -741,6 +741,185 @@ describe('DefaultEntitiesCatalog', () => {
).resolves.toEqual(['n4', 'n3', 'n1', 'n2']);
},
);
it.each(databases.eachSupportedId())(
'paginates correctly through single-field ordering, %p',
async databaseId => {
await createDatabase(databaseId);
// All four entities have metadata.name — fast path uses Phase 1 only
for (const name of ['n1', 'n2', 'n3', 'n4']) {
await addEntityToSearch({
apiVersion: 'a',
kind: 'k',
metadata: { name },
});
}
const catalog = new DefaultEntitiesCatalog({
database: knex,
logger: mockServices.logger.mock(),
stitcher,
});
async function page(limit: number, offset?: number): Promise<string[]> {
const r = await catalog.entities({
order: [{ field: 'metadata.name', order: 'asc' }],
pagination: { limit, offset },
credentials: mockCredentials.none(),
});
return entitiesResponseToObjects(r.entities).map(
e => e!.metadata.name,
);
}
async function hasNext(
limit: number,
offset?: number,
): Promise<boolean> {
const r = await catalog.entities({
order: [{ field: 'metadata.name', order: 'asc' }],
pagination: { limit, offset },
credentials: mockCredentials.none(),
});
return r.pageInfo.hasNextPage;
}
await expect(page(2)).resolves.toEqual(['n1', 'n2']);
expect(await hasNext(2)).toBe(true);
await expect(page(2, 2)).resolves.toEqual(['n3', 'n4']);
expect(await hasNext(2, 2)).toBe(false);
await expect(page(2, 1)).resolves.toEqual(['n2', 'n3']);
expect(await hasNext(2, 1)).toBe(true);
await expect(page(100)).resolves.toEqual(['n1', 'n2', 'n3', 'n4']);
},
);
it.each(databases.eachSupportedId())(
'paginates across the Phase 1 / Phase 2 boundary, %p',
async databaseId => {
await createDatabase(databaseId);
// n1 and n2 have spec.b (Phase 1); n3 and n4 do not (Phase 2).
// Explicit UIDs pin Phase 2 ordering (entity_id ASC) to a known sequence.
await addEntityToSearch({
apiVersion: 'a',
kind: 'k',
metadata: { name: 'n1' },
spec: { b: 'alpha' },
});
await addEntityToSearch({
apiVersion: 'a',
kind: 'k',
metadata: { name: 'n2' },
spec: { b: 'beta' },
});
await addEntityToSearch({
apiVersion: 'a',
kind: 'k',
metadata: { name: 'n3', uid: 'aaaa-n3' },
});
await addEntityToSearch({
apiVersion: 'a',
kind: 'k',
metadata: { name: 'n4', uid: 'bbbb-n4' },
});
const catalog = new DefaultEntitiesCatalog({
database: knex,
logger: mockServices.logger.mock(),
stitcher,
});
async function page(
limit: number,
offset?: number,
order: 'asc' | 'desc' = 'asc',
): Promise<string[]> {
const r = await catalog.entities({
order: [{ field: 'spec.b', order }],
pagination: { limit, offset },
credentials: mockCredentials.none(),
});
return entitiesResponseToObjects(r.entities).map(
e => e!.metadata.name,
);
}
// Page that straddles the Phase 1 / Phase 2 boundary
await expect(page(3)).resolves.toEqual(['n1', 'n2', 'n3']);
await expect(page(3, 1)).resolves.toEqual(['n2', 'n3', 'n4']);
// Phase 2 entities (no spec.b) are always ordered ASC by entity_id
// regardless of the primary sort direction
await expect(page(4, 0, 'asc')).resolves.toEqual([
'n1',
'n2',
'n3',
'n4',
]);
await expect(page(4, 0, 'desc')).resolves.toEqual([
'n2',
'n1',
'n3',
'n4',
]);
},
);
it.each(databases.eachSupportedId())(
'treats a null sort-field value the same as a missing sort field, %p',
async databaseId => {
await createDatabase(databaseId);
// n1 has spec.b with a real value (Phase 1)
// n2 has spec.b explicitly set to null — buildEntitySearch stores value=NULL
// n3 has no spec.b at all
// n2 and n3 must both end up in the NULLS-LAST bucket (Phase 2),
// ordered by entity_id, regardless of primary sort direction.
await addEntityToSearch({
apiVersion: 'a',
kind: 'k',
metadata: { name: 'n1' },
spec: { b: 'alpha' },
});
await addEntityToSearch({
apiVersion: 'a',
kind: 'k',
metadata: { name: 'n2', uid: 'aaaa-n2' },
spec: { b: null },
});
await addEntityToSearch({
apiVersion: 'a',
kind: 'k',
metadata: { name: 'n3', uid: 'bbbb-n3' },
});
const catalog = new DefaultEntitiesCatalog({
database: knex,
logger: mockServices.logger.mock(),
stitcher,
});
async function page(order: 'asc' | 'desc'): Promise<string[]> {
const r = await catalog.entities({
order: [{ field: 'spec.b', order }],
credentials: mockCredentials.none(),
});
return entitiesResponseToObjects(r.entities).map(
e => e!.metadata.name,
);
}
// n2 (null value) and n3 (missing key) must sort together after n1,
// ordered by entity_id ASC, regardless of primary direction
await expect(page('asc')).resolves.toEqual(['n1', 'n2', 'n3']);
await expect(page('desc')).resolves.toEqual(['n1', 'n2', 'n3']);
},
);
});
describe('entitiesBatch', () => {
@@ -122,64 +122,89 @@ export class DefaultEntitiesCatalog implements EntitiesCatalog {
async entities(request?: EntitiesRequest): Promise<EntitiesResponse> {
const db = this.database;
const { limit, offset } = parsePagination(request?.pagination);
const primaryOrder = request?.order?.[0];
let entitiesQuery =
db<DbFinalEntitiesRow>('final_entities').select('final_entities.*');
request?.order?.forEach(({ field }, index) => {
const alias = `order_${index}`;
entitiesQuery = entitiesQuery.leftOuterJoin(
{ [alias]: 'search' },
function search(inner) {
inner
.on(`${alias}.entity_id`, 'final_entities.entity_id')
.andOn(`${alias}.key`, db.raw('?', [field]));
},
// When exactly one order field is specified we run a two-phase fetch
// that drives from the search-by-key index for that field. The index
// walks rows in already-sorted order, so the planner can short-circuit
// on LIMIT instead of having to materialise and sort the full filtered
// set. Phase 2 appends entities that lack the order field (NULLS LAST)
// and is skipped when phase 1 already fills the request.
//
// Multi-field ordering falls back to the original LEFT JOIN shape
// because tie-breaking on a second field requires materialisation of
// the full set anyway.
const useFastPath = primaryOrder && (request?.order?.length ?? 0) <= 1;
let rows: DbFinalEntitiesRow[];
if (useFastPath) {
rows = await this.runOrderedEntitiesQuery(
request!,
primaryOrder,
limit,
offset,
);
});
entitiesQuery = entitiesQuery.whereNotNull('final_entities.final_entity');
if (request?.filter) {
entitiesQuery = applyEntityFilterToQuery({
filter: request.filter,
targetQuery: entitiesQuery,
onEntityIdField: 'final_entities.entity_id',
knex: db,
});
}
request?.order?.forEach(({ order }, index) => {
if (db.client.config.client === 'pg') {
// pg correctly orders by the column value and handling nulls in one go
entitiesQuery = entitiesQuery.orderBy([
{ column: `order_${index}.value`, order, nulls: 'last' },
]);
} else {
// sqlite and mysql translate the above statement ONLY into "order by (value is null) asc"
// no matter what the order is, for some reason, so we have to manually add back the statement
// that translates to "order by value <order>" while avoiding to give an order
entitiesQuery = entitiesQuery.orderBy([
{ column: `order_${index}.value`, order: undefined, nulls: 'last' },
{ column: `order_${index}.value`, order },
]);
}
});
if (!request?.order) {
entitiesQuery = entitiesQuery.orderBy('final_entities.entity_ref', 'asc'); // default sort
} else {
entitiesQuery.orderBy('final_entities.entity_id', 'asc'); // stable sort
let entitiesQuery =
db<DbFinalEntitiesRow>('final_entities').select('final_entities.*');
request?.order?.forEach(({ field }, index) => {
const alias = `order_${index}`;
entitiesQuery = entitiesQuery.leftOuterJoin(
{ [alias]: 'search' },
function search(inner) {
inner
.on(`${alias}.entity_id`, 'final_entities.entity_id')
.andOn(`${alias}.key`, db.raw('?', [field]));
},
);
});
entitiesQuery = entitiesQuery.whereNotNull('final_entities.final_entity');
if (request?.filter) {
entitiesQuery = applyEntityFilterToQuery({
filter: request.filter,
targetQuery: entitiesQuery,
onEntityIdField: 'final_entities.entity_id',
knex: db,
});
}
if (request?.order) {
request.order.forEach(({ order }, index) => {
if (db.client.config.client === 'pg') {
entitiesQuery = entitiesQuery.orderBy([
{ column: `order_${index}.value`, order, nulls: 'last' },
]);
} else {
entitiesQuery = entitiesQuery.orderBy([
{
column: `order_${index}.value`,
order: undefined,
nulls: 'last',
},
{ column: `order_${index}.value`, order },
]);
}
});
entitiesQuery.orderBy('final_entities.entity_id', 'asc');
} else {
entitiesQuery = entitiesQuery.orderBy(
'final_entities.entity_ref',
'asc',
);
}
if (limit !== undefined) {
entitiesQuery = entitiesQuery.limit(limit + 1);
}
if (offset !== undefined) {
entitiesQuery = entitiesQuery.offset(offset);
}
rows = await entitiesQuery;
}
if (limit !== undefined) {
entitiesQuery = entitiesQuery.limit(limit + 1);
}
if (offset !== undefined) {
entitiesQuery = entitiesQuery.offset(offset);
}
let rows = await entitiesQuery;
let pageInfo: DbPageInfo;
if (limit === undefined || rows.length <= limit) {
pageInfo = { hasNextPage: false };
@@ -211,6 +236,107 @@ export class DefaultEntitiesCatalog implements EntitiesCatalog {
};
}
/**
* Two-phase fetch used when the caller has specified an order field.
* See entities() for a longer description of the rationale.
*/
private async runOrderedEntitiesQuery(
request: EntitiesRequest,
primaryOrder: EntityOrder,
limit: number | undefined,
offset: number | undefined,
): Promise<DbFinalEntitiesRow[]> {
const db = this.database;
const isPg = db.client.config.client === 'pg';
const wantedRows =
limit === undefined ? Number.MAX_SAFE_INTEGER : (offset ?? 0) + limit + 1;
const applyFilter = <T extends object>(
query: Knex.QueryBuilder<T>,
): Knex.QueryBuilder<T> => {
if (!request.filter) {
return query;
}
return applyEntityFilterToQuery({
filter: request.filter,
targetQuery: query,
onEntityIdField: 'final_entities.entity_id',
knex: db,
});
};
// Phase 1 -- entities that have a non-NULL value for the order field.
// Rows where the key exists but value IS NULL (e.g. the entity field is
// explicitly null, or exceeded MAX_VALUE_LENGTH in buildEntitySearch) are
// excluded here so they fall through to Phase 2 and sort in the same
// NULLS-LAST bucket as entities that have no row for the key at all —
// preserving the semantics of the previous LEFT JOIN approach.
let withField = db('search as order_0')
.innerJoin(
'final_entities',
'final_entities.entity_id',
'order_0.entity_id',
)
.where('order_0.key', primaryOrder.field)
.whereNotNull('order_0.value')
.whereNotNull('final_entities.final_entity')
.select<DbFinalEntitiesRow[]>('final_entities.*');
withField = applyFilter(withField);
withField = isPg
? withField.orderBy([
{ column: 'order_0.value', order: primaryOrder.order, nulls: 'last' },
{ column: 'final_entities.entity_id', order: 'asc' },
])
: withField.orderBy([
{ column: 'order_0.value', order: undefined, nulls: 'last' },
{ column: 'order_0.value', order: primaryOrder.order },
{ column: 'final_entities.entity_id', order: 'asc' },
]);
if (wantedRows < Number.MAX_SAFE_INTEGER) {
withField = withField.limit(wantedRows);
}
const withFieldRows = await withField;
// If phase 1 already covered everything we asked for, skip the second
// phase entirely. This is the common UI case where every entity in the
// filtered set has the order field.
if (withFieldRows.length >= wantedRows) {
const skip = offset ?? 0;
return withFieldRows.slice(skip, skip + (limit ?? wantedRows) + 1);
}
// Phase 2 -- entities that lack the order field, appended after.
let withoutField = db<DbFinalEntitiesRow>('final_entities')
.select<DbFinalEntitiesRow[]>('final_entities.*')
.whereNotNull('final_entities.final_entity')
.whereNotExists(qb =>
qb
.from('search')
.where('search.entity_id', db.ref('final_entities.entity_id'))
.andWhere('search.key', primaryOrder.field)
.whereNotNull('search.value'),
);
withoutField = applyFilter(withoutField);
withoutField = withoutField.orderBy(
'final_entities.entity_id',
'asc', // NULL group always stable-sorted ASC regardless of primary direction
);
if (limit !== undefined) {
// Phase 2 only contributes the rows that phase 1 didn't cover.
const remaining =
wantedRows - Math.min(withFieldRows.length, (offset ?? 0) + limit + 1);
withoutField = withoutField.limit(Math.max(0, remaining));
}
const withoutFieldRows = await withoutField;
const combined = [...withFieldRows, ...withoutFieldRows];
if (limit === undefined) {
return combined.slice(offset ?? 0);
}
const skip = offset ?? 0;
return combined.slice(skip, skip + limit + 1);
}
async entitiesBatch(
request: EntitiesBatchRequest,
): Promise<EntitiesBatchResponse> {