catalog-backend: drive paginated entity ordering from search-by-key (#34162)
* draft: restructure entities() ordering to drive from search-by-key When a sort field is specified, build the query so the search table filtered by that key is the driving relation, instead of left-joining search onto final_entities and sorting after. The planner can then walk the (key, value, entity_id) index in already-sorted order and short circuit on LIMIT. Measured against a production replica, the catalog UI's default "first page of components ordered by metadata.name" query goes from ~940 ms to ~8 ms. See PR description for the full numbers. Known caveats this draft does not yet address: - Entities lacking the order field are excluded; the previous shape put them at the end with NULLS LAST. A UNION ALL pattern can preserve the old semantics. - Multi-field order falls back to the OLD shape (only the first field is taken when present today; subsequent fields acted as tie-breakers). Restoring tie-breakers needs additional joins or a CTE. - queryEntities (/entities/by-query) is left untouched; the same optimization applies but the CTE/cursor structure makes the rewrite more involved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Fredrik Adelöw <freben@spotify.com> * two-phase entities() ordering: fast path + NULLs-last fallback When an order field is specified, run a fast path first that drives from the search-by-key index (excluding entities without the field). If that path doesn't produce enough rows to cover offset+limit+1, run a fallback that picks up the no-field entities in entity_id order. This preserves NULLs-LAST semantics while keeping the fast plan for the common case where every entity has the order field. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Fredrik Adelöw <freben@spotify.com> * fix: multi-field order fallback + MySQL quoting in entities() Fall back to the original LEFT JOIN shape when multiple order fields are specified, since tie-breaking on secondary fields inherently requires materialization. The fast INNER-JOIN-driven path is used only for single-field order (the typical UI case). Fix the phase 2 NOT EXISTS clause to use knex's ?? identifier escaping instead of hardcoded double-quotes, which broke on MySQL. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Fredrik Adelöw <freben@spotify.com> * catalog-backend: fix phase 2 ordering and update changeset Fix two issues raised in review: - Phase 2 (entities lacking the sort field) now always sorts by entity_id ASC regardless of the primary sort direction, matching the original NULLS-LAST behaviour where the NULL group was always entity_id ASC. - Update changeset to describe the actual two-phase behaviour (NULLS LAST preserved) instead of the stale description that said entities without the field are excluded. Signed-off-by: Fredrik Adelöw <freben@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com> * catalog-backend: apply offset when limit is undefined in runOrderedEntitiesQuery Previously the fast-path returned the full combined array when no limit was specified, silently ignoring any pagination offset. The old implementation always pushed offset to SQL independently of limit. Restore parity by slicing the combined array by the offset even when limit is absent. Signed-off-by: Fredrik Adelöw <freben@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com> * catalog-backend: add pagination and phase-boundary tests for entities() ordering Add two new test cases that cover the parts of runOrderedEntitiesQuery not exercised before: - "paginates correctly through single-field ordering": exercises limit, offset, and hasNextPage through the fast path (Phase 1 only) across all DB engines. - "paginates across the Phase 1 / Phase 2 boundary": exercises the case where the requested page straddles entities that have the sort field (Phase 1) and those that do not (Phase 2), and verifies that Phase 2 entities are always ordered ASC by entity_id regardless of the primary sort direction. Also fixes a double-space caught by prettier in DefaultEntitiesCatalog.ts. Signed-off-by: Fredrik Adelöw <freben@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com> * catalog-backend: treat null sort-field value the same as a missing sort field buildEntitySearch stores value=NULL for entity fields that are explicitly null or exceed MAX_VALUE_LENGTH. Previously, Phase 1 included those rows via the INNER JOIN (key matches, value IS NULL), causing them to sort ahead of entities that have no row for the key at all — changing semantics vs the old LEFT JOIN shape where both cases landed in the same NULLS-LAST bucket. Fix Phase 1 to require order_0.value IS NOT NULL, and fix Phase 2's NOT EXISTS to check for no non-null value (value IS NOT NULL) so that both null-valued and missing-key entities are collected in Phase 2 and ordered together by entity_id ASC. Adds a regression test covering an entity with spec.b=null alongside one with no spec.b, asserting both appear after sorted entities regardless of direction. Signed-off-by: Fredrik Adelöw <freben@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com> --------- Signed-off-by: Fredrik Adelöw <freben@spotify.com> Signed-off-by: Fredrik Adelöw <freben@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
@@ -0,0 +1,5 @@
|
||||
---
|
||||
'@backstage/plugin-catalog-backend': patch
|
||||
---
|
||||
|
||||
Restructured the entity listing endpoint so that, when a sort field is specified, the search-by-key index drives the query rather than being side-joined onto `final_entities`. This lets PostgreSQL walk the `(key, value, entity_id)` index in already-sorted order and short-circuit on `LIMIT`, reducing typical broad-filter paginated list times from seconds to milliseconds. Entities that lack the sort field still appear at the end of sorted results (NULLS LAST semantics preserved), ordered by `entity_id`.
|
||||
@@ -741,6 +741,185 @@ describe('DefaultEntitiesCatalog', () => {
|
||||
).resolves.toEqual(['n4', 'n3', 'n1', 'n2']);
|
||||
},
|
||||
);
|
||||
|
||||
it.each(databases.eachSupportedId())(
|
||||
'paginates correctly through single-field ordering, %p',
|
||||
async databaseId => {
|
||||
await createDatabase(databaseId);
|
||||
|
||||
// All four entities have metadata.name — fast path uses Phase 1 only
|
||||
for (const name of ['n1', 'n2', 'n3', 'n4']) {
|
||||
await addEntityToSearch({
|
||||
apiVersion: 'a',
|
||||
kind: 'k',
|
||||
metadata: { name },
|
||||
});
|
||||
}
|
||||
|
||||
const catalog = new DefaultEntitiesCatalog({
|
||||
database: knex,
|
||||
logger: mockServices.logger.mock(),
|
||||
stitcher,
|
||||
});
|
||||
|
||||
async function page(limit: number, offset?: number): Promise<string[]> {
|
||||
const r = await catalog.entities({
|
||||
order: [{ field: 'metadata.name', order: 'asc' }],
|
||||
pagination: { limit, offset },
|
||||
credentials: mockCredentials.none(),
|
||||
});
|
||||
return entitiesResponseToObjects(r.entities).map(
|
||||
e => e!.metadata.name,
|
||||
);
|
||||
}
|
||||
|
||||
async function hasNext(
|
||||
limit: number,
|
||||
offset?: number,
|
||||
): Promise<boolean> {
|
||||
const r = await catalog.entities({
|
||||
order: [{ field: 'metadata.name', order: 'asc' }],
|
||||
pagination: { limit, offset },
|
||||
credentials: mockCredentials.none(),
|
||||
});
|
||||
return r.pageInfo.hasNextPage;
|
||||
}
|
||||
|
||||
await expect(page(2)).resolves.toEqual(['n1', 'n2']);
|
||||
expect(await hasNext(2)).toBe(true);
|
||||
|
||||
await expect(page(2, 2)).resolves.toEqual(['n3', 'n4']);
|
||||
expect(await hasNext(2, 2)).toBe(false);
|
||||
|
||||
await expect(page(2, 1)).resolves.toEqual(['n2', 'n3']);
|
||||
expect(await hasNext(2, 1)).toBe(true);
|
||||
|
||||
await expect(page(100)).resolves.toEqual(['n1', 'n2', 'n3', 'n4']);
|
||||
},
|
||||
);
|
||||
|
||||
it.each(databases.eachSupportedId())(
|
||||
'paginates across the Phase 1 / Phase 2 boundary, %p',
|
||||
async databaseId => {
|
||||
await createDatabase(databaseId);
|
||||
|
||||
// n1 and n2 have spec.b (Phase 1); n3 and n4 do not (Phase 2).
|
||||
// Explicit UIDs pin Phase 2 ordering (entity_id ASC) to a known sequence.
|
||||
await addEntityToSearch({
|
||||
apiVersion: 'a',
|
||||
kind: 'k',
|
||||
metadata: { name: 'n1' },
|
||||
spec: { b: 'alpha' },
|
||||
});
|
||||
await addEntityToSearch({
|
||||
apiVersion: 'a',
|
||||
kind: 'k',
|
||||
metadata: { name: 'n2' },
|
||||
spec: { b: 'beta' },
|
||||
});
|
||||
await addEntityToSearch({
|
||||
apiVersion: 'a',
|
||||
kind: 'k',
|
||||
metadata: { name: 'n3', uid: 'aaaa-n3' },
|
||||
});
|
||||
await addEntityToSearch({
|
||||
apiVersion: 'a',
|
||||
kind: 'k',
|
||||
metadata: { name: 'n4', uid: 'bbbb-n4' },
|
||||
});
|
||||
|
||||
const catalog = new DefaultEntitiesCatalog({
|
||||
database: knex,
|
||||
logger: mockServices.logger.mock(),
|
||||
stitcher,
|
||||
});
|
||||
|
||||
async function page(
|
||||
limit: number,
|
||||
offset?: number,
|
||||
order: 'asc' | 'desc' = 'asc',
|
||||
): Promise<string[]> {
|
||||
const r = await catalog.entities({
|
||||
order: [{ field: 'spec.b', order }],
|
||||
pagination: { limit, offset },
|
||||
credentials: mockCredentials.none(),
|
||||
});
|
||||
return entitiesResponseToObjects(r.entities).map(
|
||||
e => e!.metadata.name,
|
||||
);
|
||||
}
|
||||
|
||||
// Page that straddles the Phase 1 / Phase 2 boundary
|
||||
await expect(page(3)).resolves.toEqual(['n1', 'n2', 'n3']);
|
||||
await expect(page(3, 1)).resolves.toEqual(['n2', 'n3', 'n4']);
|
||||
|
||||
// Phase 2 entities (no spec.b) are always ordered ASC by entity_id
|
||||
// regardless of the primary sort direction
|
||||
await expect(page(4, 0, 'asc')).resolves.toEqual([
|
||||
'n1',
|
||||
'n2',
|
||||
'n3',
|
||||
'n4',
|
||||
]);
|
||||
await expect(page(4, 0, 'desc')).resolves.toEqual([
|
||||
'n2',
|
||||
'n1',
|
||||
'n3',
|
||||
'n4',
|
||||
]);
|
||||
},
|
||||
);
|
||||
|
||||
it.each(databases.eachSupportedId())(
|
||||
'treats a null sort-field value the same as a missing sort field, %p',
|
||||
async databaseId => {
|
||||
await createDatabase(databaseId);
|
||||
|
||||
// n1 has spec.b with a real value (Phase 1)
|
||||
// n2 has spec.b explicitly set to null — buildEntitySearch stores value=NULL
|
||||
// n3 has no spec.b at all
|
||||
// n2 and n3 must both end up in the NULLS-LAST bucket (Phase 2),
|
||||
// ordered by entity_id, regardless of primary sort direction.
|
||||
await addEntityToSearch({
|
||||
apiVersion: 'a',
|
||||
kind: 'k',
|
||||
metadata: { name: 'n1' },
|
||||
spec: { b: 'alpha' },
|
||||
});
|
||||
await addEntityToSearch({
|
||||
apiVersion: 'a',
|
||||
kind: 'k',
|
||||
metadata: { name: 'n2', uid: 'aaaa-n2' },
|
||||
spec: { b: null },
|
||||
});
|
||||
await addEntityToSearch({
|
||||
apiVersion: 'a',
|
||||
kind: 'k',
|
||||
metadata: { name: 'n3', uid: 'bbbb-n3' },
|
||||
});
|
||||
|
||||
const catalog = new DefaultEntitiesCatalog({
|
||||
database: knex,
|
||||
logger: mockServices.logger.mock(),
|
||||
stitcher,
|
||||
});
|
||||
|
||||
async function page(order: 'asc' | 'desc'): Promise<string[]> {
|
||||
const r = await catalog.entities({
|
||||
order: [{ field: 'spec.b', order }],
|
||||
credentials: mockCredentials.none(),
|
||||
});
|
||||
return entitiesResponseToObjects(r.entities).map(
|
||||
e => e!.metadata.name,
|
||||
);
|
||||
}
|
||||
|
||||
// n2 (null value) and n3 (missing key) must sort together after n1,
|
||||
// ordered by entity_id ASC, regardless of primary direction
|
||||
await expect(page('asc')).resolves.toEqual(['n1', 'n2', 'n3']);
|
||||
await expect(page('desc')).resolves.toEqual(['n1', 'n2', 'n3']);
|
||||
},
|
||||
);
|
||||
});
|
||||
|
||||
describe('entitiesBatch', () => {
|
||||
|
||||
@@ -122,64 +122,89 @@ export class DefaultEntitiesCatalog implements EntitiesCatalog {
|
||||
async entities(request?: EntitiesRequest): Promise<EntitiesResponse> {
|
||||
const db = this.database;
|
||||
const { limit, offset } = parsePagination(request?.pagination);
|
||||
const primaryOrder = request?.order?.[0];
|
||||
|
||||
let entitiesQuery =
|
||||
db<DbFinalEntitiesRow>('final_entities').select('final_entities.*');
|
||||
|
||||
request?.order?.forEach(({ field }, index) => {
|
||||
const alias = `order_${index}`;
|
||||
entitiesQuery = entitiesQuery.leftOuterJoin(
|
||||
{ [alias]: 'search' },
|
||||
function search(inner) {
|
||||
inner
|
||||
.on(`${alias}.entity_id`, 'final_entities.entity_id')
|
||||
.andOn(`${alias}.key`, db.raw('?', [field]));
|
||||
},
|
||||
// When exactly one order field is specified we run a two-phase fetch
|
||||
// that drives from the search-by-key index for that field. The index
|
||||
// walks rows in already-sorted order, so the planner can short-circuit
|
||||
// on LIMIT instead of having to materialise and sort the full filtered
|
||||
// set. Phase 2 appends entities that lack the order field (NULLS LAST)
|
||||
// and is skipped when phase 1 already fills the request.
|
||||
//
|
||||
// Multi-field ordering falls back to the original LEFT JOIN shape
|
||||
// because tie-breaking on a second field requires materialisation of
|
||||
// the full set anyway.
|
||||
const useFastPath = primaryOrder && (request?.order?.length ?? 0) <= 1;
|
||||
let rows: DbFinalEntitiesRow[];
|
||||
if (useFastPath) {
|
||||
rows = await this.runOrderedEntitiesQuery(
|
||||
request!,
|
||||
primaryOrder,
|
||||
limit,
|
||||
offset,
|
||||
);
|
||||
});
|
||||
|
||||
entitiesQuery = entitiesQuery.whereNotNull('final_entities.final_entity');
|
||||
|
||||
if (request?.filter) {
|
||||
entitiesQuery = applyEntityFilterToQuery({
|
||||
filter: request.filter,
|
||||
targetQuery: entitiesQuery,
|
||||
onEntityIdField: 'final_entities.entity_id',
|
||||
knex: db,
|
||||
});
|
||||
}
|
||||
|
||||
request?.order?.forEach(({ order }, index) => {
|
||||
if (db.client.config.client === 'pg') {
|
||||
// pg correctly orders by the column value and handling nulls in one go
|
||||
entitiesQuery = entitiesQuery.orderBy([
|
||||
{ column: `order_${index}.value`, order, nulls: 'last' },
|
||||
]);
|
||||
} else {
|
||||
// sqlite and mysql translate the above statement ONLY into "order by (value is null) asc"
|
||||
// no matter what the order is, for some reason, so we have to manually add back the statement
|
||||
// that translates to "order by value <order>" while avoiding to give an order
|
||||
entitiesQuery = entitiesQuery.orderBy([
|
||||
{ column: `order_${index}.value`, order: undefined, nulls: 'last' },
|
||||
{ column: `order_${index}.value`, order },
|
||||
]);
|
||||
}
|
||||
});
|
||||
|
||||
if (!request?.order) {
|
||||
entitiesQuery = entitiesQuery.orderBy('final_entities.entity_ref', 'asc'); // default sort
|
||||
} else {
|
||||
entitiesQuery.orderBy('final_entities.entity_id', 'asc'); // stable sort
|
||||
let entitiesQuery =
|
||||
db<DbFinalEntitiesRow>('final_entities').select('final_entities.*');
|
||||
|
||||
request?.order?.forEach(({ field }, index) => {
|
||||
const alias = `order_${index}`;
|
||||
entitiesQuery = entitiesQuery.leftOuterJoin(
|
||||
{ [alias]: 'search' },
|
||||
function search(inner) {
|
||||
inner
|
||||
.on(`${alias}.entity_id`, 'final_entities.entity_id')
|
||||
.andOn(`${alias}.key`, db.raw('?', [field]));
|
||||
},
|
||||
);
|
||||
});
|
||||
|
||||
entitiesQuery = entitiesQuery.whereNotNull('final_entities.final_entity');
|
||||
|
||||
if (request?.filter) {
|
||||
entitiesQuery = applyEntityFilterToQuery({
|
||||
filter: request.filter,
|
||||
targetQuery: entitiesQuery,
|
||||
onEntityIdField: 'final_entities.entity_id',
|
||||
knex: db,
|
||||
});
|
||||
}
|
||||
|
||||
if (request?.order) {
|
||||
request.order.forEach(({ order }, index) => {
|
||||
if (db.client.config.client === 'pg') {
|
||||
entitiesQuery = entitiesQuery.orderBy([
|
||||
{ column: `order_${index}.value`, order, nulls: 'last' },
|
||||
]);
|
||||
} else {
|
||||
entitiesQuery = entitiesQuery.orderBy([
|
||||
{
|
||||
column: `order_${index}.value`,
|
||||
order: undefined,
|
||||
nulls: 'last',
|
||||
},
|
||||
{ column: `order_${index}.value`, order },
|
||||
]);
|
||||
}
|
||||
});
|
||||
entitiesQuery.orderBy('final_entities.entity_id', 'asc');
|
||||
} else {
|
||||
entitiesQuery = entitiesQuery.orderBy(
|
||||
'final_entities.entity_ref',
|
||||
'asc',
|
||||
);
|
||||
}
|
||||
|
||||
if (limit !== undefined) {
|
||||
entitiesQuery = entitiesQuery.limit(limit + 1);
|
||||
}
|
||||
if (offset !== undefined) {
|
||||
entitiesQuery = entitiesQuery.offset(offset);
|
||||
}
|
||||
|
||||
rows = await entitiesQuery;
|
||||
}
|
||||
|
||||
if (limit !== undefined) {
|
||||
entitiesQuery = entitiesQuery.limit(limit + 1);
|
||||
}
|
||||
if (offset !== undefined) {
|
||||
entitiesQuery = entitiesQuery.offset(offset);
|
||||
}
|
||||
|
||||
let rows = await entitiesQuery;
|
||||
let pageInfo: DbPageInfo;
|
||||
if (limit === undefined || rows.length <= limit) {
|
||||
pageInfo = { hasNextPage: false };
|
||||
@@ -211,6 +236,107 @@ export class DefaultEntitiesCatalog implements EntitiesCatalog {
|
||||
};
|
||||
}
|
||||
|
||||
/**
|
||||
* Two-phase fetch used when the caller has specified an order field.
|
||||
* See entities() for a longer description of the rationale.
|
||||
*/
|
||||
private async runOrderedEntitiesQuery(
|
||||
request: EntitiesRequest,
|
||||
primaryOrder: EntityOrder,
|
||||
limit: number | undefined,
|
||||
offset: number | undefined,
|
||||
): Promise<DbFinalEntitiesRow[]> {
|
||||
const db = this.database;
|
||||
const isPg = db.client.config.client === 'pg';
|
||||
const wantedRows =
|
||||
limit === undefined ? Number.MAX_SAFE_INTEGER : (offset ?? 0) + limit + 1;
|
||||
|
||||
const applyFilter = <T extends object>(
|
||||
query: Knex.QueryBuilder<T>,
|
||||
): Knex.QueryBuilder<T> => {
|
||||
if (!request.filter) {
|
||||
return query;
|
||||
}
|
||||
return applyEntityFilterToQuery({
|
||||
filter: request.filter,
|
||||
targetQuery: query,
|
||||
onEntityIdField: 'final_entities.entity_id',
|
||||
knex: db,
|
||||
});
|
||||
};
|
||||
|
||||
// Phase 1 -- entities that have a non-NULL value for the order field.
|
||||
// Rows where the key exists but value IS NULL (e.g. the entity field is
|
||||
// explicitly null, or exceeded MAX_VALUE_LENGTH in buildEntitySearch) are
|
||||
// excluded here so they fall through to Phase 2 and sort in the same
|
||||
// NULLS-LAST bucket as entities that have no row for the key at all —
|
||||
// preserving the semantics of the previous LEFT JOIN approach.
|
||||
let withField = db('search as order_0')
|
||||
.innerJoin(
|
||||
'final_entities',
|
||||
'final_entities.entity_id',
|
||||
'order_0.entity_id',
|
||||
)
|
||||
.where('order_0.key', primaryOrder.field)
|
||||
.whereNotNull('order_0.value')
|
||||
.whereNotNull('final_entities.final_entity')
|
||||
.select<DbFinalEntitiesRow[]>('final_entities.*');
|
||||
withField = applyFilter(withField);
|
||||
withField = isPg
|
||||
? withField.orderBy([
|
||||
{ column: 'order_0.value', order: primaryOrder.order, nulls: 'last' },
|
||||
{ column: 'final_entities.entity_id', order: 'asc' },
|
||||
])
|
||||
: withField.orderBy([
|
||||
{ column: 'order_0.value', order: undefined, nulls: 'last' },
|
||||
{ column: 'order_0.value', order: primaryOrder.order },
|
||||
{ column: 'final_entities.entity_id', order: 'asc' },
|
||||
]);
|
||||
if (wantedRows < Number.MAX_SAFE_INTEGER) {
|
||||
withField = withField.limit(wantedRows);
|
||||
}
|
||||
const withFieldRows = await withField;
|
||||
|
||||
// If phase 1 already covered everything we asked for, skip the second
|
||||
// phase entirely. This is the common UI case where every entity in the
|
||||
// filtered set has the order field.
|
||||
if (withFieldRows.length >= wantedRows) {
|
||||
const skip = offset ?? 0;
|
||||
return withFieldRows.slice(skip, skip + (limit ?? wantedRows) + 1);
|
||||
}
|
||||
|
||||
// Phase 2 -- entities that lack the order field, appended after.
|
||||
let withoutField = db<DbFinalEntitiesRow>('final_entities')
|
||||
.select<DbFinalEntitiesRow[]>('final_entities.*')
|
||||
.whereNotNull('final_entities.final_entity')
|
||||
.whereNotExists(qb =>
|
||||
qb
|
||||
.from('search')
|
||||
.where('search.entity_id', db.ref('final_entities.entity_id'))
|
||||
.andWhere('search.key', primaryOrder.field)
|
||||
.whereNotNull('search.value'),
|
||||
);
|
||||
withoutField = applyFilter(withoutField);
|
||||
withoutField = withoutField.orderBy(
|
||||
'final_entities.entity_id',
|
||||
'asc', // NULL group always stable-sorted ASC regardless of primary direction
|
||||
);
|
||||
if (limit !== undefined) {
|
||||
// Phase 2 only contributes the rows that phase 1 didn't cover.
|
||||
const remaining =
|
||||
wantedRows - Math.min(withFieldRows.length, (offset ?? 0) + limit + 1);
|
||||
withoutField = withoutField.limit(Math.max(0, remaining));
|
||||
}
|
||||
const withoutFieldRows = await withoutField;
|
||||
|
||||
const combined = [...withFieldRows, ...withoutFieldRows];
|
||||
if (limit === undefined) {
|
||||
return combined.slice(offset ?? 0);
|
||||
}
|
||||
const skip = offset ?? 0;
|
||||
return combined.slice(skip, skip + limit + 1);
|
||||
}
|
||||
|
||||
async entitiesBatch(
|
||||
request: EntitiesBatchRequest,
|
||||
): Promise<EntitiesBatchResponse> {
|
||||
|
||||
Reference in New Issue
Block a user