Document stream-based search
Signed-off-by: Eric Peterson <ericpeterson@spotify.com>
@@ -0,0 +1,14 @@
---
'@backstage/plugin-search-backend-node': minor
'@backstage/search-common': minor
---

The Backstage Search Platform's indexing process has been rewritten as a stream
pipeline in order to improve efficiency and performance on large document sets.

The concepts of `Collator` and `Decorator` have been replaced with readable and
transform object streams (respectively), as well as factory classes to
instantiate them.

Accordingly, the `SearchEngine.index()` method has also been replaced with a
`getIndexer()` factory method that resolves to a writable object stream.
@@ -0,0 +1,6 @@
---
'@backstage/plugin-search-backend-module-pg': minor
---

The `PgSearchEngine` implements the new stream-based indexing process expected
by the latest `@backstage/plugin-search-backend-node`.
@@ -0,0 +1,13 @@
---
'@backstage/plugin-techdocs-backend': patch
---

A `DefaultTechDocsCollatorFactory`, which works with the new stream-based
search indexing subsystem, is now available. The `DefaultTechDocsCollator` will
continue to be available for those unable to upgrade to the stream-based
`@backstage/plugin-search-backend-node` (and related packages); however, it is
now marked as deprecated and will be removed in a future version.

To upgrade this plugin and the search indexing subsystem in one go, check
[this changelog](https://github.com/backstage/backstage/blob/master/packages/create-app/CHANGELOG.md)
for necessary changes to your search backend plugin configuration.
@@ -0,0 +1,43 @@
---
'@backstage/create-app': patch
---

The Backstage Search Platform's indexing process has been rewritten as a stream
pipeline in order to improve efficiency and performance on large document sets.

To take advantage of this, upgrade to the latest version of
`@backstage/plugin-search-backend-node`, as well as any backend plugins whose
collators you are using. Then, make the following changes to your
`packages/backend/src/plugins/search.ts` file:

```diff
-import { DefaultCatalogCollator } from '@backstage/plugin-catalog-backend';
-import { DefaultTechDocsCollator } from '@backstage/plugin-techdocs-backend';
+import { DefaultCatalogCollatorFactory } from '@backstage/plugin-catalog-backend';
+import { DefaultTechDocsCollatorFactory } from '@backstage/plugin-techdocs-backend';

 // ...

 const indexBuilder = new IndexBuilder({ logger, searchEngine });

 indexBuilder.addCollator({
   defaultRefreshIntervalSeconds: 600,
-  collator: DefaultCatalogCollator.fromConfig(config, { discovery }),
+  factory: DefaultCatalogCollatorFactory.fromConfig(config, { discovery }),
 });

 indexBuilder.addCollator({
   defaultRefreshIntervalSeconds: 600,
-  collator: DefaultTechDocsCollator.fromConfig(config, {
+  factory: DefaultTechDocsCollatorFactory.fromConfig(config, {
     discovery,
     logger,
   }),
 });
```

If you've written custom collators, decorators, or search engines in your
Backstage backend instance, you will need to re-implement them as readable,
transform, and writable streams respectively (including factory classes for
instantiating them). [A how-to guide for refactoring](https://backstage.io/docs/features/search/how-to-guides#rewriting-alpha-style-collators-for-beta)
existing implementations is available.
@@ -0,0 +1,6 @@
---
'@backstage/plugin-search-backend-module-elasticsearch': minor
---

The `ElasticSearchSearchEngine` implements the new stream-based indexing
process expected by the latest `@backstage/plugin-search-backend-node`.
@@ -0,0 +1,13 @@
---
'@backstage/plugin-catalog-backend': patch
---

A `DefaultCatalogCollatorFactory`, which works with the new stream-based
search indexing subsystem, is now available. The `DefaultCatalogCollator` will
continue to be available for those unable to upgrade to the stream-based
`@backstage/plugin-search-backend-node` (and related packages); however, it is
now marked as deprecated and will be removed in a future version.

To upgrade this plugin and the search indexing subsystem in one go, check
[this changelog](https://github.com/backstage/backstage/blob/master/packages/create-app/CHANGELOG.md)
for necessary changes to your search backend plugin configuration.
@@ -213,6 +213,7 @@ parallelization
 Patrik
 Peloton
 performant
+Performant
 plantuml
 Platformize
 Podman
@@ -54,13 +54,14 @@ An index is a collection of such documents of a given type.
 ### Collators

 You need to be able to search something! Collators are the way to define what
-can be searched. Specifically, they're classes which return documents conforming
-to a minimum set of fields (including a document title, location, and text), but
-which can contain any other fields as defined by the collator itself. One
-collator is responsible for defining and collecting documents of a type.
+can be searched. Specifically, they're readable object streams of documents that
+conform to a minimum set of fields (including a document title, location, and
+text), but which can contain any other fields as defined by the collator itself.
+One collator is responsible for defining and collecting documents of a type.

-Some plugins, like the Catalog Backend, provide so-called "default" collators
-which you can use out-of-the-box to start searching across Backstage quickly.
+Some plugins, like the Catalog Backend, provide so-called "default" collator
+factories which you can use out-of-the-box to start searching across Backstage
+quickly.

 ### Decorators

@@ -68,9 +69,15 @@ Sometimes you want to add extra information to a set of documents in your search
 index that the collator may not be aware of. For example, the Software Catalog
 knows about software entities, but it may not know about their usage or quality.

-Decorators are classes which can add extra fields to pre-collated documents.
-This extra metadata could then be used to bias search results or otherwise
-improve the search experience in your Backstage instance.
+Decorators are transform streams which sit between a collator (read stream) and
+an indexer (write stream) during the indexing process. They can be used to add
+extra fields to documents as they are being collated and indexed. This extra
+metadata could then be used to bias search results or otherwise improve the
+search experience in your Backstage instance.
+
+In addition to adding extra metadata, decorators (like any transform stream) can
+also be used to remove metadata, filter out documents, or even add extra
+documents at index time.

 ### The Scheduler

@@ -48,10 +48,10 @@ const app = createApp({
 ## How to index TechDocs documents

 The TechDocs plugin has supported integrations to Search, meaning that it
-provides a default collator ready to be used.
+provides a default collator factory ready to be used.

 The purpose of this guide is to walk you through how to register the
-[DefaultTechDocsCollator](https://github.com/backstage/backstage/blob/master/plugins/techdocs-backend/src/search/DefaultTechDocsCollator.ts)
+[DefaultTechDocsCollatorFactory](https://github.com/backstage/backstage/blob/master/plugins/techdocs-backend/src/search/DefaultTechDocsCollatorFactory.ts)
 in your App, so that you can get TechDocs documents indexed.

 If you have been through the
@@ -60,18 +60,19 @@ you should have the `packages/backend/src/plugins/search.ts` file available. If
 so, you can go ahead and follow this guide - if not, start by going through the
 getting started guide.

-1. Import the DefaultTechDocsCollator from `@backstage/plugin-techdocs-backend`.
+1. Import the `DefaultTechDocsCollatorFactory` from
+   `@backstage/plugin-techdocs-backend`.

 ```typescript
-import { DefaultTechDocsCollator } from '@backstage/plugin-techdocs-backend';
+import { DefaultTechDocsCollatorFactory } from '@backstage/plugin-techdocs-backend';
 ```

-2. Register the DefaultTechDocsCollator with the IndexBuilder.
+2. Register the `DefaultTechDocsCollatorFactory` with the IndexBuilder.

 ```typescript
 indexBuilder.addCollator({
   defaultRefreshIntervalSeconds: 600,
-  collator: DefaultTechDocsCollator.fromConfig(config, {
+  factory: DefaultTechDocsCollatorFactory.fromConfig(config, {
     discovery,
     logger,
     tokenManager,
@@ -131,3 +132,264 @@ indexBuilder.addCollator({
As shown above, you can add a catalog entity filter to narrow down what catalog
entities are indexed by the search engine.

## How to migrate from Search Alpha to Beta

For the purposes of this guide, Search Beta version is defined as:

- **Search Plugin**: At least `v0.x.y`
- **Search Backend Plugin**: At least `v0.x.y`
- **Search Backend Node**: At least `v0.x.y`

In the Beta version, the Search Platform's indexing process has been rewritten
as a stream pipeline in order to improve efficiency and performance on large
sets of documents.
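
To make the shape of such a pipeline concrete, here is a minimal, self-contained sketch built only on Node.js core streams. It illustrates the collator, decorator, and indexer stages flowing into one another; it is not the Search Platform's actual implementation, and `WidgetDocument`, `createCollator`, `createDecorator`, `createIndexer`, and `runIndexing` are all hypothetical names.

```typescript
import { Readable, Transform, Writable, pipeline } from 'stream';

// Hypothetical document shape, used for illustration only.
interface WidgetDocument {
  title: string;
  location: string;
  text: string;
  owner?: string;
}

// Collator stage: an object-mode readable stream of documents.
function createCollator(docs: WidgetDocument[]): Readable {
  return Readable.from(docs);
}

// Decorator stage: an object-mode transform that adds a field to each document.
function createDecorator(): Transform {
  return new Transform({
    objectMode: true,
    transform(doc: WidgetDocument, _encoding, callback) {
      callback(null, { ...doc, owner: 'team-widgets' });
    },
  });
}

// Indexer stage: an object-mode writable. Here it only collects documents in
// memory; a real indexer would write them to a search engine.
function createIndexer(sink: WidgetDocument[]): Writable {
  return new Writable({
    objectMode: true,
    write(doc: WidgetDocument, _encoding, callback) {
      sink.push(doc);
      callback();
    },
  });
}

// Wire the three stages together. Resolves once every document is written.
function runIndexing(
  docs: WidgetDocument[],
  sink: WidgetDocument[],
): Promise<void> {
  return new Promise((resolve, reject) => {
    pipeline(createCollator(docs), createDecorator(), createIndexer(sink), err =>
      err ? reject(err) : resolve(),
    );
  });
}
```

Because each stage handles one document at a time, no stage needs to hold the full document set in memory.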

If you've not yet extended the Search Platform with custom code, and have
instead taken advantage of default collators, decorators, and search engines
provided by existing plugins, the migration process is fairly straightforward:

1. Upgrade to at least version `0.x.y` of
   `@backstage/plugin-search-backend-node`, as well as any backend plugins whose
   collators you are using (e.g. at least version `0.x.y` of
   `@backstage/plugin-catalog-backend` and/or version `0.x.y` of
   `@backstage/plugin-techdocs-backend`).
2. Then, make the following changes to your
   `packages/backend/src/plugins/search.ts` file:

   ```diff
   -import { DefaultCatalogCollator } from '@backstage/plugin-catalog-backend';
   -import { DefaultTechDocsCollator } from '@backstage/plugin-techdocs-backend';
   +import { DefaultCatalogCollatorFactory } from '@backstage/plugin-catalog-backend';
   +import { DefaultTechDocsCollatorFactory } from '@backstage/plugin-techdocs-backend';
    // ...
    const indexBuilder = new IndexBuilder({ logger, searchEngine });
    indexBuilder.addCollator({
      defaultRefreshIntervalSeconds: 600,
   -  collator: DefaultCatalogCollator.fromConfig(config, { discovery }),
   +  factory: DefaultCatalogCollatorFactory.fromConfig(config, { discovery }),
    });
    indexBuilder.addCollator({
      defaultRefreshIntervalSeconds: 600,
   -  collator: DefaultTechDocsCollator.fromConfig(config, {
   +  factory: DefaultTechDocsCollatorFactory.fromConfig(config, {
        discovery,
        logger,
      }),
    });
   ```

Any custom collators, decorators, or search engine implementations will require
minor refactoring. Continue on for details.

### Rewriting alpha-style collators for beta

In alpha versions of the Backstage Search Platform, collators were classes that
implemented an `execute` method which resolved an `IndexableDocument` array.

In beta versions, the logic encapsulated by the aforementioned `execute` method
is contained within an [object-mode][obj-mode] `Readable` stream where each
object pushed onto the stream is of type `IndexableDocument`. Instances of this
stream are instantiated by a factory class conforming to the
`DocumentCollatorFactory` interface.

The optimal conversion strategy will vary depending on the collator's logic, but
the simplest conversion can follow a process like this:

1. Rename your collator class to something like `YourCollatorFactory` and update
   it to implement `DocumentCollatorFactory` instead of `DocumentCollator`.
2. Turn its `execute` method into an async generator that returns
   `AsyncGenerator<YourIndexableDocument>` instead of resolving
   `YourIndexableDocument[]`.
3. Implement `DocumentCollatorFactory`'s `getCollator` method so that it
   resolves to `Readable.from(this.execute())`, a utility for creating
   [readable streams][read-stream] from [async generators][async-gen].

```ts
import { DocumentCollatorFactory } from '@backstage/plugin-search-backend-node';
import { Readable } from 'stream';

export class YourCollatorFactory implements DocumentCollatorFactory {
  public readonly type: string = 'your-type';

  // An async generator: documents are yielded one at a time.
  async *execute(): AsyncGenerator<YourIndexableDocument> {
    // `this.client` stands in for whatever client your collator already uses.
    const widgets = await this.client.getWidgets();
    for (const widget of widgets) {
      yield {
        title: widget.name,
        location: widget.url,
        text: widget.description,
      };
    }
  }

  async getCollator() {
    return Readable.from(this.execute());
  }
}
```
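
The example above still fetches every widget before yielding the first document. If your data source happens to be paginated, a generator along the lines of the following sketch could yield documents one page at a time instead. `fetchPage`, `Page`, `WidgetDocument`, and the cursor-based paging scheme are assumptions made for illustration; adapt them to whatever your client actually exposes.

```typescript
import { Readable } from 'stream';

// Hypothetical document and page shapes, used for illustration only.
interface WidgetDocument {
  title: string;
  location: string;
  text: string;
}
type Page = { items: WidgetDocument[]; nextCursor?: string };
type FetchPage = (cursor?: string) => Promise<Page>;

// Yields documents page by page, so only one page is held in memory at a time.
async function* collate(fetchPage: FetchPage): AsyncGenerator<WidgetDocument> {
  let cursor: string | undefined;
  do {
    const page = await fetchPage(cursor);
    yield* page.items;
    cursor = page.nextCursor;
  } while (cursor !== undefined);
}

// Wraps the generator in an object-mode readable stream.
function getCollator(fetchPage: FetchPage): Readable {
  return Readable.from(collate(fetchPage));
}
```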

Note: it may be possible to simplify your collator dramatically! If your custom
collator was previously using streams under the hood (for example, by reading
newline delimited JSON from a local or remote file), you could just expose the
stream directly via a simple factory class:

```ts
import { DocumentCollatorFactory } from '@backstage/plugin-search-backend-node';
import { createReadStream } from 'fs';
import { parse } from '@jsonlines/core';

export class YourCollatorFactory implements DocumentCollatorFactory {
  public readonly type: string = 'your-type';

  async getCollator() {
    const parseStream = parse();
    return createReadStream('./documents.ndjson').pipe(parseStream);
  }
}
```

### Rewriting alpha-style decorators for beta

In alpha versions of the Backstage Search Platform, decorators were classes that
implemented an `execute` method which took an `IndexableDocument` array as an
argument, and resolved a modified array of the same type.

In beta versions, the logic encapsulated by the aforementioned `execute` method
is contained within an object-mode `Transform` stream which reads objects of
type `IndexableDocument`, and writes objects of a conforming type. Similar to
collators, instances of this stream are instantiated by a factory class
conforming to the `DocumentDecoratorFactory` interface.

Although you can choose to implement a `Transform` stream from scratch, the
`@backstage/plugin-search-backend-node` package provides a `DecoratorBase` class
in order to simplify the developer experience. With this base class, all that's
needed is to transfer your old decorator class logic into the base class' three
methods (`initialize`, `decorate`, and `finalize`), and implement the factory
class that instantiates the stream:

```ts
import { DecoratorBase } from '@backstage/plugin-search-backend-node';

export class YourDecorator extends DecoratorBase {
  async initialize() {
    // Setup logic. Performed once before any documents are consumed.
  }

  async decorate(
    document: YourIndexableDocument,
  ): Promise<YourIndexableDocument | YourIndexableDocument[] | undefined> {
    // Perform transformation logic here.
    return document;
  }

  async finalize() {
    // Teardown logic. Performed once after all documents have been consumed.
  }
}

export class YourDecoratorFactory implements DocumentDecoratorFactory {
  async getDecorator() {
    return new YourDecorator();
  }
}
```

Note the return type of the `decorate` method and how each of its variants can
be used to different effect.

- By resolving a single `YourIndexableDocument` object, your decorator can be
  used to make simple transformations:

  ```ts
  class BooleanWidgetCoolnessDecorator extends DecoratorBase {
    async decorate(widget) {
      // Perform a simple, 1:1 transformation.
      widget.isCool = widget.isCool === 'true';
      return widget;
    }
  }
  ```

- By resolving `undefined`, your decorator can filter out documents which
  shouldn't be in the index:

  ```ts
  class OnlyCoolWidgetsDecorator extends DecoratorBase {
    async decorate(widget) {
      // Perform a simple filter operation.
      return widget.isCool ? widget : undefined;
    }
  }
  ```

- By resolving an array of `YourIndexableDocument` objects, you can generate
  multiple documents based on the content of one:

  ```ts
  class WidgetByVariantDecorator extends DecoratorBase {
    async decorate(widget) {
      // Generate one widget doc per widget variant.
      return widget.variants.map(variant => {
        // Each widget doc is the given widget plus a "variant" property
        // pulled from a widget.variants string array.
        return {
          ...widget,
          variant,
        };
      });
    }
  }
  ```

In alpha versions, a decorator had access to every `IndexableDocument`
simultaneously. This is no longer possible in beta versions (precisely to make
the indexing process more efficient and performant). You will need to modify
your decorator's logic so that it does not need access to every document at
once.
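
One possible adjustment is to replace whole-array logic with a small amount of state that is carried from one document to the next. As a sketch of that idea, the following decorator drops documents whose `location` has already been seen, remembering only the locations rather than the documents themselves. It is written against Node.js' `Transform` directly so that it is self-contained; `DedupeByLocationDecorator` and `WidgetDocument` are hypothetical names, and the same logic could equally live in the `decorate` method of a `DecoratorBase` subclass.

```typescript
import { Transform, TransformCallback } from 'stream';

// Hypothetical document shape, used for illustration only.
interface WidgetDocument {
  title: string;
  location: string;
  text: string;
}

// Drops documents whose location was already seen earlier in the stream.
class DedupeByLocationDecorator extends Transform {
  // Only the locations are retained, never the documents themselves.
  private readonly seen = new Set<string>();

  constructor() {
    super({ objectMode: true });
  }

  _transform(
    doc: WidgetDocument,
    _encoding: BufferEncoding,
    callback: TransformCallback,
  ): void {
    if (this.seen.has(doc.location)) {
      // Emit nothing: the duplicate is filtered out of the stream.
      callback();
      return;
    }
    this.seen.add(doc.location);
    callback(null, doc);
  }
}
```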

### Rewriting alpha-style search engines for beta

Search engines are responsible for both querying documents in, and indexing
documents to, an underlying search engine technology. While the search engine
query interface didn't change between alpha and beta versions, the indexing half
of the interface _did_ change.

In alpha versions of the Backstage Search Platform, a search engine implemented
an `index` method which took a `type` and an `IndexableDocument` array and was
responsible for writing these documents to the underlying search engine.

In beta versions, the logic encapsulated by the aforementioned `index` method is
contained within an object-mode `Writable` stream which expects objects of type
`IndexableDocument`. On the search engine class itself, the `index` method is
replaced with a `getIndexer` factory method which still takes the `type`, but
resolves an instance of the aforementioned `Writable` stream.

Although you can choose to implement a `Writable` stream from scratch, the
`@backstage/plugin-search-backend-node` package provides a
`BatchSearchEngineIndexer` class in order to simplify the developer experience.
With this base class, which collects documents in batches of a configurable size
on your behalf, all that's needed is to transfer your old `index` method logic
into the base class' three methods (`initialize`, `index`, and `finalize`), and
implement the factory method that instantiates the stream:

```ts
import { BatchSearchEngineIndexer } from '@backstage/plugin-search-backend-node';
import { SearchEngine } from '@backstage/search-common';

export class YourSearchEngineIndexer extends BatchSearchEngineIndexer {
  // Named `client` so that it does not clash with the `index` method below.
  private readonly client: SomeSearchEngineIndex;

  constructor({ type }: { type: string }) {
    // Customize the number of documents passed to the index method per batch.
    super({ batchSize: 500 });

    // An imaginary search engine indexing client.
    this.client = new SomeSearchEngineIndex({ indexName: type });
  }

  async initialize() {
    // Setup logic. Performed once before any documents are consumed.
  }

  async index(documents: IndexableDocument[]) {
    await this.client.batchOf(documents);
  }

  async finalize() {
    // Teardown logic. Performed once after all documents have been consumed.
  }
}

export class YourSearchEngine implements SearchEngine {
  async getIndexer(type: string) {
    return new YourSearchEngineIndexer({ type });
  }
}
```
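
For comparison, here is a rough sketch of what a from-scratch batching `Writable` might look like. It is not how `BatchSearchEngineIndexer` is implemented; `BatchingIndexer`, `flushBatch`, and `Doc` are hypothetical names used only to show the batching idea (buffer documents in `_write`, then flush any remainder in `_final`).

```typescript
import { Writable } from 'stream';

// Hypothetical document shape, used for illustration only.
type Doc = { title: string; location: string; text: string };
type Done = (error?: Error | null) => void;

class BatchingIndexer extends Writable {
  private batch: Doc[] = [];

  constructor(
    private readonly batchSize: number,
    // Hypothetical callback that sends one batch to the search engine.
    private readonly flushBatch: (docs: Doc[]) => Promise<void>,
  ) {
    super({ objectMode: true });
  }

  _write(doc: Doc, _encoding: BufferEncoding, callback: Done): void {
    this.batch.push(doc);
    if (this.batch.length < this.batchSize) {
      callback();
      return;
    }
    // The batch is full: send it before accepting more documents.
    this.flushPending().then(() => callback(), callback);
  }

  // Called once the upstream has ended; sends any partially filled batch.
  _final(callback: Done): void {
    this.flushPending().then(() => callback(), callback);
  }

  private async flushPending(): Promise<void> {
    if (this.batch.length === 0) return;
    const docs = this.batch;
    this.batch = [];
    await this.flushBatch(docs);
  }
}
```

Delaying the `_write` callback until the batch has been sent is what lets the stream apply backpressure to the collator and decorators upstream.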

[obj-mode]: https://nodejs.org/docs/latest-v14.x/api/stream.html#stream_object_mode
[read-stream]: https://nodejs.org/docs/latest-v14.x/api/stream.html#stream_readable_streams
[async-gen]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/for-await...of#iterating_over_async_generators