diff --git a/docs/integrations/github/discovery--old.md b/docs/integrations/github/discovery--old.md new file mode 100644 index 0000000000..b94cb256c4 --- /dev/null +++ b/docs/integrations/github/discovery--old.md @@ -0,0 +1,376 @@ +--- +id: discovery--old +title: GitHub Discovery +sidebar_label: Discovery +# prettier-ignore +description: Automatically discovering catalog entities from repositories in a GitHub organization +--- + +:::info +This documentation is written for the old backend system, which has been replaced by [the new backend system](../../backend-system/index.md), the default since Backstage [version 1.24](../../releases/v1.24.0.md). If you have migrated to the new backend system, you may want to read [its own article](./discovery.md) instead. Otherwise, [consider migrating](../../backend-system/building-backends/08-migrating.md)! +::: + +## GitHub Provider + +The GitHub integration has a discovery provider for discovering catalog +entities within a GitHub organization. The provider will crawl the GitHub +organization and register entities matching the configured path. This can be +useful as an alternative to static locations or manually adding things to the +catalog. This is the preferred method for ingesting entities into the catalog. + +## Installation without Events Support + +You will have to add the provider in the catalog initialization code of your +backend. It is not installed by default, so you have to add a +dependency on `@backstage/plugin-catalog-backend-module-github` to your backend +package. 
+ +```bash +# From your Backstage root directory +yarn --cwd packages/backend add @backstage/plugin-catalog-backend-module-github +``` + +And then add the entity provider to your catalog builder: + +```ts title="packages/backend/src/plugins/catalog.ts" +/* highlight-add-next-line */ +import { GithubEntityProvider } from '@backstage/plugin-catalog-backend-module-github'; + +export default async function createPlugin( + env: PluginEnvironment, +): Promise<Router> { + const builder = await CatalogBuilder.create(env); + /* highlight-add-start */ + builder.addEntityProvider( + GithubEntityProvider.fromConfig(env.config, { + logger: env.logger, + scheduler: env.scheduler, + }), + ); + /* highlight-add-end */ + + // .. +} +``` + +## Installation with Events Support + +_For the legacy backend system, please read the sub-section below._ + +The catalog module for GitHub comes with events support enabled. +This will make it subscribe to its relevant topics (`github.push`) +and expect these events to be published via the `EventsService`. + +Additionally, you should install the +[event router by `events-backend-module-github`](https://github.com/backstage/backstage/tree/master/plugins/events-backend-module-github/README.md) +which will route received events from the generic topic `github` to more specific ones +based on the event type (e.g., `github.push`). + +In order to receive webhook events from GitHub, you have to decide how you want them +to be ingested into Backstage and published to its `EventsService`. 
+You can decide between the following options (extensible): + +- [via HTTP endpoint](https://github.com/backstage/backstage/tree/master/plugins/events-backend/README.md) +- [via an AWS SQS queue](https://github.com/backstage/backstage/tree/master/plugins/events-backend-module-aws-sqs/README.md) + +### Legacy Backend System + +Please follow the installation instructions at + +- +- + +Additionally, you need to decide how you want to receive events from external sources like + +- [via HTTP endpoint](https://github.com/backstage/backstage/tree/master/plugins/events-backend/README.md) +- [via an AWS SQS queue](https://github.com/backstage/backstage/tree/master/plugins/events-backend-module-aws-sqs/README.md) + +Set up your provider + +```ts title="packages/backend/src/plugins/catalog.ts" +import { CatalogBuilder } from '@backstage/plugin-catalog-backend'; +/* highlight-add-next-line */ +import { GithubEntityProvider } from '@backstage/plugin-catalog-backend-module-github'; +import { ScaffolderEntitiesProcessor } from '@backstage/plugin-scaffolder-backend'; +import { Router } from 'express'; +import { PluginEnvironment } from '../types'; + +export default async function createPlugin( + env: PluginEnvironment, +): Promise<Router> { + const builder = await CatalogBuilder.create(env); + builder.addProcessor(new ScaffolderEntitiesProcessor()); + /* highlight-add-start */ + const githubProvider = GithubEntityProvider.fromConfig(env.config, { + events: env.events, + logger: env.logger, + scheduler: env.scheduler, + }); + builder.addEntityProvider(githubProvider); + /* highlight-add-end */ + const { processingEngine, router } = await builder.build(); + await processingEngine.start(); + return router; +} +``` + +You can check the official docs to [configure your webhook](https://docs.github.com/en/developers/webhooks-and-events/webhooks/creating-webhooks) and to [secure your request](https://docs.github.com/en/developers/webhooks-and-events/webhooks/securing-your-webhooks). 
The webhook will need to be configured to forward `push` events. + +## Configuration + +To use the discovery provider, you'll need a GitHub integration +[set up](locations.md) with either a [Personal Access Token](../../getting-started/config/authentication.md) or [GitHub Apps](./github-apps.md). For Personal Access Tokens you should pay attention to the [required scopes](https://backstage.io/docs/integrations/github/locations/#token-scopes), where you will need at least the `repo` scope for reading components. For GitHub Apps you will need to grant it the [required permissions](https://backstage.io/docs/integrations/github/github-apps#app-permissions) instead, where you will need at least the `Contents: Read-only` permissions for reading components. + +Then you can add a `github` config to the catalog providers configuration: + +```yaml +catalog: + providers: + github: + # the provider ID can be any camelCase string + providerId: + organization: 'backstage' # string + catalogPath: '/catalog-info.yaml' # string + filters: + branch: 'main' # string + repository: '.*' # Regex + schedule: # same options as in TaskScheduleDefinition + # supports cron, ISO duration, "human duration" as used in code + frequency: { minutes: 30 } + # supports ISO duration, "human duration" as used in code + timeout: { minutes: 3 } + customProviderId: + organization: 'new-org' # string + catalogPath: '/custom/path/catalog-info.yaml' # string + filters: # optional filters + branch: 'develop' # optional string + repository: '.*' # optional Regex + wildcardProviderId: + organization: 'new-org' # string + catalogPath: '/groups/**/*.yaml' # this will search all folders for files that end in .yaml + filters: # optional filters + branch: 'develop' # optional string + repository: '.*' # optional Regex + topicProviderId: + organization: 'backstage' # string + catalogPath: '/catalog-info.yaml' # string + filters: + branch: 'main' # string + repository: '.*' # Regex + topic: 'backstage-exclude' # 
optional string + topicFilterProviderId: + organization: 'backstage' # string + catalogPath: '/catalog-info.yaml' # string + filters: + branch: 'main' # string + repository: '.*' # Regex + topic: + include: ['backstage-include'] # optional array of strings + exclude: ['experiments'] # optional array of strings + validateLocationsExist: + organization: 'backstage' # string + catalogPath: '/catalog-info.yaml' # string + filters: + branch: 'main' # string + repository: '.*' # Regex + validateLocationsExist: true # optional boolean + visibilityProviderId: + organization: 'backstage' # string + catalogPath: '/catalog-info.yaml' # string + filters: + visibility: + - public + - internal + enterpriseProviderId: + host: ghe.example.net + organization: 'backstage' # string + catalogPath: '/catalog-info.yaml' # string +``` + +This provider supports multiple organizations via unique provider IDs. + +> **Note:** It is possible but certainly not recommended to skip the provider ID level. +> If you do so, `default` will be used as provider ID. + +- **`catalogPath`** _(optional)_: + Default: `/catalog-info.yaml`. + Path where to look for `catalog-info.yaml` files. + You can use wildcards - `*` or `**` - to search the path and/or the filename. + Wildcards cannot be used if the `validateLocationsExist` option is set to `true`. +- **`filters`** _(optional)_: + - **`branch`** _(optional)_: + String used to filter results based on the branch name. + - **`repository`** _(optional)_: + Regular expression used to filter results based on the repository name. + - **`topic`** _(optional)_: + Both of the filters below may be used at the same time but the exclusion filter has the highest priority. + In the example above, a repository with the `backstage-include` topic would still be excluded + if it were also carrying the `experiments` topic. + - **`include`** _(optional)_: + An array of strings used to filter in results based on their associated GitHub topics. 
+ If configured, only repositories with one (or more) topic(s) present in the inclusion filter will be ingested. + - **`exclude`** _(optional)_: + An array of strings used to filter out results based on their associated GitHub topics. + If configured, all repositories _except_ those with one (or more) topic(s) present in the exclusion filter will be ingested. + - **`visibility`** _(optional)_: + An array of strings used to filter results based on their visibility. Available options are `private`, `internal`, `public`. If configured (non-empty), only repositories with a visibility present in the filter will be ingested. +- **`host`** _(optional)_: + The hostname of your GitHub Enterprise instance. It must match a host defined in [integrations.github](locations.md). +- **`organization`**: + Name of your organization account/workspace. + If you want to add multiple organizations, you need to add one provider config each. +- **`validateLocationsExist`** _(optional)_: + Whether to validate that locations exist before emitting them. + This option avoids generating locations for catalog info files that do not exist in the source repository. + Defaults to `false`. + Due to limitations in the GitHub API's ability to query for repository objects, this option cannot be used in + conjunction with wildcards in the `catalogPath`. +- **`schedule`**: + - **`frequency`**: + How often you want the task to run. The system does its best to avoid overlapping invocations. + - **`timeout`**: + The maximum amount of time that a single task invocation can take. + - **`initialDelay`** _(optional)_: + The amount of time that should pass before the first invocation happens. + - **`scope`** _(optional)_: + `'global'` or `'local'`. Sets the scope of concurrency control. + +## GitHub API Rate Limits + +GitHub [rate limits](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting) API requests to 5,000 per hour (or more for Enterprise +accounts). 
The snippet below refreshes the Backstage catalog data every 35 minutes, which issues an API request for each discovered location. + +If your requests are too frequent then you may get throttled by +rate limiting. You can change the refresh frequency of the catalog in your `app-config.yaml` file by controlling the `schedule`. + +```yaml +schedule: + frequency: { minutes: 35 } + timeout: { minutes: 3 } +``` + +More information about scheduling can be found on the [TaskScheduleDefinition](https://backstage.io/docs/reference/backend-tasks.taskscheduledefinition) page. + +Alternatively, or additionally, you can configure [github-apps](github-apps.md) authentication +which carries a much higher rate limit at GitHub. + +This is true for any method of adding GitHub entities to the catalog, but +especially easy to hit with automatic discovery. + +## GitHub Processor (To Be Deprecated) + +The GitHub integration has a special discovery processor for discovering catalog +entities within a GitHub organization. The processor will crawl the GitHub +organization and register entities matching the configured path. This can be +useful as an alternative to static locations or manually adding things to the +catalog. + +## Installation + +You will have to add the processors in the catalog initialization code of your +backend. 
They are not installed by default, therefore you have to add a +dependency on `@backstage/plugin-catalog-backend-module-github` to your backend +package, plus `@backstage/integration` for the basic credentials management: + +```bash +# From your Backstage root directory +yarn --cwd packages/backend add @backstage/integration @backstage/plugin-catalog-backend-module-github +``` + +And then add the processors to your catalog builder: + +```ts title="packages/backend/src/plugins/catalog.ts" +/* highlight-add-start */ +import { + GithubDiscoveryProcessor, + GithubOrgReaderProcessor, +} from '@backstage/plugin-catalog-backend-module-github'; +import { + ScmIntegrations, + DefaultGithubCredentialsProvider, +} from '@backstage/integration'; +/* highlight-add-end */ + +export default async function createPlugin( + env: PluginEnvironment, +): Promise<Router> { + const builder = await CatalogBuilder.create(env); + /* highlight-add-start */ + const integrations = ScmIntegrations.fromConfig(env.config); + const githubCredentialsProvider = + DefaultGithubCredentialsProvider.fromIntegrations(integrations); + builder.addProcessor( + GithubDiscoveryProcessor.fromConfig(env.config, { + logger: env.logger, + githubCredentialsProvider, + }), + GithubOrgReaderProcessor.fromConfig(env.config, { + logger: env.logger, + githubCredentialsProvider, + }), + ); + /* highlight-add-end */ + + // .. +} +``` + +## Configuration + +To use the discovery processor, you'll need a GitHub integration +[set up](locations.md) with either a [Personal Access Token](../../getting-started/config/authentication.md) or [GitHub Apps](./github-apps.md). 
+ +Then you can add a location target to the catalog configuration: + +```yaml +catalog: + locations: + # (since 0.13.5) Scan all repositories for a catalog-info.yaml in the root of the default branch + - type: github-discovery + target: https://github.com/myorg + # Or use a custom pattern for a subset of all repositories with default repository + - type: github-discovery + target: https://github.com/myorg/service-*/blob/-/catalog-info.yaml + # Or use a custom file format and location + - type: github-discovery + target: https://github.com/*/blob/-/docs/your-own-format.yaml + # Or use a specific branch-name + - type: github-discovery + target: https://github.com/*/blob/backstage-docs/catalog-info.yaml +``` + +Note the `github-discovery` type, as this is not a regular `url` processor. + +When using a custom pattern, the target is composed of three parts: + +- The base organization URL, `https://github.com/myorg` in this case +- The repository blob to scan, which accepts \* wildcard tokens. This can simply + be `*` to scan all repositories in the organization. This example only looks + for repositories prefixed with `service-`. +- The path within each repository to find the catalog YAML file. This will + usually be `/blob/main/catalog-info.yaml`, `/blob/master/catalog-info.yaml` or + a similar variation for catalog files stored in the root directory of each + repository. You could also use a dash (`-`) for referring to the default + branch. + +## GitHub API Rate Limits + +GitHub [rate limits](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting) API requests to 5,000 per hour (or more for Enterprise +accounts). The default Backstage catalog backend refreshes data every 100 +seconds, which issues an API request for each discovered location. + +This means if you have more than ~140 catalog entities, you may get throttled by +rate limiting. 
You can change the refresh rate of the catalog in your `packages/backend/src/plugins/catalog.ts` file: + +```typescript +const builder = await CatalogBuilder.create(env); + +// For example, to refresh every 5 minutes (300 seconds). +builder.setProcessingIntervalSeconds(300); +``` + +Alternatively, or additionally, you can configure [github-apps](github-apps.md) authentication +which carries a much higher rate limit at GitHub. + +This is true for any method of adding GitHub entities to the catalog, but +especially easy to hit with automatic discovery. diff --git a/docs/integrations/github/discovery.md b/docs/integrations/github/discovery.md index d0721e06f6..f16d1c17b2 100644 --- a/docs/integrations/github/discovery.md +++ b/docs/integrations/github/discovery.md @@ -6,6 +6,10 @@ sidebar_label: Discovery description: Automatically discovering catalog entities from repositories in a GitHub organization --- +:::info +This documentation is written for [the new backend system](../../backend-system/index.md) which is the default since Backstage [version 1.24](../../releases/v1.24.0.md). If you are still on the old backend system, you may want to read [its own article](./discovery--old.md) instead, and [consider migrating](../../backend-system/building-backends/08-migrating.md)! +::: + ## GitHub Provider The GitHub integration has a discovery provider for discovering catalog @@ -14,10 +18,9 @@ organization and register entities matching the configured path. This can be useful as an alternative to static locations or manually adding things to the catalog. This is the preferred method for ingesting entities into the catalog. -## Installation without Events Support +## Installation -You will have to add the provider in the catalog initialization code of your -backend. 
They are not installed by default, therefore you have to add a +The GitHub entity provider is not installed by default, so you have to add a dependency on `@backstage/plugin-catalog-backend-module-github` to your backend package. @@ -29,13 +32,12 @@ yarn --cwd packages/backend add @backstage/plugin-catalog-backend-module-github And then update your backend by adding the following line: ```ts title="packages/backend/src/index.ts" -// github discovery +backend.add(import('@backstage/plugin-catalog-backend/alpha')); +/* highlight-add-next-line */ backend.add(import('@backstage/plugin-catalog-backend-module-github/alpha')); ``` -## Installation with Events Support - -_For the legacy backend system, please read the sub-section below._ +## Events Support The catalog module for GitHub comes with events support enabled. This will make it subscribe to its relevant topics (`github.push`) @@ -53,47 +55,6 @@ You can decide between the following options (extensible): - [via HTTP endpoint](https://github.com/backstage/backstage/tree/master/plugins/events-backend/README.md) - [via an AWS SQS queue](https://github.com/backstage/backstage/tree/master/plugins/events-backend-module-aws-sqs/README.md) -### Legacy Backend System - -Please follow the installation instructions at - -- -- - -Additionally, you need to decide how you want to receive events from external sources like - -- [via HTTP endpoint](https://github.com/backstage/backstage/tree/master/plugins/events-backend/README.md) -- [via an AWS SQS queue](https://github.com/backstage/backstage/tree/master/plugins/events-backend-module-aws-sqs/README.md) - -Set up your provider - -```ts title="packages/backend/src/plugins/catalog.ts" -import { CatalogBuilder } from '@backstage/plugin-catalog-backend'; -/* highlight-add-next-line */ -import { GithubEntityProvider } from '@backstage/plugin-catalog-backend-module-github'; -import { ScaffolderEntitiesProcessor } from 
'@backstage/plugin-scaffolder-backend'; -import { Router } from 'express'; -import { PluginEnvironment } from '../types'; - -export default async function createPlugin( - env: PluginEnvironment, -): Promise { - const builder = await CatalogBuilder.create(env); - builder.addProcessor(new ScaffolderEntitiesProcessor()); - /* highlight-add-start */ - const githubProvider = GithubEntityProvider.fromConfig(env.config, { - events: env.events, - logger: env.logger, - scheduler: env.scheduler, - }); - builder.addEntityProvider(githubProvider); - /* highlight-add-end */ - const { processingEngine, router } = await builder.build(); - await processingEngine.start(); - return router; -} -``` - You can check the official docs to [configure your webhook](https://docs.github.com/en/developers/webhooks-and-events/webhooks/creating-webhooks) and to [secure your request](https://docs.github.com/en/developers/webhooks-and-events/webhooks/securing-your-webhooks). The webhook will need to be configured to forward `push` events. ## Configuration @@ -236,121 +197,3 @@ which carries a much higher rate limit at GitHub. This is true for any method of adding GitHub entities to the catalog, but especially easy to hit with automatic discovery. - -## GitHub Processor (To Be Deprecated) - -The GitHub integration has a special discovery processor for discovering catalog -entities within a GitHub organization. The processor will crawl the GitHub -organization and register entities matching the configured path. This can be -useful as an alternative to static locations or manually adding things to the -catalog. - -## Installation - -You will have to add the processors in the catalog initialization code of your -backend. 
They are not installed by default, therefore you have to add a -dependency on `@backstage/plugin-catalog-backend-module-github` to your backend -package, plus `@backstage/integration` for the basic credentials management: - -```bash -# From your Backstage root directory -yarn --cwd packages/backend add @backstage/integration @backstage/plugin-catalog-backend-module-github -``` - -And then add the processors to your catalog builder: - -```ts title="packages/backend/src/plugins/catalog.ts" -/* highlight-add-start */ -import { - GithubDiscoveryProcessor, - GithubOrgReaderProcessor, -} from '@backstage/plugin-catalog-backend-module-github'; -import { - ScmIntegrations, - DefaultGithubCredentialsProvider, -} from '@backstage/integration'; -/* highlight-add-end */ - -export default async function createPlugin( - env: PluginEnvironment, -): Promise { - const builder = await CatalogBuilder.create(env); - /* highlight-add-start */ - const integrations = ScmIntegrations.fromConfig(env.config); - const githubCredentialsProvider = - DefaultGithubCredentialsProvider.fromIntegrations(integrations); - builder.addProcessor( - GithubDiscoveryProcessor.fromConfig(env.config, { - logger: env.logger, - githubCredentialsProvider, - }), - GithubOrgReaderProcessor.fromConfig(env.config, { - logger: env.logger, - githubCredentialsProvider, - }), - ); - /* highlight-add-end */ - - // .. -} -``` - -## Configuration - -To use the discovery processor, you'll need a GitHub integration -[set up](locations.md) with either a [Personal Access Token](../../getting-started/config/authentication.md) or [GitHub Apps](./github-apps.md). 
- -Then you can add a location target to the catalog configuration: - -```yaml -catalog: - locations: - # (since 0.13.5) Scan all repositories for a catalog-info.yaml in the root of the default branch - - type: github-discovery - target: https://github.com/myorg - # Or use a custom pattern for a subset of all repositories with default repository - - type: github-discovery - target: https://github.com/myorg/service-*/blob/-/catalog-info.yaml - # Or use a custom file format and location - - type: github-discovery - target: https://github.com/*/blob/-/docs/your-own-format.yaml - # Or use a specific branch-name - - type: github-discovery - target: https://github.com/*/blob/backstage-docs/catalog-info.yaml -``` - -Note the `github-discovery` type, as this is not a regular `url` processor. - -When using a custom pattern, the target is composed of three parts: - -- The base organization URL, `https://github.com/myorg` in this case -- The repository blob to scan, which accepts \* wildcard tokens. This can simply - be `*` to scan all repositories in the organization. This example only looks - for repositories prefixed with `service-`. -- The path within each repository to find the catalog YAML file. This will - usually be `/blob/main/catalog-info.yaml`, `/blob/master/catalog-info.yaml` or - a similar variation for catalog files stored in the root directory of each - repository. You could also use a dash (`-`) for referring to the default - branch. - -## GitHub API Rate Limits - -GitHub [rate limits](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting) API requests to 5,000 per hour (or more for Enterprise -accounts). The default Backstage catalog backend refreshes data every 100 -seconds, which issues an API request for each discovered location. - -This means if you have more than ~140 catalog entities, you may get throttled by -rate limiting. 
You can change the refresh rate of the catalog in your `packages/backend/src/plugins/catalog.ts` file: - -```typescript -const builder = await CatalogBuilder.create(env); - -// For example, to refresh every 5 minutes (300 seconds). -builder.setProcessingIntervalSeconds(300); -``` - -Alternatively, or additionally, you can configure [github-apps](github-apps.md) authentication -which carries a much higher rate limit at GitHub. - -This is true for any method of adding GitHub entities to the catalog, but -especially easy to hit with automatic discovery. diff --git a/docs/integrations/github/org--old.md b/docs/integrations/github/org--old.md new file mode 100644 index 0000000000..2f1b39a1f8 --- /dev/null +++ b/docs/integrations/github/org--old.md @@ -0,0 +1,368 @@ +--- +id: org--old +title: GitHub Organizational Data +sidebar_label: Org Data +# prettier-ignore +description: Importing users and groups from a GitHub organization into Backstage +--- + +:::info +This documentation is written for the old backend system, which has been replaced by [the new backend system](../../backend-system/index.md), the default since Backstage [version 1.24](../../releases/v1.24.0.md). If you have migrated to the new backend system, you may want to read [its own article](./org.md) instead. Otherwise, [consider migrating](../../backend-system/building-backends/08-migrating.md)! +::: + +The Backstage catalog can be set up to ingest organizational data - users and +teams - directly from an organization in GitHub or GitHub Enterprise. The result +is a hierarchy of +[`User`](../../features/software-catalog/descriptor-format.md#kind-user) and +[`Group`](../../features/software-catalog/descriptor-format.md#kind-group) kind +entities that mirror your org setup. + +> Note: This adds `User` and `Group` entities to the catalog, but does not +> provide authentication. See the +> [GitHub auth provider](../../auth/github/provider.md) for that. 
+ +## Installation without Events Support + +This guide will use the Entity Provider method. If you for some reason prefer +the Processor method (not recommended), it is described separately below. + +The provider is not installed by default, therefore you have to add a dependency +on `@backstage/plugin-catalog-backend-module-github` to your backend package. + +```bash +# From your Backstage root directory +yarn --cwd packages/backend add @backstage/plugin-catalog-backend-module-github +``` + +> Note: When configuring to use a Provider instead of a Processor you do not +> need to add a _location_ pointing to your GitHub server/organization. + +Update the catalog plugin initialization in your backend to add the provider and +schedule it: + +```ts title="packages/backend/src/plugins/catalog.ts" +/* highlight-add-next-line */ +import { GithubOrgEntityProvider } from '@backstage/plugin-catalog-backend-module-github'; + +export default async function createPlugin( + env: PluginEnvironment, +): Promise<Router> { + const builder = await CatalogBuilder.create(env); + + /* highlight-add-start */ + // The org URL below needs to match a configured integrations.github entry + // specified in your app-config. + builder.addEntityProvider( + GithubOrgEntityProvider.fromConfig(env.config, { + id: 'production', + orgUrl: 'https://github.com/backstage', + logger: env.logger, + schedule: env.scheduler.createScheduledTaskRunner({ + frequency: { minutes: 60 }, + timeout: { minutes: 15 }, + }), + }), + ); + /* highlight-add-end */ + + // .. +} +``` + +Alternatively, if you wish to ingest data from multiple GitHub organizations you can use +the `GithubMultiOrgEntityProvider` instead. 
Note that by default, this provider will namespace +groups according to the org they originate from to avoid potential name duplicates: + +```ts title="packages/backend/src/plugins/catalog.ts" +/* highlight-add-next-line */ +import { GithubMultiOrgEntityProvider } from '@backstage/plugin-catalog-backend-module-github'; + +export default async function createPlugin( + env: PluginEnvironment, +): Promise<Router> { + const builder = await CatalogBuilder.create(env); + + /* highlight-add-start */ + // The GitHub URL below needs to match a configured integrations.github entry + // specified in your app-config. + builder.addEntityProvider( + GithubMultiOrgEntityProvider.fromConfig(env.config, { + id: 'production', + githubUrl: 'https://github.com', + // Set the following to list the GitHub orgs you wish to ingest from. You can + // also omit this option to ingest all orgs accessible by your GitHub integration + orgs: ['org-a', 'org-b'], + logger: env.logger, + schedule: env.scheduler.createScheduledTaskRunner({ + frequency: { minutes: 60 }, + timeout: { minutes: 15 }, + }), + }), + ); + /* highlight-add-end */ + + // .. +} +``` + +## Installation with Events Support + +_For the legacy backend system, please read the subsection below._ + +The catalog module `github-org` comes with events support enabled for the `GithubMultiOrgEntityProvider`. +This will make it subscribe to its relevant topics and expect these events to be published via the `EventsService`. + +Topics: + +- `github.installation` +- `github.membership` +- `github.organization` +- `github.team` + +Additionally, you should install the +[event router by `events-backend-module-github`](https://github.com/backstage/backstage/tree/master/plugins/events-backend-module-github/README.md) +which will route received events from the generic topic `github` to more specific ones +based on the event type (e.g., `github.membership`). 
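The routing idea described above can be illustrated with a small sketch. It is not the event router's actual implementation: `routeTopic` and `isSubscribed` are hypothetical helpers, and the sketch assumes the event type is available as a plain string (GitHub sends it in the `X-GitHub-Event` webhook header).

```typescript
// Topics the org provider subscribes to, as listed above.
const subscribedTopics: string[] = [
  'github.installation',
  'github.membership',
  'github.organization',
  'github.team',
];

// Hypothetical helper: derive the specific topic from the generic one.
// Returns undefined when no event type is known, so nothing gets routed.
function routeTopic(
  genericTopic: string,
  eventType: string | undefined,
): string | undefined {
  if (!eventType) {
    return undefined;
  }
  return `${genericTopic}.${eventType}`;
}

// Hypothetical helper: would the org provider react to this topic?
function isSubscribed(topic: string): boolean {
  return subscribedTopics.indexOf(topic) !== -1;
}
```

Under these assumptions, an event of type `membership` arriving on the generic `github` topic ends up on `github.membership`, which the provider listens to, while a `push` event ends up on `github.push`, which this provider ignores.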
+ +In order to receive webhook events from GitHub, you have to decide how you want them +to be ingested into Backstage and published to its `EventsService`. +You can decide between the following options (extensible): + +- [via HTTP endpoint](https://github.com/backstage/backstage/tree/master/plugins/events-backend/README.md) +- [via an AWS SQS queue](https://github.com/backstage/backstage/tree/master/plugins/events-backend-module-aws-sqs/README.md) + +### Legacy Backend System + +Please follow the installation instructions at + +- +- + +Additionally, you need to decide how you want to receive events from external sources like + +- [via HTTP endpoint](https://github.com/backstage/backstage/tree/master/plugins/events-backend/README.md) +- [via an AWS SQS queue](https://github.com/backstage/backstage/tree/master/plugins/events-backend-module-aws-sqs/README.md) + +Set up your provider + +```ts title="packages/backend/src/plugins/catalog.ts" +import { CatalogBuilder } from '@backstage/plugin-catalog-backend'; +/* highlight-add-next-line */ +import { GithubOrgEntityProvider } from '@backstage/plugin-catalog-backend-module-github'; +import { ScaffolderEntitiesProcessor } from '@backstage/plugin-scaffolder-backend'; +import { Router } from 'express'; +import { PluginEnvironment } from '../types'; + +export default async function createPlugin( + env: PluginEnvironment, +): Promise<Router> { + const builder = await CatalogBuilder.create(env); + builder.addProcessor(new ScaffolderEntitiesProcessor()); + /* highlight-add-start */ + const githubOrgProvider = GithubOrgEntityProvider.fromConfig(env.config, { + id: 'production', + orgUrl: 'https://github.com/backstage', + logger: env.logger, + events: env.events, + schedule: env.scheduler.createScheduledTaskRunner({ + frequency: { minutes: 60 }, + timeout: { minutes: 15 }, + }), + }); + builder.addEntityProvider(githubOrgProvider); + /* highlight-add-end */ + const { processingEngine, router } = await builder.build(); + await 
processingEngine.start(); + return router; +} +``` + +Or, alternatively, if using the `GithubMultiOrgEntityProvider`: + +```ts title="packages/backend/src/plugins/catalog.ts" +/* highlight-add-next-line */ +import { GithubMultiOrgEntityProvider } from '@backstage/plugin-catalog-backend-module-github'; + +export default async function createPlugin( + env: PluginEnvironment, +): Promise<Router> { + const builder = await CatalogBuilder.create(env); + + /* highlight-add-start */ + // The GitHub URL below needs to match a configured integrations.github entry + // specified in your app-config. + builder.addEntityProvider( + GithubMultiOrgEntityProvider.fromConfig(env.config, { + id: 'production', + githubUrl: 'https://github.com', + // Set the following to list the GitHub orgs you wish to ingest from. You can + // also omit this option to ingest all orgs accessible by your GitHub integration + orgs: ['org-a', 'org-b'], + logger: env.logger, + events: env.events, + schedule: env.scheduler.createScheduledTaskRunner({ + frequency: { minutes: 60 }, + timeout: { minutes: 15 }, + }), + }), + ); + /* highlight-add-end */ + + // .. +} +``` + +You can check the official docs to [configure your webhook](https://docs.github.com/en/developers/webhooks-and-events/webhooks/creating-webhooks) and to [secure your request](https://docs.github.com/en/developers/webhooks-and-events/webhooks/securing-your-webhooks). +The webhook will need to be configured to forward `organization`, `team` and `membership` events. + +## Configuration + +As mentioned above, you also must have some configuration in your app-config +that describes the targets that you want to import. This lets the entity +provider know what authorization to use, and what the API endpoints are. 
You may +or may not already have such an entry: + +```yaml +integrations: + github: + # example for public GitHub + - host: github.com + token: ${GITHUB_TOKEN} + # example for a private GitHub Enterprise instance + - host: ghe.example.net + apiBaseUrl: https://ghe.example.net/api/v3 + token: ${GHE_TOKEN} +``` + +These examples use `${}` placeholders to reference environment variables. This +is often suitable for production setups, but also means that you will have to +supply those variables to the backend as it starts up. If you want, for local +development in particular, you can experiment first by putting the actual tokens +in a mirrored config directly in your `app-config.local.yaml` as well. + +If Backstage is configured to use GitHub Apps authentication you must grant +`Read-Only` access for `Members` under `Organization` in order to ingest users +correctly. You can modify the app's permissions under the organization settings, +`https://github.com/organizations/{ORG}/settings/apps/{APP_NAME}/permissions`. + +![permissions](../../assets/integrations/github/permissions.png) + +**Please note that when you change permissions, the app owner will get an email +that must be approved first before the changes are applied.** + +![email](../../assets/integrations/github/email.png) + +### Custom Transformers + +You can inject your own transformation logic to help map from GitHub API responses +into Backstage entities. You can do this on the user and team requests to +enable you to do further processing or updates to the entities. + +To enable this you pass a function into the `GithubOrgEntityProvider`. You can +pass a `UserTransformer`, `TeamTransformer` or both. The function is invoked +for each item (user or team) that is returned from the API. You can either +return an Entity (User or Group) or `undefined` if you do not want to import +that item. + +There is also a `defaultUserTransformer` and `defaultOrganizationTeamTransformer`. 
+You could use these and simply decorate the response from the default +transformation if you only need to change a few properties. + +### Resolving GitHub users via organization email + +When you authenticate users you should resolve them to an entity within the +catalog. Often the authentication you use could be a corporate SSO system that +provides you with email as a key. To enable you to find and resolve GitHub users +it's useful to also import the private domain verified emails into the User +entity in Backstage. + +The integration attempts to return `organizationVerifiedDomainEmails` from the +GitHub API and makes this available as part of the object passed to +`UserTransformer`. The GitHub API will only return emails that use a domain +that's a verified domain for your GitHub Org. It also relies on the user having +configured such an email in their own account. The API will only return these +values when using GitHub App authentication and with the correct app permission +allowing access to emails. + +You can decorate the default `userTransformer` to replace the org email in the +returned identity. 
+ +```ts title="packages/backend/src/plugins/catalog.ts" +const githubOrgProvider = GithubOrgEntityProvider.fromConfig(env.config, { + id: 'production', + orgUrl: 'https://github.com/backstage', + logger: env.logger, + schedule: env.scheduler.createScheduledTaskRunner({ + frequency: { minutes: 60 }, + timeout: { minutes: 15 }, + }), + /* highlight-add-start */ + userTransformer: async (user, ctx) => { + const entity = await defaultUserTransformer(user, ctx); + if (entity && user.organizationVerifiedDomainEmails?.length) { + entity.spec.profile!.email = user.organizationVerifiedDomainEmails[0]; + } + return entity; + }, + /* highlight-add-end */ +}); +``` + +Once you have imported the emails you can resolve users in your [sign-in +resolver](../../auth/github/provider.md) using the catalog entity search via email + +```typescript title="packages/backend/src/plugins/auth.ts" +ctx.signInWithCatalogUser({ + filter: { + kind: ['User'], + 'spec.profile.email': email as string, + }, +}); +``` + +## Using a Processor instead of a Provider + +An alternative to using the Provider for ingesting organizational entities is to +use a Processor. This is the old way that's based on registering locations with +the proper type and target, triggering the processor to run. + +The drawback of this method is that it will leave orphaned Group/User entities +whenever they are deleted on your GitHub server, and you cannot control the +frequency with which they are refreshed, separately from other processors. 
+ +### Processor Installation + +The `GithubOrgReaderProcessor` is not registered by default, so you have to +install and register it in the catalog plugin: + +```bash +# From your Backstage root directory +yarn --cwd packages/backend add @backstage/plugin-catalog-backend-module-github +``` + +```typescript title="packages/backend/src/plugins/catalog.ts" +import { GithubOrgReaderProcessor } from '@backstage/plugin-catalog-backend-module-github'; + +builder.addProcessor( + GithubOrgReaderProcessor.fromConfig(env.config, { logger: env.logger }), +); +``` + +### Processor Configuration + +The integration section of your app-config needs to be set up in the same way as +for the Entity Provider - see above. + +In addition to that, you typically want to add a few static locations to your +app-config, which reference your organizations to import. The following +configuration enables an import of the teams and users under the org +`https://github.com/my-org-name` on public GitHub. + +```yaml +catalog: + locations: + - type: github-org + target: https://github.com/my-org-name + rules: + - allow: [User, Group] +``` diff --git a/docs/integrations/github/org.md b/docs/integrations/github/org.md index 974362c125..a9936ff558 100644 --- a/docs/integrations/github/org.md +++ b/docs/integrations/github/org.md @@ -6,6 +6,10 @@ sidebar_label: Org Data description: Importing users and groups from a GitHub organization into Backstage --- +:::info +This documentation is written for [the new backend system](../../backend-system/index.md) which is the default since Backstage [version 1.24](../../releases/v1.24.0.md). If you are still on the old backend system, you may want to read [its own article](./org--old.md) instead, and [consider migrating](../../backend-system/building-backends/08-migrating.md)! +::: + The Backstage catalog can be set up to ingest organizational data - users and teams - directly from an organization in GitHub or GitHub Enterprise. 
The result is a hierarchy of @@ -17,95 +21,28 @@ entities that mirror your org setup. > provide authentication. See the > [GitHub auth provider](../../auth/github/provider.md) for that. -## Installation without Events Support +## Installation -This guide will use the Entity Provider method. If you for some reason prefer -the Processor method (not recommended), it is described separately below. - -The provider is not installed by default, therefore you have to add a dependency -to `@backstage/plugin-catalog-backend-module-github` to your backend package. +The GitHub Org provider is not installed by default, so you have to add a +dependency on `@backstage/plugin-catalog-backend-module-github-org` to your backend +package. ```bash # From your Backstage root directory -yarn --cwd packages/backend add @backstage/plugin-catalog-backend-module-github +yarn --cwd packages/backend add @backstage/plugin-catalog-backend-module-github-org ``` -> Note: When configuring to use a Provider instead of a Processor you do not -> need to add a _location_ pointing to your GitHub server/organization +And then update your backend by adding the following line: -Update the catalog plugin initialization in your backend to add the provider and -schedule it: - -```ts title="packages/backend/src/plugins/catalog.ts" -/* highlight-add-next-line */ -import { GithubOrgEntityProvider } from '@backstage/plugin-catalog-backend-module-github'; - -export default async function createPlugin( - env: PluginEnvironment, -): Promise<Router> { - const builder = await CatalogBuilder.create(env); - - /* highlight-add-start */ - // The org URL below needs to match a configured integrations.github entry - // specified in your app-config. 
- builder.addEntityProvider( - GithubOrgEntityProvider.fromConfig(env.config, { - id: 'production', - orgUrl: 'https://github.com/backstage', - logger: env.logger, - schedule: env.scheduler.createScheduledTaskRunner({ - frequency: { minutes: 60 }, - timeout: { minutes: 15 }, - }), - }), - ); - /* highlight-add-end */ - - // .. -} +```ts title="packages/backend/src/index.ts" +backend.add(import('@backstage/plugin-catalog-backend/alpha')); +/* highlight-add-next-line */ +backend.add(import('@backstage/plugin-catalog-backend-module-github-org')); ``` -Alternatively, if you wish to ingest data from multiple GitHub organizations you can use -the `GithubMultiOrgEntityProvider` instead. Note that by default, this provider will namespace -groups according to the org they originate from to avoid potential name duplicates: +## Events Support -```ts title="packages/backend/src/plugins/catalog.ts" -/* highlight-add-next-line */ -import { GithubMultiOrgEntityProvider } from '@backstage/plugin-catalog-backend-module-github'; - -export default async function createPlugin( - env: PluginEnvironment, -): Promise<Router> { - const builder = await CatalogBuilder.create(env); - - /* highlight-add-start */ - // The GitHub URL below needs to match a configured integrations.github entry - // specified in your app-config. - builder.addEntityProvider( - GithubMultiOrgEntityProvider.fromConfig(env.config, { - id: 'production', - githubUrl: 'https://github.com', - // Set the following to list the GitHub orgs you wish to ingest from. You can - // also omit this option to ingest all orgs accessible by your GitHub integration - orgs: ['org-a', 'org-b'], - logger: env.logger, - schedule: env.scheduler.createScheduledTaskRunner({ - frequency: { minutes: 60 }, - timeout: { minutes: 15 }, - }), - }), - ); - /* highlight-add-end */ - - // .. 
-} -``` - -## Installation with Events Support - -_For the legacy backend system, please read the subsection below._ - -The catalog module `github-org` comes with events support enabled for the `GithubMultiOrgEntityProvider`. +The catalog module for GitHub Org comes with events support enabled. This will make it subscribe to its relevant topics and expects these events to be published via the `EventsService`. Topics: @@ -127,87 +64,6 @@ You can decide between the following options (extensible): - [via HTTP endpoint](https://github.com/backstage/backstage/tree/master/plugins/events-backend/README.md) - [via an AWS SQS queue](https://github.com/backstage/backstage/tree/master/plugins/events-backend-module-aws-sqs/README.md) -### Legacy Backend System - -Please follow the installation instructions at - -- -- - -Additionally, you need to decide how you want to receive events from external sources like - -- [via HTTP endpoint](https://github.com/backstage/backstage/tree/master/plugins/events-backend/README.md) -- [via an AWS SQS queue](https://github.com/backstage/backstage/tree/master/plugins/events-backend-module-aws-sqs/README.md) - -Set up your provider - -```ts title="packages/backend/src/plugins/catalog.ts" -import { CatalogBuilder } from '@backstage/plugin-catalog-backend'; -/* highlight-add-next-line */ -import { GithubOrgEntityProvider } from '@backstage/plugin-catalog-backend-module-github'; -import { ScaffolderEntitiesProcessor } from '@backstage/plugin-scaffolder-backend'; -import { Router } from 'express'; -import { PluginEnvironment } from '../types'; - -export default async function createPlugin( - env: PluginEnvironment, -): Promise<Router> { - const builder = await CatalogBuilder.create(env); - builder.addProcessor(new ScaffolderEntitiesProcessor()); - /* highlight-add-start */ - const githubOrgProvider = GithubOrgEntityProvider.fromConfig(env.config, { - id: 'production', - orgUrl: 'https://github.com/backstage', - logger: env.logger, - events: env.events, - 
schedule: env.scheduler.createScheduledTaskRunner({ - frequency: { minutes: 60 }, - timeout: { minutes: 15 }, - }), - }); - builder.addEntityProvider(githubOrgProvider); - /* highlight-add-end */ - const { processingEngine, router } = await builder.build(); - await processingEngine.start(); - return router; -} -``` - -Or, alternatively, if using the `GithubMultiOrgEntityProvider`: - -```ts title="packages/backend/src/plugins/catalog.ts" -/* highlight-add-next-line */ -import { GithubMultiOrgEntityProvider } from '@backstage/plugin-catalog-backend-module-github'; - -export default async function createPlugin( - env: PluginEnvironment, -): Promise<Router> { - const builder = await CatalogBuilder.create(env); - - /* highlight-add-start */ - // The GitHub URL below needs to match a configured integrations.github entry - // specified in your app-config. - builder.addEntityProvider( - GithubMultiOrgEntityProvider.fromConfig(env.config, { - id: 'production', - githubUrl: 'https://github.com', - // Set the following to list the GitHub orgs you wish to ingest from. You can - // also omit this option to ingest all orgs accessible by your GitHub integration - orgs: ['org-a', 'org-b'], - logger: env.logger, - events: env.events, - schedule: env.scheduler.createScheduledTaskRunner({ - frequency: { minutes: 60 }, - timeout: { minutes: 15 }, - }), - }), - ); - /* highlight-add-end */ - - // .. -} -``` - You can check the official docs to [configure your webhook](https://docs.github.com/en/developers/webhooks-and-events/webhooks/creating-webhooks) and to [secure your request](https://docs.github.com/en/developers/webhooks-and-events/webhooks/securing-your-webhooks). The webhook will need to be configured to forward `organization`,`team` and `membership` events. @@ -264,6 +120,81 @@ There is also a `defaultUserTransformer` and `defaultOrganizationTeamTransformer You could use these and simply decorate the response from the default transformation if you only need to change a few properties. 
+Here's an example of how to use the transformers: + +```ts title="packages/backend/src/index.ts" +import { createBackend } from '@backstage/backend-defaults'; +import { createBackendModule } from '@backstage/backend-plugin-api'; +import { githubOrgEntityProviderTransformsExtensionPoint } from '@backstage/plugin-catalog-backend-module-github-org'; +import { myTeamTransformer, myUserTransformer } from './transformers'; + +const githubOrgModule = createBackendModule({ + pluginId: 'catalog', + moduleId: 'github-org-extensions', + register(env) { + env.registerInit({ + deps: { + githubOrg: githubOrgEntityProviderTransformsExtensionPoint, + }, + async init({ githubOrg }) { + githubOrg.setTeamTransformer(myTeamTransformer); + githubOrg.setUserTransformer(myUserTransformer); + }, + }); + }, +}); + +const backend = createBackend(); + +// Other items + +backend.add(import('@backstage/plugin-catalog-backend/alpha')); + +backend.add(githubOrgModule()); + +backend.start(); +``` + +The `myTeamTransformer` and `myUserTransformer` transformer functions are from the examples in the section below. + +### Transformer Examples + +The following provides an example of each kind of transformer. We recommend creating a `transformers.ts` file in your `packages/backend/src` folder for these. + +```ts title="packages/backend/src/transformers.ts" +import { + TeamTransformer, + UserTransformer, + defaultUserTransformer, +} from '@backstage/plugin-catalog-backend-module-github'; + +// This team transformer completely replaces the built in logic with custom logic. 
+export const myTeamTransformer: TeamTransformer = async team => { + return { + apiVersion: 'backstage.io/v1alpha1', + kind: 'Group', + metadata: { + name: team.slug, + annotations: {}, + }, + spec: { + type: 'GitHub Org Team', + profile: {}, + children: [], + }, + }; +}; + +// This user transformer makes use of the built in logic, but also sets the description field +export const myUserTransformer: UserTransformer = async (user, ctx) => { + const backstageUser = await defaultUserTransformer(user, ctx); + if (backstageUser) { + backstageUser.metadata.description = 'Loaded from GitHub Org Data'; + } + return backstageUser; +}; +``` + ### Resolving GitHub users via organization email When you authenticate users you should resolve them to an entity within the @@ -283,31 +214,25 @@ allowing access to emails. You can decorate the default `userTransformer` to replace the org email in the returned identity. -```ts title="packages/backend/src/plugins/catalog.ts" -const githubOrgProvider = GithubOrgEntityProvider.fromConfig(env.config, { - id: 'production', - orgUrl: 'https://github.com/backstage', - logger: env.logger, - schedule: env.scheduler.createScheduledTaskRunner({ - frequency: { minutes: 60 }, - timeout: { minutes: 15 }, - }), - /* highlight-add-start */ - userTransformer: async (user, ctx) => { - const entity = await defaultUserTransformer(user, ctx); - if (entity && user.organizationVerifiedDomainEmails?.length) { - entity.spec.profile!.email = user.organizationVerifiedDomainEmails[0]; - } - return entity; - }, - /* highlight-add-end */ -}); +```ts title="packages/backend/src/transformers.ts" +export const myVerifiedUserTransformer: UserTransformer = async (user, ctx) => { + const backstageUser = await defaultUserTransformer(user, ctx); + if (backstageUser && user.organizationVerifiedDomainEmails?.length) { + backstageUser.spec.profile!.email = + user.organizationVerifiedDomainEmails[0]; + } + return backstageUser; +}; ``` +This example assumes you have 
implemented the custom transformer following the [Custom Transformers](#custom-transformers) and [Transformer Examples](#transformer-examples) documentation in the sections above. + Once you have imported the emails you can resolve users in your [sign-in resolver](../../auth/github/provider.md) using the catalog entity search via email -```typescript title="packages/backend/src/plugins/auth.ts" +Once you have imported the emails you can resolve users by building a [Custom Resolver](../../auth/identity-resolver.md#building-custom-resolvers). In this custom resolver you can then use this example to properly match the user: + +```ts ctx.signInWithCatalogUser({ filter: { kind: ['User'], @@ -315,50 +240,3 @@ ctx.signInWithCatalogUser({ }, }); ``` - -## Using a Processor instead of a Provider - -An alternative to using the Provider for ingesting organizational entities is to -use a Processor. This is the old way that's based on registering locations with -the proper type and target, triggering the processor to run. - -The drawback of this method is that it will leave orphaned Group/User entities -whenever they are deleted on your GitHub server, and you cannot control the -frequency with which they are refreshed, separately from other processors. - -### Processor Installation - -The `GithubOrgReaderProcessor` is not registered by default, so you have to -install and register it in the catalog plugin: - -```bash -# From your Backstage root directory -yarn --cwd packages/backend add @backstage/plugin-catalog-backend-module-github -``` - -```typescript title="packages/backend/src/plugins/catalog.ts" -import { GithubOrgReaderProcessor } from '@backstage/plugin-catalog-backend-module-github'; - -builder.addProcessor( - GithubOrgReaderProcessor.fromConfig(env.config, { logger: env.logger }), -); -``` - -### Processor Configuration - -The integration section of your app-config needs to be set up in the same way as -for the Entity Provider - see above. 
- -In addition to that, you typically want to add a few static locations to your -app-config, which reference your organizations to import. The following -configuration enables an import of the teams and users under the org -`https://github.com/my-org-name` on public GitHub. - -```yaml -catalog: - locations: - - type: github-org - target: https://github.com/my-org-name - rules: - - allow: [User, Group] -```