feat: add configurable GitHub API page sizes

- Add pageSizes configuration for GitHub providers
- Document pageSizes configuration

Related to #31437

Signed-off-by: abhishekbvs <bvsabhishek@gmail.com>
abhishekbvs
2025-10-20 01:58:25 +05:30
parent 6d396ee333
commit 637a3de8d8
15 changed files with 719 additions and 29 deletions
@@ -308,6 +308,26 @@ If you do so, `default` will be used as provider ID.
The amount of time that should pass before the first invocation happens.
- **`scope`** _(optional)_:
`'global'` or `'local'`. Sets the scope of concurrency control.
- **`pageSizes`** _(optional)_:
  Configure page sizes for GitHub GraphQL API queries. This can help prevent `RESOURCE_LIMITS_EXCEEDED` errors with large organizations.
  - **`repositories`** _(optional)_:
    Number of repositories to fetch per page. Defaults to `25`.

Example with page sizes configuration:

```yaml
catalog:
  providers:
    github:
      myOrganization:
        organization: 'my-large-org'
        catalogPath: '/catalog-info.yaml'
        schedule:
          frequency: { minutes: 30 }
          timeout: { minutes: 3 }
        pageSizes:
          repositories: 15 # Reduce if hitting API limits
```
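
How an optional `pageSizes.repositories` value might be combined with the documented default can be sketched as follows. This is a minimal illustration, not the provider's actual code: the `RawPageSizes` type and the `resolveRepositoryPageSize` helper are hypothetical names, and rejecting non-positive or fractional values is an assumption rather than documented behaviour. Only the default of `25` comes from the text above.

```typescript
// Hypothetical shape of the optional `pageSizes` block read from app-config.yaml.
interface RawPageSizes {
  repositories?: number;
}

// Default documented above for `pageSizes.repositories`.
const DEFAULT_REPOSITORY_PAGE_SIZE = 25;

// Returns the configured page size, falling back to the default when the key
// is absent. The validation step is an assumption made for this sketch.
function resolveRepositoryPageSize(pageSizes?: RawPageSizes): number {
  const value = pageSizes?.repositories;
  if (value === undefined) {
    return DEFAULT_REPOSITORY_PAGE_SIZE;
  }
  if (!Number.isInteger(value) || value < 1) {
    throw new Error(
      `pageSizes.repositories must be a positive integer, got ${value}`,
    );
  }
  return value;
}
```

With the example configuration above, `resolveRepositoryPageSize({ repositories: 15 })` would yield `15`, while omitting `pageSizes` entirely would yield the default.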
## GitHub API Rate Limits
@@ -94,6 +94,37 @@ Directly under the `githubOrg` is a list of configurations, each entry is a stru
- `githubUrl`: The target that this provider should consume
- `orgs` (optional): The list of the GitHub orgs to consume. If you list only a single org, the generated group entities will use the `default` namespace; otherwise they will use the org name as the namespace. By default the provider will consume all accessible orgs on the given GitHub instance (supported only with the GitHub App integration).
- `schedule`: The refresh schedule to use, matches the structure of [`SchedulerServiceTaskScheduleDefinitionConfig`](https://backstage.io/docs/reference/backend-plugin-api.schedulerservicetaskscheduledefinitionconfig/)
- `pageSizes` (optional): Configure page sizes for GitHub GraphQL API queries to prevent `RESOURCE_LIMITS_EXCEEDED` errors with large organizations. See [Page Sizes Configuration](#page-sizes-configuration) below for details.
### Page Sizes Configuration
For large GitHub organizations (200+ teams), you may encounter `RESOURCE_LIMITS_EXCEEDED` errors due to GitHub's GraphQL API resource limits. You can configure page sizes to reduce the number of records fetched per API request:
```yaml title="app-config.yaml"
catalog:
  providers:
    githubOrg:
      - id: production
        githubUrl: https://github.com
        orgs: ['large-org']
        schedule:
          frequency: { hours: 1 }
          timeout: { minutes: 50 }
        pageSizes:
          teams: 25 # Default: 25
          teamMembers: 50 # Default: 50
          organizationMembers: 50 # Default: 50
          repositories: 25 # Default: 25
```
**Configuration Options:**
- `teams`: Number of teams to fetch per page when querying organization teams (default: 25)
- `teamMembers`: Number of team members to fetch per page when querying team members (default: 50)
- `organizationMembers`: Number of organization members to fetch per page (default: 50)
- `repositories`: Number of repositories to fetch per page (default: 25)
**Note:** Reducing page sizes results in more API calls and somewhat longer sync times, but helps avoid resource limit errors for large organizations.
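
The trade-off in the note can be made concrete with a small simulation. The sketch below does not call GitHub; `fetchPage` and `fetchAll` are hypothetical helpers that page through an in-memory list using an offset as a stand-in for a GraphQL cursor, and the figure of 400 teams is an arbitrary illustrative number. It only shows how the request count grows as the page size shrinks.

```typescript
// Hypothetical page shape, loosely modelled on cursor-based pagination.
interface Page<T> {
  items: T[];
  nextCursor: number | null;
}

// Stand-in for one paginated API request, served from an in-memory list.
function fetchPage<T>(all: T[], cursor: number, pageSize: number): Page<T> {
  const items = all.slice(cursor, cursor + pageSize);
  const next = cursor + pageSize;
  return { items, nextCursor: next < all.length ? next : null };
}

// Walks every page and reports how many requests were needed.
function fetchAll<T>(
  all: T[],
  pageSize: number,
): { items: T[]; requests: number } {
  const items: T[] = [];
  let requests = 0;
  let cursor: number | null = 0;
  while (cursor !== null) {
    const page: Page<T> = fetchPage(all, cursor, pageSize);
    items.push(...page.items);
    requests += 1;
    cursor = page.nextCursor;
  }
  return { items, requests };
}

// 400 simulated teams (an arbitrary number chosen for illustration).
const simulatedTeams = Array.from({ length: 400 }, (_, i) => `team-${i}`);

console.log(fetchAll(simulatedTeams, 25).requests); // 16
console.log(fetchAll(simulatedTeams, 10).requests); // 40
```

In this simulation, lowering the page size from 25 to 10 raises the request count from 16 to 40, while each individual request asks for fewer records.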
### Events Support