This integration is powered by Singer's GitHub tap. For support, visit the GitHub repo or join the Singer Slack.
GitHub integration summary
Stitch’s GitHub integration replicates data using the GitHub REST API v3. Refer to the Schema section for a list of objects available for replication.
GitHub feature snapshot
A high-level look at Stitch's GitHub integration, including release status, useful links, and the features supported in Stitch.
STITCH
Release Status: Released
Supported By:
Stitch Plan: Free
Singer GitHub Repository:
DATA SELECTION
Table Selection: Supported
Column Selection: Supported
REPLICATION SETTINGS
Anchor Scheduling: Supported
Advanced Scheduling: Unsupported
Table-level Reset: Unsupported
Configurable Replication Methods: Unsupported
TRANSPARENCY
Extraction Logs: Supported
Loading Reports: Supported
Connecting GitHub
GitHub setup requirements
To set up GitHub in Stitch, you need:
- Access to the projects you want to replicate data from. Stitch will only be able to access the same projects as the user who creates the access token.
Step 1: Create a GitHub token
- Sign into your GitHub account.
- Click the User menu (your icon) > Settings.
- Click Developer settings in the navigation on the left side of the page.
- Click Personal access tokens.
- On the Personal access tokens page, click the Generate new token button. If prompted, enter your password.
- In the Description field, enter stitch. This will allow you to easily identify which application is using the token.
- In the Select Scopes section, check the repo option.
  Note: While these are full permissions, Stitch will only ever read your data. The repo scope is required due to how GitHub structures permissions.
- Click the Generate token button.
- The new access token will display on the next page. Copy the token before navigating away from the page - GitHub won’t display it again.
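Before pasting the token into Stitch, it can help to confirm that it can read the repository you plan to track. The sketch below only builds the request; it is an illustration, not part of Stitch's setup. The function name `build_repo_request` and the token value are hypothetical, and it assumes the token is sent in GitHub's `Authorization: token ...` header against the REST API v3 repository endpoint.

```python
import urllib.request

GITHUB_API = "https://api.github.com"


def build_repo_request(token: str, repo_path: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a repository's metadata.

    repo_path is the owner/repository path, for example "stitchdata/docs".
    """
    return urllib.request.Request(
        f"{GITHUB_API}/repos/{repo_path}",
        headers={
            # Personal access tokens are passed in the Authorization header.
            "Authorization": f"token {token}",
            # Ask for the v3 media type explicitly.
            "Accept": "application/vnd.github.v3+json",
        },
        method="GET",
    )
```

Sending the request with `urllib.request.urlopen(build_repo_request(token, "stitchdata/docs"))` should return the repository's metadata if the token has access. An HTTP error at that point suggests the token or its scopes need another look.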
Step 2: Add GitHub as a Stitch data source
- Sign into your Stitch account.
- On the Stitch Dashboard page, click the Add Integration button.
- Click the GitHub icon.
- Enter a name for the integration. This is the name that will display on the Stitch Dashboard for the integration; it’ll also be used to create the schema in your destination.
  For example, the name “Stitch GitHub” would create a schema called stitch_github in the destination. Note: Schema names cannot be changed after you save the integration.
- In the GitHub Access Token field, paste the access token you created in Step 1.
- In the GitHub Repository Name field, enter the paths of the repositories you want to track. The path is relative to https://github.com. For example: The path for the Stitch Docs repository is stitchdata/docs
  To track multiple repositories, enter a space-delimited list of the repository paths. For example: stitchdata/docs stitchdata/docs-about-docs
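To check a repository list before entering it, the space-delimited format can be sketched as follows. This is a minimal illustration, not how Stitch itself parses the field: the function names are hypothetical, and the owner/repository validation rule is an assumption based on the examples above.

```python
from typing import List

GITHUB_BASE = "https://github.com"


def parse_repository_paths(field_value: str) -> List[str]:
    """Split a space-delimited field value into owner/repository paths."""
    paths = field_value.split()
    for path in paths:
        owner, sep, repo = path.partition("/")
        # Assumption: each path has exactly one slash, as in stitchdata/docs.
        if not (owner and sep and repo) or "/" in repo:
            raise ValueError(f"Expected owner/repository, got: {path!r}")
    return paths


def repository_urls(field_value: str) -> List[str]:
    """Resolve each path relative to https://github.com."""
    return [f"{GITHUB_BASE}/{path}" for path in parse_repository_paths(field_value)]
```

For instance, `repository_urls("stitchdata/docs stitchdata/docs-about-docs")` yields the two full repository URLs, while a value such as `"docs"` raises a `ValueError`.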
Step 3: Define the historical sync
The Sync Historical Data setting will define the starting date for your GitHub integration. This means that:
- For tables using Incremental Replication, data equal to or newer than this date will be replicated to your data warehouse.
- For tables using Full Table Replication, all data - including records that are older, equal to, or newer than this date - will be replicated to your data warehouse.
Change this setting if you want to replicate data beyond GitHub’s default setting of 1 year. For a detailed look at historical replication jobs, check out the Syncing Historical SaaS Data guide.
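The difference between the two cases can be pictured with a small sketch. It is an illustration of the rule described above, not Stitch's implementation: the function name, the `method` labels, and the `updated_at` field are assumptions chosen for the example.

```python
from datetime import datetime
from typing import Dict, List


def rows_to_replicate(rows: List[Dict], start_date: datetime, method: str) -> List[Dict]:
    """Select the rows a historical sync would pick up for one table."""
    if method == "full_table":
        # Full Table Replication ignores the start date entirely.
        return list(rows)
    if method == "incremental":
        # Incremental Replication keeps rows equal to or newer than the start date.
        return [row for row in rows if row["updated_at"] >= start_date]
    raise ValueError(f"Unknown replication method: {method!r}")
```

With a start date of 1 January 2021, a row last updated in 2020 is skipped for an incremental table but still replicated for a full table.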
Step 4: Create a replication schedule
In the Replication Frequency section, you’ll create the integration’s replication schedule. An integration’s replication schedule determines how often Stitch runs a replication job, and the time that job begins.
GitHub integrations support Anchor Scheduling. Advanced Scheduling is not supported.
To keep your row usage low, consider setting the integration to replicate less frequently. See the Understanding and Reducing Your Row Usage guide for tips on reducing your usage.
Initial and historical replication jobs
After you finish setting up GitHub, its Sync Status may show as Pending either on the Stitch Dashboard or on the Integration Details page.
For a new integration, a Pending status indicates that Stitch is in the process of scheduling the initial replication job for the integration. This may take some time to complete.
Initial replication jobs with Anchor Scheduling
If using Anchor Scheduling, an initial replication job may not kick off immediately. This depends on the selected Replication Frequency and Anchor Time. Refer to the Anchor Scheduling documentation for more information.
Free historical data loads
The first seven days of replication, beginning when data is first replicated, are free. Rows replicated from the new integration during this time won’t count towards your quota. Stitch offers this as a way of testing new integrations, measuring usage, and ensuring historical data volumes don’t quickly consume your quota.
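As a rough way to reason about the free window, the sketch below totals only the rows loaded after the first seven days. It is a simplified illustration: the function name and the shape of the `loads` data are hypothetical, and treating a load exactly seven days after the first replication as billable is an assumption.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

FREE_WINDOW = timedelta(days=7)


def billable_rows(first_replicated_at: datetime, loads: List[Tuple[datetime, int]]) -> int:
    """Sum the rows loaded after the seven-day free window has ended.

    loads is a list of (loaded_at, row_count) pairs.
    """
    cutoff = first_replicated_at + FREE_WINDOW
    # Assumption: a load that lands exactly on the cutoff counts toward the quota.
    return sum(row_count for loaded_at, row_count in loads if loaded_at >= cutoff)
```

A large historical load on day two would therefore not count, while a routine load on day ten would.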
GitHub table schemas
Table and column names in your destination
Depending on your destination, table and column names may not appear as they are outlined below.
For example: Object names are lowercased in Redshift (CusTomERs > customers), while case is maintained in PostgreSQL destinations (CusTomERs > CusTomERs). Refer to the Loading Guide for your destination for more info.
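The casing behavior in the example above can be sketched as a small lookup. This covers only the two destinations named here and is not a complete description of how Stitch loads object names; the function name and destination labels are hypothetical.

```python
def destination_object_name(name: str, destination: str) -> str:
    """Return an object name as it would appear in the given destination."""
    if destination == "redshift":
        # Redshift lowercases object names.
        return name.lower()
    if destination == "postgresql":
        # PostgreSQL destinations keep the original case.
        return name
    raise ValueError(f"No casing rule sketched for destination: {destination!r}")
```

Queries written against the destination should use whichever form this kind of rule produces, since `CusTomERs` and `customers` are not interchangeable everywhere.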
assignees
Replication Method: Full Table
Primary Key: id
API endpoint:
The assignees table contains info about the available assignees for issues in a repository.
id: The assignee ID.
login: The user’s username.
type: The user’s type.
url: The profile URL associated with the user.
collaborators
Replication Method: Full Table
Primary Key: id
API endpoint:
The collaborators table contains info about the users who contribute to a repository.
For organization-owned repositories, this includes outside collaborators, organization owners, and organization members who are direct collaborators, have access through team memberships, or have access through default organization permissions.
id: The collaborator’s ID. Reference:
login: The collaborator’s username.
type: The collaborator’s type.
url: The profile URL associated with the collaborator.
comments
Replication Method: Key-based Incremental
Replication Key: updated_at
Primary Key: id
API endpoint:
The comments table contains info about comments made on issues.
id: The comment ID.
updated_at: The time the comment was last updated.
body: The body of the comment.
created_at: The time the comment was created.
home_url: The home URL of the comment.
html_url: The HTML URL of the comment.
issue_url: The URL of the issue associated with the comment.
node_id: The node ID.
url: The GitHub URL of the comment.
user: Details about the user who created the comment.
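For tables like comments that use Key-based Incremental Replication with updated_at as the Replication Key, each job can be pictured as picking up where the previous one left off. The sketch below is an illustration under stated assumptions, not the tap's actual code: the function name is hypothetical, timestamps are assumed to be ISO 8601 strings in UTC with a uniform format (so they sort correctly as text), and the comparison is assumed to be inclusive.

```python
from typing import Dict, List, Optional, Tuple


def extract_incremental(
    rows: List[Dict], bookmark: Optional[str]
) -> Tuple[List[Dict], Optional[str]]:
    """Select rows at or after the bookmark and return the next bookmark."""
    selected = [
        row for row in rows
        # Assumption: inclusive comparison, so rows equal to the bookmark repeat.
        if bookmark is None or row["updated_at"] >= bookmark
    ]
    # The next bookmark is the largest replication key value seen so far.
    new_bookmark = max((row["updated_at"] for row in selected), default=bookmark)
    return selected, new_bookmark
```

Under the inclusive assumption, a row whose updated_at equals the saved bookmark is replicated again on the next job, which is why the Primary Key matters for de-duplicating in the destination.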
commits
Replication Method: Key-based Incremental
Replication Key: since
Primary Key: sha
API endpoint:
The commits table contains info about repository commits in a project.
sha: The git commit hash.
comments_url: The URL to the commit’s comments page.
commit: Details about the commit.
html_url: The HTML URL to the commit.
parents: Details about the parent commits.
url: The URL to the commit.
issues
Replication Method: Key-based Incremental
Replication Key: updated_at
Primary Key: id
API endpoint:
The issues table contains info about repository issues.
Issues and pull requests
GitHub’s API considers every pull request an issue, but not every issue is a pull request. Therefore, this table may contain both issues and pull requests.
id: The issue ID. Reference:
updated_at: The last time the issue was updated.
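Because the issues table can mix issues and pull requests, it may be useful to separate them downstream. In GitHub's REST API, an issue object that is also a pull request carries a `pull_request` key. The sketch below relies on that; whether the key is present in your replicated issues data is an assumption to verify, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def split_issues_and_pull_requests(records: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Separate plain issues from issue records that represent pull requests."""
    issues: List[Dict] = []
    pull_requests: List[Dict] = []
    for record in records:
        # Assumption: pull requests carry a non-empty "pull_request" value.
        if record.get("pull_request"):
            pull_requests.append(record)
        else:
            issues.append(record)
    return issues, pull_requests
```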
pull_requests
Replication Method: Full Table
Primary Key: id
API endpoint:
The pull_requests table contains info about pull requests made against the repository.
id: The pull request ID. Reference:
updated_at: The last time the pull request was updated.
body: The description of the pull request.
closed_at: The time the pull request was closed.
created_at: The time the pull request was created.
merged_at: The time the pull request was merged.
number: The number of the pull request in the repository.
state: The current status of the pull request. For example:
title: The title of the pull request.
url: The URL to the pull request.
user: Details about the user who created the pull request.
review_comments
Replication Method: Key-based Incremental
Replication Key: updated_at
Primary Key: id
API endpoint:
The review_comments table contains info about comments made on pull request reviews.
Note: In order to replicate this table, you must also set the pull_requests table to replicate.
id: The review comment ID.
updated_at: The time the review comment was last updated.
body: The body of the review comment.
commit_id: The ID of the commit the review comment is associated with. Reference:
created_at: The time the review comment was created.
diff_url: The diff URL associated with the review comment.
html_url: The HTML URL of the review comment.
in_reply_to_id: If the review comment is a reply to another review comment, this will be the ID of the review comment it is in response to.
issue_url: The URL of the issue associated with the review comment.
node_id: The review comment’s node ID.
original_position: The original position of the review comment.
original_commit_id: The ID of the original commit the review comment is associated with. Reference:
pull_request_review_id: The ID of the pull request review the comment is a part of. Reference:
path: The path of the file the review comment was made on.
position: The position of the review comment.
pull_request_url: The URL of the pull request associated with the review comment.
url: The GitHub URL of the review comment.
user: Details about the user who created the review comment.
reviews
Replication Method: Full Table
Primary Key: id
API endpoint:
The reviews table contains info about pull request reviews. A pull request review is a group of comments on a pull request.
Note: In order to replicate this table, you must also set the pull_requests table to replicate.
id: The review ID.
body: The description of the review.
commit_id: The ID of the commit the review was performed on. Reference:
html_url: The HTML URL to the review.
pull_request_url: The URL to the pull request being reviewed.
state: The state of the review. Possible values are:
user: Details about the user who submitted the review.
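Both review_comments and reviews depend on pull_requests being selected. A quick check of a planned table selection might look like the sketch below. The mapping reflects only the two notes in this document; the function and constant names are hypothetical.

```python
from typing import Dict, Iterable

# Child table -> table that must also be set to replicate (from the notes above).
DEPENDENT_TABLES: Dict[str, str] = {
    "review_comments": "pull_requests",
    "reviews": "pull_requests",
}


def missing_parent_tables(selected: Iterable[str]) -> Dict[str, str]:
    """Return each selected table whose required parent table is not selected."""
    selected_set = set(selected)
    return {
        table: parent
        for table, parent in DEPENDENT_TABLES.items()
        if table in selected_set and parent not in selected_set
    }
```

An empty result means the selection satisfies both notes. Otherwise the returned mapping shows which table to add.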
stargazers
Replication Method: Key-based Incremental
Replication Key: starred_at
Primary Key: user_id
API endpoint:
The stargazers table contains info about users who have starred a repository.
user_id: The user ID.
starred_at: The time the user starred the repository.
user: Details about the user who starred the repository.