GitHub integration summary

Stitch’s GitHub integration replicates data using the GitHub REST API v3. Refer to the Schema section for a list of objects available for replication.

GitHub feature snapshot

A high-level look at Stitch's GitHub integration, including release status, useful links, and the features supported in Stitch.

STITCH

  • Release Status: Released
  • Supported By: Singer Community
  • Stitch Plan: Free
  • Singer GitHub Repository: GitHub Repository

DATA SELECTION

  • Table Selection: Supported
  • Column Selection: Supported

REPLICATION SETTINGS

  • Anchor Scheduling: Supported
  • Advanced Scheduling: Unsupported
  • Table-level Reset: Unsupported
  • Configurable Replication Methods: Unsupported

TRANSPARENCY

  • Extraction Logs: Supported
  • Loading Reports: Supported

Connecting GitHub

GitHub setup requirements

To set up GitHub in Stitch, you need:

  • Access to the repositories you want to replicate data from. Stitch can only access the repositories that are available to the user who creates the access token.

Step 1: Create a GitHub token

  1. Sign into your GitHub account.
  2. Click the User menu (your icon) > Settings.
  3. Click Developer settings in the navigation on the left side of the page.
  4. Click Personal access tokens.
  5. On the Personal access tokens page, click the Generate new token button. If prompted, enter your password.
  6. In the Description field, enter stitch. This makes it easy to identify which application is using the token.
  7. In the Select Scopes section, check the repo option:

    Highlighted repo scopes on the GitHub Personal Access Tokens page

    Note: While these are full permissions, Stitch will only ever read your data. The repo scope is required due to how GitHub structures permissions.

  8. Click the Generate token button.
  9. The new access token will display on the next page. Copy the token before navigating away from the page, because GitHub won’t display it again.
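
Before pasting the token into Stitch, it can be useful to confirm that the token carries the repo scope. The following is a minimal sketch and not part of Stitch. It assumes a classic personal access token and GitHub's X-OAuth-Scopes response header; the helper names build_user_request and has_repo_scope are hypothetical. The request is built but not sent, so the example runs offline.

```python
import urllib.request

GITHUB_API = "https://api.github.com"  # public GitHub REST API base URL


def build_user_request(token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated request for the current user."""
    return urllib.request.Request(
        f"{GITHUB_API}/user",
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github.v3+json",
        },
    )


def has_repo_scope(scopes_header: str) -> bool:
    """Return True if a comma-separated X-OAuth-Scopes value includes 'repo'."""
    scopes = {scope.strip() for scope in scopes_header.split(",") if scope.strip()}
    return "repo" in scopes
```

To check a real token, send the request with urllib.request.urlopen and pass the response's X-OAuth-Scopes header value to has_repo_scope.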

Step 2: Add GitHub as a Stitch data source

  1. Sign into your Stitch account.
  2. On the Stitch Dashboard page, click the Add Integration button.

  3. Click the GitHub icon.

  4. Enter a name for the integration. This is the name that will display on the Stitch Dashboard for the integration; it’ll also be used to create the schema in your destination.

    For example, the name “Stitch GitHub” would create a schema called stitch_github in the destination. Note: Schema names cannot be changed after you save the integration.

  5. In the GitHub Access Token field, paste the access token you created in Step 1.
  6. In the GitHub Repository Name field, enter the paths of the repositories you want to track. Each path is relative to https://github.com. For example, the path for the Stitch Docs repository is stitchdata/docs.

    To track multiple repositories, enter a space-delimited list of repository paths. For example: stitchdata/docs stitchdata/docs-about-docs
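
The naming and repository-path conventions in the steps above can be sketched as follows. This is an illustration only: the normalization rule in to_schema_name is an assumption inferred from the single "Stitch GitHub" example, the owner/name pattern is deliberately conservative rather than GitHub's exact rules, and both function names are hypothetical.

```python
import re
from typing import List

# Conservative owner/name pattern (an assumption, not GitHub's exact rules).
REPO_PATH = re.compile(r"^[A-Za-z0-9-]+/[A-Za-z0-9._-]+$")


def to_schema_name(integration_name: str) -> str:
    """Approximate the destination schema name for an integration name."""
    lowered = integration_name.strip().lower()
    # Collapse each run of non-alphanumeric characters into one underscore.
    return re.sub(r"[^a-z0-9]+", "_", lowered).strip("_")


def parse_repository_field(value: str) -> List[str]:
    """Split a space-delimited list of repository paths and validate each one."""
    paths = value.split()
    invalid = [path for path in paths if not REPO_PATH.match(path)]
    if invalid:
        raise ValueError(f"Expected owner/name paths, got: {invalid}")
    return paths


print(to_schema_name("Stitch GitHub"))  # stitch_github
print(parse_repository_field("stitchdata/docs stitchdata/docs-about-docs"))
```

A full URL such as https://github.com/stitchdata/docs fails validation in this sketch, which mirrors the requirement that paths be relative to https://github.com.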

Step 3: Define the historical sync

The Sync Historical Data setting will define the starting date for your GitHub integration. This means that:

  • For tables using Incremental Replication, data equal to or newer than this date will be replicated to your data warehouse.
  • For tables using Full Table Replication, all data will be replicated to your data warehouse, including records that are older than, equal to, or newer than this date.
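
The two rules above can be sketched as a simple filter. This is a hedged illustration of the described behavior, not Stitch's implementation; the function name, the method labels, and the use of an updated_at field as the comparison date are assumptions.

```python
from datetime import date
from typing import Dict, List


def rows_to_replicate(rows: List[Dict], start_date: date, method: str) -> List[Dict]:
    """Select the rows a historical sync would replicate for a given method."""
    if method == "FULL_TABLE":
        # Full Table Replication ignores the start date entirely.
        return list(rows)
    if method == "INCREMENTAL":
        # Incremental Replication keeps rows equal to or newer than the start date.
        return [row for row in rows if row["updated_at"] >= start_date]
    raise ValueError(f"Unknown replication method: {method}")


rows = [
    {"id": 1, "updated_at": date(2019, 1, 1)},
    {"id": 2, "updated_at": date(2020, 6, 1)},
]
print(rows_to_replicate(rows, date(2020, 6, 1), "INCREMENTAL"))
```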

Change this setting if you want to replicate data older than the default of 1 year for GitHub integrations. For a detailed look at historical replication jobs, check out the Syncing Historical SaaS Data guide.

Step 4: Create a replication schedule

In the Replication Frequency section, you’ll create the integration’s replication schedule. An integration’s replication schedule determines how often Stitch runs a replication job, and the time that job begins.

GitHub integrations support the following replication scheduling methods:

  • Replication Frequency
  • Anchor Scheduling

To keep your row usage low, consider setting the integration to replicate less frequently. See the Understanding and Reducing Your Row Usage guide for tips on reducing your usage.
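
To make the relationship between frequency and start time concrete, the sketch below computes the next job start from an anchor time and a replication frequency. It assumes jobs start at the anchor time plus whole multiples of the frequency; that rule and the function name next_job_start are illustrative assumptions, not a description of Stitch's scheduler.

```python
import math
from datetime import datetime, timedelta


def next_job_start(anchor: datetime, frequency: timedelta, now: datetime) -> datetime:
    """Return the first scheduled start at or after `now`."""
    if now <= anchor:
        return anchor
    # Dividing two timedeltas yields a float number of elapsed intervals.
    intervals = math.ceil((now - anchor) / frequency)
    return anchor + intervals * frequency


anchor = datetime(2024, 1, 1, 0, 0)
print(next_job_start(anchor, timedelta(hours=6), datetime(2024, 1, 1, 7, 30)))
```

A longer frequency produces fewer job starts per day, which is one way to keep row usage lower.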

Initial and historical replication jobs

After you finish setting up GitHub, its Sync Status may show as Pending on either the Stitch Dashboard or the Integration Details page.

For a new integration, a Pending status indicates that Stitch is in the process of scheduling the initial replication job for the integration. This may take some time to complete.

Free historical data loads

The first seven days of replication, beginning when data is first replicated, are free. Rows replicated from the new integration during this time won’t count towards your quota. Stitch offers this as a way of testing new integrations, measuring usage, and ensuring historical data volumes don’t quickly consume your quota.


GitHub table schemas

assignees

Replication Method: Full Table
Primary Key: id
API endpoint: listAssignees

The assignees table contains info about the available assignees for issues in a repository.

id
INTEGER

The assignee ID.

login
STRING

The user’s username.

type
STRING

The user’s type.

url
STRING

The profile URL associated with the user.


collaborators

Replication Method: Full Table
Primary Key: id
API endpoint: listCollaborators

The collaborators table contains info about the users who contribute to a repository.

For organization-owned repositories, this includes outside collaborators, organization owners, and organization members who are direct collaborators, have access through team memberships, or have access through default organization permissions.

id
INTEGER

The collaborator’s ID.

login
STRING

The collaborator’s username.

type
STRING

The collaborator’s type.

url
STRING

The profile URL associated with the collaborator.


comments

Replication Method: Key-based Incremental
Replication Key: updated_at
Primary Key: id
API endpoint: List comments on a pull request

The comments table contains info about comments made on issues.

id
INTEGER

The comment ID.

updated_at
DATE-TIME

The time the comment was last updated.

body
STRING

The body of the comment.

created_at
DATE-TIME

The time the comment was created.

home_url
STRING

The home URL of the comment.

html_url
STRING

The HTML URL of the comment.

issue_url
STRING

The URL of the issue associated with the comment.

node_id
STRING

The node ID.

url
STRING

The GitHub URL of the comment.

user
OBJECT

Details about the user who created the comment.

login
STRING

The login name of the user who created the comment.

id
STRING

The ID of the user who created the comment.

node_id
STRING

The node ID of the user who created the comment.

avatar_url
STRING

The URL of the avatar of the user who created the comment.

gravatar_id
STRING

The Gravatar ID of the user who created the comment.

url
STRING

The API URL of the user who created the comment.

html_url
STRING

The GitHub URL of the user who created the comment.

followers_url
STRING

The URL to the user’s followers page.

following_url
STRING

The URL to the user’s following page.

gists_url
STRING

The URL to the user’s gists page.

starred_url
STRING

The URL to the user’s starred page.

subscriptions_url
STRING

The URL to the user’s subscriptions page.

organizations_url
STRING

The URL to the user’s organizations page.

repos_url
STRING

The URL to the user’s repositories page.

events_url
STRING

The URL to the user’s events page.

received_events_url
STRING

The URL to the user’s received events page.

type
STRING

The type of the user.

site_admin
STRING

Indicates if the user is a site administrator.

comments (table), user (attribute)
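
Because this table uses Key-based Incremental Replication with updated_at as its Replication Key, each job only needs rows changed since the last saved value. The sketch below illustrates the idea under stated assumptions: the comparison is inclusive, the saved value is called a bookmark here, and the function name incremental_batch is hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def incremental_batch(rows: List[Dict], bookmark: datetime) -> Tuple[List[Dict], datetime]:
    """Return rows updated at or after the bookmark, plus the next bookmark."""
    batch = [row for row in rows if row["updated_at"] >= bookmark]
    # The next job resumes from the newest updated_at value seen so far.
    new_bookmark = max((row["updated_at"] for row in batch), default=bookmark)
    return batch, new_bookmark


comments = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 2, 1)},
]
batch, bookmark = incremental_batch(comments, datetime(2024, 1, 15))
print([row["id"] for row in batch], bookmark)
```

With an inclusive comparison, the row that set the bookmark is selected again on the next run, so loading by Primary Key is what keeps the destination free of duplicates in this sketch.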

commits

Replication Method: Key-based Incremental
Replication Key: since
Primary Key: sha
API endpoint: listRepositoryCommits

The commits table contains info about commits in a repository.

sha
STRING

The git commit hash.

comments_url
STRING

The URL to the commit’s comments page.

commit
OBJECT

Details about the commit.

url
STRING

The URL to the commit.

tree
OBJECT

Details about the commit tree.

sha
STRING

The git commit tree hash.

url
STRING

The URL to the commit tree.

commits (table), tree (attribute)

author
OBJECT

Details about the author of the commit.

date
STRING

The date the author committed the change.

email
STRING

The author’s email address.

name
STRING

The author’s name.

commits (table), author (attribute)

message
STRING

The commit message.

committer
OBJECT

Details about the user who committed the change.

date
STRING

The date the committer committed the change.

email
STRING

The committer’s email address.

name
STRING

The committer’s name.

commits (table), committer (attribute)

comment_count
INTEGER

The number of comments on the commit.

commits (table), commit (attribute)

html_url
STRING

The HTML URL to the commit.

parents
ARRAY

Details about the parent commits.

sha
STRING

The git hash of the parent commit.

html_url
STRING

The HTML URL to the parent commit.

url
STRING

The URL to the parent commit.

commits (table), parents (attribute)

url
STRING

The URL to the commit.
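
The commits table contains nested OBJECT attributes such as commit, author, and committer. How nested data is stored depends on the destination. As a hedged illustration, the sketch below flattens nested objects into single-level column names joined with a double underscore; the separator and the function name flatten are assumptions, so check your destination's documentation for its actual behavior.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], parent_key: str = "", sep: str = "__") -> Dict[str, Any]:
    """Flatten nested dictionaries into a single level of column names."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the column prefix along.
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


row = {"sha": "abc", "commit": {"author": {"name": "Ada"}, "message": "fix"}}
print(flatten(row))
```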


issues

Replication Method: Key-based Incremental
Replication Key: updated_at
Primary Key: id
API endpoint: listIssuesForRepository

The issues table contains info about repository issues.

Issues and pull requests

GitHub’s API considers every pull request an issue, but not every issue is a pull request. Therefore, this table may contain both issues and pull requests.

id
INTEGER

The issue ID.

updated_at
DATE-TIME

The last time the issue was updated.
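
Since this table can mix issues and pull requests, it may help to separate them after loading. In GitHub's REST API, issue objects that are pull requests carry a pull_request key. The sketch below assumes that key is present in the rows you are working with, which this page does not confirm; the function name split_issues is hypothetical.

```python
from typing import Dict, List, Tuple


def split_issues(rows: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split rows into (plain issues, pull requests) using the pull_request key."""
    issues = [row for row in rows if not row.get("pull_request")]
    pulls = [row for row in rows if row.get("pull_request")]
    return issues, pulls


rows = [
    {"id": 1},
    {"id": 2, "pull_request": {"url": "https://api.github.com/repos/stitchdata/docs/pulls/2"}},
]
issues, pulls = split_issues(rows)
print(len(issues), len(pulls))
```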


pull_requests

Replication Method: Full Table
Primary Key: id
API endpoint: listPullRequests

The pull_requests table contains info about pull requests made against the repository.

id
STRING

The pull request ID.

updated_at
DATE-TIME

The last time the pull request was updated.

body
STRING

The description of the pull request.

closed_at
STRING

The time the pull request was closed.

created_at
STRING

The time the pull request was created.

merged_at
STRING

The time the pull request was merged.

number
INTEGER

The number of the pull request in the repository.

state
STRING

The current status of the pull request. For example: open

title
STRING

The title of the pull request.

url
STRING

The URL to the pull request.

user
OBJECT

Details about the user who created the pull request.

id
INTEGER

The user ID.

login
STRING

The user’s GitHub username.

pull_requests (table), user (attribute)

review_comments

Replication Method: Key-based Incremental
Replication Key: updated_at
Primary Key: id
API endpoint: List comments on a pull request

The review_comments table contains info about comments made on pull request reviews.

Note: In order to replicate this table, you must also set the pull_requests table to replicate.

id
INTEGER

The review comment ID.

updated_at
DATE-TIME

The time the review comment was last updated.

body
STRING

The body of the review comment.

commit_id
STRING

The ID of the commit the review comment is associated with.

created_at
DATE-TIME

The time the review comment was created.

diff_url
STRING

The diff URL associated with the review comment.

html_url
STRING

The HTML URL of the review comment.

in_reply_to_id
INTEGER

If the review comment is a reply to another review comment, this will be the ID of the review comment it is in response to.

issue_url
STRING

The URL of the issue associated with the review comment.

node_id
STRING

The review comment’s node ID.

original_position
INTEGER

The original position of the review comment.

original_commit_id
STRING

The ID of the original commit the review comment is associated with.

pull_request_review_id
INTEGER

The ID of the pull request review the comment is a part of.

path
STRING

The path of the file the review comment was made on.

position
INTEGER

The position of the review comment.

pull_request_url
STRING

The URL of the pull request associated with the review comment.

url
STRING

The GitHub URL of the review comment.

user
OBJECT

Details about the user who created the review comment.

login
STRING

The login name of the user who created the review comment.

id
STRING

The ID of the user who created the review comment.

review_comments (table), user (attribute)
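
The in_reply_to_id column makes it possible to rebuild reply threads from this table. The following is a minimal sketch that maps each parent review comment ID to the IDs of its direct replies. It assumes rows have been loaded as dictionaries with the id and in_reply_to_id columns described above; the function name group_replies is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def group_replies(comments: List[Dict]) -> Dict[int, List[int]]:
    """Map each parent review comment ID to the IDs of its direct replies."""
    replies: Dict[int, List[int]] = defaultdict(list)
    for comment in comments:
        parent_id = comment.get("in_reply_to_id")
        if parent_id is not None:
            replies[parent_id].append(comment["id"])
    return dict(replies)


comments = [
    {"id": 10, "in_reply_to_id": None},
    {"id": 11, "in_reply_to_id": 10},
    {"id": 12, "in_reply_to_id": 10},
]
print(group_replies(comments))  # {10: [11, 12]}
```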

reviews

Replication Method: Full Table
Primary Key: id
API endpoint: listReviewsOnPullRequest

The reviews table contains info about pull request reviews. A pull request review is a group of comments on a pull request.

Note: In order to replicate this table, you must also set the pull_requests table to replicate.

id
INTEGER

The review ID.

body
STRING

The description of the review.

commit_id
STRING

The ID of the commit the review was performed on.

html_url
STRING

The HTML URL to the review.

pull_request_url
STRING

The URL to the pull request being reviewed.

state
STRING

The state of the review. Possible values are:

  • APPROVED
  • PENDING
  • CHANGES_REQUESTED

user
OBJECT

Details about the user who submitted the review.

id
INTEGER

The user ID.

login
STRING

The user’s GitHub username.

reviews (table), user (attribute)

stargazers

Replication Method: Key-based Incremental
Replication Key: starred_at
Primary Key: user_id
API endpoint: listStargazers

The stargazers table contains info about users who have starred a repository.

user_id
INTEGER

The user ID.

starred_at
STRING

The time the user starred the repository.

user
OBJECT

Details about the user who starred the repository.

id
INTEGER

The user ID.

stargazers (table), user (attribute)

