Heap integration summary

Stitch’s Heap integration replicates data from Avro files published to Amazon S3 via Heap’s Connect for Amazon S3 feature. Refer to the Schema section for a list of objects available for replication.

Heap feature snapshot

A high-level look at Stitch's Heap integration, including release status, useful links, and the features supported in Stitch.

STITCH
  • Release Status: Open Beta
  • Supported By: Stitch
  • Stitch Plan: Free
  • Singer GitHub Repository: Heap Repository

DATA SELECTION
  • Table Selection: Supported
  • Column Selection: Supported

REPLICATION SETTINGS
  • Anchor Scheduling: Supported
  • Advanced Scheduling: Unsupported
  • Table-level Reset: Unsupported
  • Configurable Replication Methods: Unsupported

TRANSPARENCY
  • Extraction Logs: Supported
  • Loading Reports: Supported

Connecting Heap

Heap setup requirements

To set up Heap in Stitch, you need:

  • Access to Heap Connect using Amazon S3. Stitch’s Heap integration currently only replicates data from Heap Amazon S3 instances.

  • Permissions in AWS Identity and Access Management (IAM) that allow you to create policies, create roles, and attach policies to roles. This is required to grant Stitch authorization to your S3 bucket.

Step 1: Retrieve your Amazon Web Services account ID

  1. Sign into your Amazon Web Services (AWS) account.
  2. Click the user menu, located between the bell and Global menus in the top-right corner of the page.
  3. Click My Account.
  4. In the Account Settings section of the page, locate the Account Id field:

    An AWS account ID, highlighted in the AWS Account Settings page

Keep this handy - you’ll need it to complete the next step.

Step 2: Add Heap as a Stitch data source

  1. Sign into your Stitch account.
  2. On the Stitch Dashboard page, click the Add Integration button.

  3. Click the Heap icon.

  4. Enter a name for the integration. This is the name that will display on the Stitch Dashboard for the integration; it’ll also be used to create the schema in your destination.

    For example, the name “Stitch Heap” would create a schema called stitch_heap in the destination. Note: Schema names cannot be changed after you save the integration.

  5. In the S3 Bucket field, enter the name of the bucket. Enter only the bucket name: No URLs, https, or S3 parts. For example: heap-rs3-stitch-bucket
  6. In the AWS Account ID field, paste the account ID you retrieved in Step 1.
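The S3 Bucket field expects a bare bucket name. As a rough illustration of what "no URLs, https, or S3 parts" means, the sketch below rejects values that contain a scheme or path separators. The function name looks_like_bare_bucket_name is hypothetical and is not part of Stitch or AWS, and the pattern is a simplification of the S3 bucket naming rules.

```python
import re

# Assumption: a simplified version of the S3 naming rules, for illustration only.
# 3-63 characters; lowercase letters, digits, dots, and hyphens; must start and
# end with a letter or digit. Characters such as ":" and "/" are not allowed,
# so values like "s3://..." or "https://..." are rejected.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def looks_like_bare_bucket_name(value: str) -> bool:
    """Return True if value looks like a bucket name rather than a URL or path."""
    return _BUCKET_NAME.fullmatch(value) is not None
```

For example, heap-rs3-stitch-bucket passes this check, while s3://heap-rs3-stitch-bucket does not.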

Step 3: Define the historical sync

The Sync Historical Data setting will define the starting date for your Heap integration. This means that data equal to or newer than this date will be replicated to your data warehouse.

Change this setting if you want to replicate data beyond Heap’s default setting of 1 year. For a detailed look at historical replication jobs, check out the Syncing Historical SaaS Data guide.

Step 4: Create a replication schedule

In the Replication Frequency section, you’ll create the integration’s replication schedule. An integration’s replication schedule determines how often Stitch runs a replication job, and the time that job begins.

Heap integrations support the following replication scheduling methods:

  • Replication Frequency
  • Anchor Scheduling

To keep your row usage low, consider setting the integration to replicate less frequently. See the Understanding and Reducing Your Row Usage guide for tips on reducing your usage.

Step 5: Grant access to your bucket using AWS IAM

Next, Stitch will display a Grant Access to Your Bucket page. This page contains the info you need to configure bucket access for Stitch, which is accomplished via an IAM policy and role.

Note: Saving the integration before you’ve completed the steps below will result in connection errors.

Step 5.1: Create an IAM policy

An IAM policy is a JSON-based access policy that manages permissions to bucket resources. The policy Stitch provides is auto-generated and unique to the bucket you entered on the setup page.

The Stitch IAM policy grants the following top-level permissions. Each permission name is followed by the operations it allows and a description of each operation.

  • s3:GetObject
    • GET Object: Allows for the retrieval of objects from Amazon S3.
    • HEAD Object: Allows for the retrieval of metadata from an object without returning the object itself.

  • s3:ListBucket
    • GET Bucket (List Objects): Allows for the return of some or all (up to 1,000) of the objects in a bucket.
    • HEAD Bucket: Used to determine if a bucket exists and access is allowed.

To create the IAM policy:

  1. In AWS, navigate to the IAM service by clicking the Services menu and typing IAM.
  2. Click IAM once it displays in the results.
  3. On the IAM home page, click Policies in the menu on the left side of the page.
  4. Click Create Policy.
  5. In the Create Policy page, click the JSON tab.
  6. Select everything currently in the text field and delete it.
  7. In the text field, paste the Stitch IAM policy.
  8. Click Review policy.
  9. On the Review Policy page, give the policy a name. For example: stitch_s3
  10. Click Create policy.
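To make the two permissions described above more concrete, here is a sketch of what a read-only bucket policy of this kind could look like. This is an assumption about the general shape, not the policy Stitch generates: always paste the exact policy shown on the Grant Access to Your Bucket page. The function name build_read_only_policy is hypothetical.

```python
import json


def build_read_only_policy(bucket: str) -> dict:
    """Build an illustrative policy granting s3:GetObject and s3:ListBucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level permission: applies to objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
            {
                # Bucket-level permission: applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
        ],
    }


# Print the policy as JSON, using the example bucket name from Step 2.
print(json.dumps(build_read_only_policy("heap-rs3-stitch-bucket"), indent=2))
```

Note that the object-level statement targets the bucket's contents (the /* suffix), while the bucket-level statement targets the bucket ARN itself.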

Step 5.2: Create an IAM role for Stitch

In this step, you’ll create an IAM role for Stitch and apply the IAM policy from the previous step. This will ensure that Stitch is visible in any logs and audits.

To create the role, you’ll need the Account ID, External ID, and Role Name values provided on the Stitch Grant Access to Your Bucket page.

  1. In AWS, navigate to the IAM Roles page.
  2. Click Create Role.
  3. On the Create Role page:
    1. In the Select type of trusted entity section, click the Another AWS account option.
    2. In the Account ID field, paste the Account ID from Stitch. Note: This isn’t your AWS account ID from Step 1 - this is the Account ID that displays in Stitch on the Grant Access to Your Bucket page.
    3. In the Options section, check the Require external ID box.
    4. In the External ID field that displays, paste the External ID from the Stitch Grant Access to Your Bucket page.
    5. Click Next: Permissions.
  4. On the Attach permissions page:
    1. Search for the policy you created in Step 5.1.
    2. Once located, check the box next to it in the table.
    3. Click Next: Tags.
  5. If you want to enter any tags, do so on the Add tags page. Otherwise, click Next: Review.
  6. On the Review page:
    1. In the Role name field, paste the Role Name from the Stitch Grant Access to Your Bucket page.

      Remember: Role names are unique to the Stitch Heap integration they’re created for. Attempting to use the same role for multiple integrations will cause connection errors.

    2. Enter a description in the Role description field. For example: Stitch role for Heap integration.
    3. Click Create role.
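Choosing Another AWS account and Require external ID in the steps above results in a role whose trust policy allows the other account to assume the role only when it supplies the matching external ID. The sketch below shows the general shape of such a trust policy. The function name build_trust_policy and the sample values are placeholders; use the Account ID and External ID that Stitch displays.

```python
import json


def build_trust_policy(stitch_account_id: str, external_id: str) -> dict:
    """Build an illustrative trust policy for a cross-account role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The trusted account: the Account ID shown in Stitch,
                # not your own AWS account ID from Step 1.
                "Principal": {"AWS": f"arn:aws:iam::{stitch_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The role can only be assumed when this external ID is supplied.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values for illustration only.
print(json.dumps(build_trust_policy("123456789012", "example-external-id"), indent=2))
```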

Step 5.3: Check and save the connection in Stitch

After you’ve created the IAM policy and role, you can save the integration in Stitch. When finished, click Check and Save.

Step 6: Set tables and columns to replicate

To complete the setup, you’ll need to select the tables and columns you want to replicate to your data warehouse.

Check out the Schema section to learn more about the available tables in Heap and how they replicate.

  1. In the list of tables that displays - or in the Tables to Replicate tab, if you skipped this step during setup - locate a table you want to replicate.
  2. To track a table, click the checkbox next to the table’s name. A green checkmark means the table is set to replicate.

  3. To track a column, click the checkbox next to the column’s name. A green checkmark means the column is set to replicate.

  4. Repeat this process for all the tables and columns you want to replicate.
  5. When finished, click the Finalize Your Selections button at the bottom of the screen to save your selections.

Note: If you change these settings while a replication job is still in progress, they will not be used until the next job starts.

Initial and historical replication jobs

After you finish setting up Heap, its Sync Status may show as Pending either on the Stitch Dashboard or in the Integration Details page.

For a new integration, a Pending status indicates that Stitch is in the process of scheduling the initial replication job for the integration. This may take some time to complete.

Free historical data loads

The first seven days of replication, beginning when data is first replicated, are free. Rows replicated from the new integration during this time won’t count towards your quota. Stitch offers this as a way of testing new integrations, measuring usage, and ensuring historical data volumes don’t quickly consume your quota.


Heap Replication

Replication in Stitch’s Heap integration depends on two factors:

  1. How Heap syncs data to your Amazon S3 bucket, and
  2. How Stitch identifies new data in Heap integrations

Heap data syncs to Amazon S3

Heap dumps data into Amazon S3 periodically. By default, this is on a nightly basis.

According to Heap’s documentation:

Heap will provide a periodic dump of data into S3 (nightly by default). Data will be delivered in the form of Avro-encoded files, each of which corresponds to one downstream table (though there can be multiple files per table). Dumps will be incremental, though individual table dumps can be full resyncs, depending on whether the table was recently toggled or the event definition modified.

This means that a file typically contains only the new and updated data for its object (table), but a file may instead contain a full resync of that table.

Incremental Replication using file modification timestamps

To identify new and updated data for replication, Stitch will use file modification timestamps as Replication Keys and store them on a per-table basis. This means that only files dumped from a new Heap data sync will be selected for replication.

While data from Heap integrations is replicated using Key-based Incremental Replication, the behavior for this integration differs subtly from other integrations.

The comparison below contrasts Key-based Incremental Replication and Replication Key behavior for Heap with that of other integrations.

What's used as a Replication Key?
  • Heap: The time a file is modified.
  • Other integrations: A column or columns in a table.

Are Replication Keys inclusive?
  • Heap: No. Only files with a modification timestamp value greater than the last saved bookmark are replicated.
  • Other integrations: Yes. Rows with a Replication Key value greater than or equal to the last saved bookmark are replicated.

What's replicated during a replication job?
  • Heap: The entire contents of a modified file.
  • Other integrations: Only new or updated rows in a table.
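The exclusive, per-table bookmark comparison described above can be sketched as follows. This is a minimal illustration, not Stitch's implementation: the FileInfo type, the select_files function, and the file keys used in the usage note are all hypothetical.

```python
from datetime import datetime
from typing import Dict, List, NamedTuple, Tuple


class FileInfo(NamedTuple):
    table: str               # the table the file belongs to
    key: str                 # hypothetical object key in the bucket
    last_modified: datetime  # file modification timestamp


def select_files(
    files: List[FileInfo], bookmarks: Dict[str, datetime]
) -> Tuple[List[FileInfo], Dict[str, datetime]]:
    """Return the files to replicate and the updated per-table bookmarks."""
    selected = []
    new_bookmarks = dict(bookmarks)
    for f in sorted(files, key=lambda item: item.last_modified):
        bookmark = bookmarks.get(f.table)
        # Exclusive comparison: a file modified exactly at the saved
        # bookmark is skipped; only strictly newer files are selected.
        if bookmark is None or f.last_modified > bookmark:
            selected.append(f)
            current = new_bookmarks.get(f.table)
            if current is None or f.last_modified > current:
                new_bookmarks[f.table] = f.last_modified
    return selected, new_bookmarks
```

With a saved bookmark of January 1 for a table, a file modified exactly on that timestamp is skipped, while a file modified on January 2 is selected and its entire contents are replicated. A table with no saved bookmark has all of its files selected.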


Heap table schemas

Custom attributes

Heap’s data model is dynamic, meaning it changes as custom attributes are added to object types in your account. For example: Adding new user attributes to the user object.

This means that the Heap schema in your destination may also change over time as you add new attributes in Heap.

When a new attribute is added to an object in Heap, it will display as a selectable field in the Stitch app. Note: To include the field in replication, you’ll need to select it in Stitch. Stitch will not automatically select new fields.

The schema documentation following this section outlines the default attributes for each object type according to Heap’s documentation.

Event tables

For each event type you define in Heap, a table for that event will be available for selection in Stitch.

For example: If there’s a Sign up - Click button event, there will be a table named sign_up_click_button.

Refer to the [event_type] schema documentation for a list of default event attributes.

Note: When new event types are added in Heap, you will need to select their tables and fields in Stitch to include them in replication.


[event_type]

Replication Method: Key-based Incremental
Replication Key: See Replication
Primary Key: event_id
Official docs: Official Docs

For every event type defined in Heap, a table will display in the Stitch app. The table name is the event name after Heap has stripped its non-alphanumeric characters. For example: Heap transforms the event name Sign Up - Click Link into the table name sign_up_click_link.
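As a rough illustration of this naming, the sketch below reproduces the two examples given in this guide. Heap performs the real transformation and its exact rules aren't documented here, so the lowercasing and the use of underscores as separators are assumptions inferred from those examples. The function name to_table_name is hypothetical.

```python
import re


def to_table_name(event_name: str) -> str:
    """Approximate the event-name-to-table-name mapping shown in the examples."""
    # Assumption: lowercase the name, then collapse each run of
    # non-alphanumeric characters into a single underscore.
    collapsed = re.sub(r"[^a-z0-9]+", "_", event_name.lower())
    # Drop any leading or trailing underscores left over from the edges.
    return collapsed.strip("_")


print(to_table_name("Sign Up - Click Link"))    # sign_up_click_link
print(to_table_name("Sign up - Click button"))  # sign_up_click_button
```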

Note: Custom attributes are supported for this table. As Heap schemas are dynamic, Stitch’s [event_type] documentation will only list the non-custom attributes outlined in Heap’s documentation.

event_id
STRING

The event ID.

user_id
INTEGER

The ID of the associated user.

session_id
INTEGER

The ID of the associated session.

Custom Attributes

Any custom attributes applied to this event type model in Heap.

time
STRING

The UTC timestamp when the event happened.

session_time
STRING

The timestamp when the session started. Note: According to Heap, this field is primarily used for Heap’s internal use and shouldn’t be relied on for analysis.

type
STRING

For web auto-tracked events, can be any of view page, click, submit, change, with push state events registered as view page events.

For iOS auto-tracked events, can be touch, edit field, or a gesture recognizer you’ve defined.

For custom events, this will be the custom event name.

library
STRING

The version of the Heap library which initiated the session. Possible values are:

  • web
  • iOS

platform
STRING

The user’s operating system.

device_type
STRING

The user’s device type. Possible values are:

  • Mobile
  • Tablet
  • Desktop

country
STRING

The country in which the user session occurred, based on IP.

region
STRING

The region in which the user session occurred, based on IP.

city
STRING

The city in which the user session occurred, based on IP.

IP
STRING

The IP address for the session.

referrer
STRING

Applicable only to library: Web. The URL that linked to the site and initiated the session. If the user navigated directly to the site or referral headers were stripped, this value will be direct.

landing_page
STRING

Applicable only to library: Web. The URL of the first pageview of the session.

browser
STRING

Applicable only to library: Web. The user’s browser.

search_keyword
STRING

Applicable only to library: Web. The search term that brought the user to the site.

utm_source
STRING

Applicable only to library: Web. The GA-based utm_source tag associated with the session’s initial pageview.

utm_campaign
STRING

Applicable only to library: Web. The GA-based utm_campaign tag associated with the session’s initial pageview.

utm_medium
STRING

Applicable only to library: Web. The GA-based utm_medium tag associated with the session’s initial pageview.

utm_term
STRING

Applicable only to library: Web. The GA-based utm_term tag associated with the session’s initial pageview.

utm_content
STRING

Applicable only to library: Web. The GA-based utm_content tag associated with the session’s initial pageview.

path
STRING

Applicable only to library: Web. The path of the pageview.

query
STRING

Applicable only to library: Web. The query parameters associated with the pageview.

hash
STRING

Applicable only to library: Web. The hash parameters associated with the pageview.

title
STRING

Applicable only to library: Web. The title of the current page.

href
STRING

The href property of the link.

device
STRING

Applicable only to library: iOS. The user’s device model.

carrier
STRING

Applicable only to library: iOS. The user’s mobile phone carrier.

app_name
STRING

Applicable only to library: iOS. The current name of the iOS app.

app_version
STRING

Applicable only to library: iOS. The current version of the iOS app.

action_method
STRING

The name of the action method triggered by this event. For example: loginButtonWasPressed

view_controller
STRING

Applicable only to library: iOS. The name of the current view controller.

screen_ally_id
STRING

Applicable only to library: iOS. The accessibility identifier for the current view controller.

screen_ally_label
STRING

Applicable only to library: iOS. The accessibility label for the current view controller.

target_view_class
STRING

The underlying class name of an iOS action’s target.

target_view_name
STRING

The instance variable name of an iOS action’s target.

target_ally_id
STRING

Applicable only to library: iOS. The name of an iOS action’s target.

target_ally_label
STRING

Applicable only to library: iOS. The label of an iOS action’s target.

target_text
STRING

The button text of the event target.


pageviews

Replication Method: Key-based Incremental
Replication Key: See Replication
Primary Key: event_id
Official docs: Official Docs

The pageviews table contains info about pageviews.

Note: Custom attributes are supported for this table. As Heap schemas are dynamic, Stitch’s pageviews documentation will only list the non-custom attributes outlined in Heap’s documentation.

event_id
STRING

The event ID.

Custom Attributes

Any custom attributes applied to the pageview model in Heap.

user_id
INTEGER

The ID of the associated user.

session_id
INTEGER

The ID of the associated session.

session_time
STRING

The timestamp when the session started. Note: According to Heap, this field is primarily used for Heap’s internal use and shouldn’t be relied on for analysis.

time
STRING

The UTC timestamp when the pageview occurred.

library
STRING

The version of the Heap library which initiated the session. Possible values are:

  • web
  • iOS

platform
STRING

The user’s operating system.

device_type
STRING

The user’s device type. Possible values are:

  • Mobile
  • Tablet
  • Desktop

country
STRING

The country in which the user session occurred, based on IP.

region
STRING

The region in which the user session occurred, based on IP.

city
STRING

The city in which the user session occurred, based on IP.

IP
STRING

The IP address for the session.

referrer
STRING

Applicable only to library: Web. The URL that linked to the site and initiated the session. If the user navigated directly to the site or referral headers were stripped, this value will be direct.

landing_page
STRING

Applicable only to library: Web. The URL of the first pageview of the session.

browser
STRING

Applicable only to library: Web. The user’s browser.

search_keyword
STRING

Applicable only to library: Web. The search term that brought the user to the site.

utm_source
STRING

Applicable only to library: Web. The GA-based utm_source tag associated with the session’s initial pageview.

utm_campaign
STRING

Applicable only to library: Web. The GA-based utm_campaign tag associated with the session’s initial pageview.

utm_medium
STRING

Applicable only to library: Web. The GA-based utm_medium tag associated with the session’s initial pageview.

utm_term
STRING

Applicable only to library: Web. The GA-based utm_term tag associated with the session’s initial pageview.

utm_content
STRING

Applicable only to library: Web. The GA-based utm_content tag associated with the session’s initial pageview.

path
STRING

Applicable only to library: Web. The path of the pageview.

query
STRING

Applicable only to library: Web. The query parameters associated with the pageview.

hash
STRING

Applicable only to library: Web. The hash parameters associated with the pageview.

title
STRING

Applicable only to library: Web. The title of the current page.

device
STRING

Applicable only to library: iOS. The user’s device model.

carrier
STRING

Applicable only to library: iOS. The user’s mobile phone carrier.

app_name
STRING

Applicable only to library: iOS. The current name of the iOS app.

app_version
STRING

Applicable only to library: iOS. The current version of the iOS app.

view_controller
STRING

Applicable only to library: iOS. The name of the current view controller.

screen_ally_id
STRING

Applicable only to library: iOS. The accessibility identifier for the current view controller.

screen_ally_label
STRING

Applicable only to library: iOS. The accessibility label for the current view controller.


sessions

Replication Method: Key-based Incremental
Replication Key: See Replication
Primary Key: event_id
Official docs: Official Docs

The sessions table contains info about sessions. In Heap, a web session ends after 30 minutes of user inactivity, while in iOS, a session ends after the app has entered the background.
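To illustrate the web rule, the sketch below groups a user's event timestamps into sessions using a 30-minute inactivity limit. It covers only the web case and is not Heap's implementation. Treating a gap of exactly 30 minutes as the start of a new session is an assumption, and the function name split_into_sessions is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List

INACTIVITY_LIMIT = timedelta(minutes=30)


def split_into_sessions(event_times: List[datetime]) -> List[List[datetime]]:
    """Group event timestamps into sessions separated by inactivity gaps."""
    sessions: List[List[datetime]] = []
    for t in sorted(event_times):
        # Continue the current session only if the gap since the previous
        # event is shorter than the inactivity limit (assumption: a gap of
        # exactly 30 minutes starts a new session).
        if sessions and t - sessions[-1][-1] < INACTIVITY_LIMIT:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions
```

For example, events at 9:00, 9:10, 9:45, and 9:50 form two sessions, because the 35-minute gap between 9:10 and 9:45 exceeds the limit.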

Note: Custom attributes are supported for this table. As Heap schemas are dynamic, Stitch’s sessions documentation will only list the non-custom attributes outlined in Heap’s documentation.

event_id
STRING

The event ID.

user_id
INTEGER

The ID of the associated user.

session_id
INTEGER

The ID of the associated session.

Custom Attributes

Any custom attributes applied to the session model in Heap.

time
STRING

The UTC timestamp when the session started.

library
STRING

The version of the Heap library which initiated the session. Possible values are:

  • web
  • iOS

platform
STRING

The user’s operating system.

device_type
STRING

The user’s device type. Possible values are:

  • Mobile
  • Tablet
  • Desktop

country
STRING

The country in which the user session occurred, based on IP.

region
STRING

The region in which the user session occurred, based on IP.

city
STRING

The city in which the user session occurred, based on IP.

IP
STRING

The IP address for the session.

referrer
STRING

Applicable only to library: Web. The URL that linked to the site and initiated the session. If the user navigated directly to the site or referral headers were stripped, this value will be direct.

landing_page
STRING

Applicable only to library: Web. The URL of the first pageview of the session.

browser
STRING

Applicable only to library: Web. The user’s browser.

search_keyword
STRING

Applicable only to library: Web. The search term that brought the user to the site.

utm_source
STRING

Applicable only to library: Web. The GA-based utm_source tag associated with the session’s initial pageview.

utm_campaign
STRING

Applicable only to library: Web. The GA-based utm_campaign tag associated with the session’s initial pageview.

utm_medium
STRING

Applicable only to library: Web. The GA-based utm_medium tag associated with the session’s initial pageview.

utm_term
STRING

Applicable only to library: Web. The GA-based utm_term tag associated with the session’s initial pageview.

utm_content
STRING

Applicable only to library: Web. The GA-based utm_content tag associated with the session’s initial pageview.

device
STRING

Applicable only to library: iOS. The user’s device model.

carrier
STRING

Applicable only to library: iOS. The user’s mobile phone carrier.

app_name
STRING

Applicable only to library: iOS. The current name of the iOS app.

app_version
STRING

Applicable only to library: iOS. The current version of the iOS app.


user_migrations

Replication Method: Key-based Incremental
Replication Key: See Replication
Primary Key: from_user_id
Official docs: Official Docs

The user_migrations table contains info about user migrations.

Note: Custom attributes are supported for this table. As Heap schemas are dynamic, Stitch’s user_migrations documentation will only list the non-custom attributes outlined in Heap’s documentation.

from_user_id
STRING

The migrating user’s ID.

to_user_id
INTEGER

The destination user’s ID.

time
STRING

The timestamp when the migration occurred.
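One way this table might be used downstream is to map a migrated user ID to its destination user ID. The sketch below is an assumption about usage, not documented behavior: it treats the table as a from_user_id to to_user_id lookup, follows chains of migrations in case a destination user was itself migrated, and normalizes IDs to strings because the two columns have different types. The function name resolve_user_id is hypothetical.

```python
from typing import Dict, Union

UserId = Union[str, int]


def resolve_user_id(user_id: UserId, migrations: Dict[str, UserId]) -> str:
    """Follow from_user_id -> to_user_id links until no migration remains."""
    seen = set()
    current = str(user_id)
    # Stop when the ID has no migration row, or when an ID repeats
    # (a guard against cycles in the data).
    while current in migrations and current not in seen:
        seen.add(current)
        current = str(migrations[current])
    return current


# Hypothetical rows: user 1 migrated to user 2, which migrated to user 3.
migrations = {"1": 2, "2": 3}
print(resolve_user_id(1, migrations))  # 3
```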


users

Replication Method: Key-based Incremental
Replication Key: See Replication
Primary Key: user_id
Official docs: Official Docs

The users table contains info about users.

Note: Custom attributes are supported for this table. As Heap schemas are dynamic, Stitch’s users documentation will only list the non-custom attributes outlined in Heap’s documentation.

user_id
STRING

The user ID.

identity
STRING

The user’s username or other unique token.

handle
STRING

The user’s username or other unique token.

email
STRING

The user’s email address.

joindate
STRING

The UTC timestamp when the user was first seen.

last_modified
STRING

The UTC timestamp when the user’s data was last modified.

Custom Attributes

Any custom attributes applied to the user model in Heap.


