Stitch enforces field selection and compatibility rules to ensure that selected streams include the fields required for replication and that the selected fields are compatible with one another. This guide covers the metadata properties that control field inclusion in the Connect API.


Field types

Stitch requires two types of fields for stream replication: Primary Keys and, when applicable, Replication Keys.

Primary Key fields

To accurately replicate data, Stitch requires Primary Key information for each stream. A Primary Key is a column or set of columns that uniquely identifies a record.

Depending on the source and stream type, this is handled in one of several ways.

Database sources

For database sources, Stitch typically queries the database’s information schema to determine the Primary Key fields, then stores their names as a list in the table-key-properties property of the stream’s metadata:

{
  "selected": null,
  "stream_id": 2289176,
  "tap_stream_id": "demni2mf59dt10-heroku-orders",
  "stream_name": "orders",
  "metadata": {
    "database-name": "demni2mf59dt10",
    "selected": null,
    "replication-method": null,
    "is-view": false,
    "row-count": 447,
    "schema-name": "heroku",
    "table-key-properties": [
      "id"
    ]
  }
}

Database views

For database views, the stream’s metadata will contain an is-view property with a value of true:

{
  "selected": true,
  "stream_id": 2375830,
  "tap_stream_id": "demni2mf59dt10-public-customer_view",
  "stream_name": "customer_view",
  "metadata": {
    "database-name": "demni2mf59dt10",
    "selected": true,
    "is-view": true,
    "replication-key": "updated_at",
    "replication-method": "INCREMENTAL",
    "row-count": 56,
    "schema-name": "public",
    "table-key-properties": [],
    "view-key-properties": [
      "id"
    ]
  }
}

Primary Key information must be provided in the view-key-properties metadata property when the stream is selected for replication.
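
As a minimal sketch of how a client might read Primary Key information from the stream objects shown above, the helper below uses view-key-properties for views and table-key-properties otherwise. The function name primary_key_fields and the trimmed sample dictionaries are assumptions for illustration and are not part of the Connect API.

```python
def primary_key_fields(stream):
    """Return the Primary Key field names recorded in a stream's metadata.

    `stream` is a dict shaped like the stream objects in this guide.
    Hypothetical helper, not part of the Connect API.
    """
    metadata = stream.get("metadata", {})
    if metadata.get("is-view"):
        # Views carry their keys in view-key-properties.
        return list(metadata.get("view-key-properties") or [])
    # Tables carry their keys in table-key-properties.
    return list(metadata.get("table-key-properties") or [])


# Trimmed copies of the sample stream objects.
orders = {
    "stream_name": "orders",
    "metadata": {"is-view": False, "table-key-properties": ["id"]},
}
customer_view = {
    "stream_name": "customer_view",
    "metadata": {
        "is-view": True,
        "table-key-properties": [],
        "view-key-properties": ["id"],
    },
}

print(primary_key_fields(orders))
print(primary_key_fields(customer_view))
```

An empty result for a selected view would suggest that view-key-properties still needs to be provided.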

SaaS sources

For SaaS sources, Primary Keys are typically hard-coded in the Singer tap backing the source. The Primary Key field names are stored as a list in the table-key-properties property of the stream’s metadata:

{
  "selected": null,
  "stream_id": 2288758,
  "tap_stream_id": "custom_collections",
  "stream_name": "custom_collections",
  "metadata": {
    "forced-replication-method": "INCREMENTAL",
    "selected": null,
    "table-key-properties": [
      "id"
    ],
    "valid-replication-keys": [
      "updated_at"
    ]
  }
}

Replication Key fields

If a stream’s replication-method is INCREMENTAL, an appropriate field must be set as the stream’s Replication Key. Replication Keys are columns used to identify new and updated data for replication. These are typically integer, datetime, or timestamp columns and are required to use Key-based Incremental Replication.

Like Primary Keys, this is handled in one of several ways depending on the source type.

Database sources

For database sources, a valid Replication Key must be provided in the replication-key metadata property when the stream is selected. The stream below has not been selected, so it has no replication-key set:

{
  "selected": null,
  "stream_id": 2289176,
  "tap_stream_id": "demni2mf59dt10-heroku-orders",
  "stream_name": "orders",
  "metadata": {
    "database-name": "demni2mf59dt10",
    "selected": null,
    "replication-method": null,
    "is-view": false,
    "row-count": 447,
    "schema-name": "heroku",
    "table-key-properties": [
      "id"
    ]
  }
}

Note: This is also applicable to database views if the stream’s replication-method is set to INCREMENTAL.
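
The requirement above could be checked client-side before a stream is selected. The sketch below is one hedged way to do that: it flags streams whose replication-method is INCREMENTAL but whose replication-key is empty. The function name missing_replication_key and the sample dictionaries are hypothetical.

```python
def missing_replication_key(stream):
    """Return True if an INCREMENTAL stream has no replication-key set.

    Hypothetical helper; `stream` is a dict shaped like the stream
    objects in this guide.
    """
    metadata = stream.get("metadata", {})
    if metadata.get("replication-method") != "INCREMENTAL":
        # Other replication methods do not need a Replication Key.
        return False
    return not metadata.get("replication-key")


# Illustrative metadata: same stream, with and without a key.
without_key = {"metadata": {"replication-method": "INCREMENTAL"}}
with_key = {
    "metadata": {
        "replication-method": "INCREMENTAL",
        "replication-key": "updated_at",
    }
}

print(missing_replication_key(without_key))
print(missing_replication_key(with_key))
```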

SaaS sources

For SaaS sources, Replication Keys are hard-coded in the Singer tap backing the source. The Replication Key field names are stored as a list in the valid-replication-keys property of the stream’s metadata:

{
  "selected": null,
  "stream_id": 2288758,
  "tap_stream_id": "custom_collections",
  "stream_name": "custom_collections",
  "metadata": {
    "forced-replication-method": "INCREMENTAL",
    "selected": null,
    "table-key-properties": [
      "id"
    ],
    "valid-replication-keys": [
      "updated_at"
    ]
  }
}
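
For illustration, a client could read these read-only settings from a SaaS stream as sketched below. The helper name saas_replication_settings and the shape of its return value are assumptions, not part of the Connect API.

```python
def saas_replication_settings(stream):
    """Summarize the tap-defined replication settings of a SaaS stream.

    Hypothetical helper; returns the forced replication method (if any)
    and the list of valid Replication Key field names.
    """
    metadata = stream.get("metadata", {})
    return {
        "method": metadata.get("forced-replication-method"),
        "replication_keys": list(metadata.get("valid-replication-keys") or []),
    }


# Trimmed copy of the custom_collections sample stream.
custom_collections = {
    "stream_name": "custom_collections",
    "metadata": {
        "forced-replication-method": "INCREMENTAL",
        "table-key-properties": ["id"],
        "valid-replication-keys": ["updated_at"],
    },
}

print(saas_replication_settings(custom_collections))
```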

Field selection rules

To replicate data successfully and accurately, Stitch requires that a stream’s Primary Key and Replication Key fields be selected.

To ensure the required fields are included in a stream’s field inclusion list, Stitch enforces field selection rules.

Metadata in field selection

Field selection rules are shaped by three metadata fields in a Field-level Metadata object:

inclusion
STRING
READ-ONLY

Indicates when a field will be included. Possible values are:

  • automatic - The field is included all the time, regardless of selected-by-default and selected values.
  • available - The field is available for selection. The field will be included if selected-by-default or selected is true.
  • unsupported - The field is unsupported and will not be included, regardless of selected-by-default and selected values.

selected-by-default
BOOLEAN
READ-ONLY

Indicates if a field will be selected by default. Possible values are:

  • null - The value has not been set.
  • true - The field is selected by default and is included regardless of the selected value.
  • false - The field is not selected by default. The field will be included if the selected value is true.

selected
BOOLEAN

Indicates whether a field should be selected. Possible values are:

  • null - The value has not been set.
  • true - The field is selected.
  • false - The field is not selected.

Field selection metadata combinations

Below are the possible combinations of metadata field values and whether a field will be replicated with the listed settings.

Note: A * in the table indicates any possible value (null, true, or false) for the metadata field.

inclusion    selected  selected-by-default  replicated?
automatic    *         *                    yes
unsupported  *         *                    no
available    true      null                 yes
available    true      true                 yes
available    true      false                yes
available    false     null                 no
available    false     true                 yes
available    false     false                no
available    null      true                 yes
available    null      false                no
available    null      null                 no
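
The inclusion rules described in this section can be sketched as a small function. This is a literal reading of the metadata descriptions above, not Stitch's implementation. In particular, treating selected-by-default: true as sufficient even when selected is false is an assumption taken from that wording and should be verified against actual behavior. The function name is_field_included is hypothetical.

```python
def is_field_included(inclusion, selected=None, selected_by_default=None):
    """Decide whether a field is included, per the rules in this guide.

    Hypothetical helper. `selected` and `selected_by_default` may be
    None (JSON null), True, or False.
    """
    if inclusion == "automatic":
        # Always included, whatever the other two values are.
        return True
    if inclusion == "unsupported":
        # Never included, whatever the other two values are.
        return False
    if inclusion == "available":
        # Assumption: either flag being true is enough for inclusion.
        return selected is True or selected_by_default is True
    raise ValueError(f"unknown inclusion value: {inclusion!r}")


print(is_field_included("automatic", selected=False))
print(is_field_included("available", selected=None, selected_by_default=True))
print(is_field_included("available", selected=None, selected_by_default=None))
```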

Field compatibility rules

While all fields are subject to field selection rules, some fields are also subject to field compatibility rules. This means that certain combinations of fields cannot be selected together in a single stream.

These restrictions primarily affect SaaS sources like Bing Ads or Google AdWords, and are set by the source.

Field exclusion metadata

If a field is subject to compatibility rules, its Field-level Metadata object will contain a fieldExclusions property. This property contains a list of arrays, each of which is the breadcrumb of an incompatible field.

For example, below is the field-level metadata for the DeviceOS field in the Bing Ads ad_group_performance_report stream:

{
  "breadcrumb": [
    "properties",
    "DeviceOS"
  ],
  "metadata": {
    "fieldExclusions": [
      [
        "properties",
        "ExactMatchImpressionSharePercent"
      ],
      [
        "properties",
        "ImpressionLostToAdRelevancePercent"
      ],
      [
        "properties",
        "ImpressionLostToBidPercent"
      ],
      [
        "properties",
        "ImpressionLostToBudgetPercent"
      ],
      [
        "properties",
        "ImpressionLostToExpectedCtrPercent"
      ],
      [
        "properties",
        "ImpressionLostToRankPercent"
      ],
      [
        "properties",
        "ImpressionSharePercent"
      ]
    ],
    "inclusion": "available"
  }
}

This indicates that when the DeviceOS field is selected, the fields listed in the fieldExclusions property cannot also be selected.

Field exclusion violations

The Connect API may allow you to select fields that violate fieldExclusions rules, but doing so will likely result in extraction job failures.

To avoid this scenario, Stitch recommends accounting for fieldExclusions when building your own application.
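
One hedged way to account for fieldExclusions is to check a proposed selection before sending it to the Connect API. The sketch below assumes field-level metadata objects shaped like the DeviceOS example and, as a simplification, treats only fields with selected set to true as selected. The function name find_exclusion_violations and the sample data are hypothetical.

```python
def find_exclusion_violations(fields):
    """Return (field, excluded_field) name pairs that are both selected.

    `fields` is a list of field-level metadata objects, each with a
    "breadcrumb" list and a "metadata" dict. Hypothetical helper.
    """
    # Breadcrumbs of every field explicitly marked as selected.
    selected = {
        tuple(field["breadcrumb"])
        for field in fields
        if field.get("metadata", {}).get("selected") is True
    }
    violations = []
    for field in fields:
        crumb = tuple(field["breadcrumb"])
        if crumb not in selected:
            continue
        for excluded in field.get("metadata", {}).get("fieldExclusions", []):
            if tuple(excluded) in selected:
                violations.append((crumb[-1], excluded[-1]))
    return violations


# Illustrative selection: both fields are marked as selected.
fields = [
    {
        "breadcrumb": ["properties", "DeviceOS"],
        "metadata": {
            "inclusion": "available",
            "selected": True,
            "fieldExclusions": [["properties", "ImpressionSharePercent"]],
        },
    },
    {
        "breadcrumb": ["properties", "ImpressionSharePercent"],
        "metadata": {"inclusion": "available", "selected": True},
    },
]

print(find_exclusion_violations(fields))
```

An empty list means no selected field excludes another selected field under this simplified check.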