Microsoft Azure SQL Data Warehouse is a fast, fully managed, petabyte-scale data warehouse. It’s ideal for batch-based data warehouse workloads, and its decoupled storage and compute model allows it to scale quickly and be maintained cost-effectively.


Pricing

Azure SQL Data Warehouse bases its pricing on your compute and storage usage. Compute usage is billed at an hourly rate, in one-hour increments, meaning you’ll only be billed for the hours your data warehouse is active.

Storage charges include the size of your primary database and seven days of incremental snapshots. Microsoft Azure rounds charges to the nearest terabyte (TB). For example: If the data warehouse is 1.5 TB and you have 100 GB of snapshots, your bill will be for 2 TB of data.
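The rounding rule above can be sketched as a quick estimate. This is illustrative only and assumes charges round up to the next whole TB, as the example suggests; consult your Azure bill for actual amounts.

```python
import math

def billed_storage_tb(database_gb, snapshot_gb):
    """Estimate billed storage by rounding total usage up to the next
    terabyte -- an illustrative sketch of the rounding rule described
    above, not Azure's actual billing logic."""
    total_tb = (database_gb + snapshot_gb) / 1024
    return math.ceil(total_tb)

# 1.5 TB database + 100 GB of snapshots bills as 2 TB
print(billed_storage_tb(1536, 100))
```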

Refer to Microsoft’s documentation for more info and examples.


Setup info

High-level details about Stitch’s implementation of Azure SQL Data Warehouse, such as supported connection methods, availability on Stitch plans, etc.

Release status

Released

Pricing tier

All Stitch plans

Supported versions

Not applicable

Currently, Stitch supports all versions of Azure SQL Data Warehouse.

SSH connections

Supported

Stitch supports using SSH tunnels to connect to Azure SQL Data Warehouse destinations. Refer to these instructions to set up an SSH tunnel for Azure SQL Data Warehouse.

SSL connections

Supported

VPN connections

Unsupported

Setup documentation


Replication

Details about Stitch and replication for Azure SQL Data Warehouse destinations.

Incremental Replication - Upserts

Supported

Using a table’s Primary Keys, Stitch will perform an update operation on existing rows to overwrite the data.

Primary Keys

Unsupported

Azure SQL Data Warehouse destinations do not have native support for Primary Keys.

To ensure existing rows can be updated in incrementally-replicated tables, Stitch will create an _sdc_primary_keys table in every integration schema. This table contains the Primary Key data for every table in the given schema, which Stitch will use to de-dupe rows during loading.

Note: De-duplicating doesn’t apply to tables that are replicated in full.

Refer to our Azure Primary Key handling documentation for more info.
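The de-duplication described above can be sketched as keeping one row per Primary Key value. This is a simplified illustration that assumes later rows in a batch are newer; Stitch’s actual loader logic may differ.

```python
def dedupe_by_primary_key(rows, primary_keys):
    """Keep only the most recent row per Primary Key value -- a
    simplified sketch of the de-duplication described above."""
    latest = {}
    for row in rows:
        key = tuple(row[pk] for pk in primary_keys)
        latest[key] = row  # later rows overwrite earlier ones
    return list(latest.values())

rows = [
    {"id": 1, "status": "pending"},
    {"id": 2, "status": "shipped"},
    {"id": 1, "status": "delivered"},  # updated version of id 1
]
print(dedupe_by_primary_key(rows, ["id"]))
```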

Multiple data types

Unsupported

Azure SQL Data Warehouse destinations do not support multiple data types in a single column.

To accommodate this scenario, Stitch will create additional columns for each data type:

  • The original column’s name will remain as-is. It will be typed according to the first data type Stitch detected for the column.
  • Subsequent columns will have the data type appended to their name, and only data of that type will be stored in them. For example: customer_id__bi would contain data determined to be a big integer. Note: If multiple data types are loaded during the creation of the column, all columns will be created with the data type appended to their name.

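The column-splitting behavior can be sketched as routing each value into a type-suffixed column. The `__st` and `__db` suffixes below are hypothetical, modeled on the `customer_id__bi` example above; Stitch’s full suffix list may differ.

```python
# Hypothetical suffixes modeled on the customer_id__bi example above;
# only __bi appears in the text, the others are assumptions.
TYPE_SUFFIXES = {int: "__bi", str: "__st", float: "__db"}

def split_column_by_type(column, values):
    """Route each value into a type-specific column, appending a type
    suffix to the column name as described above (simplified sketch)."""
    columns = {}
    for value in values:
        suffix = TYPE_SUFFIXES[type(value)]
        columns.setdefault(column + suffix, []).append(value)
    return columns

print(split_column_by_type("customer_id", [123, "A-456", 789]))
```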
Nested data structure support

Unsupported

Azure SQL Data Warehouse destinations do not have native support for nested data structures.

To ensure nested data can be loaded, Stitch will flatten objects and arrays into columns and subtables, respectively. For more info and examples, refer to the Handling nested data structures guide.
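The flattening described above can be sketched as follows: objects become underscore-separated columns and arrays are routed to subtables. This is a simplified illustration; see the nested-data guide for Stitch’s exact rules.

```python
def flatten(record, prefix=""):
    """Flatten nested objects into underscore-separated columns and
    split arrays out as subtable rows -- a simplified sketch of the
    behavior described above."""
    columns, subtables = {}, {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            sub_cols, sub_tables = flatten(value, prefix=f"{name}__")
            columns.update(sub_cols)
            subtables.update(sub_tables)
        elif isinstance(value, list):
            subtables[name] = value  # arrays become subtables
        else:
            columns[name] = value
    return columns, subtables

cols, subs = flatten({"id": 1, "address": {"city": "Boston"}, "items": [{"sku": "a"}]})
print(cols)  # flattened columns
print(subs)  # arrays routed to subtables
```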

Incompatible sources

No reported incompatibilities

No integration compatibility issues have been reported for Azure SQL Data Warehouse.

Loading process overview

After the data is extracted from your integrations, Stitch will perform the following steps to prepare and load that data into your Azure SQL Data Warehouse destination.

Step 1: Load data into Azure Blob Storage

The first step in the loading process for Azure SQL Data Warehouse destinations is to load the extracted data into Azure Blob Storage.

Blob storage is intended for storing massive amounts of unstructured data. In the next step, Stitch will use PolyBase to retrieve the data from Blob Storage and prepare it for loading into Azure SQL Data Warehouse.

Step 2: Prep data using PolyBase

PolyBase is a Microsoft offering that integrates Microsoft SQL products with Hadoop. PolyBase is needed to query data from Azure Blob Storage.

In this step, Stitch will perform the following:

  1. Create an external data source. This creates an external data source for the PolyBase queries Stitch will run.
  2. Create an external file format. This creates an object that defines the external (extracted) data Stitch will load. This is used in the next step to create an external table.
  3. Create an external table. Using the external file format, this will create an external table. The external table is used to stage the data from Azure Blob Storage before it is loaded into your data warehouse.
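The three CREATE statements above can be sketched in T-SQL. The object names, credentials, location, and table definition below are placeholders for illustration, not the statements Stitch actually issues.

```python
# Illustrative T-SQL for the three PolyBase steps above. All object
# names and the column list are placeholders, not Stitch's actual DDL.
statements = [
    # Step 1: external data source pointing at the Blob Storage container
    """CREATE EXTERNAL DATA SOURCE stitch_blob
       WITH (TYPE = HADOOP,
             LOCATION = 'wasbs://container@account.blob.core.windows.net');""",
    # Step 2: external file format describing the extracted files
    """CREATE EXTERNAL FILE FORMAT stitch_format
       WITH (FORMAT_TYPE = DELIMITEDTEXT);""",
    # Step 3: external table staging the data for the final INSERT
    """CREATE EXTERNAL TABLE stitch_stage (id INT, name NVARCHAR(100))
       WITH (LOCATION = '/extracts/',
             DATA_SOURCE = stitch_blob,
             FILE_FORMAT = stitch_format);""",
]
for sql in statements:
    print(sql)
```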

Step 3: Insert data into the data warehouse

Lastly, Stitch will insert the data from the PolyBase external table into your Azure SQL Data Warehouse destination.


Limitations

In this section:

  • Object name limits - Details about object naming rules imposed by Azure SQL Data Warehouse destinations.

  • Table limits - Details about table rules imposed by Azure SQL Data Warehouse destinations.

  • Data limits - Details about data limitations imposed by Azure SQL Data Warehouse destinations.

  • Column naming - Details about transformations performed by Stitch to ensure source column names adhere to the object naming rules imposed by Azure SQL Data Warehouse destinations.

Object name limits

Details about object naming rules imposed by Azure SQL Data Warehouse destinations.

Case sensitivity

Insensitive

This means that Azure SQL Data Warehouse destinations treat upper- and lowercase characters as the same. For example: CUSTOMERID and customerid are equal in Azure SQL Data Warehouse.

When column names are loaded into Azure SQL Data Warehouse, any uppercase characters will be forced into lowercase. This means that Id and id will canonicalize to the same name, which can lead to object collision errors. Refer to the Column naming section of this guide for examples.
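The collision scenario above can be sketched as a simple check: lowercase each column name and flag any that canonicalize to the same name. This is an illustrative sketch, not Stitch’s actual collision handling.

```python
def detect_collisions(column_names):
    """Flag source columns that canonicalize to the same lowercased
    name and would therefore collide on load, as described above."""
    seen = {}
    collisions = []
    for name in column_names:
        canonical = name.lower()
        if canonical in seen:
            collisions.append((seen[canonical], name))
        else:
            seen[canonical] = name
    return collisions

print(detect_collisions(["Id", "id", "CustomerName"]))
```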

Table name length

112 characters

Tables with names that exceed this limit will be rejected. Rejected tables are logged in the _sdc_rejected table.

Column name length

128 characters

Columns with names that exceed this limit will be rejected. Rejected columns are logged in the _sdc_rejected table.
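The two length limits above can be sketched as a simple pre-load check. The rejection logging itself is handled by Stitch; this only illustrates the length comparison.

```python
# Limits from the sections above.
TABLE_NAME_LIMIT = 112
COLUMN_NAME_LIMIT = 128

def rejected_names(tables, columns):
    """Return the table and column names that exceed the limits above
    and would be logged in the _sdc_rejected table (simplified check)."""
    bad_tables = [t for t in tables if len(t) > TABLE_NAME_LIMIT]
    bad_columns = [c for c in columns if len(c) > COLUMN_NAME_LIMIT]
    return bad_tables, bad_columns

print(rejected_names(["orders", "x" * 113], ["id"]))
```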

Table limits

Details about table rules imposed by Azure SQL Data Warehouse destinations.

Max # of columns per table

1,024 columns

Max table size

60 TB compressed on disk

Max # of tables per database

10,000 tables

Data limits

Details about data limitations imposed by Azure SQL Data Warehouse destinations.

Max record size

1 MB

This limit is imposed by PolyBase, which is required to load data into Azure SQL Data Warehouse destinations.

Boolean values

1, 0, NULL

Azure SQL Data Warehouse will store data typed as BOOLEAN using only the above values. These columns will be BIT columns in the destination.

For example: If a BOOLEAN source value is true, it will be stored as 1 in Azure SQL Data Warehouse.
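The BOOLEAN-to-BIT mapping above can be sketched as:

```python
def to_bit(value):
    """Map a source BOOLEAN to the values a BIT column stores
    (1, 0, or NULL), as described above (illustrative sketch)."""
    if value is None:
        return None  # NULL stays NULL
    return 1 if value else 0

print([to_bit(v) for v in (True, False, None)])
```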

Timezone support

Unsupported

Azure SQL Data Warehouse will store the value along with its UTC offset. These columns are typed as DATETIMEOFFSET.

More info about timestamp data types can be found in Microsoft’s documentation.

Date ranges

January 1, 0001 AD, through December 31, 9999 AD

Date values outside of this range will be rejected and logged in the _sdc_rejected table.

More info about Azure SQL Data Warehouse date data types can be found in Microsoft’s documentation.

Timestamp ranges

00:00:00 through 23:59:59.997

Timestamp values outside of this range will be rejected and logged in the _sdc_rejected table.

More info about Azure SQL Data Warehouse time data types can be found in Microsoft’s documentation.

Decimal precision

Must be between 1 and 38 (38 is the default)

Decimal values outside of this range will be rejected and logged in the _sdc_rejected table.

Decimal scale

Must be between 0 and the precision value

Decimal values outside of this range will be rejected and logged in the _sdc_rejected table.
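The precision and scale rules above can be sketched as a validity check:

```python
def valid_decimal(precision, scale):
    """Check a DECIMAL definition against the limits above:
    precision between 1 and 38, scale between 0 and the precision."""
    return 1 <= precision <= 38 and 0 <= scale <= precision

print(valid_decimal(38, 10))
print(valid_decimal(40, 2))
```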

VARCHAR max width

4,000 bytes

String values wider than this limit will be truncated to the maximum allowed width.
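The truncation can be sketched as a byte-width cut. The UTF-8 handling below is our assumption for illustration; Stitch’s actual truncation behavior may differ.

```python
def truncate_to_width(value, max_bytes=4000):
    """Truncate a string to the maximum byte width described above,
    without splitting a multi-byte UTF-8 character (simplified sketch;
    the encoding assumption is ours, not Stitch's documented behavior)."""
    encoded = value.encode("utf-8")
    if len(encoded) <= max_bytes:
        return value
    return encoded[:max_bytes].decode("utf-8", errors="ignore")

print(len(truncate_to_width("x" * 5000)))
```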

Column naming

Details about transformations performed by Stitch to ensure source column names adhere to the object naming rules imposed by Azure SQL Data Warehouse destinations.

Scenario                          Source column name   Destination column name
Mixed case                        CuStOmErID           customerid
Contains spaces                   customer id          customerid
Contains unsupported characters   !customer# id        customerid
Name begins with an underscore    _customerid          customerid
Name begins with a non-letter     123customerid        customerid
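The transformations in the table above can be sketched with a few string rules: lowercase the name, strip spaces and unsupported characters, and drop leading non-letters. This is a simplified approximation of Stitch’s actual renaming logic.

```python
import re

def canonicalize(name):
    """Approximate the column-name transformations shown above
    (simplified sketch, not Stitch's exact renaming rules)."""
    name = name.lower()
    name = re.sub(r"[^a-z0-9_]", "", name)  # drop spaces/unsupported chars
    name = re.sub(r"^[^a-z]+", "", name)    # drop leading non-letters
    return name

for source in ["CuStOmErID", "customer id", "!customer# id",
               "_customerid", "123customerid"]:
    print(source, "->", canonicalize(source))
```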

Compare destinations

Not sure if Azure SQL Data Warehouse is the data warehouse for you? Check out the Choosing a Stitch Destination guide to compare each of Stitch’s destination offerings.


Questions? Feedback?

Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.