Snowflake offers a number of ways to ingest data, ranging from batch processing to stream processing. The SnowPipe feature provides micro-batch processing and is a powerful way to ingest your data without having to provision compute resources yourself. Configured for the right use cases, it offers a scalable, cost-optimized, and serverless means of loading files from Azure storage into Snowflake.

When do we use SnowPipe?

The following scenarios describe when SnowPipe data ingestion is a good fit:

  • A delay of about a minute during ingestion is acceptable.
  • Workloads are expected to vary in size or frequency.
  • Cost optimization for workloads is desired.
Architecture

This architecture shows the components and data flow for SnowPipe on the Azure platform.

Input/CSV File

This is the input file we are going to ingest into Snowflake using SnowPipe. Typically these files are stored in blob storage and follow a specific file-naming pattern.
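
For context, the files in this example contain bank transaction records. A minimal sketch of the target table they eventually land in is shown below; the column names come from the pipe definition later in this article, while the data types are assumptions for illustration.

CREATE TABLE IF NOT EXISTS FINANCE.STAGE.BANK_TRANSACTIONS (
  -- Column names match the pipe definition below; data types are assumed
  TransactionNo      NUMBER,
  DateTime           DATE,
  FromAccount        VARCHAR,
  ToAccount          VARCHAR,
  Amount             NUMBER(18,2),
  TypeOfTransaction  VARCHAR,
  TranDescription    VARCHAR,
  Source             VARCHAR
);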

Azure

The Azure platform (shown with dotted lines in the diagram) marks the Azure infrastructure boundary.

Resource Group

We created this resource group to keep the related resources and their usage grouped together for accounting and tracking purposes.

Storage Account

All of the Azure storage resources used here are created under this storage account.

Container/Blob storage

Blobs live inside containers, and containers are created within storage accounts. This container receives and stores the individual CSV files uploaded by clients.

Queue/Event Grid

"Blob created" events are subscribed so that Event grid triggers an action when a file arrives in blob storage and notifies Snowflake stage using the storage queue.

Snowflake Stage

The Snowflake stage holds the details of the files that are ready to be ingested into Snowflake:

CREATE OR REPLACE STAGE BANK_TRANSACTIONS_STAGE
  url = '<Your storage url>'
  credentials = (azure_sas_token = '<your-sas-token>');
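
Once the stage is created, you can verify that it can see the files in blob storage by listing its contents:

LIST @BANK_TRANSACTIONS_STAGE;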
Azure Integration

Defines a notification integration between the Azure storage queue and Snowflake, so Snowflake is notified when new files arrive:

CREATE NOTIFICATION INTEGRATION BANKING_SNOWPIPE_EVENT
  ENABLED = TRUE
  TYPE = QUEUE
  NOTIFICATION_PROVIDER = AZURE_STORAGE_QUEUE
  AZURE_STORAGE_QUEUE_PRIMARY_URI = '<Your_storage_queue_uri>'
  AZURE_TENANT_ID = '<Your_Tenant_ID>';
Enterprise Applications & AD

A new application identity for Snowflake is created in Azure AD (under Enterprise Applications) and granted permission on the storage account. The Azure integration generates a consent URL that you need to open and authenticate against to permit Snowflake to use Azure's storage resources.
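
To retrieve the consent URL (along with the multi-tenant app name that must be granted access on the storage side), describe the integration; the AZURE_CONSENT_URL and AZURE_MULTI_TENANT_APP_NAME properties appear in its output:

DESC NOTIFICATION INTEGRATION BANKING_SNOWPIPE_EVENT;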

Create Pipe

This command creates the SnowPipe, i.e., a pipe over the external stage, with auto-ingest enabled:

CREATE OR REPLACE PIPE "BANK_TRANSACTIONS_PIPE"
  auto_ingest = true
  integration = 'BANKING_SNOWPIPE_EVENT'
as
copy into FINANCE.STAGE.BANK_TRANSACTIONS
  (TransactionNo, DateTime, FromAccount, ToAccount, Amount, TypeOfTransaction, TranDescription, Source)
from (SELECT $1, TO_DATE($2, 'MM/DD/YYYY HH:MI AM'), $3, $4, $5, $6, $7, $8 FROM @BANK_TRANSACTIONS_STAGE)
  file_format = (type = 'CSV');
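
After the pipe is created, a couple of quick checks are useful: SYSTEM$PIPE_STATUS shows whether the pipe is running and listening for notifications, and ALTER PIPE ... REFRESH queues any files that were already in the stage before the pipe existed.

-- Check that the pipe is running and listening for queue notifications
SELECT SYSTEM$PIPE_STATUS('BANK_TRANSACTIONS_PIPE');

-- Optionally load files that were staged before the pipe was created
ALTER PIPE BANK_TRANSACTIONS_PIPE REFRESH;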
Fine details
  • You need the ACCOUNTADMIN role or a role with the CREATE INTEGRATION privilege (see the grant example after this list).
  • There is about a minute of delay before data is processed and ingested into Snowflake. You need to account for this latency when deciding whether SnowPipe fits your use case.
  • It works with external stages only (in this case, Azure blob storage).
  • The Azure storage account must be ADLS Gen 2 (Storage V2) or higher.
  • The ideal data file size for workload optimization is between 10 and 50 MB. If your files are larger or smaller, consider combining or splitting the incoming files, along with other considerations such as latency.
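
If you prefer not to use ACCOUNTADMIN for the integration step, the required privilege can be granted to a dedicated role instead; the role name below is only a placeholder.

-- Grant the account-level privilege needed to create integrations (hypothetical role name)
GRANT CREATE INTEGRATION ON ACCOUNT TO ROLE DATA_ENGINEER;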
Billing

Since compute clusters for SnowPipe are provisioned by Snowflake as needed, its billing appears as a separate line item listed as "SNOWPIPE". You can also query the billing details; the following query lists the last 14 days of SnowPipe billing.

SELECT * FROM
  table(information_schema.pipe_usage_history(
    date_range_start => dateadd('day', -14, current_date()),
    date_range_end   => current_date()));
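
If you run several pipes, a small variation of the same query sums the usage per pipe over the same window; PIPE_NAME, CREDITS_USED, and FILES_INSERTED are among the columns returned by PIPE_USAGE_HISTORY.

-- Summarize SnowPipe usage per pipe for the last 14 days
SELECT pipe_name,
       SUM(credits_used)   AS credits_used,
       SUM(files_inserted) AS files_inserted
FROM table(information_schema.pipe_usage_history(
       date_range_start => dateadd('day', -14, current_date()),
       date_range_end   => current_date()))
GROUP BY pipe_name;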
Is this elaborate setup all worth it?

Absolutely! Once this data pipeline is built, ingestion is fully scalable and serverless, and it can receive any number of data files to load into Snowflake.

Most of the detailed steps mentioned here are reusable, and doing the same thing a second time is usually effortless. It looks like a lot of small steps, but you could also write PowerShell, Python, and SnowSQL scripts to automate the whole SnowPipe creation process, with all parameters fed in from a config file.

The GitHub link has all the files related to this demo.

Conclusion

SnowPipe is a great way to ingest files and minimizes cost when used with appropriate planning.

Tags : #Snowflake #SnowflakeDB #SnowPipe