Stream live data from on-prem SQL Server to Google Cloud SQL for SQL Server using transactional replication

Streaming live data from on-prem SQL Server to GCP BigQuery can’t be done directly. A staging area is needed for the data before getting loaded into BigQuery.

This can be achieved in different ways.

  1. Use 3rd party tools to build the pipelines for streaming the live data from on-prem SQL Server to Cloud SQL — this approach may lead to increase in the budget.
  2. Use Azure Data Factory to pull data from on-prem SQL Server database into Azure Storage container on near real-time schedule. Then, stream the data from the Azure Storage container to Cloud Pub/Sub, and from there to BigQuery via Dataflow — this approach may lead to technical debt.
  3. Setup replication between on-prem SQL Server and Azure SQL using SQL Data Sync. Then, stream the data from Azure SQL instance to Cloud Pub/Sub, and from there to BigQuery via Dataflow — this approach may lead to technical debt.
  4. Use SQL Server native transactional replication to stream the data to Cloud SQL for SQL Server, and from there to BigQuery via Dataflow.

This blog post explains the SQL Server native transactional replication setup to stream the data to Cloud SQL for SQL Server instance.

The setup includes these steps.

  1. Create on-prem SQL Server database.
  2. Create Cloud SQL for SQL Server instance.
  3. Setup the native transactional replication.

Create on-prem SQL Server database

I installed SQL Server 2019 on my laptop and imported WideWorldImporters database using a bacpac file for my testing.

Create Cloud SQL for SQL Server instance

I created a Cloud SQL for SQL Server instance in a project in my personal GCP account.

Once the Cloud SQL instance is created, I added my laptop’s IP address to the Authorized networks.

Setup the native transactional replication

For transactional replication to work, there is no need to create the complete schema (database and objects) at the destination. All we need is just an empty database at the destination. Objects will be created when data snapshot is replicated to the destination for the first time.

Note: sqlserver is the default login name for the Cloud SQL instance.

Transactional replication works in publisher/distributor and subscriber model. so, we have to setup Publication for the source and Subscription for the destination.

→ Publication setup:

→ Subscription setup:

Once the Subscription setup is complete, you can start the replication job.

Once the setup is complete, you can check the replication status in Replication Monitor.

Disclaimer: The posts here represent my personal views and not those of my employer or any specific vendor. Any technical advice or instructions are based on my own personal knowledge and experience.

--

--