Azure Synapse Link for Azure Cosmos DB — tech overview

ETL pipelines require many layers of data movement resulting in much operational complexity, and performance impact on the transactional workloads in Cosmos DB.
For generating reports in Power BI, we have to first perform Flattening and transforming JSON documents.
We need to maintain and manage separate setup for ETL, reporting and machine learning activities on the operational data in Cosmos DB. Also, all these activities will have direct impact on RU usage.
This is where Synapse Link and Synapse Analytics fit in and make our work less cumbersome, without impacting the RU usage.
Value addition with Azure Synapse Link

Azure Synapse Link for Azure Cosmos DB is a cloud-native hybrid transactional and analytical processing (HTAP) capability that enables you to run near real-time analytics over operational data in Azure Cosmos DB. Azure Synapse Link creates an integration between Azure Cosmos DB and Azure Synapse Analytics.
Azure Synapse Link is supported for the Azure Cosmos DB SQL API and MongoDB API for now.
Azure Cosmos DB analytical store

Cosmos DB analytical store is a fully isolated column store for enabling large-scale analytics against operational data, without any impact to transactional workloads.
It is better suited for analytical queries because it serializes similar fields of data together and reduces the disk I/O operations.
Analytical store does not need separate request units (RUs) to be allocated.
Benefits with analytical store:
- We can run analytical workloads without impacting transaction workloads, as the analytical store is separate from the transactional store.
- We can achieve cost-effective archival of historical data using Analytical Store Time To Live (Analytical TTL).
- For globally distributed Cosmos DB account, analytical store will also be available in all regions of that account. With this, we can run analytical queries against the nearest regional copy of the data.
- Authentication with the analytical store is the same as the transactional store for a given database. So, we can leverage linked service in Synapse Studio to prevent pasting the Azure Cosmos DB keys in the Spark notebooks.
With Transactional TTL, we can have the data automatically deleted from the transactional store after a certain time period.
We can achieve longer retention of operational data in the analytical store by setting analytical TTL > transactional TTL at the container level. With this, we can archive the historical data and use it for analytics.
Setting up Azure Synapse Link
This is a multi-step process.
- Enable analytical store for the Cosmos DB container.
- Create a new or use an existing Synapse Analytics workspace. New workspace has to be associated with a new or existing ADLS Gen2 storage account at the time of creation.
- Create Cosmos DB linked service in Synapse Studio. This is the connection between Synapse Analytics and Cosmos DB analytical store.
Once the linked service is created, we can do the following things on the data from Cosmos DB analytical store.
- Query the data from analytical store with SQL pool or Apache Spark.
- Build a database and views in SQL pool over Synapse Link for Azure Cosmos DB for building a data model with Power BI.
Step 1: enable analytical store

Step 2: create Synapse Analytics workspace

Step 3: create Cosmos DB linked service

Query the data from analytical store


Synapse Analytics provides serverless and dedicated options for SQL pool and Apache Spark pool.
May be we can do something like using serverless pools for testing and dedicated pools in Prod.
Cost estimate
With enabling Synapse Link for Cosmos DB, 3 additional resources should be taken into consideration for cost estimate
(in addition to Cosmos DB transactional workload) .
- Cosmos DB analytical store
- Synapse Analytics workspace
- Azure Data Lake Storage Gen2
Cost estimate: Cosmos DB analytical store

Cost estimate: Synapse Analytics workspace

Cost estimate: ADLS Gen2

So, the final cost estimate for using Azure Cosmos DB after enabling Synapse Link will be

Disclaimer: The posts here represent my personal views and not those of my employer or any specific vendor. Any technical advice or instructions are based on my own personal knowledge and experience.