ADF pipeline to copy a file from AWS S3 to ADLS Gen2
Copying a file from an AWS S3 bucket to an Azure Data Lake Storage (ADLS) Gen2 storage account using Azure Data Factory (ADF) is straightforward to implement.
It involves the following steps:
- Create AWS S3 bucket
- Get Access key ID and Secret access key to access AWS S3 bucket
- Create ADLS Gen2 storage account
- Create linked services for AWS S3 and ADLS Gen2 in ADF
- Create datasets for AWS S3 and ADLS Gen2 in ADF
- Create pipeline in ADF for copying the file from AWS S3 to ADLS Gen2
I used a file from the public NYC taxi For-Hire Vehicle (FHV) trip dataset for my testing.
Create AWS S3 bucket
For the purpose of my testing, I created the AWS S3 bucket with public access unblocked. This is not at all recommended for buckets that hold real data.
I uploaded a New York taxi trip data file to the S3 bucket.



Get Access key ID and Secret access key to access AWS S3 bucket
We need the access key ID and secret access key of an AWS identity that can read the S3 bucket, so that ADF can copy the file from the S3 bucket to the ADLS Gen2 container.
We can create the keys under Security credentials in AWS Identity and Access Management (IAM). It's better to download the key file at creation time, since the secret access key is only shown once.
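The IAM user whose keys you use only needs read access to the bucket, so even a bucket with public access blocked works for this copy. As a rough sketch, an IAM policy along these lines (the bucket name is a placeholder) attached to that user is typically sufficient for the copy itself; browsing buckets in the ADF authoring UI may need broader list permissions.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAdfToReadBucket",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket-name>",
        "arn:aws:s3:::<bucket-name>/*"
      ]
    }
  ]
}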

Create ADLS Gen2 storage account
For the purpose of my testing, I created the storage account with Public endpoint as the connectivity method. This is not at all recommended for regular storage accounts.
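One setting worth calling out: what makes a storage account ADLS Gen2 is the hierarchical namespace option, which must be enabled when the account is created. If you prefer to script the account instead of clicking through the portal, a minimal ARM template resource sketch looks like this (the account name, region, and SKU are placeholder assumptions):

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts",
      "apiVersion": "2022-09-01",
      "name": "<storage-account-name>",
      "location": "<region>",
      "sku": { "name": "Standard_LRS" },
      "kind": "StorageV2",
      "properties": {
        "isHnsEnabled": true
      }
    }
  ]
}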




Create linked services for AWS S3 and ADLS Gen2 in ADF
Linked services are much like connection strings, which define the connection information needed for Data Factory to connect to external resources.
For the AWS S3 linked service, you enter the access key ID and secret access key in the linked service setup window. Testing the connection helps verify the linked service setup.
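Behind the authoring UI, ADF stores each linked service as JSON that you can inspect in the code view. A simplified sketch of the Amazon S3 linked service definition, with placeholder name and credentials, looks roughly like this (in practice, referencing the secret from Azure Key Vault is preferable to pasting it in):

{
  "name": "LS_AmazonS3",
  "properties": {
    "type": "AmazonS3",
    "typeProperties": {
      "accessKeyId": "<access-key-id>",
      "secretAccessKey": {
        "type": "SecureString",
        "value": "<secret-access-key>"
      }
    }
  }
}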

For the ADLS Gen2 linked service, you can select the ADLS Gen2 storage account created in the earlier step. Testing the connection helps verify the linked service setup.
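The corresponding ADLS Gen2 linked service JSON, sketched here with account key authentication and placeholder values (other options such as managed identity authentication also exist), is along these lines:

{
  "name": "LS_AdlsGen2",
  "properties": {
    "type": "AzureBlobFS",
    "typeProperties": {
      "url": "https://<storage-account-name>.dfs.core.windows.net",
      "accountKey": {
        "type": "SecureString",
        "value": "<storage-account-key>"
      }
    }
  }
}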

Create datasets for AWS S3 and ADLS Gen2 in ADF
A dataset represents the structure of the data within the linked data stores. ADF supports many different types of datasets, depending on the data stores you use.
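Since the file is copied as-is, both datasets can be defined with the Binary format: one pointing at the file in the S3 bucket, the other at the target container in ADLS Gen2. A sketch of the source dataset, with placeholder bucket and file names and the linked service name assumed from the earlier snippet:

{
  "name": "DS_S3_Source",
  "properties": {
    "type": "Binary",
    "linkedServiceName": {
      "referenceName": "LS_AmazonS3",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "location": {
        "type": "AmazonS3Location",
        "bucketName": "<bucket-name>",
        "fileName": "<source-file-name>"
      }
    }
  }
}

And the sink dataset pointing at the ADLS Gen2 container (again with placeholder names):

{
  "name": "DS_AdlsGen2_Sink",
  "properties": {
    "type": "Binary",
    "linkedServiceName": {
      "referenceName": "LS_AdlsGen2",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": "<container-name>",
        "folderPath": "<target-folder>"
      }
    }
  }
}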


Create pipeline in ADF for copying the file from AWS S3 to ADLS Gen2
A pipeline is a logical grouping of activities that together perform a task. The pipeline allows you to manage the activities as a set instead of each one individually. You deploy and schedule the pipeline instead of the activities independently.
For the purpose of my testing, I created a “Copy Data” pipeline with the AWS S3 bucket as my source and the ADLS Gen2 storage account as my sink/target.
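In JSON terms, the pipeline generated by the Copy Data tool includes a few more settings, but it boils down to a single Copy activity that reads from the source dataset and writes to the sink dataset. A simplified sketch, reusing the placeholder names from the earlier snippets, looks like this:

{
  "name": "PL_CopyS3ToAdlsGen2",
  "properties": {
    "activities": [
      {
        "name": "CopyFileFromS3ToAdlsGen2",
        "type": "Copy",
        "inputs": [
          { "referenceName": "DS_S3_Source", "type": "DatasetReference" }
        ],
        "outputs": [
          { "referenceName": "DS_AdlsGen2_Sink", "type": "DatasetReference" }
        ],
        "typeProperties": {
          "source": {
            "type": "BinarySource",
            "storeSettings": { "type": "AmazonS3ReadSettings" }
          },
          "sink": {
            "type": "BinarySink",
            "storeSettings": { "type": "AzureBlobFSWriteSettings" }
          }
        }
      }
    ]
  }
}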


Once the pipeline is created, you can simply trigger a debug run to start the copy task.

Disclaimer: The posts here represent my personal views and not those of my employer or any specific vendor. Any technical advice or instructions are based on my own personal knowledge and experience.