Process automation to handle long-running (hung) pipelines in Azure Data Factory (ADF)

We have a pipeline in our ADF instance, which pulls data on a daily basis from 300+ tables present on a cloud data source. This pipeline gets hung for a table (it can be any table) at least once a week. When this happens, many tables get stuck behind that one table.

We usually cancel the long-running (hung) pipeline and restart it. It completes fine after the restart. But, this restart process is totally manual and cumbersome. I designed and implemented process automations to handle this situation.

The main thing that’s needed to restart a long-running (hung) pipeline is the list of tables that are yet to be processed by the pipeline. To get this list, I developed a process to track the progress of the pipeline for each table. This info is stored in a table and a Power BI dashboard is built on that. You can find more details about this in my previous blog post.

We use Event Management module of ServiceNow for alerts management. I created an alert in ADF to monitor long-running pipeline, and that alert is integrated with Event Management module. So, when an alert is generated in ADF for long-running pipeline, it triggers an alert in Event Management module and that alert will create an incident ticket for us.

ServiceNow’s Event Management alert can trigger a script. To automate the long-running (hung) pipeline restart process, I leveraged this capability. I coded a PowerShell script that does the following.

  1. Connect to our Azure subscription.
  2. Get the RunID of the long-running (hung) pipeline by running Kusto query on Log Analytics workspace associated with our ADF instance.
  3. Cancel the long-running (hung) pipeline.
  4. Start a new pipeline that processes the remaining tables.

Disclaimer: The posts here represent my personal views and not those of my employer or any specific vendor. Any technical advice or instructions are based on my own personal knowledge and experience.