Event Trigger in Azure Data Factory is the building block for an event-driven ETL/ELT architecture (EDA). Data Factory’s native integration with Azure Event Grid lets you trigger processing pipelines based on certain events. Currently, Event Triggers support Blob Created and Blob Deleted events on Azure Data Lake Storage Gen2 and General Purpose version 2 storage accounts.
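To make this concrete, here is a rough sketch of what a storage event trigger definition can look like, built as a Python dictionary and printed as JSON. The property names reflect the trigger JSON as we understand it, so please verify them against the current documentation. The helper `build_blob_event_trigger`, the trigger and pipeline names, and the placeholder resource ID are all hypothetical and used only for illustration.

```python
import json

# Hypothetical placeholder: replace with the resource ID of your storage account.
STORAGE_ACCOUNT_ID = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.Storage/storageAccounts/<storage-account>"
)


def build_blob_event_trigger(name, pipeline_name, container, folder="", suffix=""):
    """Return a dict shaped like a storage event trigger definition (illustrative)."""
    return {
        "name": name,
        "properties": {
            "type": "BlobEventsTrigger",
            "typeProperties": {
                # The storage account whose events the trigger listens to.
                "scope": STORAGE_ACCOUNT_ID,
                # The two event types mentioned above.
                "events": [
                    "Microsoft.Storage.BlobCreated",
                    "Microsoft.Storage.BlobDeleted",
                ],
                # Optional path filters on the blob that raised the event.
                "blobPathBeginsWith": f"/{container}/blobs/{folder}",
                "blobPathEndsWith": suffix,
                "ignoreEmptyBlobs": True,
            },
            # The pipeline to run when a matching event arrives.
            "pipelines": [
                {
                    "pipelineReference": {
                        "referenceName": pipeline_name,
                        "type": "PipelineReference",
                    }
                }
            ],
        },
    }


trigger = build_blob_event_trigger(
    "OnNewSalesFile", "ProcessSales", "landing", folder="sales/", suffix=".csv"
)
print(json.dumps(trigger, indent=2))
```

The sketch only builds the definition. It does not call any Azure API, so it is safe to run locally while you experiment with the path filters.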
As with any architecture, it’s sometimes critical to enforce Role Based Access Control (RBAC) to ensure that only certain members on the team can access certain sensitive information. Unauthorized access to listen to, subscribe to updates from, and trigger pipelines linked to blob accounts should be strictly prohibited.
Azure Data Factory makes this really easy for you and enforces the following rules:
- To successfully create a new Event Trigger or update an existing one, the Azure account signed in to Data Factory needs to have owner access to the relevant storage account. Otherwise, the operation will fail with Access Denied
- Data Factory needs no special permission to your Event Grid, and you do not need to assign any special RBAC permission to the Data Factory service principal for the operation.
To understand how Azure Data Factory delivers on these two promises, let’s take a step back and take a sneak peek behind the scenes. The following is the high-level architecture for the integration among Data Factory, Storage, and Event Grid.
- Create a new Event Trigger
Two noticeable callouts from the flows are:
- Azure Data Factory makes no direct contact with the Storage account. The request to create a subscription is instead relayed to and processed by Event Grid. Hence, your Data Factory needs no permission to the Storage account at this stage
- Access control and permission checking happen on the Azure Data Factory side. Before ADF issues a request to subscribe to Storage events, it checks the permission of the user. More specifically, it checks whether the Azure account that is signed in and attempting to create the Event Trigger has owner access to the relevant Storage account. If the permission check fails, trigger creation also fails
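The two callouts above can be sketched as a toy model in Python. This is only an illustration of the order of operations described here (check the user's role first, then relay the subscription request to Event Grid), not the actual service implementation. The classes `DataFactoryStub` and `EventGridStub`, the `AccessDenied` exception, the role table, and the user and account names are all hypothetical.

```python
from dataclasses import dataclass, field


class AccessDenied(Exception):
    """Raised when the signed-in user lacks owner access (hypothetical)."""


@dataclass
class EventGridStub:
    """Toy stand-in for Event Grid: it only records subscription requests."""
    subscriptions: list = field(default_factory=list)

    def subscribe(self, storage_account, event_types):
        self.subscriptions.append((storage_account, tuple(event_types)))


@dataclass
class DataFactoryStub:
    """Toy stand-in for Data Factory during trigger creation."""
    event_grid: EventGridStub
    # Hypothetical RBAC table: (user, storage account) -> role name.
    role_assignments: dict = field(default_factory=dict)

    def create_event_trigger(self, user, storage_account, event_types):
        # Step 1: check the signed-in user's role before doing anything else.
        role = self.role_assignments.get((user, storage_account))
        if role != "Owner":
            raise AccessDenied(f"{user} is not an owner of {storage_account}")
        # Step 2: relay the subscription request to Event Grid.
        # Note that the storage account itself is never contacted here.
        self.event_grid.subscribe(storage_account, event_types)


grid = EventGridStub()
adf = DataFactoryStub(
    grid,
    {("alice", "salesdata"): "Owner", ("bob", "salesdata"): "Reader"},
)

# Alice is an owner, so the subscription request is relayed.
adf.create_event_trigger("alice", "salesdata", ["Microsoft.Storage.BlobCreated"])

# Bob is only a reader, so trigger creation fails before Event Grid is involved.
try:
    adf.create_event_trigger("bob", "salesdata", ["Microsoft.Storage.BlobCreated"])
except AccessDenied as err:
    print("Access Denied:", err)
```

In this model the failed attempt leaves no trace in `grid.subscriptions`, which mirrors the point that a failed permission check means the trigger is never created.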
- Storage event triggers a Data Factory pipeline run
When it comes to events triggering pipelines in Data Factory, there are two noticeable callouts in the workflow:
- Event Grid uses a Push model: it relays the message as soon as possible after Storage drops it into the system. This is different from messaging systems such as Kafka, where a Pull model is used.
- Event Trigger on Azure Data Factory serves as an active listener to the incoming message and it properly triggers the associated pipeline.
- Event Trigger itself makes no direct contact with the Storage account
- That said, if you have a Copy or other activity inside the pipeline to process the data in the Storage account, Data Factory will make direct contact with Storage, using the credentials stored in the Linked Service. Please ensure that the Linked Service is set up appropriately
- However, if you make no reference to the Storage account in the pipeline, you do not need to grant Data Factory permission to access the Storage account
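The workflow above can also be sketched as a toy model in Python: a broker pushes each event to its subscribers, and a listener starts a pipeline run when the event matches its path filters. The classes `PushBroker` and `EventTriggerStub` are hypothetical, and the matching logic is a simplified assumption rather than the service's actual filter implementation. The `subject` format follows the Blob Created event schema as we understand it.

```python
from dataclasses import dataclass, field


@dataclass
class EventTriggerStub:
    """Toy listener: records a pipeline run for each matching event."""
    pipeline_name: str
    begins_with: str  # for example "/landing/blobs/sales/"
    ends_with: str = ""
    runs: list = field(default_factory=list)

    def on_event(self, event):
        # The subject is assumed to look like
        # /blobServices/default/containers/<container>/blobs/<path>
        _, _, rest = event["subject"].partition("/containers/")
        path = "/" + rest  # becomes "/<container>/blobs/<path>"
        if path.startswith(self.begins_with) and path.endswith(self.ends_with):
            # Only the event metadata is used; the blob itself is never read here.
            self.runs.append((self.pipeline_name, path))


@dataclass
class PushBroker:
    """Toy push broker: forwards each event to its subscribers right away."""
    subscribers: list = field(default_factory=list)

    def publish(self, event):
        for handler in self.subscribers:
            handler(event)  # push model: subscribers never poll for messages


trigger = EventTriggerStub("ProcessSales", "/landing/blobs/sales/", ".csv")
broker = PushBroker()
broker.subscribers.append(trigger.on_event)

# A matching blob: the listener records a pipeline run.
broker.publish({
    "eventType": "Microsoft.Storage.BlobCreated",
    "subject": "/blobServices/default/containers/landing/blobs/sales/2021-01.csv",
})
# A blob outside the filtered folder: the listener ignores it.
broker.publish({
    "eventType": "Microsoft.Storage.BlobCreated",
    "subject": "/blobServices/default/containers/landing/blobs/logs/app.txt",
})

print(trigger.runs)
```

Notice that the listener works purely from the event payload. Any access to the blob contents would happen later, inside the pipeline, with the credentials from the Linked Service.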
What’s in the bag for the future?
The team is currently expanding the functionality of Event Trigger. Soon, we will support Custom Events in Event Grid to give customers even more flexibility in defining their Event Driven Architecture. Please keep an eye out for the exciting announcement, as we test the functionality thoroughly and gradually roll it out to General Availability.