Author: Arun Sethia, Program Manager on the Azure HDInsight Customer Success Engineering (CSE) team.
Co-author: Sairam, Product Manager for Azure HDInsight on AKS.
Azure Logic Apps allows you to create and run automated workflows with little to no code. Each workflow starts with a single trigger, after which you add one or more actions. An action specifies a task to perform; a trigger specifies the condition for running any further steps in the workflow, for example when a blob is added or updated, when an HTTP request is received, or when new data appears in a SQL database table. Workflows can be stateful or stateless, depending on your Azure Logic Apps plan (Standard or Consumption).
Using workflows, you can orchestrate complex pipelines with multiple processing steps, triggers, and interdependencies. These steps can include Apache Spark and Apache Flink jobs, as well as integration with other Azure services.
This blog focuses on how to add an action that triggers an Apache Spark or Apache Flink job on HDInsight on AKS from a workflow.
Azure Logic App – Orchestrate Apache Spark Job on HDInsight on AKS
In our previous blog, we discussed different options for submitting Apache Spark jobs to an HDInsight on AKS cluster. The Azure Logic Apps workflow uses the Livy Batch Job API to submit the Apache Spark job.
The following diagram shows the interaction between Azure Logic Apps, an Apache Spark cluster on HDInsight on AKS, Azure Active Directory, and Azure Key Vault. You can use other cluster shapes, such as Apache Flink or Trino, in the same way with the Azure management endpoints.
HDInsight on AKS lets you access the Apache Spark Livy REST APIs using an OAuth token. This requires a Microsoft Entra service principal, which must then be granted access to the HDInsight on AKS cluster (RBAC support is coming soon). The client ID (appId) and secret (password) of this principal can be stored in Azure Key Vault (you can use various design patterns to rotate secrets).
Based on your business scenario, you can start (trigger) your workflow; this example uses the "When an HTTP request is received" trigger. The workflow connects to Key Vault using a system-assigned managed identity (or a user-assigned managed identity) to retrieve the secret and client ID of the service principal created to access the HDInsight on AKS cluster. The workflow then retrieves an OAuth token using the client-credentials grant (secret, client ID, and scope https://hilo.azurehdinsight.net/.default).
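To make the token step concrete, here is a minimal Python sketch of the client-credentials token request the workflow performs against Microsoft Entra. The tenant, client ID, and secret are hypothetical placeholders; in the workflow they come from Azure Key Vault.

```python
def build_token_request(tenant_id: str, client_id: str, client_secret: str,
                        scope: str = "https://hilo.azurehdinsight.net/.default"):
    """Build the token endpoint URL and form body for the OAuth2
    client-credentials grant used by the Logic Apps workflow."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = {
        "grant_type": "client_credentials",
        "client_id": client_id,          # appId of the service principal
        "client_secret": client_secret,  # secret retrieved from Key Vault
        "scope": scope,                  # HDInsight on AKS data-plane scope
    }
    return url, body
```

POSTing this form body to the returned URL (for example with the `requests` library) yields a JSON response whose `access_token` field is the Bearer token used in the next step.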
The Apache Spark Livy REST API on HDInsight on AKS is then invoked with the Bearer token and a Livy Batch (POST /batches) payload.
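The shape of that Livy call can be sketched as follows. The endpoint, jar path, and class name are illustrative placeholders, not values from the original post; the payload fields (`file`, `className`, `name`) follow the standard Livy batch request format.

```python
import json

def build_livy_batch_request(livy_endpoint: str, token: str,
                             jar_path: str, class_name: str):
    """Build the URL, headers, and JSON body for a Livy batch
    (POST /batches) submission with a Bearer token."""
    url = f"{livy_endpoint.rstrip('/')}/batches"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "file": jar_path,          # e.g. an ABFS path to the application jar
        "className": class_name,   # Spark application entry point
        "name": "logic-app-spark-job",  # illustrative batch name
    })
    return url, headers, body
```

In the Logic Apps workflow, the equivalent of this request is expressed as an HTTP action with the same URL, headers, and body.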
The final workflow is as follows; the source code and sample payload are available on GitHub.
Azure Logic App – Orchestrate Apache Flink Job on HDInsight on AKS
HDInsight on AKS provides user-friendly ARM REST APIs to submit and manage Apache Flink jobs, and you can call these REST APIs from any Azure service. Using the ARM REST API, you can orchestrate a data pipeline with Azure Data Factory Managed Airflow; similarly, you can use an Azure Logic Apps workflow to manage a complex business workflow.
The following diagram shows the interaction between Azure Logic Apps, an Apache Flink cluster on HDInsight on AKS, Azure Active Directory, and Azure Key Vault.
To invoke the ARM REST APIs, you need a Microsoft Entra service principal that is granted the Contributor role on the specific Apache Flink cluster on HDInsight on AKS. (The cluster's resource ID can be retrieved from the portal: go to the cluster page, select JSON View; the value of "id" is the resource ID.)
az ad sp create-for-rbac -n <service-principal-name> --role Contributor --scopes <cluster-resource-id>
The workflow connects to Key Vault using a system-assigned managed identity (or a user-assigned managed identity) to retrieve the secret and client ID of the service principal created to access the HDInsight on AKS cluster. The workflow then retrieves an OAuth token using the client-credentials grant (secret, client ID, and scope https://management.azure.com/.default).
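The ARM-side job submission can be sketched in the same style. Note the caveats: the `runJob` action path, the api-version, and the payload field names below are assumptions based on the HDInsight on AKS Flink job REST API, not values from this post; verify them against the official REST reference for your cluster version. The job name, jar name, and entry class are illustrative placeholders.

```python
def build_flink_job_request(cluster_resource_id: str, token: str,
                            api_version: str = "2023-06-01-preview"):
    """Build a hedged sketch of the ARM call that submits a Flink job.
    The runJob path and api-version are assumptions; check the HDInsight
    on AKS REST API reference before relying on them."""
    url = (f"https://management.azure.com{cluster_resource_id}"
           f"/runJob?api-version={api_version}")
    headers = {
        "Authorization": f"Bearer {token}",  # token for scope https://management.azure.com/.default
        "Content-Type": "application/json",
    }
    body = {
        "properties": {
            "action": "NEW",                   # start a new job
            "jobName": "logic-app-flink-job",  # illustrative job name
            "jarName": "app.jar",              # illustrative jar in cluster storage
            "entryClass": "com.example.Main",  # illustrative entry class
        }
    }
    return url, headers, body
```

The Logic Apps HTTP action issues the equivalent POST; because this goes through the management endpoint, the same pattern works for other ARM-managed job operations on the cluster.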
The final workflow is as follows; the source code and sample payload are available on GitHub.
The HDInsight on AKS REST APIs let you automate, orchestrate, schedule, and monitor workflows with the framework of your choice. Such automation reduces complexity, shortens development cycles, and completes tasks with fewer errors.
Choose what works best for your organization, and let us know your feedback or suggest any other Azure service integration for automating and orchestrating your workload on HDInsight on AKS.
- Overview – Azure Logic Apps | Microsoft Learn
- What are connectors – Azure Logic Apps | Microsoft Learn
- Apache Flink job orchestration using Azure Data Factory Managed Airflow
- Apache Spark on HDInsight on AKS - Submit Spark Job
- Create your first HDInsight on AKS cluster (microsoft.com)
We are super excited to get you started:
- Join our community, share an idea or share your success story – https://aka.ms/hdionakscommunity
- Have a question on how to migrate or want to discuss a use case – https://aka.ms/askhdinsight