
Scenario: The customer wants to configure a notebook to authenticate without AAD passthrough, using only the workspace managed identity (MSI).


Synapse uses Azure Active Directory (AAD) passthrough by default for authentication between resources.

 


As documented here: https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-secure-credentials-with-tokenlibrary?pivots=programming-language-scala


_When the linked service authentication method is set to Managed Identity or Service Principal, the linked service will use the Managed Identity or Service Principal token with the LinkedServiceBasedTokenProvider provider._
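
For reference, the TokenLibrary mentioned in that quote can also be called directly from a notebook. A minimal Scala sketch based on that page (the linked service name is a placeholder; replace it with your own):

import com.microsoft.azure.synapse.tokenlibrary.TokenLibrary

// Retrieve the connection string for a linked service by name
val connectionString: String = TokenLibrary.getConnectionString("<LinkedServiceName>")
println(connectionString)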


 


The purpose of this post is to walk through this configuration step by step:


 



Prerequisites:


  • The Synapse workspace MSI must have the Storage Blob Data Contributor RBAC role on the storage account. That is also the documented prerequisite.

  • However, I worked with a customer who instead set ACL Read and Execute permissions on the storage account, and that also works (I tested it myself).

  • It works with or without the firewall on the storage account; enabling the firewall is not mandatory.

  • However, if you have enabled the firewall on the storage account for security reasons, make sure of the following:



grant_storage.png

 

ACL

ACLs.png

 

Step 1:

 


Open Synapse Studio and configure the linked service to this storage account using Managed Identity:

linkedserver.png

 



Step 2:


Using spark.conf.set, point the notebook to the linked service as documented:

// Replace with your linked service name
val linked_service_name = "LinkedServiceName"

// Allow Spark to access the storage account remotely through the linked service
spark.conf.set("spark.storage.synapse.linkedServiceName", linked_service_name)
spark.conf.set("fs.azure.account.oauth.provider.type", "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedTokenProvider")

// Replace the container and storage account names
val remote_path = "abfss://Container@StorageAccount.dfs.core.windows.net/"

print("Remote blob path: " + remote_path)

// List the contents of the container
mssparkutils.fs.ls(remote_path)

 


 


In my example, I am using mssparkutils to list the container.
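
The same configuration also lets regular Spark reads flow through the linked service token provider. A quick sketch, assuming a CSV file named data.csv exists in the container (the file name and the header option are just examples):

// Read a CSV file from the container using the same MSI-backed credentials
val csvDf = spark.read
  .option("header", "true")
  .csv(remote_path + "data.csv")

csvDf.show(10)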


 


 


You can read more about mssparkutils here: Introduction to Microsoft Spark utilities – Azure Synapse Analytics | Microsoft Docs
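
For example, a few other mssparkutils.fs calls work against the same path (the file and folder names below are hypothetical):

// Preview the first bytes of a file in the container
mssparkutils.fs.head(remote_path + "data.csv")

// Create a folder (this needs write access, i.e. Storage Blob Data Contributor)
mssparkutils.fs.mkdirs(remote_path + "newfolder")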


 


 


Additionally:


 


The following link covers Azure Data Factory (ADF), which is not the focus of this post, but it documents the relevant MSI permissions:


Copy and transform data in Azure Blob storage – Azure Data Factory | Microsoft Docs


Grant the managed identity permission in Azure Blob storage. For more information on the roles, see Use the Azure portal to assign an Azure role for access to blob and queue data.



  • As source, in Access control (IAM), grant at least the Storage Blob Data Reader role.

  • As sink, in Access control (IAM), grant at least the Storage Blob Data Contributor role.


 


That is it!


Liliam, UK Engineer


 


