
Scenario: The customer wants to configure a notebook to authenticate without AAD passthrough, using only the workspace managed identity (MSI).


Synapse uses Azure Active Directory (AAD) passthrough by default for authentication between resources.

 


As documented here: https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-secure-credentials-with-tokenlibrary?pivots=programming-language-scala


_When the linked service authentication method is set to Managed Identity or Service Principal, the linked service will use the Managed Identity or Service Principal token with the LinkedServiceBasedTokenProvider provider._
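
For reference, the TokenLibrary mentioned in that quote can also be called directly from a notebook. A minimal Scala sketch based on that page (the linked service name is a placeholder; replace it with your own):

import com.microsoft.azure.synapse.tokenlibrary.TokenLibrary

// Retrieve the connection string for a linked service by name
val connectionString: String = TokenLibrary.getConnectionString("<LinkedServiceName>")
println(connectionString)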


 


The purpose of this post is to walk through this configuration step by step:


 



Prerequisites:


  • The Synapse workspace MSI must have the Storage Blob Data Contributor RBAC role on the storage account. That is also the documented prerequisite.

  • However, I worked with a customer who instead set ACL Read and Execute permissions on the storage account, and that also works (I tested it myself).

  • It works with or without the firewall on the storage account; enabling the firewall is not mandatory.

  • However, if you have enabled the firewall on the storage account for security reasons, make sure of the following:



grant_storage.png

 

ACL

ACLs.png

 

Step 1:

 


Open Synapse Studio and configure the linked service to this storage account using Managed Identity:

linkedserver.png

 



Step 2:


Using spark.conf.set, point the notebook to the linked service as documented:

// Replace with your linked service name
val linked_service_name = "LinkedServiceName"

// Allow Spark to access the storage account remotely through the linked service
spark.conf.set("spark.storage.synapse.linkedServiceName", linked_service_name)
spark.conf.set("fs.azure.account.oauth.provider.type", "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedTokenProvider")

// Replace the container and storage account names
val remote_path = "abfss://Container@StorageAccount.dfs.core.windows.net/"

print("Remote blob path: " + remote_path)

// List the contents of the container
mssparkutils.fs.ls(remote_path)

 


 


In my example, I am using mssparkutils to list the container.
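
The same configuration also lets regular Spark reads flow through the linked service token provider. A quick sketch, assuming a CSV file named data.csv exists in the container (the file name and the header option are just examples):

// Read a CSV file from the container using the same MSI-backed credentials
val csvDf = spark.read
  .option("header", "true")
  .csv(remote_path + "data.csv")

csvDf.show(10)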


 


 


You can read more about mssparkutils here: Introduction to Microsoft Spark utilities – Azure Synapse Analytics | Microsoft Docs
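
For example, a few other mssparkutils.fs calls work against the same path (the file and folder names below are hypothetical):

// Preview the first bytes of a file in the container
mssparkutils.fs.head(remote_path + "data.csv")

// Create a folder (this needs write access, i.e. Storage Blob Data Contributor)
mssparkutils.fs.mkdirs(remote_path + "newfolder")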


 


 


Additionally:


 


The following link covers Azure Data Factory (ADF), which is not the focus of this post, but it documents the relevant MSI permissions:


Copy and transform data in Azure Blob storage – Azure Data Factory | Microsoft Docs


Grant the managed identity permission in Azure Blob storage. For more information on the roles, see Use the Azure portal to assign an Azure role for access to blob and queue data.



  • As source, in Access control (IAM), grant at least the Storage Blob Data Reader role.

  • As sink, in Access control (IAM), grant at least the Storage Blob Data Contributor role.


 


That is it!


Liliam, UK Engineer


 


