This article is contributed. See the original author and article here.

The purpose of this post is to discuss managed and external tables when querying from SQL on-demand (serverless).


Thanks to my colleague Dibakar Dharchoudhury for the really nice discussion related to this subject.


 


From the docs: Shared metadata tables – Azure Synapse Analytics | Microsoft Docs


 



  • Managed tables


Spark provides many options for how to store data in managed tables, such as TEXT, CSV, JSON, JDBC, PARQUET, ORC, HIVE, DELTA, and LIBSVM. These files are normally stored in the warehouse directory where managed table data is stored.



  • External tables


Spark also provides ways to create external tables over existing data, either by providing the LOCATION option or using the Hive format. Such external tables can be over a variety of data formats, including Parquet.


Azure Synapse currently only shares managed and external Spark tables that store their data in Parquet format with the SQL engines.


 


Note: "The Spark created managed and external tables are also made available as external tables with the same name in the corresponding synchronized database in serverless SQL pool."


 


Following is an example of an external table created in Spark from a Parquet file:


 


1) Authentication:

from pyspark.sql import SparkSession

blob_account_name = "StorageAccount"     # storage account name
blob_container_name = "ContainerName"    # blob container name

spark = SparkSession.builder.getOrCreate()

# Retrieve the SAS token for the storage account via the linked service
token_library = spark._jvm.com.microsoft.azure.synapse.tokenlibrary.TokenLibrary
blob_sas_token = token_library.getConnectionString("LinkedServiceName")

# Register the SAS token so Spark can read from the blob container
spark.conf.set(
    'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
    blob_sas_token)
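The configuration property that Spark reads the SAS token from follows a fixed naming pattern. As a small plain-Python illustration (the names below are placeholders, not real resources), the key is composed from the container and account names like this:

```python
def sas_conf_key(container: str, account: str) -> str:
    """Build the Hadoop configuration key that holds the SAS token
    for a given blob container and storage account (WASB driver)."""
    return 'fs.azure.sas.%s.%s.blob.core.windows.net' % (container, account)

print(sas_conf_key('ContainerName', 'StorageAccount'))
# fs.azure.sas.ContainerName.StorageAccount.blob.core.windows.net
```

Note that the container name comes first in the key, then the account name, which is easy to get backwards.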

 


Note my linked service configuration:


LInked_server.png


 


2) External table:


 

# Create the database if it does not exist
spark.sql('CREATE DATABASE IF NOT EXISTS SeverlessDB')

# Read the Parquet file and save it as a Spark table
filepath = 'wasbs://Container@StorageAccount.blob.core.windows.net/parquets/file.snappy.parquet'
df = spark.read.load(filepath, format='parquet')
df.write.mode('overwrite').saveAsTable('SeverlessDB.Externaltable')
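The `wasbs://` URI above encodes the container, storage account, and blob path in one string. As a stdlib-only sketch (the helper name is mine, not part of any Synapse API), it can be taken apart like this:

```python
from urllib.parse import urlparse

def parse_wasbs(url: str):
    """Split a wasbs:// URI into (container, storage account, blob path)."""
    u = urlparse(url)
    if u.scheme != 'wasbs':
        raise ValueError('expected a wasbs:// URI')
    # netloc looks like Container@StorageAccount.blob.core.windows.net
    container, _, host = u.netloc.partition('@')
    account = host.split('.')[0]
    return container, account, u.path.lstrip('/')

print(parse_wasbs('wasbs://Container@StorageAccount.blob.core.windows.net/parquets/file.snappy.parquet'))
# ('Container', 'StorageAccount', 'parquets/file.snappy.parquet')
```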


From here you can query the table from SQL Serverless:


Query_from SQL.png
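Because the Spark database is synchronized into the serverless SQL pool, a query along these lines can be run there (a sketch; the `dbo` schema is where the synchronized tables typically surface):

```sql
-- Run in the serverless SQL pool; the database name matches the Spark database
SELECT TOP 10 *
FROM SeverlessDB.dbo.Externaltable;
```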


 


If you check the path where your external table was created, you will see it in the Data Lake as follows. For example, my workspace name is synapseworkspace12:


external_table.png


 


 


3) I can also create a managed table in Parquet format using the same dataset that I used for the external one, as follows:


 



# Managed table
df.write.format("parquet").saveAsTable("SeverlessDB.ManagedTable")

This one will also be persisted in the storage account under the same path, but in the managed table folder.
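The on-disk location of a managed Spark table can be reconstructed from the workspace, database, and table names. This is a sketch under the assumption that the default warehouse directory layout (`synapse/workspaces/<workspace>/warehouse/<db>.db/<table>`, with names lowercased by Spark) applies to your workspace:

```python
def managed_table_path(workspace: str, database: str, table: str) -> str:
    """Assumed default path of a managed Spark table inside the
    Synapse workspace's primary Data Lake container."""
    return 'synapse/workspaces/%s/warehouse/%s.db/%s' % (
        workspace, database.lower(), table.lower())

print(managed_table_path('synapseworkspace12', 'SeverlessDB', 'ManagedTable'))
# synapse/workspaces/synapseworkspace12/warehouse/severlessdb.db/managedtable
```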


 


Following the documentation, this is another way to achieve the same result for a managed table; in this case, however, the table will be created empty:


 

CREATE TABLE SeverlessDB.myparquettable(id int, name string, birthdate date) USING Parquet

 


storage.png


Those are the commands, per the documentation, for creating managed and external tables in Spark that can then be queried from SQL Serverless.


 


If you want to clean up this lab, run the following Spark SQL:


 



-- Drop the database and its tables
DROP DATABASE SeverlessDB CASCADE



 


 


That is it!


 


Liliam 


UK Engineer



Brought to you by Dr. Ware, Microsoft Office 365 Silver Partner, Charleston SC.
