This article is contributed. See the original author and article here.

Background


 


At-scale data processing systems typically store a single table in storage as multiple files. Representing each of these files as a separate asset can clutter a data catalog and does not accurately convey that the files make up a single dataset. Azure Purview uses resource sets to address this problem by grouping assets in storage into a single object in the catalog if they match certain built-in patterns.
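
To make the grouping idea concrete, here is a minimal sketch in Python. It is not Azure Purview's implementation: the two patterns (GUIDs and runs of digits), the function names generalize and group_into_resource_sets, and the sample paths are all assumptions chosen for illustration. The sketch replaces the variable parts of each file path with a placeholder, then groups files whose generalized paths are identical.

```python
import re
from collections import defaultdict

# Hypothetical patterns for illustration only; the catalog's real
# built-in patterns are not reproduced here.
GUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
NUMBER_RE = re.compile(r"\d+")


def generalize(path: str) -> str:
    """Replace variable parts of a path with placeholders."""
    # GUIDs are replaced first so their digits are not turned into {N}.
    path = GUID_RE.sub("{GUID}", path)
    return NUMBER_RE.sub("{N}", path)


def group_into_resource_sets(paths: list[str]) -> dict[str, list[str]]:
    """Group file paths that share the same generalized path."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        groups[generalize(path)].append(path)
    return dict(groups)


if __name__ == "__main__":
    files = [
        "/container/sales/part-0001.csv",
        "/container/sales/part-0002.csv",
        "/container/readme.txt",
    ]
    for pattern, members in group_into_resource_sets(files).items():
        print(pattern, len(members))
```

In this sketch the two part files collapse into one group keyed by /container/sales/part-{N}.csv, while readme.txt stays on its own, which mirrors how many physical files can appear as a single catalog object.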


 


For more information on how and what Azure Purview groups into resource sets, see Understanding Resource Sets.


 


Overview of improvements


 


Before these changes, a resource set's display name was simply the final segment of its qualified name, which made it difficult to understand what the asset represented when searching, browsing, or exploring lineage.


 


For example, the output of an Apache Spark job would just be represented as {SparkPartitions} within the catalog. In this case, a user would likely be more interested in the name of the containing folder that the Spark partitions are written into (for example, a Delta table).


 


Starting this week, improvements to how the friendly names are extracted are coming to Azure Purview. Using a variety of heuristics that parse the resource set qualified name for useful information, the catalog will now more accurately display what the asset is representing. No longer will you see display names like {SparkPartitions}, flowsheet{N}_at_{Year}-{Month}-{Day}T{Hour}:{N}:{N}, or {GUID}.
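
One way to picture a heuristic of this kind is the short Python sketch below. It is an assumption made for illustration, not the logic Azure Purview actually uses: the function friendly_name and the sample qualified names are hypothetical. The sketch walks the path segments of a qualified name from right to left and returns the first segment that contains no {placeholder}, so a pattern-only leaf falls back to the name of its containing folder.

```python
import re

# Matches any {Placeholder} token such as {SparkPartitions}, {N}, or {GUID}.
PLACEHOLDER_RE = re.compile(r"\{[^{}]+\}")


def friendly_name(qualified_name: str) -> str:
    """Return the right-most path segment that has no placeholder.

    Hypothetical heuristic for illustration; the catalog combines
    several heuristics that are not reproduced here.
    """
    segments = [s for s in qualified_name.rstrip("/").split("/") if s]
    for segment in reversed(segments):
        if not PLACEHOLDER_RE.search(segment):
            return segment
    # Every segment is a placeholder: fall back to the last one.
    return segments[-1] if segments else qualified_name


if __name__ == "__main__":
    print(friendly_name("/container/sales_delta/{SparkPartitions}"))
    print(friendly_name("/container/events/{Year}/{Month}/{Day}/{GUID}"))
```

With these sample inputs the sketch yields sales_delta and events rather than {SparkPartitions} or {GUID}. A segment that mixes literal text with placeholders, such as flowsheet{N}_at_{Year}-{Month}-{Day}T{Hour}:{N}:{N}, is also skipped in favor of its parent folder.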


 


Once this change is available in your Purview region, all new resource sets ingested via scan will automatically use the new friendly names. Over the next couple of weeks, existing resource sets will have the new naming heuristics applied via a passive background process. No action is required from anyone using the tool!


 


Examples


 


Below are some examples of how Azure Purview extracts the default name from resource set qualified paths.


 


Example 1


 



Display name: “name of spark output”

 

Example 2


 



Display name: “my partitioned data”

 

Example 3


 



Display name: “data”

 

Next Steps


To get started with Azure Purview, learn how to scan data into the catalog.


 


Stay tuned for more improvements to resource sets and Azure Purview!
