March Ahead with Azure Purview: Unify ALL your data using Apache Atlas open API support


This article is contributed. See the original author and article here.

Last week at Ignite, we made a number of announcements. Since the launch of Azure Purview, customers have discovered over 14.5 billion data assets across 2,000+ Purview accounts. Thank you!


 


We are debuting a blog series today – “March Ahead with Azure Purview”. This blog series is focused on helping you get the most out of your current Purview implementation. Over the month of March, we will publish blogs on best practices, tips, tricks, and troubleshooting guidance on topics including scans, access, roles, and proof-of-concept planning.


 


Tell us what other topics you want us to blog about in the comments! The first blog below is intended to help you understand the relationship between Azure Purview and the Apache Atlas Open API ecosystem. Are you planning to use Azure Purview to manage data in Azure Databricks? Read on! 


 


Apache Atlas is a scalable and extensible set of core foundational governance services that enables enterprises to meet their compliance requirements effectively and efficiently within Hadoop, while integrating with the whole enterprise data ecosystem. The high-level features Atlas provides are metadata types and instances, classification, lineage, and discovery. Purview provides these same capabilities, in most cases more advanced than what native Atlas provides, while maintaining inter-compatibility with the Atlas API ecosystem. We have also added a few APIs, such as the advanced search capability, that enhance functionality over what is available in native Atlas. Let’s dive in:


 


Apache Atlas is built on three fundamental concepts – a type, an entity, and an attribute. A type in Atlas is a definition of how a particular kind of metadata object is stored and accessed; it represents the collection of attributes that define the properties for that metadata object. An entity in Atlas is a specific value or instance of an entity type, and thus represents a specific metadata object in the real world. An attribute represents a property on an entity. Learn more about the Atlas type system here.
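As a rough illustration of how these three concepts relate, here is a sketch using plain dictionaries; the “custom_report” type and its attributes are hypothetical, not real Purview types:

```python
# Rough illustration of the Atlas concepts above using plain dictionaries.
# "custom_report" and its attributes are hypothetical, not real Purview types.

# A Type: defines which attributes a kind of metadata object carries.
type_def = {
    "category": "ENTITY",
    "name": "custom_report",
    "attributeDefs": [
        {"name": "owner", "typeName": "string", "isOptional": True},
    ],
}

# An Entity: a specific instance of that type, with attribute values filled in.
entity = {
    "typeName": "custom_report",
    "attributes": {"owner": "data-team"},
}

# Every attribute on the entity should be declared by its type.
declared = {a["name"] for a in type_def["attributeDefs"]}
print(set(entity["attributes"]) <= declared)  # True
```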


 


Setup, Authentication, and using Purview Atlas Endpoints


For an in-depth look at how to set up your development environment for working with Azure Purview’s Atlas REST APIs, review the REST API Tutorial. In short, you will need the following:



  • A service principal with the Data Curator role on your Purview service. Learn about roles here.

  • The name of your Purview service.

  • An access token obtained through an OAuth 2.0 request.


The samples below assume you have completed this setup and have set the $ENDPOINT and $AUTH_TOKEN environment variables used throughout.
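If you want to script the token step, a client-credentials request can be assembled as in this sketch. Note that the token endpoint shape and the https://purview.azure.net resource value are assumptions based on the standard Azure AD client-credentials flow; verify them against the REST API tutorial:

```python
# Sketch: assemble (but do not send) an OAuth 2.0 client-credentials token
# request for the Purview Atlas APIs. The token endpoint shape and the
# "https://purview.azure.net" resource value are assumptions; verify them
# against the REST API tutorial.
from urllib.parse import urlencode

def build_token_request(tenant_id: str, client_id: str, client_secret: str):
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": "https://purview.azure.net",
    })
    return url, body

url, body = build_token_request("my-tenant-id", "my-client-id", "my-secret")
# POSTing `body` to `url` returns JSON whose access_token becomes $AUTH_TOKEN.
print(url)
```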



Get a system Entity’s Metadata with Purview Atlas APIs


A common starting point for using the REST APIs is to get an entity that has already been scanned.  By getting an entity through the REST API, you have quick access to the schema, classifications, attributes, and other relationships to the entity.


Start by navigating to the entity you want to get with the API and obtain the GUID from the URL.




 


You then call the /entity/bulk?guid= endpoint and provide the guid you collected. You can also pass a comma-delimited list of guids to retrieve multiple objects.


 


curl -H "Accept: application/json" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  "$ENDPOINT/entity/bulk?guid=e5d12ea7-53a8-4b48-b8a4-61f6f6f60000" | jq .


 


The response provided contains several key sections including:



  • Referred entities: Provides detail about every entity that is referenced. That includes columns in your schema or process entities used in lineage.

  • Entities: This provides an array of the entities you asked for in the guid parameter. Each object in this array will have the core properties, attributes, and relationship attributes.
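To illustrate how the two sections fit together, here is a minimal sketch that resolves the referred column entities for a returned entity. The payload below is a mocked, heavily simplified stand-in for a real /entity/bulk response:

```python
# Minimal sketch: resolve referredEntities for each returned entity in a
# simplified /entity/bulk-style response. The sample payload below is mocked.
sample_response = {
    "referredEntities": {
        "col-guid-1": {"typeName": "azure_sql_column",
                       "attributes": {"name": "CustomerId"}},
    },
    "entities": [
        {"guid": "e5d12ea7-53a8-4b48-b8a4-61f6f6f60000",
         "typeName": "azure_sql_table",
         "attributes": {"name": "Customers"},
         "relationshipAttributes": {"columns": [{"guid": "col-guid-1"}]}},
    ],
}

def column_names(response, entity):
    """Look up each referenced column guid in referredEntities."""
    referred = response["referredEntities"]
    refs = entity["relationshipAttributes"].get("columns", [])
    return [referred[r["guid"]]["attributes"]["name"]
            for r in refs if r["guid"] in referred]

table = sample_response["entities"][0]
print(column_names(sample_response, table))  # ['CustomerId']
```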



 


Understanding Type Definitions for system entities


Once you have started exploring the entities you have scanned, you might want to instantiate your own entity based on one of those types. For example, you scanned an Azure SQL table but want to programmatically generate your own server, database, schema, tables, and columns. To instantiate your own entity of a given type, you must first understand which attributes are required, and which other entities must exist before this entity can be created.


 


Part of the response from our GET /entity/bulk?guid call returned a typeName attribute. That type name can be used to retrieve its definition which includes all the attributes we can capture and all the relationship attributes (i.e. the way a database entity relates to a server entity and a column entity relates to a table entity) that are available to the type.


 


curl -H "Accept: application/json" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  $ENDPOINT/types/entitydef/name/azure_sql_table | jq .


 


Understanding the Type definition response


The abbreviated response from the azure_sql_table type definition below shows several key features.



  • Options.schemaElementsAttribute – The relationship attribute that will be referenced in the schema tab in the Purview UI.

  • An array of Attribute Definitions – This defines what attributes we want to collect, the type, whether it is one or many values, the min and max number of values, and whether it’s optional or required.

  • superTypes – The type which you are inheriting from, most often it will be DataSet or Process type.

  • An array of Relationship Attribute Definitions – These relationships describe how one entity connects to another. A few interesting relationship attributes for an azure_sql_table include:

    • “columns” allows an instance of azure_sql_table to contain reference to an array of azure_sql_columns.

    • “dbSchema” points to a single azure_sql_schema. This is a required relationship attribute; you can’t create an azure_sql_table without a database schema.

    • “meanings” is available on all entities and it provides the support for adding glossary terms to a given entity.




Here is an example of the response payload:


{
  "category": "ENTITY",
  "guid": "5f94b8b9-0430-4210-ade2-7b6f7e2d2db4",
  "name": "azure_sql_table",
  "description": "azure_sql_table",
  "serviceType": "Azure SQL Database",
  "options": {
    "schemaElementsAttribute": "columns"
  },
  "attributeDefs": [
    {
      "name": "objectType",
      "typeName": "string",
      "isOptional": true,
      "cardinality": "SINGLE",
      "valuesMinCount": 0,
      "valuesMaxCount": 1,
      …
    },
    …
  ],
  "superTypes": [
    "DataSet"
  ],
  "subTypes": [],
  "relationshipAttributeDefs": [
    {
      "name": "dbSchema",
      "typeName": "azure_sql_schema",
      "isOptional": false,
      "cardinality": "SINGLE",
      "relationshipTypeName": "azure_sql_schema_tables",
      …
    },
    {
      "name": "columns",
      "typeName": "array<azure_sql_column>",
      "isOptional": true,
      "cardinality": "SET",
      "relationshipTypeName": "azure_sql_table_columns",
      …
    },
    {
      "name": "meanings",
      "typeName": "array<AtlasGlossaryTerm>",
      "relationshipTypeName": "AtlasGlossarySemanticAssignment",
      …
    },
    …
  ]
}


 


 


Creating Your first Custom Type with Purview Atlas APIs


Now that you have learned about the existing system types in Purview, you might want to create your own type definitions along with your own custom lineage. As an example, we will create our very own Process type to represent lineage between Azure Databricks and existing entities.


Let us start by creating a custom Process entity type for our Databricks notebooks. The JSON below defines a Databricks notebook type with a required job name, an optional schedule, and an array of possible parameters for the notebook. Since we are using a super type of Process, we inherit attributes like qualifiedName and, importantly, the inputs and outputs relationship attributes. Because these are inherited, we do not need to specify them in our type definition.


 


Here is an example of the request payload:


{
  "entityDefs": [
    {
      "category": "ENTITY",
      "name": "custom_databricks_notebook_process",
      "superTypes": [
        "Process"
      ],
      "attributeDefs": [
        {
          "cardinality": "SINGLE",
          "includeInNotification": false,
          "isIndexable": false,
          "isOptional": false,
          "isUnique": false,
          "name": "JobName",
          "typeName": "string",
          "valuesMaxCount": 1,
          "valuesMinCount": 0
        },
        {
          "cardinality": "SINGLE",
          "includeInNotification": false,
          "isIndexable": false,
          "isOptional": true,
          "isUnique": false,
          "name": "Schedule",
          "typeName": "string",
          "valuesMaxCount": 1,
          "valuesMinCount": 0
        },
        {
          "cardinality": "SET",
          "includeInNotification": false,
          "isIndexable": false,
          "isOptional": true,
          "isUnique": false,
          "name": "Parameters",
          "typeName": "array<string>",
          "valuesMaxCount": 12,
          "valuesMinCount": 0
        }
      ],
      "relationshipAttributeDefs": []
    }
  ]
}


 


Taking the JSON above, we can POST it to the /types/typedefs endpoint of our Purview service to create the type.


curl -H "Accept: application/json" -H "Content-type: application/json" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -X POST --data @path.to.json.file \
  $ENDPOINT/types/typedefs | jq .


 


The response will return the completed entity definition.


 


Using a Custom Type for Custom Lineage and entities


With a custom type for our Databricks notebook lineage in place, we need to instantiate our custom entity and point its inputs and outputs to existing entities.


The JSON payload below does the following:



  • It references our custom type.

  • It provides a negative number to act as a “dummy guid” that will be translated into a system-assigned guid upon successful upload.

  • It provides the required attributes (name, qualifiedName, and our custom JobName).

  • Finally, it provides inputs and outputs. In this case, we demonstrate two ways of referencing existing entities in your Purview data catalog:

    • You can pass in a JSON object with the key “guid” and the guid itself as the value.

    • You can pass in a JSON object with the keys “typeName” and “uniqueAttributes”, where uniqueAttributes is itself a JSON object with qualifiedName as the key.




{
  "entities": [
    {
      "typeName": "custom_databricks_notebook_process",
      "guid": -2,
      "attributes": {
        "name": "MyNotebook",
        "JobName": "MyDatabricksJob",
        "qualifiedName": "custom_dbr://workspace/path/to/notebook",
        "inputs": [
          {
            "guid": "abc-123-456"
          }
        ],
        "outputs": [
          {
            "typeName": "azure_sql_table",
            "uniqueAttributes": {
              "qualifiedName": "mssql://server/database/schema/table"
            }
          }
        ]
      },
      "relationshipAttributes": {}
    }
  ]
}


With that payload body saved, we can POST the JSON to the /entity/bulk endpoint as shown below.


curl -H "Accept: application/json" -H "Content-type: application/json" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -X POST --data @path.to.json.file \
  $ENDPOINT/entity/bulk | jq .


 


The response will tell us whether this was a create or an update. In addition, we get to see the official guid assigned to the entity, and we can map our “dummy guid” to the official guid using the guidAssignments section of the response.


Here is an example of the response payload:


{
  "mutatedEntities": {
    "CREATE": [
      {
        "typeName": "custom_databricks_notebook_process",
        "attributes": {
          "qualifiedName": "custom_dbr://workspace/path/to/notebook"
        },
        "lastModifiedTS": "1",
        "guid": "3daeee33-0e07-47e0-b877-30225367fc11"
      }
    ]
  },
  "guidAssignments": {
    "-2": "3daeee33-0e07-47e0-b877-30225367fc11"
  }
}
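Given a response of that shape, resolving the dummy guid to the real one is a single dictionary lookup. This sketch reuses the values from the sample response:

```python
# Sketch: map the "dummy guid" (-2) used in the upload to the real guid
# assigned by the service, using the guidAssignments section of the response.
response = {
    "mutatedEntities": {
        "CREATE": [{"guid": "3daeee33-0e07-47e0-b877-30225367fc11"}]
    },
    "guidAssignments": {"-2": "3daeee33-0e07-47e0-b877-30225367fc11"},
}

real_guid = response["guidAssignments"]["-2"]
was_created = "CREATE" in response["mutatedEntities"]
print(real_guid, was_created)  # 3daeee33-0e07-47e0-b877-30225367fc11 True
```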


 


The result of our payload, assuming you had some entities created already, should appear as a lineage graph when viewing the created custom process entity in the Purview UI.




 


This works great for existing entities, but if you are uploading new entities in the same request as your custom process entity, you need to change your input/output “headers” to reference the “dummy guid”.



  • Add your desired input / output entities as additional Atlas entities to the “entities” array in the above JSON payload.

  • Your input / output headers would now have three keys:

    • guid: Containing the dummy guid that matches an entity being uploaded.

    • typeName: Containing the type of the entity you’re uploading and using as an input/output.

    • qualifiedName: Containing the qualified name of the entity you’re uploading and using as an input/output.
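As a sketch of that pattern (the entity names and qualified names below are hypothetical), a combined payload might be built like this:

```python
# Sketch of uploading a new input entity together with the custom process
# entity in one batch. Entity names and qualified names are hypothetical.
new_input = {
    "typeName": "azure_sql_table",
    "guid": -1,  # dummy guid for the new entity being uploaded
    "attributes": {"name": "SourceTable",
                   "qualifiedName": "mssql://server/database/schema/source"},
}

process = {
    "typeName": "custom_databricks_notebook_process",
    "guid": -2,
    "attributes": {
        "name": "MyNotebook",
        "JobName": "MyDatabricksJob",
        "qualifiedName": "custom_dbr://workspace/path/to/notebook",
        # The input header now carries all three keys, with guid matching
        # the dummy guid of the entity uploaded in the same batch.
        "inputs": [{"guid": -1,
                    "typeName": "azure_sql_table",
                    "qualifiedName": "mssql://server/database/schema/source"}],
        "outputs": [],
    },
}

payload = {"entities": [new_input, process]}
header = payload["entities"][1]["attributes"]["inputs"][0]
print(sorted(header))  # ['guid', 'qualifiedName', 'typeName']
```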




 


Community Driven SDKs


As Purview approaches General Availability, it will provide SDKs and Azure CLI integration. Until then, there are several community-driven efforts to make working with the Purview / Atlas APIs easier. One such effort is the PyApacheAtlas project. Let us revisit some of the examples above using PyApacheAtlas!


Authentication with PyApacheAtlas


Instead of doing the OAuth 2.0 dance yourself, you can take advantage of service principal authentication: pass in your service principal credentials and your Purview account name, and you have a client object that is ready to create types, entities, relationships, and custom lineage.


Here is the sample code to achieve this:


import os

from pyapacheatlas.auth import ServicePrincipalAuthentication
from pyapacheatlas.core.client import PurviewClient

oauth = ServicePrincipalAuthentication(
    tenant_id=os.environ.get("TENANT_ID", ""),
    client_id=os.environ.get("CLIENT_ID", ""),
    client_secret=os.environ.get("CLIENT_SECRET", "")
)
client = PurviewClient(
    account_name=os.environ.get("PURVIEW_NAME", ""),
    authentication=oauth
)


 


Getting Types and Entities with PyApacheAtlas


The first thing you will typically do is get an entity and its type in order to understand how to use that type. In PyApacheAtlas, it’s as simple as calling a couple of methods, as shown below.


import json

from pyapacheatlas.core.typedef import TypeCategory

# Get the one entity based on its guid
results = client.get_entity(guid="abc-123-456")
print(json.dumps(results["entities"][0], indent=2))

# Get the one type definition
typedefs = client.get_typedef(TypeCategory.ENTITY, name="azure_sql_table")
print(json.dumps(typedefs, indent=2))


 


Creating Types and Entities with PyApacheAtlas


We can quickly create a type and its attributes in PyApacheAtlas. Once the object is created and all the attribute definitions are added, call the upload_typedefs method on the client object. Note that the force_update=True parameter allows us to update the type if it already exists.


Here is the sample code to achieve this:


from pyapacheatlas.core import AtlasProcess
from pyapacheatlas.core.typedef import EntityTypeDef, AtlasAttributeDef

ed = EntityTypeDef(
    name="custom_databricks_notebook_process",
    superTypes=["Process"]
)
ed.addAttributeDef(
    AtlasAttributeDef("JobName", isOptional=False),
    AtlasAttributeDef("Schedule"),
    AtlasAttributeDef("Parameters", cardinality="SET",
                      typeName="array<string>", valuesMaxCount=12)
)

type_results = client.upload_typedefs(entityDefs=[ed], force_update=True)

# Now create the custom entity based on this type.
custom_entity = AtlasProcess(
    name="MyNotebook",
    typeName="custom_databricks_notebook_process",
    qualified_name="custom_dbr://workspace/path/to/notebook",
    attributes={"JobName": "MyDatabricksJob"},
    # Be sure to change your inputs and outputs before uploading
    inputs=[{"guid": "abc-123-456"}],
    outputs=[{
        "typeName": "azure_sql_table",
        "uniqueAttributes": {
            "qualifiedName": "mssql://server/database/schema/table"
        }
    }],
)
# Upload the "batch"
entity_results = client.upload_entities(batch=[custom_entity])


 


Deleting Entities with PyApacheAtlas


Lastly, you can delete entities using the REST API. Use the sample below to clean up the assets from this demonstration, leaving you a clean catalog to re-populate!


delete_results = client.delete_entity(guid="605fb1b1-0ee5-437e-9439-99aea4835127")
print(json.dumps(delete_results, indent=2))


 


To learn more about Azure Purview, check out our full documentation today.

Microsoft Teams Adoption and Governance with Microsoft’s Karuana Gatimu – MidDay Café 03-15-2021



Microsoft Teams is increasingly becoming THE place where employees get their work done. Whether through integrated applications, communications, or collaboration, the importance of Teams in this hybrid world of work continues to grow. This upcoming Monday, 3/15, we will be hosting Karuana Gatimu, Principal Manager, Customer Advocacy, Teams Engineering, who will cover adoption and governance for Microsoft Teams, resources to assist, and best practices for organizations to get the most out of their Teams investment.


Grab the calendar invite below and learn how to leverage best practices and resources around the adoption and governance of Microsoft Teams in your organization. Karuana is a recognized expert in this area and a frequent speaker for Microsoft at major events such as Ignite and more.


MidDay Café 03/15/2021 Agenda:



  • Welcome and Introductions

  • Mid-Day Café News and Events

  • Microsoft Teams Adoption and Governance with Microsoft’s Karuana Gatimu, Principal Manager, Customer Advocacy, Teams Engineering

  • Open Q&A

  • Wrap Up


For the Event:



Keep up to date with MidDay Café:



Thanks for visiting – Michael Gannotti | LinkedIn | Twitter


Michael Gannotti

Getting started with SharePoint Framework





 


Using SharePoint Framework you can extend portals on Microsoft 365 and expose your apps where people work. Here are some resources to get you started.


 


What is SharePoint Framework and why should you care


SharePoint Framework is a development model for building apps on Microsoft 365. Originally, it started as a way to extend SharePoint portals. Nowadays, it allows you to also build apps for Microsoft Teams.


 


When you use SharePoint Framework to build your apps, you don’t need to worry about hosting and auth. You can build your app using any client-side framework you want and easily deploy your app to your users.


 


Resources for getting started with SharePoint Framework


Here is a list of resources to help you get started with building apps using the SharePoint Framework.


 


Introduction to customizing and extending SharePoint (learn module)


If you like a structured way of learning, this is the best place to start. This learn module takes you through the basics of what SharePoint Framework is, what kind of apps it allows you to build and how to do it. This module is a part of a larger learning path that allows you to get certified as a Microsoft 365 developer.


 


View the learn module


 


Hands-on tutorials for SharePoint development


If you prefer a more hands-on approach, the SharePoint development tutorials and training are another great place to start. There are both written and recorded walkthroughs presenting the different aspects of building solutions using SharePoint Framework. They’re kept up-to-date with the latest version of SharePoint Framework so it’s a great resource for you to bookmark.


 


View the hands-on tutorials for SharePoint development


 


SharePoint development docs


Once you’re past the basics, the official SharePoint Framework docs are a great place to deepen your knowledge. In the docs you will find explanations of the different capabilities and how they work. The docs also offer prescriptive guidance on topics such as how to implement SharePoint Framework in development teams or what enterprise organizations should take into account.


 


View SharePoint Framework docs


 


Microsoft 365 Community


Microsoft 365 has a vibrant community that supports each other in building apps on Microsoft 365. We share our experiences through regular community calls, offer guidance, record videos and build tools to speed up development. You can find everything we have to offer at aka.ms/m365pnp.


 


Start building your apps on Microsoft 365 today


Over 250 million users work with Microsoft 365 and using SharePoint Framework is an easy way to bring your application to where people are. I’d encourage you to check out the resources I mentioned and give SharePoint Framework a try. And if you have any questions, don’t hesitate to ask them on our community forums at aka.ms/m365pnp-community. Looking forward to hearing what you’ve built!

Security Control: Enable encryption at rest



As part of our recent Azure Security Center (ASC) Blog Series, we are diving into the different Security Controls within ASC’s Secure Score.  In this post we will be discussing the “Enable encryption at rest” Security Control. 


 


This Security Control contains up to 3 recommendations, depending on the resources you have deployed in your environment, and it is worth a maximum of 4 points (6%) toward your overall Secure Score. These recommendations are meant to keep your resources safe and improve your security hygiene through continuous teamwork.


 


Without further delay (and in no particular order), Enable encryption at rest contains one or more of the following 3 recommendations, depending on your environment:



  • Disk encryption should be applied on virtual machines.

  • Transparent Data Encryption on SQL databases should be enabled.

  • Automation account variables should be encrypted.


Image 1 – Enable encryption at rest


 


Like the rest of the Security Controls, all of these recommendations must be addressed in order to get the full points and drive up your Secure Score (you can review all the recommendations here). Some also have a Quick Fix! button, which simplifies remediation and enables you to quickly increase your Secure Score and therefore improve your environment’s security.


 


Category #1: Disk encryption should be applied on virtual machines


When working with production data it is highly recommended to implement encryption in order to protect it from unauthorized access and fulfil compliance requirements for data-at-rest encryption in your organization. Azure Security Center disk encryption monitoring identifies non-compliant virtual machines (VMs) and recommends enabling disk encryption for these VMs in order to enhance data protection.


The Azure Security Center disk encryption recommendation (which supports both native VHD and managed disk solutions) works as follows:



  • A machine is considered to have two-pass encryption enabled if storageProfile.OsDisk.encryptionSettings.enabled == True.

  • A machine is considered to have one-pass encryption enabled if all of the InstanceView.disks elements have encryptionSettings.enabled == True, OR the Resource.ADE.Version (VM extension) starts with the one-pass major version.

  • A machine is considered to have no encryption if it has neither two-pass nor one-pass encryption.
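The three rules can be sketched in a few lines of Python. This is purely an illustration of the logic as stated, not ASC’s actual implementation; the input dictionary shape and the ONE_PASS_MAJOR placeholder are assumptions:

```python
# Illustration only of the three classification rules above; not ASC's actual
# implementation. `vm` is a simplified stand-in for the Azure VM model, and
# ONE_PASS_MAJOR is a placeholder for the single-pass ADE extension major version.
ONE_PASS_MAJOR = "2"

def encryption_state(vm) -> str:
    os_disk = vm.get("storageProfile", {}).get("osDisk", {})
    if os_disk.get("encryptionSettings", {}).get("enabled"):
        return "two-pass"
    disks = vm.get("instanceViewDisks", [])
    all_disks_encrypted = bool(disks) and all(
        d.get("encryptionSettings", {}).get("enabled") for d in disks
    )
    ade_one_pass = str(vm.get("adeExtensionVersion", "")).startswith(ONE_PASS_MAJOR)
    if all_disks_encrypted or ade_one_pass:
        return "one-pass"
    return "none"

print(encryption_state({"instanceViewDisks": [{"encryptionSettings": {"enabled": True}}]}))
# one-pass
```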


Azure Disk Encryption for Windows virtual machines (VMs) uses the BitLocker feature of Windows to provide full disk encryption of the OS disk and data disk. Additionally, it provides encryption of the temporary disk when the VolumeType parameter is All.


Make sure to check the list of unsupported scenarios here


 


Category #2: Transparent Data Encryption on SQL databases should be enabled


As more and more businesses go digital and move to the cloud, security is more important than ever. Transparent Data Encryption (TDE) is SQL’s form of encryption at rest. It encrypts data files at rest for SQL Server, Azure SQL Database, Azure SQL Data Warehouse, and APS. The term “data at rest” refers to the data, log files, and backups stored in persistent storage.

TDE performs real-time encryption and decryption of the database, associated backups, and transaction log files at rest without requiring changes to the application. It performs real-time I/O encryption and decryption of the data at the page level: each page is decrypted when it’s read into memory and then encrypted before being written to disk.

TDE encrypts the storage of an entire database by using a symmetric key called the Database Encryption Key (DEK). On database startup, the encrypted DEK is decrypted and then used for decryption and re-encryption of the database files in the SQL Server database engine process. The DEK is protected by the TDE protector, which is either a service-managed certificate (service-managed transparent data encryption) or an asymmetric key stored in Azure Key Vault (customer-managed transparent data encryption).


Turning off Transparent Data Encryption will result in decryption of the complete database and will leave your data vulnerable. When Transparent Data Encryption is turned off or not configured, Azure Security Center identifies the risk and gives you this recommendation. The configuration is a simple toggle between ON and OFF, as shown in Image 2.


Image 2: Transparent Data Encryption Configuration


 


This recommendation comes with a Quick Fix option that helps you turn on data encryption on the unhealthy resources in a single click. Alternatively, you may also refer to our GitHub repository for various ways (PowerShell, Logic App, Azure Policy) to resolve the “Enable transparent data encryption on SQL databases” recommendation in Azure Security Center.


 


Category #3: Automation account variables should be encrypted


Azure Automation is a tool that allows you to automate various processes in Azure using PowerShell, runbooks, and Automation modules. Account variables in Azure Automation are values available to all runbooks and DSC configurations within your Automation account, and they are preserved even when a runbook or DSC configuration fails. It is therefore important to protect this information, especially when these values contain sensitive data.

When creating variables in Azure Automation, those containing sensitive data need to be stored as secure assets. Upon creation, secure assets (which include credentials, certificates, and connections) are encrypted using a key that is unique to each Automation account and stored in Azure Key Vault until ready for use. Azure Automation secure assets use two models of encryption.

By encrypting your organization’s sensitive information, you create another barrier of defense to protect your organization’s data. Encryption converts sensitive information into code that can only be deciphered by someone who has access to the encryption key, making it significantly harder for a third party to access this information.


 


Conclusion


Even data-at-rest is at risk of outside attack. Encryption is one approach to preventing the visibility of your data from unauthorized access. The “Enable encryption at rest” Security Control kicks off these efforts within your organization by helping you protect the confidentiality of your data and resources. Try it out and let us know how it goes!


 


Acknowledgements:


Thanks to Future Kortor, Program Manager, for collaborating on the Category 3 section.


 


Reviewer: 


Thanks to Yuri, Principal Program Manager, for reviewing the article and for his inputs.

[Customer story] IT Sligo – levelling the playing field in education with cloud technology


Another customer story is out. This time it is from Ireland.


 


One good quote from the report: “It is really levelling the playing field from an accessibility point of view.”


 


Read more about their story here: https://pulse.microsoft.com/en-ie/making-a-difference-en-ie/na/fa2-it-sligo-levelling-the-playing-field-in-education-with-cloud-technology-2/


 


Thank you,


Luca