The Data Lakehouse, the Data Warehouse and a Modern Data platform architecture

This article is contributed. See the original author and article here.

When I talk to data architects today about their data and analytics strategy, two overriding themes come up. They take very different sides, sitting practically at opposite ends of the discussion about the future design of the data platform.



  1. The Data Lakehouse. The focus here is how traditional Data Lakes have now advanced so that the capabilities previously provided by the Data Warehouse can now be replicated within the Data Lake. The Data Lakehouse approach proposes using data structures and data management features in a data lake that are similar to those previously found in a data warehouse:


Databricks – What is a data lakehouse



  2. Snowflake as your data platform. Snowflake has quickly become a major player in the data warehousing market, making use of its cloud native architecture to drive market share. They have gone a step further and are now pushing the concept of “Make Snowflake Your Data Lake”.


Snowflake for Data Lakes


 


So on the one hand, Data Lakehouse advocates say “There is no longer a need for a relational database, do it all in the data lake”, while on the other Snowflake is saying “Build your data lake in a relational database”. Is there really such a stark divergence of views about how to architect a modern data platform?


 


While both of these architectures have some merit, a number of questions immediately spring to mind. Both of these are driven with a focus on a single technology – which immediately should ring alarm bells for any architect. Both concepts also bring baggage from the past:



  • the Data Lakehouse pitch feels uncomfortably close to the “Hadoop can do it all” hype from 10 years ago, which led organisations to spend vast sums jumping onto the big data bandwagon. They believed the hype and invested huge amounts of money in this wonder platform, only to find that it wasn’t as effective as promised and that many of the problems with the “data warehouse” were actually due to their processes and governance, which were simply replicated in the new technology.

  • some of the Snowflake marketing seems to be morphing into concepts similar to those of the Enterprise Data Warehouse vendors of 20-30 years ago: the idea that a single data repository and technology is all you need for all your enterprise data needs. That is a very legacy logical architecture for a product that so heavily hypes its modern physical architecture.


So how do we make sense of these competing patterns? Why is there such a big disparity between the two approaches, and is there really such a major decision to be made between open (Spark/Delta) and proprietary (Snowflake/relational) code bases and repositories? I believe that if you drill into the headline propositions, the reality is that the architecture isn’t an “either/or” but a “better together”, and that a pragmatic approach should be taken. As such, whenever starting any conversation today, I tend to lead with three areas of assessment:



  1. What data do you have and what are your big data, BI and advanced analytical requirements? An organisation that requires mainly machine learning and anomaly detection against semi-structured data requires a very different approach to one that has more traditional BI and next best action needs driven from structured data. Also consider what works well for your data; if it is mostly structured and sourced from relational systems, why not keep it that way rather than putting it into a semi-structured form in a Lake and then layering structures back over the top; alternatively for semi-structured or constantly changing data, why force this into a relational environment that wasn’t designed for this type of data and which then requires the data to be exported out to the compute?

  2. What skills base do you have in IT and the business? If your workforce are relational experts and have great SQL skills, it could be a big shift for them to become Spark developers; alternatively if your key resources are teams of data scientists used to working in their tools of choice, they are unlikely to embrace a relational engine and will end up exporting all the data back out into their preferred environments.

  3. Azure – and any modern cloud ecosystem – is extremely flexible. It redefines the way modern compute architectures work by completely disconnecting compute and storage, and it provides the ability to build processes that use the right tool for the right job on a pay-for-what-you-use basis. The benefits are huge: workloads can be run much faster, more effectively and at massively reduced cost compared to “traditional” architectures. But it requires a real paradigm shift in thinking from IT architects and developers, who need to think about using the right technology for the job rather than just following their tried and tested approaches in one technology.


The responses to these three areas, especially 1 and 2, should determine the direction of any data platform architecture for your business. The concepts from item 3 should be front and centre for all architects and data platform decision makers though, as getting the best from your cloud investment requires new ways of thinking. What surprises me most today is that many people seem reluctant to change their thinking to take advantage of these capabilities, often through a combination of not understanding what is possible, harking back to what they know, and certain technology providers pushing the concept of “why do you need this complexity when you can do everything in one (our) tool”. While using multiple tools and technologies may seem like adding complexity if they don’t work well together, the capabilities of a well-integrated ecosystem will usually be easier to use and manage than trying to bend a single technology to do everything.


 


Why does Microsoft propose Azure Synapse Analytics in this area? We believe that this hybrid approach is the right way forward: efficient and effective BI, analytics, ML and AI are possible when all your data assets are connected and managed in a cohesive fashion. A true enterprise data platform architecture enables better decisions and transformative processes, enables a digital feedback loop within your organization and provides the foundation for successful analytics. One constant area of feedback from customers, though, was that while building a modern data platform was the right strategy, they wanted it to be easier to implement. IT architects and developers wanted to spend less time worrying about the plumbing (integrating the components and getting them to talk to each other) and more time building the solution. We thus set out to rearchitect and create the next generation of query processing and data management with Synapse to meet the needs of modern, high-scale data workloads with their volume, velocity and variety. Rather than limiting customers to one engine, Synapse provides SQL, Spark, and Log Analytics engines within a single integrated development environment: a cloud-native analytics service that converges big data and data warehousing to achieve limitless scale on structured, semi-structured, and unstructured data. Purpose-built engines optimized for different scenarios enable customers to yield more insights faster, with fewer resources and at less cost.


Azure Synapse Analytics


 


Azure Synapse Analytics is a limitless analytics service with a unified experience to ingest, explore, prepare, manage and serve data for immediate BI and machine-learning needs. So Azure Synapse Analytics isn’t a single technology, but an integrated combination of the different tools and capabilities you need to build your modern data platform, allowing you to choose the right tool for each job/step/process while removing the complexity of integrating these tools.


 


While Synapse can provide this flexible modern data platform architecture in a single service, the concept is open. Synapse provides Spark and dedicated SQL pool engines, but Databricks and Snowflake could replace these components within this architecture. More generally, any combination of Synapse and other first-party, third-party, or open-source components can be used to create the modern data platform, and the vast majority of these are supported within Azure.


 


This open set of individual technologies should be combined within a Modern Data platform architecture that fits your business. Take advantage of the flexibility of Azure and use the best tools and techniques to construct the most effective data platform for your needs.

Assign a built-in role to a user at resource and Resource Group scope using ARM template

This article focuses on creating an ARM template that provisions a storage account in a resource group and assigns a role at both the resource group (RG) scope and the scope of the newly created storage account.


This article is divided into the following five sections.



  1. Fetch User Object ID

  2. Fetch Built-in Role ID

  3. Create ARM template to provision storage account

  4. Role assignment in ARM template

  5. Deploying ARM template to Azure Portal


 


Let’s go step by step as outlined above. First, we will fetch the user object ID, which is needed when deploying the ARM template.



  1. First, let’s fetch the user’s object ID.


Use the following PowerShell command to fetch the user’s object ID by user principal name (typically the user’s email address).


PS Script: Get-AzADUser | Where-Object { $_.UserPrincipalName -eq "testuser@testdomain.xyz.com" }


This shows user details such as DisplayName, Id, Mail and UserPrincipalName. Grab the Id and save it for later use.


You can also fetch the user object ID from the Azure Portal: navigate to Azure Active Directory > Users > select the user whose ID you want > copy the Object ID.


 



  2. Similarly, we will fetch the built-in role ID using PowerShell. For this article I will fetch the “Reader” role ID, but you can fetch whichever role ID you require.


PS Script: Get-AzRoleDefinition -Name Reader


This command outputs a few of the role’s details. Grab the Id from the output and save it for later use.


 



  3. Now it’s time to create the ARM template, which creates the storage account and assigns the user the Reader role on it. The template also assigns the user the Reader role on the resource group, using the scope property.


 


Follow the template mentioned below for creating storage account and role assignment.


Refer to the Microsoft documentation for more details on ARM template syntax and on role assignments.


 


{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "AAD_Object_ID": {
            "metadata": {
                "description": "Object ID of the User, Group or Service Principal"
            },
            "type": "string"
        },
        "Role_Definition_ID": {
            "metadata": {
                "description": "Identifier (GUID) of the role definition to map to service principal"
            },
            "type": "string"
        }
    },
    "variables": {
        "Full Role_Definition_ID": "[concat('/subscriptions/', subscription().subscriptionId, '/providers/Microsoft.Authorization/roleDefinitions/', parameters('Role_Definition_ID'))]",
        "StorageAccountName": "shrstrgacc",
        "StorageAccountAssignmentName": "[concat(variables('StorageAccountName'), '/Microsoft.Authorization/', guid(concat(resourceGroup().id), variables('Full Role_Definition_ID')))]"
    },
    "resources": [
        {
            "type": "Microsoft.Storage/storageAccounts",
            "apiVersion": "2018-07-01",
            "name": "[variables('StorageAccountName')]",
            "comments": "Storage account used to store VM disks",
            "location": "[resourceGroup().location]",
            "sku": {
                "name": "Standard_LRS"
            },
            "kind": "Storage",
            "properties": {}
        },
        {
            "type": "Microsoft.Authorization/roleAssignments",
            "apiVersion": "2017-09-01",
            "name": "[guid(concat(resourceGroup().id), resourceId('Microsoft.Storage/storageAccounts', 'shrstrgacc'), variables('Full Role_Definition_ID'))]",
            "dependsOn": [
                "[resourceId('Microsoft.Storage/storageAccounts', 'shrstrgacc')]"
            ],
            "properties": {
                "roleDefinitionId": "[variables('Full Role_Definition_ID')]",
                "principalId": "[parameters('AAD_Object_ID')]",
                "scope": "[resourceGroup().id]"
            }
        },
        {
            "type": "Microsoft.Storage/storageAccounts/providers/roleAssignments",
            "apiVersion": "2017-05-01",
            "name": "[variables('StorageAccountAssignmentName')]",
            "dependsOn": [
                "[resourceId('Microsoft.Storage/storageAccounts', 'shrstrgacc')]"
            ],
            "properties": {
                "roleDefinitionId": "[variables('Full Role_Definition_ID')]",
                "principalId": "[parameters('AAD_Object_ID')]"
            }
        }
    ],
    "outputs": {}
}


 


As you can see from the above ARM template, there are two input parameters, “AAD_Object_ID” and “Role_Definition_ID”. AAD_Object_ID holds the user object ID fetched in Step 1, and Role_Definition_ID holds the built-in Reader role ID fetched in Step 2.
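The template only takes the role GUID as input and builds the full role definition resource ID itself with concat(). As a rough illustration of what that variable evaluates to, here is a minimal Python sketch. The function name and the all-zero subscription ID are placeholders introduced for this example; the Reader GUID shown is the well-known built-in value, but you should still confirm it with Get-AzRoleDefinition as described in Step 2.

```python
import uuid


def full_role_definition_id(subscription_id: str, role_guid: str) -> str:
    """Mirror the template's concat(): subscription-scoped role definition ID."""
    # uuid.UUID raises ValueError if either input is not a well-formed GUID,
    # which catches copy/paste mistakes before a deployment is attempted.
    uuid.UUID(subscription_id)
    uuid.UUID(role_guid)
    return (
        f"/subscriptions/{subscription_id}"
        f"/providers/Microsoft.Authorization/roleDefinitions/{role_guid}"
    )


# Placeholder subscription ID (assumption, not a real subscription).
SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"
# Built-in Reader role GUID; verify with Get-AzRoleDefinition -Name Reader.
READER_ROLE_GUID = "acdd72a7-3385-48ef-bd42-f606fba81ae7"

role_id = full_role_definition_id(SUBSCRIPTION_ID, READER_ROLE_GUID)
print(role_id)
```

Validating the GUIDs up front is a design choice of this sketch only; the template itself performs no such check.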


 


To drill further into the ARM template, it uses the following resource types:


Type: Microsoft.Storage/storageAccounts to provision storage account with the mentioned properties in the ARM Template


Type: Microsoft.Authorization/roleAssignments to assign role at Resource group scope


Type: Microsoft.Storage/storageAccounts/providers/roleAssignments to assign role to the storage account resource
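Both role assignment resources take their name from the ARM guid() function, which derives a stable GUID from its inputs so that redeploying the same template targets the same assignment rather than creating a new one. The Python sketch below only illustrates that idea with uuid5; it does not reproduce ARM's hashing, so the values will not match what guid() produces. The namespace URL and the sample scope strings are hypothetical.

```python
import uuid

# Hypothetical namespace for this illustration only.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/role-assignments")


def assignment_name(*parts: str) -> str:
    """Derive a deterministic GUID-shaped name from scope and role inputs."""
    # Joining with a separator keeps ("ab", "c") distinct from ("a", "bc").
    return str(uuid.uuid5(NAMESPACE, "|".join(parts)))


rg_scope = "/subscriptions/0000/resourceGroups/demo-rg"          # sample value
sa_scope = rg_scope + "/providers/Microsoft.Storage/storageAccounts/shrstrgacc"
role = "/providers/Microsoft.Authorization/roleDefinitions/reader"  # sample value

rg_name_first = assignment_name(rg_scope, role)
rg_name_again = assignment_name(rg_scope, role)
sa_name = assignment_name(sa_scope, role)
print(rg_name_first, sa_name)
```

Same inputs give the same name, while a different scope gives a different one, which is why the template mixes the resource group ID, the storage account ID and the role definition ID into the name.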


Also, save the above template in a file with a .json extension, for example armtest.json, and copy the file path, as we will need it when deploying to Azure in the final step.
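Templates copied from a web page often pick up typographic (curly) quotes, which are not valid JSON and will make the deployment fail to parse the file. A minimal Python sketch of a pre-flight check follows; the helper name is an assumption made for this example, and the replacement is deliberately blunt, so a curly double quote that legitimately belongs inside a string value would also be rewritten.

```python
import json

# Typographic quotes mapped to the ASCII quotes JSON and ARM expressions expect.
SMART_QUOTES = {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}

# Top-level keys every ARM template must contain.
REQUIRED_KEYS = ("$schema", "contentVersion", "resources")


def load_template_text(text: str) -> dict:
    """Normalize quotes, parse the JSON, and check the required top-level keys."""
    for bad, good in SMART_QUOTES.items():
        text = text.replace(bad, good)
    template = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    missing = [key for key in REQUIRED_KEYS if key not in template]
    if missing:
        raise ValueError(f"template is missing required keys: {missing}")
    return template


# A tiny stand-in for a pasted template that still contains curly quotes.
pasted = (
    "{\u201c$schema\u201d: \u201chttps://example.invalid/schema.json#\u201d, "
    "\u201ccontentVersion\u201d: \u201c1.0.0.0\u201d, \u201cresources\u201d: []}"
)
template = load_template_text(pasted)
print(sorted(template))
```

To check a saved file, read it with `open(path, encoding="utf-8").read()` and pass the text to the helper.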


 



  5. Now it’s time to deploy the ARM template to Azure. Use the following script. For more details, refer to the Microsoft documentation on deploying with New-AzResourceGroupDeployment.


# Connect to your Azure account
Connect-AzAccount

# New-AzResourceGroupDeployment deploys Azure resources to the resource group
New-AzResourceGroupDeployment -ResourceGroupName <your-resource-group-name> `
    -TemplateFile <ARMTemplateFilePath> `
    -AAD_Object_ID <user object Id> `
    -Role_Definition_ID <Built in Reader role Id>


 


Note – Pass the copied path of the saved ARM Template file to the TemplateFile parameter in the script
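As an alternative to passing the two values inline, New-AzResourceGroupDeployment also accepts a parameter file through -TemplateParameterFile. The sketch below generates such a file's content in Python. The object ID is a placeholder, and the function name and the file name mentioned afterwards are assumptions for this example rather than part of the original walkthrough.

```python
import json


def build_parameters_file(object_id: str, role_definition_guid: str) -> dict:
    """Build an ARM deployment parameters document for the template above."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            # Parameter names must match the template's "parameters" section.
            "AAD_Object_ID": {"value": object_id},
            "Role_Definition_ID": {"value": role_definition_guid},
        },
    }


params = build_parameters_file(
    "11111111-1111-1111-1111-111111111111",   # placeholder user object ID
    "acdd72a7-3385-48ef-bd42-f606fba81ae7",   # built-in Reader role GUID
)
params_json = json.dumps(params, indent=4)
print(params_json)
```

Saving the printed JSON as, say, armtest.parameters.json and passing that path to -TemplateParameterFile keeps the IDs out of shell history and makes the deployment easier to repeat.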


 


Now it’s time to verify the outcome in the Azure Portal.


Wohoo, the storage account is created in the resource group specified in the New-AzResourceGroupDeployment command.


 


Fig 1.1: Storage Account created using ARM Template


 


Now, let’s check whether the Reader role is assigned to the test user on the resource group.


Navigate to Azure Portal > Resource Group > Select the Resource group you added in the ARM deployment script > Access Control > Role Assignments


Wohoo, we can see the Reader role is assigned to the test user at the resource group scope.


 


Fig 1.2: Role Assignment to the Resource Group using ARM Template


 


It’s time to verify the role access at the storage account resource level.


Navigate to Azure Portal > Resource Group > Select the Resource group you added in the ARM deployment script > Select the created storage account > Access control > Role Assignments


Wohoo, at the storage account level we can see that the Reader role is assigned to the test user and that the same role is also inherited from the resource group.


 


Fig 1.3: Role assigned to created storage account using ARM Template


 


I hope this article is useful for all Azure enthusiasts who want to assign RBAC roles to users, groups, SPNs or managed identities using an ARM template.


Keep Learning!


Keep Sharing!


 


 

CRI-O Security Update for Kubernetes

CRI-O has released a security update addressing a critical vulnerability—CVE-2022-0811—in CRI-O 1.19. A local attacker could exploit this vulnerability to take control of an affected Kubernetes environment as well as other software or platforms that use CRI-O runtime containers.

CISA encourages users and administrators to review the CRI-O Security Advisory and apply the necessary updates or workarounds.

Strengthening Cybersecurity of SATCOM Network Providers and Customers

Actions to Take Today:
• Use secure methods for authentication.
• Enforce principle of least privilege.
• Review trust relationships.
• Implement encryption.
• Ensure robust patching and system configuration audits.
• Monitor logs for suspicious activity.
• Ensure incident response, resilience, and continuity of operations plans are in place.

The Cybersecurity and Infrastructure Security Agency (CISA) and the Federal Bureau of Investigation (FBI) are aware of possible threats to U.S. and international satellite communication (SATCOM) networks. Successful intrusions into SATCOM networks could create risk in SATCOM network providers’ customer environments.

Given the current geopolitical situation, CISA’s Shields Up initiative requests that all organizations significantly lower their threshold for reporting and sharing indications of malicious cyber activity. To that end, CISA and FBI will update this joint Cybersecurity Advisory (CSA) as new information becomes available so that SATCOM providers and their customers can take additional mitigation steps pertinent to their environments.

CISA and FBI strongly encourage critical infrastructure organizations and other organizations that are either SATCOM network providers or customers to review and implement the mitigations outlined in this CSA to strengthen SATCOM network cybersecurity.

CISA and FBI strongly encourage critical infrastructure organizations and other organizations that are either SATCOM network providers or customers to review and implement the following mitigations:

Mitigations for SATCOM Network Providers

  • Put in place additional monitoring at ingress and egress points to SATCOM equipment to look for anomalous traffic, such as:
    • The presence of insecure remote access tools—such as Teletype Network Protocol (Telnet), File Transfer Protocol (FTP), Secure Shell Protocol (SSH), Secure Copy Protocol (SCP), and Virtual Network Computing (VNC)—facilitating communications to and from SATCOM terminals.
    • Network traffic from SATCOM networks to other unexpected network segments.
    • Unauthorized use of local or backup accounts within SATCOM networks.
    • Unexpected SATCOM terminal to SATCOM terminal traffic.
    • Network traffic from the internet to closed group SATCOM networks.
    • Brute force login attempts over SATCOM network segments.
  • See the Office of the Director of National Intelligence (ODNI) Annual Threat Assessment of the U.S. Intelligence Community, February 2022 for specific state-sponsored cyber threat activity relating to SATCOM networks.
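The first monitoring item above, spotting remote access tools talking to or from SATCOM terminals, can be approximated by checking flow records against the default ports of the listed protocols. The Python sketch below assumes a simplified flow record with source, destination and destination port; the record layout, the function name and the sample addresses (taken from documentation-only ranges) are all hypothetical. Matching on default ports is a heuristic, since services can run on non-standard ports.

```python
from dataclasses import dataclass

# Default ports for the protocols named in the advisory (SCP runs over SSH).
WATCHED_PORTS = {21: "FTP", 22: "SSH/SCP", 23: "Telnet", 5900: "VNC"}


@dataclass(frozen=True)
class Flow:
    src: str
    dst: str
    dst_port: int


def flag_remote_access(flows):
    """Return (flow, protocol) pairs whose destination port is on the watch list."""
    return [
        (flow, WATCHED_PORTS[flow.dst_port])
        for flow in flows
        if flow.dst_port in WATCHED_PORTS
    ]


sample_flows = [
    Flow("198.51.100.7", "192.0.2.10", 23),    # Telnet towards a terminal
    Flow("198.51.100.7", "192.0.2.10", 443),   # HTTPS, not on the watch list
    Flow("192.0.2.11", "192.0.2.10", 5900),    # VNC between two terminals
]
flagged = flag_remote_access(sample_flows)
for flow, protocol in flagged:
    print(f"{protocol}: {flow.src} -> {flow.dst}:{flow.dst_port}")
```

In practice the flow records would come from your existing network monitoring tooling, and flagged flows would be reviewed rather than blocked automatically.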

Mitigations for SATCOM Network Providers and Customers

  • Use secure methods for authentication, including multifactor authentication where possible, for all accounts used to access, manage, and/or administer SATCOM networks. 
    • Use and enforce strong, complex passwords: review password policies to ensure they align with the latest NIST guidelines.
    • Do not use default credentials or weak passwords.
    • Audit accounts and credentials: remove terminated or unnecessary accounts; change expired credentials.
  • Enforce principle of least privilege through authorization policies. Minimize unnecessary privileges for identities. Consider privileges assigned to individual personnel accounts, as well as those assigned to non-personnel accounts (e.g., those assigned to software or systems). Account privileges should be clearly defined, narrowly scoped, and regularly audited against usage patterns.
  • Review trust relationships. Review existing trust relationships with IT service providers. Threat actors are known to exploit trust relationships between providers and their customers to gain access to customer networks and data.  
    • Remove unnecessary trust relationships. 
    • Review contractual relationships with all service providers. Ensure contracts include appropriate provisions addressing security, such as those listed below, and that these provisions are appropriately leveraged: 
      • Security controls the customer deems appropriate. 
      • Provider should have in place appropriate monitoring and logging of provider-managed customer systems.
      • Customer should have in place appropriate monitoring of the service provider’s presence, activities, and connections to the customer network.
      • Notification of confirmed or suspected security events and incidents occurring on the provider’s infrastructure and administrative networks.
  • Implement independent encryption across all communications links leased from, or provided by, your SATCOM provider. See National Security Agency (NSA) Cybersecurity Advisory: Protecting VSAT Communications for guidance.
  • Strengthen the security of operating systems, software, and firmware.
    • Ensure robust vulnerability management and patching practices are in place and, after testing, immediately patch known exploited vulnerabilities included in CISA’s living catalog of known exploited vulnerabilities. These vulnerabilities carry significant risk to federal agencies as well as public and private sector entities.
    • Implement rigorous configuration management programs. Ensure the programs can track and mitigate emerging threats. Regularly audit system configurations for misconfigurations and security weaknesses.
  • Monitor network logs for suspicious activity and unauthorized or unusual login attempts.
    • Integrate SATCOM traffic into existing network security monitoring tools.
    • Review logs of systems behind SATCOM terminals for suspicious activity.
    • Ingest system and network generated logs into your enterprise security information and event management (SIEM) tool. 
    • Implement endpoint detection and response (EDR) tools where possible on devices behind SATCOM terminals, and ingest into the SIEM.
    • Expand and enhance monitoring of network segments and assets that use SATCOM.
    • Expand monitoring to include ingress and egress traffic transiting SATCOM links and monitor for suspicious or anomalous network activity. 
    • Baseline SATCOM network traffic to determine what is normal and investigate deviations, such as large spikes in traffic.
  • Create, maintain, and exercise a cyber incident response plan, resilience plan, and continuity of operations plan so that critical functions and operations can be kept running if technology systems—including SATCOM networks—are disrupted or need to be taken offline.
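The recommendation above to baseline SATCOM network traffic and investigate deviations such as large spikes can be sketched with a simple statistical threshold. This is one possible approach, not one prescribed by the advisory: the function name, the choice of mean plus three sample standard deviations, and the sample byte counts are all assumptions for illustration.

```python
import statistics


def find_spikes(baseline, observed, k=3.0):
    """Return (index, value) pairs in observed that exceed mean + k * stdev of baseline."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)  # sample standard deviation; needs >= 2 points
    threshold = mean + k * stdev
    return [(i, value) for i, value in enumerate(observed) if value > threshold]


# Hypothetical traffic volumes (for example, megabytes per interval).
baseline_mb = [100, 110, 90, 105, 95]
observed_mb = [102, 98, 400, 120]

spikes = find_spikes(baseline_mb, observed_mb)
print(spikes)
```

A fixed multiple of the standard deviation is easy to reason about but sensitive to an unrepresentative baseline, so the baseline window should cover normal daily and weekly patterns before it is trusted.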

ISC Releases Security Advisories for BIND

The Internet Systems Consortium (ISC) has released security advisories that address vulnerabilities affecting multiple versions of ISC Berkeley Internet Name Domain (BIND). A remote attacker could exploit these vulnerabilities to cause a denial-of-service condition.

CISA encourages users and administrators to review the following ISC advisories and apply the necessary updates or workarounds.

WordPress Releases Security Update

WordPress versions prior to 5.9.2 are affected by multiple vulnerabilities. Exploitation of some of these vulnerabilities could allow a remote attacker to take control of an affected website.

CISA encourages users and administrators to review the WordPress Security Release and upgrade to WordPress 5.9.2.
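To check an inventory of sites against the fixed release, a plain numeric version comparison is enough. The Python sketch below is a minimal illustration with hypothetical function names; it handles only dotted numeric versions such as 5.9.1 and would need extending for suffixes such as beta or release-candidate tags.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted numeric version such as '5.9.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def needs_upgrade(installed: str, fixed: str = "5.9.2") -> bool:
    """True if the installed version is lower than the fixed version."""
    a, b = parse_version(installed), parse_version(fixed)
    # Pad the shorter tuple with zeros so that '5.9' compares as '5.9.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


for version in ("5.9.1", "5.9.2", "5.9", "5.10"):
    print(version, needs_upgrade(version))
```

Comparing integer tuples rather than strings avoids the classic mistake of treating 5.10 as lower than 5.9.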