Considerations of Data Partitioning on Spark during Data Loading on Clustered Columnstore Index

Considerations of Data Partitioning on Spark during Data Loading on Clustered Columnstore Index

This article is contributed. See the original author and article here.

Authors: Sumit Sarabhai and Ravinder Singh


Reviewers: Arvind Shyamsundar, Denzil Ribeiro, Davide Mauri, Mohammad Kabiruddin, Mukesh Kumar, and Narendra Angane


 


INTRODUCTION


Bulk load methods on SQL Server are by default serial, for example, one BULK INSERT statement would spawn only one thread to insert the data into a table. However, for concurrent loads you may insert into the same table using multiple BULK INSERT statements, provided there are multiple files to be read.


Consider a scenario where the requirements are:



  • Load data from a single file of a large size (say, more than 20 GB)

  • Splitting a file isn’t an option as it will be an extra step in the overall bulk load operation.

  • Every incoming data file is of different size, which makes it difficult to identify the number of chunks (to split the file into) and dynamically define BULK INSERT statements to execute for each chunk.

  • The size of file(s) to be loaded spans through several GBs (say more than 20 GB and above), each containing millions of records.


In such scenarios utilizing Apache Spark engine is one of the popular methods of loading bulk data to SQL tables concurrently.


In this article, we have used Azure Databricks spark engine to insert data into SQL Server in parallel stream (multiple threads loading data into a table) using a single input file. The destination could be a Heap, Clustered Index* or Clustered Columnstore Index. This article is to showcase how to take advantage of a highly distributed framework provided by spark engine by carefully partitioning the data before loading into a Clustered Columnstore Index of a relational database like SQL Server or Azure SQL Database.


The most interesting observation shared in this article is to exhibit how the Clustered Columnstore Index row group quality is degraded when default spark configurations are used, and how the quality can be improved by efficient use of spark partitioning. Essentially, improving row group quality is an important factor for determining query performance.


*Note: There could be some serious implications of parallel inserting data into Clustered Index as mentioned in Guidelines for Optimizing Bulk Import and The Data Loading Performance Guide These guidelines and explanations are still valid for the latest versions of SQL Server.


 


Environment Setup


Data Set:



  • Custom curated data set – for one table only. One CSV file of 27 GB, 110 M records with 36 columns. The input data set have one file with columns of type int, nvarchar, datetime etc.


Database:



  • Azure SQL Database – Business Critical, Gen5 80vCores


ELT Platform:



  • Azure Databricks – 6.6 (includes Apache Spark 2.4.5, Scala 2.11)

  • Standard_DS3_v2 14.0 GB Memory, 4 Cores, 0.75 DBU (8 Worker Nodes Max)


Storage:



  • Azure Data Lake Storage Gen2


Pre-requisite: Before going further through this article spend some time to understand Overview of Loading Data into Columnstore Indexes here: Data Loading performance considerations with Clustered Columnstore indexes


In this test, the data was loaded from a CSV file located on Azure Data Lake Storage Gen 2. The CSV file size is 27 GB having 110 M records with 36 columns. This is a custom data set with random data.


A typical high-level architecture of Bulk ingestion or ingestion post-transformation (ELTETL) would look similar to the one given below:


 


Arch


 


Loading Through BULK INSERTS


In the first test, a single BULK INSERT was used to load data into Azure SQL Database table with Clustered Columnstore Index and no surprises here, it took more than 30 minutes to complete, depending on the BATCHSIZE used. Remember, BULK INSERT is a single threaded operation and hence one single stream would read and write it to the table, thus reducing load throughput.


Ravinder-Singh_0-1614847292881.png


 


Ravinder-Singh_1-1614847292904.jpeg


 


 


Loading Through Azure Databricks


To achieve maximum concurrency and high throughput for writing to SQL table and reading a file from ADLS (Azure Data Lake Storage) Gen 2, Azure Databricks was chosen as a choice of platform, although we have other options to choose from, viz. Azure Data Factory or another spark engine-based platform.


The advantage of using Azure Databricks for data loading is that Spark engine reads the input file in parallel through dedicated Spark APIs. These APIs would use a definite number of partitions which are mapped to one of more input data files, and the mapping is done either on a part of the file or entire file. The data is read into a Spark DataFrame or, DataSet or RDD (Resilient Distributed Dataset). In this case data was loaded into a DataFrame which was followed by a transformation (setting the schema of a DataFrame to match the destination table) and then the data is ready to be written to SQL table.


To write data from DataFrame into a SQL table, Microsoft’s Apache Spark SQL Connector must be used. This is a high-performance connector that enables you to use transactional data in big data analytics and persists results for ad-hoc queries or reporting. The connector allows you to use any SQL database, on-premises or in the cloud, as an input data source or output data sink for Spark jobs.


Bonus: A ready to use Git Hub repo can be directly referred for fast data loading with some great samples: Fast Data Loading in Azure SQL DB using Azure Databricks


Note that the destination table has a Clustered Columnstore index to achieve high load throughput, however, you can also load data into a Heap which will also give good load performance. For the relevance of this article, we’ll talk only about loading into a Clustered Columnstore Index. We used different BATCHSIZE values for loading data into a Clustered Columnstore Index – please refer this document to know the impact of BATCHSIZE during bulk loading into Clustered Columnstore Index.


Below are the test runs of data loading on Clustered Columnstore Index with BATCHSIZE 102,400 and 1,048,576:


Ravinder-Singh_2-1614847292905.png


 


Please note that we are using default parallelism and partitions used by Azure Databricks and directly pushing the data to SQL Clustered Columnstore Indexed table. We have not manipulated any default configuration used by Azure Databricks. Irrespective of the BatchSize defined, all our tests were completed at approximately the same time.


The 32 concurrent threads loading the data into SQL DB is due to the size of provisioned Databricks cluster mentioned above. The cluster has maximum of 8 worker nodes with 4 cores each i.e., 8*4 = 32 cores capable of running a maximum of 32 concurrent threads at max.


A Look at Row Groups


For table in which we inserted the data with BATCHSIZE 1,048,576, here are the number of row groups created in SQL:


 


Total number of row groups:


SELECT COUNT(1)
FROM
sys.dm_db_column_store_row_group_physical_stats
WHERE
object_id = OBJECT_ID(‘largetable110M_1048576’)
216

Quality of row groups:


SELECT *  FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID(‘largetable110M_1048576’)

Ravinder-Singh_3-1614847292913.jpeg


 


In this case, we have only one delta store with OPEN state (total_rows = 3810) and 215 row groups which were in COMPRESSED state, and this make sense because if the batch size for the insert is > 102,400 rows, the data no longer ends up in the delta store, rather is inserted directly into a compressed rowgroup in columnar format. In this case, all the row groups in the COMPRESSED state have >102,400 records. Now, the questions regarding row groups are:



  1. Why do we have 216 row groups?

  2. Why do each row group have a different number of rows in them when our BatchSize was set at 1,048,576?


Please note that each row group has data which is approximately equal to 500,000 records in the above result set.


The answer to both these questions is the way Azure Databricks spark engine partitions the data and controls the number of records getting inserted into row groups of Clustered Columnstore Index. Let’s have a look at the number of partitions created by Azure Databricks for the data set in question:













# Get the number of partitions before re-partitioning
print(df_gl.rdd.getNumPartitions())
216








So, we have 216 partitions created for the data set. Remember, these are the default number of partitions. Each partition has approximately 500,000 records.

 





# Number of records in each partition
from pyspark.sql.functions
import spark_partition_id
df_gl.withColumn(“partitionId”, spark_partition_id()).groupBy(“partitionId”).count().show(10000)

Ravinder-Singh_4-1614847292907.jpeg

Comparing the number of records in spark partitions with the number of records in the row groups, you’ll see that they are equal. Even the number of partitions is equal to the number of row groups. So, in a sense, the BATCHSIZE of 1,048,576 is being overdriven by the number of rows in each partition.


sqldbconnection = dbutils.secrets.get(scope = “sqldb-secrets”, key = “sqldbconn”)
sqldbuser = dbutils.secrets.get(scope = “sqldb-secrets”, key = “sqldbuser”)
sqldbpwd = dbutils.secrets.get(scope = “sqldb-secrets”, key = “sqldbpwd”)

servername
= “jdbc:sqlserver://” + sqldbconnection url = servername + “;” + “database_name=” + <Your Database Name> + “;”
table_name = “<Your Table Name>”

# Write data to SQL table with BatchSize 1048576
df_gl.write
.format(“com.microsoft.sqlserver.jdbc.spark”)
.mode(“overwrite”)
.option(“url”, url)
.option(“dbtable”, table_name)
.option(“user”, sqldbuser)
.option(“password”, sqldbpwd)
.option(“schemaCheckEnabled”, False)
.option(“BatchSize”, 1048576)
.option(“truncate”, True)
.save()

 


Row Group Quality


A row group quality is determined by the number of row groups and records per row group. Since a Columnstore index scans a table by scanning column segments of individual row group, maximizing the number of rows in each rowgroup enhances query performance. When row groups have a high number of rows, data compression improves which means there is less data to read from the disk. For the best query performance, the goal is to maximize the number of rows per rowgroup in a Columnstore index. A rowgroup can have a maximum of 1,048,576 rows. However, it is important to note that row groups must have at least 102,400 rows to achieve performance gains due to the Clustered Columnstore index. Also, remember that the maximum size of the row groups (1 million) may not be reached in every case, courtesy this document the rowgroup size is not just a factor of the max limit but is affected by the following factors.



  • The dictionary size limit, which is 16 MB

  • Insert batch size specified.

  • The Partition scheme of the table since a rowgroup doesn’t span partitions.

  • Memory Grants or memory pressure causing row groups to be trimmed.

  • Index REORG being forced or Index Rebuild being run.


Having said that, it is now an important consideration to have row group size as close to 1 million records as possible. In this testing since the size of each row group is close to 500,000 records, we have two options to reach to the size of ~1 million records:



  1. In spark engine (Databricks), change the number of partitions in such a way that each partition is as close to 1,048,576 records as possible,

  2. Keep spark partitioning as is (to default) and once the data is loaded in a table run ALTER INDEX REORG to combine multiple compressed row groups into one.


Option#1 is quite easy to implement in the Python or Scala code which would run on Azure Databricks. The overhead is quite low on the Spark side.


Option#2 is an extra step which is needed to be taken post data loading and ofcourse, this is going to consume extra CPU cycles on SQL and add to the time taken for the overall loading process.


To keep the relevance of this article, let’s stick to Spark partitioning. Let’s now discuss more about spark partitioning and how it can be changed from its default values and its impact in the next section.


 


Spark Partitioning


The most typical source of input for a Spark engine is a set of files which are read using one or more Spark APIs by dividing into an appropriate number of partitions sitting on each worker node. This is the power of Spark partitioning where the user is abstracted from the worry of deciding number of partitions and the configurations controlling the partitions if they don’t desire. The default number of partitions which are calculated based on environment and environment settings typically work well for most of the cases. However, in certain cases a better understanding of how the partitions are automatically calculated and a user can alter the partition count if desired can make a stark difference in performance.


Note: Large Spark clusters can spawn a lot of parallel threads which may lead to memory grants contention on Azure SQL DB. You must watch out for this possibility to avoid early trim due to memory timeouts. Please refer this article for more details to understand how schema of the table and number of rows etc. also may have an impact on memory grants.


spark.sql.files.maxPartitionBytes is an important parameter to govern the partition size and is by default set at 128 MB. It can be tweaked to control the partition size and hence will alter the number of resulting partitions as well.


spark.default.parallelism which is equal to the total number of cores combined for the worker nodes.


Finally, we have coalesce() and repartition() which can be used to increase/decrease partition count of even the partition strategy after the data has been read into the Spark engine from the source.


coalesce() can be used only when you want to reduce the number of partitions because it does not involve shuffle of the data. Consider that this data frame has a partition count of 16 and you would want to increase it to 32, so you decide to run the following command.


df = df.coalesce(32)
print(df.rdd.getNumPartitions())

However, the number of partitions will not increase to 32 and it will remain at 16 because coalesce() does not involve shuffling. This is a performance-optimized implementation because you can get reduced partition without expensive data shuffle.



In case you want to reduce the partition count to 8 for the above example then you would get the desired result.


df = df.coalesce(8)
print(df.rdd.getNumPartitions())

This will combine the data and result in 8 partitions.


repartition() on the other hand would be the function to help you. For the same example, you can get the data into 32 partitions using the following command.


df = df.repartition(32)
print(df.rdd.getNumPartitions())



Finally, there are additional functions which can alter the partition count and few of those are groupBy(), groupByKey(), reduceByKey() and join(). These functions when called on DataFrame results in shuffling of data across machines or commonly across executors which result in finally repartitioning of data into 200 partitions by default. This default 200 number can be controlled using spark.sql.shuffle.partitions configuration.


 


Back to Data Loading


Now, knowing about how partition works in Spark and how it can be changed, it’s time to implement those learnings. In the above experiment, the number of partitions was 216 (by default) and it was because the size of the file was ~27 GB, so dividing 27 GB by 128 MB (which is maxPartitionBytes defined by Spark by default) gives 216 partitions.


 


Impact of Spark Re-partitioning


The change to be done to the PySpark code would be to re-partition the data and make sure each partition now has 1,048,576 rows or close to it. For this, first get the number of records in a DataFrame and then divide it by 1,048,576. The result of this division will be the number of partitions to use to load the data, let’s say the number of partitions is n. However, there could be a possibility that some of the partitions now have >=1,048,576 rows, hence, to make sure in each partition have rows <= 1,048,576 we take number of partitions as n+1. Using n+1 is also important in cases when the result of th division is 0. In such cases, you’ll have one partition.


Since the data is already loaded in a DataFrame and Spark by default has created the partitions, we now have to re-partition the data again with the number of partitions equal to n+1.







# Get the number of partitions before re-partitioning 
print(df_gl.rdd.getNumPartitions())
216

# Get the number of rows of DataFrame and get the number of partitions to be used.
rows
= df_gl.count()
n_partitions = rows//1048576

# Re-Partition the DataFrame

df_gl_repartitioned
= df_gl.repartition(n_partitions+1)

# Get the number of partitions after re-partitioning
print(df_gl_repartitioned.rdd.getNumPartitions()) 105

# Get the partition id and count of partitions
df_gl_repartitioned
.withColumn(“partitionId”, spark_partition_id()).groupBy(“partitionId”).count().show(10000)

Ravinder-Singh_5-1614847292908.jpeg












 

So, after repartitioning the number of partitions has been reduced to 105 (n+1) from 216 and because of which each partition now has appx.1,048,576 records.







At this point, let’s write the data into SQL table again and verify the row group quality. This time the number of rows per row group will be close to the rows in each partition (a bit smaller than 1,048,576). Let’s see below:


Row Groups After Re-partitioning



SELECT COUNT(1)
FROM
sys.dm_db_column_store_row_group_physical_stats
WHERE
object_id = OBJECT_ID(‘largetable110M_1048576’)
105






Quality of Row Groups After Re-partitioning


Ravinder-Singh_6-1614847292915.jpeg


 


Essentially, this time the overall data loading was 2 seconds slower than earlier, but the quality of the row group was much better. The number of row groups was reduced to half, and the row groups are almost filled to their maximum capacity. Please note that there will an additional time consumed due to repartitioning of the data frame and it depends on the size of the data frame and number of partitions.


Disclaimer: Please note that you won’t always get 1 million records per row_group. It will depend on the data type, number of columns etc. along with factors discussed earlier – See trim_reason in sys.dm_db_column_store_row_group_physical_stats (Transact-SQL) – SQL Server | Microsoft Docs


 


Key Take Away



  1. It is always recommended to use BatchSize while bulk loading data into SQL Server (be it CCI or Heap). However, in case Azure Databricks or any other Spark engine is used to load the data, the data partitioning plays a significant role to ascertain the quality of row groups in Clustered Columnstore index.

  2. Data loading using BULK INSERT SQL command will honor the BATCHSIZE mentioned in the command, unless other factors affect the number of rows inserted into a rowgroup.

  3. Partitioning the data in Spark shouldn’t be based on some random number, it’s good to dynamically identify the number of partitions and use n+1 as number of partitions.

  4. Since a Columnstore index scans a table by scanning column segments of individual row groups, maximizing the number of records in each rowgroup enhances query performance. For the best query performance, the goal is to maximize the number of rows per rowgroup in a Columnstore index.

  5. The speed of data loading from Azure Databricks largely depends on the cluster type chosen and its configuration. Also, note that as of now the Azure SQL Spark connector is only supported on Apache Spark 2.4.5. Microsoft has released support for Spark 3.0 which is currently in Preview, we recommend you to test this connector thoroughly in your devtest environments.

  6. Depending on the size of the data frame, number of columns, the data type etc. the time to do repartitioning will vary, so you must consider this time to the overall data loading from end-end perspective.


 

Windows Admin Center Capabilities

Windows Admin Center Capabilities

This article is contributed. See the original author and article here.

Windows Admin Center is a tool I wish I had had when I was managing servers.  When the tool was first announced in 2017 under the name Project “Honolulu”, I was intrigued.  I was working for a managed service provider at the time and was mostly focusing on Microsoft 365 migrations and adoptions, so it wasn’t a tool I really got to use much, but I did have it installed and used it to manage the test server we had in the office!


 


What is Windows Admin Center?


 


Windows Admin Center is a remote management tool for Windows Server. It can manage your server wherever it resides, either in your on-premises datacentre, in Microsoft Azure or another cloud.


The tools you expect to need to do management tasks on your server are there, you don’t have to go opening the Microsoft Terminal Services Client (MSTSC) to remote into the server, you don’t have to open a separate Microsoft Management Console (MMC) windows to carry out that task you’ve been asked to.  You can use Windows Admin Center! It’s a single management plane for your server management tasks.


 


When I was managing servers, I remember the days of having multiple MSTSC sessions open, or MMC windows open or even some browser-based session open to carry out tasks.  It would get confusing having so many windows open, and there was all the chance of accidently doing something on the wrong server.


 


Windows Admin Center is a browser-based management tool that can help you manage those single server instances or cluster instances in troubleshooting, configuration, or management scenarios.  One of the greatest benefits of Windows Admin Center is that there are no additional costs, it’s included with Windows Server license.


 


Built for the future


Windows Admin Center can also help you extend on-premises deployments of Windows Server to the cloud using the Azure Hybrid services.  Windows Admin Center can help you implement technologies such as Azure Backup, Azure File Sync, Azure Monitor, Azure Arc, Azure Update Management and Azure Security Centre into your on-premises servers.


 


From within Windows Admin Center you can start to interact with your Azure subscription, setting up resources groups and installing the necessary agents for Azure Arc to help manage and monitor your on-premises servers.


 


From within Windows Admin Center you can configure it so that your server is protected by Azure Backup. Windows Admin Center walks you through the process needed without leaving your Windows Admin Center session.


 


What I love about this is Windows Admin Center is something you can start using within your organization today, even if you aren’t using any cloud services, even if using the cloud isn’t on your trajectory just yet.  But when you are ready to use the cloud the features and functionality to help you manage your on-premises servers is there built in ready for you to use. So, it’s ready for the future.


 


Extensibility


The other great thing about Windows Admin Center it has great extensibility.  Which really means that it gives Microsoft and 3rd party developers the ability to build out additional capabilities to complement the existing offerings. Microsoft offers an SDK that enables developers to build their own tools for Windows Admin Center.  Several partners have developed some extensions, such as DataON, Fujitsu and Lenovo.  So again, Windows Admin Center is helping to give you a unified experience, with your Windows Server management tasks, your Azure management tasks as well as managing your hardware.


 


The option is there for anyone to have a go at creating a tool extension, solution extension or gateway plugin.  The SDK contains content and examples to help guide you through the process.


Techielass_0-1614166562220.png


 


 


Windows Admin Center in the Azure Portal


In September 2020, the team announced a new way to utilize Windows Admin Center.  From within the Azure Portal! This new capability allows you to manage tasks you need to carry out within your Windows Server Azure IaaS VMs seamlessly and at a more granular level.  The feature has went public preview so is now available for you to try!


 


Get started today


If you have to manage Windows Servers and haven’t tried Windows Admin Center, then go download a copy today and give it a whirl and understand how it can help your environment!


 

Experiencing Data Access issue in Azure Portal for Many Data Types – 03/04 – Investigating

This article is contributed. See the original author and article here.

Initial Update: Thursday, 04 March 2021 07:16 UTC

We are aware of issues within Application Insights and are actively investigating. Some customers may experience intermittent data gaps and incorrect alert activation in US Gov Virginia region.
  • Work Around: None
  • Next Update: Before 03/04 09:30 UTC
We are working hard to resolve this issue and apologize for any inconvenience.
-Vyom

Long-term backup retention on Azure SQL Managed Instance

Long-term backup retention on Azure SQL Managed Instance

This article is contributed. See the original author and article here.

 


Many applications have regulatory, compliance, or other business purposes that require you to retain database backups beyond the 7-35 days provided by Azure SQL Managed Instance automated backups. By using the long-term retention (LTR) feature, you can store the specified full backup of a SQL Managed Instance database in Azure Blob storage with configured redundancy for up to 10 years. You can then restore any backup as a new database. Long-term retention is available in Public Preview for Azure SQL Managed Instance.


 


How long-term retention works?


Long-term backup retention (LTR) leverages the full database backups that are automatically created to enable point-time restore (PITR). If an LTR policy is configured, these backups are to different for long-term storage. The LTR policy for each database can also specify how frequently the LTR backups are created. To enable LTR, you can define a policy using a combination of four parameters: weekly backup retention (W), monthly backup retention (M), yearly backup retention (Y), and week of year (WeekOfYear). To learn more about how to configure long-term retention policy, see here.


 


Configuring backup storage redundancy


By default, SQL Managed Instance stores data in geo-redundant storage blobs that are replicated to a paired region. When an Azure region is unavailable, geo-restore can be used to recover the database in a different Azure region. Geo-restore is only available if backups are geo-redundant.


However, to meet data residency requirements, managed instance backups can now be configured as local or zone redundant. This ensures that the database backups stay within the same region. The configured backup storage redundancy for a given managed instance applies to both PITR backups and long-term retention backups (LTR).


For SQL Managed Instance, the backup storage redundancy must be selected at the time of creation of the managed instance. Updating backup storage redundancy for an existing managed instance is not supported yet.


 


Long-term retention using Azure Portal


In this section we will describe how you can configure long-term retention for your SQL Managed Instance database using Azure Portal. Here are the RBAC roles that you will need while operating with the long-term retention backups.


When you setup long-term retention for a database, you may perform the following operations on your backups:



  1. Set the long-term retention policy for your database.

  2. Update the long-term retention policy.

  3. View or get a list of available long-term retention backups.

  4. Restore from an available long-term retention backup.

  5. Delete a long-term retention backup.

  6. Restore from a long-term retention backup of a deleted database.

  7. Monitor the long-term retention backup costs.


We will look into each of these and understand how this can be done.


 


Create or Update long-term retention policy for a managed instance database.


To setup a new LTR policy for a database:



  1. Navigate to the managed instance on Azure Portal.

  2. Open the Backups tab

  3. Next, click on the Retention Policies tab and select the database(s) for which you want to set the LTR policy.

  4. Click on Configure policies.

  5. Click on ‘Change LTR policies for all selected databases.

  6. Apply the Weekly/Monthly/Yearly setting that you would like to keep.

  7. Click Apply to apply the new policy.


create.png


 


 


List available long-term backups for a database.


To list all the available LTR backups for a given database:



  1. Navigate to the managed instance.

  2. Click on Backups -> Available backups

  3. The databases which have accumulated LTR backups, will show Manage under the ‘Available LTR backups’ column.

  4. Click Manage, this will open up a list of available LTR backups for the given database. This list shows the time when backup was taken and the expiration time for the backup as per the retention policy.


list.png


 


 


Restore or Delete a long-term retention backup.


There are 2 different methods that you can use on Azure Portal to restore using a long-term retention backup.



  1. Go to Available backups blade and for the database where you see Manage under the ‘Available LTR backups’ column, click on the Restore icon. This will open the Create Database/Restore Database experience on a new page.restore1.png

  2. OR, you can click on Manage, this will open up a list of available LTR backups in a new blade. Select the LTR backup, which you would like to use for restore. Then click the Restore button. This will also open the Create Database/Restore Database experience on a new page.


restore2.png


 


3. Clicking on Restore will open the full-screen database restore experience on Azure Portal. Here, you will see the long-term retention source backup. A name for the new database will be pre-selected based on the timestamp, you can change the database name here and set it to desired value.


restore3.png


 


4. To delete a backup, which is no longer required, instead of clicking Restore, click the Delete button. This will delete the selected long-term retention backup(s). Note that, deleting a long-term retention backup is an irreversible operation. Once deleted, these backups cannot be recovered.


 


 


Restore from a long-term retention backup of a deleted database.


To restore LTR backup of a deleted database:



  1. Navigate to your managed instance on Azure Portal.

  2. Click on Backups -> Available backups.

  3. Select the Deleted radio button under Show Databases. This will bring up a list of deleted databases, which have backups available for restore.

  4. Click Manage under the ‘Available LTR backups’ column. This will open a blade, which will show you a list of available LTR backups for that database.

  5. Select the backup that you want to restore from the list. Then click Restore. This will open the full-screen restore experience.


restore_deleted.png


 


 


Analyze LTR backup costs


To analyze backup costs, you can use the Cost Analysis tool available at the subscription level on Azure Portal. Details of how to look up different backup costs are described here. In this section, we will look at how you can look up your LTR backup costs.



  1. Navigate to your Subscription on Azure Portal.

  2. Click on Cost Analysis.

  3. Click on Add filter.

  4. Add a ‘Service name‘ filter, with value ‘sql database’.

  5. Add another filter for ‘Meter subcategory‘, with value ‘ltr backup storage’.

  6. In the charts displayed, if you slice by Resource, you will see the longtermretetnion backups listed from your resource.

  7. In the next chart, if you filter by Meter, it will show you the storage type used by backups and the applicable cost for backups.


analyze_cost.png


 


 


Conclusion


To meet your organizations compliance or regulatory needs, you can now easily configure long-term retention backups for up to 10 years for Azure SQL managed instance database(s). This can be easily done using Azure Portal or Azure PowerShell. You can also pick the desired backup storage redundancy for backups, to meet your data residency requirements.


 


If you find this article useful, please like it on this page and share through social media. To share this article, you can use the Share button below, or this short link: https://aka.ms/LTRonMI.


 


Disclaimer


Please note that products and options presented in this article are subject to change. This article reflects the long-term retention backup options available for Azure SQL Managed Instance in March, 2021.


 


 


 


 

Download baselines from Community hub in Configuration Manager Technical Preview 2103

Download baselines from Community hub in Configuration Manager Technical Preview 2103

This article is contributed. See the original author and article here.

Update 2103 for the Technical Preview Branch of Microsoft Endpoint Configuration Manager has been released. Community hub now supports contributing and downloading configuration baselines including the child configuration items. This integration allows administrators to easily share and reuse configuration baselines and configuration items. Contributing and downloading configuration baselines is also available for current branch versions of Configuration Manager.


baselines in community hubbaselines in community hub


 Note: Baselines with software updates or version-specific references aren’t supported.


 


For more information, see  What’s new in Community hub and Contribute to the community hub


 


This preview release also includes:


 


Tenant attach: Antivirus policy supports exclusions merge – When a tenant attached device is targeted with two or more antivirus policies, the settings for antivirus exclusions will merge before being applied to the client. This change results in the client receiving the exclusions defined in each policy, allowing for more granular control of antivirus exclusions. 


 


Cloud attach during site upgrade – Microsoft Endpoint Manager is an integrated solution for managing all of your devices. Cloud attach brings together Configuration Manager and Intune into a single console called Microsoft Endpoint Manager admin center. If you haven’t already enabled the following cloud attach features, now the site upgrade process lets you cloud attach the site: 



New CMG deployments use TLS 1.2 – Starting in this release, new deployments of the cloud management gateway (CMG) by default enable the option to Enforce TLS 1.2. This change doesn’t affect existing deployments. While not recommended, if you need to use the less secure TLS protocol versions, you can still disable this option.


 


PowerShell release notes preview – These release notes summarize changes to the Configuration Manager PowerShell cmdlets in technical preview version 2103.


 


Update 2103 for Technical Preview Branch is available in the Microsoft Endpoint Configuration Manager Technical Preview console. For new installations, the 2103 baseline version of Microsoft Endpoint Configuration Manager Technical Preview Branch is available on the Microsoft Evaluation Center. Technical Preview Branch releases give you an opportunity to try out new Configuration Manager features in a test environment before they are made generally available.


 


We would love to hear your thoughts about the latest Technical Preview!  Send us Feedback about product issues directly from the console and use our UserVoice page for ideas about new features.


 


Thanks,


The Configuration Manager team


 


Configuration Manager Resources:


Documentation for Configuration Manager Technical Previews


Try the Configuration Manager Technical Preview Branch


Documentation for Configuration Manager


Microsoft Endpoint Manager announcement


Microsoft Endpoint Manager vision statement


Configuration Manager Forums


Configuration Manager Support