
Scenario:

You are performing a ListBlob operation on a container that has the soft delete option enabled with a retention period of ‘X’ days, and you are seeing latency during the listing operation.

 

Actions:

A ListBlob operation on a container comprising multi-million blobs can lead to high storage latency for the operation. This article covers some of the key points to take note of when isolating the cause:

 

  1. Let us look at the series of steps that take place whenever a listing operation is performed:
    • Whenever a ListBlob operation is performed on a container, with or without a prefix, the Azure Storage partition server scans the internal filetables and memorytables on the partition that holds the storage account. The partition may also contain data for other storage accounts.
    • Each iteration scans a maximum of 5000 rows by default (although it may not return any rows) and returns the result along with the next marker. Additionally, the table layer has a timeout of 5 seconds for any scan operation.
    • Once either the timeout or the maximum row count is hit, the table server returns the results gathered so far, along with the next marker. Hence, the blob you ‘expected’ to see may not appear in the first result. This is by design.
    • Additionally, if you overwrite blobs very frequently, a lot of snapshots are generated, which may increase the time taken by the listing operation.
  2. Let us now look at the series of steps that take place when a delete operation is fired with the Soft Delete option disabled:
    • When you delete blobs and the soft delete option isn’t enabled, the blobs are marked for deletion immediately.
    • The garbage collector (GC) then collects all the blobs marked for deletion and cleans them up.
    • Please note that the garbage collector may take some time to catch up on the blobs marked for deletion.
    • Until then, you might see latency while performing the listing operation, as deleted blobs may be interleaved with live blobs.
    • Once the GC process catches up, you should no longer see this latency while performing the listing operation.
  3. Let us take this one step further and look at the series of steps that take place when a delete operation is fired with the Soft Delete option enabled:
    • When you delete blobs and the soft delete option is enabled, the blobs continue to reside in the storage account until their actual expiry time is reached. You can work out the actual expiry time with the help of Deleted Time and RemainingRetentionDays (days until permanent delete).
    • Deleted Time is the time at which the delete operation was performed on the blob, whereas RemainingRetentionDays is the number of days after which the soft-deleted blob will be permanently deleted.
    • You can check these values in the Azure Portal by selecting the properties of the deleted blob, as in the screenshot below:

      [Screenshot: Amrinder_Singh_0-1594305596969.png]

    • Alternatively, you can make use of the ListBlob API call with the parameter include=deleted to fetch the values of Deleted-Time and RemainingRetentionDays.
    • In this case, the garbage collector will not attempt to remove these soft-deleted blobs before that time, as they do not yet qualify for collection.
    • After the soft delete expiry time, the blobs qualify for garbage collection.
    • The GC process may then take a couple of additional days to permanently clean up these objects.
    • For example: Let’s say you deleted blobs between 01:00 PM UTC and 07:00 PM UTC on 7th Jan, and the soft delete option is enabled on the account with a retention period of 3 days. In this scenario, the blobs will continue to reside in the storage account until approximately 10th Jan at 07:00 PM UTC (i.e. until the respective blobs expire). The GC process may then take a couple of additional days before you start to see the effect of the blob deletion. (NOTE: These values are just for representation.)
    • The latency in this case will persist for longer than for a delete operation with the soft delete option disabled.
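
The arithmetic in the example above can be sketched in Python. This is only an illustration: the function name soft_delete_expiry and the year 2020 are assumptions made for the example, and the sketch assumes a blob expires exactly at its deletion time plus the retention period. The extra days the GC process may need afterwards are not modelled.

```python
from datetime import datetime, timedelta, timezone


def soft_delete_expiry(deleted_time: datetime, retention_days: int) -> datetime:
    """Earliest time a soft-deleted blob qualifies for garbage collection.

    Assumption: expiry = deletion time + retention period.
    """
    return deleted_time + timedelta(days=retention_days)


# Blobs deleted between 01:00 PM and 07:00 PM UTC on 7th Jan (year is assumed).
first_delete = datetime(2020, 1, 7, 13, 0, tzinfo=timezone.utc)
last_delete = datetime(2020, 1, 7, 19, 0, tzinfo=timezone.utc)

first_expiry = soft_delete_expiry(first_delete, retention_days=3)
last_expiry = soft_delete_expiry(last_delete, retention_days=3)

print(first_expiry.isoformat())
print(last_expiry.isoformat())
```

Under these assumptions, the last blob expires on 10th Jan at 07:00 PM UTC, which matches the example, and the listing latency could persist for some days beyond that while the GC process catches up.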

 

So far, we know that a ListBlob operation on a multi-million blob container may cause high storage latency and that this is by design. The question that follows is whether anything can be done to overcome the latency caused by deletion. There are 3 sub-scenarios here:

 

  1. You have completed the deletion of blobs, but you don’t want to wait for the entire expiry time to be met. You can follow the steps below:
    • Undelete (recover) the blobs deleted on 7th Jan (the date is taken from the example above; modify it as per your requirement).
    • Temporarily disable the Blob SoftDelete setting.
    • Delete the recovered blobs again. This time, the blobs will be marked for permanent deletion because SoftDelete is disabled.
    • Enable the SoftDelete setting again.
    • If you have gone past the retention period, this option will not be available, and you will have to wait for the GC process to catch up.

 

  2. You are yet to start the deletion of blobs. You can follow the steps below:
    • Temporarily disable the Blob SoftDelete setting.
    • Delete the blobs. When the blobs are deleted, they will be marked for permanent deletion because SoftDelete is disabled.
    • Enable the SoftDelete setting again.

 

Note: Both of the above options may impact other blobs deleted while SoftDelete is temporarily disabled on the Storage Account, as those blobs will not be retained by soft delete.
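
The sequence used in the two options above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: purge_with_soft_delete_off and the three callables it takes (set_soft_delete, delete_blob, undelete_blob) are hypothetical names introduced for illustration, and the demo uses in-memory stand-ins rather than a real storage account. With the azure-storage-blob package, they would likely wrap BlobServiceClient.set_service_properties (with a delete_retention_policy), BlobClient.delete_blob and BlobClient.undelete_blob. The try/finally is there to keep the window described in the note as short as possible, even if a delete fails part way through.

```python
from typing import Callable, Iterable, Optional


def purge_with_soft_delete_off(
    blob_names: Iterable[str],
    set_soft_delete: Callable[[bool], None],
    delete_blob: Callable[[str], None],
    undelete_blob: Optional[Callable[[str], None]] = None,
) -> int:
    """Delete blobs while soft delete is temporarily disabled.

    All callables are hypothetical hooks supplied by the caller.
    Pass undelete_blob for blobs that are already soft deleted (option 1);
    leave it as None for blobs that are still live (option 2).
    """
    names = list(blob_names)
    if undelete_blob is not None:
        for name in names:
            undelete_blob(name)  # recover before deleting again
    set_soft_delete(False)
    deleted = 0
    try:
        for name in names:
            delete_blob(name)  # soft delete is off for this delete
            deleted += 1
    finally:
        # Always re-enable, so other deletes lose protection only briefly.
        set_soft_delete(True)
    return deleted


# In-memory stand-ins that only record the order of the calls.
log = []
purged = purge_with_soft_delete_off(
    ["a.txt", "b.txt"],
    set_soft_delete=lambda enabled: log.append(("soft_delete", enabled)),
    delete_blob=lambda name: log.append(("delete", name)),
    undelete_blob=lambda name: log.append(("undelete", name)),
)
print(purged, log)
```

In a real account, set_soft_delete(True) would also need to restore the original retention period, and a changed setting may need a short while to take effect before the deletes are issued.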

 

  3. If you don’t want to go with either of the above 2 options, but still want to clean up soft-deleted blobs without waiting for the retention period, there is one more option you can try:
    • Copy all the blobs in the container to a different container.
    • Delete the original container.

 

Below are other generic points related to Soft Delete option:

  • This is a great feature for protecting your blob data from being accidentally or erroneously modified or deleted.
  • The option is not enabled by default, whether it is a new or an existing storage account.
  • You can enable or disable the feature at any time.
  • If at any point you want a count of soft-deleted blobs, you can make use of the PowerShell script provided at the link below:

https://gist.github.com/ajith-k/80a2dc84986d6bb83be76be6bc672043             
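
As a rough Python illustration of the same idea (not a port of the script above), the sketch below counts soft-deleted blobs by following the next marker from page to page. The XML pages are hand-written, abbreviated samples shaped like a ListBlob response requested with include=deleted, and fetch_page is a hypothetical stand-in for the real authenticated HTTP call. The middle page is deliberately empty but still carries a marker, to mirror the scan limit and timeout behaviour described earlier.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Hand-written, abbreviated sample pages, keyed by the marker that requests them.
SAMPLE_PAGES = {
    "": (
        "<EnumerationResults><Blobs>"
        "<Blob><Name>a.txt</Name><Properties/></Blob>"
        "<Blob><Name>b.txt</Name><Deleted>true</Deleted><Properties>"
        "<DeletedTime>Tue, 07 Jan 2020 13:00:00 GMT</DeletedTime>"
        "<RemainingRetentionDays>2</RemainingRetentionDays>"
        "</Properties></Blob>"
        "</Blobs><NextMarker>m1</NextMarker></EnumerationResults>"
    ),
    # A page can be empty and still carry a marker (scan limit or timeout hit).
    "m1": "<EnumerationResults><Blobs/><NextMarker>m2</NextMarker></EnumerationResults>",
    "m2": (
        "<EnumerationResults><Blobs>"
        "<Blob><Name>c.txt</Name><Deleted>true</Deleted><Properties>"
        "<DeletedTime>Tue, 07 Jan 2020 19:00:00 GMT</DeletedTime>"
        "<RemainingRetentionDays>2</RemainingRetentionDays>"
        "</Properties></Blob>"
        "</Blobs><NextMarker/></EnumerationResults>"
    ),
}


def count_soft_deleted(fetch_page: Callable[[str], str]) -> int:
    """Follow next markers until the listing ends; count soft-deleted blobs."""
    count = 0
    marker = ""
    while True:
        root = ET.fromstring(fetch_page(marker))
        for blob in root.iter("Blob"):
            if blob.findtext("Deleted") == "true":
                count += 1
        marker = root.findtext("NextMarker") or ""
        if not marker:  # an empty marker means there are no more pages
            return count


# Hypothetical stand-in for the HTTP call; it only reads the samples above.
total = count_soft_deleted(lambda marker: SAMPLE_PAGES[marker])
print(total)
```

The point of the design is that the loop stops on an empty marker rather than on an empty page: a client that stops at the first page with no blobs would miss everything after it.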

 

Hope this helps!
