Dataverse Low-Code Plugins | Dataverse Accelerator | [Preview]

By now, you have probably come across Dataverse Low-Code Plugins quite often if you follow the Dynamics 365 Wave Updates. This post demystifies and summarizes what Low-Code Plugins are and how you can start implementing them. Note: this is in Preview at the time of …

Brought to you by Dr. Ware, Microsoft Office 365 Silver Partner, Charleston SC.

Enable App Auto-Updates in Power Platform Admin Center | [Preview]

Sometimes apps need to be up to date for certain features to run effectively. Power Platform Admin Center now lets you select Third-Party Publishers for an Environment so that their apps update automatically during your defined Maintenance Window slots. Here’s how you turn on Auto-Updates for certain Publishers in your environment …

Running HPC and AI workloads in containers in Azure

This article is contributed. See the original author and article here.

Introduction


Container technologies are no longer new in the industry. They started out as a way to deploy reproducible development environments, but containers, or some of the underlying technologies used to implement them, are now common in many other fields.


I will not cover Azure Container Instances (ACI) or Azure Kubernetes Service (AKS) here. For an example of the latter, see the article NDv4 in AKS. ACI will be covered in another article.


There are currently many options for working with containers. Seasoned Linux engineers have quite likely worked with LXC. Later, Docker revolutionized the deployment of development environments, and more recently alternatives like Podman have emerged and are now competing for a place in many fields.


In HPC, however, we have been working for some years with two different tools: Shifter, the first container project fully focused on supercomputers, and Singularity. I will show you how to use Singularity in HPC clusters running in Azure. I will also explain how to use Podman to run AI workloads using GPUs in Azure VMs.


Running AI workloads using GPU and containers


Running AI workloads does not strictly require GPUs, but almost all machine learning and deep learning frameworks are designed to make use of them. So I will assume GPU compute resources are required to run any AI workload.


There are many ways to take advantage of GPU compute resources within containers. For example, you can run the whole container in privileged mode to get access to all the hardware available in the host VM. One nuance must be highlighted here: privileged mode cannot grant more permissions than those of the user running the container. This means that running a container as root in privileged mode is very different from running it as a regular user with fewer privileges.


The most common way to get access to the GPU resources is via nvidia-container-toolkit. This package contains a hook in line with the OCI standard (see the references below) that provides direct access to GPU compute resources within the container.


I will use a regular VM with an Nvidia Tesla T4 GPU (NC8as_T4_v3) running RHEL 8.8. Let’s get started.


These are all the steps required to run AI workloads using containers and GPU resources in a VM running in Azure:



  1. Create a VM from any N-series family (NC or ND are recommended for AI workloads such as machine learning and deep learning) with a supported operating system.

  2. Install the CUDA drivers and the CUDA toolkit, if required. You can skip this step if you are using DSVM (Data Science Virtual Machine) images from the Marketplace, because these images come with all the required drivers preinstalled.

  3. Install your preferred container runtime and engine.

  4. Install nvidia-container-toolkit.

  5. Run a container from any image that includes the tools needed to check GPU usage, such as the nvidia-smi command. Using a container from NGC is strongly recommended to avoid additional steps.

  6. Create an image with your code, or commit the changes made in a running container.


I will start with step 2 because I’m sure there is no need to explain how to create a new VM with N-series.


 


Installing CUDA drivers


There is no specific restriction on which CUDA release must be installed. You are free to choose the latest version from the Nvidia website, for example.


 

$ sudo wget https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo -O /etc/yum.repos.d/cuda-rhel8.repo
$ sudo dnf clean all
$ sudo dnf -y install nvidia-driver

 


Let’s check that the drivers are installed correctly using the nvidia-smi command:


 

[root@hclv-jsaelices-nct4-rhel88 ~]# nvidia-smi
Fri Nov  3 17:41:03 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000001:00:00.0 Off |                  Off |
| N/A   51C    P0              30W /  70W |      2MiB / 16384MiB |      7%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
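For scripted checks, the same details can be read in a machine-friendly form. The sketch below assumes you run nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader and pass its output to parse_gpu_csv, a hypothetical helper name. The sample line is reconstructed from the table above, not captured from the VM.

```python
import csv
import io


def parse_gpu_csv(text):
    """Parse the CSV output of
    `nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader`
    into a list of dicts, one per GPU."""
    gpus = []
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or unexpected lines
        name, memory, driver = (field.strip() for field in row)
        gpus.append({"name": name, "memory_total": memory, "driver": driver})
    return gpus


# Sample line reconstructed from the table above (an assumption, not captured output).
sample = "Tesla T4, 16384 MiB, 535.104.12\n"
gpus = parse_gpu_csv(sample)
print(gpus)
```

An empty result from this helper would be a quick signal that the driver is not loaded or that no GPU is visible.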

 


 


Installing container runtime environment and engine


As I mentioned in the introduction, Podman will be our main tool for running containers. By default, Podman uses runc as the runtime on this system. runc adheres to the OCI standard, so no additional steps are needed for nvidia-container-toolkit to work in our VM.


 

$ sudo dnf install -y podman

 


I won’t explain here all the benefits of using Podman over Docker. I’ll just mention that Podman is daemonless and is a more modern implementation of the technologies required to work with containers, such as control groups, layered filesystems and namespaces, to name a few.


Let’s verify that Podman was installed successfully using the podman info command:


 

[root@hclv-jsaelices-nct4-rhel88 ~]# podman info | grep -i ociruntime -A 19
  ociRuntime:
    name: runc
    package: runc-1.1.4-1.module+el8.8.0+18060+3f21f2cc.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.1.4
      spec: 1.0.2-dev
      go: go1.19.4
      libseccomp: 2.5.2
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_SYS_CHROOT,CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
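If you want to check these fields from a script rather than with grep, podman info can also emit JSON (podman info --format json). The following is a minimal sketch, assuming that output is passed in as a string; runtime_summary is a hypothetical helper, and the sample JSON is trimmed to the two fields shown above.

```python
import json


def runtime_summary(info_json):
    """Extract the OCI runtime name and the rootless flag from
    `podman info --format json` output."""
    host = json.loads(info_json)["host"]
    return {
        "runtime": host["ociRuntime"]["name"],
        "rootless": host["security"]["rootless"],
    }


# Trimmed sample mirroring the fields shown above (illustrative, not the full output).
sample = '{"host": {"ociRuntime": {"name": "runc"}, "security": {"rootless": false}}}'
summary = runtime_summary(sample)
print(summary)
```

In practice the string could come from subprocess.run(["podman", "info", "--format", "json"], capture_output=True, text=True, check=True).stdout.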

 


 


Installing nvidia-container-toolkit


Podman fully supports OCI hooks, and that is precisely what nvidia-container-toolkit provides. OCI hooks are custom actions performed during the lifecycle of a container. The toolkit’s hook is a prestart hook: it is called when you run a container and gives the container access to the GPU through the drivers installed in the host VM. The repository added earlier also provides this package, so let’s install it using dnf:


 

$ sudo dnf install -y nvidia-container-toolkit

 


Podman is daemonless, so there is no need to add the runtime using nvidia-ctk runtime configure. In this case, however, an additional step is required to generate the CDI (Container Device Interface) configuration file:

$ sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
$ nvidia-ctk cdi list
INFO[0000] Found 2 CDI devices
nvidia.com/gpu=0
nvidia.com/gpu=all
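The device names printed by nvidia-ctk cdi list are exactly what is later passed to podman run through --device. As a small sketch of that mapping, the hypothetical helpers below extract the names from the listing above and turn them into flags. The filtering rule is an assumption about the output format, based only on the lines shown here.

```python
def cdi_devices(listing):
    """Return the fully qualified CDI device names found in
    `nvidia-ctk cdi list` output, ignoring log lines such as INFO[...]."""
    devices = []
    for line in listing.splitlines():
        line = line.strip()
        # Device names have the form vendor/class=name, e.g. nvidia.com/gpu=0.
        if "/" in line and "=" in line and not line.startswith("INFO"):
            devices.append(line)
    return devices


def device_flags(devices):
    """Build one podman --device flag per CDI device name."""
    return [f"--device={name}" for name in devices]


listing = """INFO[0000] Found 2 CDI devices
nvidia.com/gpu=0
nvidia.com/gpu=all
"""
devices = cdi_devices(listing)
print(device_flags(devices))
```

Selecting nvidia.com/gpu=0 exposes a single GPU, while nvidia.com/gpu=all exposes every GPU in the VM.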

 


 


Running containers for AI workloads


Now the whole environment is ready for running containers for AI workloads. I will use NGC images from Nvidia to save time and avoid creating custom ones. Please keep in mind that some of them are quite big, so make sure you have enough space in your home folder.


Let’s start with an Ubuntu 20.04 image with CUDA already installed on it:


 

[jsaelices@hclv-jsaelices-nct4-rhel88 ~]$ podman run --security-opt=label=disable --device=nvidia.com/gpu=all nvcr.io/nvidia/cuda:12.2.0-devel-ubuntu20.04

==========
== CUDA ==
==========

CUDA Version 12.2.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

 


Here is another example, running the well-known deviceQuery tool that comes with the CUDA toolkit:


 

[jsaelices@hclv-jsaelices-nct4-rhel88 ~]$ podman run --security-opt=label=disable --device=nvidia.com/gpu=all nvcr.io/nvidia/k8s/cuda-sample:devicequery-cuda11.7.1-ubuntu20.04
/cuda-samples/sample Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla T4"
  CUDA Driver Version / Runtime Version          12.2 / 11.7
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 15948 MBytes (16723214336 bytes)
  (040) Multiprocessors, (064) CUDA Cores/MP:    2560 CUDA Cores
  GPU Max Clock rate:                            1590 MHz (1.59 GHz)
  Memory Clock rate:                             5001 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 4194304 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        65536 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   1 / 0 / 0
  Compute Mode:
     

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.2, CUDA Runtime Version = 11.7, NumDevs = 1
Result = PASS

 


You can see in these examples that I’m running the containers as my own user, without root privileges (a rootless environment), with no issues. That works because of the option passed to the podman run command, --security-opt=label=disable, which disables SELinux labeling for the container. I did it this way to keep this article short. I could have used a SELinux policy created with Udica or the one that comes with Nvidia (nvidia-container.pp), but I preferred to disable the labeling for these specific samples.


Now it is time to try running specific AI frameworks using Python. Let’s try PyTorch:


 

[jsaelices@hclv-jsaelices-nct4-rhel88 ~]$ podman run --rm -ti --security-opt=label=disable --device=nvidia.com/gpu=all pytorch/pytorch
root@7cb030cc3b47:/workspace# python
Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>>

 


As you can see, PyTorch detects the GPU and can run code on it.
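To go one step beyond torch.cuda.is_available(), you can run a tiny computation on the selected device. The sketch below is an illustration, not part of the original session: gpu_smoke_test is a hypothetical helper, and the importlib check is only there so the file also runs on machines without PyTorch installed.

```python
import importlib.util


def gpu_smoke_test():
    """Run a tiny matrix multiply on the GPU if PyTorch can see one.

    Returns (device, total), or None when PyTorch is not installed.
    """
    if importlib.util.find_spec("torch") is None:
        return None  # PyTorch missing: nothing to test
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.ones(4, 4, device=device)
    total = float((x @ x).sum())  # each of the 16 entries equals 4.0
    return device, total


result = gpu_smoke_test()
print(result)
```

Inside the container started above, the first element of the result should be "cuda". A value of "cpu" would suggest that the --device option was not passed to podman run.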


I won’t create any custom image as suggested in the last step described previously. That can be a good exercise for the reader, so it is your turn to test your skills running containers and using GPU resources.


 


Running HPC workloads using containers


Now it is time to run HPC applications in our containers. You can also use Podman to run those. In fact, there is an enhanced version of Podman developed jointly by NERSC and Red Hat, called Podman-HPC. For this article, however, I decided to use Singularity, which is well known in the HPC field.


For this section, I will run some containers using Singularity in a cluster created with CycleCloud, using the HB120rs_v3 size for the compute nodes. For the OS, I’ve chosen the AlmaLinux 8.7 HPC image from Azure Marketplace.


I will install Singularity manually but this can be automated using cluster-init in CycleCloud.


Installing Singularity in the cluster


In the AlmaLinux 8.7 HPC image, the EPEL repository is installed by default, so you can install Singularity with a single command:


 

[root@slurmhbv3-hpc-2 ~]# yum install -y singularity-ce
Last metadata expiration check: 1:16:36 ago on Fri 03 Nov 2023 04:38:39 PM UTC.
Dependencies resolved.
=========================================================================================================================================================
 Package                          Architecture             Version                                                     Repository                   Size
=========================================================================================================================================================
Installing:
 singularity-ce                   x86_64                   3.11.5-1.el8                                                epel                         44 M
Installing dependencies:
 conmon                           x86_64                   3:2.1.6-1.module_el8.8.0+3615+3543c705                      appstream                    56 k
 criu                             x86_64                   3.15-4.module_el8.8.0+3615+3543c705                         appstream                   517 k
 crun                             x86_64                   1.8.4-2.module_el8.8.0+3615+3543c705                        appstream                   233 k
 libnet                           x86_64                   1.1.6-15.el8                                                appstream                    67 k
 yajl                             x86_64                   2.1.0-11.el8                                                appstream                    40 k
Installing weak dependencies:
 criu-libs                        x86_64                   3.15-4.module_el8.8.0+3615+3543c705                         appstream                    37 k

Transaction Summary
=========================================================================================================================================================
Install  7 Packages

Total download size: 44 M
Installed size: 135 M
Downloading Packages:
(1/7): criu-libs-3.15-4.module_el8.8.0+3615+3543c705.x86_64.rpm                                                          1.0 MB/s |  37 kB     00:00
(2/7): conmon-2.1.6-1.module_el8.8.0+3615+3543c705.x86_64.rpm                                                            1.1 MB/s |  56 kB     00:00
(3/7): crun-1.8.4-2.module_el8.8.0+3615+3543c705.x86_64.rpm                                                              5.3 MB/s | 233 kB     00:00
(4/7): libnet-1.1.6-15.el8.x86_64.rpm                                                                                    1.5 MB/s |  67 kB     00:00
(5/7): criu-3.15-4.module_el8.8.0+3615+3543c705.x86_64.rpm                                                               4.5 MB/s | 517 kB     00:00
(6/7): yajl-2.1.0-11.el8.x86_64.rpm                                                                                      954 kB/s |  40 kB     00:00
(7/7): singularity-ce-3.11.5-1.el8.x86_64.rpm                                                                             11 MB/s |  44 MB     00:04
---------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                    7.0 MB/s |  44 MB     00:06
Extra Packages for Enterprise Linux 8 - x86_64                                                                           1.6 MB/s | 1.6 kB     00:00
Importing GPG key 0x2F86D6A1:
 Userid     : "Fedora EPEL (8) "
 Fingerprint: 94E2 79EB 8D8F 25B2 1810 ADF1 21EA 45AB 2F86 D6A1
 From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-8
Key imported successfully
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                                                                                 1/1
  Installing       : yajl-2.1.0-11.el8.x86_64                                                                                                        1/7
  Installing       : libnet-1.1.6-15.el8.x86_64                                                                                                      2/7
  Running scriptlet: libnet-1.1.6-15.el8.x86_64                                                                                                      2/7
  Installing       : criu-3.15-4.module_el8.8.0+3615+3543c705.x86_64                                                                                 3/7
  Installing       : criu-libs-3.15-4.module_el8.8.0+3615+3543c705.x86_64                                                                            4/7
  Installing       : crun-1.8.4-2.module_el8.8.0+3615+3543c705.x86_64                                                                                5/7
  Installing       : conmon-3:2.1.6-1.module_el8.8.0+3615+3543c705.x86_64                                                                            6/7
  Installing       : singularity-ce-3.11.5-1.el8.x86_64                                                                                              7/7
  Running scriptlet: singularity-ce-3.11.5-1.el8.x86_64                                                                                              7/7
  Verifying        : conmon-3:2.1.6-1.module_el8.8.0+3615+3543c705.x86_64                                                                            1/7
  Verifying        : criu-3.15-4.module_el8.8.0+3615+3543c705.x86_64                                                                                 2/7
  Verifying        : criu-libs-3.15-4.module_el8.8.0+3615+3543c705.x86_64                                                                            3/7
  Verifying        : crun-1.8.4-2.module_el8.8.0+3615+3543c705.x86_64                                                                                4/7
  Verifying        : libnet-1.1.6-15.el8.x86_64                                                                                                      5/7
  Verifying        : yajl-2.1.0-11.el8.x86_64                                                                                                        6/7
  Verifying        : singularity-ce-3.11.5-1.el8.x86_64                                                                                              7/7

Installed:
  conmon-3:2.1.6-1.module_el8.8.0+3615+3543c705.x86_64                          criu-3.15-4.module_el8.8.0+3615+3543c705.x86_64
  criu-libs-3.15-4.module_el8.8.0+3615+3543c705.x86_64                          crun-1.8.4-2.module_el8.8.0+3615+3543c705.x86_64
  libnet-1.1.6-15.el8.x86_64                                                    singularity-ce-3.11.5-1.el8.x86_64
  yajl-2.1.0-11.el8.x86_64

Complete!

 


I won’t explain all the pros and cons of using Singularity over other container alternatives. I will just highlight some of the security features provided by Singularity and, especially, the image format used in the examples (Singularity Image Format, SIF).


One of the biggest advantages of Singularity is the size of its images. SIF is a binary format and is very compact compared to regular layered Docker images. Below is an example using the OpenFOAM image:


 

[jsaelices@slurmhbv3-hpc-2 .singularity]$ singularity pull docker://opencfd/openfoam-default
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob 855f75e343f2 done   |
Copying blob b9158799e696 done   |
Copying blob 561d59533bc7 done   |
Copying blob 96b48e52a343 done   |
Copying blob a8cede8f862e done   |
Copying blob 3153aa388d02 done   |
Copying blob 3efcde42d95a done   |
Copying config dc7161e162 done   |
Writing manifest to image destination
2023/11/03 17:59:51  info unpack layer: sha256:3153aa388d026c26a2235e1ed0163e350e451f41a8a313e1804d7e1afb857ab4
2023/11/03 17:59:51  info unpack layer: sha256:855f75e343f27a0838944f956bdf15a036a21121f249957cf121b674a693c0c9
2023/11/03 17:59:51  info unpack layer: sha256:a8cede8f862e92aa526c663d34038c1152fb56f3e7005a1bcefd29219a77fd6f
2023/11/03 17:59:54  info unpack layer: sha256:561d59533bc76812ab48aef920990af0217af17b23aaccc059a5e660a2ca55b0
2023/11/03 17:59:54  info unpack layer: sha256:b9158799e696063a99dc698caef940b9e60ca7ff9c1edd607fc4688d953a1aa6
2023/11/03 17:59:54  info unpack layer: sha256:96b48e52a343650d16be2c5ba9800b30ff677f437379cc70e05c255d1212b52e
2023/11/03 18:00:03  info unpack layer: sha256:3efcde42d95ab617eac299e62eb8800b306a0279e9368daf2141337f22bf8218
INFO:    Creating SIF file...

 


You can see the size is about 350 MB:


 

[jsaelices@slurmhbv3-hpc-2 .singularity]$ ls -lh openfoam-default_latest.sif
-rwxrwxr-x. 1 jsaelices jsaelices 349M Nov  3 18:00 openfoam-default_latest.sif

 


Docker uses a layered format that is substantially bigger:


 

[root@slurmhbv3-hpc-1 ~]# docker images
REPOSITORY                 TAG       IMAGE ID       CREATED        SIZE
opencfd/openfoam-default   latest    dc7161e16205   3 months ago   1.2GB
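To put a number on the difference, note that the two tools report sizes in different units: ls -lh uses binary units (M means 1024 × 1024 bytes), while docker images uses decimal units (GB means 1000³ bytes). The short calculation below converts both figures to bytes before comparing them. Keep in mind that part of the gap likely comes from compression: SIF stores the root filesystem as a compressed SquashFS image, whereas docker images reports the unpacked size.

```python
def to_bytes(value, unit):
    """Convert a size to bytes. `ls -lh` uses binary units (MiB = 1024**2),
    while `docker images` reports decimal units (GB = 1000**3)."""
    factors = {"MiB": 1024 ** 2, "GB": 1000 ** 3}
    return value * factors[unit]


sif_bytes = to_bytes(349, "MiB")    # from the ls -lh output above
docker_bytes = to_bytes(1.2, "GB")  # from the docker images output above
ratio = docker_bytes / sif_bytes
print(f"SIF: {sif_bytes} bytes, Docker: {docker_bytes:.0f} bytes, ratio: {ratio:.2f}x")
```

With these figures, the Docker image is a little over three times the size of the SIF file.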

 


 


Running MPI jobs with Singularity


Singularity is fully compatible with MPI, and there are two ways to run an MPI job with SIF images: the hybrid method and the bind method.


I will use the bind method for its simplicity but you can also use the hybrid method if binding volumes between the host and the container is not desirable.


Let’s create a simple definition file called mpi_sample.def (similar to a Dockerfile or Containerfile):


 

Bootstrap: docker
From: almalinux

%files
/shared/bin/mpi_test /shared/bin/mpi_test

%environment
export MPI_HOME=/opt/intel/oneapi/mpi/2021.9.0
export MPI_BIN=/opt/intel/oneapi/mpi/2021.9.0/bin
export LD_LIBRARY_PATH=/opt/intel/oneapi/mpi/2021.9.0/libfabric/lib:/opt/intel/oneapi/mpi/2021.9.0/lib/release:/opt/intel/oneapi/mpi/2021.9.0/lib:/opt/intel/oneapi/tbb/2021.9.0/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/mpi/2021.9.0//libfabric/lib:/opt/intel/oneapi/mpi/2021.9.0//lib/release:/opt/intel/oneapi/mpi/2021.9.0//lib:/opt/intel/oneapi/mkl/2023.1.0/lib/intel64:/opt/intel/oneapi/compiler/2023.1.0/linux/lib:/opt/intel/oneapi/compiler/2023.1.0/linux/lib/x64:/opt/intel/oneapi/compiler/2023.1.0/linux/compiler/lib/intel64_lin
export MPI_INCLUDE=/opt/intel/oneapi/mpi/2021.9.0/include
export HOST=$(hostname)

%runscript
echo "Running MPI job inside Singularity: $HOST"
echo "MPI job submitted: $*"
exec echo

 


Here, I’m just using the AlmaLinux image from Docker Hub, copying the MPI application, defining some useful environment variables and adding a few simple commands to execute when the container is called without any parameter.


Now, it is time to build the SIF image:


 

[root@slurmhbv3-hpc-1 jsaelices]# singularity build mympitest.sif mpi_sample.def
INFO:    Starting build...
2023/11/03 18:07:49  info unpack layer: sha256:92cbf8f6375271a4008121ff3ad96dbd0c10df3c4bc4a8951ba206dd0ffa17e2
INFO:    Copying /shared/bin/mpi_test to /shared/bin/mpi_test
INFO:    Copying /shared/bin/openmpi-test to /shared/bin/openmpi-test
INFO:    Adding environment to container
INFO:    Adding runscript
INFO:    Creating SIF file...
INFO:    Build complete: mympitest.sif

 


I’m going to simply execute the MPI application, binding the folder where the Intel MPI installation lives:


 

[jsaelices@slurmhbv3-hpc-1 ~]$ singularity exec --hostname inside-singularity --bind /opt/intel:/opt/intel mympitest.sif /shared/bin/mpi_test
Hello world: rank 0 of 1 running on inside-singularity

 


Let’s call the app using mpiexec as we do with any other MPI job:


 

[jsaelices@slurmhbv3-hpc-1 ~]$ mpiexec -n 2 -hosts slurmhbv3-hpc-1 singularity exec --bind /opt/intel:/opt/intel mympitest.sif /shared/bin/mpi_test
Hello world: rank 0 of 2 running on slurmhbv3-hpc-1
Hello world: rank 1 of 2 running on slurmhbv3-hpc-1

 


In the next step, I will use the Slurm scheduler to submit the job. To do that, I’m creating a very simple script:


 

#!/bin/bash
#SBATCH --job-name singularity-mpi
#SBATCH -N 2
#SBATCH -o %N-%J-%x
module load mpi/impi_2021.9.0
mpirun -n 4 -ppn 2 -iface ib0 singularity exec --bind /opt/intel:/opt/intel mympitest.sif /shared/bin/mpi_test
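In the script above, the total rank count (-n 4) is the number of nodes (-N 2) multiplied by the ranks per node (-ppn 2). If you generate such command lines from a script, deriving the total avoids the three values drifting apart. The following is a minimal sketch under that assumption; mpirun_command is a hypothetical helper and the ib0 interface is simply copied from the script.

```python
def mpirun_command(nodes, ranks_per_node, image, app, bind="/opt/intel:/opt/intel"):
    """Build the mpirun argument list used in the batch script, deriving the
    total rank count from the node count and the ranks per node."""
    total_ranks = nodes * ranks_per_node
    return [
        "mpirun", "-n", str(total_ranks), "-ppn", str(ranks_per_node),
        "-iface", "ib0",
        "singularity", "exec", "--bind", bind, image, app,
    ]


cmd = mpirun_command(2, 2, "mympitest.sif", "/shared/bin/mpi_test")
print(" ".join(cmd))
```

Called with two nodes and two ranks per node, this reproduces the mpirun line of the batch script.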

 


Let’s submit the job with sbatch:


 

$ sbatch singularity.sh

 


Let’s check the output file of the submitted job:


 

[jsaelices@slurmhbv3-hpc-1 ~]$ cat slurmhbv3-hpc-1-2-singularity-mpi
Hello world: rank 0 of 4 running on slurmhbv3-hpc-1
Hello world: rank 1 of 4 running on slurmhbv3-hpc-1
Hello world: rank 2 of 4 running on slurmhbv3-hpc-2
Hello world: rank 3 of 4 running on slurmhbv3-hpc-2
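For larger jobs it is easier to check rank placement programmatically than by eye. The sketch below assumes the application prints lines in the "Hello world: rank R of N running on HOST" format shown above; ranks_per_host is a hypothetical helper that counts ranks per node and verifies that every rank reported exactly once.

```python
import re
from collections import Counter

LINE = re.compile(r"Hello world: rank (\d+) of (\d+) running on (\S+)")


def ranks_per_host(output):
    """Count ranks per host and check that every rank 0..N-1 reported exactly once."""
    hosts = Counter()
    seen = set()
    size = None
    for match in LINE.finditer(output):
        rank, size, host = int(match.group(1)), int(match.group(2)), match.group(3)
        seen.add(rank)
        hosts[host] += 1
    # All ranks present, and no rank printed more than once.
    if size is None or seen != set(range(size)) or sum(hosts.values()) != size:
        raise ValueError("missing or unexpected ranks in job output")
    return dict(hosts)


job_output = """Hello world: rank 0 of 4 running on slurmhbv3-hpc-1
Hello world: rank 1 of 4 running on slurmhbv3-hpc-1
Hello world: rank 2 of 4 running on slurmhbv3-hpc-2
Hello world: rank 3 of 4 running on slurmhbv3-hpc-2
"""
placement = ranks_per_host(job_output)
print(placement)
```

Two ranks on each of the two nodes matches the -ppn 2 setting in the batch script.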

 


This example concludes the article.


You’ve seen how to run containers, how to make use of GPU and run AI workloads in a simple and effective way. You’ve also learnt how to run Singularity containers and MPI jobs easily. You can use all this material as a starting point to extend your knowledge and apply it to more complex tasks. Hope you enjoyed it.


References


Podman 


Podman HPC  


Nvidia Container Toolkit 


Singularity containers 

Announcing the new Azure Virtual Desktop Web Client User Interface

We are happy to announce the general availability of the new User Interface (UI) for the Azure Virtual Desktop Web Client. The new UI offers a cleaner, more modern look and feel. With this update, you can:



  • Switch between Light and Dark Mode   

  • View your resources in a grid or list format  

  • Reset web client settings to their defaults 


How to access it 


The new client is toggled on by default in the web client, and the “preview” caption has now been removed from the toggle.


 


For additional information on the new UI, see What’s new in the Remote Desktop Web client for Azure Virtual Desktop | Microsoft Learn and New User Interface. 


  


Note: We recommend using the new client as the original version will be deprecated soon. We will share more information on that shortly! 

RUST support for UEFI development through Project Mu

Just a decade ago, few people seemingly knew or cared about firmware. But with the increasing interconnectedness of devices and the rise of cybersecurity threats, there’s a growing awareness of firmware as the foundational software that powers everything from smartphones to smart TVs.


 


Traditionally developed using the C language, firmware is essential for setting up a device’s basic functions. As a globally recognized standard, UEFI (Unified Extensible Firmware Interface) enables devices to boot with fundamental security features that contribute to the security posture of modern operating systems.


 


Call for greater firmware security


As the security of our device operating systems gets more sophisticated, firmware needs to keep up. Security is paramount, but it shouldn’t compromise speed or user-friendliness. The goal is clear – firmware that’s both fast and secure.


 


What does this modern approach look like? Let’s start by looking at the key challenges:


 



  • Evolving threat landscape: As operating systems become more secure, attackers are shifting their focus to other system software, and firmware is a prime target. Firmware operates at a very foundational level in a device, and a compromise here can grant an attacker deep control over a system.

  • Memory safety in firmware: Many firmware systems have been historically written in languages like C, which, while powerful, do not inherently protect against common programming mistakes related to memory safety. These mistakes can lead to vulnerabilities such as buffer overflows, which attackers can exploit.

  • Balance of speed and security: Firmware needs to execute quickly. However, increasing security might introduce execution latency, which isn’t ideal for firmware operations.


 


Rust in the world of firmware


When it comes to modern PC firmware, Rust stands out as a versatile programming language. It offers flexibility, top-notch performance, and most importantly, safety. While C has been a go-to choice for many, it has its pitfalls, especially when it comes to errors that might lead to memory issues. Considering how crucial firmware is to device safety and operation, any such vulnerabilities can be a goldmine for attackers, allowing them to take over systems.[1] That’s where Rust shines. It’s designed with memory safety in mind, without the need for garbage collection, and has strict rules around data types and parallel operations. This minimizes the probability of errors that expose vulnerabilities, making Rust a strong choice for future UEFI firmware development.


 


Unlocking new possibilities with Rust


Rust is not just another programming language; it’s a gateway to a wealth of resources and features that many firmware developers might have missed out on in the past. For starters, Rust embraces a mix of object-oriented, procedural, and functional programming approaches and offers flexible features like generics and traits, making it easier to work with different data types and coding methods. Many complex data structures that must be hand-coded in C are available “for free” as part of the Rust language. But it’s not just about versatility and efficiency. Rust’s tools are user-friendly, offering clear feedback during code compilation and comprehensive documentation for developers. Plus, with its official package management system, developers get access to tools that streamline coding and highlight important changes. One of those features is Rust’s use of ‘crates’ – these are like ready-to-use code packages that speed up development and foster collaboration among the Rust community.


 


Making the move from C to Rust


Rust stands out for its emphasis on safety, meaning developers often don’t need as many external tools like static analyzers, which are commonly used with C. But Rust isn’t rigid; if needed, it allows for exceptions with its “unsafe code” feature, giving developers some flexibility. One of Rust’s advantages is how well it interacts with C. This means teams can start using Rust incrementally, without having to abandon their existing C code. So, while Rust offers modern advantages, it’s also mindful of the unique requirements of software running directly on hardware — without relying on the OS or other abstraction layers. Plus, it offers compatibility with C’s data structures and development patterns.


 


The Trio: Surface, Project Mu and Rust


Surface with Windows pioneered the implementation of Project Mu in 2018 as an open-source UEFI core to increase scalability, maintainability, and reusability across Microsoft products and partners. The idea was simple but revolutionary, fostering a more collaborative approach to reduce costs and elevate quality. It also offers a solution to the intricate business and legal hurdles many partners face, allowing teams to manage their code in a way that respects legal and business boundaries. A major win from this collaboration is enhanced security; by removing unnecessary legacy code, vulnerabilities are reduced. From its inception, Surface has been an active contributor, helping Project Mu drive innovation and improve the ecosystem.


 


Pioneering Rust adoption through Project Mu and Surface


Surface and Project Mu are working together to drive adoption of Rust into the UEFI ecosystem. Project Mu has implemented the necessary changes to the UEFI build environment to allow seamless integration of Rust modules into UEFI codebases. Surface is leveraging that support to build Rust modules in Surface platform firmware. With Rust in Project Mu, Microsoft’s ecosystem benefits from improved security transparency while reducing the attack surface of Microsoft devices due to Rust’s memory safety benefits. Also, by contributing firmware written in Rust to open-sourced Project Mu, Surface participates in an industry shift to collaboration with lower costs and a higher security bar. With this adoption, Surface is protecting and leading the Microsoft ecosystem more than ever.


 


Building together: Surface’s commitment to the Rust community


Surface and Project Mu plan to participate in the open Rust development community by leveraging and contributing to popular crates and publishing new ones that may be useful to other projects. A general design strategy is to solve common problems in a generic crate that can be shared and integrated into the firmware. Community crates, such as r-efi for UEFI, have already been helpful during early Rust development.


 


Getting Started


Project Mu has made it easier for developers to work with Rust by introducing a dedicated container in the Project Mu Developer Operations repository (DevOps repo). This container is equipped with everything needed to kickstart Rust development. As more Rust code finds its way into Project Mu’s repositories, it will seamlessly integrate with the standard Rust infrastructure in Project Mu, and the dedicated container provides an easy way to immediately take advantage of it.


 


The Project Mu Rust Build readme details how to begin developing with Rust and Project Mu. Getting started requires installing the Rust toolchain and Cargo make as a build runner to quickly build Rust packages. Refer to the readme for guidance on setting up the necessary build and configuration files and creating a Rust module.


 


Demonstrating Functionality


QEMU is an open-source virtual machine emulator. Project Mu implements open-source firmware for the QEMU Q35 platform in its Mu Tiano Platforms repository. This open virtual platform is an easily accessible demonstration vehicle for Project Mu features. In this case, UEFI (DXE) Rust modules are already included in the platform firmware to demonstrate their functionality (and test it in CI).


 


Looking ahead


With the expansion of firmware code written in Rust, Surface looks forward to leveraging the Project Mu community to help make our firmware even more secure. To get involved with Project Mu, review the documentation and check out the GitHub repo. Regularly pull updates from the main repo, keep an eye on the project’s roadmap, and stay engaged with the community to remain informed about changes and new directions.


 


Footnotes


1. See Trends, challenge, and shifts in software vulnerability mitigation


 




What’s new for Security: Training and Certification


Microsoft Learn offers you the latest resources to ensure you have what you need to prepare for exams and reach your skilling goals. Here we share some important updates about Security content, prep videos, certifications, and more.


 


Exam Readiness Zone: preparing for Exams SC-100, SC-200, and SC-300


Now, you can leverage the Exam Readiness Zone, our free exam prep resource available on Microsoft Learn for your next Security certification! View our expert-led exam prep videos to help you identify the key knowledge and skills measured on exams and how to allocate your study time. Each video segment corresponds to a major topic area on the exam.


 


In these videos, trainers point out objectives that many test takers find difficult and walk through example questions and answers with explanations.


 


For technical skilling, we now have videos available for the following topics:


 


Preparing for Exam SC-100: Microsoft Cybersecurity Architect



  • Design solutions that align with security best practices and priorities

  • Design security operations, identity, and compliance capabilities

  • Design security solutions for infrastructure

  • Design security solutions for applications and data


Review the exam prep videos.


 


Preparing for Exam SC-200: Microsoft Security Operations Analyst



  • Mitigate threats using Microsoft 365 Defender

  • Mitigate threats using Microsoft Defender for Cloud

  • Mitigate threats using Microsoft Sentinel


Review the exam prep videos and take a free practice assessment.


 


Preparing for Exam SC-300: Microsoft Identity and Access Administrator



  • Implement identities in Azure AD

  • Implement authentication and access management

  • Implement access management for applications

  • Plan and implement identity governance in Azure AD


Review the exam prep videos and take a free practice assessment.


 


Visit the Exam Readiness Zone to leverage tips, tricks, and strategies for preparing for your next Microsoft Certification exam.


 


 


Newly added Security Cloud Skills Challenge on 30 Days to Learn It


We recently released the new Security Operations Analyst Cloud Skills Challenge on 30 Days to Learn It. Build your skills and prepare for Exam SC-200: Microsoft Security Operations Analyst, required to earn your Microsoft Certified: Security Operations Analyst Associate certification.


 


Are you thinking of adopting the upcoming Security Copilot? This challenge will help you prepare, as it includes the security operations analyst skills required to tune up your platform and get it ready for Security Copilot.


 


Complete the challenge within 30 days and you may be eligible to earn a 50% discount on the Certification exam.


 


Start the 30 Days to Learn It challenge today!


 


New name: Information Protection and Compliance Administrator Associate certification


As we announced a couple of months ago, we updated the certification name to the Microsoft Certified: Information Protection and Compliance Administrator Associate certification and the exam name to Exam SC-400: Administering Information Protection and Compliance in Microsoft 365 as we recognized the need to expand this certification and exam to include compliance features.


Exam SC-400 evaluates your proficiency in performing the following technical tasks: implementing information protection, implementing DLP, implementing data lifecycle and records management, monitoring and investigating data and activities through Microsoft Purview, and managing insider and privacy risks in Microsoft 365.


 


Prepare for the exam with the SC-400 study guide and with our free practice assessment.


 


Security Learning Rooms


The Microsoft Learn Community offers a variety of ways to connect and engage with each other and technical experts. One of the core components of this experience is the learning rooms, a space to find connections with experts and peers.


 


There are four Microsoft Security Learning Rooms to choose from that span end-to-end:



  • Cloud Security Study Group

  • Compliance Learning Room

  • Cybersecurity from Beginner to Expert

  • Microsoft Entra


 


Whether you choose one path or all of them, the Microsoft Learn Community experiences are ready to support your learning journey.


To explore additional security technical guidance, please visit the refreshed Security documentation hub on Microsoft Learn.


 


 


Validate Email address on Email field in Model-Driven Apps | [Preview]

As part of various implementations, often you’ll need to ensure that the field validation is in place. Note: Please note that this feature is still in Preview at the time of writing this post. Enable Data Validation in Power Platform Admin Center Here’s how you can enable Data Validation for Email fields in Power Platform … Continue reading Validate Email address on Email field in Model-Driven Apps | [Preview]


Enhance agent efficiency and flexible work distribution with capacity profiles



Efficiently managing a contact center requires a fine balance between workforce engagement and customer satisfaction. The ability to create agent-specific capacity profiles in Dynamics 365 Customer Service empowers administrators and supervisors to fine-tune the work allocation based on an agent’s experience and expertise, optimizing agent performance and delivering tailored customer service. 

Understand capacity profiles 

Capacity profiles are at the core of Dynamics 365 Customer Service, defining the type and amount of work agents can handle, ensuring equitable work distribution. Profiles are even more beneficial when agents are blended across various channels. Agent-specific capacity profiles take this a step further, enabling customized work limits for individual agents based on their proficiency. Let’s explore this capability with an example. 

A real-world scenario: Casey’s challenge 

Meet Casey, a Customer Service administrator at Contoso Bank who aims to maximize the efficiency of her customer service team. She wants senior agents to handle more responsibilities, giving junior agents the time to focus on training and skill development.

Casey decides to use agent-specific capacity profiles for credit card inquiries in the North America region. She sets up a “Credit Card NAM” profile with a default limit of two concurrent conversations. She assigns it to Kiana, a seasoned agent, and Henry, a junior agent who recently joined Contoso. 

Customize capacity limits 

Casey recognizes that Kiana’s seniority and expertise warrant a different limit. With agent-specific capacity profiles, she can easily update Kiana’s limit to handle three conversations at a time. The immediate benefit of this approach is apparent. This balance allows junior agents like Henry to invest more time in training and development while experienced agents like Kiana manage a higher workload efficiently. 

Flexibility in action 

In the dynamic world of customer service, circumstances can change rapidly. Contoso Bank faces an unexpected surge in insurance-related queries. Casey needs to adapt to this evolving scenario promptly and this is where agent-specific capacity profiles truly shine. 

Casey has Kiana take on the additional insurance queries alongside her credit card queries. She assigns the “Insurance” profile to Kiana. She also resets Kiana’s work limit for the “Credit Card NAM” profile back to the default amount, providing her the bandwidth to handle the increased workload efficiently. 
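Casey's adjustments can be pictured as a profile-wide default limit plus optional per-agent overrides. The sketch below is purely illustrative: `CapacityProfile`, `set_limit`, `reset_limit`, and `limit_for` are hypothetical names chosen for this example and are not part of any Dynamics 365 API.

```python
from dataclasses import dataclass, field


@dataclass
class CapacityProfile:
    """Hypothetical model: a default work limit plus per-agent overrides."""
    name: str
    default_limit: int
    overrides: dict = field(default_factory=dict)

    def set_limit(self, agent, limit):
        # Give one agent a limit that differs from the profile default.
        self.overrides[agent] = limit

    def reset_limit(self, agent):
        # Drop the override so the agent falls back to the default limit.
        self.overrides.pop(agent, None)

    def limit_for(self, agent):
        return self.overrides.get(agent, self.default_limit)


credit_card = CapacityProfile("Credit Card NAM", default_limit=2)
credit_card.set_limit("Kiana", 3)        # senior agent handles three conversations
print(credit_card.limit_for("Kiana"))    # 3
print(credit_card.limit_for("Henry"))    # 2 (profile default)
credit_card.reset_limit("Kiana")         # the insurance surge: back to the default
print(credit_card.limit_for("Kiana"))    # 2
```

Keeping the override separate from the default means a reset is just the removal of one entry, which mirrors how Casey returns Kiana to the default amount.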

The result: Optimal efficiency 

This example showcases the flexibility and real-time adaptability that agent-specific capacity profiles offer. Casey is empowered to make agile and precise work distribution decisions, ensuring that agents’ expertise and experience are utilized optimally. 

Conclusion 

In the world of customer service, where every interaction matters, this feature is a game-changer. It helps organizations reduce agent stress, elevate customer satisfaction, and offer a flexible solution for modern customer service management. By embracing this feature, businesses can ensure that their customer service is optimized for excellence, regardless of changing circumstances. 

Learn more about capacity profiles

Watch a short video introduction.

To learn more, read the documentation: Create and manage capacity profiles

The post Enhance agent efficiency and flexible work distribution with capacity profiles appeared first on Microsoft Dynamics 365 Blog.


Tutorial: A graceful process to develop and deploy Docker Containers to Azure with Visual Studio Code


Creating and deploying Docker containers to Azure resources manually can be a complicated and time-consuming process. This tutorial outlines a graceful process for developing and deploying a Linux Docker container on your Windows PC, making it easy to deploy to Azure resources.


 


This tutorial emphasizes using the user interface to complete most of the steps, making the process more reliable and understandable. While there are a few steps that require the use of command lines, the majority of tasks can be completed using the UI. This focus on the UI is what makes the process graceful and user-friendly.


 


In this tutorial, we will use a Python Flask application as an example, but the steps should be similar for other languages such as Node.js.


 


Prerequisites:


 


Before you begin, you’ll need to have the following prerequisites set up:


 



  • WSL 2 installation


WSL provides a great way to develop your Linux application on a Windows machine, without worrying about compatibility issues when running in a Linux environment. We recommend installing WSL 2 because it has better Docker support. To install WSL 2, open PowerShell or Windows Command Prompt in administrator mode and enter the command below:


 


wsl --install

And then restart your machine.


You’ll also need to install the WSL extension in Visual Studio Code.


 



  • Python 3 installation


Run “wsl” in your command prompt, then run the following commands to install Python 3.10 (if you use Python 3.5 or a lower version, you may need to install venv yourself):


 


sudo apt-get update
sudo apt-get upgrade
sudo apt install python3.10

 



  • Docker for Linux


You’ll need to install Docker in your Linux environment. For Ubuntu, refer to the official documentation below:


https://docs.docker.com/engine/install/ubuntu/



  • Docker for Windows


To create an image for your application in WSL, you’ll need Docker Desktop for Windows. Download the installer from the Docker website below and run it to install Docker Desktop.


https://www.docker.com/products/docker-desktop/ 


 


Steps for Development and Deployment


 


1. Connect Visual Studio Code to WSL


 


To develop your project in Visual Studio Code in WSL, click the blue button in the bottom-left corner of the window.


 


Then select “Connect to WSL” or “Connect to WSL using Distro”.


 


2. Install some extensions for Visual Studio Code


 


The two extensions below have to be installed after you connect Visual Studio Code to WSL.


The Docker extension can create a Dockerfile for you automatically and highlights Dockerfile syntax. Search for it in the Visual Studio Code Extensions view and install it.


 


To deploy your container to Azure from Visual Studio Code, you also need to install Azure Tools.


 


3. Create your project folder


 


Click “Terminal” in the menu, then click “New Terminal”.


 


Then you should see a terminal for your WSL.


 


I use a simple Flask quickstart application as the example here, so I run the command below to clone its Git project:


 


git clone https://github.com/Azure-Samples/msdocs-python-flask-webapp-quickstart

4. Python Environment setup (optional)


 


After you install Python 3 and create the project folder, it is recommended to create a dedicated Python environment for the project. It makes your runtime and modules easier to manage.


To set up the Python environment in your project, run the commands below in the terminal:


 


cd msdocs-python-flask-webapp-quickstart
python3 -m venv .venv

 


After you open the folder, you will see that some new folders have been created in your project.


 


If you then open the app.py file, you can see that Visual Studio Code uses the newly created Python environment as the interpreter.


 


If you open a new terminal, the prompt also shows that you are now in the new Python environment.
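If the prompt is ambiguous, the interpreter itself can confirm whether a virtual environment is active. The following is a minimal sketch using only the standard library; `in_virtualenv` is a helper name made up for this example. It relies on the fact that, for environments created with venv, `sys.prefix` points at the environment while `sys.base_prefix` points at the base installation.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix is the environment directory and
    # sys.base_prefix is the Python installation it was created from.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```

Running this from the new terminal should report the interpreter under the project's .venv folder when the environment is active.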


Then run the command below to install the modules listed in requirements.txt:


 


pip install -r requirements.txt

5. Generate a Dockerfile for your application


 


To create a docker image, you need to have a Dockerfile for your application.


 


You can use the Docker extension to create the Dockerfile for you automatically. To do this, press Ctrl+Shift+P in Visual Studio Code, search for “Dockerfile”, and select “Docker: Add Docker Files to Workspace”.


 


You will be asked to select your programming language and framework (other languages such as Node.js and Java are also supported). I select “Python Flask”.


First, you will be asked to select the entry point file. I select app.py for my project.


Second, you will be asked which port your application listens on. I select 80.


Finally, you will be asked whether to include Docker Compose files. I select no because this is not a multi-container application.


The extension then generates a Dockerfile for the project.


Note:


If you do not have a requirements.txt file in the project, the Docker extension will create one for you. However, it DOES NOT contain all the modules you installed for this project. Therefore, it is recommended to have the requirements.txt file in place before you create the Dockerfile. You can run the command below in the terminal to create the requirements.txt file:


 


pip freeze > requirements.txt

 


After the file is generated, add “gunicorn” to requirements.txt if it is not already listed, because the Dockerfile uses it to launch your Flask application.
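The gunicorn check can also be scripted. The sketch below is one possible approach, not something the Docker extension does for you: `has_requirement` and `ensure_requirement` are hypothetical helpers that work on the text of requirements.txt and append the package only when no line already names it.

```python
import re


def has_requirement(requirements_text, package):
    """True if a line names `package`, with or without a version specifier."""
    pattern = re.compile(
        r"^\s*" + re.escape(package) + r"\s*($|[=<>!~\[;#@])",
        re.IGNORECASE,
    )
    return any(pattern.match(line) for line in requirements_text.splitlines())


def ensure_requirement(requirements_text, package):
    """Return the requirements text with `package` appended if it is missing."""
    if has_requirement(requirements_text, package):
        return requirements_text
    if requirements_text and not requirements_text.endswith("\n"):
        requirements_text += "\n"
    return requirements_text + package + "\n"


# Example on in-memory text; reading and writing the real file is left out.
print(ensure_requirement("Flask==2.2.2\n", "gunicorn"))
```

The function is deliberately conservative: it never pins a version and never rewrites existing lines, so a pinned entry such as gunicorn==20.1.0 is left untouched.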


Please review the generated Dockerfile and modify anything that needs to change.


A .dockerignore file is generated as well. It lists the files and folders to be excluded from the image. Please check it too, to see if it meets your requirements.


 


6. Build the Docker Image


 


You can use the Docker command line to build the image. Alternatively, you can right-click anywhere in the Dockerfile and select “Build Image”.


 


Please make sure that Docker Desktop is running in Windows.


You should then see the Docker image, named after the project and tagged “latest”, in the Docker extension.


 


7. Push the Image to Azure Container Registry


 


Before pushing, click “Run” for the Docker image you created and check that it works as expected.
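One way to check the running container from the WSL terminal is a small HTTP smoke test. This is only a sketch under assumptions: `container_responds` is a made-up helper, and the URL you pass depends on which host port the container was published on, which this tutorial does not specify.

```python
from urllib.error import URLError
from urllib.request import urlopen


def container_responds(url, opener=urlopen, timeout=5):
    """Return True if an HTTP GET to `url` succeeds with a 2xx status."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


# Usage (the host port is an assumption; use the one your container publishes):
# print(container_responds("http://localhost:80/"))
```

The `opener` parameter exists only so the function can be exercised without a network connection; in normal use the default `urlopen` is sufficient.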


 


Then, you can push it to Azure Container Registry (ACR). Click “Push” and select “Azure”.


 


You may need to create a new registry if there isn’t one. Answer the questions that Visual Studio Code asks you, such as subscription and ACR name, and then push the image to the ACR.


 


8. Deploy the image to Azure Resources


 


Follow the instructions in the following documents to deploy the image to the corresponding Azure resource:


Azure App Service or Azure Container App: Deploy a containerized app to Azure (visualstudio.com)


Container Instance: Deploy container image from Azure Container Registry using a service principal – Azure Container Instances | Microsoft Learn

How to Avoid Transaction Isolation Level Issues on Azure SQL Managed Instance


In this technical article, we will delve into an interesting case where a customer encountered problems related to isolation levels in Azure SQL Managed Instance. Isolation levels play a crucial role in managing the concurrency of database transactions and ensuring data consistency. We will start by explaining isolation levels and providing examples of their usage. Then, we will summarize and describe the customer’s problem in detail. Finally, we will go through the analysis of the issue.


 


Isolation Level


Isolation level is a property of a transaction that determines how data is accessed and modified by concurrent transactions. Different isolation levels provide different guarantees about the consistency and concurrency of the data. SQL Server and Azure SQL Managed Instance support five isolation levels: read uncommitted, read committed, repeatable read, snapshot, and serializable. The default isolation level for both platforms is read committed.


 


Read uncommitted allows a transaction to read data that has been modified by another transaction but not yet committed. This can lead to dirty reads, non-repeatable reads, and phantom reads. Read committed prevents dirty reads by only allowing a transaction to read data that has been committed by another transaction. However, it does not prevent non-repeatable reads or phantom reads. Repeatable read prevents non-repeatable reads by locking the data that has been read by a transaction until the transaction ends. However, it does not prevent phantom reads. Snapshot prevents both non-repeatable reads and phantom reads by using row versioning to provide a consistent view of the data as it existed at the start of the transaction. Serializable prevents all concurrency anomalies by locking the entire range of data that is affected by a transaction until the transaction ends.
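The paragraph above can be condensed into a small lookup table. The sketch below simply encodes which of the three named anomalies each level still permits, exactly as described in the text; `PERMITTED_ANOMALIES` and `levels_preventing` are names invented for this illustration, and the table does not cover anomalies the text does not mention.

```python
# Anomalies each isolation level still permits, per the description above.
PERMITTED_ANOMALIES = {
    "read uncommitted": {"dirty read", "non-repeatable read", "phantom read"},
    "read committed": {"non-repeatable read", "phantom read"},
    "repeatable read": {"phantom read"},
    "snapshot": set(),
    "serializable": set(),
}


def levels_preventing(anomaly):
    """List the isolation levels under which `anomaly` cannot occur."""
    return [
        level
        for level, permitted in PERMITTED_ANOMALIES.items()
        if anomaly not in permitted
    ]


print(levels_preventing("dirty read"))
print(levels_preventing("phantom read"))  # ['snapshot', 'serializable']
```

Reading the table row by row makes the trade-off visible: each step down the list removes one anomaly, at the cost of more locking or row versioning.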


 


The isolation level can be set for each connection using the SET TRANSACTION ISOLATION LEVEL statement or using the IsolationLevel property of the .NET TransactionScope class. The isolation level can also be overridden for individual statements using table hints such as (NOLOCK) or (READCOMMITTED).


 


Problem Description


The customer reported that they observed unexpected transaction isolation level changes when running distributed transactions using .NET TransactionScope on Azure SQL Managed Instance, while the same application behaved differently against on-premises SQL Server.


 


The customer was opening two connections to the same database under one transaction scope, one at a time, and they observed the transaction isolation level got reset after the second connection had been opened. For example, if they set the isolation level to repeatable read for the first connection, it would be changed to read committed for the second connection. This caused inconsistency and concurrency issues in their application.


 


The following code snippet illustrates the scenario:


 


 


 

using System;
using System.Data.SqlClient; // assumption: the classic provider; Microsoft.Data.SqlClient exposes the same types
using System.Transactions;

// Request READ UNCOMMITTED for the ambient transaction.
TransactionOptions transactionOptions = new TransactionOptions
{
    IsolationLevel = System.Transactions.IsolationLevel.ReadUncommitted
};

string connectionStr = "Data Source=testwest.com;Initial Catalog=test;User id=sa;Password=;Connection Timeout=0";

// Returns the isolation level of the current session as a smallint.
const string isolationQuery = "SELECT transaction_isolation_level FROM sys.dm_exec_sessions WHERE session_id = @@SPID";

using (TransactionScope ts = new TransactionScope(TransactionScopeOption.Required, transactionOptions))
{
    // First logical connection: open, query, then close (disposing returns it to the pool).
    using (SqlConnection connection1 = new SqlConnection(connectionStr))
    using (SqlCommand cmd = new SqlCommand(isolationQuery, connection1))
    {
        connection1.Open();
        using (SqlDataReader rs = cmd.ExecuteReader())
        {
            rs.Read();
            Console.WriteLine(rs.GetInt16(0));
        }
    }

    // Second logical connection under the same transaction scope.
    using (SqlConnection connection2 = new SqlConnection(connectionStr))
    using (SqlCommand cmd = new SqlCommand(isolationQuery, connection2))
    {
        connection2.Open();
        using (SqlDataReader rs = cmd.ExecuteReader())
        {
            rs.Read();
            Console.WriteLine(rs.GetInt16(0));
        }
    }

    ts.Complete();
}

 


 


 


 


The customer stated that they are not using the “Pooling” parameter in their connection string, which means that connection pooling is enabled by default.
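The snippet above prints the raw smallint from the transaction_isolation_level column, so it helps to have the mapping at hand when reading the output. The lookup below follows the values documented for sys.dm_exec_sessions; `describe_isolation_level` is just a helper name for this example.

```python
# Documented values of sys.dm_exec_sessions.transaction_isolation_level.
ISOLATION_LEVEL_NAMES = {
    0: "Unspecified",
    1: "ReadUncommitted",
    2: "ReadCommitted",
    3: "RepeatableRead",
    4: "Serializable",
    5: "Snapshot",
}


def describe_isolation_level(value):
    """Translate the numeric level into a readable name."""
    return ISOLATION_LEVEL_NAMES.get(value, f"Unknown ({value})")


# With the ReadUncommitted scope used above, a reset would show up as 1 then 2.
for observed in (1, 2):
    print(observed, "=", describe_isolation_level(observed))
```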


 


Problem Analysis


We investigated the issue and found that the root cause was related to how connection reset works on Azure SQL Managed Instance, and in the cloud in general, compared to on-premises SQL Server.


 


Connection reset is a mechanism that restores the connection state to its default values before the connection is reused from the connection pool. Connection reset can be triggered by various events, such as closing the connection, opening a new connection with a different database name or user ID, or executing the sp_reset_connection stored procedure.


 


One of the connection state attributes that is affected by connection reset is the transaction isolation level. Resetting the connection on Azure SQL Managed Instance will always reset the transaction isolation level to the default one, which is read committed. This is not true for on-premises SQL Server, where resetting the connection will preserve the transaction isolation level that was set by the application.


 


This difference in behavior is due to how Azure SQL Managed Instance implements distributed transactions using MSDTC (Microsoft Distributed Transaction Coordinator). MSDTC requires that all connections participating in a distributed transaction have the same transaction isolation level. To ensure this requirement, Azure SQL Managed Instance resets the transaction isolation level to read committed for every connection that joins a distributed transaction.


 


Since the customer is opening and closing the connection to the same database twice, only one physical connection will be created. The driver will use the same connection for both query executions, but the connection will be reset before being reused. The first connection reset will happen when the first connection is closed, and the second connection reset will happen when the second connection is opened under the same transaction scope. The second connection reset will override the isolation level that was set by the application for the first connection.
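To make the sequence concrete, here is a deliberately simplified toy model of the behavior described above. It is not how the driver or the server is implemented: `ToyPool`, `PhysicalConnection`, and `observe` are invented for this illustration, and the reset is collapsed into a single step that runs when a pooled connection is reused.

```python
DEFAULT_LEVEL = "read committed"


class PhysicalConnection:
    def __init__(self):
        self.isolation_level = DEFAULT_LEVEL


class ToyPool:
    """Hands out one reusable physical connection, resetting it on reuse."""

    def __init__(self, reset_restores_default):
        # True models the Managed Instance behavior described in the article,
        # False models the on-premises behavior (level survives the reset).
        self.reset_restores_default = reset_restores_default
        self._idle = []
        self.created = 0

    def open(self):
        if self._idle:
            conn = self._idle.pop()
            if self.reset_restores_default:
                conn.isolation_level = DEFAULT_LEVEL
            return conn
        self.created += 1
        return PhysicalConnection()

    def close(self, conn):
        self._idle.append(conn)


def observe(reset_restores_default, requested_level="read uncommitted"):
    pool = ToyPool(reset_restores_default)
    seen = []
    first = pool.open()
    first.isolation_level = requested_level  # level requested by the scope
    seen.append(first.isolation_level)
    pool.close(first)
    second = pool.open()                     # same physical connection, reset
    seen.append(second.isolation_level)
    pool.close(second)
    return seen, pool.created


print(observe(reset_restores_default=True))
print(observe(reset_restores_default=False))
```

In both runs only one physical connection is created; the difference is solely whether the reset restores the default level, which is the distinction the analysis draws between the two platforms.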


 


This explains why the customer observed unexpected transaction isolation level changes when running distributed transactions using .NET Transaction Scope on Azure SQL Managed Instance.


 


Conclusion


First and foremost, it is worth emphasizing that this is expected behavior from a design perspective. The customer is advised to either disable connection pooling or explicitly set the transaction isolation level for every opened connection.


 


To disable connection pooling, they can add “Pooling=false” to their connection string. This will create a new physical connection for every logical connection, and avoid the connection reset issue. However, this will also increase the overhead of opening and closing connections, and reduce the scalability and performance of the application.
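If connection strings are assembled by a script, the setting can be applied programmatically. The helper below is a naive sketch: `with_pooling_disabled` is a hypothetical function, and it assumes a simple connection string in which no value contains a quoted semicolon.

```python
def with_pooling_disabled(connection_string):
    """Return the connection string with Pooling=false set exactly once."""
    parts = [p.strip() for p in connection_string.split(";") if p.strip()]
    # Drop any existing Pooling entry, whatever its casing or value.
    kept = [p for p in parts if p.split("=", 1)[0].strip().lower() != "pooling"]
    kept.append("Pooling=false")
    return ";".join(kept)


print(with_pooling_disabled("Data Source=testwest.com;Initial Catalog=test"))
# Data Source=testwest.com;Initial Catalog=test;Pooling=false
```

For production code in .NET, a connection string builder class is a safer choice than string splitting because it handles quoting rules.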


 


To explicitly set the transaction isolation level for every opened connection, they can use the SET TRANSACTION ISOLATION LEVEL statement or the IsolationLevel property of the .NET TransactionScope class. This will ensure that the isolation level is consistent across all connections participating in a distributed transaction, regardless of the connection reset behavior. For example, they can modify their code snippet as follows:


 


 


 

// Assumes using directives for System.Data.SqlClient and System.Transactions,
// and the connectionStr variable defined in the earlier snippet.
TransactionOptions options = new TransactionOptions
{
    IsolationLevel = System.Transactions.IsolationLevel.RepeatableRead
};

using (TransactionScope scope = new TransactionScope(TransactionScopeOption.Required, options))
{
    using (SqlConnection conn1 = new SqlConnection(connectionStr))
    {
        conn1.Open();
        // Set the isolation level explicitly
        using (SqlCommand cmd1 = new SqlCommand("SET TRANSACTION ISOLATION LEVEL REPEATABLE READ", conn1))
        {
            cmd1.ExecuteNonQuery();
        }
        // Execute some queries on conn1
    }

    using (SqlConnection conn2 = new SqlConnection(connectionStr))
    {
        conn2.Open();
        // Set the isolation level explicitly
        using (SqlCommand cmd2 = new SqlCommand("SET TRANSACTION ISOLATION LEVEL REPEATABLE READ", conn2))
        {
            cmd2.ExecuteNonQuery();
        }
        // Execute some queries on conn2
    }

    scope.Complete();
}

 


 


For additional information about database isolation settings, you can review the documents below.


SET TRANSACTION ISOLATION LEVEL (Transact-SQL) – SQL Server | Microsoft Learn


Transaction locking and row versioning guide – SQL Server | Microsoft Learn


System stored procedures (Transact-SQL) – SQL Server | Microsoft Learn


SQL Server Connection Pooling – ADO.NET | Microsoft Learn


 


I hope this article was helpful. Please feel free to share your feedback in the comments section.


 


Disclaimer
Please note that products and options presented in this article are subject to change. This article reflects isolation level settings for Azure SQL Managed Instance in October, 2023.