
In our recently released AKS Engine on Azure Stack Hub pattern, we walked through how to architect, design, and operate a highly available Kubernetes-based infrastructure on Azure Stack Hub. As production workloads are deployed, one topic that needs to be clear and backed by operational procedures is the Patch and Update (PNU) process, along with the differences between Azure Kubernetes Service (AKS) clusters in Azure and AKS Engine based clusters on Azure Stack Hub. We have invited Heyko Oelrichs, a Microsoft Cloud Solution Architect, to explore these topics and help you start the PNU strategy for AKS Engine (AKSe) environments on Azure Stack Hub.

 

Before we start let’s introduce the relevant components: 

  • AKS Engine is the open-source tool (hosted on GitHub) that Azure also uses under the covers to deploy managed AKS clusters. It can be used to provision unmanaged, Infrastructure-as-a-Service (IaaS) based Kubernetes clusters in Azure and Azure Stack Hub.
  • Azure Stack Hub is an extension of Azure that provides Azure services in the customer’s or the service provider’s datacenter.  

The PNU process of a managed AKS cluster in Azure is partially automated and consists of two main areas: 

  1. Kubernetes version upgrades are triggered manually through the Portal, the Azure CLI, or ARM. Next to the Kubernetes version upgrade itself, these upgrades include upgrades of the underlying base OS image, if available. They typically cause the cluster nodes to reboot.

    Our recommendation is to regularly upgrade the Kubernetes version in your AKS cluster to stay supported and current on new features and bug fixes.  

  2. Security updates for the base OS image are applied automatically to the underlying cluster nodes. These updates can include OS security fixes or kernel updates. AKS does not automatically reboot these Linux nodes to complete the update process.

The PNU process on Azure Stack Hub is largely similar, with a few differences we want to highlight here. The first thing to note is that Azure Stack Hub runs in a customer or service provider datacenter and is not managed or operated by Microsoft.

That also means that Kubernetes clusters deployed using AKS Engine on Azure Stack Hub are not managed by Microsoft, neither the worker nodes nor the control plane. Microsoft provides the AKS Engine tool and the base OS images (via the Azure Stack Hub Marketplace) that you can use to manage and upgrade your cluster.

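At a high level, AKS Engine helps with the most important operations, each covered in its own section below: upgrading to a newer Kubernetes version and upgrading the base OS image only.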

It is important to note, though, that AKS Engine can only upgrade clusters that were originally deployed using the tool. Clusters that were created outside of AKS Engine cannot be maintained or upgraded with it.

Upgrade to a newer Kubernetes version 

The aks-engine upgrade command updates the Kubernetes version and the AKS Base Image. Every time you run the upgrade command, AKS Engine creates a new VM for every node of the cluster, using the AKS Base Image associated with the version of aks-engine used.
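As a rough illustration, the upgrade invocation can be assembled programmatically. The sketch below only builds the argument list and does not run anything. The flag names reflect our reading of the aks-engine upgrade documentation and should be verified with `aks-engine upgrade --help` for the version in use. The function name is hypothetical, all values are placeholders, and authentication flags are left out.

```python
from typing import List


def build_upgrade_command(
    subscription_id: str,
    resource_group: str,
    location: str,
    api_model_path: str,
    target_version: str,
) -> List[str]:
    """Assemble an `aks-engine upgrade` argument list without executing it."""
    return [
        "aks-engine", "upgrade",
        "--azure-env", "AzureStackCloud",  # assumed value for Azure Stack Hub
        "--subscription-id", subscription_id,
        "--resource-group", resource_group,
        "--location", location,
        "--api-model", api_model_path,
        "--upgrade-version", target_version,
    ]


# All values below are placeholders, not recommendations.
command = build_upgrade_command(
    subscription_id="00000000-0000-0000-0000-000000000000",
    resource_group="kube-rg",
    location="local",
    api_model_path="kube-rg/apimodel.json",
    target_version="1.17.11",
)
print(" ".join(command))
```

Keeping the command as a list rather than a single string makes it easy to review in a change request before anyone runs it.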

The Azure Stack Hub Operator together with the Kubernetes Cluster administrator should make sure, prior to each upgrade: 

  • that no system updates or scheduled tasks are planned 
  • that the subscription has enough space for the entire process 
  • that you have a backup cluster and that it is operational 
  • that the required AKS Base Image is available, that the right AKS Engine version is used, and that the target Kubernetes version is specified and supported 
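One way to make the checklist above explicit is to record each item as a yes/no value and block the upgrade while any item is open. This is a minimal sketch: the class and field names are hypothetical, and the values are meant to be filled in by the operators after checking each item by hand.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UpgradePreflight:
    """Hypothetical record of the pre-upgrade checks, filled in by hand."""
    no_maintenance_planned: bool
    enough_subscription_capacity: bool
    backup_cluster_operational: bool
    base_image_available: bool
    target_version_supported: bool

    def blockers(self) -> List[str]:
        """Return the names of all checks that have not passed."""
        return [name for name, passed in vars(self).items() if not passed]


checks = UpgradePreflight(
    no_maintenance_planned=True,
    enough_subscription_capacity=True,
    backup_cluster_operational=False,
    base_image_available=True,
    target_version_supported=True,
)
print(checks.blockers())  # ['backup_cluster_operational']
```

An empty list means every item has been confirmed and the upgrade can be scheduled.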

The aks-engine repository on GitHub contains a detailed description of the upgrade process.  

Upgrade the base OS image only 

There might be valid reasons, for example dependencies on specific Kubernetes API versions, not to upgrade to a newer Kubernetes version while still upgrading to a newer release of the underlying base OS image. Newer base OS images contain the latest OS security fixes and kernel updates. This base-OS-image-only upgrade is possible by explicitly specifying the target version.

The process is the same as for the Kubernetes version upgrade and also includes a reboot or recreation of the underlying cluster nodes.
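A base-OS-image-only upgrade targets the Kubernetes version the cluster already runs. The sketch below shows how the version-related arguments might differ between the two upgrade types. It assumes, based on our reading of the aks-engine documentation, that a same-version upgrade needs the `--force` flag; please verify this for your aks-engine release. The helper name and version numbers are hypothetical.

```python
from typing import List


def upgrade_version_args(current_version: str, target_version: str) -> List[str]:
    """Return the version-related arguments for `aks-engine upgrade`.

    When the target equals the current version, only the base OS image
    changes, and `--force` is added (assumed to be required in that case).
    """
    args = ["--upgrade-version", target_version]
    if target_version == current_version:
        args.append("--force")
    return args


# Base OS image only: same Kubernetes version, forced.
print(upgrade_version_args("1.17.11", "1.17.11"))
# Regular upgrade to a newer Kubernetes version.
print(upgrade_version_args("1.17.11", "1.18.10"))
```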

Applying security updates 

The third area, which is already built into AKS Engine based Kubernetes clusters and needs no manual intervention, is the way security updates are applied. This covers, for example, security updates that were released before a new base OS image is available in the Azure Stack Hub Marketplace, or between two aks-engine upgrade runs, e.g. as part of a monthly maintenance task.

These security updates are installed automatically using the Unattended Upgrade mechanism. Unattended Upgrade is a tool built into Debian, the foundation of Ubuntu, which in turn is the Linux distribution used for AKS and AKS Engine based Kubernetes clusters. It is enabled by default and installs security updates automatically, but it does not reboot the Kubernetes cluster nodes.

Note: this automatic installation is done in connected environments, where the Azure Stack Hub workloads in user-subscriptions have access to the Internet. Disconnected environments need to follow a different approach. 
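To confirm on a node that the unattended upgrade mechanism described above is switched on, one option is to inspect the apt periodic settings. The sketch below parses such settings from a string. It assumes the usual Ubuntu convention of a file like /etc/apt/apt.conf.d/20auto-upgrades containing an `APT::Periodic::Unattended-Upgrade` entry; whether a given AKS Base Image uses exactly that file is not confirmed here, and the function name is hypothetical.

```python
def unattended_upgrades_enabled(apt_conf_text: str) -> bool:
    """Return True if the apt periodic config enables unattended upgrades."""
    for raw_line in apt_conf_text.splitlines():
        line = raw_line.strip()
        if line.startswith("//") or line.startswith("#"):
            continue  # skip comment lines
        if line.startswith("APT::Periodic::Unattended-Upgrade"):
            # The value sits between the first pair of double quotes.
            value = line.split('"')[1] if '"' in line else "0"
            return value != "0"
    return False


# Sample content in the style of an Ubuntu 20auto-upgrades file.
sample = '''APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
'''
print(unattended_upgrades_enabled(sample))  # True
```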

Rebooting the nodes can be automated using the open-source KUbernetes REboot Daemon (kured), which watches for Linux nodes that require a reboot and then automatically handles the rescheduling of running pods and the node reboot process.
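The signal kured watches for is simple: by default it looks for the sentinel file /var/run/reboot-required, which Ubuntu creates when an installed update needs a restart. The sketch below shows the same check in isolation. It illustrates the idea and is not kured's implementation; the function name is hypothetical.

```python
from pathlib import Path
import tempfile


def reboot_required(sentinel: Path = Path("/var/run/reboot-required")) -> bool:
    """Return True if the reboot sentinel file exists on this node."""
    return sentinel.exists()


# Demonstration against a temporary directory instead of the real path.
with tempfile.TemporaryDirectory() as tmp:
    sentinel = Path(tmp) / "reboot-required"
    print(reboot_required(sentinel))  # False
    sentinel.touch()
    print(reboot_required(sentinel))  # True
```

A node where this check returns True has pending updates that only take effect after a reboot, which is the gap kured is meant to close.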

 

Update types and components 

| Component(s) | Updates | Responsibility |
| --- | --- | --- |
| Azure Stack Hub | Microsoft software updates can include the latest Windows Server security updates, non-security updates, and Azure Stack Hub feature updates. OEM hardware vendor-provided updates can contain hardware-related firmware and driver update packages. Go to Azure Stack Hub servicing policy to learn more. | Azure Stack Hub Operator |
| AKS Engine | AKS Engine updates typically contain support for newer Kubernetes versions, Azure and Azure Stack API updates and other improvements. Visit the aks-engine releases and documentation on GitHub to learn more. | Kubernetes cluster operator |
| AKS Base Image | AKS Base Images are released on a regular basis and contain newer operating system versions, software components, security and kernel updates. These images are available through the Azure Stack Hub Marketplace. | Azure Stack Hub Operator + Kubernetes cluster operator |
| Kubernetes | Kubernetes releases minor versions roughly every three months. These releases include new features and improvements. Patch releases are more frequent and are only intended for critical bug fixes in a minor version. These patch releases include fixes for security vulnerabilities or major bugs impacting a large number of customers and products running in production based on Kubernetes. Visit Supported Kubernetes versions in Azure Kubernetes Service (AKS) and Supported AKS Engine versions to learn more. | Kubernetes cluster operator |
| Linux (Ubuntu) and Windows Node Updates | Some Linux updates are automatically applied to Linux nodes (as described above). These updates include OS security fixes or kernel updates. Windows Server nodes don’t receive daily updates. Instead, an aks-engine upgrade deploys new nodes with the latest base Windows Server image and patches. | Kubernetes cluster operator; Azure Stack Hub Operator (to provide new OS images) |

 

Conclusion and Responsibilities 

  • New AKS Base OS Images are regularly released via the Azure Stack Hub Marketplace and have to be downloaded by the Azure Stack Hub Operator.
  • New AKS Base OS Images and Kubernetes versions are applied using aks-engine upgrade, which recreates the nodes; this does not affect the operation of the cluster or the user workloads.
  • Azure Stack Hub Operators play a crucial role in the overall upgrade process and should be consulted and involved in every upgrade.
  • Very important: the Azure Stack Hub Operator should always consult the Release Notes that come with each update and inform the Kubernetes cluster administrator of any known issues.
  • Kubernetes cluster operators have to be aware of the availability of new updates for Kubernetes and AKS Engine and apply them accordingly.
  • AKS Engine supports specific versions of Kubernetes and the AKS Base Image.
  • Security updates and kernel fixes are applied automatically and do not automatically reboot the cluster nodes.
  • Kubernetes cluster operators should implement kured or another solution to gracefully reboot cluster nodes with pending reboots and complete the update process.

This article, and especially the list of responsibilities and considerations above, is intended to give you a starting point and an idea of how to structure and execute the PNU process for AKSe environments. The details of the PNU process and how they relate to the application architecture are the most critical pieces of a successful and reliable operation. Separating the layers (the Azure Stack Hub platform, the AKSe platform, and the application with its data) helps you prepare for an outage at each layer, and having operational procedures and mitigation steps ready for each of them helps minimize the risk.
