This article is contributed. See the original author and article here.

This article contains a few recommendations for reducing the total cost of ownership (TCO) of your Azure Kubernetes Service (AKS) cluster. If you want to minimize the number of unused cores, you can use the following general guidelines to improve the density of your workloads and reduce the number of VMs to the bare minimum.


 



Here are some more general recommendations to reduce the TCO of an AKS cluster in addition to the previous guidelines and considerations:


 



  • Review the Cost optimization section of the Azure Well-Architected Framework for AKS.

  • Use Azure Advisor to monitor and release unused resources. Find and release any resource not used by your AKS cluster, such as public IPs, managed disks, etc. For more information, see Find and delete unattached Azure managed and unmanaged disks

  • Use Microsoft Cost Management budgets and reviews to keep track of expenditures.

  • Use Azure Reservations to reduce the cost of the agent nodes. Azure Reservations help you save money by committing to one-year or three-year plans for multiple products. Committing allows you to get a discount on the resources you use. Reservations can significantly reduce your resource costs by up to 72% from pay-as-you-go prices. Reservations provide a billing discount and don’t affect the runtime state of your resources. After you purchase a reservation, the discount automatically applies to matching resources. You can purchase reservations from the Azure portal, APIs, PowerShell, and Azure CLI.

  • Add one or more spot node pools to your AKS cluster. As you know, a spot node pool is backed by a spot Virtual Machine Scale Set (VMSS). Using spot VMs for nodes with your AKS cluster allows you to take advantage of unutilized capacity in Azure at significant cost savings. The amount of available unutilized capacity will vary based on many factors, including node size, region, and time of day. When deploying a spot node pool, Azure will allocate the spot nodes if there’s capacity available. But there’s no SLA for the spot nodes. A spot scale set that backs the spot node pool is deployed in a single fault domain and offers no high availability guarantees. When Azure needs the capacity back, the Azure infrastructure will evict spot nodes, when you create a spot node pool. You can define the maximum price you want to pay per hour and enable the cluster autoscaler, which is recommended for use with spot node pools. Based on the workloads running in your cluster, the cluster autoscaler scales out and scales in the number of nodes in the node pool. For spot node pools, the cluster autoscaler will scale out the number of nodes after an eviction if additional nodes are still needed. For more information, see Add a spot node pool to an Azure Kubernetes Service (AKS) cluster.

  • System pools must contain at least one node, while user node pools may contain zero or more nodes. Hence, you could set up a user node pool to automatically scale from 0 to N node. Using a horizontal pod autoscaler based on CPU and memory or the metrics of an external system like Apache Kafka, RabbitMQ, Azure Service Bus, etc., you could configure your workloads to scale out and scale in using Kubernetes Event-driven Autoscaling (KEDA).

  • Your AKS workloads may not need to run continuously, for example, a development cluster with node pools running specific workloads. To optimize your costs, you can completely turn off an AKS cluster or stop one or more node pools in your AKS cluster, allowing you to save on compute costs. For more information, see:




  • Deploy and manage containerized applications with Azure Kubernetes Service (AKS) running on Ampere Altra Arm-based processors. For more information, see Azure Virtual Machines with Ampere Altra Arm-based processors.

  • Migrate application workloads written in full .NET Framework, which requires running in Windows containers to .NET Standard. Migrated workloads will run in Linux containers with a smaller footprint in terms of a container image and hence provide better density. This decreases the number of agent nodes required to host and run applications.

  • For multitenant solutions, physical isolation is more costly and adds management overhead. Logical isolation requires more Kubernetes experience and increases the surface area for changes and security threats, but it shares the costs.


Don’t hesitate to write a comment below if you want to suggest additional recommendations to reduce the total cost of ownership of an AKS cluster. I’ll include your observations in this article. Thanks!

Brought to you by Dr. Ware, Microsoft Office 365 Silver Partner, Charleston SC.