Azure Learn Article: Build a private 5G network

Azure Learn Article: Build a private 5G network

This article is contributed. See the original author and article here.

There are four critical components to a private 5G network: the local radio and networking functions, the edge compute platform, the application ecosystem, and the cloud. Private 5G network solutions are often built as a collection of loosely integrated components. This loose-integration approach might make sense in terms of physical and organizational demarcation points, but it has drawbacks. The following problems are inherent in this approach:



  • Difficult to deploy

  • Difficult to manage

  • Expensive to scale

  • Not secure


Microsoft offers a different approach to help operators take advantage of the opportunity to provide private 5G to enterprises. This cloud-managed private 5G network solution features cloud-native mobile core technology, advanced edge computing, and a flexible choice of radio and application partners. The solution is designed to be delivered and managed by operators. It’s appropriate for various industry verticals. The Microsoft approach for private 5G networks provides advantages to both operators and enterprises, and a choice of platforms and applications.


 


Azure private multiaccess edge compute is a solution for integrating edge compute, networking, and applications on a range of edge platforms. It’s designed to help operators and partners deliver high-performance MEC solutions to enterprises.


private-multiaccess-edge-compute.png


 


Azure Stack platform for the edge


Private multiaccess edge compute can be deployed across multiple platforms powered by Azure Stack solutions. Azure Stack solutions extend Azure services and capabilities to various environments, from the datacenter to edge locations and remote offices. Enterprises can use Azure Stack to build, deploy, and run hybrid and edge computing apps consistently across IT ecosystems.


edge-platforms.png


Azure Stack Edge is an Azure-managed edge appliance that brings the compute, storage, and intelligence of Azure into the enterprise environment. As part of the private multiaccess edge compute solution, Azure Stack Edge provides a single point for processing mobile network data at the edge. It enables local, intelligent breakout of data processing for faster processing and lower bandwidth consumption.


 


Azure Stack HCI is a hyperconverged infrastructure (HCI) operating system delivered as an Azure service. It provides the latest security, performance, and hybrid capabilities, which include Azure Arc enablement and management via the Azure portal.


 


Azure Private 5G Core


The final component of the Microsoft architecture for private 5G networks is the 4G/5G mobile core, which is deployed as a standalone 5G core with an optional 4G interworking function to enable support for 4G LTE RAN. Private 5G Core is a fully virtualized, cloud-native solution that includes standard 5G core network functions.


private-5g-core.png


Azure automates the deployment and configuration of Private 5G Core on Azure Stack Edge. To deploy and update network configuration, the network operator can use any or all of these tools:



  • The Azure portal web GUI

  • The Azure Resource Manager REST API

  • Deployment templates (ARM templates and/or Terraform)


Operators can use declarative syntax, in standard JSON or Bicep data format, to configure the core. This enables integration with CI/CD pipelines like Azure DevOps or GitHub Actions for repeatable deployments and updates across multiple network sites.


Operators don’t have to deploy 5G to benefit from a cloud-enabled private 5G network. Enterprise customers and operators can deploy a private 5G network in a 4G-only, 5G-only, or hybrid-4G/5G environment. Private 5G Core supports the transition from 4G to a 5G standalone network, so enterprises can start with a 4G implementation and later migrate to 5G or deploy a hybrid 4G/5G private solution.


 


Azure orchestration for edge workloads


Azure provides several tools for onboarding edge applications and network functions. The Microsoft private 5G solution uses Azure Arc and Azure Network Function Manager.


 


Azure Arc can simplify complex infrastructures that are distributed across on-premises, edge, and multicloud environments and extend Azure management to these environments.


 


For network function workloads, Network Function Manager is a cloud-native orchestration service that customers can use to deploy and provision network functions on Azure Stack platforms. Supported platforms include Azure Stack Edge Pro, which has a built-in GPU, and Azure Stack HCI. When used with Azure Stack Edge, Network Function Manager provides deployment, provisioning, and cloud-based management of network functions or apps running at the private edge, directly from the Azure portal. It’s a managed service, so an Azure-managed service provider handles updates, lifecycle management, and support for network functions and applications on the edge device. The platform supports virtual machines and containerized workloads, and one or two GPUs for acceleration.


 


Azure solution characteristics


This solution significantly reduces the total cost of ownership of operating a private cellular network. The following attributes contribute to reducing costs:


 


Telco private 5G network-as-a-service


Azure enables operators to deploy a complete CBRS/4G/5G mobile core as a service. The private 5G network-as-a-service approach changes how private 5G networks are deployed and managed. It gives operators more flexibility and enables them to provide the mobile core functionality as a hosted and managed service.


 


The solution can be integrated with a variety of radio access network (RAN) and subscriber identity management (SIM) systems via the Microsoft partner ecosystem. In addition to integration with other applications on Azure, like Azure Machine Learning and Azure IoT Hub, the solution has built-in features that enterprises need for service integrations. These features typically incur separate cost and compute, and require complex operations, but these functions are included as part of this solution. No additional hardware is required. The solution also provides integrated local policy to enable differentiated traffic treatment based on SIM policies configured in the Azure cloud and synced to Azure Private 5G Core.


 


Cloud-native solution


Because it uses both the Azure cloud and Azure Stack Edge, this solution architecture brings the benefits of cloud economics and a pay-as-you-go consumption model. Customers can size and deploy the solution for their current workloads and avoid the risk of underestimating or overestimating resource requirements. The Azure cloud also provides built-in enhanced security and privacy compliance. Customers can confidently deploy the solution in verticals that require stringent data privacy, like healthcare, government, public safety, and defense. Deploying an edge-based solution that uses Azure Stack Edge provides connectivity services and the ability to deploy edge applications. This combination helps customers deploy edge applications that require low-latency and edge-compute processing.


 


Cloud management, automation, and observability


The solution is managed remotely via Azure cloud. It’s a multi-tenant solution that gives role-based access control to network admins, enabling them to view, manage, and control private 5G networks and SIMs. Cloud management provides cost savings for several reasons:



  • The solution can be serviced without the need to send technicians or agents to a site.

  • Cloud management eliminates the need for an additional on-premises operations support system (OSS).

  • Exposed northbound APIs can be easily integrated with existing SIM and OSS/BSS (business support system) solutions.


Automation provides a better user experience and simplifies deployment. Automation can reduce solution deployment from weeks to hours, whether the deployment is managed by the operator, a managed services provider (MSP), or a partner. Without automation, the scale needed to deploy a solution to thousands of enterprises and consumers is unattainable. Azure Automation provides cloud-based automation, operating system updates, and a configuration service that supports consistent management across Azure and non-Azure environments. It includes process automation, configuration management, update management, shared capabilities, and heterogeneous features.


azure-monitor.png


The single-pane-of-glass approach doesn’t just enable operators to deploy the service. It also quickly instantiates the private 5G network service at scale for its enterprises. The portal can then be used for full-service assurance and automation, including configuration, fault, and performance management.


 


Azure Private 5G Core and Azure Stack Edge report metrics and alarms to Azure Monitor to provide network insights. Operators can use these insights for closed-loop automation and to provide service assurance to customers. Beyond network insights, this data can provide useful business intelligence and network monetization to enterprise customers.


 


Integration with Azure services


This solution enables an ecosystem of applications for operators and enterprise customers. These applications include business intelligence and analytics applications and AI and machine learning applications from Microsoft and others. Azure private multiaccess edge compute combines Private 5G Core and the Azure-managed edge compute environment in a single architecture. This colocation enables enterprises to move data in and out of their mobile networks for processing and choose which data should be processed on-site and which should be sent to the cloud.


integration.png


MEC for networking and applications


An operator needs to provide both the local radio infrastructure and the managed compute for a private 5G network service. A cloud edge computing component is also required for processing cloud services and applications. One solution is a shared edge where both the mobile network functions and local edge applications run side-by-side. They run in a common zero-trust security framework provided by Azure Stack Edge or Azure Stack HCI. This approach provides seamless integration between the 5G network, edge computing, and the cloud, and significantly reduces CapEx and OpEx.


 


Alternative solutions involve splitting the compute functions into two sets: one managed by the operator and the other by the enterprise or another managed service provider. In this approach, solution architecture splits up the management, but enterprises benefit from using Azure as the common management layer for both networking and applications.


 


For example, consider a scenario in which the enterprise customer is a national retail chain with hundreds of stores. The enterprise can choose to integrate a select set of modern connected applications with the private 5G network on a single compute platform that’s managed by the operator. Or the enterprise can use two distinct compute platforms at every location: one for the mobile network functions (managed by the operator) and another to run distributed cloud and enterprise-developed applications. The architecture will provide flexibility for the enterprise and the operator.


 


The telco experience


This solution meets the high-performance, high-availability, low-latency requirements of telecommunications cloud services. It’s optimized to help operators and MSPs manage and control private 5G network services for their enterprise customers. By addressing the infrastructure requirements of private 5G networks in a seamless and highly scalable platform, the solution allows operators to focus on their strengths and bring the value of the cloud to their customers.


 


Questions from enterprise customers


As enterprises become better-educated consumers of private 5G network services, they’ll look for specific features that align with their unique requirements and use cases. Here are some of the questions that mobile operators can expect to hear from prospective enterprise customers:


 


How integrated are the mobile core and edge platforms?


Azure private multiaccess edge compute combines Private 5G Core and the edge compute environment in a single architecture. Seamless integration between the core and the edge provides reassurance that the private network solution will work as expected. Multi-vendor solutions can lead to lengthy integrations and challenges when technical problems arise. Because private 5G networks might handle mission-critical applications, disruptions aren’t acceptable.


 


Does the solution offer end-to-end management?


End-to-end management needs to incorporate Azure and partner services in a common management framework. Azure currently provides consistency for the edge platform, Kubernetes, and Private 5G Core. We also recommend that operators develop or integrate their order and service management via Azure APIs to provide automation to simplify the consumption of private 5G networks.


 


Is the solution reliable?


Azure private multiaccess edge compute supports multiple layers of reliability. Enterprises expect high-availability connectivity for mission-critical workloads. Azure provides multi-node edge deployment with Azure Kubernetes Service (AKS) on-premises clusters to run the Private 5G Core containers. For information about the Private 5G Core SLA, see the Azure SLA page after the product reaches general availability.


 


Is the solution simple, but flexible enough to scale dynamically?


As enterprises realize the benefits of running applications and operations on a private 5G network, the network footprint will expand into other business areas and other sites. Operators will need to scale these solutions quickly and cost-effectively. Having a cloud-based mobile core provides a significant advantage in this scenario.


 


How secure is the network?


The effective deployment of the Azure Zero Trust security model within a common architecture helps to ensure the consistent application of security policy across both operator and enterprise functions. The security model is enabled across all Azure service components.


 


Will the private 5G network integrate with my existing business applications and support my devices?


The Azure approach is to provide open and standardized APIs, SIM provisioning, and profile management for integration with business applications and devices. Azure partnerships with most network application and device vendors can help provide smooth integration. Because Microsoft owns both the cloud and the mobile core components, integration with OSS/BSS systems is streamlined.


 


How can I see what’s happening in my network?


With a unified Azure solution, enterprises and operators can view, manage, and monitor everything from telemetry data to network traffic by using a single management tool. Common lifecycle management, security policy, and orchestration provide end-to-end visibility and control for managing the service.


 


Does the platform support service automation and orchestration?


Containers and microservices are the building blocks for business applications. The ability to automate and orchestrate new mobile services is essential for operators and enterprises as they build new applications and services on their 5G networks. Enterprises also need to be able to host these applications either on-premises or in the cloud. The Azure solution provides automation and orchestration across all domains and network functions: physical, virtual, and cloud.


 


How do I analyze mobile network data?


Connected devices generate data that can be useful to enterprises. With this solution, enterprises can bring data into the Azure environment with enhanced security. They can use this data for AI, machine learning, and analytics, including analysis via business intelligence tools from Microsoft and others.


 


Contributors


This article is maintained by Microsoft. It was originally written by the following contributors.


Principal author:



Other contributor:


CISA Releases Four Industrial Control Systems Advisories

This article is contributed. See the original author and article here.

CISA released four Industrial Control Systems (ICS) advisories on December 22, 2022. These advisories provide timely information about current security issues, vulnerabilities, and exploits surrounding ICS.

CISA encourages users and administrators to review the newly released ICS advisories for technical details and mitigations:

ICSA-22-356-01 Priva TopControl Suite
ICSA-22-356-02 Rockwell Automation Studio 5000 Logix Emulate
ICSA-22-356-03 Mitsubishi Electric MELSEC iQ-R, iQ-L Series and MELIPC Series
ICSA-22-356-04 Omron CX-Programmer

Deep dive into AKS ingress, load balancing algorithms and Azure Network Security Groups

Deep dive into AKS ingress, load balancing algorithms and Azure Network Security Groups

This article is contributed. See the original author and article here.

I have already written a few articles on AKS, so feel free to have a look at them as I will not repeat what I’ve already written there. For the record, here are the links:



As a side note, this post is about how to harden ingress using Network Security Groups on top of internal network policies. Probably, 99% of the clusters worldwide are hardened using network policies only. If your organization is part of the last percent, and want to harden clusters further, this post is for you.


 


In AKS, ingress is used to let external components/callers interact with resources that are inside the cluster. Ingress is handled by an ingress controller. There are many of them: Nginx, Istio, Traefik, AGIC, etc. but whatever technology you choose, you will rely on a load balancer (LB) service to serve the traffic. When you instruct K8s to provision a service of type LB, it tells the Cloud provider (Microsoft in this case) to provision a LB service. This LB might be public facing or internal. The recommended approach is to provision an internal LB that is proxied by a Web Application Firewall (WAF) (Front Door+PLS or Application Gateway, or even any third party WAF). API traffic is also often proxied by a more specific API gateway. 


 


The default behavior of a LB service in K8s, is to load balance the traffic twice:


 



  • Firstly, at the level of the Azure LB, which will forward traffic to any healthy AKS node

  • Secondly, at the level of K8s itself. 


Because the Azure Load Balancer has no clue about the actual workload of the AKS cluster, it will blindly forward traffic to any “healthy” node, no matter what the pod density is. To prevent an imbalanced load balancing, K8s adds its own bits, to achieve a truly evenly distributed load balancing that considers the overall cluster capacity. While this sounds like the perfect thing on earth, it comes with two major caveats:


 



  • K8s might add an extra hop between the Azure Load Balancer and the actual target, thus impacting latency.

  • No matter whether there is an extra hop or not, K8s will source network address translation (SNAT) the source traffic, thus hiding the actual client IP. The ingress controller will therefore not be able to apply traffic restriction rules.


When you provision a LB service, for example, following the steps documented here, you will end up with a service, whose externalTrafficPolicy is set to Cluster. The Cluster mode takes care of the imbalanced load balancing. It performs the extra K8s bits to optimize load balancing and comes with the caveats described earlier. 


 


This problem is known and documented so, why am I blogging about this? Well, this is because the load balancing algorithm that you choose at the level of the LB service, also has an impact on how you can restrict (or not) traffic that comes in, at the level of the Network Security Groups, and that is not something I could find guidance on. 


If you do not want to restrict ingress traffic in any way, you can stop reading here, else, keep on reading.


Lab environment


Let us deep dive into the internals of AKS ingress with a concrete example.


For this, let us use the following lab:


 


vkittwe_0-1671642118252.png


 


Where we have 1 VNET and 4 subnets. 3 subnets are used by the AKS cluster: 1 for the system node pool, 1 for ingress and the last one for user pods. I have labelled the Ingress and User node pools so that I can schedule pods accordingly. It is important to precise that our cluster is based on Kubenet, which adds a layer of complexity in terms of how networking works with K8s. The VM subnet holds our client VM, used to perform client calls to an API that is exposed through the ingress controller. Ultimately, we end up with this:


 


vkittwe_1-1671642118259.png


Load balancer with default behavior


I have provisioned the following LB service in AKS:


 


apiVersion: v1


kind: Service


metadata:


  annotations:


    meta.helm.sh/release-name: ingress-nginx


    meta.helm.sh/release-namespace: ingress


    service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz


    service.beta.kubernetes.io/azure-load-balancer-internal: “true”


    service.beta.kubernetes.io/azure-load-balancer-internal-subnet: ingress


  labels:


    app.kubernetes.io/component: controller


    app.kubernetes.io/instance: ingress-nginx


    app.kubernetes.io/managed-by: Helm


    app.kubernetes.io/name: ingress-nginx


    app.kubernetes.io/part-of: ingress-nginx


    app.kubernetes.io/version: 1.5.1


    helm.sh/chart: ingress-nginx-4.4.0


  name: ingress-nginx-controller


  namespace: ingress


spec:


  allocateLoadBalancerNodePorts: true


  externalTrafficPolicy: Cluster


  internalTrafficPolicy: Cluster


  ipFamilies:


  – IPv4


  ipFamilyPolicy: SingleStack


  loadBalancerIP: 192.168.0.22


  ports:


  – appProtocol: http


    name: http


    nodePort: 31880


    port: 80


    protocol: TCP


    targetPort: http


  selector:


    app.kubernetes.io/component: controller


    app.kubernetes.io/instance: ingress-nginx


    app.kubernetes.io/name: ingress-nginx


  sessionAffinity: None


  type: LoadBalancer


 


Notice the Azure-specific annotations at the top, telling Azure to provision an internal load balancer in a specific subnet, and with the /healthz endpoint for the health probe. 


I’m only using HTTP but same logic applies to HTTPS.  I have three nodes in the cluster:


 


vkittwe_2-1671642118261.png


 


I also have deployed one API and a few privileged containers, for later use:


vkittwe_3-1671642118272.png


 


Each privileged container runs on its corresponding node (one on the system node, one on the ingress node, and another one on the user node). The API runs on the user node. My ingress controller runs a single pod (for now) on the Ingress node:


vkittwe_4-1671642118278.png


 


Lastly, I deployed the following ingress rule:


 


apiVersion: networking.k8s.io/v1


kind: Ingress


metadata:


  name: weather 


spec:


  ingressClassName: nginx


  rules:


  – host: demoapi.eyskens.corp


    http:


      paths:


      – path: /weatherforecast


        pathType: Prefix


        backend:


          service:


            name: simpleapi


            port:


              number: 8080


 


 


Upon the deployment of the K8s LB service, Azure creates its own LB service, which after a few minutes, shows the status of the health probe checks:


vkittwe_5-1671642118280.png


 


Each AKS node is being probed by the load balancer. So far so good, we can run our first API call from the client VM:


vkittwe_6-1671642118283.png


 


Our ingress controller sees the call and logs it:


 


192.168.0.4 – – [20/Dec/2022:06:25:33 +0000] “GET /weatherforecast HTTP/1.1” 200 513 “-“


“Mozilla/5.0 (Windows NT; Windows NT 10.0; en-US) WindowsPowerShell/5.1.17763.3770” 180


0.146 [default-simpleapi-8080] [] 10.244.1.5:8080 513 0.146 200


76e041740b70be9b36dd4fda52f97410


 


Everything works well until the security architect asks you how you control who can call this service…So, let us pretend that our client VM is the only one that can call the service. We could simply specify a rule at the level of the ingress controller itself. We know the IP of our VM so let’s give it a try:


 


apiVersion: networking.k8s.io/v1


kind: Ingress


metadata:


  name: weather 


  annotations:


    nginx.ingress.kubernetes.io/whitelist-source-range: 192.168.1.4


spec:


  ingressClassName: nginx 


  rules:


  – host: demoapi.eyskens.corp


    http:


      paths:


      – path: /weatherforecast


        pathType: Prefix


        backend:


          service:


            name: simpleapi


            port:


              number: 8080


 


By adding a whitelist-source-range annotation. Surprise, we now we get a Forbidden response from our Client VM:


 


vkittwe_7-1671642118285.png


 


Looking at the NGINX logs reveals the problem:


 


…*24932 access forbidden by rule, client: 10.244.2.1, …


 


The client is 10.244.2.1 (crb0 of the ingress node) instead of our VM (192.168.1.4). Why is that? That is because of caveat number 2 described earlier. The Client IP is always hidden by the SNAT operation because the service is in Cluster mode. Well, is the client always 10.244.2.1? No, just run a few hundred queries from the same client VM, and you will notice things like this:


 


2022/12/20 07:03:45 [error] 98#98: *39317 access forbidden by rule, client: 10.244.2.1,


server: demoapi.eyskens.corp, request: “GET /weatherforecast HTTP/1.1”, host:


“demoapi.eyskens.corp”


 


2022/12/20 07:03:45 [error] 98#98: *39317 access forbidden by rule, client: 10.244.2.1,


server: demoapi.eyskens.corp, request: “GET /weatherforecast HTTP/1.1”, host:


“demoapi.eyskens.corp”


 


2022/12/20 07:03:48 [error] 98#98: *39361 access forbidden by rule, client: 192.168.0.36,


server: demoapi.eyskens.corp, request: “GET /weatherforecast HTTP/1.1”, host:


“demoapi.eyskens.corp”


 


We can see our CBR0 back, but also 192.168.0.36, which is the IP of our user node. Because you want to understand how things work internally, you decide to analyze traffic from the ingress node itself. You basically exec into the privileged container that is running on that node:


 


kubectl exec ingresspriv-5cfc6468c8-f8bjf -it — /bin/sh


 


Then, you go onto the node itself, and run a tcpdump:


 


chroot /host


tcpdump -i cbr0|grep weatherforecast


 


Where you can indeed see that traffic is coming from everywhere:


 


07:15:29.484576 IP aks-user-28786594-vmss000001.internal.cloudapp.net.35448 >


10.244.2.2.http: Flags [P.], seq 14689:14845, ack 26227, win 16411, length 156: HTTP:


GET /weatherforecast HTTP/1.1


07:15:29.499647 IP aks-user-28786594-vmss000001.internal.cloudapp.net.35448 >


10.244.2.2.http: Flags [P.], seq 14845:15001, ack 26506, win 16416, length 156: HTTP:


GET /weatherforecast HTTP/1.1


07:15:29.538515 IP aks-user-28786594-vmss000001.internal.cloudapp.net.35448 >


10.244.2.2.http: Flags [P.], seq 15001:15157, ack 26785, win 16414, length 156: HTTP:


GET /weatherforecast HTTP/1.1


07:15:29.552183 IP aks-user-28786594-vmss000001.internal.cloudapp.net.35448 >


10.244.2.2.http: Flags [P.], seq 15157:15313, ack 27064, win 16413, length 156: HTTP:


 GET /weatherforecast HTTP/1.1


07:15:29.552630 IP aks-nodepool1-41831646-vmss000001.internal.cloudapp.net.24631 >


10.244.2.2.http: Flags [P.], seq 1:181, ack 1, win 16416, length 180: HTTP: GET


/weatherforecast HTTP/1.1


07:15:29.604062 IP aks-user-28786594-vmss000001.internal.cloudapp.net.35448 >


10.244.2.2.http: Flags [P.], seq 15313:15469, ack 27343, win 16412, length 156: HTTP:


GET /weatherforecast HTTP/1.1


07:15:29.620439 IP aks-user-28786594-vmss000001.internal.cloudapp.net.35448 >


10.244.2.2.http: Flags [P.], seq 15469:15625, ack 27622, win 16411, length 156: HTTP:


GET /weatherforecast HTTP/1.1


07:15:29.637675 IP aks-nodepool1-41831646-vmss000001.internal.cloudapp.net.24631 >


10.244.2.2.http: Flags [P.], seq 181:337, ack 280, win 16414, length 156: HTTP: GET


/weatherforecast HTTP/1.1


07:15:29.666067 IP aks-nodepool1-41831646-vmss000001.internal.cloudapp.net.24631


 


This is plain normal: every single node of the cluster could be a potential client. So, no way to filter out traffic from the perspective of the ingress controller, with the Cluster mode.


 


Never mind, let us use the NSG instead of the ingress controller to restrict traffic to our VM only. That’s even better, let’s stop the illegal traffic sooner and not even hit our ingress controller. Naturally, you tell the security architect that you have a plan and come with the following NSG rules for the ingress NSG:


 


vkittwe_8-1671642118290.png


 


You basically allow the client VM to talk to the LB, and some system ports, but you deny everything else. You tell the security guy to relax about the “Any” destination since none of the internal cluster IPs is routable or visible to the outside world anyway (Kubenet), so only what is exposed through an ingress rule will be “visible”. For the two other subnets, you decide to be even more restrictive. You basically use the same rules as shown above, except for rule 100. You’re pretty sure it should be ok, and that no one else on earth but your client VM will ever be able to consume that API. Because you are a thoughtful person, you still decided to enable the NSG flow logs to make sure you capture traffic…in case you need to troubleshooting things later on.


 


So, you retry and you realize that a few queries go through while others are blocked, until every call from the client VM is going through again….What could be the culprit? First reaction is to look at the Azure Load Balancer itself and you’ll notice that it is not in a great shape anymore:


 


vkittwe_9-1671642118293.png


 


Why is that? It is because of the NSG that you defined at the level of the Ingress subnet, which broke it entirely, although you had authorized the AzureLoadBalancer tag. So, you decide to remove the NSG, wait a few minutes, and inspect again what comes into the CBR0 of the Ingress node, to figure it out from the inside:


 


tcpdump -i cbr0


 


This is what comes out of the dump:


 


08:36:46.835356 IP aks-user-28786594-vmss000001.internal.cloudapp.net.54259 >


10.244.2.2.http: Flags [P.], seq 199:395, ack 125, win 1024, length 196: HTTP: GET


/healthz HTTP/1.1


08:36:48.382720 IP aks-nodepool1-41831646-vmss000001.internal.cloudapp.net.8461 >


 10.244.2.2.http: Flags [P.], seq 197:391, ack 125, win 16384, length 194: HTTP: GET


/healthz HTTP/1.1


 


You see that all nodes are directly probing the HEALTH endpoint on the ingress node…so the NSG rules described earlier are too restrictive. You start realizing that ruling AKS traffic with NSGs is a risky business…So, the non-ingress nodes are considered unhealthy by the Azure LB because they are unable to probe the ingress node. The ingress node is still considered healthy because it can probe itself and AzureLoadBalancer tag is allowed. That’s what explains the fact that ultimately, everything is going through because only the ingress node itself receives traffic, right from the LB and is already allowing the client VM. If you leave the cluster as is, you basically broke the built-in LB algorithm of K8s itself that protects against imbalanced traffic, plus you’ll keep getting calls from the other nodes trying to probe the ingress one. But because you want to optimize LB and avoid further issues, you add this rule to your ingress NSG:


 


vkittwe_10-1671642118300.png


 


Where you basically allow every node to probe on port 80 (remember we only focus on HTTP). You can safely put Any as destination since the destination is the IP of the Ingress Pod, which you can’t guess in advance. After a few minutes, the Azure LB seems to be in a better shape:


vkittwe_11-1671642118302.png


 


And yet a few minutes later, it’s all green again.


 


Now that you have restored the ingress, you’re back to square 1: some requests go through but not all. First, you checked the NSG flow logs of the User node pool and realized that you have such flow tuples that are blocked by the NSG:


 


[“1671521722,192.168.1.4,192.168.0.22,50326,80,T,I,D,B,,,,”]


 


Where you see that our client VM is trying to hit port 80 of our Azure Load Balancer…Wait, we said that the Cluster mode hides the original client IP, isn’t??  Yes that’s right but only from the inside, not from the outside. The NSGs still see the original IP….Ok, but why do I find these logs in the NSG of the user node?? Well, because each node of the cluster can potentially forward traffic to the destination…. So, it boils down to the following conclusion when using the Cluster mode:


 



  • All nodes must be granted inbound access to the ingress node(s) for the health probes.

  • Because each node is potentially forwarding the traffic itself, you must allow the callers (our VM in this case, an Application Gateway, an APIM in reality) in each subnet.

  • You must accept de-facto cluster-wide node-level lateral movements to these endpoints/ports. 


But what if the security architect cannot live with that? Let us explore the second option.


 


LB  with externalTrafficPolicy set to Local.


The only thing needed to do is to switch the service’s externalTrafficPolicy to Local:


 


Type:                     LoadBalancer


IP Family Policy:         SingleStack


IP Families:              IPv4


IP:                       10.0.58.7


IPs:                      10.0.58.7


IP:                       192.168.0.22


LoadBalancer Ingress:     192.168.0.22


Port:                     http  80/TCP


TargetPort:               http/TCP


NodePort:                 http  31880/TCP


Endpoints:                10.244.2.2:80


Session Affinity:         None


External Traffic Policy:  Local


 


After a minute, you’ll notice this at the level of the Azure Load Balancer:


 


vkittwe_12-1671642118304.png


 


Don’t worry, this is the expected behavior. When you switch the K8s service to Local, K8s stops adding its own bits to the load balancing algorithm. The only way to prevent the Azure Load Balancer from sending traffic to a non-ingress-enabled node, is to mark these nodes as unhealthy. Any probe to the healthz endpoint will return a 503 answer from non-ingress-enabled nodes (example with the user node):


 


kubectl exec userpriv-5d879699c8-b2cww -it — /bin/sh


chroot /host


root@aks-user-28786594-vmss000001:/# curl -v http://localhost:31049/healthz


*   Trying 127.0.0.1…


* TCP_NODELAY set


* Connected to localhost (127.0.0.1) port 31049 (#0)


> GET /healthz HTTP/1.1


> Host: localhost:31049


> User-Agent: curl/7.58.0


> Accept: */*



< HTTP/1.1 503 Service Unavailable


< Content-Type: application/json


< X-Content-Type-Options: nosniff


< Date: Tue, 20 Dec 2022 11:57:34 GMT


< Content-Length: 105



{


        “service”: {


                “namespace”: “ingress”,


                “name”: “ingress-nginx-controller”


        },


        “localEndpoints”: 0


* Connection #0 to host localhost left intact


}root@aks-user-28786594-vmss000001:/#


 


While it will return a 200 on every ingress-enabled node. But what is it exactly an ingress-enabled node? It’s simply a node that is hosting an ingress controller pod. So, with this, you can control much better the ingress entry point, plus you can also directly use a whitelist-source-range annotation, and this time, the ingress controller will see the real client IP since there is no more SNAT happening. 


 


You still must pay attention to where your ingress controller pods are running. Remember that I have dedicated one node pool in its own subnet for ingress and I have forced the pod scheduling on that node pool through a nodeSelector. Should I omit this, and scale my replicas to 3, I could end up with the following situation where AKS would schedule pods across nodes of different node pools:


 


vkittwe_13-1671642118314.png


 


This would lead to a situation where nodes hosting an instance of the ingress controller would show up as healthy again at the level of the Azure Load Balancer. Therefore, you’d be hitting NSGs of subnets hosting those nodes…So, if you want to concentrate the ingress traffic to a single entry point, you need to:


 



  • Dedicate a node pool with 1 or more nodes (more for HA) and enable Availability Zones for your ingress pods to run on.

  • Make sure to force the pod scheduling of the ingress controller onto that node pool.

  • Make sure you white list your clients (APIM, Application Gateway, etc.) at the level of the NSG of the ingress subnet. You do not need to white list other nodes because in Local mode, nodes do not probe the ingress node anymore. If you run a tcdump -i cbr0 on the ingress node, you’ll see only traffic from the node’s pod cidr itself.


Conclusion


 


Whether you are using Kubenet or CNI, ruling AKS ingress with Network Security Groups is not something you can improvise. Using the native load balancing algorithm (Cluster), you should end up with an optimized load balancing, but you must accept the fact that you have to live with lateral movements across the entire cluster. You can of course try to compensate with additional network policies but OS-level traffic would still be allowed (mind the container escape risk). Using the Local mode, you can control much better the incoming traffic, at the level of the NSGs as well as at the level of the ingress controller itself. You can run any number of ingress controller pods, as long as you scheduled them on identified nodes. The downside is that you might end up with an imbalanced traffic. This is a trade-off!