This article is contributed. See the original author and article here.
A customer asked me the following question about a system they are building.
Query
“We are building a system and using paired regions to make it redundant. A database instance (primary or replica) is located in each region, and these instances belong to a failover group. When we tested database failover, the applications in our environment could not access the primary database instance. We believe the network should be reachable, since global peering is configured between the virtual networks in each region. What is the root cause of this issue? How do we fix it?”
Background
As the background was not clear, I asked the customer to share details about their system and the issue they were facing.
- Their system is deployed to paired regions for redundancy.
- Traffic Manager sits in front of their system and uses the priority traffic-routing method. If a failure occurs in the active region, Traffic Manager reroutes incoming traffic to the other region.
- Global peering is configured between the virtual networks in both regions.
- They use App Service to host their applications. Their App Service instances are integrated with virtual networks, and service endpoints for SQL Database are configured on the subnets where these App Service instances are integrated. Service endpoints for App Service are also configured so that the App Service instances can communicate with each other.
- They use SQL Database in this system, and the instances in both regions belong to an auto-failover group. As the read-write and read-only listeners are geo-independent, they don’t have to modify the database connection string used in the applications when a database failover occurs.
- As of now, they don’t mind whether the primary database region is the same as the region to which Traffic Manager routes incoming traffic. In other words, they consider cross-region connections acceptable.
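To illustrate why the connection string does not change on failover, here is a minimal sketch that builds connection strings against the failover group listeners. It assumes the standard listener host names (`<failover-group-name>.database.windows.net` for read-write and `<failover-group-name>.secondary.database.windows.net` for read-only). The failover group name `contoso-fog`, the database name `appdb`, and the helper names are hypothetical, and credentials are intentionally omitted.

```python
def listener_host(failover_group_name: str, read_only: bool = False) -> str:
    """Return the failover group listener host name (not a regional server name)."""
    if read_only:
        return f"{failover_group_name}.secondary.database.windows.net"
    return f"{failover_group_name}.database.windows.net"


def build_connection_string(failover_group_name: str, database: str,
                            read_only: bool = False) -> str:
    """Build an ODBC-style connection string that targets the listener.

    Authentication settings are left out on purpose; add them as your
    environment requires.
    """
    parts = [
        f"Server=tcp:{listener_host(failover_group_name, read_only)},1433",
        f"Database={database}",
        "Encrypt=yes",
    ]
    if read_only:
        # Ask for a read-only session when using the read-only listener.
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts)


# Hypothetical names, for illustration only.
print(build_connection_string("contoso-fog", "appdb"))
print(build_connection_string("contoso-fog", "appdb", read_only=True))
```

Because the applications address the listener rather than a server in a specific region, the same string keeps working whichever region currently hosts the primary.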
The following diagram reflects their comments and what we learned from our interview with them.
Root cause
If you are familiar with Azure, you can identify the root cause of this issue easily: it is a service endpoint limitation. For Azure SQL, a service endpoint applies only to Azure service traffic within a virtual network’s region.
The following case, in which the application and the primary database are in the same region, works fine.
However, the following case, in which the application connects to a primary database in the other region, does not work even if global peering is configured.
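The rule behind these two cases can be written down as a small toy model. This is only an illustration of the limitation described above, not an Azure API. The region names and the function name are hypothetical, and the Private Link branch anticipates the first solution discussed below.

```python
def can_reach_primary(app_region: str, primary_db_region: str, method: str) -> bool:
    """Toy model of whether an app can reach the primary SQL Database.

    method is either "service_endpoint" or "private_link".
    """
    if method == "service_endpoint":
        # A service endpoint for Azure SQL only covers traffic within the
        # virtual network's own region; global peering does not change this.
        return app_region == primary_db_region
    if method == "private_link":
        # A private endpoint is reachable over global peering,
        # so a cross-region connection works.
        return True
    raise ValueError(f"unknown method: {method}")


# Before failover: app and primary database are both in region-a.
print(can_reach_primary("region-a", "region-a", "service_endpoint"))  # True
# After database failover: the primary moved to region-b, the app did not.
print(can_reach_primary("region-a", "region-b", "service_endpoint"))  # False
```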
Solutions
In this case, we can choose between the two options listed below.
- Using private link
- Modifying traffic routing rule
1. Using private link
If cross-region connections are still acceptable, they can fix this issue by using Private Link instead of service endpoints.
Azure Private Link for Azure SQL Database and Azure Synapse Analytics
https://docs.microsoft.com/azure/azure-sql/database/private-endpoint-overview
When using private link, the diagram looks like this.
When using private link, the following limitations should be considered.
Cost
- Private link is not free of charge.
https://azure.microsoft.com/pricing/details/private-link/
- Costs for network traffic across regions are required.
Performance
- When connecting to SQL Database, all connections are proxied via the Azure SQL Database gateways. This leads to lower throughput than a direct connection.
https://docs.microsoft.com/azure/azure-sql/database/connectivity-architecture#connection-policy
- Network latency is higher for cross-region connections.
2. Modifying traffic routing rule
In some cases, private link does not meet the requirements. If so, we should configure Traffic Manager so that the region to which it routes incoming traffic matches the primary database region. The diagram looks like this.
To achieve this, the following configuration is required.
- First, set the priority of the active region to a small value (e.g. 50) and the priority of the other region to a much larger value (e.g. 1000). With this configuration, incoming traffic is routed to the active region as long as it is healthy. For more details, see the following document.
Priority traffic-routing method
https://docs.microsoft.com/azure/traffic-manager/traffic-manager-routing-methods#priority-traffic-routing-method
- Then, create a health check API that verifies whether the applications can access the database. If access is healthy, the API returns HTTP 200; otherwise, it returns 503.
- Following the document below, configure Traffic Manager to use this API for endpoint monitoring. If the health check API returns 503, Traffic Manager changes the routing destination.
Configure endpoint monitoring
https://docs.microsoft.com/azure/traffic-manager/traffic-manager-monitoring#configure-endpoint-monitoring
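A health check API of this kind could look like the following minimal sketch, written with only the Python standard library. It is an illustration under assumptions, not the customer's implementation. The `check_database` callable is a hypothetical placeholder for a real connectivity test (for example, running `SELECT 1` through the failover group listener), and the function names are made up for this example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


def make_health_handler(check_database: Callable[[], bool]) -> type:
    """Build a request handler that reports database reachability."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            try:
                healthy = check_database()
            except Exception:
                # Treat any error raised by the check as "unhealthy".
                healthy = False
            status = 200 if healthy else 503
            body = b"OK" if healthy else b"Database unreachable"
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args) -> None:
            pass  # Keep probe requests out of the console output.

    return HealthHandler


def create_health_server(port: int, check_database: Callable[[], bool],
                         host: str = "127.0.0.1") -> HTTPServer:
    """Create (but do not start) an HTTP server exposing the health check."""
    return HTTPServer((host, port), make_health_handler(check_database))
```

To try it locally, call `create_health_server(8080, my_check).serve_forever()` with your own check function, then point the Traffic Manager monitoring path at the route your application actually exposes.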
This concept has some limitations listed below.
- Needless to say, the health check API must be created.
- It takes some time to change the routing region. Specifically, the delay depends on the tolerated number of failures for the health check probe (from 0 to 9) and the probing interval (30 seconds by default; a 10-second interval is also available at additional cost). For more details, see the following document.
Configure endpoint monitoring
https://docs.microsoft.com/azure/traffic-manager/traffic-manager-monitoring#configure-endpoint-monitoring
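To get a feel for how long rerouting might take, the settings above can be combined into a back-of-the-envelope estimate. This is a rough approximation under assumptions, not an official formula: it assumes an endpoint is marked unhealthy after (tolerated failures + 1) consecutive failed probes, adds one probe timeout, and then adds the DNS TTL that clients may cache. The timeout and TTL values used here are example values, and the function name is hypothetical.

```python
def estimate_failover_seconds(probe_interval_s: int = 30,
                              tolerated_failures: int = 3,
                              probe_timeout_s: int = 10,
                              dns_ttl_s: int = 60) -> int:
    """Rough estimate of the time until clients are routed to the other region."""
    if probe_interval_s not in (10, 30):
        raise ValueError("probing interval is either 10 or 30 seconds")
    if not 0 <= tolerated_failures <= 9:
        raise ValueError("tolerated number of failures ranges from 0 to 9")
    # Assumption: the endpoint is degraded once this many probes fail in a row.
    failed_probes_needed = tolerated_failures + 1
    detection_s = failed_probes_needed * probe_interval_s + probe_timeout_s
    # Clients may keep using a cached DNS answer until the TTL expires.
    return detection_s + dns_ttl_s


print(estimate_failover_seconds())               # 30 s interval, 3 tolerated failures
print(estimate_failover_seconds(10, 0, 5, 60))   # 10 s interval, 0 tolerated failures
```

With these example values, the first call gives 190 seconds and the second gives 75 seconds, which shows how much the faster probing interval and a lower failure tolerance can shorten the switch.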
Conclusion
In this case, I suggested both options and asked the customer to decide between them. Last but not least, although Traffic Manager is used in this case, the same solution is also applicable when using Azure Front Door.