Unable to access SQL Database connected via Service Endpoint after failover occurs

This article is contributed. See the original author and article here.

A customer asked me the following question about a system they are building.

Query

“We are building a system and using paired regions to make it redundant. A database instance (primary or replica) is located in each region, and these instances belong to a failover group. When we tested database failover, applications in our environment could not access the primary database instance. We don’t think the network is unreachable, since global peering is configured between the virtual networks in each region. What is the root cause of this issue, and how do we fix it?”

Background

As the background was not clear, I asked the customer to share details about their system and the issue they were facing.

Their system is deployed to paired regions for redundancy.

Traffic Manager sits in front of their system to load balance incoming traffic, using the priority traffic-routing method. If a failure occurs in the active region, Traffic Manager routes incoming traffic to the other region.

Global peering between virtual networks in both regions is configured.

They use App Service to host their applications. Their App Service instances are integrated with virtual networks, and service endpoints for SQL Database are configured on the subnets where these App Service instances are integrated. Service endpoints for App Service are also configured so that the App Service instances can communicate with each other.

They use SQL Database in this system, and the instances in both regions belong to an auto-failover group. Because the read-write and read-only listeners are geo-independent, they do not have to modify the database connection string used in their applications when a database failover occurs.

For now, they do not require the primary database region to be the same as the region to which Traffic Manager routes incoming traffic. In other words, they consider cross-region connections acceptable.
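To illustrate why the connection string can stay fixed across a failover, here is a minimal sketch of how the listener host names are derived from the failover group name rather than from an individual server name. The group name `contoso-fog`, the database name `orders`, and the helper functions are hypothetical, and credentials are deliberately left out.

```python
def failover_group_listeners(group_name: str) -> dict[str, str]:
    """Return the listener host names for a failover group.

    group_name is the failover group name, not an individual server name,
    so the host names do not change when the primary role moves.
    """
    return {
        "read_write": f"{group_name}.database.windows.net",
        "read_only": f"{group_name}.secondary.database.windows.net",
    }


def connection_string(group_name: str, database: str, read_only: bool = False) -> str:
    """Build an ODBC-style connection string (credentials omitted on purpose)."""
    listeners = failover_group_listeners(group_name)
    host = listeners["read_only" if read_only else "read_write"]
    parts = [
        "Driver={ODBC Driver 18 for SQL Server}",
        f"Server=tcp:{host},1433",
        f"Database={database}",
        "Encrypt=yes",
    ]
    if read_only:
        # Read-only workloads should also declare their intent.
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts)
```

Calling `connection_string("contoso-fog", "orders")` gives a string that targets the read-write listener both before and after a failover. Only the region that the listener resolves to changes, which is exactly what matters for the service endpoint.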

The following diagram reflects their comments and the results of our interview.

Root cause

If you are familiar with Azure, you can identify the root cause easily: it is a limitation of service endpoints. For Azure SQL, a service endpoint applies only to Azure service traffic within the virtual network’s region.

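The following case, in which the application and the primary database are in the same region, works fine.

However, the following case, in which the application connects to a primary database in the other region, does not work even if global peering is configured.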
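The two cases can be summarized with a toy model. This is only a sketch of the rule stated above, not a description of how Azure evaluates the traffic, and the region names and class are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailoverGroup:
    primary_region: str
    secondary_region: str

    def fail_over(self) -> None:
        """Swap the primary and secondary roles."""
        self.primary_region, self.secondary_region = (
            self.secondary_region,
            self.primary_region,
        )


def reachable_via_service_endpoint(app_region: str, group: FailoverGroup) -> bool:
    """Toy rule: a subnet's service endpoint for Azure SQL only covers
    traffic to a primary database in the virtual network's own region."""
    return app_region == group.primary_region


group = FailoverGroup(primary_region="japaneast", secondary_region="japanwest")
before = reachable_via_service_endpoint("japaneast", group)  # same region
group.fail_over()
after = reachable_via_service_endpoint("japaneast", group)   # cross region
```

In this model `before` is true and `after` is false: the application has not moved, but the primary database has, so the same subnet can no longer reach it through the service endpoint.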

Solutions

In this case, we can choose between the two options listed below.

Using private link

Modifying traffic routing rule

1. Using private link

If cross-region connections are still acceptable, they can fix this issue by using Private Link instead of service endpoints.

Azure Private Link for Azure SQL Database and Azure Synapse Analytics

https://docs.microsoft.com/azure/azure-sql/database/private-endpoint-overview

When using private link, the diagram looks like this.

When using private link, the following limitations should be considered.

Cost

Private link is not free of charge.
https://azure.microsoft.com/pricing/details/private-link/

Network traffic across regions incurs additional cost.

Performance

When connecting to SQL Database through a private endpoint, all connections are proxied via the Azure SQL Database gateways. That leads to lower throughput than a direct connection.
https://docs.microsoft.com/azure/azure-sql/database/connectivity-architecture#connection-policy

Network latency increases for cross-region connections.
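After switching to Private Link, one quick sanity check is to confirm, from inside the virtual network, that the database host name resolves to a private IP address. The following is a minimal sketch using only the Python standard library. It assumes that private DNS is already set up for the private endpoint, and the helper names are hypothetical.

```python
import ipaddress
import socket


def is_private_address(address: str) -> bool:
    """Return True if the IP address is in a private range such as 10.0.0.0/8."""
    return ipaddress.ip_address(address).is_private


def resolves_to_private_ip(host: str) -> bool:
    """Resolve host with the local DNS settings and check the first IPv4 result.

    Run this from a machine inside the virtual network; from elsewhere the
    name normally resolves to a public gateway address.
    """
    return is_private_address(socket.gethostbyname(host))
```

If `resolves_to_private_ip` returns false for the listener host name when called from the application's network, the traffic is probably still going to the public endpoint and the DNS configuration is worth reviewing.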

2. Modifying traffic routing rule

In some cases, Private Link does not meet the requirements. In this case, we should configure Traffic Manager so that the region to which it routes incoming traffic matches the primary database region. The diagram looks like this.

To achieve this, the following configuration is required.

First of all, set the priority of the active region to a small value (e.g. 50) and the priority of the other region to a much larger value (e.g. 1000). This configuration routes incoming traffic to the active region. For more details, see the following document.

Priority traffic-routing method
https://docs.microsoft.com/azure/traffic-manager/traffic-manager-routing-methods#priority-traffic-routing-method
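The effect of these priority values can be sketched as follows. This is a simplified model of priority routing, not Traffic Manager's implementation, and the endpoint names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    name: str
    priority: int  # a lower value means a more preferred endpoint
    healthy: bool


def pick_endpoint(endpoints: list[Endpoint]) -> Optional[Endpoint]:
    """Return the healthy endpoint with the lowest priority value, if any."""
    healthy = [e for e in endpoints if e.healthy]
    return min(healthy, key=lambda e: e.priority, default=None)


endpoints = [
    Endpoint("app-region-a", priority=50, healthy=True),
    Endpoint("app-region-b", priority=1000, healthy=True),
]
```

While both endpoints are healthy, `pick_endpoint(endpoints)` returns `app-region-a`. Once `app-region-a` is marked unhealthy, the same call returns `app-region-b`, which is the behavior this option relies on.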

Then, a health check API should be created. The API checks whether access from the applications to the databases is healthy. If it is healthy, the API returns HTTP 200; otherwise, it returns HTTP 503.

Following the document below, Traffic Manager is configured to use this API for endpoint monitoring. If the health check API returns 503, Traffic Manager changes the route.

Configure endpoint monitoring

https://docs.microsoft.com/azure/traffic-manager/traffic-manager-monitoring#configure-endpoint-monitoring
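A health check API of this kind could look like the following minimal sketch, written with the Python standard library only. The `/health` path and the `check_database` callable are assumptions made for illustration; a real application would implement the check with its own database driver, for example by running a cheap query against the read-write listener with a short timeout.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


def make_health_handler(check_database: Callable[[], bool]):
    """Create a request handler that reports database reachability.

    check_database is a hypothetical callable supplied by the application.
    It should return True when the database can be reached.
    """

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/health":
                self.send_error(404)
                return
            try:
                healthy = check_database()
            except Exception:
                # Any error while probing the database counts as unhealthy.
                healthy = False
            status = 200 if healthy else 503
            body = b"OK" if healthy else b"Database unreachable"
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args) -> None:
            pass  # keep frequent probe requests out of the error log

    return HealthHandler
```

To try it locally, pass the handler to a server, for example `HTTPServer(("0.0.0.0", 8080), make_health_handler(my_check)).serve_forever()`, where `my_check` is your own function. The monitoring path in the Traffic Manager profile would then point at `/health`.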

This concept has some limitations listed below.

Needless to say, the health check API has to be developed.

It takes some time to change the routing region. Precisely, the time depends on the number of failed probes that Traffic Manager tolerates before marking the endpoint unhealthy (from 0 to 9) and on the probing interval (30 seconds by default; a 10-second interval is also available at additional cost). For more details, see the following document.

Configure endpoint monitoring
https://docs.microsoft.com/azure/traffic-manager/traffic-manager-monitoring#configure-endpoint-monitoring
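To get a feel for the delay, the settings above can be combined into a rough worst-case estimate. This is a back-of-the-envelope sketch, not an official formula: it assumes the failure happens just after a successful probe, that every later probe runs until its timeout, and that clients keep using the old DNS answer for a full TTL. The probe timeout and DNS TTL parameters are not discussed in this article, and the default values below are assumptions.

```python
def estimate_failover_seconds(
    tolerated_failures: int,
    probing_interval: int = 30,
    probe_timeout: int = 10,
    dns_ttl: int = 60,
) -> int:
    """Rough upper bound, in seconds, until clients reach the other region."""
    if not 0 <= tolerated_failures <= 9:
        raise ValueError("tolerated_failures must be between 0 and 9")
    if probing_interval not in (10, 30):
        raise ValueError("probing_interval must be 10 or 30 seconds")
    # One more failed probe than the tolerated number marks the endpoint
    # unhealthy, and the last probe may wait for its full timeout.
    detection = (tolerated_failures + 1) * probing_interval + probe_timeout
    # Clients may keep the cached DNS answer until the TTL expires.
    return detection + dns_ttl
```

With 3 tolerated failures and the assumed defaults, the estimate is (3 + 1) × 30 + 10 + 60 = 190 seconds. Lowering the tolerated failures to 0 and using the 10-second interval with a 5-second timeout reduces the detection part to 15 seconds, at the cost of more sensitivity to transient errors.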

Conclusion

In this case, I suggested both options and asked the customer to decide. Last but not least, Traffic Manager is used in this case, but this solution is also applicable when using Azure Front Door.

