Databases are a key component of internet infrastructure, and IT departments are running into unexpected problems with them, particularly when using DBaaS (Database-as-a-Service). One of these challenges is connection management. There are three areas where connection management can be a problem:
- CPU overhead when an application “thrashes” connections by rapidly opening, authenticating, and closing them;
- Memory overhead when applications hold open long-lived connections that are often idle. That memory would be better used as block cache, and holding it may require a larger instance size than CPU requirements dictate;
- Noisy-neighbor congestion in a multi-tenant database. Limiting the number of active connections on a per-customer basis ensures fairness.
The Heimdall Proxy provides better control over database resources, so users can reduce the size of their database instances and support higher user counts. In this blog, we explain how these functions work and how they are configured.
The Heimdall Proxy was designed for any SQL database, including Azure Database for Postgres and Azure SQL Data Warehouse (SQL DW). Its connection pooling features include:
- General connection reuse: New connections leverage already established connections to the database to avoid connection thrashing. This results in lower CPU utilization;
- Connection multiplexing: Detaches idle client connections from their database connections, freeing those database connections for reuse by other clients. This dramatically reduces connection counts to the database, and frees memory to allow the database to operate more effectively;
- Tenant Connection Management: The combination of 1) per-user and per-database connection limiting and 2) connection multiplexing controls the number of active queries a particular tenant can run at a time. This protects database resources and helps ensure the customer SLA (Service-level Agreement) is met and not disrupted by a busy neighbor using the same database.
Figure 1: Heimdall Proxy Architecture Diagram
Basic Connection Pooling
A basic connection pooler opens a number of connections upfront (the pool), then as an application needs a connection, instead of opening a new connection, it simply borrows a connection from the pool and returns it as soon as it is not needed. For most pools to be effective:
- The application is aware pooling will be used, and does not leave connections idle, but instead opens and closes them as needed;
- All connections leverage the same properties, such as the database user and catalog (database instance);
- State is not changed on the connection.
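The borrow-and-return cycle described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not how the Heimdall Proxy is implemented: `FakeConnection`, `BasicPool`, `borrow`, and `give_back` are hypothetical names, and the fake connection stands in for a real database driver.

```python
import queue


class FakeConnection:
    """Hypothetical stand-in for a real database connection."""
    opened = 0  # how many connections were ever opened

    def __init__(self):
        FakeConnection.opened += 1

    def execute(self, sql):
        return f"ran: {sql}"


class BasicPool:
    """Opens all connections upfront; callers borrow and return them."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeConnection())

    def borrow(self, timeout=None):
        # Blocks until a pooled connection is free instead of opening a new one.
        return self._idle.get(timeout=timeout)

    def give_back(self, conn):
        self._idle.put(conn)


pool = BasicPool(size=2)
for _ in range(100):
    conn = pool.borrow()
    try:
        conn.execute("SELECT 1")
    finally:
        pool.give_back(conn)  # return as soon as it is not needed

print(FakeConnection.opened)  # still 2: no connection was opened per query
```

The application runs 100 queries, yet only the two connections created upfront are ever opened, which is where the CPU saving comes from.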
For a typical application server environment (e.g. J2EE), basic pooling is supported. In other environments, where pooling was not part of the initial design, simply inserting a connection pooler can cause more overhead than expected:
- When multiple users are connecting, and each user rarely uses more than a few connections (e.g. data analytics): The pooler may open a separate set of connections per user (e.g. PgBouncer), or it may close pooled connections that do not match the desired properties and open new ones (e.g. Apache Tomcat pooling and most other poolers). The latter results in a large amount of connection thrashing;
- When many catalogs are used: In order to avoid changing the connection state, a discrete pool per catalog is created, allowing an appropriate connection to be reused. For Postgres, the catalog cannot be changed once a connection has been established, so a new connection must be created, just as if it were for a different user;
- When attempting to constrain connections both in total to the database and on a per-user basis.
Figure 2: Basic Connection Pooling
For basic connection pooling, an active (green) front-side connection is paired with a back-side connection as shown in Figure 2 above. Additionally, you may have idle (red), unassigned connections in the backend for new connections. As such, you are NOT reducing the total number of connections, but are reducing the thrashing that occurs as the connections are opened and closed. The main benefit of basic pooling is lower CPU load.
To configure connection pooling on Heimdall Central Console, select the Data Source tab. Click on the checkbox to turn on Connection Pooling shown below:
Connection Multiplexing
Beyond basic pooling, there is connection multiplexing, which does not associate a client connection with a fixed database connection. Instead, each active query or transaction is assigned to an existing idle database connection, or a new connection is established if none is idle. While a client connection is idle, it uses no resources on the database, which reduces connection load and frees memory. As shown in Figure 3 below, only active connections (green) are connected to the database via the connection pool. Idle connections (red) are ignored.
Figure 3: Connection Multiplexing
Multiplexing is a much more complicated technology than basic pooling, and many factors need to be accounted for. In the following situations, multiplexing halts and the client connection remains pinned to a database connection:
- If a transaction starts, the connection mapping remains until the transaction completes with a commit or rollback, or the client connection is terminated;
- If a prepared statement is created on a connection, the connection becomes stateful and remains pinned to the database until the connection is terminated;
- If a temporary table is created, the connection remains pinned until the table is dropped.
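To make the pinning rules above concrete, here is a minimal Python sketch of a multiplexer that assigns a database connection per statement and pins it only while one of the three conditions holds. It is an assumption-laden toy, not the Heimdall Proxy's logic: `Backend`, `Multiplexer`, and the prefix-based SQL matching are all hypothetical, and a real proxy would parse SQL properly.

```python
class Backend:
    """Hypothetical database-side connection; ids show which one was used."""
    count = 0

    def __init__(self):
        Backend.count += 1
        self.id = Backend.count


class Multiplexer:
    def __init__(self):
        self.idle = []    # database connections not bound to any client
        self.pinned = {}  # client -> (backend, set of pin reasons)

    def execute(self, client, sql):
        stmt = sql.strip().upper()
        if client in self.pinned:
            backend, reasons = self.pinned[client]
        else:
            # Reuse an idle database connection, or open one if none is idle.
            backend = self.idle.pop() if self.idle else Backend()
            reasons = set()

        # Naive prefix matching, for illustration only.
        if stmt.startswith("BEGIN"):
            reasons.add("transaction")
        elif stmt.startswith(("COMMIT", "ROLLBACK")):
            reasons.discard("transaction")
        elif stmt.startswith("PREPARE"):
            reasons.add("prepared")      # stays until the client disconnects
        elif stmt.startswith("CREATE TEMP"):
            reasons.add("temp_table")
        elif stmt.startswith("DROP TABLE"):
            reasons.discard("temp_table")  # simplification: any DROP TABLE

        if reasons:
            self.pinned[client] = (backend, reasons)
        else:
            self.pinned.pop(client, None)
            self.idle.append(backend)    # free it for other clients
        return backend.id

    def disconnect(self, client):
        # A pinned connection may carry state, so this sketch simply drops it.
        self.pinned.pop(client, None)


mux = Multiplexer()
a1 = mux.execute("A", "SELECT 1")  # opens database connection 1
b1 = mux.execute("B", "SELECT 2")  # reuses connection 1
mux.execute("A", "BEGIN")          # A is now pinned to connection 1
b2 = mux.execute("B", "SELECT 3")  # no idle connection, so connection 2 opens
mux.execute("A", "COMMIT")         # A is unpinned again
print(a1, b1, b2, Backend.count)
```

Two clients share one database connection until client A opens a transaction; only then does a second connection become necessary.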
To configure multiplexing on the Heimdall Central Console, go to the VirtualDB tab. Under Proxy Configuration, click the Multiplex option shown below:
In the event that special queries break the multiplexing logic and multiplexing needs to be disabled on the connection, go to the Rules tab for more granular control (along with other pool behaviors). Below is an example. Click on the icon to edit an existing rule.
Tenant Connection Management
The third use-case helps ensure SLAs by enforcing per-tenant limits on connections and, when combined with multiplexing, on total active queries. This prevents one user from saturating the database, ensuring fairness of resources for others. To do this, a second tier of pool management is activated: “user pools”.
In the Data Sources tab, the pool can be managed at two tiers: the user level and the database level. Each user can be limited to a number of total connections and idle connections using the icon to add limits as shown here:
As shown above, the total connections allowed to the database across all users is 1000, but each user is only allowed 100, and of those, only 10 can be idle. Excess idle connections will be disposed of. Each time a connection is returned from the pool, there is a chance the connection will be closed: a value of 1000 means that there is a 1/1000 chance that the connection will be closed. This behavior differs from most connection poolers, which set an absolute connection age. In large deployments, an absolute age can result in a stampede of many connections closing and reopening at once.
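The probabilistic close can be illustrated with a short Python sketch. The function name `should_close` and the parameter `close_odds` are hypothetical, chosen only for this example, and the fixed random seed is there to make the run repeatable.

```python
import random


def should_close(close_odds, rng=random):
    """Return True with probability 1/close_odds (hypothetical helper)."""
    return rng.randrange(close_odds) == 0


rng = random.Random(42)  # fixed seed so the run is repeatable
checks = 100_000
closed = sum(should_close(1000, rng) for _ in range(checks))
print(closed)  # roughly checks / 1000
```

With a 1/1000 chance per check, a connection survives about 1000 checks on average, but individual connections are retired at scattered times rather than all at once, which is what avoids the stampede.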
Figure 4: Multi-tenancy with Pooling, Multiplexing and Per-tenant connection limits
Figure 4 shows two tenants (with unique usernames or catalogs), allowing only active connections (green) to the database when multiplexing is enabled. If Tenant A attempts to perform a third query (blue) while two are active, it will be queued until one of the current active queries completes.
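The queuing behavior in Figure 4 can be approximated with one counting semaphore per tenant. The sketch below is illustrative only: `TenantLimiter`, `try_start`, and `finish` are hypothetical names, the limit of two active queries per tenant is taken from the figure description, and the non-blocking check merely reports that a query would have to wait rather than actually queuing it.

```python
import threading


class TenantLimiter:
    """Caps the number of active queries per tenant (hypothetical sketch)."""

    def __init__(self, per_tenant_limit):
        self._limit = per_tenant_limit
        self._slots = {}
        self._lock = threading.Lock()

    def _slot(self, tenant):
        # Create each tenant's semaphore lazily, guarded against races.
        with self._lock:
            if tenant not in self._slots:
                self._slots[tenant] = threading.BoundedSemaphore(self._limit)
            return self._slots[tenant]

    def try_start(self, tenant):
        """True if the query may run now, False if it would be queued."""
        return self._slot(tenant).acquire(blocking=False)

    def finish(self, tenant):
        self._slot(tenant).release()


limiter = TenantLimiter(per_tenant_limit=2)
first = limiter.try_start("tenant_a")   # runs
second = limiter.try_start("tenant_a")  # runs
third = limiter.try_start("tenant_a")   # must wait: two are already active
other = limiter.try_start("tenant_b")   # tenant B is unaffected
limiter.finish("tenant_a")              # one of tenant A's queries completes
retry = limiter.try_start("tenant_a")   # the waiting query can now run
print(first, second, third, other, retry)
```

Because each tenant has its own semaphore, tenant A reaching its limit never consumes capacity reserved for tenant B.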
The net result of combining 1) pooling, 2) multiplexing, and 3) per-tenant limits is that no single tenant can saturate database capacity and cause the SLAs of other customers to fail. Further, beyond a certain point, adding execution threads to the database degrades performance, so this control can improve overall performance in many cases, allowing more capacity during peak load.
Odoo is an e-commerce platform written in Python. The platform does support connection pooling. However, it uses an ORM (Object-Relational Mapping) to abstract the application layer from the database connections. The ORM uses transactions for every activity, which would normally prevent multiplexing from providing any benefit. Our proxy has another advanced feature called “delayed transactions” that works around this problem: it prevents a true transaction from starting until DML is actually required. When activated with Odoo, the connection counts on the database can easily be reduced by up to 20x, depending on the load.
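One way to picture delayed transactions is a session wrapper that holds back the client's BEGIN until the first DML statement arrives. The sketch below is a guess at the general idea under stated assumptions, not Heimdall's implementation: `DelayedTransactionSession` is a hypothetical class, DML is detected by a naive prefix check, and `sent` simply records what would be forwarded to the database.

```python
DML_PREFIXES = ("INSERT", "UPDATE", "DELETE")


class DelayedTransactionSession:
    """Holds back BEGIN until a DML statement needs it (hypothetical)."""

    def __init__(self):
        self.begin_pending = False
        self.in_real_transaction = False
        self.sent = []  # statements actually forwarded to the database

    def execute(self, sql):
        stmt = sql.strip().upper()
        if stmt.startswith("BEGIN"):
            self.begin_pending = True  # remember it, but do not forward yet
            return
        if stmt.startswith(("COMMIT", "ROLLBACK")):
            if self.in_real_transaction:
                self.sent.append(sql)  # only end transactions that started
            self.begin_pending = False
            self.in_real_transaction = False
            return
        if self.begin_pending and stmt.startswith(DML_PREFIXES):
            self.sent.append("BEGIN")  # the transaction is now really needed
            self.begin_pending = False
            self.in_real_transaction = True
        self.sent.append(sql)


session = DelayedTransactionSession()
# A read-only "transaction": no BEGIN or COMMIT reaches the database.
for q in ("BEGIN", "SELECT * FROM product", "COMMIT"):
    session.execute(q)
# A transaction containing DML: BEGIN is sent just before the UPDATE.
for q in ("BEGIN", "SELECT name FROM product", "UPDATE product SET qty = 1", "COMMIT"):
    session.execute(q)
print(session.sent)
```

In this sketch, a read-only unit of work never pins a database connection, so it can still be multiplexed. The trade-off is that reads issued before the first DML statement run outside the transaction.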
Magento is an e-commerce package written in PHP. Since PHP does not support efficient connection pooling due to its processing model, each page-view opens a connection to the database, requests all the data it needs, then closes the connection. This results in a very high amount of connection thrashing against the database and can consume a large percentage of the database CPU. With the Heimdall proxy, basic connection pooling alone can reduce the load on the database by up to 15 percent.
Slatwall is an eCommerce platform written in Java, and is natively designed to use pooling. Under heavy load, however, it can saturate the allowed connections on MySQL (at most 7000). In order to support larger user-loads, the Heimdall proxy reduces the connection load by an order of magnitude, resolving connection limits on the database and allowing the CPU load to be the limiting factor in larger deployments. Per the developer of Slatwall, connection offload with multiplexing and pooling resulted in a 10x reduction in connections to the database.
While Heimdall proxy provides connection management for databases, there are other features provided that further improve SQL scale, performance, and security:
- Query Caching: The best way to lower database CPU is to never issue a query against the database in the first place. The Heimdall proxy provides the caching and invalidation logic for the storage of your choice (e.g. Redis) as a look-aside results cache. It is a simple way to improve database scale and improve response times without application changes;
- Automated Read/Write split: When a single database becomes too expensive to upgrade, or there is already a standby reader that is sitting idle, separating read and write queries can be used to offload read queries to read replicas, improving resource utilization. Additionally, replication lag detection is supported to ensure ACID compliance;
- Active Directory/LDAP integration: By authenticating against LDAP, the Heimdall proxy manages connections for a large number of users, and synchronizes the authentication credentials into the database. In environments that require database resources to be accessible to many users in the enterprise, while providing data security, this feature is easy to administer, while preventing individual users from over-taxing resources.
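As a rough illustration of the read/write split with lag detection listed above, the following Python sketch routes a query to a replica only when it is a read and the replica is fresh enough. Everything here is an assumption made for the example: the function `route`, the one-second `max_lag_seconds` threshold, and the prefix-based read detection are hypothetical and do not describe how the Heimdall proxy decides.

```python
def route(sql, replica_lag_seconds, max_lag_seconds=1.0):
    """Pick a target for a query (hypothetical routing rule)."""
    is_read = sql.lstrip().upper().startswith("SELECT")
    # Reads go to the replica only while it is not lagging too far behind.
    if is_read and replica_lag_seconds <= max_lag_seconds:
        return "replica"
    # Writes, and reads during excessive lag, go to the primary.
    return "primary"


print(route("SELECT * FROM orders", replica_lag_seconds=0.2))
print(route("UPDATE orders SET paid = true", replica_lag_seconds=0.2))
print(route("SELECT * FROM orders", replica_lag_seconds=5.0))
```

Falling back to the primary when the replica lags keeps a client from reading data that is older than what it just wrote.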
Deployment of our proxy requires no application changes, saving months of deployment and maintenance. Compare us with PgBouncer or Pgpool and see the difference. Download a free trial on the Azure Marketplace.
- Blog: Using the Heimdall Proxy to Split Reads and Writes for Postgres
- Blog: Automated Query Caching for Postgres
- Heimdall Data technical documentation
- Contact: email@example.com
Heimdall Data, a Microsoft technology partner, offers a database proxy that intelligently manages connections for SQL databases. With Heimdall’s connection pooling and multiplexing features, users can get optimal scale and performance from their database without application changes.