Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Description
I saw this issue while running the benchmarks for TODO-5097. With SSL enabled, the throughput of the t_connect benchmark dropped from ~2500 connects per second (single-threaded) to at most ~47 connects per second. The exact workflow of this benchmark is:
- connect to the server
- run SELECT 1 FROM dual
- disconnect
It is implemented for sysbench 1.x in Lua using sysbench.opt.reconnect=1, which instructs sysbench to disconnect and reconnect after each iteration of its main loop.
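For readers without a sysbench setup, the same loop can be sketched in Python. This is only an illustration of the workflow above, not the actual t_connect script: `measure_connect_rate` and the `connect` factory are hypothetical names, and `connect` is assumed to return a DB-API style connection from whichever client driver is in use.

```python
import time

def measure_connect_rate(connect, iterations=100):
    """Run connect / SELECT 1 FROM dual / disconnect in a loop.

    `connect` is a caller-supplied factory (an assumption of this sketch)
    that opens a fresh DB-API style connection on every call.
    Returns the observed connects per second.
    """
    start = time.perf_counter()
    for _ in range(iterations):
        conn = connect()                       # connect to the server
        try:
            cur = conn.cursor()
            cur.execute("SELECT 1 FROM dual")  # run the trivial query
            cur.fetchall()
            cur.close()
        finally:
            conn.close()                       # disconnect
    elapsed = time.perf_counter() - start
    return iterations / elapsed
```

Running one such loop per client thread, once with and once without SSL on the client connection, should be enough to compare the two cases.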
What is even worse: with SSL enabled, connect performance barely scales with the number of benchmark threads. In the benchmark done for TODO-5097 it grows from ~2500 (single-threaded) to ~17000 (64 threads) without SSL. With SSL it is 28 (single-threaded) and tops out at 47 with 2 or more threads. That implies that the bottleneck is in the single acceptor thread, before the connection is handed over to a dedicated worker thread. Most likely it is the SSL handshake that takes so long.
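The numbers fit a very simple model, sketched here in Python. It assumes (this is a simplification, not a measurement) that every connection needs a fixed amount of exclusive time in one acceptor thread, so throughput is capped at the inverse of that time no matter how many clients there are. The two constants are derived from the figures reported above; `modeled_rate` is a hypothetical helper.

```python
def modeled_rate(threads, client_latency_s, serial_s):
    """Connects per second under a single serialized stage.

    threads / client_latency_s is what the clients could generate if
    nothing were serialized; 1 / serial_s is the cap imposed by a stage
    that handles one connection at a time.
    """
    return min(threads / client_latency_s, 1.0 / serial_s)

SERIAL_S = 1.0 / 47    # about 21 ms per connection, from the 47/s ceiling
LATENCY_S = 1.0 / 28   # about 36 ms per connect cycle, from 28/s on one thread

for threads in (1, 2, 64):
    print(threads, round(modeled_rate(threads, LATENCY_S, SERIAL_S)))
```

Under these assumptions the model gives 28 connects per second for one thread and 47 for two or more, which is the plateau seen in the benchmark. It also suggests that roughly 21 ms per connection is spent in the serialized part.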
The issue will affect all users switching to 11.4 or later, because those releases have SSL enabled by default. However, I could reproduce the same behavior on 10.6 with a homemade server cert and a self-signed CA cert.
We should consider changing how we handle new connections: either have a pool of threads handling the SSL handshake, or maybe even do the handshake in the worker thread.
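To illustrate the second option, here is a minimal Python sketch of an accept loop that does nothing but accept, with the handshake running on a pool thread together with the rest of the connection handling. It says nothing about how the server code is actually structured; `serve`, `handshake` and `handle` are hypothetical names, and the listener only needs an `accept()` method.

```python
def serve(listener, handshake, handle, pool, max_conns):
    """Accept `max_conns` connections without handshaking in this thread.

    listener  : object with accept() -> (socket, address)
    handshake : callable(raw_socket) -> connection; the slow step, e.g.
                lambda s: ssl_ctx.wrap_socket(s, server_side=True)
    handle    : callable(connection) that serves the client
    pool      : executor with submit(), e.g. a ThreadPoolExecutor
    Returns the futures of the submitted per-connection jobs.
    """
    def work(raw):
        conn = raw
        try:
            conn = handshake(raw)   # runs on a pool thread, not the acceptor
            handle(conn)
        finally:
            conn.close()

    futures = []
    for _ in range(max_conns):
        raw, _addr = listener.accept()          # cheap: no handshake here
        futures.append(pool.submit(work, raw))
    return futures
```

With this shape a slow handshake delays only its own connection, so several handshakes can overlap instead of queueing behind one thread.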
Users have the following workarounds available:
- disable SSL completely (--skip-ssl or equivalent on the client side)
- use a connection pool or persistent connections
- in general, avoid reconnecting rapidly
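The connection pool workaround can be as small as the following Python sketch. It is an illustration under assumptions, not a recommendation for a specific library: `ConnectionPool` is a hypothetical class and `connect` is whatever factory the application's driver provides.

```python
import queue
from contextlib import contextmanager

class ConnectionPool:
    """Open `size` connections once and hand them out for reuse."""

    def __init__(self, connect, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())   # the connect cost is paid only here

    @contextmanager
    def connection(self):
        conn = self._idle.get()         # blocks until a connection is free
        try:
            yield conn
        finally:
            self._idle.put(conn)        # return it instead of disconnecting

    def close(self):
        """Close all idle connections."""
        while not self._idle.empty():
            self._idle.get_nowait().close()
```

A caller would wrap each unit of work in `with pool.connection() as conn:`, so the server sees `size` connects in total rather than one per request.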