Details
Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Version: 10.0.0
Description
In 10.0-monty, we started seeing failures in main.non_blocking_api.
The failure is seen on BSD, but the root problem exists on all platforms.
The issue is that we get the flag MYSQL_WAIT_TIMEOUT back from
e.g. mysql_real_connect_cont(), but mysql_get_timeout_value() returns
(unsigned)-1. This is incorrect, and a change from existing behaviour.
The symptom in the test suite is that tests compute a timeout for poll(2) as
mysql_get_timeout_value()*1000, which ends up as -1000. That value is invalid
for poll(2) on BSD (and incorrect in any case).
If no timeout is desired, the MYSQL_WAIT_TIMEOUT flag should not be set.
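To make the arithmetic concrete, here is a minimal sketch of how (unsigned)-1 turns into -1000, plus one possible way a caller could guard against it. The names naive_poll_timeout, safe_poll_timeout and WAIT_TIMEOUT_FLAG are made up for illustration and are not part of the client library; the sketch assumes the timeout is reported in seconds, as the test suite does, and that converting the wrapped unsigned value to int behaves as on the usual two's-complement platforms.

```c
#include <assert.h>
#include <limits.h>

/* Stand-in for the MYSQL_WAIT_TIMEOUT bit (assumption: the real value comes
   from mysql.h and should be used instead in actual client code). */
#define WAIT_TIMEOUT_FLAG 8

/* The computation the tests perform: seconds * 1000 in unsigned arithmetic,
   then converted to the int that poll(2) takes. With (unsigned)-1 as input
   the product wraps around and the conversion yields -1000. */
static int naive_poll_timeout(unsigned int timeout_seconds)
{
    return (int)(timeout_seconds * 1000u);
}

/* Hypothetical guard: derive a poll(2) timeout from the wait flags and the
   reported timeout in seconds. */
static int safe_poll_timeout(int wait_flags, unsigned int timeout_seconds)
{
    int ms;

    if (!(wait_flags & WAIT_TIMEOUT_FLAG))
        return -1;                  /* no timeout requested: wait indefinitely */
    if (timeout_seconds > (unsigned int)INT_MAX / 1000u)
        return INT_MAX;             /* clamp instead of wrapping around */
    ms = (int)(timeout_seconds * 1000u);
    assert(ms >= 0);                /* a requested timeout is never negative */
    return ms;
}
```

Clamping to INT_MAX is only one possible choice; it hides the bogus value rather than fixing it, so the real fix still belongs in the library: either return a sane timeout or do not set the flag.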
As far as I can see, the problem is a wrong merge of new VIO stuff in
10.0/10.0-monty. It breaks the non-blocking client library code in 10.0-base
rather badly:
- The timeout values were changed from seconds to milliseconds, but the
non-blocking part was not updated to reflect this.
- vio_io_wait() does not seem to handle non-blocking operation at all, so
it will block any application that uses it.
- There are probably other problems hidden...
An easy way to repeat the problem is to run client/async_example against a
running server with strace:
$ strace -e trace=poll bld/client/async_example 127.0.0.1 root rootpass > /dev/null
poll([ ], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
poll([ ], 1, -1000) = 1 ([{fd=3, revents=POLLIN}])
Note the second poll() call passing -1000 as timeout. This is incorrect, and
is caused by the issue described above.
However note that this is not the only problem. All of the new VIO stuff needs
to be fixed for non-blocking operation.
It is particularly important that it is 110% ensured that the non-blocking
client code will never block - this would be a subtle problem that will not be
easily seen in the test suite, but will cause large applications that use
non-blocking mode to become slow or fail.
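As a rough illustration of the "never block" property, a readiness check can call poll(2) with a timeout of 0, which returns immediately whether or not the socket is ready. This is only a sketch with made-up helper names (ready_events_nowait, demo_pipe_readiness), not the actual VIO code; how the library should hand the pending wait back to the application is not shown here.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: report which of the requested events are ready on fd
   without ever blocking. Returns the ready events (0 if none), -1 on error. */
static int ready_events_nowait(int fd, short events)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = events;
    pfd.revents = 0;
    rc = poll(&pfd, 1, 0);          /* timeout 0: return immediately */
    if (rc < 0)
        return -1;
    return rc == 0 ? 0 : pfd.revents;
}

/* Self-check on a pipe: nothing to read before the write, readable after.
   Returns 1 if both observations hold, 0 otherwise. */
static int demo_pipe_readiness(void)
{
    int fds[2];
    int before, after;

    if (pipe(fds) != 0)
        return 0;
    before = ready_events_nowait(fds[0], POLLIN);
    if (write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return 0;
    }
    after = ready_events_nowait(fds[0], POLLIN);
    close(fds[0]);
    close(fds[1]);
    return before == 0 && (after & POLLIN) != 0;
}
```

In non-blocking mode, a result of 0 would mean "suspend and report the wait flags to the caller" rather than "wait here", which is the behaviour vio_io_wait() appears to be missing.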