[CONC-322] mysql_real_connect - EAGAIN treated like EINPROGRESS Created: 2018-04-10 Updated: 2018-05-12 Resolved: 2018-04-23 |
|
| Status: | Closed |
| Project: | MariaDB Connector/C |
| Component/s: | None |
| Affects Version/s: | 3.0.4 |
| Fix Version/s: | 3.0.4 |
| Type: | Bug | Priority: | Major |
| Reporter: | Daniel Black | Assignee: | Georg Richter |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Ubuntu artful, ppc64le, p9 |
||
| Attachments: |
|
| Description |
|
Version from mariadb-server-10.3 branch built last week
Stracing sysbench linked against libmariadb-dev:
The EAGAIN was returned by the connect syscall. In https://github.com/MariaDB/mariadb-connector-c/blob/master/plugins/pvio/pvio_socket.c#L614 it is handled like EINPROGRESS, on the assumption that polling (pvio_socket_wait_io_or_timeout) will eventually report the connection as ready. Looking at the Linux source (net/unix/af_unix.c), EAGAIN is returned when the recv queue is full. In this state the socket isn't connected, and polling it won't make it connect. So options are:
|
| Comments |
| Comment by Daniel Black [ 2018-04-11 ] |
|
EAGAIN is the same as EWOULDBLOCK on Linux, and as you can see the connect(2) description isn't helpful. Tried the patch as attached. For the case where a connect returns EAGAIN:
So it seems either the epoll is uncorrelated with the success of a connect, or another process is grabbing the resources (I did have multiple sysbench processes attempting that socket). It took ~7000 connection attempts before one succeeded, so something more intelligent is needed. Unfortunately I didn't record timings in this strace. This looks similar: https://lists.mysql.com/internals/35077 If you can think of something that might be useful or worth testing, let me know. |
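One shape "something more intelligent" could take is a bounded retry with a growing pause, rather than thousands of immediate reconnects. The following is only a sketch under assumptions: connect_unix_with_backoff is a hypothetical helper, not a Connector/C function, and the 1 ms starting delay and 64 ms cap are arbitrary illustrative values, not measured ones.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <time.h>

/* Hypothetical helper for illustration; not part of Connector/C.
 * Retries connect() on a unix socket only while it fails with EAGAIN,
 * sleeping between attempts. Returns 0 on success, -1 on failure. */
int connect_unix_with_backoff(int fd, const char *path, int max_attempts)
{
    struct sockaddr_un addr;
    struct timespec delay = { 0, 1000000L };    /* assumed start: 1 ms */

    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
            return 0;
        if (errno != EAGAIN)
            return -1;                  /* any other error: give up at once */
        nanosleep(&delay, NULL);        /* pause before the next attempt */
        if (delay.tv_nsec < 64000000L)
            delay.tv_nsec *= 2;         /* double the pause, capped at 64 ms */
    }
    errno = EAGAIN;                     /* every attempt hit EAGAIN */
    return -1;
}
```

A caller would create the socket, set it non-blocking as before, and call this in place of a bare connect(). Sleeping instead of polling is deliberate here, since the observation above is that polling did not correlate with the connect succeeding.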
| Comment by Daniel Black [ 2018-04-17 ] |
|
Thanks for the commit. I can now see why the poll didn't have an effect: the destination socket is the cause of the EAGAIN, and polling on an unconnected socket is unaware of this backlog. Setting the socket to non-blocking could also be deferred until after the connect, which would allow the kernel to wait (https://github.com/torvalds/linux/blob/master/net/unix/af_unix.c#L1266..L1268) and possibly avoid the error, saving time in this loop. |
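The deferral suggested above could look roughly like this. It is a minimal sketch, not the committed fix: connect_unix_then_nonblock is a hypothetical name, and the sketch ignores connect timeouts, which a real client would still need to bound in some other way once the connect itself is blocking.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper for illustration; not part of Connector/C.
 * Connects a unix stream socket while it is still blocking, then
 * switches it to non-blocking. Returns the fd, or -1 on failure. */
int connect_unix_then_nonblock(const char *path)
{
    struct sockaddr_un addr;
    int fd, flags, saved_errno;

    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* The socket is still blocking here, so the kernel is free to wait
     * for room in the listener's queue instead of returning EAGAIN. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0)
        goto fail;

    /* Only now switch to non-blocking for the rest of the session. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        goto fail;
    return fd;

fail:
    saved_errno = errno;    /* keep the original error across close() */
    close(fd);
    errno = saved_errno;
    return -1;
}
```

The trade-off is that the wait moves into the kernel: no retry loop is needed for a full backlog, but the caller gives up the poll-based timeout during the connect.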
| Comment by Georg Richter [ 2018-04-23 ] |
|
Fixed in 3.0.4 |