The WSARecv() calls that we use for notification of new queries from the client accept an output buffer (and are typically used with one), so by the time the notification arrives, the data has already been read.
So far we have passed a 0-sized buffer to WSARecv (which I think is not officially documented, but appears to always work). This emulates Unix poll-style readiness notification and works fine, but it can be optimized much further.
For example, before a query starts, we currently call WSARecv() at least 3 times: once to get the completion/readiness notification (receiving 0 bytes), once to read the packet header (4-byte buffer), and again to read the rest of the packet (its size depends on the length parsed from the header). Even though the 2 later calls would normally read from the socket buffer and return without waiting, each one is an unnecessarily expensive copy from Winsock buffers into user buffers.
These 3 WSARecv calls can be reduced to 1 if we pass a buffer to the network AIO, which is then filled upon AIO completion.
All of the above also applies to named pipes (ReadFile instead of WSARecv).
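To illustrate the single-read idea, here is a minimal sketch in portable C of how a packet could be consumed from a buffer that the AIO completion has already filled, so no further WSARecv/ReadFile calls are needed for the header or the payload. It deliberately contains no Winsock calls. The names parse_prefilled, packet_view and packet_status are hypothetical, and the header layout (3-byte little-endian payload length followed by a 1-byte sequence number) is an assumption; only the 4-byte header size comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* 4-byte packet header, as described above. */
#define PACKET_HEADER_SIZE 4

typedef enum { PACKET_INCOMPLETE, PACKET_READY } packet_status;

/* Hypothetical view into the prefilled buffer; no data is copied. */
typedef struct {
    const unsigned char *payload;  /* points into the AIO buffer */
    size_t payload_len;            /* length parsed from the header */
    unsigned char seq;             /* assumed: 4th header byte is a sequence number */
} packet_view;

/*
 * buf    - buffer that was handed to the async read and filled on completion
 * nbytes - number of bytes the completion reported as transferred
 *
 * Returns PACKET_READY if the header and the whole payload are already in
 * the buffer; PACKET_INCOMPLETE means the caller has to read more data.
 */
static packet_status parse_prefilled(const unsigned char *buf, size_t nbytes,
                                     packet_view *out)
{
    assert(buf != NULL && out != NULL);

    if (nbytes < PACKET_HEADER_SIZE)
        return PACKET_INCOMPLETE;

    /* Assumed layout: 3-byte little-endian payload length. */
    size_t len = (size_t)buf[0] | ((size_t)buf[1] << 8) | ((size_t)buf[2] << 16);

    if (nbytes - PACKET_HEADER_SIZE < len)
        return PACKET_INCOMPLETE;

    out->payload = buf + PACKET_HEADER_SIZE;
    out->payload_len = len;
    out->seq = buf[3];
    return PACKET_READY;
}
```

With this shape, a short query that fits into the buffer is fully available after one completion; only when PACKET_INCOMPLETE is returned would an additional read be issued for the remainder.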
Note that in a sysbench point-select workload that runs in memory, reading user input with ReadFile (WSARecv) is the most expensive operation, measured at 12% with named pipes.