Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Description
There are two different row formats for result sets: TEXT and BINARY.
When retrieving numeric values in text format there is a lot of overhead:
- Server converts numeric values into strings
- String values (which might be larger than the native binary format) need to be transferred via wire
- Client needs to store the string and convert it back to a numeric value
The differences mainly concern:
- NULL values: encoded using 1 bit in binary format, compared to 1 byte in text format
- numeric values: text versus native. For example, "2147483647" takes 11 bytes as text (including 1 byte for the length) vs 4 bytes in binary
- date/time/timestamp values: text versus "semi-native". For example, "2001-01-01 00:00:00" takes 20 bytes as text (including 1 byte for the length) vs 8 bytes in binary
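The size figures above can be reproduced with a small sketch. It assumes the usual encodings: a one-byte length prefix for short text values, a little-endian 4-byte integer, a binary datetime made of one length byte plus year, month, day, hour, minute and second, and a NULL bitmap with a 2-bit offset in binary rows. The helper names are illustrative only and are not part of any connector API.

```python
import struct


def text_column_size(value: str) -> int:
    """Size of a text-format column: 1 length byte + ASCII payload.

    The single-byte length prefix only applies to payloads under 251 bytes.
    """
    payload = value.encode("ascii")
    if len(payload) >= 251:
        raise ValueError("sketch only handles short values")
    return 1 + len(payload)


def binary_int32(value: int) -> bytes:
    """Binary-format INT: 4 bytes, little-endian, signed."""
    return struct.pack("<i", value)


def binary_datetime(year, month, day, hour, minute, second) -> bytes:
    """Binary-format DATETIME without microseconds.

    Layout: 1 length byte (7), year as 2 bytes little-endian, then one byte
    each for month, day, hour, minute, second. The protocol also allows
    shorter forms when trailing fields are zero; this sketch always emits
    the full 7-byte payload to match the figure quoted above.
    """
    return struct.pack("<BHBBBBB", 7, year, month, day, hour, minute, second)


def null_bitmap_size(num_columns: int) -> int:
    """Bytes used by the NULL bitmap of a binary row (1 bit per column, 2-bit offset)."""
    return (num_columns + 7 + 2) // 8


int_text = text_column_size("2147483647")
int_bin = len(binary_int32(2147483647))
dt_text = text_column_size("2001-01-01 00:00:00")
dt_bin = len(binary_datetime(2001, 1, 1, 0, 0, 0))

print(f"INT:      text={int_text} bytes, binary={int_bin} bytes")
print(f"DATETIME: text={dt_text} bytes, binary={dt_bin} bytes")
print(f"NULL bitmap for 10 columns: {null_bitmap_size(10)} bytes")
```

Decoding on the client side is the mirror image: `struct.unpack("<i", data)` yields the integer directly, whereas the text path has to parse a decimal string first.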
-When tested, representative workload TPC-C data is 40208 bytes for text vs 31325 for binary row (without header).-
Exchanging that much less data over the network drastically improves performance.
Parsing native data also speeds up client-side execution: in Python benchmarks, for example, fetching numeric values is up to 60% faster.
A minor annoyance is that the BINARY format is less readable when debugging.
Proposal: Add an additional capability, MARIADB_BINARY_RESULT. If the server supports it, the client sends this capability flag during the handshake. The server then sends result sets for COM_QUERY in binary format.
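A minimal sketch of how the negotiation could look from the client side. The bit value used for MARIADB_BINARY_RESULT is a placeholder, since the proposal does not assign one, and the function names are hypothetical rather than taken from any existing connector.

```python
# Placeholder bit position: the proposal does not define the actual value.
MARIADB_BINARY_RESULT = 1 << 40


def client_handshake_caps(server_caps: int, client_wanted: int) -> int:
    """Capabilities the client sends back: only those the server advertised."""
    return server_caps & client_wanted


def com_query_row_format(negotiated_caps: int) -> str:
    """Row format to expect for COM_QUERY result sets after the handshake."""
    if negotiated_caps & MARIADB_BINARY_RESULT:
        return "BINARY"
    return "TEXT"


# A server that advertises the flag, and one that does not (other bits arbitrary).
new_server = 0b1111 | MARIADB_BINARY_RESULT
old_server = 0b1111
client_wanted = 0b0101 | MARIADB_BINARY_RESULT

caps_new = client_handshake_caps(new_server, client_wanted)
caps_old = client_handshake_caps(old_server, client_wanted)

print(com_query_row_format(caps_new))
print(com_query_row_format(caps_old))
```

Because the flag is only set when both sides support it, a client talking to an older server keeps receiving text rows without any further change.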