The problem:
Some functions in class handler assume the record is read into table->record[0]:
virtual int read_range_first(const key_range *start_key,
                             const key_range *end_key,
                             bool eq_range, bool sorted);
virtual int read_range_next();
virtual int multi_range_read_next(range_id_t *range_info);
Other functions put the record into the passed buffer:
int ha_index_read_map(uchar *buf, const uchar *key,
                      key_part_map keypart_map,
                      enum ha_rkey_function find_flag);
int ha_index_next_same(uchar *buf, const uchar *key, uint keylen);
IndexConditionPushdown assumes that the index column values are unpacked into table->record[0]: the pushed condition, Item *pushed_index_cond, is evaluated over Field objects that point into that buffer.
Typically, there's no difference as buf == table->record[0].
There are exceptions: handler::get_auto_increment(), handler::ha_check_overlaps, partitioning.
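To make the hazard concrete, here is a minimal standalone C++ sketch (invented names, not server code): a condition bound to one buffer silently evaluates stale data when the row is delivered to another buffer.

#include <cstdint>
#include <cstdio>
#include <cstring>

static uint8_t record0[4];            /* stands in for table->record[0] */

/* The "pushed condition" is bound to record0 at push time, just as
   pushed_index_cond's Field objects are bound to table->record[0]. */
static bool cond_a_lt_10()
{
  int32_t a;
  memcpy(&a, record0, sizeof a);
  return a < 10;
}

int main()
{
  uint8_t side_buffer[4];             /* a caller-supplied buf != record[0] */
  int32_t stale= 99, current= 3;      /* the current row should match: 3 < 10 */

  memcpy(record0, &stale, 4);         /* record[0] still holds an old row */
  memcpy(side_buffer, &current, 4);   /* engine unpacked into the side buffer */

  /* The check reads record[0] and wrongly rejects the matching row: */
  printf("condition says: %s\n", cond_a_lt_10() ? "match" : "no match");
  return 0;
}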
We hit the issue with partitioning. ha_partition::handle_ordered_index_scan() uses a Priority Queue to merge ordered streams of records it has read from different partitions.
It has calls like:
case partition_index_read:
  error= file->ha_index_read_map(rec_buf_ptr,
                                 m_start_key.key,
                                 m_start_key.keypart_map,
                                 m_start_key.flag);
  ...

case partition_index_first:
  error= file->ha_index_first(rec_buf_ptr);
  reverse_order= FALSE;
  break;
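The merge itself is a classic k-way merge: the queue holds one pre-read record per partition, which is exactly why each record must live in its own buffer (rec_buf_ptr) rather than in the shared table->record[0]. A self-contained illustration of the pattern (plain C++, not the actual ha_partition code):

#include <cstdio>
#include <queue>
#include <vector>

/* One pre-read row per partition; the smallest key is on top. */
struct PartRow
{
  int key;                          /* stands in for the ordered index column */
  size_t part, pos;                 /* source partition and cursor within it  */
};
struct ByKey
{
  bool operator()(const PartRow &a, const PartRow &b) const
  { return a.key > b.key; }         /* min-heap on key */
};

int main()
{
  /* Each inner vector is one partition's already-ordered index stream. */
  std::vector<std::vector<int>> parts= {{1, 7, 9}, {2, 3, 8}, {4, 5, 6}};
  std::priority_queue<PartRow, std::vector<PartRow>, ByKey> pq;

  for (size_t p= 0; p < parts.size(); p++)   /* prime: one row per partition */
    pq.push({parts[p][0], p, 0});

  while (!pq.empty())
  {
    PartRow top= pq.top();          /* emit the globally smallest row ...    */
    pq.pop();
    printf("%d ", top.key);
    if (top.pos + 1 < parts[top.part].size())  /* ... refill from its stream */
      pq.push({parts[top.part][top.pos + 1], top.part, top.pos + 1});
  }
  printf("\n");                     /* prints: 1 2 3 4 5 6 7 8 9 */
  return 0;
}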
When the API can only read into table->record[0], ha_partition does that and then copies the record itself:
case partition_read_range:
{
  /*
    This can only read record to table->record[0], as it was set when
    the table was being opened. We have to memcpy data ourselves.
  */
  error= file->read_range_first(m_start_key.key? &m_start_key: NULL,
                                end_range, eq_range, TRUE);
  if (likely(!error))
    memcpy(rec_buf_ptr, table->record[0], m_rec_length);
  reverse_order= FALSE;
  break;
}
Currently, this sequence of SE API calls:
h->push_index_cond(cond);
h->ha_index_read_map(some_buffer, ...);
does NOT produce a valid result with either MyISAM or InnoDB.
Both will unpack the index columns into some_buffer and then call handler_index_cond_check(), which reads the index columns from table->record[0].
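For reference, the ICP check callback that the engines invoke looks roughly like this (paraphrased and simplified, not verbatim server code):

check_result_t handler_index_cond_check(void *h_arg)
{
  handler *h= (handler*)h_arg;
  /* ... end-of-range checking omitted ... */

  /* pushed_idx_cond's Items read their columns from table->record[0],
     no matter which buffer the engine unpacked the index entry into: */
  return h->pushed_idx_cond->val_int() ? CHECK_POS : CHECK_NEG;
}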
Possible ways out
1. Adopt and document the limitation
2. Disallow reads to a side buffer
3. Make index_read_map and co. work with ICP and side buffers
Adopt and document the limitation
Document that one must not call ha_index_read_map(buffer != table->record[0], ...) when ICP is enabled, and add an assertion to the function.
ha_partition will copy the record itself when necessary.
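The assertion could look like this (a sketch; the member name pushed_idx_cond and the exact placement are assumptions):

int handler::ha_index_read_map(uchar *buf, const uchar *key,
                               key_part_map keypart_map,
                               enum ha_rkey_function find_flag)
{
  /* With ICP enabled, results are only valid in table->record[0] */
  DBUG_ASSERT(!pushed_idx_cond || buf == table->record[0]);
  /* ... the rest of the function is unchanged ... */
}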
Disallow reads to buffer != table->record[0]
This makes the API uniform but will incur some record copying where previously there was none.
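Under this option, the ordered-scan cases shown earlier would all have to follow the pattern partition_read_range already uses, e.g. (a sketch derived from the excerpt above, not an actual patch):

case partition_index_read:
  /* Read into table->record[0], now the only allowed destination, then
     copy into this partition's queue buffer: */
  error= file->ha_index_read_map(table->record[0],
                                 m_start_key.key,
                                 m_start_key.keypart_map,
                                 m_start_key.flag);
  if (likely(!error))
    memcpy(rec_buf_ptr, table->record[0], m_rec_length);  /* the new copy */
  break;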
Make index_read_map and co work with ICP and side buffers
Let them unpack the index columns into table->record[0] for the check. If the check succeeds and we are reading into a side buffer, unpack into that buffer as well.
ha_partition will not need any modification. Some extra copying will be done in the engine.
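In an engine, the read path could be structured like this (hypothetical helper names; a sketch of the idea, not MyISAM or InnoDB code):

int some_engine::index_read_with_icp(uchar *buf)
{
  for (;;)
  {
    if (position_on_next_index_entry())
      return HA_ERR_END_OF_FILE;
    /* Always unpack into record[0] first, so the pushed condition sees
       the current values: */
    unpack_index_columns(table->record[0]);
    check_result_t res= handler_index_cond_check(this);
    if (res == CHECK_OUT_OF_RANGE)
      return HA_ERR_END_OF_FILE;
    if (res != CHECK_POS)
      continue;                      /* no match, examine the next entry */
    if (buf != table->record[0])
      unpack_index_columns(buf);     /* the extra copy, side buffers only */
    return 0;
  }
}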
A simple testcase:
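The testcase is along these lines (a reconstruction inferred from the EXPLAIN output below; the column types, the hash partitioning, and the seq_1_to_100 sequence table are assumptions):

-- reconstruction; the exact DDL is assumed
create table t1 (a int, b int, key(a))
partition by hash(a) partitions 4;
insert into t1 select seq, seq from seq_1_to_100;

explain format=json select * from t1 where a < 10 and b + 1 > 3;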
It shows:
{
"query_block": {
"select_id": 1,
"cost": 0.02196592,
"nested_loop": [
{
"table": {
"table_name": "t1",
"partitions": ["p0", "p1", "p2", "p3"],
"access_type": "range",
"possible_keys": ["a"],
"key": "a",
"key_length": "5",
"used_key_parts": ["a"],
"loops": 1,
"rows": 9,
"cost": 0.02196592,
"filtered": 100,
"attached_condition": "t1.a < 10 and t1.b + 1 > 3"
}
}
]
}
}
Note that both conditions are in attached_condition: ICP is not used.
For comparison, let's run the same on a non-partitioned table:
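Assuming a non-partitioned twin of t1 (the same reconstruction caveats apply):

-- reconstruction; the exact DDL is assumed
create table t2 (a int, b int, key(a));
insert into t2 select seq, seq from seq_1_to_100;

explain format=json select * from t2 where a < 10 and b + 1 > 3;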
This shows:
{
"query_block": {
"select_id": 1,
"cost": 0.0146548,
"nested_loop": [
{
"table": {
"table_name": "t2",
"access_type": "range",
"possible_keys": ["a"],
"key": "a",
"key_length": "5",
"used_key_parts": ["a"],
"loops": 1,
"rows": 9,
"cost": 0.0146548,
"filtered": 100,
"index_condition": "t2.a < 10 and t2.b + 1 > 3"
}
}
]
}
}
Now the conditions are in index_condition: ICP is used.