MariaDB Server / MDEV-14500

filesort to support engines with slow rnd_pos

Details

    Description

      There are many engines that either do not support position()/rnd_pos() at all (ColumnStore) or implement these methods very slowly (Archive, CSV).

      This is a limitation the server should be able to cope with, instead of forcing every such engine to implement an imperfect workaround.

      The method rnd_pos() is primarily needed for filesort: the server sorts (sort key, row position) pairs, then reads the positions back in sorted order and retrieves the corresponding rows.
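      For illustration, a self-contained C++ sketch of that flow. The types are toy stand-ins, not the real handler API: filesort keeps only (key, position) pairs, sorts them, and then does one rnd_pos()-style lookup per row.

      #include <algorithm>
      #include <cstdint>
      #include <cstdio>
      #include <string>
      #include <utility>
      #include <vector>

      struct Row { int c1; double c2; std::string c3; };

      // Toy "engine": position() returns a ref, rnd_pos() fetches a row by ref.
      struct ToyEngine {
        std::vector<Row> rows;
        uint64_t position(uint64_t cur) const { return cur; }
        const Row &rnd_pos(uint64_t ref) const { return rows[ref]; }
      };

      int main() {
        ToyEngine t{{{3, 3.7, "c"}, {1, 1.7, "a"}, {2, 2.7, "b"}}};

        // Phase 1: scan once, keeping only (sort key, row position) pairs.
        std::vector<std::pair<double, uint64_t>> keys;
        for (uint64_t pos = 0; pos < t.rows.size(); pos++)
          keys.emplace_back(t.rows[pos].c2, t.position(pos));

        // Phase 2: sort the small pairs, not the rows themselves.
        std::sort(keys.begin(), keys.end());

        // Phase 3: one rnd_pos()-style lookup per row, in sorted order.
        // This pass is what hurts engines with slow or missing rnd_pos().
        for (const auto &kp : keys) {
          const Row &r = t.rnd_pos(kp.second);
          std::printf("%d %s\n", r.c1, r.c3.c_str());
        }
        return 0;
      }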

      There are two techniques the server can use to avoid rnd_pos() in filesort. First, filesort can store the actual column values instead of positions, so it won't need to retrieve the rows afterwards; this is enabled automatically, depending on the total length of these "addon" columns. Second, the OPTION_BUFFER_RESULT flag can force all results to go into a temporary table.
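      A companion sketch of the first technique, with the same toy types as above (still hypothetical): with addon fields the sort records carry the column values themselves, so no retrieval pass is needed.

      #include <algorithm>
      #include <cstdio>
      #include <string>
      #include <utility>
      #include <vector>

      struct Row { int c1; double c2; std::string c3; };

      int main() {
        std::vector<Row> t{{3, 3.7, "c"}, {1, 1.7, "a"}, {2, 2.7, "b"}};

        // Sort records are (key, addon columns): wider than (key, position),
        // which is why the server gates this on the total addon length.
        std::vector<std::pair<double, Row>> recs;
        for (const Row &r : t)
          recs.emplace_back(r.c2, r);

        std::sort(recs.begin(), recs.end(),
                  [](const auto &a, const auto &b) { return a.first < b.first; });

        // No second pass over the base table: the columns travel with the keys.
        for (const auto &rec : recs)
          std::printf("%d %s\n", rec.second.c1, rec.second.c3.c_str());
        return 0;
      }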

      So, the solution for engines without a usable rnd_pos() could be: force OPTION_BUFFER_RESULT unless filesort puts all fields into the addon list.
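      As a sketch of that rule (the names below are invented for the example; only OPTION_BUFFER_RESULT is the server's actual flag):

      #include <cstdio>

      // Hypothetical types: SortPlan and must_force_buffer_result() exist
      // only in this sketch, not in the server.
      struct SortPlan {
        bool rnd_pos_usable;        // engine has a reasonably fast rnd_pos()
        bool all_fields_in_addons;  // filesort can carry every needed column
      };

      bool must_force_buffer_result(const SortPlan &p) {
        // If the rows cannot be re-read by position and the addon list does
        // not cover all fields, buffer the result in a tmp table that does
        // support rnd_pos().
        return !p.rnd_pos_usable && !p.all_fields_in_addons;
      }

      int main() {
        std::printf("%d\n", must_force_buffer_result({false, false})); // 1: buffer
        std::printf("%d\n", must_force_buffer_result({false, true}));  // 0: addons suffice
        std::printf("%d\n", must_force_buffer_result({true, false}));  // 0: rnd_pos is fine
        return 0;
      }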

      Additionally, it would make sense to adjust the addon heuristics to use addon fields more often, comparing their cost with the cost of a temporary table.
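      For example, a back-of-the-envelope model of that comparison; the formulas and constants are made up for the sketch and are not the optimizer's real cost model:

      #include <cstdio>

      // Sorting with addon fields: every sort record carries the row itself,
      // so more bytes go through the sort but there is no second pass.
      double cost_with_addons(double rows, double key_len, double addon_len) {
        return rows * (key_len + addon_len);
      }

      // Sorting positions only: narrow sort records, but each row is then
      // written to and re-read from a temporary table.
      double cost_with_tmp_table(double rows, double key_len, double row_len,
                                 double tmp_io_factor) {
        return rows * key_len + rows * row_len * tmp_io_factor;
      }

      int main() {
        double rows = 1e6, key_len = 8, row_len = 200;
        double a = cost_with_addons(rows, key_len, row_len);
        double t = cost_with_tmp_table(rows, key_len, row_len, 2.0 /* write+read */);
        std::printf("addons: %.0f, tmp table: %.0f -> prefer %s\n",
                    a, t, a <= t ? "addons" : "tmp table");
        return 0;
      }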


          Activity

            Sergei, please review my changes committed in 884b83e to my development branch bb-10.3-MDEV-14500.

            jacob-mathew Jacob Mathew (Inactive) added a comment

            jacob-mathew, please add test cases for your change. I'd like to walk through some part of the code in the debugger.

            serg Sergei Golubchik added a comment

            rnd_pos() is rather expensive in Archive and CSV engines. I tried a simple test case with your patch:

            source include/have_archive.inc;
            source include/have_sequence.inc;
            create table st(c1 int not null, c2 double not null, c3 char(255) not null) engine=archive;
            insert st select seq, seq+0.7, concat('row with c1 = ', seq) from seq_1_to_10;
            flush status;
            set max_length_for_sort_data = 4;
            select c1,c3 from st order by c2;
            select * from information_schema.session_status where variable_name like 'handler%read%' and variable_value != 0;
            drop table st;
            

            The test worked and the result was correct, but, of course, rnd_pos() was used. Then I added

            bool is_rnd_pos_expensive() { return TRUE; }
            

            to ha_archive.h, and the test crashed on the SELECT query. Same result with CSV...

            serg Sergei Golubchik added a comment

            I cannot reproduce the crash. I added the following to both ha_archive.h and ha_tina.h and rebuilt:

            bool is_rnd_pos_expensive() { return TRUE; }
            

            I ran the above test case for both the Archive and CSV engines, on Windows 10 and CentOS 7.3, with debug and release builds. There was no crash, and the test produced the correct results in all runs. When running with the debug builds, I verified that the new filesort tmp table was being used.

            Please provide the stack trace of your crash.

            jacob-mathew Jacob Mathew (Inactive) added a comment
            serg Sergei Golubchik added a comment - edited

            New problem. This code doesn't handle temp table overflow. Normally it's handled like this (from sql_show.cc):

              if ((error= table->file->ha_write_tmp_row(table->record[0])))
              {
                TMP_TABLE_PARAM *param= table->pos_in_table_list->schema_table_param;
                if (create_internal_tmp_table_from_heap(thd, table, param->start_recinfo,
                                                        &param->recinfo, error, 0, NULL))
                  return 1;
              }
              return 0;
            

            and create_internal_tmp_table_from_heap() copies the MEMORY table to MyISAM or Aria table and updates TABLE *table accordingly.

            But in this case, by the time the MEMORY table overflows, we have already remembered the positions of all its rows in the sort buffer (and in BUFFPEKs on disk). Moving all the rows to a table in a different engine would invalidate all those positions.

            Not yet sure what would be the best way to solve it...

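            To see why the remembered positions go stale, here is a standalone sketch with hypothetical engine types: a row position is only meaningful to the engine that produced it, so copying the rows into a different engine orphans every ref already written out.

            #include <cstdint>
            #include <cstdio>
            #include <deque>
            #include <vector>

            // "MEMORY"-style engine: a row ref is the row's in-memory address.
            struct HeapEngine {
              std::deque<int> rows;  // deque: addresses stay stable while appending
              const int *ref_of(size_t i) const { return &rows[i]; }
            };

            // "MyISAM"-style engine: a row ref is a byte offset in a data file.
            struct FileEngine {
              std::vector<int> file;
              uint64_t ref_of(size_t i) const { return i * sizeof(int); }
            };

            int main() {
              HeapEngine heap{{10, 20, 30}};

              // What the sort buffer holds: refs in the MEMORY engine's format.
              std::vector<const int *> saved_refs;
              for (size_t i = 0; i < heap.rows.size(); i++)
                saved_refs.push_back(heap.ref_of(i));

              // Overflow: all rows are copied into the file-based engine...
              FileEngine disk{std::vector<int>(heap.rows.begin(), heap.rows.end())};

              // ...but the saved refs are heap addresses, while the new engine
              // expects byte offsets. Every position already written to the sort
              // buffer (and to BUFFPEKs on disk) is now stale.
              std::printf("saved %zu refs; none is valid for the new engine\n",
                          saved_refs.size());
              (void)disk;
              return 0;
            }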

            OK to push

            sanja Oleksandr Byelkin added a comment

            People

              Assignee: serg Sergei Golubchik
              Reporter: serg Sergei Golubchik
              Votes: 0
              Watchers: 8

