MariaDB Server / MDEV-14500

filesort to support engines with slow rnd_pos

Details

    Description

      There are many engines that either do not support position()/rnd_pos() at all (ColumnStore) or implement these methods very slowly (Archive, CSV).

      This is a limitation the server should be able to cope with, instead of forcing every such engine to implement an imperfect workaround.

      The method rnd_pos() is primarily needed for filesort: the server sorts (sort key, row position) pairs, then reads the positions back in sorted order and retrieves the corresponding rows.
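      For illustration, a self-contained C++ sketch of that flow. The types are toy stand-ins, not the real handler API: filesort keeps only (key, position) pairs, sorts them, and then does one rnd_pos()-style lookup per row.

      #include <algorithm>
      #include <cstdint>
      #include <cstdio>
      #include <string>
      #include <utility>
      #include <vector>

      struct Row { int c1; double c2; std::string c3; };

      // Toy "engine": position() returns a ref, rnd_pos() fetches a row by ref.
      struct ToyEngine {
        std::vector<Row> rows;
        uint64_t position(uint64_t cur) const { return cur; }
        const Row &rnd_pos(uint64_t ref) const { return rows[ref]; }
      };

      int main() {
        ToyEngine t{{{3, 3.7, "c"}, {1, 1.7, "a"}, {2, 2.7, "b"}}};

        // Phase 1: scan once, keeping only (sort key, row position) pairs.
        std::vector<std::pair<double, uint64_t>> keys;
        for (uint64_t pos = 0; pos < t.rows.size(); pos++)
          keys.emplace_back(t.rows[pos].c2, t.position(pos));

        // Phase 2: sort the small pairs, not the rows themselves.
        std::sort(keys.begin(), keys.end());

        // Phase 3: one rnd_pos()-style lookup per row, in sorted order.
        // This pass is what hurts engines with slow or missing rnd_pos().
        for (const auto &kp : keys) {
          const Row &r = t.rnd_pos(kp.second);
          std::printf("%d %s\n", r.c1, r.c3.c_str());
        }
        return 0;
      }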

      There are two techniques the server can use to avoid rnd_pos() in filesort. First, filesort can store the actual column values instead of positions, so it won't need to retrieve the rows afterwards; this is enabled automatically, depending on the total length of these "addon" columns. Second, the OPTION_BUFFER_RESULT flag can force all results to go into a temporary table.
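      A companion sketch of the first technique, with the same toy types as above (still hypothetical): with addon fields the sort records carry the column values themselves, so no retrieval pass is needed.

      #include <algorithm>
      #include <cstdio>
      #include <string>
      #include <utility>
      #include <vector>

      struct Row { int c1; double c2; std::string c3; };

      int main() {
        std::vector<Row> t{{3, 3.7, "c"}, {1, 1.7, "a"}, {2, 2.7, "b"}};

        // Sort records are (key, addon columns): wider than (key, position),
        // which is why the server gates this on the total addon length.
        std::vector<std::pair<double, Row>> recs;
        for (const Row &r : t)
          recs.emplace_back(r.c2, r);

        std::sort(recs.begin(), recs.end(),
                  [](const auto &a, const auto &b) { return a.first < b.first; });

        // No second pass over the base table: the columns travel with the keys.
        for (const auto &rec : recs)
          std::printf("%d %s\n", rec.second.c1, rec.second.c3.c_str());
        return 0;
      }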

      So, the solution for engines without a usable rnd_pos() could be: force OPTION_BUFFER_RESULT unless filesort puts all fields into the addon list.
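      As a sketch of that rule (the names below are invented for the example; only OPTION_BUFFER_RESULT is the server's actual flag):

      #include <cstdio>

      // Hypothetical types: SortPlan and must_force_buffer_result() exist
      // only in this sketch, not in the server.
      struct SortPlan {
        bool rnd_pos_usable;        // engine has a reasonably fast rnd_pos()
        bool all_fields_in_addons;  // filesort can carry every needed column
      };

      bool must_force_buffer_result(const SortPlan &p) {
        // If the rows cannot be re-read by position and the addon list does
        // not cover all fields, buffer the result in a tmp table that does
        // support rnd_pos().
        return !p.rnd_pos_usable && !p.all_fields_in_addons;
      }

      int main() {
        std::printf("%d\n", must_force_buffer_result({false, false})); // 1: buffer
        std::printf("%d\n", must_force_buffer_result({false, true}));  // 0: addons suffice
        std::printf("%d\n", must_force_buffer_result({true, false}));  // 0: rnd_pos is fine
        return 0;
      }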

      Additionally, it would make sense to adjust the addon heuristics to use addon fields more often, comparing their cost with the cost of a temporary table.
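      For example, a back-of-the-envelope model of that comparison; the formulas and constants are made up for the sketch and are not the optimizer's real cost model:

      #include <cstdio>

      // Sorting with addon fields: every sort record carries the row itself,
      // so more bytes go through the sort but there is no second pass.
      double cost_with_addons(double rows, double key_len, double addon_len) {
        return rows * (key_len + addon_len);
      }

      // Sorting positions only: narrow sort records, but each row is then
      // written to and re-read from a temporary table.
      double cost_with_tmp_table(double rows, double key_len, double row_len,
                                 double tmp_io_factor) {
        return rows * key_len + rows * row_len * tmp_io_factor;
      }

      int main() {
        double rows = 1e6, key_len = 8, row_len = 200;
        double a = cost_with_addons(rows, key_len, row_len);
        double t = cost_with_tmp_table(rows, key_len, row_len, 2.0 /* write+read */);
        std::printf("addons: %.0f, tmp table: %.0f -> prefer %s\n",
                    a, t, a <= t ? "addons" : "tmp table");
        return 0;
      }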


          Activity

            Sergei, please review my changes committed in 884b83e to my development branch bb-10.3-MDEV-14500.

            jacob-mathew Jacob Mathew (Inactive) added a comment

            jacob-mathew, please add test cases for your change. I'd like to walk through some part of the code in the debugger.

            serg Sergei Golubchik added a comment

            rnd_pos() is rather expensive in Archive and CSV engines. I tried a simple test case with your patch:

            source include/have_archive.inc;
            source include/have_sequence.inc;
            create table st(c1 int not null, c2 double not null, c3 char(255) not null) engine=archive;
            insert st select seq, seq+0.7, concat('row with c1 = ', seq) from seq_1_to_10;
            flush status;
            set max_length_for_sort_data = 4;
            select c1,c3 from st order by c2;
            select * from information_schema.session_status where variable_name like 'handler%read%' and variable_value != 0;
            drop table st;
            

            The test worked and the result was correct, but, of course, rnd_pos() was used. Then I added

            bool is_rnd_pos_expensive() { return TRUE; }
            

            to ha_archive.h, and the test crashed on the SELECT query. Same result with CSV...

            serg Sergei Golubchik added a comment

            I cannot reproduce the crash. I added the following to both ha_archive.h and ha_tina.h and rebuilt:

            bool is_rnd_pos_expensive() { return TRUE; }
            

            I ran the above test case for both the Archive and CSV engines, on Windows 10 and CentOS 7.3, with debug and release builds. There was no crash, and the test produced the correct results in all runs. When running with the debug builds, I verified that the new filesort tmp table was being used.

            Please provide the stack trace of your crash.

            jacob-mathew Jacob Mathew (Inactive) added a comment
            serg Sergei Golubchik added a comment - edited

            New problem. This code doesn't handle temp table overflow. Normally it's handled like this (from sql_show.cc):

              if ((error= table->file->ha_write_tmp_row(table->record[0])))
              {
                TMP_TABLE_PARAM *param= table->pos_in_table_list->schema_table_param;
                if (create_internal_tmp_table_from_heap(thd, table, param->start_recinfo,
                                                        &param->recinfo, error, 0, NULL))
                  return 1;
              }
              return 0;
            

            and create_internal_tmp_table_from_heap() copies the MEMORY table to MyISAM or Aria table and updates TABLE *table accordingly.

            But in this case, by the time the MEMORY table overflows, we have already remembered the positions of all its rows in the sort buffer (and in BUFFPEKs on disk). Moving all the rows to a table in a different engine would invalidate all those positions.

            Not yet sure what would be the best way to solve it...

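            To see why the remembered positions go stale, here is a standalone sketch with hypothetical engine types: a row position is only meaningful to the engine that produced it, so copying the rows into a different engine orphans every ref already written out.

            #include <cstdint>
            #include <cstdio>
            #include <deque>
            #include <vector>

            // "MEMORY"-style engine: a row ref is the row's in-memory address.
            struct HeapEngine {
              std::deque<int> rows;  // deque: addresses stay stable while appending
              const int *ref_of(size_t i) const { return &rows[i]; }
            };

            // "MyISAM"-style engine: a row ref is a byte offset in a data file.
            struct FileEngine {
              std::vector<int> file;
              uint64_t ref_of(size_t i) const { return i * sizeof(int); }
            };

            int main() {
              HeapEngine heap{{10, 20, 30}};

              // What the sort buffer holds: refs in the MEMORY engine's format.
              std::vector<const int *> saved_refs;
              for (size_t i = 0; i < heap.rows.size(); i++)
                saved_refs.push_back(heap.ref_of(i));

              // Overflow: all rows are copied into the file-based engine...
              FileEngine disk{std::vector<int>(heap.rows.begin(), heap.rows.end())};

              // ...but the saved refs are heap addresses, while the new engine
              // expects byte offsets. Every position already written to the sort
              // buffer (and to BUFFPEKs on disk) is now stale.
              std::printf("saved %zu refs; none is valid for the new engine\n",
                          saved_refs.size());
              (void)disk;
              return 0;
            }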

            OK to push

            sanja Oleksandr Byelkin added a comment

            People

              Assignee: serg Sergei Golubchik
              Reporter: serg Sergei Golubchik
              Votes: 0
              Watchers: 8

