Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-23867

insert... select crash in compute_window_func

Details

    Description

      A customer is experiencing crashes when an ETL query is executed.
      See stack trace and details.

      Attachments

        Issue Links

          Activity

            rpizzi Rick Pizzi (Inactive) created issue -
            rpizzi Rick Pizzi (Inactive) made changes -
            Field Original Value New Value
            Attachment 23867.txt [ 54116 ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            varun Varun Gupta (Inactive) made changes -
            varun Varun Gupta (Inactive) made changes -
            psergei Sergei Petrunia made changes -
            Assignee Sergei Petrunia [ psergey ]
            psergei Sergei Petrunia made changes -
            Assignee Sergei Petrunia [ psergey ] Varun Gupta [ varun ]

            rpizzi can you also attach the global variables list

            varun Varun Gupta (Inactive) added a comment - rpizzi can you also attach the global variables list

            reproducible test case, with encrypt-tmp-files=ON

            set sort_buffer_size= 2000;
            create table t1( a DECIMAL(12,0) DEFAULT NULL, b VARCHAR(20) DEFAULT NULL, c DECIMAL(12,0) DEFAULT NULL)engine=INNODB;
            insert into t1 select seq, seq, seq from seq_1_to_5000;
            select count(*) from (select a, b, c, sum(b) OVER (PARTITION BY a) FROM t1)q;
            

            varun Varun Gupta (Inactive) added a comment - reproducible test case, with encrypt-tmp-files=ON set sort_buffer_size= 2000; create table t1( a DECIMAL (12,0) DEFAULT NULL , b VARCHAR (20) DEFAULT NULL , c DECIMAL (12,0) DEFAULT NULL )engine=INNODB; insert into t1 select seq, seq, seq from seq_1_to_5000; select count (*) from ( select a, b, c, sum (b) OVER (PARTITION BY a) FROM t1)q;

            The stacktrace for the failure on debug build

            Program terminated with signal SIGABRT, Aborted.
            #0  __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
            56	../sysdeps/unix/sysv/linux/pthread_kill.c: No such file or directory.
            [Current thread is 1 (Thread 0x7fad85c67700 (LWP 240685))]
            #0  __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
            #1  0x000055fb818aaa89 in my_write_core (sig=6) at /home/varun/MariaDB/10.4/mysys/stacktrace.c:386
            #2  0x000055fb80f35f5f in handle_fatal_signal (sig=6) at /home/varun/MariaDB/10.4/sql/signal_handler.cc:343
            #3  <signal handler called>
            #4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
            #5  0x00007fad8caa3859 in __GI_abort () at abort.c:79
            #6  0x00007fad8caa3729 in __assert_fail_base (fmt=0x7fad8cc39588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55fb81a17768 "pos_in_file % info->buffer_length == 0", file=0x55fb81a17728 "/home/varun/MariaDB/10.4/sql/mf_iocache_encr.cc", line=82, function=<optimized out>) at assert.c:92
            #7  0x00007fad8cab4f36 in __GI___assert_fail (assertion=0x55fb81a17768 "pos_in_file % info->buffer_length == 0", file=0x55fb81a17728 "/home/varun/MariaDB/10.4/sql/mf_iocache_encr.cc", line=82, function=0x55fb81a17790 "int my_b_encr_read(IO_CACHE*, uchar*, size_t)") at assert.c:101
            #8  0x000055fb80dd0f25 in my_b_encr_read (info=0x7fad3013dfa8, Buffer=0x7fad30135398 "\230\177\034\060\255\177", Count=8) at /home/varun/MariaDB/10.4/sql/mf_iocache_encr.cc:82
            #9  0x000055fb818813cf in _my_b_read (info=0x7fad3013dfa8, Buffer=0x7fad30135398 "\230\177\034\060\255\177", Count=8) at /home/varun/MariaDB/10.4/mysys/mf_iocache.c:577
            #10 0x000055fb810ef848 in my_b_read (info=0x7fad3013dfa8, Buffer=0x7fad30135398 "\230\177\034\060\255\177", Count=8) at /home/varun/MariaDB/10.4/include/my_sys.h:532
            #11 0x000055fb810f0782 in rr_from_tempfile (info=0x7fad85c64c20) at /home/varun/MariaDB/10.4/sql/records.cc:499
            #12 0x000055fb80b1d00d in READ_RECORD::read_record (this=0x7fad85c64c20) at /home/varun/MariaDB/10.4/sql/records.h:70
            #13 0x000055fb80e174ee in compute_window_func (thd=0x7fad30000d90, window_functions=..., cursor_managers=..., tbl=0x7fad30061ed8, filesort_result=0x7fad3013df80) at /home/varun/MariaDB/10.4/sql/sql_window.cc:2823
            #14 0x000055fb80e17ae3 in Window_func_runner::exec (this=0x7fad30069cf8, thd=0x7fad30000d90, tbl=0x7fad30061ed8, filesort_result=0x7fad3013df80) at /home/varun/MariaDB/10.4/sql/sql_window.cc:2968
            #15 0x000055fb80e17bd6 in Window_funcs_sort::exec (this=0x7fad30069cf0, join=0x7fad30016f78, keep_filesort_result=true) at /home/varun/MariaDB/10.4/sql/sql_window.cc:2996
            #16 0x000055fb80e18186 in Window_funcs_computation::exec (this=0x7fad30019410, join=0x7fad30016f78, keep_last_filesort_result=true) at /home/varun/MariaDB/10.4/sql/sql_window.cc:3123
            #17 0x000055fb80c67f66 in AGGR_OP::end_send (this=0x7fad30019080) at /home/varun/MariaDB/10.4/sql/sql_select.cc:28780
            #18 0x000055fb80c508df in sub_select_postjoin_aggr (join=0x7fad30016f78, join_tab=0x7fad30069490, end_of_records=true) at /home/varun/MariaDB/10.4/sql/sql_select.cc:20182
            #19 0x000055fb80c50c52 in sub_select (join=0x7fad30016f78, join_tab=0x7fad300690e8, end_of_records=true) at /home/varun/MariaDB/10.4/sql/sql_select.cc:20417
            #20 0x000055fb80c503b8 in do_select (join=0x7fad30016f78, procedure=0x0) at /home/varun/MariaDB/10.4/sql/sql_select.cc:20008
            #21 0x000055fb80c24ddf in JOIN::exec_inner (this=0x7fad30016f78) at /home/varun/MariaDB/10.4/sql/sql_select.cc:4496
            #22 0x000055fb80c23f0e in JOIN::exec (this=0x7fad30016f78) at /home/varun/MariaDB/10.4/sql/sql_select.cc:4278
            #23 0x000055fb80c25662 in mysql_select (thd=0x7fad30000d90, tables=0x7fad30014a30, wild_num=0, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2416184064, result=0x7fad30016e90, unit=0x7fad30015118, select_lex=0x7fad30013ba8) at /home/varun/MariaDB/10.4/sql/sql_select.cc:4713
            #24 0x000055fb80b78e30 in mysql_derived_fill (thd=0x7fad30000d90, lex=0x7fad30004c00, derived=0x7fad30015950) at /home/varun/MariaDB/10.4/sql/sql_derived.cc:1255
            #25 0x000055fb80b75de4 in mysql_handle_single_derived (lex=0x7fad30004c00, derived=0x7fad30015950, phases=96) at /home/varun/MariaDB/10.4/sql/sql_derived.cc:206
            #26 0x000055fb80c3e8bd in st_join_table::preread_init (this=0x7fad3006af88) at /home/varun/MariaDB/10.4/sql/sql_select.cc:13562
            #27 0x000055fb80c50cdc in sub_select (join=0x7fad30016920, join_tab=0x7fad3006af88, end_of_records=false) at /home/varun/MariaDB/10.4/sql/sql_select.cc:20434
            #28 0x000055fb80c5034e in do_select (join=0x7fad30016920, procedure=0x0) at /home/varun/MariaDB/10.4/sql/sql_select.cc:20006
            #29 0x000055fb80c24ddf in JOIN::exec_inner (this=0x7fad30016920) at /home/varun/MariaDB/10.4/sql/sql_select.cc:4496
            #30 0x000055fb80c23f0e in JOIN::exec (this=0x7fad30016920) at /home/varun/MariaDB/10.4/sql/sql_select.cc:4278
            #31 0x000055fb80c25662 in mysql_select (thd=0x7fad30000d90, tables=0x7fad30015950, wild_num=0, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7fad300168f8, unit=0x7fad30004cc0, select_lex=0x7fad30013548) at /home/varun/MariaDB/10.4/sql/sql_select.cc:4713
            

            varun Varun Gupta (Inactive) added a comment - The stacktrace for the failure on debug build Program terminated with signal SIGABRT, Aborted. #0 __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56 56 ../sysdeps/unix/sysv/linux/pthread_kill.c: No such file or directory. [Current thread is 1 (Thread 0x7fad85c67700 (LWP 240685))] #0 __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56 #1 0x000055fb818aaa89 in my_write_core (sig=6) at /home/varun/MariaDB/10.4/mysys/stacktrace.c:386 #2 0x000055fb80f35f5f in handle_fatal_signal (sig=6) at /home/varun/MariaDB/10.4/sql/signal_handler.cc:343 #3 <signal handler called> #4 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #5 0x00007fad8caa3859 in __GI_abort () at abort.c:79 #6 0x00007fad8caa3729 in __assert_fail_base (fmt=0x7fad8cc39588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55fb81a17768 "pos_in_file % info->buffer_length == 0", file=0x55fb81a17728 "/home/varun/MariaDB/10.4/sql/mf_iocache_encr.cc", line=82, function=<optimized out>) at assert.c:92 #7 0x00007fad8cab4f36 in __GI___assert_fail (assertion=0x55fb81a17768 "pos_in_file % info->buffer_length == 0", file=0x55fb81a17728 "/home/varun/MariaDB/10.4/sql/mf_iocache_encr.cc", line=82, function=0x55fb81a17790 "int my_b_encr_read(IO_CACHE*, uchar*, size_t)") at assert.c:101 #8 0x000055fb80dd0f25 in my_b_encr_read (info=0x7fad3013dfa8, Buffer=0x7fad30135398 "\230\177\034\060\255\177", Count=8) at /home/varun/MariaDB/10.4/sql/mf_iocache_encr.cc:82 #9 0x000055fb818813cf in _my_b_read (info=0x7fad3013dfa8, Buffer=0x7fad30135398 "\230\177\034\060\255\177", Count=8) at /home/varun/MariaDB/10.4/mysys/mf_iocache.c:577 #10 0x000055fb810ef848 in my_b_read (info=0x7fad3013dfa8, Buffer=0x7fad30135398 "\230\177\034\060\255\177", Count=8) at /home/varun/MariaDB/10.4/include/my_sys.h:532 #11 0x000055fb810f0782 in rr_from_tempfile (info=0x7fad85c64c20) at /home/varun/MariaDB/10.4/sql/records.cc:499 #12 0x000055fb80b1d00d in READ_RECORD::read_record (this=0x7fad85c64c20) at /home/varun/MariaDB/10.4/sql/records.h:70 #13 0x000055fb80e174ee in compute_window_func (thd=0x7fad30000d90, window_functions=..., cursor_managers=..., tbl=0x7fad30061ed8, filesort_result=0x7fad3013df80) at /home/varun/MariaDB/10.4/sql/sql_window.cc:2823 #14 0x000055fb80e17ae3 in Window_func_runner::exec (this=0x7fad30069cf8, thd=0x7fad30000d90, tbl=0x7fad30061ed8, filesort_result=0x7fad3013df80) at /home/varun/MariaDB/10.4/sql/sql_window.cc:2968 #15 0x000055fb80e17bd6 in Window_funcs_sort::exec (this=0x7fad30069cf0, join=0x7fad30016f78, keep_filesort_result=true) at /home/varun/MariaDB/10.4/sql/sql_window.cc:2996 #16 0x000055fb80e18186 in Window_funcs_computation::exec (this=0x7fad30019410, join=0x7fad30016f78, keep_last_filesort_result=true) at /home/varun/MariaDB/10.4/sql/sql_window.cc:3123 #17 0x000055fb80c67f66 in AGGR_OP::end_send (this=0x7fad30019080) at /home/varun/MariaDB/10.4/sql/sql_select.cc:28780 #18 0x000055fb80c508df in sub_select_postjoin_aggr (join=0x7fad30016f78, join_tab=0x7fad30069490, end_of_records=true) at /home/varun/MariaDB/10.4/sql/sql_select.cc:20182 #19 0x000055fb80c50c52 in sub_select (join=0x7fad30016f78, join_tab=0x7fad300690e8, end_of_records=true) at /home/varun/MariaDB/10.4/sql/sql_select.cc:20417 #20 0x000055fb80c503b8 in do_select (join=0x7fad30016f78, procedure=0x0) at /home/varun/MariaDB/10.4/sql/sql_select.cc:20008 #21 0x000055fb80c24ddf in JOIN::exec_inner (this=0x7fad30016f78) at /home/varun/MariaDB/10.4/sql/sql_select.cc:4496 #22 0x000055fb80c23f0e in JOIN::exec (this=0x7fad30016f78) at /home/varun/MariaDB/10.4/sql/sql_select.cc:4278 #23 0x000055fb80c25662 in mysql_select (thd=0x7fad30000d90, tables=0x7fad30014a30, wild_num=0, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2416184064, result=0x7fad30016e90, unit=0x7fad30015118, select_lex=0x7fad30013ba8) at /home/varun/MariaDB/10.4/sql/sql_select.cc:4713 #24 0x000055fb80b78e30 in mysql_derived_fill (thd=0x7fad30000d90, lex=0x7fad30004c00, derived=0x7fad30015950) at /home/varun/MariaDB/10.4/sql/sql_derived.cc:1255 #25 0x000055fb80b75de4 in mysql_handle_single_derived (lex=0x7fad30004c00, derived=0x7fad30015950, phases=96) at /home/varun/MariaDB/10.4/sql/sql_derived.cc:206 #26 0x000055fb80c3e8bd in st_join_table::preread_init (this=0x7fad3006af88) at /home/varun/MariaDB/10.4/sql/sql_select.cc:13562 #27 0x000055fb80c50cdc in sub_select (join=0x7fad30016920, join_tab=0x7fad3006af88, end_of_records=false) at /home/varun/MariaDB/10.4/sql/sql_select.cc:20434 #28 0x000055fb80c5034e in do_select (join=0x7fad30016920, procedure=0x0) at /home/varun/MariaDB/10.4/sql/sql_select.cc:20006 #29 0x000055fb80c24ddf in JOIN::exec_inner (this=0x7fad30016920) at /home/varun/MariaDB/10.4/sql/sql_select.cc:4496 #30 0x000055fb80c23f0e in JOIN::exec (this=0x7fad30016920) at /home/varun/MariaDB/10.4/sql/sql_select.cc:4278 #31 0x000055fb80c25662 in mysql_select (thd=0x7fad30000d90, tables=0x7fad30015950, wild_num=0, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7fad300168f8, unit=0x7fad30004cc0, select_lex=0x7fad30013548) at /home/varun/MariaDB/10.4/sql/sql_select.cc:4713
            serg Sergei Golubchik made changes -
            Priority Blocker [ 1 ] Critical [ 2 ]
            varun Varun Gupta (Inactive) added a comment - Patch addressing part of the problem http://lists.askmonty.org/pipermail/commits/2020-October/014343.html
            psergei Sergei Petrunia added a comment - - edited

            Ok, so I see two issues that have caused this MDEV.

            Both are caused by two features not taking each other into account:

            • There are special kinds of IO_CACHE objects that read (and write) encrypted data.
            • Window functions code uses init_slave_io_cache() to create "slave" IO caches that read the same data.

            Issue #1: memory allocation.

            An IO_CACHE that uses encryption uses a larger buffer (it needs space for the encrypted data, decrypted data, IO_CACHE_CRYPT struct to describe encryption parameters etc).

            The logic to do this is here in init_io_cache():

                  if (type == SEQ_READ_APPEND)
            	buffer_block *= 2;
                  else if (cache_myflags & MY_ENCRYPT)
                    buffer_block= 2*(buffer_block + MY_AES_BLOCK_SIZE) + sizeof(IO_CACHE_CRYPT);
                  ...
                  if ((info->buffer= (uchar*) my_malloc(buffer_block, flags)) != 0)
            

            init_slave_io_cache() doesn't know about this. It allocates memory like so:

              if (!(slave_buf= (uchar*)my_malloc(master->buffer_length, MYF(0))))
            

            This is why we get a memory overrun.

            Varun's patch tries to fix this by copying the logic:

            -  if (!(slave_buf= (uchar*)my_malloc(master->buffer_length, MYF(0))))
            +  buffer_length= master->buffer_length;
            +  if (master->type == SEQ_READ_APPEND)
            +    buffer_length *= 2;
            +  else if (master->myflags & MY_ENCRYPT)
            +    buffer_length= 2*(buffer_length + MY_AES_BLOCK_SIZE) + sizeof(IO_CACHE_CRYPT);
            +
            

            I would suggest a dedicated member saying how much was allocated.
            e.g. change IO_CACHE::alloced_buffer from bool to size_t.

            Issue #2: IO_CACHE::seek_not_done

            When IO_CACHE objects are cloned, they still share the file descriptor.
            This means, operation on one IO_CACHE may change the file read position which will confuse other IO_CACHEs using it.

            In order to avoid this, there is IO_CACHE::next_file_user which is a circular list of objects using the same file. If an IO_CACHE operates on the file, it it sets IO_CACHE::seek_not_done=1 for every other IO_CACHE in the list.

            _my_b_cache_read has pieces like:

                if (info->next_file_user)
                {
                  IO_CACHE *c;
                  for (c= info->next_file_user;
                       c!= info;
                       c= c->next_file_user)
                  {
                    c->seek_not_done= 1;
                  }
                }
            

            But my_b_encr_read doesn't do this. This needs to be fixed.

            psergei Sergei Petrunia added a comment - - edited Ok, so I see two issues that have caused this MDEV. Both are caused by two features not taking each other into account: There are special kinds of IO_CACHE objects that read (and write) encrypted data. Window functions code uses init_slave_io_cache() to create "slave" IO caches that read the same data. Issue #1: memory allocation. An IO_CACHE that uses encryption uses a larger buffer (it needs space for the encrypted data, decrypted data, IO_CACHE_CRYPT struct to describe encryption parameters etc). The logic to do this is here in init_io_cache(): if (type == SEQ_READ_APPEND) buffer_block *= 2; else if (cache_myflags & MY_ENCRYPT) buffer_block= 2*(buffer_block + MY_AES_BLOCK_SIZE) + sizeof (IO_CACHE_CRYPT); ... if ((info->buffer= (uchar*) my_malloc(buffer_block, flags)) != 0) init_slave_io_cache() doesn't know about this. It allocates memory like so: if (!(slave_buf= (uchar*)my_malloc(master->buffer_length, MYF(0)))) This is why we get a memory overrun. Varun's patch tries to fix this by copying the logic: - if (!(slave_buf= (uchar*)my_malloc(master->buffer_length, MYF(0)))) + buffer_length= master->buffer_length; + if (master->type == SEQ_READ_APPEND) + buffer_length *= 2; + else if (master->myflags & MY_ENCRYPT) + buffer_length= 2*(buffer_length + MY_AES_BLOCK_SIZE) + sizeof(IO_CACHE_CRYPT); + I would suggest a dedicated member saying how much was allocated. e.g. change IO_CACHE::alloced_buffer from bool to size_t. Issue #2: IO_CACHE::seek_not_done When IO_CACHE objects are cloned, they still share the file descriptor. This means, operation on one IO_CACHE may change the file read position which will confuse other IO_CACHEs using it. In order to avoid this, there is IO_CACHE::next_file_user which is a circular list of objects using the same file. If an IO_CACHE operates on the file, it it sets IO_CACHE::seek_not_done=1 for every other IO_CACHE in the list. _my_b_cache_read has pieces like: if (info->next_file_user) { IO_CACHE *c; for (c= info->next_file_user; c!= info; c= c->next_file_user) { c->seek_not_done= 1; } } But my_b_encr_read doesn't do this. This needs to be fixed.

            varun but you've also mentioned there was a 3rd problem? I can't observe it on my side- when I apply the attached patch, the example works correctly for me.

            psergei Sergei Petrunia added a comment - varun but you've also mentioned there was a 3rd problem? I can't observe it on my side- when I apply the attached patch, the example works correctly for me.

            psergey the problem you mentioned in Issue #2 is the one I was talking about.
            I added a hack in the patch to do it

            diff --git a/sql/sql_window.cc b/sql/sql_window.cc
            index abb3937096f..6fc2f65b299 100644
            --- a/sql/sql_window.cc
            +++ b/sql/sql_window.cc
            @@ -2865,6 +2865,8 @@ bool compute_window_func(THD *thd,
                 save_window_function_values(window_functions, tbl, rowid_buf);
             
                 rownum++;
            +    if (info.io_cache)
            +      info.io_cache->seek_not_done= true;
               }
            

            So your approach covers it

            varun Varun Gupta (Inactive) added a comment - psergey the problem you mentioned in Issue #2 is the one I was talking about. I added a hack in the patch to do it diff --git a/sql/sql_window.cc b/sql/sql_window.cc index abb3937096f..6fc2f65b299 100644 --- a/sql/sql_window.cc +++ b/sql/sql_window.cc @@ - 2865 , 6 + 2865 , 8 @@ bool compute_window_func(THD *thd, save_window_function_values(window_functions, tbl, rowid_buf); rownum++; + if (info.io_cache) + info.io_cache->seek_not_done= true ; } So your approach covers it
            psergei Sergei Petrunia added a comment - Draft patch: https://github.com/MariaDB/server/commit/e877e752ba2c0f107d4732301dc4a41d2d57a79d . Input provided over email.
            varun Varun Gupta (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            varun Varun Gupta (Inactive) made changes -
            Assignee Varun Gupta [ varun ] Sergei Petrunia [ psergey ]
            Status In Progress [ 3 ] In Review [ 10002 ]

            Two concerns about the write -> pwrite conversion:

            Concern #1:
            There are two calls in mf_iocache.c left which depend on the file's current position:

            mf_iocache.c|1707| if (mysql_file_write(info->file,Buffer, length, info->myflags | MY_NABP))
            mf_iocache.c|1828| if (mysql_file_write(info->file, info->write_buffer, length,
            

            Imagine the following situation:

            Execution before this patch:

              // open file for writing
              mysql_file_write(...) // A call made at some other point than the above two calls
              mysql_file_write(...) // One of the above two calls.
            

            After the patch this becomes:

              // open file for writing
              mysql_file_pwrite(...)  // Note that this doesn't change the file position
              mysql_file_write(...) // One of the above two calls.
              // ^ the above writes at "current position" which is 0, so the write goes not
              // where it was intended.
            

            Is there any logic which would allow to say that the above will not happen?

            psergei Sergei Petrunia added a comment - Two concerns about the write -> pwrite conversion: Concern #1: There are two calls in mf_iocache.c left which depend on the file's current position: mf_iocache.c|1707| if (mysql_file_write(info->file,Buffer, length, info->myflags | MY_NABP)) mf_iocache.c|1828| if (mysql_file_write(info->file, info->write_buffer, length, Imagine the following situation: Execution before this patch: // open file for writing mysql_file_write(...) // A call made at some other point than the above two calls mysql_file_write(...) // One of the above two calls. After the patch this becomes: // open file for writing mysql_file_pwrite(...) // Note that this doesn't change the file position mysql_file_write(...) // One of the above two calls. // ^ the above writes at "current position" which is 0, so the write goes not // where it was intended. Is there any logic which would allow to say that the above will not happen?

            Concern #2 is this code in mi_check.c:

              old_file=info->dfile;
              info->dfile=info->rec_cache.file;
            

            Does the code that follows this assume that info->dfile is seeked to some particular offset?

              if (sort_info->current_key)
              {
                key=info->lastkey+info->s->base.max_key_length;
                if ((error=(*info->s->read_rnd)(info,sort_param->record,info->lastpos,0)) &&
            	error != HA_ERR_RECORD_DELETED)
                {
                  mi_check_print_error(param,"Can't read record to be removed");
                  info->dfile=old_file;
                  DBUG_RETURN(1);
                }
             
                for (i=0 ; i < sort_info->current_key ; i++)
                {
                  uint key_length=_mi_make_key(info,i,key,sort_param->record,info->lastpos);
                  if (_mi_ck_delete(info,i,key,key_length))
                  {
            	mi_check_print_error(param,"Can't delete key %d from record to be removed",i+1);
            	info->dfile=old_file;
            	DBUG_RETURN(1);
                  }
                }
                if (sort_param->calc_checksum)
                  param->glob_crc-=(*info->s->calc_checksum)(info, sort_param->record);
              }
              error=flush_io_cache(&info->rec_cache) || (*info->s->delete_record)(info);
              info->dfile=old_file;				/* restore actual value */
            

            .. and I have the same question about sort_delete_record() in storage/maria/ma_check.c.
            It also has

              row_info->dfile.file= row_info->rec_cache.file;
              if (flush_io_cache(&row_info->rec_cache))
                DBUG_RETURN(1);
              ...
            

            Before the patch, row_info->rec_cache.file was "seeked" to somewhere, now it is not.

            Any thoughts? If it is not immediately clear, let's ask Monty (this is very old code so I dont think anybody else rememers).

            psergei Sergei Petrunia added a comment - Concern #2 is this code in mi_check.c: old_file=info->dfile; info->dfile=info->rec_cache.file; Does the code that follows this assume that info->dfile is seeked to some particular offset? if (sort_info->current_key) { key=info->lastkey+info->s->base.max_key_length; if ((error=(*info->s->read_rnd)(info,sort_param->record,info->lastpos,0)) && error != HA_ERR_RECORD_DELETED) { mi_check_print_error(param, "Can't read record to be removed" ); info->dfile=old_file; DBUG_RETURN(1); }   for (i=0 ; i < sort_info->current_key ; i++) { uint key_length=_mi_make_key(info,i,key,sort_param->record,info->lastpos); if (_mi_ck_delete(info,i,key,key_length)) { mi_check_print_error(param, "Can't delete key %d from record to be removed" ,i+1); info->dfile=old_file; DBUG_RETURN(1); } } if (sort_param->calc_checksum) param->glob_crc-=(*info->s->calc_checksum)(info, sort_param->record); } error=flush_io_cache(&info->rec_cache) || (*info->s->delete_record)(info); info->dfile=old_file; /* restore actual value */ .. and I have the same question about sort_delete_record() in storage/maria/ma_check.c. It also has row_info->dfile.file= row_info->rec_cache.file; if (flush_io_cache(&row_info->rec_cache)) DBUG_RETURN(1); ... Before the patch, row_info->rec_cache.file was "seeked" to somewhere, now it is not. Any thoughts? If it is not immediately clear, let's ask Monty (this is very old code so I dont think anybody else rememers).

            No problem with mi_check or ma_check(), They are taking the file descriptor of the newly re-created data file and removing a duplicate row from it. The remove-row code is using pread/pwrite and thus is not affected by the position of the IO_CACHE file descriptor and will not affect the IO_CACHE code

            monty Michael Widenius added a comment - No problem with mi_check or ma_check(), They are taking the file descriptor of the newly re-created data file and removing a duplicate row from it. The remove-row code is using pread/pwrite and thus is not affected by the position of the IO_CACHE file descriptor and will not affect the IO_CACHE code

            Take-aways from yesterday discussion:

            • We cannot change mysql_file_read -> mysql_file_pread everywhere (as sometimes IO_CACHE is used to read from FIFOs)
            • The impact of the "use pread" approach is much larger than we anticipated.

            Due to this:

            • Lets switch to the older version of the patch, with seeks and updating seek_not_done for other slaves.
            • Then, we will make the larger patch which uses pread where appropriate and push that to a development version.
            psergei Sergei Petrunia added a comment - Take-aways from yesterday discussion: We cannot change mysql_file_read -> mysql_file_pread everywhere (as sometimes IO_CACHE is used to read from FIFOs) The impact of the "use pread" approach is much larger than we anticipated. Due to this: Lets switch to the older version of the patch, with seeks and updating seek_not_done for other slaves. Then, we will make the larger patch which uses pread where appropriate and push that to a development version.
            psergei Sergei Petrunia made changes -
            Assignee Sergei Petrunia [ psergey ] Varun Gupta [ varun ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            psergei Sergei Petrunia made changes -
            Priority Critical [ 2 ] Blocker [ 1 ]
            varun Varun Gupta (Inactive) made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]

            Ok to push the "conservative" fix for aec8072f99ef4b1a4a966692a1cb761abde96cf0 .

            psergei Sergei Petrunia added a comment - Ok to push the "conservative" fix for aec8072f99ef4b1a4a966692a1cb761abde96cf0 .
            varun Varun Gupta (Inactive) made changes -
            Affects Version/s 10.2 [ 14601 ]
            Affects Version/s 10.3 [ 22126 ]
            Affects Version/s 10.5 [ 23123 ]
            varun Varun Gupta (Inactive) made changes -
            Component/s Optimizer - Window functions [ 13502 ]
            Fix Version/s 10.2.35 [ 25022 ]
            Fix Version/s 10.3.26 [ 25021 ]
            Fix Version/s 10.4.16 [ 25020 ]
            Fix Version/s 10.5.7 [ 25019 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]

            varun please file an MDEV for the "complete" fix and link with this one

            psergei Sergei Petrunia added a comment - varun please file an MDEV for the "complete" fix and link with this one
            varun Varun Gupta (Inactive) made changes -
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 114120 ] MariaDB v4 [ 158423 ]
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 162417

            People

              varun Varun Gupta (Inactive)
              rpizzi Rick Pizzi (Inactive)
              Votes:
              1 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.