Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-23867

insert... select crash in compute_window_func

Details

    Description

      A customer is experiencing crashes when an ETL query is executed.
      See stack trace and details.

      Attachments

        Issue Links

          Activity

            Concern #2 is this code in mi_check.c:

              old_file=info->dfile;
              info->dfile=info->rec_cache.file;
            

            Does the code that follows this assume that info->dfile is seeked to some particular offset?

              if (sort_info->current_key)
              {
                key=info->lastkey+info->s->base.max_key_length;
                if ((error=(*info->s->read_rnd)(info,sort_param->record,info->lastpos,0)) &&
            	error != HA_ERR_RECORD_DELETED)
                {
                  mi_check_print_error(param,"Can't read record to be removed");
                  info->dfile=old_file;
                  DBUG_RETURN(1);
                }
             
                for (i=0 ; i < sort_info->current_key ; i++)
                {
                  uint key_length=_mi_make_key(info,i,key,sort_param->record,info->lastpos);
                  if (_mi_ck_delete(info,i,key,key_length))
                  {
            	mi_check_print_error(param,"Can't delete key %d from record to be removed",i+1);
            	info->dfile=old_file;
            	DBUG_RETURN(1);
                  }
                }
                if (sort_param->calc_checksum)
                  param->glob_crc-=(*info->s->calc_checksum)(info, sort_param->record);
              }
              error=flush_io_cache(&info->rec_cache) || (*info->s->delete_record)(info);
              info->dfile=old_file;				/* restore actual value */
            

            .. and I have the same question about sort_delete_record() in storage/maria/ma_check.c.
            It also has

              row_info->dfile.file= row_info->rec_cache.file;
              if (flush_io_cache(&row_info->rec_cache))
                DBUG_RETURN(1);
              ...
            

            Before the patch, row_info->rec_cache.file was "seeked" to somewhere, now it is not.

            Any thoughts? If it is not immediately clear, let's ask Monty (this is very old code so I dont think anybody else rememers).

            psergei Sergei Petrunia added a comment - Concern #2 is this code in mi_check.c: old_file=info->dfile; info->dfile=info->rec_cache.file; Does the code that follows this assume that info->dfile is seeked to some particular offset? if (sort_info->current_key) { key=info->lastkey+info->s->base.max_key_length; if ((error=(*info->s->read_rnd)(info,sort_param->record,info->lastpos,0)) && error != HA_ERR_RECORD_DELETED) { mi_check_print_error(param, "Can't read record to be removed" ); info->dfile=old_file; DBUG_RETURN(1); }   for (i=0 ; i < sort_info->current_key ; i++) { uint key_length=_mi_make_key(info,i,key,sort_param->record,info->lastpos); if (_mi_ck_delete(info,i,key,key_length)) { mi_check_print_error(param, "Can't delete key %d from record to be removed" ,i+1); info->dfile=old_file; DBUG_RETURN(1); } } if (sort_param->calc_checksum) param->glob_crc-=(*info->s->calc_checksum)(info, sort_param->record); } error=flush_io_cache(&info->rec_cache) || (*info->s->delete_record)(info); info->dfile=old_file; /* restore actual value */ .. and I have the same question about sort_delete_record() in storage/maria/ma_check.c. It also has row_info->dfile.file= row_info->rec_cache.file; if (flush_io_cache(&row_info->rec_cache)) DBUG_RETURN(1); ... Before the patch, row_info->rec_cache.file was "seeked" to somewhere, now it is not. Any thoughts? If it is not immediately clear, let's ask Monty (this is very old code so I dont think anybody else rememers).

            No problem with mi_check or ma_check(), They are taking the file descriptor of the newly re-created data file and removing a duplicate row from it. The remove-row code is using pread/pwrite and thus is not affected by the position of the IO_CACHE file descriptor and will not affect the IO_CACHE code

            monty Michael Widenius added a comment - No problem with mi_check or ma_check(), They are taking the file descriptor of the newly re-created data file and removing a duplicate row from it. The remove-row code is using pread/pwrite and thus is not affected by the position of the IO_CACHE file descriptor and will not affect the IO_CACHE code

            Take-aways from yesterday discussion:

            • We cannot change mysql_file_read -> mysql_file_pread everywhere (as sometimes IO_CACHE is used to read from FIFOs)
            • The impact of the "use pread" approach is much larger than we anticipated.

            Due to this:

            • Lets switch to the older version of the patch, with seeks and updating seek_not_done for other slaves.
            • Then, we will make the larger patch which uses pread where appropriate and push that to a development version.
            psergei Sergei Petrunia added a comment - Take-aways from yesterday discussion: We cannot change mysql_file_read -> mysql_file_pread everywhere (as sometimes IO_CACHE is used to read from FIFOs) The impact of the "use pread" approach is much larger than we anticipated. Due to this: Lets switch to the older version of the patch, with seeks and updating seek_not_done for other slaves. Then, we will make the larger patch which uses pread where appropriate and push that to a development version.

            Ok to push the "conservative" fix for aec8072f99ef4b1a4a966692a1cb761abde96cf0 .

            psergei Sergei Petrunia added a comment - Ok to push the "conservative" fix for aec8072f99ef4b1a4a966692a1cb761abde96cf0 .

            varun please file an MDEV for the "complete" fix and link with this one

            psergei Sergei Petrunia added a comment - varun please file an MDEV for the "complete" fix and link with this one

            People

              varun Varun Gupta (Inactive)
              rpizzi Rick Pizzi (Inactive)
              Votes:
              1 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.