Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-35775

Aria: Newly-inserted rows temporarily not found

    XMLWordPrintable

Details

    • Bug
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • 10.5.23
    • 10.6
    • Storage Engine - Aria
    • None
    • Debian GNU/Linux 11 (bullseye)
      perl DBI, DBD::mysql, DBD::MariaDB

    Description

      foreword:
      i have spent much time searching the googles, and the mariadb JIRA, but due to the nature of the problem, it has been hard to nail down how others might describe it, and therefore to determine what searches i might have failed to do.

      context:
      we have a distributed application (multiple apache instances running on multiple VMs) contacting a non-replicated MariaDB server. there is a particular pattern of the form:

      1. acquire context lock using GET_LOCK()
      2. INSERT row with id_column==0
      3. SELECT auto increment column value from newly-inserted row WHERE id_column=0
      4. UPDATE id_column=auto increment column WHERE id_column=0
      5. release context lock

      occasionally we will either fail to find the newly-inserted row during the SELECT, and therefore fail to update the id_column or we will fail to find the row for the UPDATE. i will focus on the SELECT case since that is the one that i have been able to capture a trace for.

      the challenge:
      this is very, very intermittent. i often have to run several hundred thousand iterations before a failure is triggered. i am 98% sure that the trigger is related to server load/demand, since when it does happen, it does so shortly after :00, :05, :10, :15, etc., ie when groups of cron scripts tend to be scheduled. but the problem does not necessarily trigger a problem in the tables affected by the cron scripts. i am also 99% sure that it is related to our switch from MyISAM to Aria. i cannot say that this never happened under MyISAM, but it happens as frequently as multiple times per week under Aria, and requires manual intervention to resolve, so i expect that i would have known it was occurring before the transition.

      the details:
      i'm sure that this will be lacking in many ways, since i have been hyper-focused on the troubleshooting process, and have to remember what additional context others might need. but i am hoping that i can provide enough detail for someone either to direct me toward an existing bug, or provide additional direction for what i should be testing and/or capturing.

      table structure:

      CREATE TABLE IF NOT EXISTS `$data_table` (
        `_comma_flag` int(10) unsigned NOT NULL,
        `_sq` int(10) unsigned NOT NULL AUTO_INCREMENT,
        `doc_id` int(10) unsigned NOT NULL,
        `field1` varchar(255),
        `created` int(10) unsigned,
        PRIMARY KEY (`doc_id`),
        UNIQUE KEY `_sq` (`_sq`)
      ) ENGINE=Aria CHARSET=latin1 COLLATE=latin1_swedish_ci PAGE_CHECKSUM=1
      

      i do recognize that this is not strictly an MVCE, but i basically extracted what i could from the code where the problem occurs without introducing changes that lead me to chasing a different problem than the original.

      test code:

      ...
      my $iterations = 5_000;
      foreach my $i ( 1..$iterations ) {
      ...
        my $doc_id = 0;
        # get lock
        my $sth_insert = $dbh_w->prepare_cached(
          qq[
              INSERT INTO $data_table
                  ( _comma_flag, doc_id, value1, created )
                VALUES
                  ( ?, ?, ?, ?  )
          ]);
        my $sth_select = $dbh_r->prepare_cached(
          qq[
              SELECT _sq FROM $data_table
              WHERE doc_id = ?
          ]);
       
        my $sth_update = $dbh_w->prepare_cached(
          qq[
              UPDATE $data_table SET doc_id = ?
              WHERE _sq = ?
          ]);
       
        # i am explicitly binding these values as integers because i previously thought that the behavior might have been related to:
        # https://dba.stackexchange.com/questions/293791/missing-an-inserted-row
        # https://jira.mariadb.org/browse/MDEV-25862
        # https://bugs.mysql.com/bug.php?id=103992
        # i have since converted doc_id from varchar to integer and bound the column explicitly, and can confirm that the behavior still occurs.
        $sth_insert->bind_param( 1, 0, DBI::SQL_INTEGER );
        $sth_insert->bind_param( 2, $doc_id, DBI::SQL_INTEGER );
        $sth_insert->bind_param( 3, 'field1' );
        $sth_insert->bind_param( 4, time() );
        $sth_insert->execute();
       
        $sth_select->bind_param( 1, $doc_id, DBI::SQL_INTEGER );
        $sth_select->execute();
        my $row = $sth_select->fetchrow_arrayref();
       
        if ( ! defined( $row ) ) {
      warn sprintf( "%.5f %s\n", Time::HiRes::time(), 'failed to find row the first time' );
          sleep 1;
       
          $sth_select->finish();
       
          $sth_select->bind_param( 1, $doc_id, DBI::SQL_INTEGER );
          $sth_select->execute();
          $row = $sth_select->fetchrow_arrayref();
          if ( defined( $row ) ) {
      warn sprintf( "%.5f %s\n", Time::HiRes::time(), 'found row the second time' );
            die $row->[0];
          } 
      warn sprintf( "%.5f %s\n", Time::HiRes::time(), 'failed to find row the second time' );
          die;
        }
       
        $doc_id = $row->[0];
        $sth_select->finish();
       
        $sth_update->bind_param( 1, $doc_id, DBI::SQL_INTEGER );
        $sth_update->bind_param( 2, $doc_id, DBI::SQL_INTEGER );
        my $ret = $sth_update->execute();
        $sth_update->finish();
       
        if ( ! $ret ) {
      warn sprintf( "%.5f %s\n", Time::HiRes::time(), 'failed to update row the first time' );
       
          $sth_select->bind_param( 1, $doc_id, DBI::SQL_INTEGER );
          $ret = $sth_select->execute();
          my $row = $sth_select->fetchrow_arrayref();
          die join( ', ', $row, @{$row || []} );
        }
       
        # release lock
      }
      

      the trace:

      1736042767.05906
      ...
      Binding parameters: INSERT INTO _test_index_only_race_condition
                  ( _comma_flag, doc_id,field1,created )
                VALUES
                  ( 0, 0, 'value1', '1736042767' )
          
       <- mariadb_st_execute_iv returning imp_sth->row_num 1
          <- execute= ( 1 ) [1 items] at insert-select-update-loop.pl line 96
      1736042767.06003
      1736042767.06004
      ...
      Binding parameters: SELECT _sq FROM _test_index_only_race_condition
              WHERE doc_id = 0
          
       <- mariadb_st_execute_iv returning imp_sth->row_num 0
          <- execute= ( '0E0' ) [1 items] at insert-select-update-loop.pl line 105
          -> fetchrow_arrayref for DBD::MariaDB::st (DBI::st=HASH(0x55b279e5ab48)~0x55b279e80258) thr#55b2788dc2a0
              -> mariadb_st_fetch
          <- fetchrow_arrayref= ( undef ) [1 items] at insert-select-update-loop.pl line 106
      1736042767.06044
      1736042767.06045 failed to find row the first time
      1736042768.06059
      Binding parameters: SELECT _sq FROM _test_index_only_race_condition
              WHERE doc_id = 0
          
       <- mariadb_st_execute_iv returning imp_sth->row_num 1
          <- execute= ( 1 ) [1 items] at insert-select-update-loop.pl line 116
          -> fetchrow_arrayref for DBD::MariaDB::st (DBI::st=HASH(0x55b279e5ab48)~0x55b279e80258) thr#55b2788dc2a0
              -> mariadb_st_fetch
                      mariadb_st_fetch for 55b279e802b8, chopblanks 0
              mariadb_st_fetch result set details
              imp_sth->result=55b279ec50e0
              mysql_num_fields=1
              mysql_num_rows=1
              mysql_affected_rows=1
              mariadb_st_fetch for 55b279e802b8, currow=1
              <- mariadb_st_fetch, 1 cols
          <- fetchrow_arrayref= ( [ 1418509 ] ) [1 items] row1 at insert-select-update-loop.pl line 117
      1736042768.06119
      1736042768.06120 found row the second time
      
      

      so, as you can see, the row was indeed inserted successfully, was initially not found, and then eventually was found (after an arbitrary one-second delay) using the same SELECT statement.

      i can provide the full(er) trace if you need it, but it is 19MB of output where only the last 50-or-so lines are relevant

      Attachments

        1. MDEV-35775.pl
          3 kB
        2. 50-server.cnf
          1 kB
        3. mariadb-print-defaults.txt
          1 kB
        4. show-variables.txt
          19 kB
        5. insert-select-update-loop.pl
          4 kB
        6. show-variables-1.txt
          19 kB
        7. show-status.txt
          14 kB
        8. insert-select-update-loop-20250113.pl
          4 kB
        9. race-3.txt.lzma
          4 kB
        10. insert-select-update-loop-20250116.pl
          4 kB
        11. race-condition-test.tgz
          2.01 MB
        12. insert-select-loop.sh
          4 kB
        13. select-max-82.txt
          4.39 MB
        14. select-max-035.txt
          5.29 MB

        Activity

          People

            monty Michael Widenius
            sherrardb Sherrard Burton
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.