Details
-
Bug
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
10.5.23
-
None
-
Debian GNU/Linux 11 (bullseye)
perl DBI, DBD::mysql, DBD::MariaDB
Description
foreword:
i have spent much time searching the googles, and the mariadb JIRA, but due to the nature of the problem, it has been hard to nail down how others might describe it, and therefore to determine what searches i might have failed to do.
context:
we have a distributed application (multiple apache instances running on multiple VMs) contacting a non-replicated MariaDB server. there is a particular pattern of the form:
- acquire context lock using GET_LOCK()
- INSERT row with id_column==0
- SELECT auto increment column value from newly-inserted row WHERE id_column=0
- UPDATE id_column=auto increment column WHERE id_column=0
- release context lock
occasionally we will either fail to find the newly-inserted row during the SELECT, and therefore fail to update the id_column or we will fail to find the row for the UPDATE. i will focus on the SELECT case since that is the one that i have been able to capture a trace for.
the challenge:
this is very, very intermittent. i often have to run several hundred thousand iterations before a failure is triggered. i am 98% sure that the trigger is related to server load/demand, since when it does happen, it does so shortly after :00, :05, :10, :15, etc., ie when groups of cron scripts tend to be scheduled. but the problem does not necessarily trigger a problem in the tables affected by the cron scripts. i am also 99% sure that it is related to our switch from MyISAM to Aria. i cannot say that this never happened under MyISAM, but it happens as frequently as multiple times per week under Aria, and requires manual intervention to resolve, so i expect that i would have known it was occurring before the transition.
the details:
i'm sure that this will be lacking in many ways, since i have been hyper-focused on the troubleshooting process, and have to remember what additional context others might need. but i am hoping that i can provide enough detail for someone either to direct me toward an existing bug, or provide additional direction for what i should be testing and/or capturing.
table structure:
CREATE TABLE IF NOT EXISTS `$data_table` ( |
`_comma_flag` int(10) unsigned NOT NULL, |
`_sq` int(10) unsigned NOT NULL AUTO_INCREMENT, |
`doc_id` int(10) unsigned NOT NULL, |
`field1` varchar(255), |
`created` int(10) unsigned, |
PRIMARY KEY (`doc_id`), |
UNIQUE KEY `_sq` (`_sq`) |
) ENGINE=Aria CHARSET=latin1 COLLATE=latin1_swedish_ci PAGE_CHECKSUM=1 |
i do recognize that this is not strictly an MVCE, but i basically extracted what i could from the code where the problem occurs without introducing changes that lead me to chasing a different problem than the original.
test code:
...
|
my $iterations = 5_000; |
foreach my $i ( 1..$iterations ) { |
...
|
my $doc_id = 0; |
# get lock |
my $sth_insert = $dbh_w->prepare_cached( |
qq[
|
INSERT INTO $data_table |
( _comma_flag, doc_id, value1, created )
|
VALUES |
( ?, ?, ?, ? )
|
]);
|
my $sth_select = $dbh_r->prepare_cached( |
qq[
|
SELECT _sq FROM $data_table |
WHERE doc_id = ?
|
]);
|
 |
my $sth_update = $dbh_w->prepare_cached( |
qq[
|
UPDATE $data_table SET doc_id = ? |
WHERE _sq = ?
|
]);
|
 |
# i am explicitly binding these values as integers because i previously thought that the behavior might have been related to: |
# https://dba.stackexchange.com/questions/293791/missing-an-inserted-row |
# https://jira.mariadb.org/browse/MDEV-25862 |
# https://bugs.mysql.com/bug.php?id=103992 |
# i have since converted doc_id from varchar to integer and bound the column explicitly, and can confirm that the behavior still occurs. |
$sth_insert->bind_param( 1, 0, DBI::SQL_INTEGER ); |
$sth_insert->bind_param( 2, $doc_id, DBI::SQL_INTEGER ); |
$sth_insert->bind_param( 3, 'field1' ); |
$sth_insert->bind_param( 4, time() ); |
$sth_insert->execute(); |
 |
$sth_select->bind_param( 1, $doc_id, DBI::SQL_INTEGER ); |
$sth_select->execute(); |
my $row = $sth_select->fetchrow_arrayref(); |
 |
if ( ! defined( $row ) ) { |
warn sprintf( "%.5f %s\n", Time::HiRes::time(), 'failed to find row the first time' ); |
sleep 1; |
 |
$sth_select->finish(); |
 |
$sth_select->bind_param( 1, $doc_id, DBI::SQL_INTEGER ); |
$sth_select->execute(); |
$row = $sth_select->fetchrow_arrayref(); |
if ( defined( $row ) ) { |
warn sprintf( "%.5f %s\n", Time::HiRes::time(), 'found row the second time' ); |
die $row->[0]; |
}
|
warn sprintf( "%.5f %s\n", Time::HiRes::time(), 'failed to find row the second time' ); |
die; |
}
|
 |
$doc_id = $row->[0]; |
$sth_select->finish(); |
 |
$sth_update->bind_param( 1, $doc_id, DBI::SQL_INTEGER ); |
$sth_update->bind_param( 2, $doc_id, DBI::SQL_INTEGER ); |
my $ret = $sth_update->execute(); |
$sth_update->finish(); |
 |
if ( ! $ret ) { |
warn sprintf( "%.5f %s\n", Time::HiRes::time(), 'failed to update row the first time' ); |
 |
$sth_select->bind_param( 1, $doc_id, DBI::SQL_INTEGER ); |
$ret = $sth_select->execute(); |
my $row = $sth_select->fetchrow_arrayref(); |
die join( ', ', $row, @{$row || []} ); |
}
|
 |
# release lock |
}
|
the trace:
1736042767.05906
|
...
|
Binding parameters: INSERT INTO _test_index_only_race_condition
|
( _comma_flag, doc_id,field1,created )
|
VALUES
|
( 0, 0, 'value1', '1736042767' )
|
|
<- mariadb_st_execute_iv returning imp_sth->row_num 1
|
<- execute= ( 1 ) [1 items] at insert-select-update-loop.pl line 96
|
1736042767.06003
|
1736042767.06004
|
...
|
Binding parameters: SELECT _sq FROM _test_index_only_race_condition
|
WHERE doc_id = 0
|
|
<- mariadb_st_execute_iv returning imp_sth->row_num 0
|
<- execute= ( '0E0' ) [1 items] at insert-select-update-loop.pl line 105
|
-> fetchrow_arrayref for DBD::MariaDB::st (DBI::st=HASH(0x55b279e5ab48)~0x55b279e80258) thr#55b2788dc2a0
|
-> mariadb_st_fetch
|
<- fetchrow_arrayref= ( undef ) [1 items] at insert-select-update-loop.pl line 106
|
1736042767.06044
|
1736042767.06045 failed to find row the first time
|
1736042768.06059
|
Binding parameters: SELECT _sq FROM _test_index_only_race_condition
|
WHERE doc_id = 0
|
|
<- mariadb_st_execute_iv returning imp_sth->row_num 1
|
<- execute= ( 1 ) [1 items] at insert-select-update-loop.pl line 116
|
-> fetchrow_arrayref for DBD::MariaDB::st (DBI::st=HASH(0x55b279e5ab48)~0x55b279e80258) thr#55b2788dc2a0
|
-> mariadb_st_fetch
|
mariadb_st_fetch for 55b279e802b8, chopblanks 0
|
mariadb_st_fetch result set details
|
imp_sth->result=55b279ec50e0
|
mysql_num_fields=1
|
mysql_num_rows=1
|
mysql_affected_rows=1
|
mariadb_st_fetch for 55b279e802b8, currow=1
|
<- mariadb_st_fetch, 1 cols
|
<- fetchrow_arrayref= ( [ 1418509 ] ) [1 items] row1 at insert-select-update-loop.pl line 117
|
1736042768.06119
|
1736042768.06120 found row the second time
|
|
so, as you can see, the row was indeed inserted successfully, was initially not found, and then eventually was found (after an arbitrary one-second delay) using the same SELECT statement.
i can provide the full(er) trace if you need it, but it is 19MB of output where only the last 50-or-so lines are relevant