[MDEV-22512] innodb_zip.recover, innodb_gis.types failed in buildbot with server failing to restart Created: 2020-05-09  Updated: 2022-11-08  Resolved: 2022-11-08

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 10.2, 10.3, 10.4
Fix Version/s: 10.3.38, 10.4.28

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: not-10.5

Attachments: File mysqld.1.err.1.gz    
Issue Links:
Duplicate
duplicates MDEV-21351 Replace recv_sys.heap with list of bu... Closed
Relates
relates to MDEV-21351 Replace recv_sys.heap with list of bu... Closed
relates to MDEV-21826 Recovery failure : loop of Read redo ... Closed
relates to MDEV-23979 Corrupted page breaks recovery in inn... Open
relates to MDEV-19176 Do not run out of InnoDB buffer pool ... Closed

 Description   

Note: Possibly it has already been fixed, but I don't see a clear indication of it.

http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-amd64-debug/builds/3970/steps/mtr_centos_mix/logs/stdio

bb-10.2-release 75dcb74da652d8ba504b6906b2aa2ae5 2020-05-06 16:30:28

innodb_zip.recover 'innodb'              w1 [ fail ]
        Test ended at 2020-05-06 14:52:03
 
CURRENT_TEST: innodb_zip.recover
mysqltest: In included file "./include/wait_until_connected_again.inc": 
included from ./include/start_mysqld.inc at line 18:
included from ./include/restart_mysqld.inc at line 11:
included from /usr/share/mysql-test/suite/innodb_zip/t/recover.test at line 29:
At line 23: Server failed to restart
 
The result from queries just before the failure was:
#
# MDEV-12720 recovery fails with "Generic error"
# for ROW_FORMAT=compressed
#
CREATE TABLE a(i INT PRIMARY KEY AUTO_INCREMENT, s VARCHAR(255)) ENGINE=InnoDB
ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=1;
BEGIN;
insert into a(i) select null;
insert into a select null, uuid() from a a, a b, a c;
insert into a select null, uuid() from a a, a b, a c;
insert into a select null, uuid() from a a, a b, a c;
SET GLOBAL innodb_flush_log_at_trx_commit=1;
COMMIT;

Last words in the log:

2020-05-07  9:42:08 0 [Note] InnoDB: Ignoring data file './test/t1.ibd' with space ID 17, since the redo log references ./test/t1.ibd with space ID 16.
2020-05-07  9:42:08 0 [Note] InnoDB: Starting a batch to recover 332 pages from redo log.
2020-05-07  9:42:09 0 [Warning] InnoDB: Difficult to find free blocks in the buffer pool (21 search iterations)! 21 failed attempts to flush a page! Consider increasing innodb_buffer_pool_size. Pending flushes (fsync) log: 0; buffer pool: 0. 381 OS file reads, 78 OS file writes, 78 OS fsyncs.



 Comments   
Comment by Elena Stepanova [ 2020-05-09 ]

Another one, on bb-10.3-release:
http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-amd64-debug/builds/3984

tokudb_bugs.xa-3 'innodb'                w1 [ fail ]
        Test ended at 2020-05-07 09:50:42
 
CURRENT_TEST: tokudb_bugs.xa-3
mysqltest: In included file "./include/wait_until_connected_again.inc": 
included from /usr/share/mysql-test/plugin/tokudb/tokudb_bugs/t/xa-3.test at line 82:
At line 23: Server failed to restart
 
The result from queries just before the failure was:
< snip >
INSERT INTO t2 VALUES (2);
SET SESSION debug_dbug="d,crash_commit_after_prepare";
COMMIT;
ERROR HY000: Lost connection to MySQL server during query
select * from t1;
a
select * from t2;
b
drop table t1, t2;
CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=innodb;
CREATE TABLE t2 (b INT PRIMARY KEY) ENGINE=tokudb;
BEGIN;
SELECT @@log_bin;
@@log_bin
0
INSERT INTO t1 VALUES (1);
INSERT INTO t2 VALUES (2);
SET SESSION debug_dbug="d,crash_commit_after";
COMMIT;
ERROR HY000: Lost connection to MySQL server during query

Comment by Elena Stepanova [ 2020-05-09 ]

http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-amd64-debug/builds/3758

10.4 497a4169df5cd9837efd7a32b16426c9

tokudb_bugs.xa-4 'innodb'                w3 [ fail ]
        Test ended at 2020-03-25 17:48:09
 
CURRENT_TEST: tokudb_bugs.xa-4
mysqltest: In included file "./include/wait_until_connected_again.inc": 
included from /usr/share/mysql-test/plugin/tokudb/tokudb_bugs/t/xa-4.test at line 59:
At line 34: Server failed to restart
 
The result from queries just before the failure was:
< snip >
INSERT INTO t2 VALUES (2);
SET SESSION debug_dbug="d,crash_commit_before";
COMMIT;
ERROR HY000: Lost connection to MySQL server during query
select * from t1;
a
select * from t2;
b
drop table t1, t2;
CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=innodb;
CREATE TABLE t2 (b INT PRIMARY KEY) ENGINE=tokudb;
BEGIN;
SELECT @@log_bin;
@@log_bin
0
INSERT INTO t1 VALUES (1);
INSERT INTO t2 VALUES (2);
SET SESSION debug_dbug="d,crash_commit_after";
COMMIT;
ERROR HY000: Lost connection to MySQL server during query

Comment by Marko Mäkelä [ 2020-05-11 ]

For the latest failure, we see the following in the server error log:

10.4 497a4169df5cd9837efd7a32b16426c92e3da882

2020-03-25 17:39:39 0 [Note] InnoDB: Starting a batch to recover 256 pages from redo log.
2020-03-25 17:39:40 0 [Note] InnoDB: Starting a batch to recover 378 pages from redo log.
2020-03-25 17:39:40 0 [Warning] InnoDB: Difficult to find free blocks in the buffer pool (21 search iterations)! 21 failed attempts to flush a page! Consider increasing innodb_buffer_pool_size. Pending flushes (fsync) log: 0; buffer pool: 0. 675 OS file reads, 44 OS file writes, 44 OS fsyncs.
CURRENT_TEST: tokudb_bugs.xa-4
2020-03-25 17:48:10 0 [Note] /usr/sbin/mysqld (mysqld 10.4.13-MariaDB-debug-log) starting as process 1664 ...

That is, the recovery appears to have been stuck for 8.5 minutes. The stated revision does include MDEV-21826. In 10.5, that fix was improved further by MDEV-21351.

I would hope that this bug has been fixed in 10.5.1 by MDEV-21351, but I will leave this report open for a while. In older versions, a possible workaround could be to try to start up the server with a larger buffer pool size.

Comment by Marko Mäkelä [ 2020-05-12 ]

Here is a recovery failure of innodb_gis.types on 10.2:

10.2 0e6a5786d4fc0e96c34ccee2c59a707e3debb3b3

CURRENT_TEST: innodb_gis.types
mysqltest: In included file "./include/wait_until_connected_again.inc": 
included from ./include/start_mysqld.inc at line 18:
included from ./include/restart_mysqld.inc at line 11:
included from /usr/share/mysql-test/suite/innodb_gis/t/types.test at line 74:
At line 23: Server failed to restart
 
The result from queries just before the failure was:
< snip >
EXPLAIN SELECT ST_AsText(g) FROM t_wl6455 WHERE g = POINT(2,2);
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	t_wl6455	ref	i_p	i_p	12	const	2	#
SELECT ST_AsText(g) FROM t_wl6455 WHERE g = POINT(2,2);
ST_AsText(g)
POINT(2 2)
SET AUTOCOMMIT = 0;
INSERT INTO t_wl6455 VALUES(10, POINT(10,10));
SELECT ST_AsText(g) FROM t_wl6455 WHERE g = POINT(10,10);
ST_AsText(g)
POINT(10 10)
ROLLBACK;
SELECT ST_AsText(g) FROM t_wl6455 WHERE g = POINT(10,10);
ST_AsText(g)
INSERT INTO t_wl6455 VALUES(10, POINT(10,10));
COMMIT;
FLUSH TABLES;
INSERT INTO t_wl6455 VALUES(11, POINT(11,11));
BEGIN;
INSERT INTO t_wl6455 VALUES(1, POINT(1,1));
2020-05-12  9:56:25 281473290790544 [Note] InnoDB: Starting crash recovery from checkpoint LSN=23757732
2020-05-12  9:56:25 281473290790544 [Note] InnoDB: Starting a batch to recover 407 pages from redo log.
2020-05-12  9:56:25 281473290790544 [Warning] InnoDB: Difficult to find free blocks in the buffer pool (21 search iterations)! 21 failed attempts to flush a page! Consider increasing innodb_buffer_pool_size. Pending flushes (fsync) log: 0; buffer pool: 0. 432 OS file reads, 16 OS file writes, 16 OS fsyncs.
CURRENT_TEST: innodb_gis.types
2020-05-12 10:04:48 281473685382800 [Note] /usr/sbin/mysqld (mysqld 10.2.33-MariaDB-log) starting as process 7729 ...

The test does kill and restart as follows:

BEGIN;
INSERT INTO t_wl6455 VALUES(1, POINT(1,1));
 
--let $shutdown_timeout=0
--source include/restart_mysqld.inc
 
CHECK TABLE t_wl6455;

Comment by Marko Mäkelä [ 2020-10-19 ]

In 10.5, the buildbot cross-reference does not find this kind of failures at all. I believe that the problem was fixed by MDEV-21351 in 10.5.1 (and to lesser extent MDEV-19176, which did not completely fix the problem in earlier versions).

Comment by Aleksey Midenkov [ 2020-12-21 ]

10.2 fail:

innodb_gis.types 'innodb'                w1 [ fail ]
        Test ended at 2020-12-12 07:26:34
 
CURRENT_TEST: innodb_gis.types
mysqltest: In included file "./include/wait_until_connected_again.inc": 
included from ./include/start_mysqld.inc at line 18:
included from ./include/restart_mysqld.inc at line 11:
included from /usr/share/mysql/mysql-test/suite/innodb_gis/t/types.test at line 74:
At line 23: Server failed to restart

10.3 fail:

innodb_gis.types 'innodb'                w2 [ fail ]
        Test ended at 2020-12-03 16:53:16
 
CURRENT_TEST: innodb_gis.types
mysqltest: In included file "./include/wait_until_connected_again.inc": 
included from ./include/start_mysqld.inc at line 18:
included from ./include/restart_mysqld.inc at line 11:
included from /usr/share/mysql-test/suite/innodb_gis/t/types.test at line 74:
At line 23: Server failed to restart

More fails...

Please disable or make another workaround for pre-10.5.

Comment by Elena Stepanova [ 2022-04-16 ]

Still fails and not disabled.

Comment by Marko Mäkelä [ 2022-11-08 ]

Fixed by disabling the tests in 10.3 and 10.4 (but not 10.5 or later).

Generated at Thu Feb 08 09:15:17 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.