[MDEV-10615] archive.archive-big fails sporadically in buildbot, table marked as crashed Created: 2016-08-20  Updated: 2017-11-06  Resolved: 2017-11-05

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 10.0, 10.1, 10.2
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Elena Stepanova
Resolution: Cannot Reproduce Votes: 0
Labels: None

Attachments: Text File mysqld-log.txt     Text File stdio.txt    
Issue Links:
Blocks
blocks MDEV-7069 Fix buildbot failures in main server ... Stalled
Sprint: 5.5.55, 10.1.29

 Description   

http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest/builds/6155/steps/test_2/logs/stdio

archive.archive-big                      w2 [ fail ]
        Test ended at 2016-06-22 13:57:24
 
CURRENT_TEST: archive.archive-big
--- /mnt/buildbot/build/mariadb-10.0.26/mysql-test/suite/archive/archive-big.result	2016-06-22 11:58:23.000000000 +0300
+++ /mnt/buildbot/build/mariadb-10.0.26/mysql-test/suite/archive/archive-big.reject	2016-06-22 13:57:23.000000000 +0300
@@ -1,4 +1,6 @@
 CREATE TABLE t1(a BLOB) ENGINE=ARCHIVE;
+Warnings:
+Error	1194	Table 't1' is marked as crashed and should be repaired
 INSERT INTO t1 SELECT * FROM t1;
 INSERT INTO t1 SELECT * FROM t1;
 INSERT INTO t1 SELECT * FROM t1;
 
mysqltest: Result length mismatch

~15 occurrences over 1.5 years, so it's not a one-time coincidence.



 Comments   
Comment by Elena Stepanova [ 2017-02-18 ]

Another variation:

archive.archive-big                      w1 [ fail ]
        Test ended at 2016-06-26 06:43:58
 
CURRENT_TEST: archive.archive-big
mysqltest: At line 21: query 'INSERT INTO t1 SELECT * FROM t1' failed: 1030: Got error -1 "Internal error < 0 (Not system error)" from storage engine ARCHIVE
 
The result from queries just before the failure was:
< snip >
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
 
More results from queries before failure can be found in /mnt/buildbot/build/mariadb-10.2.1/mysql-test/var/1/log/archive-big.log
 
 - saving '/mnt/buildbot/build/mariadb-10.2.1/mysql-test/var/1/log/archive.archive-big/' to '/mnt/buildbot/build/mariadb-10.2.1/mysql-test/var/log/archive.archive-big/'
 
Retrying test archive.archive-big, attempt(2/3)...

This one certainly happens due to disk space problems.
However, there is no indication of disk space issues in the original failure (we don't have logs for it, but it's likely that other tests would have failed if there was the general problem – it didn't happen).

Comment by Elena Stepanova [ 2017-05-19 ]

We haven't had the problem (any failures with this test, actually) for almost a year, since 2016-06-26, so I think it's safe to assume for now that the problem has gone away.

Comment by Elena Stepanova [ 2017-09-22 ]

http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest-big/builds/1198

archive.archive-big                      w2 [ fail ]
        Test ended at 2017-09-13 00:30:39
 
CURRENT_TEST: archive.archive-big
mysqltest: At line 24: query 'INSERT INTO t1 SELECT * FROM t1' failed with wrong errno 1194: 'Table '/mnt/buildbot/build/mariadb-10.2.9/mysql-test/var/tmp/2/mysqld.1/#sql_3641_0' is marked as crashed and should be repaired', instead of 0...
 
The result from queries just before the failure was:
< snip >
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
 
More results from queries before failure can be found in /mnt/buildbot/build/mariadb-10.2.9/mysql-test/var/2/log/archive-big.log

Attached stdio and the error log, while we still have them.

Comment by Elena Stepanova [ 2017-11-05 ]

The initial corruption, which was happening at the beginning of the test, was most likely caused by some previous tests which would finish dirty. It stopped happening a while ago, possibly after we added more protection against dirty tests.

The second failure, "Got error -1 "Internal error < 0", is easily reproducible in conditions of limited disk/memory, e.g. on my machine it happens when i run the test in shm. Not much to be done about it, it's environmental problem. We don't run big tests in shm in buildbot anymore, so the error stopped occurring there.

The failure from the last comment remains a mystery.
I'm going to close it for now as an odd one-time artifact of some environmental problem. I've run numerous tests on the VM and locally and didn't get it; but more importantly, the logs don't make any sense.
stdio says

archive.archive-big                      w2 [ fail ]
        Test ended at 2017-09-13 00:30:39
 
CURRENT_TEST: archive.archive-big
mysqltest: At line 24: query 'INSERT INTO t1 SELECT * FROM t1' failed with wrong errno 1194: 'Table '/mnt/buildbot/build/mariadb-10.2.9/mysql-test/var/tmp/2/mysqld.1/#sql_3641_0' is marked as crashed and should be repaired', instead of 0...
....
 
worker[2] > Restart  - not started
 - skipping '/mnt/buildbot/build/mariadb-10.2.9/mysql-test/var/2/log/archive.archive-big/'
 
Retrying test archive.archive-big, attempt(2/3)...
 
***Warnings generated in error logs during shutdown after running tests: archive.archive-big
 
2017-09-13  0:30:37 140421130692352 [ERROR] mysqld: Table '/mnt/buildbot/build/mariadb-10.2.9/mysql-test/var/tmp/2/mysqld.1/#sql_3641_0' is marked as crashed and should be repaired
 
Test suite timeout! Terminating...

But the error log has no sign of the test being retried. Instead, it claims that another test was started right away:

CURRENT_TEST: archive.archive-big
2017-09-13  0:18:26 140421361023232 [Note] /mnt/buildbot/build/mariadb-10.2.9/sql/mysqld (mysqld 10.2.9-MariaDB-log) starting as process 13889 ...
2017-09-13  0:18:26 140421361023232 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4162)
2017-09-13  0:18:26 140421361023232 [Warning] setrlimit could not change the size of core files to 'infinity';  We may not be able to generate a core file on signals
2017-09-13  0:18:26 140421361023232 [Note] Plugin 'SEQUENCE' is disabled.
...
2017-09-13  0:18:26 140421361023232 [Note] /mnt/buildbot/build/mariadb-10.2.9/sql/mysqld: ready for connections.
Version: '10.2.9-MariaDB-log'  socket: '/mnt/buildbot/build/mariadb-10.2.9/mysql-test/var/tmp/2/mysqld.1.sock'  port: 16020  MariaDB Server
2017-09-13  0:30:37 140421130692352 [ERROR] mysqld: Table '/mnt/buildbot/build/mariadb-10.2.9/mysql-test/var/tmp/2/mysqld.1/#sql_3641_0' is marked as crashed and should be repaired
CURRENT_TEST: tokudb.bf_create_select_hash_part
2017-09-13  0:30:40 140163285891328 [Note] /mnt/buildbot/build/mariadb-10.2.9/sql/mysqld (mysqld 10.2.9-MariaDB-log) starting as process 13992 ...

And everything was around the time when the suite was supposed to time out, and the other worker was running questionable rocksdb tests.
So, maybe it was a temporary lack of disk space (undetected in the logs available to us), or a problem with some other resource.

We'll have to see if it ever occurs again.

Generated at Thu Feb 08 07:43:35 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.