[MDEV-32046] Spider test instability caused by ER_NET_READ_ERROR Created: 2023-08-31  Updated: 2023-12-07

Status: Stalled
Project: MariaDB Server
Component/s: Storage Engine - Spider
Affects Version/s: 11.2
Fix Version/s: 11.2

Type: Bug Priority: Major
Reporter: Yuchen Pei Assignee: Yuchen Pei
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-31586 The test spider/bugfix.mdev_31463 has... Closed

 Description   

Setting this to Critical because it pollutes CI and local mtr output. It might be similar to MDEV-31586.

For example: https://buildbot.mariadb.org/#/builders/35/builds/27272

The failure above is on a custom branch (bb-11.2-ycp-mdev-28856). First we need to verify that it reproduces on a main branch.

spider/bugfix.mdev_27240                 w6 [ fail ]
        Test ended at 2023-08-30 08:48:28
 
CURRENT_TEST: spider/bugfix.mdev_27240
mysqltest: At line 18: query 'LOCK TABLE tbl_a READ' failed with wrong errno ER_NET_READ_ERROR (1158): 'Got an error reading communication packets', instead of ER_CONNECT_TO_FOREIGN_DATA_SOURCE (1429)...
 
The result from queries just before the failure was:
for master_1
for child2
for child3
CREATE DATABASE auto_test_local;
USE auto_test_local;
CREATE TABLE tbl_a (a INT KEY) ENGINE=SPIDER;
SELECT a.z FROM tbl_a AS a,tbl_a b WHERE a.z=b.z;
ERROR 42S22: Unknown column 'a.z' in 'field list'
ALTER TABLE tbl_a CHANGE c c INT;
ERROR 42S22: Unknown column 'c' in 'tbl_a'
LOCK TABLE tbl_a READ;



 Comments   
Comment by Yuchen Pei [ 2023-09-13 ]

I am linking this issue to MDEV-28856 because the CI failures in the
description appear in bb-11.2-ycp-mdev-28856, a development branch
containing Spider commits that are under review or have already been
reviewed, with the changes for MDEV-28856 on top.

Actually, this issue is probably not related to MDEV-28856: the
failures don't appear in bb-11.3-mdev-28856, and since the test fails
in multiple builds of bb-11.3-ycp-mdev-28856, the cause is probably
some other commit in the latter branch.

Comment by Yuchen Pei [ 2023-09-15 ]

Lowering the priority since it does not appear in the main branches.

Comment by Yuchen Pei [ 2023-09-20 ]

Hi julien.fritsch, I do not see it happening in 11.2, so I'm not
sure it makes sense to add 11.2 as a fix version.

The bug appeared in some of my custom branches, which contain
unpushed commits (probably the cause) or pushed but unmerged commits
(probably not the cause).

I think it is fine for this not to appear in my queue for now, as it
will resurface if the bug still exists.

Comment by Yuchen Pei [ 2023-10-03 ]

Some observations:

Today I applied commits for MDEV-32157 to a custom branch
(bb-11.0-ycp-mdev-26247):

2f2fbabe24c MDEV-32157 MDEV-28856 Spider: Tests, documentation, small fixes and cleanups
13ca614e1b3 MDEV-32157 MDEV-28856 Spider: drop server in tests

Before applying them, the branch seemed to consistently fail
mdev_27240 when running mtr on the Spider suites:

./mysql-test/mtr --suite spider,spider/*,spider/*/* --skip-test="spider/oracle.*|.*/t\..*" --parallel=auto --big-test --force --max-test-fail=0

After applying these commits, the failure disappeared.

Comment by Yuchen Pei [ 2023-10-05 ]

I could not reproduce the failure in a debuggable way, and
ER_NET_READ_ERROR is a very rare and strange error. The failure is
fairly consistent (unlike MDEV-31586, which displays multiple
errors), the alternative error code is more a nuisance than a
problem, and it is irrelevant to the actual bug in MDEV-27240. So I
am going to accept the alternative error code as an option in the
test file, like so:

976ab215416 upstream/bb-10.10-mdev-22979 MDEV-32046 Fix flaky error codes in spider/bugfix.mdev_27240
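For readers unfamiliar with mysqltest: the workaround relies on the `--error` directive accepting a comma-separated list of expected errors, so the statement passes if it fails with any one of them. The fragment here is only a sketch of what such a change might look like, assumed from the failure message (line 18, `LOCK TABLE tbl_a READ`); it is not copied from 976ab215416, and the actual diff may differ.

```mysqltest
# Sketch only (assumed, not taken from commit 976ab215416): the statement
# around line 18 of spider/bugfix/t/mdev_27240.test.
# The statement is expected to fail; accept either the intended error or
# the spurious ER_NET_READ_ERROR, which is unrelated to MDEV-27240.
--error ER_CONNECT_TO_FOREIGN_DATA_SOURCE,ER_NET_READ_ERROR
LOCK TABLE tbl_a READ;
```

Note that when more than one error is listed, mysqltest writes `Got one of the listed errors` to the result instead of the specific error message, so the matching line in mdev_27240.result would need to change accordingly.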

Comment by Yuchen Pei [ 2023-10-05 ]

Another, even rarer failure, this time in spider/bugfix.mdev_27239:

https://buildbot.mariadb.org/#/builders/160/builds/23176

spider/bugfix.mdev_27239                 w12 [ fail ]
        Test ended at 2023-10-05 20:53:32
 
CURRENT_TEST: spider/bugfix.mdev_27239
--- /home/buildbot/ppc64le-ubuntu-2004/build/storage/spider/mysql-test/spider/bugfix/r/mdev_27239.result	2023-10-05 08:00:34.000000000 +0000
+++ /home/buildbot/ppc64le-ubuntu-2004/build/storage/spider/mysql-test/spider/bugfix/r/mdev_27239.reject	2023-10-05 20:53:31.877054131 +0000
@@ -9,10 +9,11 @@
 CREATE TABLE tbl_a (a INT) ENGINE=SPIDER;
 FLUSH TABLE tbl_a WITH READ LOCK;
 Warnings:
+Error	1158	Got an error reading communication packets
 Error	1429	Unable to connect to foreign data source: localhost
-Error	1429	Unable to connect to foreign data source: localhost
-Error	1429	Unable to connect to foreign data source: localhost
-Error	1429	Unable to connect to foreign data source: localhost
+Error	1429	Got an error reading communication packets
+Error	1429	Got an error reading communication packets
+Error	1429	Got an error reading communication packets
 BEGIN;
 DROP DATABASE auto_test_local;
 for master_1

This is worth further investigation, but it is not important enough
to block MDEV-22979, so I am going to lower the priority back to
Major, push the workaround from the previous comment, and leave this
ticket open.

Update [2023-10-17 Tue]: this happened again in the recent
10.6->10.10 push: https://buildbot.mariadb.org/#/builders/181/builds/23899

Generated at Thu Feb 08 10:28:25 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.