[MDEV-33174] nondeterministic test results in spider/bugfix.self_reference_multi Created: 2024-01-04  Updated: 2024-02-08  Resolved: 2024-02-07

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - Spider
Affects Version/s: 10.5, 10.6, 10.11
Fix Version/s: 10.5.24, 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2

Type: Bug
Priority: Critical
Reporter: Yuchen Pei
Assignee: Yuchen Pei
Resolution: Fixed
Votes: 0
Labels: None


 Description   

The test fails quite often on buildbot:

https://buildbot.mariadb.net/ci/reports/cross_reference#branch=&revision=&platform=&fail_name=spider/bugfix.self_reference_multi&fail_variant=&fail_info_full=&typ=&info=&dt=&limit=100&fail_info_short=

This could probably be fixed quickly with a regex replace on the query result: it does not really matter which node of the loop the error is reported on, since all of these outcomes are correct.

It would be good to find out why this happens too.



 Comments   
Comment by Yuchen Pei [ 2024-01-29 ]

In the first select (from t0), it should print

ERROR HY000: An infinite loop is detected when opening table test.t0

because it queries the table status of the remote table of t0, which
is t1, then that of t1, which is t2, and then that of t2, which is t0,
at which point it detects the self-reference (of t0) and reports the
error.
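
The chain walk described above can be illustrated with a small sketch. This is
only a model of the t0 -> t1 -> t2 -> t0 traversal, not how Spider implements
its detection; LinkMap and find_self_reference are hypothetical names
introduced for this example.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical model: each table name maps to the remote table it points at.
using LinkMap = std::map<std::string, std::string>;

// Follow remote links starting at `start`. Returns the name of the first
// table that is reached a second time, or an empty string if the chain
// ends at a table with no remote link.
std::string find_self_reference(const LinkMap &links, const std::string &start)
{
  std::set<std::string> seen{start};
  std::string current = start;
  for (;;)
  {
    auto it = links.find(current);
    if (it == links.end())
      return "";                    // chain ends, no loop
    current = it->second;
    if (!seen.insert(current).second)
      return current;               // already visited: loop detected here
  }
}
```

In this model, starting from t0 with links t0 -> t1, t1 -> t2, t2 -> t0 reports
the loop at t0, which matches the expected message for the first select.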

The second and third selects (from t1 and t2) print the same message
because ha_spider::info reuses the error from the first select: not
enough time has passed to try querying the table status of the remote
table again (see the line marked with an arrow below)

10.5 81d01855f

int ha_spider::info(
  uint flag
) {
//  [... 48 lines elided]
  if (flag &
    (HA_STATUS_TIME | HA_STATUS_CONST | HA_STATUS_VARIABLE | HA_STATUS_AUTO))
  {
//  [... 14 lines elided]
    if (!share->sts_init)
    {
      pthread_mutex_lock(&share->sts_mutex);
      if (share->sts_init)
        pthread_mutex_unlock(&share->sts_mutex);
      else {
        if ((spider_init_error_table =
          spider_get_init_error_table(wide_handler->trx, share, FALSE)))
        {
          DBUG_PRINT("info",("spider diff=%f",
            difftime(tmp_time, spider_init_error_table->init_error_time)));
          if (difftime(tmp_time,
            spider_init_error_table->init_error_time) <
            spider_param_table_init_error_interval())
          {
//  [... 11 lines elided]
            if (spider_init_error_table->init_error_with_message)
              my_message(spider_init_error_table->init_error,   // <-
                spider_init_error_table->init_error_msg, MYF(0));
            DBUG_RETURN(check_error_mode(spider_init_error_table->init_error));
          }
        }
//  [... 6 lines elided]
      }
    }
//  [... 262 lines elided]
}
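
The reuse decision in the excerpt above boils down to a time comparison. A
minimal sketch, assuming a simplified stand-in for the cached error entry
(InitErrorEntry and should_reuse_error are hypothetical names, not Spider's
real types or functions):

```cpp
#include <ctime>
#include <string>

// Hypothetical stand-in for the cached init error; not Spider's real struct.
struct InitErrorEntry
{
  int error_code;
  std::string message;
  std::time_t error_time;   // when the error was first recorded
};

// Returns true when the cached error is still "fresh", i.e. less than
// interval_seconds have passed since it was recorded. In that case the
// caller would report the cached error instead of querying the remote
// table again.
bool should_reuse_error(const InitErrorEntry &entry, std::time_t now,
                        double interval_seconds)
{
  return std::difftime(now, entry.error_time) < interval_seconds;
}
```

Under this model, a select issued shortly after the first one sees the message
recorded for t0, regardless of which table it actually selects from.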

It is not clear why the test fails nondeterministically, as I cannot
reproduce the failure locally. Therefore we simply replace the query
result so that the test passes when the error is reported on any of
the three nodes of the cycle.
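
The kind of substitution meant here can be sketched as follows. In the test
file itself this would normally be done with mysqltest's replace_regex
command; the C++ below only illustrates the normalization, and
normalize_loop_error is a hypothetical helper, not part of the patch.

```cpp
#include <regex>
#include <string>

// Rewrite the loop error so that it always names test.t0, whichever of
// t0, t1 or t2 the server happened to report. Other lines are unchanged.
std::string normalize_loop_error(const std::string &line)
{
  static const std::regex table_re("opening table test\\.t[0-2]");
  return std::regex_replace(line, table_re, "opening table test.t0");
}
```

With this normalization, the recorded result is the same for all three
outcomes, so the test no longer depends on which node of the cycle reports
the error.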

Comment by Yuchen Pei [ 2024-01-29 ]

Hi holyfoot, ptal thanks

upstream/bb-10.5-mdev-33174 b8acdfe37cf2447d41f162abf0468f1bcbc9b28e
MDEV-33174 Fixing nondeterministic self-referencing test result

Comment by Alexey Botchkov [ 2024-02-05 ]

ok to push.

Comment by Yuchen Pei [ 2024-02-07 ]

Thanks for the review. Pushed d40eaf2dab66f76e1e3749ddb863ad5bf32772da to 10.5, after a 3-hour wait on an amd64-ubuntu-2204-debug-ps rebuild...
