We still seem to have the ‘purge fails to run’ problem that was originally filed as MDEV-11802. With MDEV-12288 in 10.3, it can also be observed as a result of INSERT activity.
thiru, can you try to reproduce this and determine why purge fails to run? I think I have occasionally seen this on buildbot for other tests as well.
Marko Mäkelä
I believe that this particular failure can be attributed to the slowness of the system. The performance optimization in MDEV-18878 should help in other cases, but not this one, because the table is not being dropped, rebuilt, or discarded.
Marko Mäkelä
Unfortunately, that doesn't help much. Tests should be robust against moderate slowness, because slow builders can always become even slower, not only in buildbot but also in the build processes run by distributions (this was a big problem with Debian builds).
The only exception is when a builder is so unreasonably slow that tests fail massively; in that case the builder itself needs to be fixed. I don't think that applies here, though.
Elena Stepanova
I extended the wait_all_purged.inc timeout from 30 to 60 seconds. That should reduce the probability of failures. A 60-second wait was enough to hide MDEV-18878 on the affected platform.
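The timeout change amounts to a longer deadline on a client-side polling loop. The real wait_all_purged.inc is a mysqltest script, not Python; the sketch below only models the idea, assuming a hypothetical `history_length()` callable that stands in for whatever the include file actually queries. It polls until the value reaches 0 or the deadline passes.

```python
import time
from typing import Callable


def wait_all_purged(history_length: Callable[[], int],
                    timeout: float = 60.0,
                    poll_interval: float = 0.1) -> bool:
    """Poll until history_length() reports 0 or the timeout expires.

    history_length is a hypothetical stand-in for the real status query.
    Returns True if purge caught up in time, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if history_length() == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


# Simulated purge that makes progress on every poll.
remaining = [5]


def draining_history_length() -> int:
    remaining[0] = max(0, remaining[0] - 1)
    return remaining[0]


print(wait_all_purged(draining_history_length, timeout=5.0, poll_interval=0.01))  # True

# Simulated purge that never runs: the wait gives up at the deadline.
print(wait_all_purged(lambda: 3, timeout=0.05, poll_interval=0.01))  # False
```

Raising `timeout` from 30 to 60 only lowers the probability of a spurious failure on a slow builder; a purge that is stalled for longer than the deadline still makes the test fail.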
Marko Mäkelä
MDEV-22958 has been filed for the same problem. I think the solution is to implement a server-side wait for purge by introducing a new SET GLOBAL variable whose update trigger would perform the wait. The wait time can be limited by a statement timeout.
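A minimal Python model of the proposed server-side wait, under stated assumptions: `PurgeCoordinator`, `purge_some()` and `on_set_global_wait()` are hypothetical names, not InnoDB code, and the statement timeout is modelled as a plain parameter. The update callback of the imagined SET GLOBAL variable blocks on a condition variable until the purge thread reports that the history is empty, or until the timeout elapses.

```python
import threading
import time
from typing import Optional


class PurgeCoordinator:
    """Toy model of a purge subsystem with a blocking wait (hypothetical)."""

    def __init__(self, history_length: int) -> None:
        self._history_length = history_length
        self._cond = threading.Condition()

    def purge_some(self, n: int) -> None:
        """Called by the simulated purge thread after purging n records."""
        with self._cond:
            self._history_length = max(0, self._history_length - n)
            if self._history_length == 0:
                # Wake every session blocked in on_set_global_wait().
                self._cond.notify_all()

    def on_set_global_wait(self, statement_timeout: Optional[float]) -> bool:
        """Update callback for a hypothetical SET GLOBAL variable.

        Blocks until the history length is 0. Returns False if
        statement_timeout (seconds) elapsed first; None waits forever.
        """
        with self._cond:
            return self._cond.wait_for(lambda: self._history_length == 0,
                                       timeout=statement_timeout)


coord = PurgeCoordinator(history_length=3)


def purge_thread() -> None:
    for _ in range(3):
        time.sleep(0.01)
        coord.purge_some(1)


t = threading.Thread(target=purge_thread)
t.start()
print(coord.on_set_global_wait(statement_timeout=5.0))  # True
t.join()

stuck = PurgeCoordinator(history_length=1)
print(stuck.on_set_global_wait(statement_timeout=0.05))  # False
```

Compared with polling from the test file, the waiting session is woken by the purge thread itself, and the upper bound on the wait belongs to the statement rather than to a hard-coded loop count in the include file.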
Marko Mäkelä