[MDEV-20457] Develop some test and/or tools for RQG for covering maybe faulty maintenance of indexes Created: 2019-08-30  Updated: 2020-06-17  Resolved: 2020-06-17

Status: Closed
Project: MariaDB Server
Component/s: Tests
Fix Version/s: N/A

Type: Task Priority: Major
Reporter: Matthias Leich Assignee: Matthias Leich
Resolution: Fixed Votes: 0
Labels: None


 Description   

During fixing MDEV-20066 it turned out that
- the basic mistake in code leads to some wrong maintenance of the clustered index
- at least the SQL revealing the defect isn't that complicated
  CHECK TABLE or
  enforce to get result sets based on SELECT ... FORCE INDEX ... etc. and compare them
 
The Standard RQG Testsuite for checking InnoDB is able to catch MDEV-20066 (assert
reported on top of bug report) but that is rather some of the probably various sideeffects
of the basic mistake. The suite might be capable to catch also the basic bug direct.
But caused by the frequent table rebuilds caused by DDL the likelihood to catch that
is very low.
Some of the RQG optimizer tests might be capable to catch MDEV-20066 or the base bug too. But running them are a not acceptable overhead  for covering InnoDB functionality only.
 
So the task here is:
1. Figure out a way to catch the basic bug below MDEV-20066 with RQG,  short total runtime
     and small manual analysis.
2. Modify that so that ANY faulty maintenance of clustered index or secondary indexes
    get caught.
3. Extend the Standard RQG Testsuite
     



 Comments   
Comment by Matthias Leich [ 2019-08-31 ]

Some preliminary results (this text is under construction):
1. Especially RQG concurrency tests usually do not check result sets/messages returned by the server.
    Reason for that:
    The shape of result sets cannot be predicted because there is concurrent data modifying DML up till DDL.
    The same applies to some serious extend for success and fail of single SQL's.
    Some queries get killed, others fail when the server resolves a deadlock or some fail because an object
    like table/column/key/... does for some limited timespan already exists or does no more exists.
    Hence the range of most probably to be tolerated errors returned by the server is partially big.
2. CHECK TABLE  for InnoDB is capable to detect certain (but most probably not all) structural defects.
     But it has some properties which are quite problematic to exploit or handle right inside RQG.
     - It could fail from a broad range of valid reasons like other SQL (see 1.) too.
     - When detecting some structural defect when executing CHECK TABLE the server seems to write all time a 
       corresponding entry in the server error log and than
        - shutdown to prevent the error propagation in case the defect is classified as that serious
        or
        - mark the table or index ... as corrupt, return a warning and ERROR_CODE 0 in case the defect is classified as not that serious
          Exact this happens for example in the MTR based test added by Marko to MDEV-20066.
          CHECK TABLE t1;
          Table   Op      Msg_type        Msg_text
           test.t1 check   Warning InnoDB: Index 'c' contains 1 entries, should be 0.
           test.t1 check   error   Corrupt
                   no shutdown to prevent error propagation
           But unfortunately just here the defect is extreme serious.
3.  The timespan between maybe causing a structural defect and detecting it via CHECK TABLE or similar
      is critical.  In case there is some other SQL between (maybe concurrent session) than we have some
      high likelihood to end up with some assert caused instead (see MDEV-20066 bug description on top).
4. The detection of the structural defect via CHECK TABLE must be immediate followed by some abort of
     the RQG test. Otherwise SQL causing an assert could be between or we waste machine time ...
5. Periodic Reporters work all 10s. This timespan is too long except we reduce the amount of SQL between
     which than wastes resources ...
6. Validators get executed after each query (maybe short statement sequence). That is sufficient frequent
     but CHECK TABLE run by that validator commits most probably any ongoing transaction.
     And this effect might be too intrusive.
    When going with threads > 1 we will get the additional effect that the heavy locking caused by
    CHECK TABLE will have some strong impact on the concurrent DDL/DML.
    heavy locking.

Comment by Matthias Leich [ 2019-09-10 ]

Weaknesses fixed/checks extended
1. The errors 1712 (ER_INDEX_CORRUPT) and 1817 (ER_INNODB_INDEX_CORRUPT) get now mapped to the error class STATUS_DATABASE_CORRUPTION.
     Therefore RQG will bring the test quite fast to an end.
     Expected impact:
     The fraction of tests which end with some bad status != STATUS_DATABASE_CORRUPTION even though having e hit a corruption should become rare.
 2. Extend the execution of statements and handling of results sets inside of lib/GenTest/Executor/MySQL.pm.
      In case the execution of some statement passed and it was a CHECK TABLE than check the result set. In case the information within the result set
      points to data corruption inside of an InnoDB table than set STATUS_DATABASE_CORRUPTION.
     The standard reaction of RQG will be than:
     Abort immediate the test, cleanup and report STATUS_DATABASE_CORRUPTION. 

Comment by Matthias Leich [ 2020-06-17 ]

My experimental version of RQG contains in lib/GenTest/Executor/MySQL.pm since 2019-09
code which checks if the just executed query was a CHECK TABLE.
If yes than the result set is analysed. In case that result set contains "Warning"
and the string "InnoDB" than the status STATUS_DATABASE_CORRUPTION will be returned.
Example for some result with warning (I added '>' and '<' as delimiters for the column values)
>test.t1<>check<>Warning<>InnoDB: Index 'c' contains 1 entries, should be 0.<-

In case the string 'InnoDB' does not occur such warnings get ignored in order to prevent
false alarms in scenarios like:
A server is under heavy DML load, it gets crashed intentionally than restarted and than
some storage engines != InnoDB might report warnings that some table needs to get repaired.

Comment by Matthias Leich [ 2020-06-17 ]

https://jira.mariadb.org/browse/MDEV-22924 Warning InnoDB: Index 'Marvão_idx3' contains 36 entries, should be 37.

Generated at Thu Feb 08 08:59:37 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.