[MDEV-29091] Wait events do not distinguish waiting-for-I/O and waiting-for-locks Created: 2022-07-12  Updated: 2023-11-28

Status: Open
Project: MariaDB Server
Component/s: Locking, Performance Schema, Server, Storage Engine - InnoDB
Affects Version/s: 10.6.8, 10.7.3, 10.7.4, 10.8.2, 10.8.3, 10.9.1
Fix Version/s: 10.5, 10.6, 10.11, 11.0, 11.1, 11.2

Type: Bug Priority: Major
Reporter: Haidong Ji Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None


 Description   

In MariaDB (as well as in MySQL 8), we have no wait event instrumentation for record locks and for table locks.

As such, table and row lock waits are reported incorrectly as "wait/io/table/sql/handler".

This makes performance troubleshooting impossible for customers.

The expected outcome is to create 2 more instruments: row lock and table lock. These 2 types of lock waits can then be measured inside performance_schema. Being able to separate lock waits from IO waits will aid performance tuning, because reducing them will require difference approaches.

Table and row lock instrumentation maybe too resource intensive. In the meantime, the fix in https://github.com/MariaDB/server/pull/2206 marks this kind of wait as wait/lock/table/sql/handler

Steps to reproduce:

  1. Add the following line into my.cnf before instance start

    performance_schema=ON
    

  2. Run the following to set up all consumers and instrumentation

    UPDATE performance_schema.setup_consumers SET ENABLED = 'YES';
    UPDATE performance_schema.setup_instruments SET ENABLED = 'YES', TIMED = 'YES';
    set global innodb_lock_wait_timeout=7200;
    

  3. Set up test table

    create database test;
    use test
    create table t1 (id1 int(11), col1 varchar (200));
    insert into t1 values (1, 'aa');
    insert into t1 values (2, 'bb');
    

  4. Run 3 concurrent sessions to observe the bug
    1. Session 1

      start transaction;
      select id1 from test.t1 where id1=1 for update;
      select sleep(100000);
      

    2. Session 2

      start transaction;
      select id1 from test.t1 where id1=1 for update;
      

    3. Session 3

      MariaDB [test]> select * from performance_schema.threads where type = 'foreground'\G
      *************************** Only relevant row displayed below ***************************
                THREAD_ID: 12
                     NAME: thread/sql/one_connection
                     TYPE: FOREGROUND
           PROCESSLIST_ID: 4
         PROCESSLIST_USER: haidong
         PROCESSLIST_HOST: NULL
           PROCESSLIST_DB: NULL
      PROCESSLIST_COMMAND: Query
         PROCESSLIST_TIME: 9
        PROCESSLIST_STATE: Sending data
         PROCESSLIST_INFO: select id1 from test.t1 where id1=1 for update
         PARENT_THREAD_ID: 1
                     ROLE: NULL
             INSTRUMENTED: YES
                  HISTORY: YES
          CONNECTION_TYPE: Socket
             THREAD_OS_ID: 22508
       
      MariaDB [test]> select * from performance_schema.events_waits_current where thread_id=12\G
      *************************** 1. row ***************************
                  THREAD_ID: 12
                   EVENT_ID: 236
               END_EVENT_ID: NULL
                 EVENT_NAME: wait/io/table/sql/handler           <<<Incorrect instrument name
                     SOURCE:
                TIMER_START: 767747433709600
                  TIMER_END: 791300625915200
                 TIMER_WAIT: 23553192205600
                      SPINS: NULL
              OBJECT_SCHEMA: test
                OBJECT_NAME: t1
                 INDEX_NAME: NULL
                OBJECT_TYPE: TABLE
      OBJECT_INSTANCE_BEGIN: 139905003640240
           NESTING_EVENT_ID: 235
         NESTING_EVENT_TYPE: STAGE
                  OPERATION: fetch
            NUMBER_OF_BYTES: 47
                      FLAGS: NULL
      1 row in set (0.001 sec)
      

I'm working on a fix.



 Comments   
Comment by Daniel Black [ 2022-07-28 ]

vlad.lesin can you review pr 2206 please.

Comment by Haidong Ji [ 2022-08-04 ]

Hi Vladislav Lesin,

Daniel Black has reviewed and provided comments. I've since addressed them. I believe this fix will mitigate the exaggerated `wait/io/table/sql/handler` wait time issue.

Appreciate your feedback/approval. Thanks!

Comment by Sergei Golubchik [ 2023-10-14 ]

The fix was incorrect and caused sporadic failures in buildbot. Reverted. See also #2206

Generated at Thu Feb 08 10:05:49 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.