Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-11226

MariaDB hang at MDL_context::acquire_lock , and thread pool was blocked, could not create additional thread to handle queries

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Not a Bug
    • 10.0.19
    • N/A
    • None
    • suse 11 SP1 x86_64

    Description

      Thread 346 (Thread 0x7faf2583f700 (LWP 198744)):
      #0  0x00007fb079c6a879 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1  0x0000000000675e9c in MDL_wait::timed_wait(MDL_context_owner*, timespec*, bool, PSI_stage_info_v1 const*) ()
      #2  0x000000000067869b in MDL_context::acquire_lock(MDL_request*, unsigned long) ()
      #3  0x000000000055524e in open_table_get_mdl_lock(THD*, Open_table_context*, MDL_request*, unsigned int, MDL_ticket**) ()
      #4  0x0000000000556f22 in open_table(THD*, TABLE_LIST*, st_mem_root*, Open_table_context*) ()
      #5  0x0000000000558a7a in open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) ()
      #6  0x0000000000558dcb in open_and_lock_tables(THD*, TABLE_LIST*, bool, unsigned int, Prelocking_strategy*) ()
      #7  0x0000000000584a8c in mysql_insert(THD*, TABLE_LIST*, List<Item>&, List<List<Item> >&, List<Item>&, List<Item>&, enum_duplicates, bool) ()
      #8  0x000000000059e47e in mysql_execute_command(THD*) ()
      #9  0x000000000085e176 in mysql_open_cursor(THD*, select_result*, Server_side_cursor**) ()
      #10 0x00000000005ac224 in Prepared_statement::execute(String*, bool) ()
      #11 0x00000000005b00fa in Prepared_statement::execute_loop(String*, bool, unsigned char*, unsigned char*) ()
      #12 0x00000000005b0754 in mysqld_stmt_execute(THD*, char*, unsigned int) ()
      #13 0x00000000005a08da in dispatch_command(enum_server_command, THD*, char*, unsigned int) ()
      #14 0x00000000005a1ed9 in do_command(THD*) ()
      #15 0x00000000006b6798 in threadpool_process_request(THD*) ()
      #16 0x00000000006f3818 in worker_main(void*) ()
      #17 0x00007fb079c666a6 in start_thread () from /lib64/libpthread.so.0
      #18 0x00007fb078c29ccd in clone () from /lib64/libc.so.6
      #19 0x0000000000000000 in ?? ()
      

      Attachments

        Activity

          wlad Vladislav Vaintroub added a comment - - edited

          do you have the output of gdb thread apply all bt ? Also error log, and my.cnf would be helpful too. what were the threadpool related parameters?

          wlad Vladislav Vaintroub added a comment - - edited do you have the output of gdb thread apply all bt ? Also error log, and my.cnf would be helpful too. what were the threadpool related parameters?
          catofpei catofpei added a comment - - edited

          Here is the my.cnf parameters about threadpool:

          # # thread pool
          #plugin-load=thread_pool.so
          thread_handling=pool-of-threads
          thread_pool_size=36
          #thread_pool_stall_limit=6
          thread_pool_oversubscribe=5
          max_connections = 50000
          thread_cache_size = 18
          thread_concurrency = 64
          thread_stack = 512K
          

          ////////////////////////////////////////////////////////////////////////
          And the MariaDB's threads stack are similar:
          There are 551 threads, except some other io_handler_thread 、monitor thread 、purge threads
          500 threads hang at:

          #0  0x00007fb079c6a879 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
          #1  0x0000000000675e9c in MDL_wait::timed_wait(MDL_context_owner*, timespec*, bool, PSI_stage_info_v1 const*) ()
          #2  0x000000000067869b in MDL_context::acquire_lock(MDL_request*, unsigned long) ()
          #3  0x000000000055524e in open_table_get_mdl_lock(THD*, Open_table_context*, MDL_request*, unsigned int, MDL_ticket**) ()
          #4  0x0000000000556f22 in open_table(THD*, TABLE_LIST*, st_mem_root*, Open_table_context*) ()
          #5  0x0000000000558a7a in open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) ()
          #6  0x0000000000558dcb in open_and_lock_tables(THD*, TABLE_LIST*, bool, unsigned int, Prelocking_strategy*) ()
          

          And MariaDB Connector/C client hang at:

          #0  0x00007f2f9c748896 in poll () from /lib64/libc.so.6
          #1  0x00007f2f562a09ef in pvio_socket_wait_io_or_timeout (pvio=<optimized out>, is_read=<optimized out>, timeout=-1) at /home/sal/mariadb/mariadb/plugins/pvio/pvio_socket.c:510
          #2  0x00007f2f562a0a68 in pvio_socket_read (pvio=0x7f2f5d9bb730, buffer=0x7f2f043e55a0 "\f", length=16384) at /home/sal/mariadb/mariadb/plugins/pvio/pvio_socket.c:278
          #3  0x00007f2f562acced in ma_pvio_read (pvio=0x7f2f04441cc0, buffer=0x7f2f043e55a0 "\f", length=16384) at /home/sal/mariadb/mariadb/libmariadb/ma_pvio.c:251
          #4  0x00007f2f562acead in ma_pvio_cache_read (pvio=0x7f2f5d9bb730, buffer=0x7f2f043e95b0 "\n", length=4) at /home/sal/mariadb/mariadb/libmariadb/ma_pvio.c:291
          #5  0x00007f2f562a4267 in ma_real_read (net=0x7f2f044415b0, complen=0x7f2f5d9bb860) at /home/sal/mariadb/mariadb/libmariadb/ma_net.c:463
          #6  0x00007f2f562a44f5 in ma_net_read (net=0x7f2f5d9bb730) at /home/sal/mariadb/mariadb/libmariadb/ma_net.c:517
          #7  0x00007f2f562aa95c in ma_net_safe_read (mysql=0x7f2f044415b0) at /home/sal/mariadb/mariadb/libmariadb/mariadb_lib.c:188
          

          even a simple SQL like "select 1 from dual" was hanged.

          ////////////////////////////////////////////////////////////////////////
          And MariaDB's error log:

          161102 11:34:10 [ERROR] Threadpool could not create additional thread to handle queries, because the number of allowed threads was reached. Increasing 'thread_pool_max_threads' parameter can help in this situation.
           If 'extra_port' parameter is set, you can still connect to the database with superuser account (it must be TCP connection using extra_port as TCP port) and troubleshoot the situation. A likely cause of pool blocks are clients that lock resources for long time. 'show processlist' or 'show engine innodb status' can give additional hints.
          161102 11:34:10 [Note] Threadpool has been blocked for 30 seconds
          

          ////////////////////////////////////////////////////////////////////////
          We just test high concurrency, every connection start transaction and execute Insert/Update/Select, if finish then commit , or else rollback.

          very confused..

          catofpei catofpei added a comment - - edited Here is the my.cnf parameters about threadpool: # # thread pool #plugin-load=thread_pool.so thread_handling=pool-of-threads thread_pool_size=36 #thread_pool_stall_limit=6 thread_pool_oversubscribe=5 max_connections = 50000 thread_cache_size = 18 thread_concurrency = 64 thread_stack = 512K //////////////////////////////////////////////////////////////////////// And the MariaDB's threads stack are similar: There are 551 threads, except some other io_handler_thread 、monitor thread 、purge threads 500 threads hang at: #0 0x00007fb079c6a879 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x0000000000675e9c in MDL_wait::timed_wait(MDL_context_owner*, timespec*, bool, PSI_stage_info_v1 const*) () #2 0x000000000067869b in MDL_context::acquire_lock(MDL_request*, unsigned long) () #3 0x000000000055524e in open_table_get_mdl_lock(THD*, Open_table_context*, MDL_request*, unsigned int, MDL_ticket**) () #4 0x0000000000556f22 in open_table(THD*, TABLE_LIST*, st_mem_root*, Open_table_context*) () #5 0x0000000000558a7a in open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) () #6 0x0000000000558dcb in open_and_lock_tables(THD*, TABLE_LIST*, bool, unsigned int, Prelocking_strategy*) () And MariaDB Connector/C client hang at: #0 0x00007f2f9c748896 in poll () from /lib64/libc.so.6 #1 0x00007f2f562a09ef in pvio_socket_wait_io_or_timeout (pvio=<optimized out>, is_read=<optimized out>, timeout=-1) at /home/sal/mariadb/mariadb/plugins/pvio/pvio_socket.c:510 #2 0x00007f2f562a0a68 in pvio_socket_read (pvio=0x7f2f5d9bb730, buffer=0x7f2f043e55a0 "\f", length=16384) at /home/sal/mariadb/mariadb/plugins/pvio/pvio_socket.c:278 #3 0x00007f2f562acced in ma_pvio_read (pvio=0x7f2f04441cc0, buffer=0x7f2f043e55a0 "\f", length=16384) at /home/sal/mariadb/mariadb/libmariadb/ma_pvio.c:251 #4 0x00007f2f562acead in ma_pvio_cache_read (pvio=0x7f2f5d9bb730, buffer=0x7f2f043e95b0 "\n", length=4) at /home/sal/mariadb/mariadb/libmariadb/ma_pvio.c:291 #5 0x00007f2f562a4267 in ma_real_read (net=0x7f2f044415b0, complen=0x7f2f5d9bb860) at /home/sal/mariadb/mariadb/libmariadb/ma_net.c:463 #6 0x00007f2f562a44f5 in ma_net_read (net=0x7f2f5d9bb730) at /home/sal/mariadb/mariadb/libmariadb/ma_net.c:517 #7 0x00007f2f562aa95c in ma_net_safe_read (mysql=0x7f2f044415b0) at /home/sal/mariadb/mariadb/libmariadb/mariadb_lib.c:188 even a simple SQL like "select 1 from dual" was hanged. //////////////////////////////////////////////////////////////////////// And MariaDB's error log: 161102 11:34:10 [ERROR] Threadpool could not create additional thread to handle queries, because the number of allowed threads was reached. Increasing 'thread_pool_max_threads' parameter can help in this situation. If 'extra_port' parameter is set, you can still connect to the database with superuser account (it must be TCP connection using extra_port as TCP port) and troubleshoot the situation. A likely cause of pool blocks are clients that lock resources for long time. 'show processlist' or 'show engine innodb status' can give additional hints. 161102 11:34:10 [Note] Threadpool has been blocked for 30 seconds //////////////////////////////////////////////////////////////////////// We just test high concurrency, every connection start transaction and execute Insert/Update/Select, if finish then commit , or else rollback. very confused..
          wlad Vladislav Vaintroub added a comment - - edited

          I hoped the message in error log was helpful.

          thread_pool_max_threads is the hard limit of number of threads in the threadpool. It might be 500 or so as default.

          If this is less than max_connections, it can deadlock on a global lock. It won't ever deadlock if you size it as max_connections + thread_pool_size.

          Also, the message about extra_port is accurate. If you had enabled it, you could connect as superuser and resolve the deadlock by setting thread_pool_max_threads higher.

          That client hangs is because the server does not answer. The server does not answer because it is stalled, because the thread_pool_max_threads had reached.

          So I suggest to set thread_pool_max_threads=60000 or something big enough for you to never see this message again.

          wlad Vladislav Vaintroub added a comment - - edited I hoped the message in error log was helpful. thread_pool_max_threads is the hard limit of number of threads in the threadpool. It might be 500 or so as default. If this is less than max_connections, it can deadlock on a global lock. It won't ever deadlock if you size it as max_connections + thread_pool_size. Also, the message about extra_port is accurate. If you had enabled it, you could connect as superuser and resolve the deadlock by setting thread_pool_max_threads higher. That client hangs is because the server does not answer. The server does not answer because it is stalled, because the thread_pool_max_threads had reached. So I suggest to set thread_pool_max_threads=60000 or something big enough for you to never see this message again.
          catofpei catofpei added a comment -

          which global lock ? And I'm confused why threads were blocked at open tables procedure ? ..

          catofpei catofpei added a comment - which global lock ? And I'm confused why threads were blocked at open tables procedure ? ..

          It does not matter which lock, for the purpose of explaining threadpool. In this case it is MDL lock. It is possible to google to find out what an MDL lock does.

          It can theoretically be a table lock, user lock, "flush table with read lock" lock. It can also be a row lock in Innodb as well, if everyone is waiting for a specific row.

          wlad Vladislav Vaintroub added a comment - It does not matter which lock, for the purpose of explaining threadpool. In this case it is MDL lock. It is possible to google to find out what an MDL lock does. It can theoretically be a table lock, user lock, "flush table with read lock" lock. It can also be a row lock in Innodb as well, if everyone is waiting for a specific row.

          catofpei IS there anything else needing clarification? mdl lock is probably a result of LOCK TABLES or FLUSH TABLES WITH READ LOCK. you can know it better than we.

          wlad Vladislav Vaintroub added a comment - catofpei IS there anything else needing clarification? mdl lock is probably a result of LOCK TABLES or FLUSH TABLES WITH READ LOCK. you can know it better than we.
          catofpei catofpei added a comment -

          Very sorry for my delayed reply.

          I tried to increase the threadpool_max_threads value as you advised. And the test result was OK!
          No more threads were hanged in mariadb server, and the client was OK, too.

          In our test code, there were scenarios in which some threads tried to update the same row ,
          and some threads would threw the exception if lock wait timeout : Lock wait timeout exceeded, try restarting transaction.

          But we have no operations like "FLUSH TABLES" 、"LOCK TABLES" 、 DDL operations. Maybe as you wrote, "it can also be a row lock in Innodb" ..

          catofpei catofpei added a comment - Very sorry for my delayed reply. I tried to increase the threadpool_max_threads value as you advised. And the test result was OK! No more threads were hanged in mariadb server, and the client was OK, too. In our test code, there were scenarios in which some threads tried to update the same row , and some threads would threw the exception if lock wait timeout : Lock wait timeout exceeded, try restarting transaction. But we have no operations like "FLUSH TABLES" 、"LOCK TABLES" 、 DDL operations. Maybe as you wrote, "it can also be a row lock in Innodb" ..

          People

            Unassigned Unassigned
            catofpei catofpei
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.