Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32487

SIGSEGV in ha_resolve_storage_engine_name, UBSAN: runtime error: index 4294967295 out of bounds for type 'st_plugin_int *[64]'

Details

    Description

      The bug is seen very often in both optimized and debug builds, and yet any direct attempts at reproducing it always fail. Also, the SQL traces are at times empty, indicating that the issue can happen very close to startup. The issue stops many test runs short.

      preview-11.3-preview 76e20f00772148fa928c6c6e42401f38ca89abf0 (Debug)

      Core was generated by `/test/HEAD_preview_11.3_preview_MD171023-mariadb-11.3.0-linux-x86_64-dbg/bin/ma'.
      Program terminated with signal SIGSEGV, Segmentation fault.
      #0  0x000055e400f34711 in ha_resolve_storage_engine_name (
          db_type=<optimized out>)
          at /test/preview-11.3-preview_dbg/sql/handler.h:5546
      5546	          db_type == view_pseudo_hton ? "VIEW" : hton_name(db_type)->str);
      [Current thread is 1 (Thread 0x14b940073640 (LWP 45814))]
      (gdb) bt
      #0  0x000055e400f34711 in ha_resolve_storage_engine_name (db_type=<optimized out>) at /test/preview-11.3-preview_dbg/sql/handler.h:5546
      #1  check_engine (thd=thd@entry=0x14b8a4000d58, db_name=<optimized out>, table_name=0x14b8a4013590 "t1", create_info=create_info@entry=0x14b940070200) at /test/preview-11.3-preview_dbg/sql/sql_table.cc:12498
      #2  0x000055e400f3dc30 in create_table_impl (thd=thd@entry=0x14b8a4000d58, ddl_log_state_create=ddl_log_state_create@entry=0x14b94006ff80, ddl_log_state_rm=<optimized out>, ddl_log_state_rm@entry=0x14b940070110, orig_db=@0x14b8a40135e0: {str = 0x14b8a4013cd8 "test", length = 4}, orig_table_name=@0x14b8a40135f0: {str = 0x14b8a4013590 "t1", length = 2}, db=@0x14b8a40135e0: {str = 0x14b8a4013cd8 "test", length = 4}, table_name=@0x14b8a40135f0: {str = 0x14b8a4013590 "t1", length = 2}, path=@0x14b94006fc40: {str = 0x14b94006fc60 "./test/t1", length = 9}, options=<optimized out>, create_info=0x14b940070200, alter_info=0x14b940070020, create_table_mode=0, is_trans=0x14b94006ff60, key_info=0x14b94006fc38, key_count=0x14b94006fc34, frm=0x14b94006fc50) at /test/preview-11.3-preview_dbg/sql/sql_table.cc:4576
      #3  0x000055e400f3eae2 in mysql_create_table_no_lock (thd=thd@entry=0x14b8a4000d58, ddl_log_state_create=ddl_log_state_create@entry=0x14b94006ff80, ddl_log_state_rm=ddl_log_state_rm@entry=0x14b940070110, db=db@entry=0x14b8a40135e0, table_name=table_name@entry=0x14b8a40135f0, create_info=create_info@entry=0x14b940070200, alter_info=0x14b940070020, is_trans=0x14b94006ff60, create_table_mode=0, table_list=0x14b8a40135c8) at /test/preview-11.3-preview_dbg/sql/sql_table.cc:4766
      #4  0x000055e400f409f2 in mysql_create_table (alter_info=0x14b940070020, create_info=0x14b940070200, create_table=0x14b8a40135c8, thd=0x14b8a4000d58) at /test/preview-11.3-preview_dbg/sql/sql_table.cc:4882
      #5  Sql_cmd_create_table_like::execute (this=<optimized out>, thd=0x14b8a4000d58) at /test/preview-11.3-preview_dbg/sql/sql_table.cc:12819
      #6  0x000055e400e5f0d2 in mysql_execute_command (thd=thd@entry=0x14b8a4000d58, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /test/preview-11.3-preview_dbg/sql/sql_parse.cc:5723
      #7  0x000055e400e60433 in mysql_parse (thd=thd@entry=0x14b8a4000d58, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x14b9400721e0) at /test/preview-11.3-preview_dbg/sql/sql_parse.cc:7734
      #8  0x000055e400e627a9 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x14b8a4000d58, packet=packet@entry=0x14b8a400b119 "CREATE TABLE t1 (a INT) ENGINE=Spider AUTO_INCREMENT_MODE=4;", packet_length=packet_length@entry=60, blocking=blocking@entry=true) at /test/preview-11.3-preview_dbg/sql/sql_class.h:251
      #9  0x000055e400e648d6 in do_command (thd=0x14b8a4000d58, blocking=blocking@entry=true) at /test/preview-11.3-preview_dbg/sql/sql_parse.cc:1406
      #10 0x000055e400fc46a3 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x55e403b2ecf8, put_in_cache=put_in_cache@entry=true) at /test/preview-11.3-preview_dbg/sql/sql_connect.cc:1445
      #11 0x000055e400fc4998 in handle_one_connection (arg=arg@entry=0x55e403b2ecf8) at /test/preview-11.3-preview_dbg/sql/sql_connect.cc:1347
      #12 0x000055e4014073ba in pfs_spawn_thread (arg=0x55e403aaa618) at /test/preview-11.3-preview_dbg/storage/perfschema/pfs.cc:2201
      #13 0x000014b958694ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
      #14 0x000014b958726a40 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
      

      The crash always seems to happen on line 5546 in sql/handler.h:

      db_type == view_pseudo_hton ? "VIEW" : hton_name(db_type)->str);
      

      Attachments

        Issue Links

          Activity

            Was able to reproduce the bug with --threads=10, but not with --threads=5. Ran a running command extract script in GDB and found that multiple threads were executing commands simultaneously, so it is not just the "very first "succeeding" query" but rather, likely, several queries (out of the two) running at the same time which causes the issue.

            Roel Roel Van de Paar added a comment - Was able to reproduce the bug with --threads=10, but not with --threads=5. Ran a running command extract script in GDB and found that multiple threads were executing commands simultaneously, so it is not just the "very first "succeeding" query" but rather, likely, several queries (out of the two) running at the same time which causes the issue.
            ycp Yuchen Pei added a comment - - edited

            Thanks very much Roel for reducing the testcase. I could reproduce it locally with an rr trace by running the following

            echo "CREATE TABLE t (c INT) ENGINE=SPIDER;
            INSTALL PLUGIN spider SONAME 'ha_spider.so';
            " > /tmp/in.sql && \
            rm /tmp/log2/* -f && \
              taskset -c 0-3 ./mysql-test/mtr --rr --mysqld=--innodb --mysqld=--partition --start-and-exit && \
              ~/source/mariadb-qa/pquery/pquery2-md --infile=/tmp/in.sql --database=test --threads=25 --queries-per-thread=100 --logdir=/tmp/log2 --log-all-queries --log-failed-queries --user=root --socket=./mysql-test/var/tmp/mysqld.1.sock >> /tmp/log2/pquery.log && \
              grep -i "gone away" /tmp/log2/*

            The grep is for confirming that the server indeed died. If it does not happen, try increasing --threads and/or --query-per-thread number.

            Initial debugging shows the problem is caused by one thread still installing spider, while another thread trying to create the table using spider. When this happens, eventually it calls hton_name() which does not check the bound when accessing an element of the array hton2plugin of size 64, all the while the default slot value of an hton is 4294967295, which is only updated after the plugin is initialised.

            static inline LEX_CSTRING *hton_name(const handlerton *hton)
            {
              return &(hton2plugin[hton->slot]->name);
            }

            It is rather surprising this bug was not found before, and I suspect this could happen to any dynamically loaded storage engine (to be verified).

            ycp Yuchen Pei added a comment - - edited Thanks very much Roel for reducing the testcase. I could reproduce it locally with an rr trace by running the following echo "CREATE TABLE t (c INT) ENGINE=SPIDER; INSTALL PLUGIN spider SONAME 'ha_spider.so'; " > /tmp/in.sql && \ rm /tmp/log2/* -f && \ taskset -c 0-3 ./mysql-test/mtr --rr --mysqld=--innodb --mysqld=--partition --start-and-exit && \ ~/source/mariadb-qa/pquery/pquery2-md --infile=/tmp/in.sql --database=test --threads=25 --queries-per-thread=100 --logdir=/tmp/log2 --log-all-queries --log-failed-queries --user=root --socket=./mysql-test/var/tmp/mysqld.1.sock >> /tmp/log2/pquery.log && \ grep -i "gone away" /tmp/log2/* The grep is for confirming that the server indeed died. If it does not happen, try increasing --threads and/or --query-per-thread number. Initial debugging shows the problem is caused by one thread still installing spider, while another thread trying to create the table using spider. When this happens, eventually it calls hton_name() which does not check the bound when accessing an element of the array hton2plugin of size 64, all the while the default slot value of an hton is 4294967295, which is only updated after the plugin is initialised. static inline LEX_CSTRING *hton_name( const handlerton *hton) { return &(hton2plugin[hton->slot]->name); } It is rather surprising this bug was not found before, and I suspect this could happen to any dynamically loaded storage engine (to be verified).
            ycp Yuchen Pei added a comment - - edited

            Hi holyfoot, ptal thanks:

            b2dc6645c0a bb-10.5-mdev-32487 MDEV-32487 Check plugin is ready when resolving storage engine
            

            ycp Yuchen Pei added a comment - - edited Hi holyfoot , ptal thanks: b2dc6645c0a bb-10.5-mdev-32487 MDEV-32487 Check plugin is ready when resolving storage engine

            see the commit commentary for recommendations.
            Otherwise ok to push.

            holyfoot Alexey Botchkov added a comment - see the commit commentary for recommendations. Otherwise ok to push.
            ycp Yuchen Pei added a comment -

            thanks for the review - pushed a6ae1c2dfb272008666d1f15dc9cb0cc06a5b82a to 10.5

            ycp Yuchen Pei added a comment - thanks for the review - pushed a6ae1c2dfb272008666d1f15dc9cb0cc06a5b82a to 10.5

            People

              ycp Yuchen Pei
              Roel Roel Van de Paar
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.