MariaDB ColumnStore / MCOL-5310

Columnstore crashes in malloc() with simple select query

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 22.08.3
    • 22.08.6
    • ExeMgr
    • DISTRIB_ID=Ubuntu
      DISTRIB_RELEASE=20.04
      DISTRIB_CODENAME=focal
      DISTRIB_DESCRIPTION="Ubuntu 20.04.5 L

      Single server running a mix of InnoDB and ColumnStore

    Description

      The customer has a single server with the ColumnStore engine
      that keeps crashing with signal 11.

      Nov 17 12:51:34 vm-uks-edf-maria env[159613]: ### ExeMgr ses:2148047732 caught: InetStreamSocket::readToMagic(): I/O error2.1: err = -1 e = 104: Connection reset by peer
      Nov 17 12:51:34 vm-uks-edf-maria env[159613]: ExeMgr[159613]: 34.074602 |2148047732|0|0| C 16 CAL0055: ERROR: ExeMgr has caught an exception. InetStreamSocket::readToMagic(): I/O error2.1: err = -1 e = 104: Connection reset by peer
      Nov 17 12:51:34 vm-uks-edf-maria env[159613]: ### ExeMgr ses:564088 caught: InetStreamSocket::readToMagic(): I/O error2.1: err = -1 e = 104: Connection reset by peer
      Nov 17 12:51:34 vm-uks-edf-maria env[159613]: ExeMgr[159613]: 34.074695 |564088|0|0| C 16 CAL0055: ERROR: ExeMgr has caught an exception. InetStreamSocket::readToMagic(): I/O error2.1: err = -1 e = 104: Connection reset by peer
      Nov 17 12:51:34 vm-uks-edf-maria ExeMgr[159613]: 34.074602 |2148047732|0|0| C 16 CAL0055: ERROR: ExeMgr has caught an exception. InetStreamSocket::readToMagic(): I/O error2.1: err = -1 e = 104: Connection reset by peer
      Nov 17 12:51:34 vm-uks-edf-maria ExeMgr[159613]: 34.074695 |564088|0|0| C 16 CAL0055: ERROR: ExeMgr has caught an exception. InetStreamSocket::readToMagic(): I/O error2.1: err = -1 e = 104: Connection reset by peer
      Nov 17 12:51:34 vm-uks-edf-maria ExeMgr[159613]: 34.075276 |564091|0|0| C 16 CAL0055: ERROR: ExeMgr has caught an exception. InetStreamSocket::readToMagic(): I/O error2.1: err = -1 e = 104: Connection reset by peer
      Nov 17 12:51:34 vm-uks-edf-maria env[159613]: ### ExeMgr ses:564091 caught: InetStreamSocket::readToMagic(): I/O error2.1: err = -1 e = 104: Connection reset by peer
      Nov 17 12:51:34 vm-uks-edf-maria env[159613]: ExeMgr[159613]: 34.075276 |564091|0|0| C 16 CAL0055: ERROR: ExeMgr has caught an exception. InetStreamSocket::readToMagic(): I/O error2.1: err = -1 e = 104: Connection reset by peer
      Nov 17 12:51:34 vm-uks-edf-maria systemd[1]: mariadb.service: Main process exited, code=killed, status=11/SEGV
      Nov 17 12:51:34 vm-uks-edf-maria systemd[1]: mariadb.service: Failed with result 'signal'.
      Nov 17 12:51:35 vm-uks-edf-maria CRON[725208]: (CRON) info (No MTA installed, discarding output)
      Nov 17 12:51:39 vm-uks-edf-maria systemd[1]: mariadb.service: Scheduled restart job, restart counter is at 1.
      Nov 17 12:51:39 vm-uks-edf-maria systemd[1]: Stopped MariaDB 10.6.9-5 database server.

          Activity

            alexey.vorovich alexey vorovich (Inactive) added a comment - David.Hall, the support team is getting the core file. JFYI
            toddstoffel Todd Stoffel (Inactive) added a comment - This might actually be https://jira.mariadb.org/browse/MDEV-26917

            alexey.vorovich alexey vorovich (Inactive) added a comment -

            massimo.disaro nhat.ho

            Do we have
            1. logs
            2. core files

            in a single place? If not, please provide an annotated list of URLs for them.

            Thanks

            David.Hall FYI

            toddstoffel Todd Stoffel (Inactive) added a comment - Possible duplicate of MCOL-5309
            massimo.disaro Massimo added a comment -

            alexey.vorovich, the logs were already present; see my comment marked for DEV only, which contains all the logs along with the configuration.

            The recurring crashes seem to always happen with very simple queries. I have done some detective work, and it appears that
            the offending queries reported in the crash are called from within a stored procedure, which is itself called from within an open InnoDB transaction. I am not sure whether this has anything to do with the crash.
            The crash itself happens in malloc() every time; here is a stack trace from today's crash.

            Thread 32 (Thread 0x7f1f7e4b3700 (LWP 1268639)):
            #0  __libc_write (nbytes=1, buf=0x7f1ec481f0ee, fd=2) at ../sysdeps/unix/sysv/linux/write.c:26
            #1  __libc_write (fd=2, buf=0x7f1ec481f0ee, nbytes=1) at ../sysdeps/unix/sysv/linux/write.c:24
            #2  0x00005645168d6296 in my_safe_print_str ()
            #3  0x000056451638500a in handle_fatal_signal ()
            #4  <signal handler called>
            #5  tcache_get (tc_idx=<optimized out>) at malloc.c:2937
            #6  __GI___libc_malloc (bytes=31) at malloc.c:3051
            #7  0x00007f27c905fb39 in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
            #8  0x00007f27b95bc2d1 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/x86_64-linux-gnu/libexecplan.so
            #9  0x00007f27b9595cac in execplan::make_table(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int) () from /usr/lib/x86_64-linux-gnu/libexecplan.so
            #10 0x00007f27ba16229e in cal_impl_if::processFrom(bool&, st_select_lex&, cal_impl_if::gp_walk_info&, boost::shared_ptr<execplan::CalpontSelectExecutionPlan>&) () from /usr/lib/mysql/plugin/ha_columnstore.so
            #11 0x00007f27ba162d65 in cal_impl_if::getSelectPlan(cal_impl_if::gp_walk_info&, st_select_lex&, boost::shared_ptr<execplan::CalpontSelectExecutionPlan>&, bool, bool, std::vector<Item*, std::allocator<Item*> > const&) () from /usr/lib/mysql/plugin/ha_columnstore.so
            #12 0x00007f27ba167d34 in cal_impl_if::cs_get_select_plan(ha_columnstore_select_handler*, THD*, boost::shared_ptr<execplan::CalpontSelectExecutionPlan>&, cal_impl_if::gp_walk_info&) () from /usr/lib/mysql/plugin/ha_columnstore.so
            #13 0x00007f27ba10bdff in ha_mcs_impl_pushdown_init(mcs_handler_info*, TABLE*) () from /usr/lib/mysql/plugin/ha_columnstore.so
            #14 0x00007f27ba0f2b35 in create_columnstore_select_handler(THD*, st_select_lex*) () from /usr/lib/mysql/plugin/ha_columnstore.so
            #15 0x00005645161b7088 in mysql_select(THD*, TABLE_LIST*, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*) ()
            #16 0x00005645161b78a7 in handle_select(THD*, LEX*, select_result*, unsigned long) ()
            #17 0x0000564516144071 in ?? ()
            #18 0x0000564516152cd3 in mysql_execute_command(THD*, bool) ()
            #19 0x0000564516099cfb in sp_instr_stmt::exec_core(THD*, unsigned int*) ()
            #20 0x00005645160a310a in sp_lex_keeper::reset_lex_and_exec_core(THD*, unsigned int*, bool, sp_instr*) ()
            #21 0x00005645160a3b27 in sp_instr_stmt::execute(THD*, unsigned int*) ()
            #22 0x000056451609d411 in sp_head::execute(THD*, bool) ()
            #23 0x000056451609f00a in sp_head::execute_procedure(THD*, List<Item>*) ()
            #24 0x0000564516143ea7 in ?? ()
            #25 0x0000564516148a46 in Sql_cmd_call::execute(THD*) ()
            #26 0x000056451614f7d6 in mysql_execute_command(THD*, bool) ()
            #27 0x0000564516099cfb in sp_instr_stmt::exec_core(THD*, unsigned int*) ()
            #28 0x00005645160a310a in sp_lex_keeper::reset_lex_and_exec_core(THD*, unsigned int*, bool, sp_instr*) ()
            #29 0x00005645160a3b27 in sp_instr_stmt::execute(THD*, unsigned int*) ()
            #30 0x000056451609d411 in sp_head::execute(THD*, bool) ()
            #31 0x000056451609fa1a in sp_head::execute_function(THD*, Item**, unsigned int, Field*, sp_rcontext**, Query_arena*) ()
            #32 0x00005645163a7f7f in Item_sp::execute_impl(THD*, Item**, unsigned int) ()
            #33 0x00005645163a8113 in Item_sp::execute(THD*, bool*, Item**, unsigned int) ()
            #34 0x00005645164139a3 in Item_func_sp::val_str(String*) ()
            #35 0x00005645162e40a8 in Type_handler::Item_send_str(Item*, Protocol*, st_value*) const ()
            #36 0x000056451607e1c6 in Protocol::send_result_set_row(List<Item>*) ()
            #37 0x00005645160f67c7 in select_send::send_data(List<Item>&) ()
            #38 0x00005645161b8b6b in JOIN::exec_inner() ()
            #39 0x00005645161b8f89 in JOIN::exec() ()
            #40 0x00005645161b70ea in mysql_select(THD*, TABLE_LIST*, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*) ()
            #41 0x00005645161b78a7 in handle_select(THD*, LEX*, select_result*, unsigned long) ()
            #42 0x0000564516144071 in ?? ()
            #43 0x0000564516152cd3 in mysql_execute_command(THD*, bool) ()
            #44 0x000056451613e9c7 in mysql_parse(THD*, char*, unsigned int, Parser_state*) ()
            #45 0x000056451614b04d in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool) ()
            #46 0x000056451614d768 in do_command(THD*, bool) ()
            #47 0x0000564516261607 in do_handle_one_connection(CONNECT*, bool) ()
            #48 0x000056451626195d in handle_one_connection ()
            #49 0x00005645165cfba6 in ?? ()
            #50 0x00007f27c91a5609 in start_thread (arg=<optimized out>) at pthread_create.c:477
            #51 0x00007f27c8d91133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
            

            Offending query (according to the MariaDB server log):

            Query (0x7f1ec481f068): select if(count(*) > 0, 1, 0) into managerContactAccessRest from edf_colstore.vinci_manager_contact_access_t where manager_c_id =  NAM
            

            The logged query is truncated, but I found the actual query in the stored procedure:

            select if(count(*) > 0, 1, 0) into managerContactAccessRest from edf_colstore.vinci_manager_contact_access_t where manager_c_id = pUserId;
            

            The value of pUserId at the time of the call is:

            NAME_CONST('pUserId',433278)
            

            The crash almost always happens with this query, and it is always in malloc().

            Comment above added by rpizzi Rick Pizzi (Inactive).
            drrtuy Roman added a comment -

            Let me remind you that we are looking at the MDB runtime, not PP or any other MCS process, so there is no jemalloc involved. However, IMHO the fact that MDB lacks jemalloc doesn't make much difference.
            IMHO we are facing a libc issue, because the call stack looks fine to me: operator new tries to allocate 31 bytes (the length of the table name, which is 30 bytes, plus 1 extra).
            I found a couple of similar issues. One of them suggests there might be issues in the memory allocation code in glibc on deb-based distributions. It says that the patch with the fix has been in glibc since 2.28 (U20 has 2.31), but it also gives a workaround that might be of use to us: export GLIBC_TUNABLES=glibc.malloc.tcache_count=0 in the environment MDB runs in. Please try altering the MDB systemd unit to add this export and see if it helps. JFYI, there are reports that performance may degrade when tcache is disabled.
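
As a quick sanity check of the 31-byte figure, the arithmetic can be sketched in Python. This assumes the "1 extra" byte is the terminating NUL that a heap-backed C++ string reserves, and that names of up to 15 characters would have stayed in the string's inline buffer; both are assumptions about libstdc++ behaviour, and `bytes_requested` and `INLINE_CAPACITY` are hypothetical names used only for illustration.

```python
# Table name taken from the failing query in this issue.
TABLE_NAME = "vinci_manager_contact_access_t"

# Assumed size of the inline (small-string) buffer, in characters.
INLINE_CAPACITY = 15


def bytes_requested(name: str) -> int:
    """Bytes a heap-backed copy of `name` would need: its characters plus one NUL."""
    return len(name.encode("ascii")) + 1


if __name__ == "__main__":
    print(len(TABLE_NAME), bytes_requested(TABLE_NAME))
```

The name is longer than the assumed inline capacity, so copying it has to go through operator new, which is consistent with frames #6 to #9 of the backtrace.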

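
A minimal sketch of how the suggested workaround could be prepared, assuming it is applied through a systemd drop-in for mariadb.service rather than by editing the packaged unit. The drop-in file name `glibc-tunables.conf` and the helper names are hypothetical; only the tunable itself comes from the comment above.

```python
from pathlib import Path

# Tunable quoted in the comment above; a count of 0 disables the per-thread cache.
TUNABLES = {"glibc.malloc.tcache_count": "0"}


def render_dropin(tunables: dict) -> str:
    """Render a systemd drop-in that exports GLIBC_TUNABLES to the service."""
    # GLIBC_TUNABLES takes colon-separated name=value pairs.
    value = ":".join(f"{name}={val}" for name, val in tunables.items())
    return f"[Service]\nEnvironment=GLIBC_TUNABLES={value}\n"


def write_dropin(directory: Path, filename: str = "glibc-tunables.conf") -> Path:
    """Write the drop-in into `directory` and return the path of the new file."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / filename
    target.write_text(render_dropin(TUNABLES))
    return target


if __name__ == "__main__":
    # Only prints the content; nothing is installed by this sketch.
    print(render_dropin(TUNABLES), end="")
```

To apply it, the rendered file would go under the assumed drop-in directory /etc/systemd/system/mariadb.service.d/, followed by `systemctl daemon-reload` and a restart of mariadb.service. Using a drop-in keeps the change separate from the packaged unit, so it is easy to revert once the workaround has been evaluated.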
            drrtuy Roman added a comment -

            This SEGV is caused by an SQL expression pattern that calculates a percentage over a decimal column. The expression is roughly wide-decimal column / count(something). In this expression MCS treats count(something) as a decimal during type coercion, and its precision is 9999, which is set in multiple places in tupleaggregatestep.cpp (search for 9999 in the file for the actual places). Be aware that this precision of 9999 carries hidden semantics that must be taken into account when fixing this: it is used in Row::initToNull to assign 0 rather than NULL to a COUNT() column.

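
The hidden semantics of the 9999 precision can be illustrated with a toy model. This is not the ColumnStore code: `ColumnMeta` and `init_to_null` are hypothetical stand-ins written only to show why a fix that simply replaces 9999 with a real precision would also change how COUNT() columns are initialised.

```python
from dataclasses import dataclass
from typing import Optional

# Sentinel precision named in the comment above.
COUNT_SENTINEL_PRECISION = 9999


@dataclass
class ColumnMeta:
    """Hypothetical stand-in for per-column metadata; not a ColumnStore type."""
    name: str
    precision: int


def init_to_null(col: ColumnMeta) -> Optional[int]:
    """Initial value for a column: 0 if it is marked as COUNT(), else None (NULL)."""
    if col.precision == COUNT_SENTINEL_PRECISION:
        return 0
    return None


if __name__ == "__main__":
    count_col = ColumnMeta("count(something)", COUNT_SENTINEL_PRECISION)
    # A naive fix that gives the COUNT() column an ordinary precision ...
    naive_fix = ColumnMeta("count(something)", 18)
    # ... loses the marker, so the column would start as NULL instead of 0.
    print(init_to_null(count_col), init_to_null(naive_fix))
```

In this model the precision value doubles as a type marker, so any fix to the coercion path needs another way to tell COUNT() columns apart, or has to leave the marker intact where the initialisation reads it.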

            People

              drrtuy Roman
              massimo.disaro Massimo
              Votes: 3
              Watchers: 13
