Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.4.13
    • Fix Version/s: 10.4.14, 10.5.5
    • Component/s: Galera
    • Labels: None

    Description

      The Galera wsrep layer leaks memory on nodes that receive writesets.
      For reasons not yet understood, any DDL issued on the master makes such nodes release some of the leaked memory, but it does not stop the leak.

      This is affecting a customer of ours.
      The leak is clearly in the wsrep layer: if a node gets evicted (the wsrep layer is shut down but mysqld stays up), all of the memory is released instantly.

      Moreover, the nodes where the leak appears are receiving NO queries at all.

      The leak only affects nodes that receive writesets via the wsrep layer; the master is NOT affected.

      How to reproduce: set up 3 CentOS 7 nodes with 1G of memory each, run the supplied sysbench script, and leave it running for a few hours.

      sysbench --db-driver=mysql --mysql-host=localhost --mysql-user=root code.lua --tables=16 prepare
      sysbench --db-driver=mysql --mysql-host=localhost --mysql-user=root code.lua  --tables=16 --threads=64 --time=0 run
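
To make the growth visible while the benchmark runs, the resident memory of mysqld on a receiving node can be sampled periodically. The following is a minimal Python sketch and not part of the original report: it assumes a Linux /proc filesystem, and the helper names (parse_vmrss_kb, growth_kb_per_hour, monitor) are hypothetical.

```python
import re
import time


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def growth_kb_per_hour(samples):
    """Average growth between the first and last (seconds, rss_kb) samples."""
    if len(samples) < 2:
        return 0.0
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    if t1 == t0:
        return 0.0
    return (r1 - r0) * 3600.0 / (t1 - t0)


def monitor(pid, interval_s=60):
    """Print the RSS of process `pid` once per interval until interrupted."""
    samples = []
    while True:
        with open(f"/proc/{pid}/status") as fh:
            rss = parse_vmrss_kb(fh.read())
        samples.append((time.time(), rss))
        print(f"{time.strftime('%H:%M:%S')} rss_kb={rss} "
              f"trend_kb_per_hour={growth_kb_per_hour(samples):.0f}",
              flush=True)
        time.sleep(interval_s)


# Example (hypothetical pid of mysqld on a node that only receives writesets):
# monitor(12345)
```

A steadily positive trend on a node that serves no client queries, together with a flat trend on the master, would be consistent with the behavior described above.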
      

      Attachments

        1. code.lua
          1 kB
        2. my.cnf
          2 kB

        Issue Links

          Activity

            jplindst Jan Lindström (Inactive) added a comment -

            As this bug is most likely in the wsrep-lib interface or in the Galera library, I am assigning it to teemu.ollakka.

            sysprg Julius Goryavsky added a comment -

            If a memory leak is confirmed in this test, it may also explain another problem, MDEV-22908. As far as I understand, in MDEV-22908 the primary component is lost because memory is exhausted on one of the nodes. It is difficult to say for sure in advance, but perhaps the two problems have a common cause.

            teemu.ollakka Teemu Ollakka added a comment -

            Valgrind/Massif revealed a leak in

               n2: 2757856 0x1501019: alloc_root (my_alloc.c:250)
                n2: 2097940 0x7EDFCF: Query_arena::alloc(unsigned long) (sql_class.h:1049)
                 n2: 2095660 0x884D89: lock_tables(THD*, TABLE_LIST*, unsigned int, unsigned int) (sql_base.cc:5522)
                  n2: 2095660 0x88441E: open_and_lock_tables(THD*, DDL_options_st const&, TABLE_LIST*, bool, unsigned int, Prelocking_strategy*) (sql_base.cc:5278)
                   n2: 2095660 0x83D8F9: open_and_lock_tables(THD*, TABLE_LIST*, bool, unsigned int) (sql_base.h:505)
                    n1: 2095512 0xDD9596: Rows_log_event::do_apply_event(rpl_group_info*) (log_event.cc:11399)
                     n1: 2095512 0x826AFD: Log_event::apply_event(rpl_group_info*) (log_event.h:1482)
                      n1: 2095512 0xB95BE7: wsrep_apply_events(THD*, Relay_log_info*, void const*, unsigned long) (wsrep_applier.cc:200)
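
For readers unfamiliar with Massif output: each tree line carries the number of child entries, the bytes attributed to that call path, the code address, and the frame. The sketch below (Python, not part of the original comment; parse_frames and applier_share are hypothetical helper names) parses the outermost and innermost lines of the trace above and computes how much of the alloc_root total is reached through wsrep_apply_events.

```python
import re

# One Massif tree line: "n<children>: <bytes> <address>: <frame description>"
FRAME_RE = re.compile(r"^\s*n(\d+):\s+(\d+)\s+(0x[0-9A-Fa-f]+):\s+(.*)$")

# First and last lines of the trace quoted in the comment above.
TRACE = """\
n2: 2757856 0x1501019: alloc_root (my_alloc.c:250)
 n1: 2095512 0xB95BE7: wsrep_apply_events(THD*, Relay_log_info*, void const*, unsigned long) (wsrep_applier.cc:200)
"""


def parse_frames(text):
    """Return a list of (bytes, frame_description) for each Massif tree line."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append((int(match.group(2)), match.group(4)))
    return frames


def applier_share(frames):
    """Fraction of the root allocation attributed to the innermost listed path."""
    root_bytes = frames[0][0]
    leaf_bytes = frames[-1][0]
    return leaf_bytes / root_bytes


frames = parse_frames(TRACE)
print(f"{applier_share(frames):.1%} of alloc_root bytes come via {frames[-1][1]}")
```

In this snapshot roughly three quarters of the memory allocated through alloc_root is attributed to the writeset applier path, which matches the observation that only nodes receiving writesets are affected.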
            

            With the following PR the leak disappeared: https://github.com/MariaDB/server/pull/1639

            Could someone verify that the fix plugs the leak?

            douglasawh Doug Whitfield added a comment -

            Has anyone seen this behavior in 10.3?

            People

              jplindst Jan Lindström (Inactive)
              rpizzi Rick Pizzi (Inactive)
              Votes: 3
              Watchers: 7
