[MXS-3059] Crash in Galera monitor Created: 2020-07-02  Updated: 2020-08-25  Resolved: 2020-07-07

Status: Closed
Project: MariaDB MaxScale
Component/s: galeramon
Affects Version/s: 2.4.10
Fix Version/s: 2.4.11

Type: Bug
Priority: Major
Reporter: Valerii Kravchuk
Assignee: markus makela
Resolution: Fixed
Votes: 0
Labels: crash, galeramon


 Description   

Galera monitor crashes with the following backtrace:

2020-07-01 23:10:11 alert : Fatal: MaxScale 2.4.10 received fatal signal 11. Commit ID: 7781f7042ab077811e2431794c2280162c0a6a3d System name: Linux Release string: Ubuntu 18.04.4 LTS
nm: /lib/x86_64-linux-gnu/libc.so.6: no symbols
2020-07-01 23:10:15 alert :
/usr/lib/x86_64-linux-gnu/maxscale/libgaleramon.so(_ZNK13GaleraMonitor16diagnostics_jsonEv+0x154): /usr/include/c++/7/bits/hashtable_policy.h:1291
/usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZNK8maxscale7Monitor7to_jsonEPKc+0x255): server/core/monitor.cc:673
/usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN14MonitorManager15monitor_to_jsonEPKN8maxscale7MonitorEPKc+0x5e): server/core/monitormanager.cc:469
/usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(+0xe11e3): server/core/resource.cc:626
/usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZNK8Resource4callERK11HttpRequest+0xb4): server/core/resource.cc:119
/usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(+0xe6f8c): server/core/resource.cc:1347 (discriminator 1)
/usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(+0xe8b3c): server/core/resource.cc:1387
/usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker14handle_messageERNS_12MessageQueueERKNS_19MessageQueueMessageE+0x6d): maxutils/maxbase/src/worker.cc:475
/usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase12MessageQueue18handle_poll_eventsEPNS_6WorkerEj+0x158): maxutils/maxbase/src/messagequeue.cc:306
/usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker15poll_waiteventsEv+0x1b6): maxutils/maxbase/src/worker.cc:858
/usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker3runEPNS_9SemaphoreE+0x53): maxutils/maxbase/src/worker.cc:559
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbd6df): ??:?
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db): ??:?
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f): ??:0



 Comments   
Comment by markus makela [ 2020-07-03 ]

The crash seems to happen here:

galeramon.cc (lines 107-113)

    for (auto ptr : servers())
    {
        auto it = m_info.find(ptr);

        if (it != m_info.end())
        {
            json_t* obj = json_object();

The strange thing is that it seems to be inside the std::unordered_map bucket selection code:

hashtable_policy.h (lines 1286-1291)

      std::size_t
      _M_bucket_index(const __node_type* __p, std::size_t __n) const
        noexcept( noexcept(declval<const _H1&>()(declval<const _Key&>()))
                  && noexcept(declval<const _H2&>()((__hash_code)0,
                                                    (std::size_t)0)) )
      { return _M_h2()(_M_h1()(_M_extract()(__p->_M_v())), __n); }

Could it be related to the hashing function being used?
The simplest answer turned out to be the real one: m_info, which is a std::unordered_map rather than an array, was being used concurrently by more than one thread.
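
For illustration only, here is a minimal sketch of one common way to make such a map safe to read from another thread: guard it with a mutex and hand readers a copy. This is not the actual MaxScale fix. The names Server, NodeInfo and InfoTable are hypothetical stand-ins, and the sketch assumes that copying the map on each read is acceptable.

```cpp
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for illustration; these are not MaxScale types.
struct Server
{
    std::string name;
};

struct NodeInfo
{
    int local_index;
};

class InfoTable
{
public:
    using Map = std::unordered_map<const Server*, NodeInfo>;

    // Writer side (for example a monitor thread): swap in new contents
    // while holding the lock, so no reader sees a half-updated map.
    void update(Map fresh)
    {
        std::lock_guard<std::mutex> guard(m_lock);
        m_info = std::move(fresh);
    }

    // Reader side (for example a thread building diagnostics output):
    // copy the map while holding the lock, then work on the copy.
    Map snapshot() const
    {
        std::lock_guard<std::mutex> guard(m_lock);
        return m_info;
    }

private:
    mutable std::mutex m_lock;   // protects m_info
    Map                m_info;
};
```

A reader would call snapshot() and run its find() lookups on the returned copy, so the bucket selection code never operates on a map that another thread is modifying.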
