  MariaDB MaxScale / MXS-5063

Maxscale crash - "terminate called after throwing an instance of 'std::bad_alloc'"

Details

    Description

      MaxScale crashed outside of the company's working hours; the outage was handled by the VRRP / standby instance. We don't have GDB or anything else installed on this server, so this is all that's available right now.

      terminate called after throwing an instance of 'std::bad_alloc'
        what():  std::bad_alloc
      2024-04-14 20:37:42   alert  : (35462108) (SS); MaxScale 23.08.5 received fatal signal 6. Commit ID: 4bb6b679d2be75ba84226baea656db7a0f330eb4, System name: Linux, Release string: CentOS Linux 7 (Core), Thread: Worker-09
      2024-04-14 20:37:42   alert  : (35462108) (SS); Statement currently being classified: none/unknown
      2024-04-14 20:37:42   alert  : (35462108) (SS); Session: 35462108 Service: SS
      2024-04-14 20:37:42   notice : (35462108) (SS); For a more detailed stacktrace, install GDB and add 'debug=gdb-stacktrace' under the [maxscale] section.
        /lib64/libc.so.6(gsignal+0x37) [0x7f95046b8387]: <binutils not installed>
        /lib64/libc.so.6(abort+0x148) [0x7f95046b9a78]: <binutils not installed>
        /lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x165) [0x7f9505913a95]: <binutils not installed>
        /lib64/libstdc++.so.6(+0x5ea06) [0x7f9505911a06]: <binutils not installed>
        /lib64/libstdc++.so.6(+0x5ea33) [0x7f9505911a33]: <binutils not installed>
        /lib64/libstdc++.so.6(+0x5ec53) [0x7f9505911c53]: <binutils not installed>
        /lib64/libstdc++.so.6(_Znwm+0x7d) [0x7f95059121ed]: <binutils not installed>
        /lib64/libstdc++.so.6(_Znam+0x9) [0x7f9505912289]: <binutils not installed>
        /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN10SHARED_BUFC1Em+0x15) [0x7f95078abc45]: <binutils not installed>
        /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN5GWBUF16prepare_to_writeEm+0x1b8) [0x7f95078abf48]: <binutils not installed>
        /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN5GWBUF6appendEPKhm+0x32) [0x7f95078abfd2]: <binutils not installed>
        /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN5GWBUF10merge_backEOS_+0x34) [0x7f95078ac324]: <binutils not installed>
        /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN3DCB13writeq_appendEO5GWBUF+0x18) [0x7f9507913968]: <binutils not installed>
        /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN23MariaDBClientConnection11clientReplyEO5GWBUFRKN8maxscale10ReplyRouteERKNS2_5ReplyE+0xa8) [0x7f9507a43fe8]: <binutils not installed>
        /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN15ServiceEndpoint13send_upstreamEO5GWBUFRKN8maxscale10ReplyRouteERKNS2_5ReplyE+0x31) [0x7f95079d3911]: <binutils not installed>
        /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN15ServiceEndpoint15ServiceUpstream11clientReplyEO5GWBUFRKN8maxscale10ReplyRouteERKNS3_5ReplyE+0xd) [0x7f95079e637d]: <binutils not installed>
        /usr/lib64/maxscale/libreadwritesplit.so(_ZN14RWSplitSession11clientReplyEO5GWBUFRKN8maxscale10ReplyRouteERKNS2_5ReplyE+0x5a5) [0x7f9507f51db5]: <binutils not installed>
        /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN15ServiceEndpoint11clientReplyEO5GWBUFRKN8maxscale10ReplyRouteERKNS2_5ReplyE+0x99) [0x7f95079d2be9]: <binutils not installed>
        /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN14ServerEndpoint11clientReplyEO5GWBUFRKN8maxscale10ReplyRouteERKNS2_5ReplyE+0xf4) [0x7f95079bddd4]: <binutils not installed>
        /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN24MariaDBBackendConnection11normal_readEv+0xdc9) [0x7f9507a3cd89]: <binutils not installed>
        /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN24MariaDBBackendConnection17ready_for_readingEP3DCB+0x140) [0x7f9507a405f0]: <binutils not installed>
        /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN3DCB14process_eventsEj+0x1f5) [0x7f95079150d5]: <binutils not installed>
        /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN3DCB13event_handlerEj+0x21) [0x7f9507915331]: <binutils not installed>
        /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN3DCB13FakeEventTask7executeERN7maxbase6WorkerE+0xb6) [0x7f9507915996]: <binutils not installed>
        /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker14handle_messageERNS_12MessageQueueERKNS_19MessageQueueMessageE+0xe1) [0x7f9507a8d651]: <binutils not installed>
        /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase17EventMessageQueue18handle_poll_eventsEPNS_6WorkerEjNS_8Pollable7ContextE+0x73) [0x7f9507a930e3]: <binutils not installed>
        /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker14deliver_eventsEPNS_8PollableEjNS1_7ContextE+0x23) [0x7f9507a90113]: <binutils not installed>
        /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker15poll_waiteventsEv+0x35f) [0x7f9507a9062f]: <binutils not installed>
        /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker3runEPNS_9SemaphoreE+0x53) [0x7f9507a90993]: <binutils not installed>
        /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(+0x4977c0) [0x7f9507b5f7c0]: <binutils not installed>
        /lib64/libpthread.so.0(+0x7ea5) [0x7f950667aea5]: <binutils not installed>
        /lib64/libc.so.6(clone+0x6d) [0x7f9504780b0d]: <binutils not installed>
      2024-04-14 20:37:42   notice : (35462108) (SS); GDB not found, attempting to dump stacktraces from all threads using internal profiler...
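
      For what it's worth, the log above already points at how to get a more useful trace next time: install GDB and binutils (the "<binutils not installed>" markers come from the latter being absent) and enable the gdb-stacktrace debug option. A minimal sketch of the [maxscale] section change, as suggested by the notice line in the log:

        # assumes gdb is installed, e.g. via the distribution's package manager
        [maxscale]
        debug=gdb-stacktrace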
      

      Attachments

        1. Before crash.png (46 kB)
        2. image-2024-04-15-10-56-15-270.png (62 kB)
        3. image-2024-04-15-12-08-50-654.png (44 kB)
        4. maxscale_config.txt (1 kB)
        5. maxscale02.png (57 kB)
        6. memleak.log (137 kB)
        7. memleak-20-mins.log (1.64 MB)
        8. memleak-during-big-query.log (1.63 MB)
        9. memory-spike-after-bypassing-maxscale.png (115 kB)
        10. pmm-memory-free-graph-12hrs.png (45 kB)
        11. screenshot-1.png (55 kB)
        12. screenshot-2.png (71 kB)

        Activity

          Phil Sumner added a comment - edited

          This graph shows the memory spikes before and after I switched one of these table sync queries to bypass MaxScale late on the 14th. One of the two servers is still running these queries and I'll run memleak for the whole duration when it next runs at 08:30 GMT. The spikes are still there but much reduced, so it definitely looks like these queries are the cause.

          This PMM graph shows the same pattern, but a gradual drop in free memory is also clear; at this point it's not at concerning levels and might level off over time.

          I've attached memleak-during-big-query.log, which was running as this query started. I watched the MaxScale server's free memory graph drop as soon as it connected, and for some reason (I think a network issue) the query failed, so the free memory recovered after a few minutes rather than after the usual half hour.
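
          For reference, the attached memleak logs look like output from the bcc-tools memleak script (the "N bytes in M allocations from stack" format in the next comment matches it). A run against the MaxScale process would have been something along these lines; the path, interval and options here are illustrative, not taken from this ticket:

            /usr/share/bcc/tools/memleak -p $(pidof maxscale) 30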

          markus makela added a comment -

          Yes, I think this pretty much confirms that it's a resultset that's being fully read from the network:

          	501612544 bytes in 7654 allocations from stack
          		operator new(unsigned long)+0x1d [libstdc++.so.6.0.19]
          		GWBUF::prepare_to_write(unsigned long)+0x78 [libmaxscale-common.so.1.0.0]
          		DCB::socket_read(unsigned long, DCB::ReadLimit)+0x42 [libmaxscale-common.so.1.0.0]
          		DCB::read_impl(unsigned long, unsigned long, DCB::ReadLimit)+0x41b [libmaxscale-common.so.1.0.0]
          		DCB::read(unsigned long, unsigned long)+0x15 [libmaxscale-common.so.1.0.0]
          		MariaDBBackendConnection::normal_read()+0xa0 [libmaxscale-common.so.1.0.0]
          		MariaDBBackendConnection::ready_for_reading(DCB*)+0x140 [libmaxscale-common.so.1.0.0]
          		DCB::process_events(unsigned int)+0x1f5 [libmaxscale-common.so.1.0.0]
          		DCB::event_handler(unsigned int)+0x21 [libmaxscale-common.so.1.0.0]
          		maxbase::Worker::deliver_events(maxbase::Pollable*, unsigned int, maxbase::Pollable::Context)+0x23 [libmaxscale-common.so.1.0.0]
          		maxbase::Worker::poll_waitevents()+0x35f [libmaxscale-common.so.1.0.0]
          		maxbase::Worker::run(maxbase::Semaphore*)+0x53 [libmaxscale-common.so.1.0.0]
          		execute_native_thread_routine+0x10 [libmaxscale-common.so.1.0.0]
          		[unknown]
          		std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(maxbase::Worker*, maxbase::Semaphore*), maxbase::Worker*, maxbase::Semaphore*> > >::~_State_impl()+0x0 [libmaxscale-common.so.1.0.0]
          		[unknown]
          

          That's just shy of 500MiB of memory and the size of the spike is pretty close to that.
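
          (As a quick sanity check on those numbers: 501612544 bytes across 7654 allocations works out to exactly 65536 bytes, i.e. 64 KiB, per allocation, which lines up with the default writeq_high_water read cap discussed in the later comments.)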

          markus makela added a comment -

          OK, I managed to reproduce it by just running a query that returns a very large resultset and reading it very slowly. This does not always trigger the problem so I suspect there might be an underlying bug that was simply revealed by the scheduling fairness change.
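
          For anyone else trying to reproduce this, a minimal client-side sketch along these lines exercises the same pattern: request a large resultset through MaxScale and drain it very slowly, so MaxScale reads from the backend faster than the client consumes the rows. The host, port, credentials and table name are placeholders, not values from this ticket:

            import time
            import pymysql
            import pymysql.cursors

            # Connect through the MaxScale listener (placeholder address and credentials).
            conn = pymysql.connect(host="maxscale-host", port=4006,
                                   user="app", password="secret", database="test")
            try:
                # SSCursor streams rows instead of buffering the whole resultset client-side.
                with conn.cursor(pymysql.cursors.SSCursor) as cur:
                    cur.execute("SELECT * FROM big_table")  # very large resultset
                    while cur.fetchone() is not None:
                        time.sleep(0.01)                    # drain it very slowly
            finally:
                conn.close()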

          markus makela added a comment -

          I believe that the culprit is indeed this commit:

          commit daa7c34f43523ca642ec2eb625f15d02f1f72592
          Author: Markus Mäkelä <markus.makela@mariadb.com>
          Date:   Wed Jan 10 10:51:54 2024 +0200
           
              MXS-4924: Throttle fake events
              
              Once two fake events have been generated by the same DCB, the third one
              will be sent via the event loop. This will prevent a single DCB from
              repeatedly triggering fake events.
          

          The code in 22.08 and 6.4 never limits the amount of data read from the network and always empties the socket. I do believe that it is less likely for this to happen in 6.4 than it is in 23.02, where the network reads are capped to writeq_high_water, which by default is 64KiB.

          markus makela added a comment - edited

          I think that increasing the value of writeq_high_water to be larger than the TCP network buffer size could help avoid this happening as often. The default 64KiB limit in 22.08 and all newer versions, combined with the limited reads in 23.02, will cause this bug to be hit with all larger SELECTs. Increasing it to the default in 6.4 (writeq_high_water=16MiB) may help avoid it, but it still won't prevent it from happening.
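
          As a concrete sketch of that mitigation, assuming the parameter is set in the global [maxscale] section and using the 6.4 default value quoted above:

            [maxscale]
            # value as quoted in the comment above (the 6.4 default)
            writeq_high_water=16MiB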


          People

            Assignee: markus makela
            Reporter: Phil Sumner
            Votes: 0
            Watchers: 3

