[MDEV-10085] mysql process disappears in case of insufficient memory for Connect Engine request Created: 2016-05-19  Updated: 2020-11-06

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - Connect
Affects Version/s: 10.1.14
Fix Version/s: 10.1

Type: Bug Priority: Major
Reporter: Sergey Antonyuk Assignee: Olivier Bertrand
Resolution: Unresolved Votes: 1
Labels: None
Environment:

3.2.0-4-amd64 #1 SMP Debian 3.2.35-2 x86_64 GNU/Linux


Attachments: Text File LOG-MDEV-10085.txt     File my.cnf    

 Description   

The system executes CONNECT engine queries, and sometimes the following syslog messages appear:

mysqld: GetTDB: Not enough memory in Work area for request of 352 (used=67108816 free=48)
mysqld: GetTDB: Not enough memory in Work area for request of 56 (used=67108864 free=0)

After that the mysqld process disappears, and crash recovery runs on the next start. There are no segfault messages or anything similar.

In my opinion, a SQL query may fail when memory runs short, but the database server itself should not be affected.



 Comments   
Comment by Elena Stepanova [ 2016-05-27 ]

Please check your system logs, the process is likely to be killed by the system OOM killer.

Comment by Sergey Antonyuk [ 2016-05-27 ]

There are no such logs. It was not killed by the OOM killer.

Comment by Elena Stepanova [ 2016-05-27 ]

Sergey.Antonyuk,
What do you mean by "there are no such logs"? There are always system logs, and plenty of them; and the OOM killer's message is short, shy, and easy to miss.
Everything is possible, but so far I don't remember ever having a report of MariaDB committing a totally silent suicide, without saying anything at all in its error log. The OOM killer, on the other hand, is notorious for causing exactly that. So please, double-check.

Comment by Sergey Antonyuk [ 2016-05-27 ]

Elena, I've attached the log file from our system, please take a look.

Comment by Elena Stepanova [ 2016-05-28 ]

Sergey.Antonyuk,

Thanks.
Is the lack of memory real, i.e. does the system actually run out of memory, or is the error entirely bogus?
Can you give a hint about which kinds of CONNECT tables, and which queries on them, you are using?

Please also attach your cnf file(s).

Comment by Sergey Antonyuk [ 2016-05-30 ]

The lack of memory is not real: the system has 4.5 GB of free memory plus 16 GB of swap.
The problem appears when I deliberately set the connect_work_size parameter too low (4 MB), and it does not appear with values above 1 GB.

Please find my.cnf file attached.

I'll try to find a scenario to reproduce the problem.
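For reference, the reduced setting described above corresponds to a my.cnf fragment like the following. This is an illustrative excerpt, not the attached file; connect_work_size defaults to 64 MB (67108864 bytes), which matches the used=67108864 figure in the syslog messages from the description.

```ini
[mysqld]
# Deliberately undersized CONNECT work area used to reproduce the error.
# The server default is 64M (67108864 bytes).
connect_work_size = 4M
```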

Comment by Elena Stepanova [ 2016-05-30 ]

Thanks.
With the reduced connect_work_size value I'm getting the error itself rather easily, but it doesn't make the server disappear:

MariaDB [serverB]> insert into t2 select seq, seq, seq, seq, seq, seq, seq, seq, seq from seq_1_to_10000;
ERROR 1296 (HY000): Got error 122 'Not enough memory in Work area for request of 88 (used=4194264 free=40)' from CONNECT
MariaDB [serverB]> select 1;
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.00 sec)

Maybe it has something to do with wheezy and/or our wheezy builds; I'll try there.

Comment by Elena Stepanova [ 2016-05-30 ]

Assigning to bertrandop – I think it should be much easier to track the problem down through the code than to reproduce it at the client level. The comments around the allocation code suggest that it is the callers' responsibility to check the returned value, and apparently that does not happen everywhere.

I can get all kinds of crashes if I inject the allocation failure artificially; the only difference is that I get "normal" SIGSEGVs rather than silent crashes. I don't know how the silent crash was achieved; maybe it's something specific to the machine.

Sergey.Antonyuk, just in the hope of clarity on this matter, and in case you can afford to crash the server once more: could you please kill it with SIGSEGV (kill -11) and see if it prints anything in the error log?

Comment by Sergey Antonyuk [ 2016-05-31 ]

kill -11 leads to the following messages in the error log:

May 31 10:39:54 host mysqld: 160531 10:39:54 [ERROR] mysqld got signal 11 ;
May 31 10:39:54 host mysqld: This could be because you hit a bug. It is also possible that this binary
May 31 10:39:54 host mysqld: or one of the libraries it was linked against is corrupt, improperly built,
May 31 10:39:54 host mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
May 31 10:39:54 host mysqld:
May 31 10:39:54 host mysqld: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
May 31 10:39:54 host mysqld:
May 31 10:39:54 host mysqld: We will try our best to scrape up some info that will hopefully help
May 31 10:39:54 host mysqld: diagnose the problem, but since we have already crashed,
May 31 10:39:54 host mysqld: something is definitely wrong and this may fail.
May 31 10:39:54 host mysqld:
May 31 10:39:54 host mysqld: Server version: 10.1.14-MariaDB-1~wheezy
May 31 10:39:54 host mysqld: key_buffer_size=33554432
May 31 10:39:54 host mysqld: read_buffer_size=131072
May 31 10:39:54 host mysqld: max_used_connections=99
May 31 10:39:54 host mysqld: max_threads=302
May 31 10:39:54 host mysqld: thread_count=29
May 31 10:39:54 host mysqld: It is possible that mysqld could use up to
May 31 10:39:54 host mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 386837 K bytes of memory
May 31 10:39:54 host mysqld: Hope that's ok; if not, decrease some variables in the equation.
May 31 10:39:54 host mysqld:
May 31 10:39:54 host mysqld: Thread pointer: 0x0x0
May 31 10:39:54 host mysqld: Attempting backtrace. You can use the following information to find out
May 31 10:39:54 host mysqld: where mysqld died. If you see no messages after this, something went
May 31 10:39:54 host mysqld: terribly wrong...
May 31 10:39:54 host mysqld: stack_bottom = 0x0 thread_stack 0x80000
May 31 10:39:54 host mysqld: (my_addr_resolve failure: fork)
May 31 10:39:54 host mysqld: /usr/sbin/mysqld(my_print_stacktrace+0x2b) [0x7f240bbf370b]
May 31 10:39:54 host mysqld: /usr/sbin/mysqld(handle_fatal_signal+0x475) [0x7f240b752235]
May 31 10:39:54 host mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0) [0x7f240ad440a0]
May 31 10:39:54 host mysqld: /lib/x86_64-linux-gnu/libc.so.6(__poll+0x53) [0x7f24092126b3]
May 31 10:39:54 host mysqld: /usr/sbin/mysqld(handle_connections_sockets()+0x1ca) [0x7f240b5509da]
May 31 10:39:54 host mysqld: /usr/sbin/mysqld(mysqld_main(int, char**)+0x2a44) [0x7f240b558754]
May 31 10:39:54 host mysqld: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f240915dead]
May 31 10:39:54 host mysqld: /usr/sbin/mysqld(+0x3d8afd) [0x7f240b54bafd]
May 31 10:39:54 host mysqld: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
May 31 10:39:54 host mysqld: information that should help you find out what is causing the crash.

Comment by Olivier Bertrand [ 2016-06-01 ]

I am away from home and will look at it tomorrow. Could you provide the exact scenario leading to this error? Thanks.
