[MDEV-6005] MariaDB 10.0.10 crashing within 10 minutes on CentOS 6.5 (with UDF from lib_mysqludf_preg) Created: 2014-04-02  Updated: 2014-05-04  Resolved: 2014-05-04

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.10
Fix Version/s: 10.0.11

Type: Bug Priority: Minor
Reporter: Koen Crijns Assignee: Sergei Golubchik
Resolution: Won't Fix Votes: 0
Labels: jemalloc, udf
Environment:
  • Dell R810 server
  • CentOS 6.5 (fully updated)
  • 2x Intel Xeon X7560 (2.26 GHz - 45 nm. Beckton) - 16 cores / 32 threads
  • 128 GB DDR3-1333 ECC
  • 5x Dell 146GB 15k rpm SAS (RAID 0+1 + hot spare) on Dell PERC H800

Attachments: Text File mariadb10-crashes.txt    

 Description   

We upgraded our production server to MariaDB 10.0.10 today after successful tests on our development server.

On our production servers MariaDB crashes within 10 minutes. After crashing 7 times within 10 minutes we rolled back to 5.5. Since it's our production server, I can't do any more debugging.

I didn't see any specific query or any other specific thing that caused the crashes.

Please note that 5.5 runs rock solid on this server.

The MySQL error log part for three crashes is attached.

If you need more information, please let me know.



 Comments   
Comment by Elena Stepanova [ 2014-04-02 ]

Hi,

Does your development server have lib_mysqludf_preg.so, and of the same version, and use its functions?
Two of the three crashes that you attached happened in the library, and the third is likely to be related also.
So, I've installed the library (current code from git), and I am also getting a number of crashes with it. I will dig more, but for now the question is whether you really need it – and if so, maybe your development server has a better one?

Comment by Koen Crijns [ 2014-04-02 ]

Hi Elena,

Our development server has the exact same plugin. I just found out that MariaDB has built-in PCRE functionality that does exactly what we need (and a LOT faster). I'll try again tonight to install 10.0.10 on the production server without the plugin.

Comment by Elena Stepanova [ 2014-04-02 ]

With lib_mysqludf_preg from https://github.com/mysqludf/lib_mysqludf_preg.git, I'm getting different failures (crashes, OOM, and Valgrind errors) on a release build. The stacks are not exactly the same as the ones provided, but that may depend on the query. On the debug build, I'm mostly getting one assertion failure, shown below.
All bad things only happen on a server built with jemalloc (WITH_JEMALLOC=yes; same was used for release bintar). If I force WITH_JEMALLOC=no, no problems so far.

I'm not sure whether it's the problem of the udf, jemalloc, or MariaDB server.

CREATE FUNCTION preg_capture RETURNS STRING SONAME 'lib_mysqludf_preg.so';
select PREG_CAPTURE( '/(fox)/' , 'The brown fox' );

<jemalloc>: extra/jemalloc/include/jemalloc/internal/arena.h:761: Failed assertion: "((uintptr_t)ptr - ((uintptr_t)run + (uintptr_t)bin_info->reg0_offset)) % bin_info->reg_interval == 0"
 
#5  0x00007fb9d12966f0 in *__GI_abort () at abort.c:92
#6  0x0000000000ebf7c5 in jemalloc_internal_arena_ptr_small_binind_get (ptr=0x7fb9a4c125f0, mapbits=4225) at /home/elenst/bzr/10.0/extra/jemalloc/include/jemalloc/internal/arena.h:759
#7  0x0000000000ec040a in jemalloc_internal_arena_salloc (ptr=0x7fb9a4c125f0, demote=false) at /home/elenst/bzr/10.0/extra/jemalloc/include/jemalloc/internal/arena.h:990
#8  0x0000000000eb7c6a in jemalloc_internal_isalloc (ptr=0x7fb9a4c125f0, demote=false) at include/jemalloc/internal/jemalloc_internal.h:865
#9  0x0000000000ebc3e1 in free (ptr=0x7fb9a4c125f0) at /home/elenst/bzr/10.0/extra/jemalloc/src/jemalloc.c:1267
#10 0x00007fb9d105d4a8 in pregMoveToReturnValues (initid=initid@entry=0x7fb9a4c13430, length=length@entry=0x7fb9d3150ec0, is_null=is_null@entry=0x7fb9d3150ecf "", error=error@entry=0x7fb9a4c13460 "", s=0x7fb9a4c125f0 "fox", s_len=<optimized out>) at preg.c:515
#11 0x00007fb9d105e5c6 in preg_capture (initid=0x7fb9a4c13430, args=0x7fb9a4c133f0, result=0x7fb9d3151070 "", length=0x7fb9d3150ec0, is_null=0x7fb9d3150ecf "", error=0x7fb9a4c13460 "") at lib_mysqludf_preg_capture.c:284
#12 0x00000000008cad57 in udf_handler::val_str (this=0x7fb9a4c133e0, str=0x7fb9d3151010, save_str=0x7fb9a4c13338) at /home/elenst/bzr/10.0/sql/item_func.cc:3719
#13 0x00000000008cb80f in Item_func_udf_str::val_str (this=0x7fb9a4c13320, str=0x7fb9d3151010) at /home/elenst/bzr/10.0/sql/item_func.cc:3913
#14 0x0000000000880329 in Item::send (this=0x7fb9a4c13320, protocol=0x7fb9b33bb5f8, buffer=0x7fb9d3151010) at /home/elenst/bzr/10.0/sql/item.cc:6595
#15 0x00000000005cdd28 in Protocol::send_result_set_row (this=0x7fb9b33bb5f8, row_items=0x7fb9b33bf578) at /home/elenst/bzr/10.0/sql/protocol.cc:900
#16 0x000000000063a60f in select_send::send_data (this=0x7fb9a4c13550, items=...) at /home/elenst/bzr/10.0/sql/sql_class.cc:2543
#17 0x00000000006adac4 in JOIN::exec_inner (this=0x7fb9a4c13570) at /home/elenst/bzr/10.0/sql/sql_select.cc:2441
#18 0x00000000006ad4e4 in JOIN::exec (this=0x7fb9a4c13570) at /home/elenst/bzr/10.0/sql/sql_select.cc:2355
#19 0x00000000006b087b in mysql_select (thd=0x7fb9b33bb070, rref_pointer_array=0x7fb9b33bf6d8, tables=0x0, wild_num=0, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7fb9a4c13550, unit=0x7fb9b33bed78, select_lex=0x7fb9b33bf460) at /home/elenst/bzr/10.0/sql/sql_select.cc:3293
#20 0x00000000006a6f93 in handle_select (thd=0x7fb9b33bb070, lex=0x7fb9b33becb0, result=0x7fb9a4c13550, setup_tables_done_option=0) at /home/elenst/bzr/10.0/sql/sql_select.cc:372
#21 0x000000000067bd51 in execute_sqlcom_select (thd=0x7fb9b33bb070, all_tables=0x0) at /home/elenst/bzr/10.0/sql/sql_parse.cc:5306
#22 0x000000000067411c in mysql_execute_command (thd=0x7fb9b33bb070) at /home/elenst/bzr/10.0/sql/sql_parse.cc:2590
#23 0x000000000067e4db in mysql_parse (thd=0x7fb9b33bb070, rawbuf=0x7fb9a4c13088 "select PREG_CAPTURE( '/(fox)/' , 'The brown fox' )", length=50, parser_state=0x7fb9d3152610) at /home/elenst/bzr/10.0/sql/sql_parse.cc:6452
#24 0x0000000000671294 in dispatch_command (command=COM_QUERY, thd=0x7fb9b33bb070, packet=0x7fb9b36a0071 "", packet_length=50) at /home/elenst/bzr/10.0/sql/sql_parse.cc:1308
#25 0x0000000000670636 in do_command (thd=0x7fb9b33bb070) at /home/elenst/bzr/10.0/sql/sql_parse.cc:1005
#26 0x000000000078b46e in do_handle_one_connection (thd_arg=0x7fb9b33bb070) at /home/elenst/bzr/10.0/sql/sql_connect.cc:1379
#27 0x000000000078b1c1 in handle_one_connection (arg=0x7fb9b33bb070) at /home/elenst/bzr/10.0/sql/sql_connect.cc:1293
#28 0x0000000000a30f90 in pfs_spawn_thread (arg=0x7fb9b33e6170) at /home/elenst/bzr/10.0/storage/perfschema/pfs.cc:1853
#29 0x00007fb9d2e30b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#30 0x00007fb9d133ba7d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112

No failures on 5.5, even with jemalloc.

Comment by Koen Crijns [ 2014-04-03 ]

Update: I've disabled the lib_mysqludf_preg plugin and upgraded again from MariaDB 5.5.36 to MariaDB 10.0.10. So far running for 45 minutes without any crashes, so the problem seems to be solved for us.

(Offtopic: the performance difference between PREG_REPLACE from the UDF plugin and REGEXP_REPLACE from MariaDB 10.0.10 is incredible, also on our production box!)

Comment by Sergei Golubchik [ 2014-05-04 ]

This is a bug in lib_mysqludf_preg.
See the stack trace — everything is clear from there.
The memory is freed using the free() function in pregMoveToReturnValues(). But the memory was allocated in preg_capture() by pcre_get_substring(). The manpage for the latter function says

The memory in which the substring is placed is obtained by calling
pcre_malloc(). The convenience function pcre_free_substring() can be
used to free it when it is no longer needed.

So, one must use pcre_free_substring() and not free() in this case.

As for what really happens here, I suspect that the memory is allocated using the system malloc but passed to jemalloc for freeing, hence the crash. jemalloc provides its own free() function, but pcre was loaded before that, so pcre_malloc() calls the real system malloc.

A fix would be to use pcre_free_substring() as documented. Or set pcre_malloc() to use jemalloc. Or, better, both.
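
To illustrate the rule behind that fix, here is a minimal sketch in C. It does not use the real PCRE API: lib_malloc, lib_free, lib_get_substring and lib_free_substring are hypothetical stand-ins modelled on the pcre_malloc/pcre_free hooks and the pcre_get_substring()/pcre_free_substring() pair, and the counting allocator exists only to make the pairing visible. The point is that a buffer obtained through a library's allocator hook goes back through the library's own release function, never through a bare free().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Number of blocks handed out by the hook and not yet released. */
size_t live_blocks = 0;

/* Assumed allocator pair for this sketch; a real server might plug in
   jemalloc here, while the library default would be the system malloc. */
void *counting_malloc(size_t n) {
    void *p = malloc(n);
    if (p != NULL) live_blocks++;
    return p;
}

void counting_free(void *p) {
    if (p != NULL) live_blocks--;
    free(p);
}

/* Hypothetical library hooks, analogous to pcre_malloc / pcre_free. */
void *(*lib_malloc)(size_t) = counting_malloc;
void (*lib_free)(void *) = counting_free;

/* Hypothetical analogue of pcre_get_substring(): copies subject[start, end)
   into a NUL-terminated buffer obtained from lib_malloc.
   Returns the substring length, or -1 on bad offsets or allocation failure. */
int lib_get_substring(const char *subject, size_t start, size_t end,
                      const char **out) {
    if (start > end || end > strlen(subject)) return -1;
    size_t len = end - start;
    char *buf = lib_malloc(len + 1);
    if (buf == NULL) return -1;
    memcpy(buf, subject + start, len);
    buf[len] = '\0';
    *out = buf;
    return (int)len;
}

/* Hypothetical analogue of pcre_free_substring(): releases the buffer
   through the same hook family that allocated it. */
void lib_free_substring(const char *s) {
    lib_free((void *)s);
}

/* Captures "fox" from the test subject and releases it correctly.
   Returns the number of blocks still live; 0 means alloc and free matched. */
size_t capture_and_release(void) {
    const char *captured = NULL;
    int len = lib_get_substring("The brown fox", 10, 13, &captured);
    assert(len == 3 && strcmp(captured, "fox") == 0);
    lib_free_substring(captured);   /* not free(captured) */
    return live_blocks;
}
```

Calling capture_and_release() from a main() should return 0. In this sketch both hooks end up in the same underlying allocator, so a bare free() would only skew the counter. In the situation described above, where the hook and the process-wide free() belong to different allocators, the same mistake hands one allocator a pointer it never issued.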
