We have two servers running MariaDB with Galera for replication. Every few weeks we get alerts that MariaDB has crashed with a segfault. We were on older 10.X versions of MariaDB ( https://serverfault.com/questions/1016977/mariadb-crashing ) and had the same issue. I am not sure whether a specific query is causing MariaDB to crash or whether the problem lies elsewhere. Below is what I am seeing in a backtrace:
#25 0x000055d4793577d1 in do_handle_one_connection(CONNECT*) ()
#26 0x000055d47935789d in handle_one_connection ()
#27 0x00007f4378e4dea5 in start_thread () from /lib64/libpthread.so.0
#28 0x00007f4376fd98dd in clone () from /lib64/libc.so.6
Attached all the traces besides the backtrace (it's 4.3GB) and the hot name file. We do have all queries stored so we can pull them for the time of the crash if needed.
Seppo Jaakola
added a comment - The stored procedure calls get_lock() / release_lock(), and these functions are not supported with Galera replication (this is also noted in the KB limitations). However, although not safe, they should probably still work in topologies where all writes go to the same dedicated node, i.e. where cluster write conflicts cannot happen. Dovid, does your application/load balancer direct all writes to the same node?
A crash is not an acceptable reaction to the use of an unsupported feature, so some work remains to fix this bug, preferably by rejecting the use of the get_lock() and release_lock() functions. That, however, does not help this application's use case. Dovid, is it possible to change the stored procedure definition so that it does not use these functions?
It might be possible to support locking functions as a new feature; streaming replication, for example, might be a helpful technology for it.
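To judge how much of the workload depends on these functions, the stored queries mentioned in the report could be scanned for calls to them. The following is only a minimal sketch, assuming the statements are available as plain strings; the name `find_lock_calls` is hypothetical and not part of any MariaDB tooling. It uses a simple pattern match, so it will also flag occurrences inside string literals or comments.

```python
import re

# Matches calls such as GET_LOCK('name', 5) or get_lock ('test', 1.5).
# Only the two functions named in this report are covered.
LOCK_CALL = re.compile(r"\b(GET_LOCK|RELEASE_LOCK)\s*\(", re.IGNORECASE)


def find_lock_calls(statements):
    """Return (index, function_name) pairs for statements that call a locking function."""
    hits = []
    for index, sql in enumerate(statements):
        for match in LOCK_CALL.finditer(sql):
            hits.append((index, match.group(1).upper()))
    return hits


if __name__ == "__main__":
    # Hypothetical sample input; real input would come from the stored query log.
    sample = [
        "START TRANSACTION",
        "SELECT GET_LOCK('name', 5)",
        "SELECT get_lock ('test', 1.5)",
        "SELECT 1",
    ]
    for index, name in find_lock_calls(sample):
        print(index, name)
```

The same function can be applied to the text of a stored procedure definition to list the calls that would need to be replaced.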
Elena Stepanova
added a comment - seppo,
Even though there is no answer to your question from the reporter, I assume you still want to handle the crash, so I'm not closing it as incomplete.
#8 0x000055f1f65cd01d in my_hash_first (hash=hash@entry=0x145b54003068, key=key@entry=0x145bd0e47198 "\btest", length=<optimized out>, current_record=current_record@entry=0x145bd0e4712c) at /test/10.5_opt/mysys/hash.c:262
#9 0x000055f1f65cd035 in my_hash_search (hash=hash@entry=0x145b54003068, key=key@entry=0x145bd0e47198 "\btest", length=<optimized out>) at /test/10.5_opt/mysys/hash.c:235
#10 0x000055f1f607622a in Item_func_get_lock::val_int (this=0x145b54010b10) at /test/10.5_opt/sql/mdl.h:398
#11 0x000055f1f5f7122d in Type_handler::Item_send_long (this=<optimized out>, item=0x145b54010b10, protocol=0x145b540011a8, buf=<optimized out>) at /test/10.5_opt/sql/sql_type.cc:7487
#12 0x000055f1f5d2e810 in Protocol::send_result_set_row (this=this@entry=0x145b540011a8, row_items=row_items@entry=0x145b540105e8) at /test/10.5_opt/sql/protocol.cc:1083
#13 0x000055f1f5da2367 in select_send::send_data (this=0x145b54011518, items=@0x145b540105e8: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x145b54010bf8, last = 0x145b54010bf8, elements = 1}, <No data fields>}) at /test/10.5_opt/sql/sql_class.cc:3081
#14 0x000055f1f5e62518 in select_result_sink::send_data_with_check (u=<optimized out>, sent=0, items=<optimized out>, this=<optimized out>) at /test/10.5_opt/sql/sql_class.h:5342
#16 JOIN::exec_inner (this=0x145b54011540) at /test/10.5_opt/sql/sql_select.cc:4384
#17 0x000055f1f5e62919 in JOIN::exec (this=this@entry=0x145b54011540) at /test/10.5_opt/sql/sql_select.cc:4296
#18 0x000055f1f5e608da in mysql_select (thd=0x145b54000c58, tables=0x0, fields=@0x145b540105e8: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x145b54010bf8, last = 0x145b54010bf8, elements = 1}, <No data fields>}, conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=<optimized out>, result=0x145b54011518, unit=0x145b54004c40, select_lex=0x145b54010498) at /test/10.5_opt/sql/sql_select.cc:4773
#19 0x000055f1f5e612c7 in handle_select (thd=thd@entry=0x145b54000c58, lex=lex@entry=0x145b54004b78, result=result@entry=0x145b54011518, setup_tables_done_option=setup_tables_done_option@entry=0) at /test/10.5_opt/sql/sql_select.cc:444
#20 0x000055f1f5deda31 in execute_sqlcom_select (thd=0x145b54000c58, all_tables=0x0) at /test/10.5_opt/sql/sql_parse.cc:6314
#21 0x000055f1f5dfc79b in mysql_execute_command (thd=0x145b54000c58) at /test/10.5_opt/sql/sql_parse.cc:4005
#22 0x000055f1f5de7fbf in mysql_parse (thd=thd@entry=0x145b54000c58, rawbuf=rawbuf@entry=0x145b54010400 "SELECT get_lock ('test', 1.5)", length=length@entry=29, parser_state=parser_state@entry=0x145bd0e48410, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /test/10.5_opt/sql/sql_parse.cc:8100
#23 0x000055f1f5de7729 in wsrep_mysql_parse (thd=0x145b54000c58, rawbuf=0x145b54010400 "SELECT get_lock ('test', 1.5)", length=29, parser_state=0x145bd0e48410, is_com_multi=<optimized out>, is_next_command=<optimized out>) at /test/10.5_opt/sql/sql_parse.cc:7903
#24 0x000055f1f5df684a in dispatch_command (command=COM_QUERY, thd=0x145b54000c58, packet=<optimized out>, packet_length=<optimized out>, is_com_multi=<optimized out>, is_next_command=<optimized out>) at /test/10.5_opt/sql/sql_class.h:1290
#25 0x000055f1f5df772c in do_command (thd=0x145b54000c58) at /test/10.5_opt/sql/sql_parse.cc:1370
#26 0x000055f1f5eff631 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x55f1f824b448, put_in_cache=put_in_cache@entry=true) at /test/10.5_opt/sql/sql_connect.cc:1418
#27 0x000055f1f5effaad in handle_one_connection (arg=arg@entry=0x55f1f824b448) at /test/10.5_opt/sql/sql_connect.cc:1312
#28 0x000055f1f6291cef in pfs_spawn_thread (arg=0x55f1f8262d98) at /test/10.5_opt/storage/perfschema/pfs.cc:2201
#29 0x0000145be14dc609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#30 0x0000145be10ca293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
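In the backtrace above, the mysql_parse and wsrep_mysql_parse frames carry the statement text in their rawbuf argument, which answers the question of which query was running at the time of the crash. As a rough sketch, the statement could be pulled out of a saved backtrace like this; the name `queries_in_backtrace` is hypothetical. This assumes a build with debug symbols (the production trace in the report shows no arguments), and gdb may truncate long strings.

```python
import re

# gdb prints the statement text as: rawbuf=0x... "SQL text"
# (optionally as rawbuf=rawbuf@entry=0x... "SQL text").
RAWBUF = re.compile(r'rawbuf=(?:rawbuf@entry=)?0x[0-9a-fA-F]+ "((?:[^"\\]|\\.)*)"')


def queries_in_backtrace(backtrace):
    """Return the distinct statement strings found in rawbuf arguments, in order."""
    seen = []
    for match in RAWBUF.finditer(backtrace):
        query = match.group(1)
        if query not in seen:
            seen.append(query)
    return seen
```

Passing the full text of a backtrace file to `queries_in_backtrace` returns each distinct statement once, even when several frames repeat it.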
Mario Karuza (Inactive)
added a comment - It looks like the root cause of this segfault is related to https://jira.mariadb.org/browse/MDEV-27713 . The current fix properly cleans up the thread's user-level lock (ull) structures after a BF abort is triggered.
Please retest with the fix from the related ticket.
giseong choi
added a comment (edited) - I encountered the same issue.
Here is my case:
MariaDB 10.5.13
Debian 11
Galera 4.10
Steps to reproduce:
On node 1:
START TRANSACTION;
SELECT GET_LOCK('name', 5);
Exit.
The database crashes with signal 11.
This issue has been fixed since this commit.
People
Jan Lindström (Inactive)
Dovid Bender