We are experiencing technical difficulties with the latest MariaDB 10.1.41-MariaDB.
This is only happening on one server while we have more with the same system package versions.
The database is freezing and does not accept new connections.
The error log shows a large number of errors, for example:
InnoDB: Warning: a long semaphore wait:
--Thread 140300680931072 has waited at dict0dict.cc line 984 for 241.00 seconds the semaphore:
Mutex at 0x7f9e26c112e8 '&dict_sys->mutex', lock var 1
Last time reserved by thread 140300697716480 in file not yet reserved line 0, waiters flag 1
InnoDB: Warning: semaphore wait:
--Thread 140300680931072 has waited at dict0dict.cc line 984 for 241.00 seconds the semaphore:
Mutex at 0x7f9e26c112e8 '&dict_sys->mutex', lock var 1
Last time reserved by thread 140300697716480 in file not yet reserved line 0, waiters flag 1
We can provide more error log data, but not in public.
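When an error log contains many such warnings, it can help to extract the waiting thread, the source location and the wait time from each "has waited at" line, for example to see which mutex acquisition dominates. The following is only a minimal sketch: the names sem_wait and parse_sem_wait are made up for this illustration and are not part of MariaDB. Whitespace around "for" is optional in the pattern, because spacing in pasted logs is sometimes lost.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical record for one "long semaphore wait" line (not a MariaDB type).
struct sem_wait {
    std::string thread_id;  // kept as text; only used as a label
    std::string file;       // source file of the waiting call, e.g. dict0dict.cc
    int line = 0;           // source line of the waiting call
    double seconds = 0.0;   // reported wait time
};

// Parses one "--Thread ... has waited at ..." line.
// Returns false if the text does not match that shape.
inline bool parse_sem_wait(const std::string& text, sem_wait& out) {
    static const std::regex re(
        R"(--Thread (\d+) has waited at (\S+) line (\d+)\s*for\s*([0-9.]+) seconds)");
    std::smatch m;
    if (!std::regex_search(text, m, re)) return false;
    out.thread_id = m[1].str();
    out.file = m[2].str();
    out.line = std::stoi(m[3].str());
    out.seconds = std::stod(m[4].str());
    return true;
}
```

Applied to the excerpt above, every matching line would report a wait on dict0dict.cc line 984, which points at contention on dict_sys->mutex rather than at several unrelated waits.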
Matthias Leich
added a comment - edited
Results of RQG testing on bb-10.2-thiru commit 0b91f74906c8dcbcc1dac486fcc66c1e9c0c603a
- > 1500 RQG tests were executed
The fraction of failing tests was surprisingly low.
All asserts/crashes are already covered by open bugs in JIRA except one:
- mysqld: sql/sql_list.h:684: void ilink::assert_linked(): Assertion `prev != 0 && next != 0' failed.
happening during shutdown of the server
- per Thiru: Unlikely that it's caused by the changes in bb-10.3-thiru
- occurring only once, so attempts to reproduce it on current 10.2 have too low a chance of success
https://jira.mariadb.org/browse/MDEV-20843
Marko Mäkelä
added a comment - This is a welcome step in the right direction, but I think that this needs some more work.
First of all, the in_queue should not be stored in a bit-field that is shared with other bit-fields that are protected by a different mutex.
I would suggest using bool, and documenting the possible state transitions carefully. We might consider using atomic memory access.
Second, in 10.1, fts_optimize_init() is not adding tables to the queue, while in 10.2 it is doing that. I’d like to see a 10.1 patch that does this. It should also avoid the unnecessary use of std::vector.
Third, fts_optimize_remove_table() should assert !table->fts->in_queue at the end.
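To illustrate the first point: in the C++ memory model, adjacent bit-fields form a single memory location, so a write to one of them may be compiled as a read-modify-write of the whole unit. If two threads update neighbouring bit-fields while holding different mutexes, that is a data race. The sketch below uses made-up type and function names (fts_flags_shared, fts_flags_separate, mark_enqueued, mark_dequeued, check_removed) and is not the actual InnoDB definition; it only shows what a separate flag with atomic access and documented transitions could look like.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical layout, NOT the real InnoDB struct: both flags share one
// storage unit, so writing in_queue also rewrites the bits of other_flag.
struct fts_flags_shared {
    unsigned other_flag : 1;  // assumed to be protected by mutex A
    unsigned in_queue : 1;    // assumed to be protected by mutex B
};

// Alternative sketch: in_queue has its own memory location and atomic access.
struct fts_flags_separate {
    unsigned other_flag : 1;            // still protected by mutex A
    std::atomic<bool> in_queue{false};  // independent of mutex A
};

// Documented state transitions:
//   false -> true   when the table is added to the queue
//   true  -> false  when the table is removed from the queue

// Returns true if this call performed the false -> true transition.
inline bool mark_enqueued(fts_flags_separate& f) {
    bool expected = false;
    return f.in_queue.compare_exchange_strong(expected, true);
}

// Returns true if this call performed the true -> false transition.
inline bool mark_dequeued(fts_flags_separate& f) {
    bool expected = true;
    return f.in_queue.compare_exchange_strong(expected, false);
}

// Mirrors the third point: after removal, the flag must be clear.
inline void check_removed(const fts_flags_separate& f) {
    assert(!f.in_queue.load());
}
```

Whether an atomic is needed at all depends on whether every access to in_queue already happens under one mutex; a plain bool in its own memory location would be enough in that case.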
Marko Mäkelä
added a comment - At the end of fts_optimize_remove_table(), the fts_optimize_wq->mutex acquisition and release around the debug assertion should be inside ut_d(), to avoid unnecessary operations on the release build.
I saw a redundant sync_table = mem_heap_alloc(…) call whose result was immediately overwritten by sync_table = table;
In fts_optimize_new_table() the assignment slot->running = false is redundant because of a preceding memset() call.
If fts_slots can be accessed by multiple threads, then we should extend some mutex hold time. It could be that it is only being accessed by a single thread.
Should we call fts_init_index() already in ha_innobase::open()? Otherwise, it seems that FTS-indexed columns could be updated before any fulltext search is performed (and ha_innobase::ft_init_ext() is called). Could that lead to some updates being missed by the fulltext indexes?
Finally, please check the following for differences in white-space or comments, and try to fix those:
diff -I^@@ <(git show origin/bb-10.1-thiru storage/innobase) <(git show origin/bb-10.1-thiru storage/xtradb/)
git show origin/bb-10.2-thiru|diff -I^@@ - <(git show origin/bb-10.1-thiru storage/innobase)
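The ut_d() point above can be sketched as follows. This is only an illustration under stated assumptions: DEBUG_ONLY stands in for InnoDB's ut_d(), MY_DEBUG is a hypothetical build switch, and work_queue, table_state and remove_table_epilogue are made-up names rather than the real fts_optimize_wq or fts_optimize_remove_table(). The function returns how many debug-only lock operations it executed, so that the effect of the macro is observable.

```cpp
#include <cassert>
#include <mutex>

// Stand-in for ut_d(): expands to its argument only in debug builds.
// MY_DEBUG is a hypothetical switch used for this sketch.
#ifdef MY_DEBUG
# define DEBUG_ONLY(expr) expr
#else
# define DEBUG_ONLY(expr)
#endif

struct work_queue {   // hypothetical stand-in for fts_optimize_wq
    std::mutex mutex;
};

struct table_state {  // hypothetical stand-in for table->fts
    bool in_queue = false;
};

// End of a remove-table routine: the mutex is taken only for the debug check,
// so the acquisition and release are wrapped together with the assertion.
inline int remove_table_epilogue(work_queue& wq, const table_state& t) {
    int lock_ops = 0;
    DEBUG_ONLY(wq.mutex.lock());
    DEBUG_ONLY(++lock_ops);
    DEBUG_ONLY(assert(!t.in_queue));
    DEBUG_ONLY(wq.mutex.unlock());
    (void) wq;  // unused when the debug-only statements compile away
    (void) t;
    return lock_ops;
}
```

Built without MY_DEBUG, the function body reduces to returning 0 and the mutex is never touched, which is the saving asked for on the release build.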
Marko Mäkelä
added a comment - Thanks, this looks OK. I suggested declaring fts_optimize_wq without static scope, to avoid having to add trivial non-inline accessor functions.
Matthias Leich
added a comment - I tested the tree bb-10.2-thiru commit ce813ca178e499ab2171978bf0140537cb9ca612, which contains patches for the current MDEV.
There were no asserts/crashes that do not also occur in current 10.2 commit 28098420317bc2efe082df799c917babde879242.
So from my point of view the MDEV-20621 patch is OK.
People
Thirunarayanan Balathandayuthapani
Stevo