The purge thread takes a shared lock on the InnoDB dictionary lock while processing undo log records, to prevent the table from being dropped. But this also blocks DDL for
all of InnoDB. There are also a few issues with virtual column computation:
the purge thread acquires an MDL lock for virtual column computation and could
deadlock with DDL. (fixed in 10.2+)
Allow InnoDB background threads to take an MDL lock on the table. In that case, they block DDL only for that table.
For FOREIGN KEY constraint checks, we would prefer not to acquire dict_operation_lock S-latch, and rely on the correct acquisition of MDL on the SQL layer (to be covered by MDEV-21175).
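The difference between the global dictionary lock and a per-table MDL described above can be illustrated with a toy model. This is only a sketch: `RwModel`, `GlobalScheme` and `PerTableScheme` are hypothetical names that do not exist in InnoDB, and the model is single-threaded bookkeeping of who could acquire what, not a real lock.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy, single-threaded model of a reader-writer lock. It only records
// which acquisitions would succeed; it does not block or synchronize.
struct RwModel {
  int shared_holders = 0;
  bool exclusive_held = false;

  bool try_shared() {
    if (exclusive_held) return false;
    ++shared_holders;
    return true;
  }
  bool try_exclusive() {
    if (exclusive_held || shared_holders > 0) return false;
    exclusive_held = true;
    return true;
  }
  void release_shared() { assert(shared_holders > 0); --shared_holders; }
  void release_exclusive() { assert(exclusive_held); exclusive_held = false; }
};

// Hypothetical model of the old scheme: one global dictionary lock.
// The table name is ignored because every table shares the same lock.
struct GlobalScheme {
  RwModel dict_lock;
  bool purge_begin(const std::string&) { return dict_lock.try_shared(); }
  bool ddl_begin(const std::string&) { return dict_lock.try_exclusive(); }
};

// Hypothetical model of the proposed scheme: one metadata lock per table.
struct PerTableScheme {
  std::map<std::string, RwModel> mdl;
  bool purge_begin(const std::string& t) { return mdl[t].try_shared(); }
  bool ddl_begin(const std::string& t) { return mdl[t].try_exclusive(); }
};
```

In this model, a purge holding the global lock for test/t1 makes `ddl_begin("test/t2")` fail, while under the per-table scheme only DDL on test/t1 is refused.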
Thirunarayanan Balathandayuthapani
added a comment - In `row_update_for_mysql()`, there is no need to take the data dictionary lock to initialize fts_doc_id, because marko mentioned that the SQL layer takes an MDL lock when a foreign key
is involved.
Matthias Leich
added a comment -
MDEV-16678_1.tgz - Archive with files for replaying the problem
mysqld: storage/innobase/fts/fts0fts.cc:4290: dberr_t fts_sync(fts_sync_t*, bool, bool): Assertion `sync->unlock_cache' failed.
How to install and run:
git clone https://github.com/mleich1/rqg --branch experimental RQG_mleich1
cd RQG_mleich1
tar xvzf <path_to>/MDEV-16678_1.tgz
./MDEV-16678_1.sh <path to MariaDB binaries>
Matthias Leich
added a comment -
RQG runs that end with the following are not that rare:
DEADLOCK of threads detected!
....
[ERROR] [FATAL] InnoDB: ######################################## Deadlock Detected!
[ERROR] mysqld got signal 6 ;
I am working on a simplified replay test case.
Matthias Leich
added a comment - - edited
MDEV-16678_2.test - MTR based test which throws
[ERROR] [FATAL] InnoDB: ######################################## Deadlock Detected!
I tried the same test on 10.5. 10 attempts but no replay.
Matthias Leich
added a comment - - edited
bb-10.5-MDEV-16678-rebase2 commit 6333bd7b334b821d9688b5eee4e79066241e036b
1. mysqld: storage/innobase/fts/fts0fts.cc:4290: dberr_t fts_sync(fts_sync_t*, bool, bool): Assertion `sync->unlock_cache' failed.
and
[ERROR] [FATAL] InnoDB: ######################################## Deadlock Detected!
were never observed again.
2. Remaining failures (need to check if already in JIRA)
The frequencies/numbers were taken from a grammar simplification campaign (4700 RQG runs) with constantly changing RQG grammars.
- 1 * storage/innobase/trx/trx0rec.cc:238: byte* trx_undo_log_v_idx(buf_block_t*, const dict_table_t*, ulint, byte*, bool): Assertion `n_idx > 0' failed.
- 1 * storage/innobase/include/dict0mem.h:738: void dict_v_col_t::detach(const dict_index_t&): Assertion `n == n_v_indexes' failed.
- 11 * storage/innobase/handler/handler0alter.cc:560: bool dict_table_t::instant_column(const dict_table_t&, const ulint*): Assertion `v.v_indexes.empty()' failed.
- 1 * storage/innobase/handler/handler0alter.cc:11077: virtual bool ha_innobase::commit_inplace_alter_table(TABLE*, Alter_inplace_info*, bool): Assertion `ctx0->old_table->get_ref_count() == 1' failed.
- 3 * bb-10.5-MDEV-16678-rebase/storage/innobase/dict/dict0load.cc:1955: void dict_load_virtual_one_col(dict_table_t*, ulint, dict_v_col_t*, mem_heap_t*): Assertion `pos == vcol_pos' failed.
- 2 * storage/innobase/dict/dict0load.cc:1649: const char* dict_load_column_low(dict_table_t*, mem_heap_t*, dict_col_t*, table_id_t*, const char**, const rec_t*, ulint*): Assertion `vcol->v_pos == dict_get_v_col_pos(pos)' failed.
- frequent: storage/innobase/btr/btr0cur.cc:507: dberr_t btr_cur_instant_init_low(dict_index_t*, mtr_t*): Assertion `index->n_core_fields + n_add >= index->n_fields' failed.
https://jira.mariadb.org/browse/MDEV-21148 The problem is in current 10.5 too.
- 1 * storage/innobase/btr/btr0cur.cc:1476: dberr_t btr_cur_search_to_nth_level_func(dict_index_t*, ulint, const dtuple_t*, page_cur_mode_t, ulint, btr_cur_t*, rw_lock_t*, const char*, unsigned int, mtr_t*, ib_uint64_t): Assertion `rw_lock_own(dict_index_get_lock(index), RW_LOCK_S)' failed.
https://jira.mariadb.org/browse/MDEV-20038 Problem is in 10.n too
- frequent [ERROR] InnoDB: Table test/t4 contains 7 indexes inside InnoDB, which is different from the number of indexes 8 defined in the MariaDB
which is surprising because the test does not invoke crash recovery
- frequent test failures where RQG believes it has hit a server freeze/deadlock or the server
did not shut down properly
There is a good probability of a false alarm by RQG, and these effects are known to be in 10.5 too.
Matthias Leich
added a comment -
Test round on origin/bb-10.5-MDEV-16678-rebase2 b51478b219a9b347b496b2460c8b77a83dad1aa2 2019-11-27
with main focus on replaying "Assertion `ctx0->old_table->get_ref_count() == 1' failed"
- none of the virtual column related asserts was replayed
== Looks like the fix for MDEV-21148 did an exceptionally good job
- 7 times mysqld: storage/innobase/rem/rem0rec.cc:507: bool rec_offs_validate(const rec_t*, const dict_index_t*, const ulint*): Assertion `ulint(rec) == offsets[2]' failed.
AFAIK not MDEV-16678 specific
- 1 time mysqld: storage/innobase/dict/dict0dict.cc:4559: void dict_table_check_for_dup_indexes(const dict_table_t*, check_name): Assertion `index1->is_committed() != index2->is_committed() || strcmp(index1->name, index2->name) != 0' failed.
Not found in JIRA
So in case the last assert is not MDEV-16678 specific, then MDEV-16678 should be OK now.
Marko Mäkelä
added a comment - I pushed some cleanup, mainly to the MDL acquisition code.
There was a race condition where the dict_table_t::name was being freed and renamed while we were converting the table name to an MDL ticket name. This race could affect 10.2‥10.4 as well.
axel, please run the normal R/W benchmarks on the branch and compare to the latest 10.5. We would like to ensure that there is no performance degradation for DML workloads.
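As an illustration of the class of race mentioned above, the sketch below shows one common way to avoid reading a name while it is being replaced: take a private copy of the name under a mutex, then derive the database and table parts from that copy. This is not the actual fix that was pushed. `TableEntry`, `snapshot_name` and `mdl_key_for` are hypothetical names, and the "db/table" name format is assumed from messages such as "Table test/t4".

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a dictionary table object whose name
// can be replaced by a concurrent rename.
struct TableEntry {
  std::mutex name_mutex;
  std::string name;  // assumed format: "db/table"

  void rename(std::string new_name) {
    std::lock_guard<std::mutex> guard(name_mutex);
    name = std::move(new_name);
  }

  // Return a private copy, so that the caller never reads a buffer
  // that a concurrent rename may free.
  std::string snapshot_name() {
    std::lock_guard<std::mutex> guard(name_mutex);
    return name;
  }
};

// Split the snapshot into the (db, table) pair that a lock key would need.
// Returns false if the name does not contain a '/' separator.
bool mdl_key_for(TableEntry& t, std::string& db, std::string& table) {
  const std::string copy = t.snapshot_name();
  const std::string::size_type slash = copy.find('/');
  if (slash == std::string::npos) return false;
  db = copy.substr(0, slash);
  table = copy.substr(slash + 1);
  return true;
}
```

The design choice here is to pay for one string copy so that the parsing step works on memory that no other thread can free or overwrite.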
Axel Schwenke
added a comment - I ran standard OLTP workloads. Numbers and diagrams are attached in MDEV-16678.ods
Observations:
- the builds from 10.5 master (baseline) and bb-10.5-MDEV-16678-rebase2 (new) behave very similarly; the new code tends to scale better at high thread counts but is a little slower at 16 or 32 threads
- disabling backup logs has a positive effect on performance for write-heavy workloads
People
Marko Mäkelä
Thirunarayanan Balathandayuthapani