[MDEV-15822] WSREP: BF lock wait long for trx Created: 2018-04-09 Updated: 2022-09-16 Resolved: 2018-07-25 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera |
| Affects Version/s: | 10.1, 10.2, 10.3 |
| Fix Version/s: | 10.1.35, 10.2.17, 10.3.9 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Reinhard Sojka | Assignee: | Jan Lindström (Inactive) |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | None | ||
| Environment: |
CentOS 7.4 (with microcode_ctl-2.1-22.2.el7.x86_64, AFAIK withdrawn because of stability issues) |
||
| Description |
|
Hi, this might be a duplicate. We ran into this bug last month, but I had no time to look deeper into it until now. It happened on a cluster with 2 nodes. The problems started on node "DB2" and shifted to node "DB1" when we tried to restart "DB2". The restart failed, so I rebooted the node. Here is the timeline of what happened on node DB2, from /var/log/messages: 10:41 - rebooted node DB2 |
| Comments |
| Comment by Jan Lindström (Inactive) [ 2018-06-05 ] |
|
There is a record lock wait between the Galera BF (brute force) DDL transaction doing ALTER TABLE and the mysql.innodb_table_stats update:
|
| Comment by Jan Lindström (Inactive) [ 2018-06-05 ] |
|
Dimov, questions: Why are these mysql.innodb_index_stats and mysql.innodb_table_stats updates done in a different trx from the trx that causes the update? Secondly, why is it done in the middle of a huge DDL? Finally, can we somehow move this update to later in the BF case? Does Galera replicate these updates from every updated node to the other nodes and do the same update regardless of this replication? |
| Comment by Jan Lindström (Inactive) [ 2018-06-05 ] |
|
Meanwhile, a workaround could be `SET GLOBAL innodb_stats_auto_recalc=OFF` |
| Comment by Jan Lindström (Inactive) [ 2018-06-05 ] |
|
Another similar case:
|
| Comment by Vasil (Inactive) [ 2018-06-18 ] |
|
> Why are these mysql.innodb_index_stats and mysql.innodb_table_stats updates done in a different trx from the trx that causes the update?

The stats recalculations are done asynchronously with respect to the DML that crossed the threshold and made the recalculation necessary; a background thread performs them. This avoids delaying the DML with the stats recalculation.

> Secondly, why is it done in the middle of a huge DDL?

It is done asynchronously. Once a DML changes too many rows, the table is flagged as "needs stats recalc", and at some later point the background stats thread picks that table and recalculates its statistics, like a background/hidden ANALYZE TABLE. So it can happen at any time, including in the middle of a huge DDL, if triggered by some previous DML.

> Finally, can we somehow move this update to later in the BF case?

But this shouldn't be necessary. Both operations should run concurrently and serialize where necessary. The background stats recalc is pretty much the same as the user running ANALYZE TABLE manually in another terminal. We can't ask the user not to run, or to delay, their ANALYZE during BF.

> Does Galera replicate these updates from every updated node to the other nodes and do the same update regardless of this replication?

I would guess so, but I am not sure. |
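The asynchronous behaviour described in this comment can be illustrated with a small toy model. This is only a sketch in Python, not InnoDB code: the class `StatsRecalcModel`, its method names, and the 10% threshold are assumptions made for illustration. It shows that the DML merely queues the table, and the actual recalculation happens whenever the background step next runs, which may be during unrelated work such as a long DDL.

```python
from collections import deque


class StatsRecalcModel:
    """Toy model: DML only flags a table; a separate step recalculates it."""

    def __init__(self, threshold_fraction=0.10):
        # Assumed threshold: fraction of rows that must change before a
        # recalculation is requested (illustrative value, not read from InnoDB).
        self.threshold_fraction = threshold_fraction
        self.row_counts = {}          # table name -> row count
        self.modified = {}            # table name -> rows changed since last recalc
        self.recalc_queue = deque()   # tables waiting for the background step
        self.recalc_log = []          # order in which recalcs actually ran

    def add_table(self, name, rows):
        self.row_counts[name] = rows
        self.modified[name] = 0

    def dml(self, name, changed_rows):
        """Foreground DML: count changes and, past the threshold, only enqueue."""
        self.modified[name] += changed_rows
        limit = self.threshold_fraction * self.row_counts[name]
        if self.modified[name] > limit and name not in self.recalc_queue:
            self.recalc_queue.append(name)

    def background_step(self):
        """One iteration of the background thread: recalc one queued table."""
        if not self.recalc_queue:
            return None
        name = self.recalc_queue.popleft()
        self.modified[name] = 0
        self.recalc_log.append(name)
        return name


model = StatsRecalcModel()
model.add_table("t1", rows=1000)
model.dml("t1", changed_rows=150)   # crosses the assumed threshold, returns at once
# Other work, including a long DDL, could start here before the recalc runs.
picked = model.background_step()    # the recalc happens only now
```

The point of the model is the gap between `dml()` returning and `background_step()` running: nothing ties the recalculation to the statement that triggered it.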
| Comment by Jan Lindström (Inactive) [ 2018-06-26 ] |
|
Dimov, OK, can we at least somehow avoid queuing the table for background statistics recalculation after a significant change made by a BF DDL transaction? We probably can't avoid the situation where the table is selected before the BF transaction starts. In my understanding, MDL should make ANALYZE TABLE wait during another DDL. The same MDL or InnoDB row locks should prevent the situation where some other transaction makes a significant change to the same table and causes a statistics recalc during the BF DDL; otherwise that transaction would be selected as a victim. |
| Comment by Vasil (Inactive) [ 2018-06-26 ] |
|
A table is queued for background stats recalc from `row_update_statistics_if_needed()`. Maybe the logic can be adjusted to avoid the call to `dict_stats_recalc_pool_add()`? There is also `dict_stats_recalc_pool_del()`, which can be used to remove a table from the queue if it is already present. |
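One possible reading of this suggestion, sketched as a toy model in Python rather than as InnoDB code. The class `RecalcPool`, the function `update_statistics_if_needed`, the `trx_is_bf` flag, and the 10% threshold are all hypothetical; only the roles of the add and delete operations are taken from the comment above. It does not represent the fix that was eventually committed.

```python
class RecalcPool:
    """Toy stand-in for the queue of tables awaiting background stats recalc."""

    def __init__(self):
        self._tables = []                 # FIFO of table names, no duplicates

    def add(self, table):
        # Plays the role of dict_stats_recalc_pool_add().
        if table not in self._tables:
            self._tables.append(table)

    def delete(self, table):
        # Plays the role of dict_stats_recalc_pool_del(); a no-op if absent.
        if table in self._tables:
            self._tables.remove(table)

    def pending(self):
        return list(self._tables)


def update_statistics_if_needed(pool, table, rows, modified, trx_is_bf):
    """Models the queuing decision made in row_update_statistics_if_needed().

    trx_is_bf is a hypothetical flag meaning "this change comes from a Galera
    brute-force transaction"; the 10% threshold is an assumed value.
    Returns True if the table was queued for recalculation.
    """
    if modified <= 0.10 * rows:
        return False                      # not enough changes, nothing to queue
    if trx_is_bf:
        return False                      # sketched tweak: BF work never queues
    pool.add(table)
    return True


pool = RecalcPool()
update_statistics_if_needed(pool, "t1", rows=1000, modified=500, trx_is_bf=False)
pool.delete("t1")                         # e.g. when a BF DDL starts on t1
```

Skipping the add only covers changes made by the BF transaction itself; the delete is what would handle a table that was queued before the BF transaction started.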
| Comment by Jan Lindström (Inactive) [ 2018-06-28 ] |
|
Dimov, I do not yet have a fully working test case. I can get the DDL into BF state, but I do not yet know how to trigger the long BF lock wait. Can you look at the attached galera.diff and comment on my "try"? |
| Comment by Vasil (Inactive) [ 2018-06-28 ] |
|
Just for the record, a copy from IM:
|
| Comment by Marko Mäkelä [ 2018-07-05 ] |
|
Dimov, I believe that in MariaDB’s mysql-test-run, InnoDB persistent statistics are disabled by default. |
| Comment by Vasil (Inactive) [ 2018-07-05 ] |
|
Ok, then of course, the `.opt` file is necessary. |
| Comment by Chow King Tak [ 2018-11-22 ] |
|
I am using version 10.3.9, but I just encountered the BF error. It blocked all transactions and I needed to restart the node. Is there any configuration in my.cnf that I have to adjust to avoid the error? The following are some of the error messages:
2018-11-22 8:53:28 0 [Note] InnoDB: WSREP: BF lock wait long for trx:0x2129df query: DELETE FROM pe3logex1.tmp_app_to_db_status_report WHERE site='SP'¢Yõ[^S^B
=====================================
Record lock, heap no 4 PHYSICAL RECORD: n_fields 12; compact format; info bits 32 |
| Comment by peng gao [ 2022-09-16 ] |
|
Hi all: Like Chow King Tak, we also use MariaDB 10.3.9 with Galera. One node got stuck forever, and finally we restarted the cluster.
2022-09-14 9:33:43 59 [Note] InnoDB: WSREP: BF lock wait long for trx:0x12a95746f query: insert into *** ()
TRANSACTION 5009405039, ACTIVE 3706 sec inserting
Here the insert transaction was stuck for 3706 s in state "inserting". Is there a fix for this? Thanks. |