[MDEV-6296] runtime adjustment of wsrep_slave_threads creates threads but never removes them Created: 2014-06-03  Updated: 2014-06-20  Resolved: 2014-06-15

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 5.5.37-galera
Fix Version/s: 5.5.38-galera, 10.0.12-galera

Type: Bug Priority: Major
Reporter: Daniel Black Assignee: Nirbhay Choubey (Inactive)
Resolution: Fixed Votes: 0
Labels: galera
Environment:

Ubuntu 12.04.4 LTS (GNU/Linux 2.6.32-28-server x86_64)


Attachments: File wsrep_close_slave.patch    

 Description   

select @@wsrep_slave_threads

64

MariaDB [(none)]> set global wsrep_slave_threads=1;  
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [(none)]> show processlist;
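+------+-------------+-----------+------+---------+---------+--------------------+------------------+----------+
| Id   | User        | Host      | db   | Command | Time    | State              | Info             | Progress |
+------+-------------+-----------+------+---------+---------+--------------------+------------------+----------+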
|    4 | system user |           | NULL | Sleep   | 2443658 | wsrep aborter idle | NULL             |    0.000 |
|    5 | system user |           | NULL | Sleep   |     164 | committed 13294138 | NULL             |    0.000 |
|    6 | system user |           | NULL | Sleep   |     213 | committed 13294127 | NULL             |    0.000 |
|    7 | system user |           | NULL | Sleep   |     119 | committed 13294154 | NULL             |    0.000 |
|    8 | system user |           | NULL | Sleep   |     213 | committed 13294122 | NULL             |    0.000 |
|    9 | system user |           | NULL | Sleep   |      65 | committed 13294165 | NULL             |    0.000 |
|   10 | system user |           | NULL | Sleep   |      68 | committed 13294162 | NULL             |    0.000 |
|   11 | system user |           | NULL | Sleep   |     213 | committed 13294121 | NULL             |    0.000 |
|   12 | system user |           | NULL | Sleep   |     214 | committed 13294111 | NULL             |    0.000 |
|   13 | system user |           | NULL | Sleep   |     213 | committed 13294117 | NULL             |    0.000 |
|   14 | system user |           | NULL | Sleep   |      77 | committed 13294160 | NULL             |    0.000 |
|   15 | system user |           | NULL | Sleep   |      77 | committed 13294156 | NULL             |    0.000 |
|   16 | system user |           | NULL | Sleep   |     214 | committed 13294114 | NULL             |    0.000 |
|   17 | system user |           | NULL | Sleep   |     213 | committed 13294120 | NULL             |    0.000 |
|   18 | system user |           | NULL | Sleep   |      77 | committed 13294157 | NULL             |    0.000 |
|   19 | system user |           | NULL | Sleep   |     208 | committed 13294129 | NULL             |    0.000 |
|   20 | system user |           | NULL | Sleep   |     189 | committed 13294133 | NULL             |    0.000 |
|   21 | system user |           | NULL | Sleep   |     137 | committed 13294142 | NULL             |    0.000 |
|   22 | system user |           | NULL | Sleep   |      55 | committed 13294170 | NULL             |    0.000 |
|   23 | system user |           | NULL | Sleep   |      68 | committed 13294163 | NULL             |    0.000 |
|   24 | system user |           | NULL | Sleep   |     137 | committed 13294140 | NULL             |    0.000 |
|   25 | system user |           | NULL | Sleep   |     213 | committed 13294118 | NULL             |    0.000 |
|   26 | system user |           | NULL | Sleep   |     191 | committed 13294131 | NULL             |    0.000 |
|   27 | system user |           | NULL | Sleep   |     121 | committed 13294151 | NULL             |    0.000 |
|   28 | system user |           | NULL | Sleep   |      74 | committed 13294161 | NULL             |    0.000 |
|   29 | system user |           | NULL | Sleep   |     137 | committed 13294143 | NULL             |    0.000 |
|   30 | system user |           | NULL | Sleep   |     119 | committed 13294153 | NULL             |    0.000 |
|   31 | system user |           | NULL | Sleep   |     214 | committed 13294115 | NULL             |    0.000 |
|   32 | system user |           | NULL | Sleep   |     213 | committed 13294125 | NULL             |    0.000 |
|   33 | system user |           | NULL | Sleep   |     201 | committed 13294130 | NULL             |    0.000 |
|   34 | system user |           | NULL | Sleep   |     187 | committed 13294135 | NULL             |    0.000 |
|   35 | system user |           | NULL | Sleep   |     133 | committed 13294150 | NULL             |    0.000 |
|   36 | system user |           | NULL | Sleep   |     220 | committed 13294108 | NULL             |    0.000 |
|   37 | system user |           | NULL | Sleep   |     213 | committed 13294128 | NULL             |    0.000 |
|   38 | system user |           | NULL | Sleep   |     221 | committed 13294107 | NULL             |    0.000 |
|   39 | system user |           | NULL | Sleep   |     133 | committed 13294146 | NULL             |    0.000 |
|   40 | system user |           | NULL | Sleep   |     133 | committed 13294148 | NULL             |    0.000 |
|   41 | system user |           | NULL | Sleep   |     137 | committed 13294141 | NULL             |    0.000 |
|   42 | system user |           | NULL | Sleep   |     214 | committed 13294113 | NULL             |    0.000 |
|   43 | system user |           | NULL | Sleep   |      57 | committed 13294169 | NULL             |    0.000 |
|   44 | system user |           | NULL | Sleep   |     213 | committed 13294126 | NULL             |    0.000 |
|   45 | system user |           | NULL | Sleep   |     213 | committed 13294123 | NULL             |    0.000 |
|   46 | system user |           | NULL | Sleep   |      66 | committed 13294164 | NULL             |    0.000 |
|   47 | system user |           | NULL | Sleep   |     214 | committed 13294112 | NULL             |    0.000 |
|   48 | system user |           | NULL | Sleep   |     214 | committed 13294110 | NULL             |    0.000 |
|   49 | system user |           | NULL | Sleep   |     185 | committed 13294137 | NULL             |    0.000 |
|   50 | system user |           | NULL | Sleep   |     135 | committed 13294145 | NULL             |    0.000 |
|   51 | system user |           | NULL | Sleep   |      63 | committed 13294166 | NULL             |    0.000 |
|   52 | system user |           | NULL | Sleep   |     137 | committed 13294139 | NULL             |    0.000 |
|   53 | system user |           | NULL | Sleep   |      58 | committed 13294168 | NULL             |    0.000 |
|   54 | system user |           | NULL | Sleep   |     188 | committed 13294134 | NULL             |    0.000 |
|   55 | system user |           | NULL | Sleep   |     213 | committed 13294124 | NULL             |    0.000 |
|   56 | system user |           | NULL | Sleep   |     214 | committed 13294109 | NULL             |    0.000 |
|   57 | system user |           | NULL | Sleep   |     213 | committed 13294116 | NULL             |    0.000 |
|   58 | system user |           | NULL | Sleep   |     136 | committed 13294144 | NULL             |    0.000 |
|   59 | system user |           | NULL | Sleep   |     187 | committed 13294136 | NULL             |    0.000 |
|   60 | system user |           | NULL | Sleep   |     133 | committed 13294149 | NULL             |    0.000 |
|   61 | system user |           | NULL | Sleep   |      63 | committed 13294167 | NULL             |    0.000 |
|   62 | system user |           | NULL | Sleep   |     119 | committed 13294152 | NULL             |    0.000 |
|   63 | system user |           | NULL | Sleep   |     213 | committed 13294119 | NULL             |    0.000 |
|   64 | system user |           | NULL | Sleep   |      77 | committed 13294158 | NULL             |    0.000 |
|   65 | system user |           | NULL | Sleep   |     190 | committed 13294132 | NULL             |    0.000 |
|   66 | system user |           | NULL | Sleep   |      91 | committed 13294155 | NULL             |    0.000 |
|   67 | system user |           | NULL | Sleep   |     133 | committed 13294147 | NULL             |    0.000 |
|   68 | system user |           | NULL | Sleep   |      77 | committed 13294159 | NULL             |    0.000 |
| 9660 | root        | localhost | NULL | Query   |       0 | sleeping           | show processlist |    0.000 |
+------+-------------+-----------+------+---------+---------+--------------------+------------------+----------+
66 rows in set (0.04 sec)

None were removed.

Oh well, let's set it back to 64.

MariaDB [(none)]> set global wsrep_slave_threads=64;
Query OK, 0 rows affected (0.67 sec)
 
MariaDB [(none)]> show processlist;                 
+------+-------------+-----------+------+---------+---------+--------------------+------------------+----------+
| Id   | User        | Host      | db   | Command | Time    | State              | Info             | Progress |
+------+-------------+-----------+------+---------+---------+--------------------+------------------+----------+
|    4 | system user |           | NULL | Sleep   | 2443728 | wsrep aborter idle | NULL             |    0.000 |
|    5 | system user |           | NULL | Sleep   |     234 | committed 13294138 | NULL             |    0.000 |
|    6 | system user |           | NULL | Sleep   |     283 | committed 13294127 | NULL             |    0.000 |
|    7 | system user |           | NULL | Sleep   |     189 | committed 13294154 | NULL             |    0.000 |
|    8 | system user |           | NULL | Sleep   |     283 | committed 13294122 | NULL             |    0.000 |
|    9 | system user |           | NULL | Sleep   |     135 | committed 13294165 | NULL             |    0.000 |
|   10 | system user |           | NULL | Sleep   |     138 | committed 13294162 | NULL             |    0.000 |
|   11 | system user |           | NULL | Sleep   |     283 | committed 13294121 | NULL             |    0.000 |
|   13 | system user |           | NULL | Sleep   |     283 | committed 13294117 | NULL             |    0.000 |
|   14 | system user |           | NULL | Sleep   |     147 | committed 13294160 | NULL             |    0.000 |
|   15 | system user |           | NULL | Sleep   |     147 | committed 13294156 | NULL             |    0.000 |
|   17 | system user |           | NULL | Sleep   |     283 | committed 13294120 | NULL             |    0.000 |
|   18 | system user |           | NULL | Sleep   |     147 | committed 13294157 | NULL             |    0.000 |
|   19 | system user |           | NULL | Sleep   |     278 | committed 13294129 | NULL             |    0.000 |
|   20 | system user |           | NULL | Sleep   |     259 | committed 13294133 | NULL             |    0.000 |
|   21 | system user |           | NULL | Sleep   |     207 | committed 13294142 | NULL             |    0.000 |
|   22 | system user |           | NULL | Sleep   |     125 | committed 13294170 | NULL             |    0.000 |
|   23 | system user |           | NULL | Sleep   |     138 | committed 13294163 | NULL             |    0.000 |
|   24 | system user |           | NULL | Sleep   |     207 | committed 13294140 | NULL             |    0.000 |
|   25 | system user |           | NULL | Sleep   |     283 | committed 13294118 | NULL             |    0.000 |
|   26 | system user |           | NULL | Sleep   |     261 | committed 13294131 | NULL             |    0.000 |
|   27 | system user |           | NULL | Sleep   |     191 | committed 13294151 | NULL             |    0.000 |
|   28 | system user |           | NULL | Sleep   |     144 | committed 13294161 | NULL             |    0.000 |
|   29 | system user |           | NULL | Sleep   |     207 | committed 13294143 | NULL             |    0.000 |
|   30 | system user |           | NULL | Sleep   |     189 | committed 13294153 | NULL             |    0.000 |
|   32 | system user |           | NULL | Sleep   |     283 | committed 13294125 | NULL             |    0.000 |
|   33 | system user |           | NULL | Sleep   |     271 | committed 13294130 | NULL             |    0.000 |
|   34 | system user |           | NULL | Sleep   |     257 | committed 13294135 | NULL             |    0.000 |
|   35 | system user |           | NULL | Sleep   |     203 | committed 13294150 | NULL             |    0.000 |
|   37 | system user |           | NULL | Sleep   |     283 | committed 13294128 | NULL             |    0.000 |
|   39 | system user |           | NULL | Sleep   |     203 | committed 13294146 | NULL             |    0.000 |
|   40 | system user |           | NULL | Sleep   |     203 | committed 13294148 | NULL             |    0.000 |
|   41 | system user |           | NULL | Sleep   |     207 | committed 13294141 | NULL             |    0.000 |
|   43 | system user |           | NULL | Sleep   |     127 | committed 13294169 | NULL             |    0.000 |
|   44 | system user |           | NULL | Sleep   |     283 | committed 13294126 | NULL             |    0.000 |
|   45 | system user |           | NULL | Sleep   |     283 | committed 13294123 | NULL             |    0.000 |
|   46 | system user |           | NULL | Sleep   |     136 | committed 13294164 | NULL             |    0.000 |
|   49 | system user |           | NULL | Sleep   |     255 | committed 13294137 | NULL             |    0.000 |
|   50 | system user |           | NULL | Sleep   |     205 | committed 13294145 | NULL             |    0.000 |
|   51 | system user |           | NULL | Sleep   |     133 | committed 13294166 | NULL             |    0.000 |
|   52 | system user |           | NULL | Sleep   |     207 | committed 13294139 | NULL             |    0.000 |
|   53 | system user |           | NULL | Sleep   |     128 | committed 13294168 | NULL             |    0.000 |
|   54 | system user |           | NULL | Sleep   |     258 | committed 13294134 | NULL             |    0.000 |
|   55 | system user |           | NULL | Sleep   |     283 | committed 13294124 | NULL             |    0.000 |
|   57 | system user |           | NULL | Sleep   |     283 | committed 13294116 | NULL             |    0.000 |
|   58 | system user |           | NULL | Sleep   |     206 | committed 13294144 | NULL             |    0.000 |
|   59 | system user |           | NULL | Sleep   |     257 | committed 13294136 | NULL             |    0.000 |
|   60 | system user |           | NULL | Sleep   |     203 | committed 13294149 | NULL             |    0.000 |
|   61 | system user |           | NULL | Sleep   |     133 | committed 13294167 | NULL             |    0.000 |
|   62 | system user |           | NULL | Sleep   |     189 | committed 13294152 | NULL             |    0.000 |
|   63 | system user |           | NULL | Sleep   |     283 | committed 13294119 | NULL             |    0.000 |
|   64 | system user |           | NULL | Sleep   |     147 | committed 13294158 | NULL             |    0.000 |
|   65 | system user |           | NULL | Sleep   |     260 | committed 13294132 | NULL             |    0.000 |
|   66 | system user |           | NULL | Sleep   |     161 | committed 13294155 | NULL             |    0.000 |
|   67 | system user |           | NULL | Sleep   |     203 | committed 13294147 | NULL             |    0.000 |
|   68 | system user |           | NULL | Sleep   |     147 | committed 13294159 | NULL             |    0.000 |
| 9660 | root        | localhost | NULL | Query   |       0 | sleeping           | show processlist |    0.000 |
| 9670 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9671 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9672 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9673 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9674 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9675 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9676 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9677 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9678 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9679 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9680 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9681 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9682 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9683 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9684 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9685 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9686 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9687 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9688 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9689 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9690 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9691 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9692 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9693 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9694 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9695 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9696 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9697 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9698 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9699 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9700 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9701 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9702 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9703 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9704 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9705 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9706 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9707 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9708 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9709 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9710 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9711 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9712 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9713 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9714 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9715 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9716 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9717 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9718 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9719 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9720 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9721 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9722 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9723 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9724 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9725 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9726 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9727 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9728 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9729 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9730 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9731 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
| 9732 | system user |           | NULL | Sleep   |       3 | NULL               | NULL             |    0.000 |
+------+-------------+-----------+------+---------+---------+--------------------+------------------+----------+
120 rows in set (0.03 sec)

Another 64 threads? Is that what I said?



 Comments   
Comment by Nirbhay Choubey (Inactive) [ 2014-06-13 ]

Hi danblack!
IMO decreasing the number of slave threads at runtime is not supported at the moment.
Also, I am not sure whether decreasing it at runtime would be safe. However, the 2nd part (where
the new threads are added on top of the existing ones) is certainly a bug. I am currently planning
to fix the 2nd issue. The 1st part (if really needed) can be handled as an FR in a separate
MDEV.
Thank you.

Comment by Daniel Black [ 2014-06-14 ]

nirbhay_c, thanks, and here's my rough patch for the FR.

Since wsrep_applier_closing wasn't used, I removed it.

wsrep_close_applier_threads wasn't used either, so I completed the function. The end_connection call, which calls free_connection in the wsapi, causes the threads to fall out of the wsrep->recv function in the wsrep_replication_process thread (without aborting any transactions).

Comment by Nirbhay Choubey (Inactive) [ 2014-06-15 ]

danblack, it turned out that wsrep_slave_threads can be increased/decreased at runtime.
The related documentation was either incorrect or not updated.

I looked further into the code and found that the previous "change" in applier thread count
was not taken into account while serving a new request (SET @@global.wsrep_slave_threads=N)
to change it. This caused both the above reported anomalies.
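
As a rough illustration of the explanation above, the toy model below replays the sequence from the description (64, then 1, then 64). It is not the server code: the class, its fields, and the `accumulate` switch are hypothetical names, and the assumption that the faulty path overwrote the outstanding change instead of adding to it is an inference from the listings. In the second listing 9 of the original 64 applier Ids are gone and 63 new ones appear, which gives 55 + 63 = 118 appliers.

```python
class ApplierCount:
    """Toy bookkeeping for applier threads; every name here is hypothetical."""

    def __init__(self, threads, accumulate):
        self.target = threads         # current value of wsrep_slave_threads
        self.live = threads           # applier threads actually running
        self.pending = 0              # requested change not yet carried out
        self.accumulate = accumulate  # True: honour the outstanding change

    def set_target(self, n):
        delta = n - self.target
        if self.accumulate:
            self.pending += delta     # add to what is still outstanding
        else:
            self.pending = delta      # overwrite: outstanding exits are lost
        self.target = n
        if self.pending > 0:          # growth is applied immediately
            self.live += self.pending
            self.pending = 0

    def retire(self, k):
        """Up to k appliers notice a pending reduction and exit."""
        k = min(k, -self.pending) if self.pending < 0 else 0
        self.live -= k
        self.pending += k


def replay(accumulate):
    pool = ApplierCount(64, accumulate)
    pool.set_target(1)    # 63 exits requested
    pool.retire(9)        # nine appliers had left by the second listing
    pool.set_target(64)   # request to go back up
    return pool.live


print(replay(accumulate=False))  # outstanding exits forgotten
print(replay(accumulate=True))   # outstanding exits netted against the increase
```

Under these assumptions, the overwriting variant ends with 118 appliers, matching the second listing, while the accumulating variant ends with 64.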

Your patch does look good. But with the current design, the server needs to inform
the galera library first of the decision to exit the applier thread, instead of just killing it.

I have written a blog post to explain this further.
http://nirbhay.in/2014/06/galera-applier-threads-runtime-adjustment/
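
The difference between killing an applier and letting it leave on its own can be pictured with a small, generic thread pool. This is only a loose analogy under stated assumptions: the queue stands in for the stream of write sets, `ApplierPool` and `request_shrink` are hypothetical names, and nothing here reflects the actual wsrep or Galera interfaces. The point it shows is that a worker checks for a shrink request only after it has finished the item in hand, so no work is abandoned halfway.

```python
import queue
import threading


class ApplierPool:
    """Toy pool: a worker leaves only after finishing its current item."""

    def __init__(self, n):
        self.work = queue.Queue()
        self.lock = threading.Lock()
        self.live = n            # workers that have not decided to leave
        self.to_retire = 0       # outstanding shrink requests
        self.applied = []
        self.threads = [threading.Thread(target=self._run) for _ in range(n)]
        for t in self.threads:
            t.start()

    def request_shrink(self, k):
        # Record the request; workers act on it between items.
        with self.lock:
            self.to_retire += k

    def _run(self):
        while True:
            item = self.work.get()
            if item is None:                 # shutdown sentinel
                self.work.task_done()
                return
            with self.lock:
                self.applied.append(item)    # "apply" the item in full
                leave = self.to_retire > 0
                if leave:
                    self.to_retire -= 1
                    self.live -= 1
            self.work.task_done()
            if leave:
                return                       # exit between items

    def shutdown(self):
        for _ in range(self.live):
            self.work.put(None)
        for t in self.threads:
            t.join()


def demo():
    pool = ApplierPool(8)
    pool.request_shrink(3)
    for seqno in range(20):
        pool.work.put(seqno)
    pool.work.join()             # every item has been applied
    remaining = pool.live
    pool.shutdown()
    return remaining, sorted(pool.applied)
```

Calling `demo()` shrinks the pool from 8 to 5 workers while all 20 items are still applied.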

Comment by Nirbhay Choubey (Inactive) [ 2014-06-15 ]

Updated the documentation to state that wsrep_slave_threads is dynamically adjustable.
https://mariadb.com/kb/en/galera-cluster-system-variables/#wsrep_slave_threads

Comment by Nirbhay Choubey (Inactive) [ 2014-06-15 ]

Fix pushed to maria-5.5-galera
http://bazaar.launchpad.net/~maria-captains/maria/maria-5.5-galera/revision/3504

Comment by Daniel Black [ 2014-06-16 ]

Thanks nirbhay_c. I'm off to try to get more info for MDEV-6300 now.
