[MDEV-16216] slave_max_allowed_packet is not big enough when binlog is row format Created: 2018-05-18 Updated: 2018-08-13 Resolved: 2018-08-13
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Critical |
| Reporter: | zhang jian | Assignee: | Sergei Golubchik |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | need_feedback, replication |
| Issue Links: | |
| Sprint: | 10.4.0-1 |

| Description |
The global variables max_allowed_packet and slave_max_allowed_packet limit, respectively, the size of packets from a client and the size of packets received from the master. The maximum value of both variables is 1GB. The problem is that when the binlog is in row format, the data size of an update statement is roughly doubled on the slave, because the row event carries both the before and after image of each updated row. So if max_allowed_packet is the same as slave_max_allowed_packet, the slave will get error 1594. Here is an example; the binlog is in row format.
First create a table and set the global variables both on the master and the slave.
In this example, we update a row and need to send 10000000 bytes from the client to the master, but the master needs to send about 20000000 bytes to the slave. Since 10000000 < max_allowed_packet and slave_max_allowed_packet < 20000000, the slave IO thread will exit with an error.
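The concrete setup was not preserved in this export. A hedged reconstruction, using the table and column names (t_rf_compact, f_text, f_id) that appear in the statement quoted later in the comments, could look like this:

```sql
-- Hypothetical reconstruction of the lost example. 16000000 is chosen so
-- that 10000000 < max_allowed_packet and slave_max_allowed_packet < 20000000,
-- matching the description above.
SET GLOBAL max_allowed_packet = 16000000;
SET GLOBAL slave_max_allowed_packet = 16000000;

CREATE TABLE t_rf_compact (f_id INT PRIMARY KEY, f_text LONGTEXT);
INSERT INTO t_rf_compact VALUES (1, '');

-- ~10000000 bytes travel from client to master; the row-based UPDATE
-- event carries both the before and after image of the row, so roughly
-- twice that must go to the slave, exceeding slave_max_allowed_packet.
UPDATE t_rf_compact SET f_text = REPEAT('d', 10000000) WHERE f_id = 1;
```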
Fix:
| Comments |
| Comment by Sergei Golubchik [ 2018-05-18 ] |

In this example you update a row and send strlen("update t_rf_compact set f_text=repeat("d", 10000000) where f_id=1;")=66 bytes from the client to the master (a bit more, because of the protocol overhead). The point is: no matter how large slave_max_allowed_packet is, you can still generate a larger row event with just a few bytes sent from the client to the master. The shortest is something like the following, and it can generate an arbitrarily large event. This way, even if every individual repeat is below the limit, the row packet will be many times above it. Even for insert, not only for update. |
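The exact statement Sergei had in mind was not preserved in this export. A minimal sketch of the idea, assuming the t_rf_compact table used earlier, is an update that doubles the column in place:

```sql
-- Hypothetical sketch (the original example was lost): a ~60-byte
-- statement whose row event grows without bound when repeated.
UPDATE t_rf_compact SET f_text = CONCAT(f_text, f_text) WHERE f_id = 1;
-- Run n times, the after image is 2^n times the starting size, so the
-- row-based event (before image + after image) can exceed any fixed
-- slave_max_allowed_packet while each statement stays tiny on the wire.
```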
| Comment by zhang jian [ 2018-05-19 ] |

Yes, you are right. It would be less of a problem if slave_max_allowed_packet were far bigger than max_allowed_packet in a production environment, but that cannot solve it entirely. I just wonder if we can remove the limit of slave_max_allowed_packet by cutting a big event into pieces. Because as users (for example cloud DB users) see it, they can execute a statement on the master but it will fail on the slave; that makes no sense to them :) |
| Comment by Sergei Golubchik [ 2018-05-19 ] |

I don't know... I suspect slave_max_allowed_packet was added for a good reason, to limit the packet size, and you suggest introducing a loophole to bypass it, basically, and send larger packets. Maybe the fix should be different — you say "users <...> can execute a statement in master but will fail in the slave, that's make no sense to them", so perhaps the statement should fail on the master? If the generated row event is going to be larger than slave_max_allowed_packet, the statement is aborted with an error. Do you think your users would prefer that instead of failed replication? |
| Comment by zhang jian [ 2018-05-19 ] |

I think making the statement fail on the master is better than failing in the slave's IO thread. But sometimes users do not even know their master has a slave. The slave in our environment is just a backup and is transparent to users, so I guess they would prefer a failed replication... (we would prefer a failed master statement :) |
| Comment by Andrei Elkin [ 2018-05-21 ] |

ZhangJian> I think make the statement fail on the master is better than fail on slave's IO thread

True, as the sooner the better. But considering what Sergei has pointed to, I think we can close this one as a duplicate. Objections, Sergei, Zhang Jian? |
| Comment by zhang jian [ 2018-05-22 ] |

Yes, thanks to Andrei's comment I now understand why we have the variable slave_max_allowed_packet. Just one last question: can we make the maximum value of slave_max_allowed_packet bigger (maybe twice as big or more) than the maximum value of max_allowed_packet? It seems safer. |
| Comment by zhang jian [ 2018-05-22 ] |

By the way, here is the same issue, and the suggested fix there is to ignore max_allowed_packet completely: https://bugs.mysql.com/bug.php?id=77817 |
| Comment by Sergei Golubchik [ 2018-05-24 ] |

No, it's not a duplicate of that issue. In this issue there is no base64. It's different. |
| Comment by Andrei Elkin [ 2018-05-24 ] |

True, my bad. It's similar to that one's fragmentation idea, which apparently could be exploited. |
| Comment by Sergei Golubchik [ 2018-05-25 ] |

Well, as I wrote above, slave_max_allowed_packet is a safety measure to limit the maximum packet size and, thus, the memory consumption on the slave. It's currently set to 1GB. Perhaps we can increase it, but I doubt it would be desirable to remove the limit completely. Getting an out-of-memory error on the slave or having the slave killed by the OOM killer is not really better than rejecting a huge event up front. |
| Comment by Andrei Elkin [ 2018-05-29 ] |

After thinking the issue over I have the following outline. Just for the sake of clarity, let's consider M.A_max == S.A_max == A_max, where A_max stands for two control items, both dealing with memory, including parts of a row-based event.

Technically though, in order to let through a maximum-size row-based event, the slave receiver must be prepared to accept up to 2 * M.A_max. Notice that in such an A_max-equal replication configuration slave_max_allowed_packet is effectively redundant.

Let's turn from the equal case to a general A_max distribution.

[*] S.A_max >= M.A_max satisfies replication from the event-applying perspective.

From the above we learn that for successful acceptance by the slave receiver the following must hold:

slave_io_thread@@session.max_allowed_packet == slave_max_allowed_packet >= 2 * master@@global.max_allowed_packet

Hence from [*] the slave receive buffer max size can be set to

[**] slave_io_thread@@session.max_allowed_packet := 2 * slave@@global.max_allowed_packet

that is, to a value computed from the slave's global max_allowed_packet. The general case also optimizes slave_io_thread@@session.max_allowed_packet away.

Now, practically, as the current ticket resolution we would deprecate the "manual" control of slave_max_allowed_packet. We will also make the slave receiver thread verify [*] at its start. We won't be tackling the case when M.A_max was/is changing. Separately, as max_allowed_packet itself is under a 1GB limit, row-based events over the limit could only be transmitted with a fragmentation layer. |
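Rule [**] can be written out against the real MariaDB variables. Note the assignment is hypothetical: the server performs no such automatic derivation today; this only illustrates the proposed arithmetic.

```sql
-- Illustration only: the proposed receive-buffer sizing from rule [**].
-- No server logic does this; the user variable is a sketch of the math.
SET @proposed_io_thread_limit = 2 * @@global.max_allowed_packet;
SELECT @proposed_io_thread_limit;
```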
| Comment by Andrei Elkin [ 2018-05-29 ] |

If there is agreement on the slave_io_thread@@session.max_allowed_packet := 2 * slave@@global.max_allowed_packet fix (and the related changes), what version should we target? |
| Comment by Sergei Golubchik [ 2018-05-30 ] |

Again, as I wrote above and as bug#77817 says, 2x doesn't help; you can do many times more than max_allowed_packet. But note that by default slave_max_allowed_packet is not max_allowed_packet.

That is, slave_max_allowed_packet is by default set to the maximum possible value for a packet size. We can increase that, but I don't see a reason to decrease it. |
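The contrast Sergei describes can be checked directly on a running server. The values come from the thread itself (the 1GB ceiling mentioned above); the exact max_allowed_packet default depends on the server version, so it is not asserted here:

```sql
-- Inspect both limits; slave_max_allowed_packet defaults to its
-- maximum, 1073741824 bytes (1GB), per the discussion above, while
-- max_allowed_packet defaults to a much smaller value.
SELECT @@global.max_allowed_packet, @@global.slave_max_allowed_packet;
```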
| Comment by Andrei Elkin [ 2018-05-30 ] |

> bug#77817 says 2x doesn't help you can do many times more than max_allowed_packet

You mean 2 * master@@global.max_allowed_packet > 1GB? This can only be addressed with a fragmentation layer, which I am about to 'separate' from this ticket.

As to slave_max_allowed_packet being set by default to the maximum possible value, I should have checked the default value, thanks! Indeed, what remains is

[***] slave@slave_io_thread@@session.max_allowed_packet >= 2 * slave@@global.max_allowed_packet

which would be verified at the time either @@global variable changes its value. |
| Comment by Andrei Elkin [ 2018-06-01 ] |

After some more consideration I have committed myself to implementing the following two steps that cover the two existing issues.

Fragmentation and the transport_event creation is to be separated out. serg, anybody: could you please rate the idea? Thanks. |
| Comment by Sergei Golubchik [ 2018-06-05 ] |

There is no need for automatic control of SLAVE_MAX_ALLOWED_PACKET, because it is already set to its maximal possible value. You cannot increase it further, automatically or not. You can decrease it, but that won't help to avoid these "packet too large" errors, will it?

And, again, SLAVE_MAX_ALLOWED_PACKET is a safety measure, so no fragmentation should be able to produce packets larger than that. The only thing we can do is to increase SLAVE_MAX_ALLOWED_PACKET. 1GB is too little? Let's try whether 2GB or 4GB will work. |
| Comment by Sergei Golubchik [ 2018-07-09 ] |

So, shall we try a larger SLAVE_MAX_ALLOWED_PACKET? Like 2GB? An alternative would be to close this issue, because it's not a bug and it cannot be fixed. |