[MDEV-6459] max_relay_log_size and sql_slave_skip_counter misbehave on PPC64 Created: 2014-07-18  Updated: 2014-09-11  Resolved: 2014-09-11

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.13
Fix Version/s: 10.0.14

Type: Bug Priority: Major
Reporter: Sergey Vojtovich Assignee: Sergei Golubchik
Resolution: Fixed Votes: 0
Labels: None
Environment:

PPC64 RHEL 6.5


Issue Links:
PartOf
is part of MDEV-6478 MariaDB on Power8 Closed

 Description   

The following tests fail on PPC64 due to misbehaving variables: multi_source.skip_counter, rpl.rpl_auto_increment, rpl.rpl_mdev6020, rpl.rpl_skip_replication, rpl.rpl_stm_max_relay_size, sys_vars.max_relay_log_size_basic, sys_vars.sql_slave_skip_counter_basic.

BB link: http://buildbot.askmonty.org/buildbot/builders/bintar-rhel6-p8/builds/211/steps/test/logs/stdio

Most failures look like the following:

@@ -21,17 +21,17 @@
 set global sql_slave_skip_counter = 2;
 select @@global.sql_slave_skip_counter;
 @@global.sql_slave_skip_counter
-2
+8589934592
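
A note on the unexpected value in the diff above: it is not random. As a small sketch (plain arithmetic, not taken from the server code), splitting it into 32-bit halves shows that the requested value 2 ended up in the upper half of a 64-bit word, with the lower half left at zero:

```python
# Decompose the value reported by the failing test into 32-bit halves.
observed = 8589934592  # value printed by the failing test

high, low = divmod(observed, 2**32)
print(hex(observed))  # 0x200000000
print(high, low)      # 2 0
```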



 Comments   
Comment by Sergey Vojtovich [ 2014-07-18 ]

Kristian, please review fix for this bug.

The patch has been pushed to 10.0.13:

revno: 4293
revision-id: svoj@mariadb.org-20140718154521-mwoz6ezimga0axcj
parent: svoj@mariadb.org-20140718111625-uch1ssbh8kf6i4ib
committer: Sergey Vojtovich <svoj@mariadb.org>
branch nick: 10.0
timestamp: Fri 2014-07-18 19:45:21 +0400
message:
  MDEV-6459 - max_relay_log_size and sql_slave_skip_counter
              misbehave on PPC64
  
  There was a mix of ulong and uint casts/variables which caused
  incorrect value to be passed to/retrieved from max_relay_log_size
  and sql_slave_skip_counter.
  
  This mix failed to work on big-endian PPC64 where sizeof(int)= 4,
  sizeof(long)= 8. E.g. session_var(thd, uint)= 1 will in fact store
  0x100000000.
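
The effect described in the commit message can be reproduced on any host with a minimal simulation. This is only a sketch: the helper name store_as_uint_read_as_ulong is hypothetical, and Python's struct module stands in for the C++ casts, assuming a 4-byte unsigned int and an 8-byte unsigned long as stated above.

```python
import struct


def store_as_uint_read_as_ulong(value, byte_order):
    """Write `value` as a 4-byte unsigned int at the start of an 8-byte
    slot, then read the whole slot back as an 8-byte unsigned integer.

    byte_order is ">" for big-endian or "<" for little-endian.
    """
    slot = bytearray(8)  # zero-initialised 8-byte variable
    struct.pack_into(byte_order + "I", slot, 0, value)  # 4-byte store at offset 0
    return struct.unpack(byte_order + "Q", slot)[0]     # 8-byte load


big = store_as_uint_read_as_ulong(1, ">")
little = store_as_uint_read_as_ulong(1, "<")
print(hex(big))     # 0x100000000
print(hex(little))  # 0x1
```

With a little-endian layout the 4-byte store lands in the low-order bytes, so the same mix of types reads back the intended value. That would explain why the mismatch only shows up on big-endian targets.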

Comment by Sergei Golubchik [ 2014-07-30 ]

Try to avoid long in sysvars; prefer int or longlong instead. They are much more stable across architectures: int is typically 32-bit and longlong is 64-bit. But long can be either, so the variable gets different limits on different platforms, which makes documenting the variable (and writing test cases) rather complicated.

Comment by Sergey Vojtovich [ 2014-07-31 ]

The max value for sql_slave_skip_counter is UINT_MAX, and for max_relay_log_size it is 1024L*1024*1024. That is, both fit in a 32-bit unsigned integer.
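
A quick arithmetic check of that claim, as a sketch that assumes unsigned int is 32 bits wide (so UINT_MAX is 2**32 - 1); the Python variable names are illustrative only:

```python
UINT32_MAX = 2**32 - 1  # UINT_MAX, assuming a 32-bit unsigned int

max_skip_counter = UINT32_MAX             # limit for sql_slave_skip_counter
max_relay_log_size = 1024 * 1024 * 1024   # 1 GiB, i.e. 2**30

print(max_skip_counter <= UINT32_MAX)     # True
print(max_relay_log_size <= UINT32_MAX)   # True
print(max_relay_log_size.bit_length())    # 31
```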

Not sure if there was a good reason to choose ulong and not the other type. Since Kristian created this code, I'm better handing off this recommendation to him.

Comment by Kristian Nielsen [ 2014-08-11 ]

> Not sure if there was a good reason to choose ulong and not the other
> type. Since Kristian created this code, I'm better handing off this
> recommendation to him.

I don't think I could have created the code for max_relay_log_size and
sql_slave_skip_counter? Those have existed since far before I started working
on replication, AFAIK?

Generally, I would agree with Serg that it's best to avoid using ulong. Using
ulonglong seems fine here.

I have noticed that binlog sizes and offsets have a tendency to use 32-bit
values around the replication code (which is generally wrong for file
offsets). I suspect that there are other bugs related to this lingering
around.

Using ulonglong by default when adding or otherwise changing code seems a
reasonable approach to me, where there are no performance concerns that would
suggest using a 32-bit type (and that does not seem to be the case here).

  • Kristian.