[MDEV-4723] "State" column of SHOW PROCESSLIST returns wrong values (non-ascii chars) for some states - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Minor
Resolution: Fixed
Affects Version/s: 5.5.31
Fix Version/s: 5.5.34
Component/s: None
Labels:
- processlist
- upstream
Environment:
Debian Squeeze amd64

Description

An example of these abnormal values (i added the "|cat -v" at the end of the command to convert the non-ascii chars to standardized notations to avoid the loss/modification of those while copy/pasting to JIRA) :

# /bin/ksh -c 'i=0; while((i<100));do mysql -e "SHOW FULL PROCESSLIST\G" |perl -e "while(\$nl=<>){if(m/^\s*State:\s+[^A-z\s].*\$/){print \$_.\$nl;}} continue{\$_=\$nl;}"; sleep 0.2; ((i++)); done' |cat -v

   State: ^F

    Info: INSERT IGNORE INTO dfreeirzq6.t43769 (id,date,mail,bgtrfvd,etc) SELECT tbl.id, tbl.date, tbl.mail, tbl.etc,43769 FROM fjtyris.dfreeirzq437 AS tbl WHERE tbl.bgtrfvd=43769 LIMIT 100

   State: M-^XM-#M-^@y:^?

    Info: INSERT IGNORE INTO [...]

   State: M-xYM->M-D9^?

    Info: INSERT IGNORE INTO [...]

   State: M-`M-L.M-U9^?

    Info: INSERT IGNORE INTO [...]

   State: ^HM-Gv4:^?

    Info: INSERT IGNORE INTO [...]

   State: ^F

    Info: INSERT IGNORE INTO [...]

   State: 8i+y:^?

    Info: INSERT IGNORE INTO [...]

   State: @a^LM-u9^?

    Info: INSERT IGNORE INTO [...]

Insert queries in the example has been stripped for security reason but all looks the same but not ran on the same base and table ; these queries are all almost the same.

This does not seems to happen on the other server under MySQL 5.5.30 having the same tables/structures and applications executing the same kind of queries.

I havent found so far anything out of the ordinary on the app executing the queries (it does other queries than these INSERTS using the same function without encountering the same bug.

ps: it seems that the STATE value on INFORMATION_SCHEMA.PROCESSLIST is correct and is "Repair with 3 threads" for the problematic recuring INSERT queries so the bug is only on the output of the SHOW PROCESSLIST command.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

varlist.txt
2013-06-29 02:24
12 kB
Jean Weisbuch

Activity

Ascending order - Click to sort in descending order

Elena Stepanova added a comment - 2013-06-29 00:00

Hi,

Is the table MyISAM?
Could you please attach your cnf file or paste SHOW VARIABLES?

Thanks.

Elena Stepanova added a comment - 2013-06-29 00:00 Hi, Is the table MyISAM? Could you please attach your cnf file or paste SHOW VARIABLES? Thanks.

Jean Weisbuch added a comment - 2013-06-29 02:24

The tables are in MyISAM and i attached the vairables list.

Notice especially these values :
myisam_repair_threads = 8
key_cache_segments = 4

Jean Weisbuch added a comment - 2013-06-29 02:24 The tables are in MyISAM and i attached the vairables list. Notice especially these values : myisam_repair_threads = 8 key_cache_segments = 4

Elena Stepanova added a comment - 2013-07-03 21:33

I can easily trigger it with a stress test. By far the most frequent value that the field gets populated with, at least in my env, is CR (13). Here are other examples in decimal codes that I was getting on the client side while executing SHOW PROCESSLIST.

63 63 1 16 63 127
13 0 0 63 0 0 0 0 38 0 0 0 0 0 0 0 0 0 63 0 0
etc.

Unfortunately I wasn't able to produce a syncpoint-based test case in any reasonable time. I think the problem is not so much the synchronization itself (it seems fairly simple – let the field value be generated, then wait till the query it belongs to finishes, proceed), but the fact that the problem is only visible when the old data is actually corrupted, which I guess is sporadic and depends on the memory layout, and I couldn't figure how to make it reliable.

So instead of a sync point I added a check and assertion in the SHOW PROCESSLIST code where it gets corrupted (in mysqld_list_processes(THD *thd,const char *user, bool verbose)):

=== modified file 'sql/sql_show.cc'
— sql/sql_show.cc 2013-05-11 17:31:50 +0000
+++ sql/sql_show.cc 2013-07-02 23:03:21 +0000
@@ -2282,6 +2282,12 @@
thd_info->query_string.charset());
if (!thd->variables.old_mode)
protocol->store(thd_info->progress, 3, &store_buffer);
+
+ if ( thd_info && thd_info->state_info && (strlen(thd_info->state_info) > 0) && !my_isalpha(system_charset_info,(thd_info->state_info)[0]) )
+

{ + fprintf(stderr,"MDEV-4723: state: '%s' starts with: %i\n", thd_info->state_info, (thd_info->state_info)[0]); + DBUG_ASSERT(0); + }

if (protocol->write())
break; /* purecov: inspected */
}

(sorry if it's not elegant, but I'm not a practicing developer, my code is supposed to be ugly to make the real coders' code look better).
The idea was to make sure that if the state_info exists and isn't empty, then it starts with a letter. Hopefully I did it right.

In fact, the value gets broken any time after
2223: thd_info->state_info= thread_state_info(tmp);
2224: if (mysys_var)
2225: mysql_mutex_unlock(&mysys_var->mutex);
2226:
2227: /* Lock THD mutex that protects its data when looking at it. */

I just chose the downmost point, because once broken, it remains broken, so the further down the higher probability.

With that and a primitive RQG grammar below, the assertion gets hit quite quickly, at least on my machine and on perro. Obviously, instead of the assertion you can do something smart in there.

RQG grammar:
$ cat mdev4723.yy

thread1_init:
SET myisam_repair_threads = 8 ; SET GLOBAL max_allowed_packet = 1024*1024*1024
; DROP TABLE IF EXISTS `t` ; CREATE TABLE `t1` ( `p` INT, KEY (`p`) ) ENGINE=MyISAM ;

thread2:
SHOW PROCESSLIST;

query:
INSERT INTO `t1` SELECT `p` FROM `t1` LIMIT 100 ;

End of RQG grammar

runall command line:

perl ./runall-new.pl --grammar=mdev4723.yy --skip-gendata --threads=4 --queries=100M --duration=600 --basedir=<basedir> --vardir=<vardir>

gentest command line:

perl ./gentest.pl --grammar=mdev4723.yy --threads=4 --queries=100M --duration=600 --dsn='dbi:mysql:host=127.0.0.1:port=3306:user=root:database=test'

You can also try to increase the number of threads.

No specific server options are needed, server can be start with all defaults.
I built server with BUILD/compile-pentium-valgrind-max-no-ndb, ran tests on
bzr version-info
revision-id: monty@askmonty.org-20130611104943-se0nzq67eexlhd9z
revno: 3793
branch-nick: 5.5

If it's not reproducible in your environment, I can set it up on perro, I already checked it fails there.

Elena Stepanova added a comment - 2013-07-03 21:33 I can easily trigger it with a stress test. By far the most frequent value that the field gets populated with, at least in my env, is CR (13). Here are other examples in decimal codes that I was getting on the client side while executing SHOW PROCESSLIST. 63 63 1 16 63 127 13 0 0 63 0 0 0 0 38 0 0 0 0 0 0 0 0 0 63 0 0 etc. Unfortunately I wasn't able to produce a syncpoint-based test case in any reasonable time. I think the problem is not so much the synchronization itself (it seems fairly simple – let the field value be generated, then wait till the query it belongs to finishes, proceed), but the fact that the problem is only visible when the old data is actually corrupted, which I guess is sporadic and depends on the memory layout, and I couldn't figure how to make it reliable. So instead of a sync point I added a check and assertion in the SHOW PROCESSLIST code where it gets corrupted (in mysqld_list_processes(THD *thd,const char *user, bool verbose)): === modified file 'sql/sql_show.cc' — sql/sql_show.cc 2013-05-11 17:31:50 +0000 +++ sql/sql_show.cc 2013-07-02 23:03:21 +0000 @@ -2282,6 +2282,12 @@ thd_info->query_string.charset()); if (!thd->variables.old_mode) protocol->store(thd_info->progress, 3, &store_buffer); + + if ( thd_info && thd_info->state_info && (strlen(thd_info->state_info) > 0) && !my_isalpha(system_charset_info,(thd_info->state_info) [0] ) ) + { + fprintf(stderr,"MDEV-4723: state: '%s' starts with: %i\n", thd_info->state_info, (thd_info->state_info)[0]); + DBUG_ASSERT(0); + } if (protocol->write()) break; /* purecov: inspected */ } (sorry if it's not elegant, but I'm not a practicing developer, my code is supposed to be ugly to make the real coders' code look better). The idea was to make sure that if the state_info exists and isn't empty, then it starts with a letter. Hopefully I did it right. In fact, the value gets broken any time after 2223: thd_info->state_info= thread_state_info(tmp); 2224: if (mysys_var) 2225: mysql_mutex_unlock(&mysys_var->mutex); 2226: 2227: /* Lock THD mutex that protects its data when looking at it. */ I just chose the downmost point, because once broken, it remains broken, so the further down the higher probability. With that and a primitive RQG grammar below, the assertion gets hit quite quickly, at least on my machine and on perro. Obviously, instead of the assertion you can do something smart in there. RQG grammar: $ cat mdev4723.yy thread1_init: SET myisam_repair_threads = 8 ; SET GLOBAL max_allowed_packet = 1024*1024*1024 ; DROP TABLE IF EXISTS `t` ; CREATE TABLE `t1` ( `p` INT, KEY (`p`) ) ENGINE=MyISAM ; thread2: SHOW PROCESSLIST; query: INSERT INTO `t1` SELECT `p` FROM `t1` LIMIT 100 ; End of RQG grammar runall command line: perl ./runall-new.pl --grammar=mdev4723.yy --skip-gendata --threads=4 --queries=100M --duration=600 --basedir=<basedir> --vardir=<vardir> gentest command line: perl ./gentest.pl --grammar=mdev4723.yy --threads=4 --queries=100M --duration=600 --dsn='dbi:mysql:host=127.0.0.1:port=3306:user=root:database=test' You can also try to increase the number of threads. No specific server options are needed, server can be start with all defaults. I built server with BUILD/compile-pentium-valgrind-max-no-ndb, ran tests on bzr version-info revision-id: monty@askmonty.org-20130611104943-se0nzq67eexlhd9z revno: 3793 branch-nick: 5.5 If it's not reproducible in your environment, I can set it up on perro, I already checked it fails there.

Elena Stepanova added a comment - 2013-11-11 18:10

I ran tests on 5.5-serg revision-id: sergii@pisem.net-20131111090500-ur3361v8k40y6v00 / revno: 3960.
The failure triggered by the injected assertion from the provided test case seems to be gone; but the field still happens to contain garbage, although more rarely. I modified the injection this way:

=== modified file 'sql/sql_show.cc'
— sql/sql_show.cc 2013-11-11 08:31:17 +0000
+++ sql/sql_show.cc 2013-11-11 15:42:20 +0000
@@ -2282,6 +2282,13 @@
thd_info->query_string.charset());
if (!thd->variables.old_mode)
protocol->store(thd_info->progress, 3, &store_buffer);
+
+ if ( thd_info && thd_info->state_info && (strlen(thd_info->state_info) > 0) )

{ + fprintf(stderr,"MDEV-4723: state: '%s' starts with: %i\n", thd_info->state_info, (thd_info->state_info)[0]); + DBUG_ASSERT(my_isalpha(system_charset_info,(thd_info->state_info)[0])); + DBUG_ASSERT( (thd_info->state_info)[0] >= 0 ); + }

+
if (protocol->write())
break; /* purecov: inspected */
}

Now it fails on the 2nd DBUG_ASSERT (while on a tree before the fix the first one fails).
The grammar is the same.

I've set up the test on perro under mariadb.

log in as usual;
cd mdev4723
. ./run

It starts server on port 4723 and runs the RQG test. The basedir is ./5.5-serg (symlink). The logs go to ./vardir.
The patch above is already applied there.
Logs from the last failure are in ./vardir (you can see the garbage value printed before the assertion failure in vardir/mysql.err

Elena Stepanova added a comment - 2013-11-11 18:10 I ran tests on 5.5-serg revision-id: sergii@pisem.net-20131111090500-ur3361v8k40y6v00 / revno: 3960. The failure triggered by the injected assertion from the provided test case seems to be gone; but the field still happens to contain garbage, although more rarely. I modified the injection this way: === modified file 'sql/sql_show.cc' — sql/sql_show.cc 2013-11-11 08:31:17 +0000 +++ sql/sql_show.cc 2013-11-11 15:42:20 +0000 @@ -2282,6 +2282,13 @@ thd_info->query_string.charset()); if (!thd->variables.old_mode) protocol->store(thd_info->progress, 3, &store_buffer); + + if ( thd_info && thd_info->state_info && (strlen(thd_info->state_info) > 0) ) { + fprintf(stderr,"MDEV-4723: state: '%s' starts with: %i\n", thd_info->state_info, (thd_info->state_info)[0]); + DBUG_ASSERT(my_isalpha(system_charset_info,(thd_info->state_info)[0])); + DBUG_ASSERT( (thd_info->state_info)[0] >= 0 ); + } + if (protocol->write()) break; /* purecov: inspected */ } Now it fails on the 2nd DBUG_ASSERT (while on a tree before the fix the first one fails). The grammar is the same. I've set up the test on perro under mariadb. log in as usual; cd mdev4723 . ./run It starts server on port 4723 and runs the RQG test. The basedir is ./5.5-serg (symlink). The logs go to ./vardir. The patch above is already applied there. Logs from the last failure are in ./vardir (you can see the garbage value printed before the assertion failure in vardir/mysql.err

MariaDB Server

"State" column of SHOW PROCESSLIST returns wrong values (non-ascii chars) for some states

Details

Description

Attachments

Attachments

Activity

People

Dates

Git Integration