[MDEV-12084] Test timeouts on CentOS - interaction between server and client stops Created: 2017-02-19  Updated: 2020-06-04

Status: Open
Project: MariaDB Server
Component/s: Scripts & Clients
Affects Version/s: 10.2
Fix Version/s: 10.2

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Georg Richter
Resolution: Unresolved Votes: 0
Labels: 10.2-ga

Attachments: File test2.sh    
Issue Links:
Duplicate
duplicates MDEV-11805 main.alter_table_trans failed in buil... Closed
Relates
relates to MDEV-12174 sys_vars.innodb_status_output_basic f... Closed
relates to MDEV-12788 Server hangs or Assertion `0' fails i... Closed
relates to MDEV-22672 10.5 bintar builders Closed
relates to MDEV-12247 Server does not respond for simple qu... Closed

 Description   

We are getting random (rare) timeouts in buildbot on kvm-bintar-centos5-amd64. Here is an example:

http://buildbot.askmonty.org/buildbot/builders/kvm-bintar-centos5-amd64/builds/4623/steps/test/logs/stdio

parts.partition_float_myisam             w2 [ fail ]  timeout after 900 seconds
        Test ended at 2017-02-08 22:14:47
 
Test case timeout after 900 seconds
 
== /usr/local/mariadb-10.2.4-linux-x86_64/mysql-test/var/2/log/partition_float_myisam.log == 
-2.2250738585072014e-208
0
1.5
1234.567
2.2250738585072016e208
select * from t2 where a=1234.567;
a
1234.567
delete from t2 where a=1234.567;
select * from t2;
a
-2.2250738585072016e208
-1.5
-1
-2.2250738585072014e-208
0
1.5
2.2250738585072016e208
delete from t2;
16384*3 inserts;

Normally the test takes ~20 seconds there.

The problem is reproducible by running the test on the VM with very high --repeat value – it happens, very roughly, once in 300-400 runs.

The main part of the test is running 16K of single-row INSERTs into a MyISAM table. There is no concurrency, INSERTs are executed one by one. The problem occurs somewhere during this process, in random places, not at a specific row. When it happens, the server is still reachable, show process shows the connection being idle:

MariaDB [(none)]> show processlist;
+----+------+-----------------+------+---------+------+-------+------------------+----------+
| Id | User | Host            | db   | Command | Time | State | Info             | Progress |
+----+------+-----------------+------+---------+------+-------+------------------+----------+
| 11 | root | localhost       | test | Sleep   |  188 |       | NULL             |    0.000 |
| 12 | root | localhost:39097 | NULL | Query   |    0 | init  | show processlist |    0.000 |
+----+------+-----------------+------+---------+------+-------+------------------+----------+
2 rows in set (0.00 sec)

(connection 11 is the one which has been performing updates). The table is healthy and responds to SELECTs. There is no disk space problem.

When it stops, it stops seemingly forever. However, running gdb on the server sometimes "wakes it up", makes the flow to be resumed.

Example of a stack trace from the server when it's in this state:

Thread 7 (Thread 0x40a5f940 (LWP 31197)):
#0  0x00000036df40ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000de58b9 in inline_mysql_cond_timedwait (arg=<value optimized out>) at /home/buildbot/buildbot/build/include/mysql/psi/mysql_thread.h:1175
#2  timer_handler (arg=<value optimized out>) at /home/buildbot/buildbot/build/mysys/thr_timer.c:292
#3  0x0000000000ce2d19 in pfs_spawn_thread (arg=<value optimized out>) at /home/buildbot/buildbot/build/storage/perfschema/pfs.cc:1862
#4  0x00000036df406367 in start_thread () from /lib64/libpthread.so.0
#5  0x00000036decd2f7d in clone () from /lib64/libc.so.6
 
Thread 6 (Thread 0x4149f940 (LWP 31198)):
#0  0x00000036df40ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000d217b7 in inline_mysql_cond_timedwait (control=0x16e7020, sleep_time=<value optimized out>) at /home/buildbot/buildbot/build/include/mysql/psi/mysql_thread.h:1175
#2  my_service_thread_sleep (control=0x16e7020, sleep_time=<value optimized out>) at /home/buildbot/buildbot/build/storage/maria/ma_servicethread.c:115
#3  0x0000000000d1acd4 in ma_checkpoint_background (arg=<value optimized out>) at /home/buildbot/buildbot/build/storage/maria/ma_checkpoint.c:708
#4  0x0000000000ce2d19 in pfs_spawn_thread (arg=<value optimized out>) at /home/buildbot/buildbot/build/storage/perfschema/pfs.cc:1862
#5  0x00000036df406367 in start_thread () from /lib64/libpthread.so.0
#6  0x00000036decd2f7d in clone () from /lib64/libc.so.6
 
Thread 5 (Thread 0x41f80540 (LWP 31199)):
#0  0x00000036df40e1e8 in do_sigwait () from /lib64/libpthread.so.0
#1  0x00000036df40e28d in sigwait () from /lib64/libpthread.so.0
#2  0x00000000005461eb in signal_hand (arg=<value optimized out>) at /home/buildbot/buildbot/build/sql/mysqld.cc:3560
#3  0x0000000000ce2d19 in pfs_spawn_thread (arg=<value optimized out>) at /home/buildbot/buildbot/build/storage/perfschema/pfs.cc:1862
#4  0x00000036df406367 in start_thread () from /lib64/libpthread.so.0
#5  0x00000036decd2f7d in clone () from /lib64/libc.so.6
 
Thread 4 (Thread 0x41fca540 (LWP 31200)):
#0  0x00000036df40a899 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000056aa90 in inline_mysql_cond_wait (arg=<value optimized out>) at /home/buildbot/buildbot/build/include/mysql/psi/mysql_thread.h:1138
#2  handle_slave_background (arg=<value optimized out>) at /home/buildbot/buildbot/build/sql/slave.cc:339
#3  0x0000000000ce2d19 in pfs_spawn_thread (arg=<value optimized out>) at /home/buildbot/buildbot/build/storage/perfschema/pfs.cc:1862
#4  0x00000036df406367 in start_thread () from /lib64/libpthread.so.0
#5  0x00000036decd2f7d in clone () from /lib64/libc.so.6
 
Thread 3 (Thread 0x42014540 (LWP 31203)):
#0  0x00000036decca436 in poll () from /lib64/libc.so.6
#1  0x0000000000e7cf73 in vio_io_wait (vio=0x2aaac242b808, event=<value optimized out>, timeout=28800000) at /home/buildbot/buildbot/build/vio/viosocket.c:945
#2  0x0000000000e7d2f3 in vio_socket_io_wait (vio=0x420135e0, event=VIO_IO_EVENT_WRITE) at /home/buildbot/buildbot/build/vio/viosocket.c:108
#3  0x0000000000e7d57e in vio_read (vio=0x2aaac242b808, buf=0x2aaaacc34008 "\a", size=4) at /home/buildbot/buildbot/build/vio/viosocket.c:184
#4  0x000000000054ed89 in my_real_read (net=0x2aaaacc14248, complen=0x42013780, header=1 '\001') at /home/buildbot/buildbot/build/sql/net_serv.cc:888
#5  0x000000000054f19d in my_net_read_packet_reallen (net=0x420135e0, read_from_server=1 '\001', reallen=0x420137c8) at /home/buildbot/buildbot/build/sql/net_serv.cc:1158
#6  0x000000000054f29d in my_net_read_packet (net=0x420135e0, read_from_server=1 '\001') at /home/buildbot/buildbot/build/sql/net_serv.cc:1142
#7  0x00000000005f4988 in do_command (thd=0x2aaaacc14008) at /home/buildbot/buildbot/build/sql/sql_parse.cc:1242
#8  0x00000000006d1fd5 in do_handle_one_connection (connect=0x1) at /home/buildbot/buildbot/build/sql/sql_connect.cc:1354
#9  0x00000000006d219f in handle_one_connection (arg=0x2aaac245a508) at /home/buildbot/buildbot/build/sql/sql_connect.cc:1260
#10 0x0000000000ce2d19 in pfs_spawn_thread (arg=<value optimized out>) at /home/buildbot/buildbot/build/storage/perfschema/pfs.cc:1862
#11 0x00000036df406367 in start_thread () from /lib64/libpthread.so.0
#12 0x00000036decd2f7d in clone () from /lib64/libc.so.6
 
Thread 2 (Thread 0x4205e540 (LWP 31248)):
#0  0x00000036df40ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000054da53 in one_thread_per_connection_end (thd=0x2aaaad414008, put_in_cache=<value optimized out>)
    at /home/buildbot/buildbot/build/include/mysql/psi/mysql_thread.h:1175
#2  0x00000000006d1f38 in do_handle_one_connection (connect=<value optimized out>) at /home/buildbot/buildbot/build/sql/sql_connect.cc:1373
#3  0x00000000006d219f in handle_one_connection (arg=0x2aaac245a8c8) at /home/buildbot/buildbot/build/sql/sql_connect.cc:1260
#4  0x0000000000ce2d19 in pfs_spawn_thread (arg=<value optimized out>) at /home/buildbot/buildbot/build/storage/perfschema/pfs.cc:1862
#5  0x00000036df406367 in start_thread () from /lib64/libpthread.so.0
#6  0x00000036decd2f7d in clone () from /lib64/libc.so.6
 
Thread 1 (Thread 0x2aaac1ea2730 (LWP 31196)):
#0  0x00000036decca436 in poll () from /lib64/libc.so.6
#1  0x00000000005451e3 in handle_connections_sockets () at /home/buildbot/buildbot/build/sql/mysqld.cc:6626
#2  0x000000000054bc6d in mysqld_main (argc=127, argv=0x2aaac2447a58) at /home/buildbot/buildbot/build/sql/mysqld.cc:6092
#3  0x00000036dec1d974 in __libc_start_main () from /lib64/libc.so.6
#4  0x0000000000540909 in _start ()

Example of a stack trace from mysqltest:

Thread 1 (Thread 0x2b675ed3ead0 (LWP 31223)):
#0  0x00000036decca3ff in poll () from /lib64/libc.so.6
#1  0x0000000000467d90 in pvio_socket_wait_io_or_timeout (pvio=<value optimized out>, is_read=<value optimized out>, timeout=-1)
    at /home/buildbot/buildbot/build/libmariadb/plugins/pvio/pvio_socket.c:511
#2  0x0000000000467e27 in pvio_socket_read (pvio=0x7fff4c072860, buffer=0x10cd4450 "\a", length=16384) at /home/buildbot/buildbot/build/libmariadb/plugins/pvio/pvio_socket.c:279
#3  0x0000000000455df0 in ma_pvio_read (pvio=0x10cd43e0, buffer=0x10cd4450 "\a", length=16384) at /home/buildbot/buildbot/build/libmariadb/libmariadb/ma_pvio.c:251
#4  0x0000000000455f01 in ma_pvio_cache_read (pvio=0x7fff4c072860, buffer=0x10cd8460 "\"", length=4) at /home/buildbot/buildbot/build/libmariadb/libmariadb/ma_pvio.c:293
#5  0x000000000046b6fb in ma_real_read (net=0x10cc2c10, complen=0x7fff4c0729b0) at /home/buildbot/buildbot/build/libmariadb/libmariadb/ma_net.c:376
#6  0x000000000046b861 in ma_net_read (net=0x7fff4c072860) at /home/buildbot/buildbot/build/libmariadb/libmariadb/ma_net.c:430
#7  0x0000000000453131 in ma_net_safe_read (mysql=0x10cc2c10) at /home/buildbot/buildbot/build/libmariadb/libmariadb/mariadb_lib.c:188
#8  0x0000000000453465 in mthd_my_read_query_result (mysql=0x7fff4c072860) at /home/buildbot/buildbot/build/libmariadb/libmariadb/mariadb_lib.c:1955
#9  0x000000000044dfbe in mysql_read_query_result (mysql=0x7fff4c072860) at /home/buildbot/buildbot/build/libmariadb/libmariadb/mariadb_lib.c:2136
#10 0x0000000000449f29 in wrap_mysql_read_query_result (cn=0x10cbff18, command=0x10ce5528, flags=<value optimized out>, query=0x1 <Address 0x1 out of bounds>, query_len=0, 
    ds=0xbb2e20, ds_warnings=0x7fff4c072e10) at /home/buildbot/buildbot/build/client/../tests/nonblock-wrappers.h:324
#11 run_query_normal (cn=0x10cbff18, command=0x10ce5528, flags=<value optimized out>, query=0x1 <Address 0x1 out of bounds>, query_len=0, ds=0xbb2e20, ds_warnings=0x7fff4c072e10)
    at /home/buildbot/buildbot/build/client/mysqltest.cc:7745
#12 0x000000000044a958 in run_query (cn=0x10cbff18, command=0x10ce5528, flags=3) at /home/buildbot/buildbot/build/client/mysqltest.cc:8516
#13 0x000000000044cd1b in main (argc=<value optimized out>, argv=<value optimized out>) at /home/buildbot/buildbot/build/client/mysqltest.cc:9296



 Comments   
Comment by Elena Stepanova [ 2017-03-11 ]

Possibly also related:
http://buildbot.askmonty.org/buildbot/builders/kvm-bintar-centos5-amd64/builds/4783/steps/test/logs/stdio

main.mysqldump 'xtradb'                  w2 [ fail ]  timeout after 900 seconds
        Test ended at 2017-03-09 08:36:55
 
Test case timeout after 900 seconds
 
== /usr/local/mariadb-10.1.22-linux-x86_64/mysql-test/var/2/log/mysqldump.log == 
  `a` int(11) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;
DROP TABLE IF EXISTS `t2`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `t2` (
  `a` int(11) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;
/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;
 
/*!40101 SET SQL_MODE=@OLD_SQL_MODE */;
/*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;
/*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
/*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */;
 
 
 == /usr/local/mariadb-10.1.22-linux-x86_64/mysql-test/var/2/tmp/analyze-timeout-mysqld.1.err ==
mysqltest: Could not open connection 'default' after 500 attempts: 2002 Can't connect to local MySQL server through socket '/usr/local/mariadb-10.1.22-linux-x86_64/mysql-test/var/tmp/2/mysqld.1.sock' (111 "Connection refused")

Comment by Daniel Lee (Inactive) [ 2017-05-05 ]

I also ran into this issue when I was testing TEXT/BLOB in ColumnStore. It gets triggered randomly in different tests.

mysql was running at 100%, but the test stuck.

MariaDB [(none)]> show processlist;
+-----+-------------+-----------+--------+---------+------+--------------------------+------------------+----------+
| Id  | User        | Host      | db     | Command | Time | State                    | Info             | Progress |
+-----+-------------+-----------+--------+---------+------+--------------------------+------------------+----------+
|   1 | system user |           | NULL   | Daemon  | NULL | InnoDB purge worker      | NULL             |    0.000 |
|   3 | system user |           | NULL   | Daemon  | NULL | InnoDB purge worker      | NULL             |    0.000 |
|   2 | system user |           | NULL   | Daemon  | NULL | InnoDB purge worker      | NULL             |    0.000 |
|   4 | system user |           | NULL   | Daemon  | NULL | InnoDB purge coordinator | NULL             |    0.000 |
|   5 | system user |           | NULL   | Daemon  | NULL | InnoDB shutdown handler  | NULL             |    0.000 |
| 149 | root        | localhost | mytest | Sleep   |  955 |                          | NULL             |    0.000 |
| 150 | root        | localhost | NULL   | Query   |    0 | init                     | show processlist |    0.000 |
+-----+-------------+-----------+--------+---------+------+--------------------------+------------------+----------+

Back trace on the mysql process

0x00007fd3adb0d680 in __poll_nocancel () from /lib64/libc.so.6
(gdb) bt
#0  0x00007fd3adb0d680 in __poll_nocancel () from /lib64/libc.so.6
#1  0x00007fd3af97e567 in poll (__timeout=-1, __nfds=1, __fds=0x7fffa12c71c0) at /usr/include/bits/poll2.h:46
#2  pvio_socket_wait_io_or_timeout (pvio=<optimized out>, is_read=is_read@entry=1 '\001', timeout=-1)
   at /root/columnstore/mariadb-columnstore-server/libmariadb/plugins/pvio/pvio_socket.c:511
#3  0x00007fd3af97e5b4 in pvio_socket_read (pvio=<optimized out>, buffer=0x7fd3b1d041c0 "\a", length=16384)
   at /root/columnstore/mariadb-columnstore-server/libmariadb/plugins/pvio/pvio_socket.c:279
#4  0x00007fd3af96d8b7 in ma_pvio_read (pvio=pvio@entry=0x7fd3b1d04150, buffer=0x7fd3b1d041c0 "\a", length=length@entry=16384)
   at /root/columnstore/mariadb-columnstore-server/libmariadb/libmariadb/ma_pvio.c:251
#5  0x00007fd3af96d9e8 in ma_pvio_cache_read (pvio=0x7fd3b1d04150, buffer=buffer@entry=0x7fd3b1d081f0 "\200", <incomplete sequence \352>,
   length=length@entry=4) at /root/columnstore/mariadb-columnstore-server/libmariadb/libmariadb/ma_pvio.c:293
#6  0x00007fd3af981ad7 in ma_real_read (net=0x7fd3aff09080 <mysql>, complen=complen@entry=0x7fffa12c72f0)
   at /root/columnstore/mariadb-columnstore-server/libmariadb/libmariadb/ma_net.c:376
#7  0x00007fd3af9824ed in ma_net_read (net=net@entry=0x7fd3aff09080 <mysql>)
   at /root/columnstore/mariadb-columnstore-server/libmariadb/libmariadb/ma_net.c:430
#8  0x00007fd3af96a04c in ma_net_safe_read (mysql=mysql@entry=0x7fd3aff09080 <mysql>)
   at /root/columnstore/mariadb-columnstore-server/libmariadb/libmariadb/mariadb_lib.c:188
#9  0x00007fd3af96cc5b in mthd_my_read_query_result (mysql=0x7fd3aff09080 <mysql>)
   at /root/columnstore/mariadb-columnstore-server/libmariadb/libmariadb/mariadb_lib.c:1955
#10 0x00007fd3af95f0ca in mysql_real_query_for_lazy (
   buf=0x7fd3b1d3b8d8 "insert into text1 values (1,'abcdefghij0123456789abcdefghij0123456789abcdefghij0123456789abcdefghij0123456789abcdefghij0123456789abcdefghij0123456789abcdefghij0123456789abcdefghij0123456789abcdefghij0"..., length=60031)
   at /root/columnstore/mariadb-columnstore-server/client/mysql.cc:2989
#11 0x00007fd3af961a14 in com_go (buffer=0x7fd3aff08fa0 <glob_buffer>, line=<optimized out>)
   at /root/columnstore/mariadb-columnstore-server/client/mysql.cc:3252
#12 0x00007fd3af962d7f in add_line (buffer=..., truncated=false, ml_comment=<synthetic pointer>, in_string=<synthetic pointer>,
   line_length=60032,
   line=0x7fd3b1d2b8b8 "insert into text1 values (1,'abcdefghij0123456789abcdefghij0123456789abcdefghij0123456789abcdefghij0123456789abcdefghij0123456789abcdefghij0123456789abcdefghij0123456789abcdefghij0123456789abcdefghij0"...)
   at /root/columnstore/mariadb-columnstore-server/client/mysql.cc:2422
#13 read_and_execute (interactive=<optimized out>) at /root/columnstore/mariadb-columnstore-server/client/mysql.cc:2124
#14 0x00007fd3af95b669 in main (argc=9, argv=0x7fd3b1d006e0) at /root/columnstore/mariadb-columnstore-server/client/mysql.cc:1291

Comment by Elena Stepanova [ 2017-05-06 ]

dleeyh, could you please specify the environment?
OS, architecture, MariaDB version, build options.

Comment by Elena Stepanova [ 2017-05-06 ]

I tried the same test

perl ./mysql-test-run.pl --testcase-timeout=300 parts.partition_float_myisam --repeat=1000

on CentOS 5, 6, 7.3 (vm-centos5-amd64-build.qcow2, vm-centos6-amd64-build.qcow2, vm-centos73-amd64-build.qcow2), same 10.2 source tarball, same build options (cmake . && make.
I'm still getting the problem only on CentOS 5, but not on 6 or 7.3. Any information about other affected systems might be helpful.

Comment by Elena Stepanova [ 2017-05-07 ]

Another curious observation. I've set testcase-timeout=3000 and suite-timeout=3000 (that is, 50 hours). When I hit the problem with these settings, the test had hung for 8 hours, and then it continued normally. 8 hours must be the connection timeout, there is no big mystery in the value; but it did not fail, did not disconnect, it just continued normally after the timeout, whatever it means (and later after a number of iterations it hit the problem again in the same run).

Comment by Andrew Hutchings (Inactive) [ 2017-05-07 ]

Just managed to reproduce this on CentOS 7 in a semi-easily reproduce it this way.

This is my create table:

CREATE TABLE `tb` (
  `a` int(11) DEFAULT NULL,
  `b` text DEFAULT NULL
) ENGINE=MyISAM

Script is test2.sh

Create the table in test, run the script, hit Ctrl-C. This might take a few tries but when it causes "Ctrl-C – query killed. Continuing normally." the client hangs with 100% CPU usage and that stack trace.

#0  0x00007f31b51b6de0 in __poll_nocancel () from /lib64/libc.so.6
#1  0x00007f31b703535f in pvio_socket_wait_io_or_timeout (pvio=0x7f31b7d925e0, is_read=1 '\001', timeout=-1) at /home/linuxjedi/Programming/Git/mariadb-columnstore-server/libmariadb/plugins/pvio/pvio_socket.c:511
#2  0x00007f31b703505e in pvio_socket_read (pvio=0x7f31b7d925e0, buffer=0x7f31b7d92650 "\001", length=16384) at /home/linuxjedi/Programming/Git/mariadb-columnstore-server/libmariadb/plugins/pvio/pvio_socket.c:279
#3  0x00007f31b701bd16 in ma_pvio_read (pvio=0x7f31b7d925e0, buffer=0x7f31b7d92650 "\001", length=16384) at /home/linuxjedi/Programming/Git/mariadb-columnstore-server/libmariadb/libmariadb/ma_pvio.c:251
#4  0x00007f31b701bec8 in ma_pvio_cache_read (pvio=0x7f31b7d925e0, buffer=0x7f31b7d96680 <incomplete sequence \352>, length=4) at /home/linuxjedi/Programming/Git/mariadb-columnstore-server/libmariadb/libmariadb/ma_pvio.c:293
#5  0x00007f31b703aeff in ma_real_read (net=0x7f31b75f1ae0 <mysql>, complen=0x7ffdc4b9deb8) at /home/linuxjedi/Programming/Git/mariadb-columnstore-server/libmariadb/libmariadb/ma_net.c:376
#6  0x00007f31b703b12f in ma_net_read (net=0x7f31b75f1ae0 <mysql>) at /home/linuxjedi/Programming/Git/mariadb-columnstore-server/libmariadb/libmariadb/ma_net.c:430
#7  0x00007f31b7010d1c in ma_net_safe_read (mysql=0x7f31b75f1ae0 <mysql>) at /home/linuxjedi/Programming/Git/mariadb-columnstore-server/libmariadb/libmariadb/mariadb_lib.c:188
#8  0x00007f31b701582c in mthd_my_read_query_result (mysql=0x7f31b75f1ae0 <mysql>) at /home/linuxjedi/Programming/Git/mariadb-columnstore-server/libmariadb/libmariadb/mariadb_lib.c:1955
#9  0x00007f31b7016438 in mysql_real_query (mysql=0x7f31b75f1ae0 <mysql>, query=0x7f31b7dbb060 "insert into tb values (1,'A wonderful serenity has taken possession of my entire soul, like these sweet mornings of spring which I enjoy with my whole heart. I am alone, and feel the charm of existenc"..., length=60028) at /home/linuxjedi/Programming/Git/mariadb-columnstore-server/libmariadb/libmariadb/mariadb_lib.c:2152
#10 0x00007f31b7006cdf in mysql_real_query_for_lazy (buf=0x7f31b7dbb060 "insert into tb values (1,'A wonderful serenity has taken possession of my entire soul, like these sweet mornings of spring which I enjoy with my whole heart. I am alone, and feel the charm of existenc"..., length=60028) at /home/linuxjedi/Programming/Git/mariadb-columnstore-server/client/mysql.cc:2989
#11 0x00007f31b7007a2f in com_go (buffer=0x7f31b75f2080 <glob_buffer>, line=0x0) at /home/linuxjedi/Programming/Git/mariadb-columnstore-server/client/mysql.cc:3252
#12 0x00007f31b700499e in read_and_execute (interactive=false) at /home/linuxjedi/Programming/Git/mariadb-columnstore-server/client/mysql.cc:2136
#13 0x00007f31b7003510 in main (argc=6, argv=0x7f31b7d7fb68) at /home/linuxjedi/Programming/Git/mariadb-columnstore-server/client/mysql.cc:1291

GDB showing the loop it is stuck on:

511	      rc= poll(&p_fd, 1, timeout);
(gdb) p timeout
$1 = -1
(gdb) n
Single stepping until exit from function __poll_nocancel,
which has no line number information.
pvio_socket_wait_io_or_timeout (pvio=0x7f31b7d925e0, is_read=1 '\001', 
    timeout=-1)
    at /home/linuxjedi/Programming/Git/mariadb-columnstore-server/libmariadb/plugins/pvio/pvio_socket.c:512
512	    } while (rc == -1 || errno == EINTR);
(gdb) 
511	      rc= poll(&p_fd, 1, timeout);
(gdb) 
512	    } while (rc == -1 || errno == EINTR);
(gdb) 
511	      rc= poll(&p_fd, 1, timeout);
(gdb) 
512	    } while (rc == -1 || errno == EINTR);
(gdb) 
511	      rc= poll(&p_fd, 1, timeout);

Comment by Elena Stepanova [ 2017-05-07 ]

My variation of the problem is a bit different, in my case the client does not consume any CPU, it seems to be just stalling. The stack trace is very similar.

I think both variants need to be analyzed in order to make sure it's really the same issue.

Given the new evidence that the problem is not limited to the EOL-ed CentOS 5, I think it needs to be considered high priority.

Comment by Georg Richter [ 2017-05-08 ]

I think this issue was fixed already in commit 44a740c348544acce35f289221a945941dc31979.

I also wonder why there is a buildbot running CentOS5. According to https://mariadb.com/kb/en/mariadb/deprecation-policy/ CentOS5 is not supported anymore.

Comment by Andrew Hutchings (Inactive) [ 2017-05-08 ]

Agreed, that commit fixes the problem in my CentOS 7 tests. Unfortunately that commit isn't in the 10.2 submodule checkout yet.

Comment by Elena Stepanova [ 2017-05-08 ]

serg, do you expect the commit 44a740c348544acce35f289221a945941dc31979 will make it to the 10.2 submodule before the release? I think it should.

georg, We don't build RPM packages on CentOS 5 anymore, but there is an "old" bintar that we provide, for systems with old glibc etc. Historically, it's built on CentOS 5, and it stays this way, so that we don't introduce sudden differences in that bintar.

I'll try to re-check the mentioned commit on CentOS 5; but better still, let's have it in 10.2 tree, since it's already known to fix the problem on CentOS 7.

Generated at Thu Feb 08 07:54:59 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.