[CONC-220] core dump in net.c after network problem
Created: 2016-12-07  Updated: 2020-03-16  Resolved: 2020-03-16

Status: Closed
Project: MariaDB Connector/C
Component/s: None
Affects Version/s: 2.1
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: William Reich Assignee: Georg Richter
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

Linux, 64-bit


Attachments: out2.out, tarball.tgz

 Description   

We have a C++ client using MariaDB Connector/C version 2.1.
The database is on machine A, and the client is on machine B.
There was a network failure between the two machines.
Shortly after the network failure was observed, the client dumped core.

Thread 1 (LWP 4595):
#0 0x0000003383b387c6 in __strncpy_ssse3 () from /lib64/libc.so.6
#1 0x00007f91b1eac87b in net_write_buff (net=0x33a6500, packet=0x7f91a5a10a70 "\001", len=5) at /vob/adc/router/thirdparty/mariadb-connector/libmariadb/net.c:375
#2 0x00007f91b1eacb2c in net_write_command (net=0x33a6500, command=<optimized out>, packet=0x7f91b1ed4542 "", len=0) at /vob/adc/router/thirdparty/mariadb-connector/libmariadb/net.c:329
#3 0x00007f91b1eb21ce in mthd_my_send_cmd (mysql=0x33a6500, command=<optimized out>, arg=0x7f91b1ed4542 "", length=0, skipp_check=1 '\001', opt_arg=<optimized out>) at /vob/adc/router/thirdparty/mariadb-connector/libmariadb/libmariadb.c:593
#4 0x00007f91b1eb077a in mysql_close_slow_part (mysql=0x33a6500) at /vob/adc/router/thirdparty/mariadb-connector/libmariadb/libmariadb.c:2240
#5 0x00007f91b1eb0636 in mysql_close (mysql=0x33a6500) at /vob/adc/router/thirdparty/mariadb-connector/libmariadb/libmariadb.c:2255
#6 0x00000000008b88fd in MySqlClient::close (this=0x1fd0388) at MySqlClient.cpp:192
#7 0x00000000008b8187 in DataBaseClient<MySqlClient>::operator<< (this=0x1fd0380, sql="INSERT into MeasurementStat(statName, statValue, info_id) VALUES('memInstance', '2451320', '9448181984');") at /vob/adc/router/db_client/DataBaseClient.h:195
#8 0x00000000008b6611 in UlticomADC::MySqlWriterThread::processKPIRequest (this=0x1fb4840, kpi=0x62dff00) at /vob/adc/router/meas/MySqlWriterThread.cpp:426
#9 0x00000000008b7aec in UlticomADC::MySqlWriterThread::handleMessage (this=0x1fb4840, m=0x62dff00) at /vob/adc/router/meas/MySqlWriterThread.cpp:104
#10 0x00000000007f518b in UlticomADC::SubsystemThread::threadMain (this=0x1fb4840, pArguments=<optimized out>) at /vob/adc/router/common/SubsystemThread.cpp:117
#11 0x000000000057b36b in UlticomADC::_thread_internal_main_routine (arg=<optimized out>) at /vob/adc/router/include/Thread.h:210
#12 0x0000003383e077f1 in __nptl_setxid () from /lib64/libpthread.so.0
#13 0x0000000000000000 in ?? ()

The attachment (tarball.tgz) contains:

  • a backtrace of all threads
  • the relevant C source files, matching the core dump
  • the gdb commands used to inspect the core dump

Inspection of the core dump reveals that
the code is trying to write through net->write_pos, which has a value of zero (NULL).



 Comments   
Comment by William Reich [ 2016-12-07 ]

This error is not reproducible on demand; how to repeat it is unknown.

Comment by William Reich [ 2016-12-08 ]

The attached file (out2.out) gives more details about the
values of *net and *net->vio.

Given that the network is broken, the code seems to be trying to close the connection to the MySQL database. Somewhere before we got here, net->buff was set to zero.
Following the backtrace, we see that net->write_pos is set to net->buff, which is zero.
Since net->write_pos is zero, the write through it dumps core.

So why is the program trying to write a command onto a connection that is broken?
Even though net->vio is non-zero, the structure pointed to by net->vio is all zeros (as verified by the out2.out file). So it seems that a check for (net->vio != NULL) is not good enough.

Comment by William Reich [ 2016-12-13 ]

Is anybody out there?

Comment by Georg Richter [ 2016-12-14 ]

We fixed several things in mysql_close, close_slow_part and close_options; can you please try to reproduce the issue with the latest 2.3 release?
Can you also provide the connection options, such as reconnect (p *mysql in gdb would be enough)?

Thanks!

Comment by William Reich [ 2016-12-14 ]

Sorry, I am not able to experiment.
This error was found on an older version of our product.
The current version of our product already uses version 2.3 of the connector.

Besides, the MariaDB Connector/C 2.1.x files that I supplied in this ticket
are virtually identical to the 2.3 versions.

??

Comment by William Reich [ 2016-12-14 ]

The contents of *mysql are already provided at line 393 of the file rob.out,
which is contained in tarball.tgz.

Comment by Georg Richter [ 2016-12-15 ]

Sorry, I meant mysql->net.vio; it looks like it is being overwritten somehow.

Comment by William Reich [ 2016-12-16 ]

The output of p *net->vio is at line 50 of the out2.out file attached to this ticket.

Comment by Georg Richter [ 2016-12-16 ]

Some observations:

1) vio->type is VIO_CLOSED; this value is assigned in vio_close():

  vio->type= VIO_CLOSED;
  vio->sd=   -1;

but p *net->vio shows a different value for sd:

sd = 54158592

2) net->buff was set to 0 in net_end().
net_end() is called inside end_server(); in the code below, net->vio is set to 0 just before that call:

  if (mysql->net.vio != 0)
  {
    init_sigpipe_variables
    set_sigpipe(mysql);
    vio_delete(mysql->net.vio);
    reset_sigpipe(mysql);
    mysql->net.vio= 0;    /* Marker */
  }
  net_end(&mysql->net);

but p *net shows a non-zero value for vio:

vio = 0x33a6500

The value of vio is exactly the same as that of vio->sd: 54158592 is 0x33a6500 in hexadecimal. So it looks like an invalid write happens somewhere, either in MariaDB Connector/C or in your application.

With the latest C/C 2.3, I wasn't able to reproduce the crash, and valgrind did not report any invalid memory operations.

Comment by William Reich [ 2016-12-16 ]

The key piece of information in this ticket is that there was a network failure.
Thus, the error-handling logic plays a role in this situation.

Comment by William Reich [ 2016-12-20 ]

My question from Dec 8:
why is the program trying to write a command onto a connection that is broken?
Even though net->vio is non-zero, the structure pointed to by net->vio is all zeros (as verified by the out2.out file). So it seems that a check for (net->vio != NULL) is not good enough.

Generated at Thu Feb 08 03:03:44 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.