MariaDB Server / MDEV-7144

Warnings "bytes lost" during server shutdown after running connect.part_file test in buildbot

Details

    Description

      Buildbot example: http://buildbot.askmonty.org/buildbot/builders/kvm-dgcov-jaunty-i386/builds/4449/steps/test_2/logs/stdio

      I can reproduce the problem by running the test with valgrind on a debug build. The valgrind report itself comes out OK, but the warnings are generated on shutdown. It happens almost every time I run the test:

      perl ./mtr connect.part_file  --valgrind-mysqld
      Logging: ./mtr  connect.part_file --valgrind-mysqld
      ...
       
      worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
      connect.part_file                        [ pass ]   4482
      ***Warnings generated in error logs during shutdown after running tests: connect.part_file
       
      Warning: 65536 bytes lost at 0xfaea070, allocated by T@0 at 
      valgrind_report                          [ pass ]       
      --------------------------------------------------------------------------
      The servers were restarted 0 times
      Spent 4.482 of 31 seconds executing testcases
       
      Warnings in log: All 2 tests were successful.
       
      Errors/warnings were found in logfiles during server shutdown after running the
      following sequence(s) of tests:
          connect.part_file
      mysql-test-run: *** ERROR: There where errors/warnings in server logs after running test cases.


          Activity

            Elena,

            "If you mean that this test should be run with the mentioned change in TDBCSV::PrepareWriting"

            I didn't mean that. Running the small test without the change is just to check whether the warning is raised because of that particular UPDATE command. What you could do is split the original test into parts to check which command(s) cause the warning.

            It would also be worth checking with a partition table made with another engine.

            Thanks and regards,
            Olivier

            bertrandop Olivier Bertrand added a comment

            Hi Olivier,

            No, I am not getting the warnings on your test case. Here is the minimal one that triggers the "bytes lost" warnings for me (they don't look quite the same as the initial one, but I suppose that's only because there are more of them; the first of the three warnings could be the same as the initial, truncated one):

            --source include/have_partition.inc
            CREATE TABLE t1 (
              id INT NOT NULL,
              msg VARCHAR(32)
            ) ENGINE=CONNECT TABLE_TYPE=CSV AVG_ROW_LENGTH=10
            PARTITION BY RANGE(id) (
            PARTITION first VALUES LESS THAN(10),
            PARTITION middle VALUES LESS THAN(50),
            PARTITION last VALUES LESS THAN(MAXVALUE));
            --error ER_GET_ERRMSG
            UPDATE t1 set id = 41 WHERE msg = 'four';

            perl ./mtr part_file2  --valgrind-mysqld --mysqld=--plugin-load-add=ha_connect
             
            ...
            CREATE TABLE t1 (
            id INT NOT NULL,
            msg VARCHAR(32)
            ) ENGINE=CONNECT TABLE_TYPE=CSV AVG_ROW_LENGTH=10
            PARTITION BY RANGE(id) (
            PARTITION first VALUES LESS THAN(10),
            PARTITION middle VALUES LESS THAN(50),
            PARTITION last VALUES LESS THAN(MAXVALUE));
            UPDATE t1 set id = 41 WHERE msg = 'four';
            ERROR HY000: Got error 174 'Cannot update column id because it is used for partitioning' from CONNECT
            ***Warnings generated in error logs during shutdown after running tests: main.part_file2
             
            Warning: 65536 bytes lost at 0xfaa3070, allocated by T@0 at mysys/mf_iocache.c:231, mysys/mf_cache.c:69, sql/sql_update.cc:580, sql/sql_parse.cc:3295, sql/sql_parse.cc:6407, sql/sql_parse.cc:1301, sql/sql_parse.cc:996, sql/sql_connect.cc:1379
            Warning:    8 bytes lost at 0xf850b70, allocated by T@0 at mysys/my_malloc.c:239, mysys/mf_cache.c:65, sql/sql_update.cc:580, sql/sql_parse.cc:3295, sql/sql_parse.cc:6407, sql/sql_parse.cc:1301, sql/sql_parse.cc:996, sql/sql_connect.cc:1379
            Warning:   48 bytes lost at 0xf84ffb0, allocated by T@0 at mysys/my_malloc.c:239, mysys/mf_cache.c:64, sql/sql_update.cc:580, sql/sql_parse.cc:3295, sql/sql_parse.cc:6407, sql/sql_parse.cc:1301, sql/sql_parse.cc:996, sql/sql_connect.cc:1379
             
            mysql-test-run: *** ERROR: Test suite aborted

            I tried the same test with MyISAM, InnoDB, and Archive engines (the latter is a dynamic library, like CONNECT), and got no warnings there.
            Of course, the other difference is that none of them returns an error on the UPDATE; that part is CONNECT-specific, so maybe the warnings are somehow related to it.

            elenst Elena Stepanova added a comment

            Elena,

            Great! Thanks to this information I have a better idea of what is happening.
            It begins in mysql_update (in sql_update.cc):

            int mysql_update:
            ...
                  /*
                    We are doing a search on a key that is updated. In this case
                    we go through the matching rows, save a pointer to them and
                    update these in a separate loop based on the pointer.
                  */
                  IO_CACHE tempfile;
                  if (open_cached_file(&tempfile, mysql_tmpdir, TEMP_PREFIX,
                                       DISK_BUFFER_SIZE, MYF(MY_WME)))
                    goto err;

                  /* If quick select is used, initialize it before retrieving rows. */
                  if (select && select->quick && select->quick->reset())
                  {
                    close_cached_file(&tempfile);
                    goto err;
                  }
                  table->file->try_semi_consistent_read(1);

                  /*
                    When we get here, we have one of the following options:
                    A. query_plan.index == MAX_KEY
                       This means we should use full table scan, and start it with
                       init_read_record call
                    B. query_plan.index != MAX_KEY
                       B.1 quick select is used, start the scan with init_read_record
                       B.2 quick select is not used, this is full index scan (with LIMIT)
                           Full index scan must be started with init_read_record_idx
                  */

                  if (query_plan.index == MAX_KEY || (select && select->quick))
                  {
                    if (init_read_record(&info, thd, table, select, 0, 1, FALSE))
                      goto err;
                  }
                  else
                    init_read_record_idx(&info, thd, table, 1, query_plan.index, reverse);

            It calls open_cached_file, which in turn calls init_io_cache

            int init_io_cache:
            ...
                  if ((info->buffer= (uchar*) my_malloc(buffer_block, flags)) != 0)
            ...

            This is where the buffer of 65536 bytes is allocated. When the update is
            done, for instance on a MyISAM table, close_cached_file is called

            void close_cached_file(IO_CACHE *cache)
            {
              DBUG_ENTER("close_cached_file");
              if (my_b_inited(cache))
              {
                File file=cache->file;
                cache->file= -1;        /* Don't flush data */
                (void) end_io_cache(cache);
                if (file >= 0)
                {
                  (void) my_close(file,MYF(0));
            #ifdef CANT_DELETE_OPEN_FILES
                  if (cache->file_name)
                  {
                    (void) my_delete(cache->file_name, MYF(MY_WME | ME_NOINPUT));
                    my_free(cache->file_name);
                  }
            #endif
                }
                my_free(cache->dir);
                my_free(cache->prefix);
              }
              DBUG_VOID_RETURN;
            }

            and in end_io_cache

            int end_io_cache(IO_CACHE *info)
            ...
                my_free(info->buffer);
            ...

            the allocated memory is freed.
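
            To make the ownership pattern concrete, here is a minimal, self-contained
            C sketch of the allocate/free pairing described above. The toy_cache type
            and toy_* functions are invented stand-ins for illustration only, not the
            real mysys API:

              #include <stdlib.h>

              /* Toy stand-ins for IO_CACHE / init_io_cache / end_io_cache */
              typedef struct { unsigned char *buffer; } toy_cache;

              static int toy_init(toy_cache *c, size_t size)
              {
                c->buffer= malloc(size);      /* like init_io_cache's my_malloc */
                return c->buffer == NULL;
              }

              static void toy_end(toy_cache *c)
              {
                free(c->buffer);              /* like end_io_cache's my_free */
                c->buffer= NULL;
              }

              int main(void)
              {
                toy_cache tempfile;
                if (toy_init(&tempfile, 65536))
                  return 1;
                /* ... use the cache ... */
                toy_end(&tempfile);           /* every init paired with an end */
                return 0;
              }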

            However, in the case of a CONNECT table, the init_read_record call goes
            through several functions until it reaches ha_connect::rnd_init

            int ha_connect::rnd_init(bool scan)
            ...
              if (OpenTable(g, xmod == MODE_DELETE))
                DBUG_RETURN(HA_ERR_INITIALIZATION);

            and in ha_connect::OpenTable

            int ha_connect::OpenTable(PGLOBAL g, bool del)
            ...
                  if (part_id && bitmap_is_set(part_id, fp->field_index)) {
                    // Trying to update a column used for partitioning
                    // This cannot be currently done because it may require
                    // a row to be moved in another partition.
                    sprintf(g->Message,
                            "Cannot update column %s because it is used for partitioning",
                            p);
                    return HA_ERR_INTERNAL_ERROR;
                  } // endif part_id

            This causes the following statement in mysql_update:

                    if (init_read_record(&info, thd, table, select, 0, 1, FALSE))
                      goto err;
            ...
            err:
              delete select;
              free_underlaid_joins(thd, select_lex);
              table->disable_keyread();
              thd->abort_on_warning= 0;
              DBUG_RETURN(1);

            to take the goto err branch. In delete select, close_cached_file is indeed
            called, but the cache passed to it is not the one that was allocated
            previously, so the function does nothing (my_b_inited(cache) is false).
            This explains the warnings: the memory is indeed lost.
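
            The leak pattern can be illustrated with another small stand-alone C
            sketch (again with invented toy_* stand-ins, not the real server code):
            the error path runs cleanup on a different, never-initialized cache, so
            the guard that mimics my_b_inited() is false and the first buffer is
            never freed:

              #include <stdlib.h>
              #include <string.h>

              typedef struct { unsigned char *buffer; } toy_cache;

              /* Mimics close_cached_file's guard: no-op unless the cache was inited */
              static void toy_close(toy_cache *c)
              {
                if (c->buffer != NULL)        /* the my_b_inited() check */
                {
                  free(c->buffer);
                  c->buffer= NULL;
                }
              }

              int main(void)
              {
                toy_cache tempfile;           /* what mysql_update allocates  */
                toy_cache select_cache;       /* what 'delete select' closes  */
                memset(&select_cache, 0, sizeof(select_cache));

                if ((tempfile.buffer= malloc(65536)) == NULL)
                  return 1;

                /* Error path: only the never-inited cache is closed, so toy_close
                   does nothing and tempfile.buffer (65536 bytes) is leaked -
                   exactly what valgrind reports on shutdown. */
                toy_close(&select_cache);
                return 0;
              }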

            Is this a CONNECT bug or a MariaDB bug? I can't tell, but it could happen
            with any engine that returns an error at this point.

            Perhaps adding the HA_NOT_DELETE_WITH_CACHE flag to CONNECT would remove
            the problem, but that would be masking it rather than really fixing it.

            bertrandop Olivier Bertrand added a comment

            "Is this a CONNECT bug or a MariaDB bug?"

            It's probably best to ask serg about it.

            elenst Elena Stepanova added a comment
            serg Sergei Golubchik added a comment - edited

            Looks like a MariaDB bug to me. Another goto err a few lines above is correctly accompanied by close_cached_file(&tempfile).

            I'll fix it. Thanks for pointing out exactly what line to fix!
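
            For reference, a sketch of the kind of one-line fix this implies, modeled
            on the error path a few lines above that already pairs its goto err with
            close_cached_file(&tempfile) (my reading of the comment, not necessarily
            the exact committed patch):

              if (query_plan.index == MAX_KEY || (select && select->quick))
              {
                if (init_read_record(&info, thd, table, select, 0, 1, FALSE))
                {
                  close_cached_file(&tempfile);  /* free the IO_CACHE buffer */
                  goto err;                      /* before the error exit    */
                }
              }
              else
                init_read_record_idx(&info, thd, table, 1, query_plan.index, reverse);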


            People

              serg Sergei Golubchik
              elenst Elena Stepanova
              Votes: 0
              Watchers: 3

