Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7807

information_schema.processlist truncates queries with binary strings

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.0.16
    • 10.1.5
    • Character Sets
    • None

    Description

      When I start a statement with binary data the column INFO of information_schema.processlist is truncated at the first byte > 128. show full processlist shows the full query in it's Info column.

      My test query is generated and run by this perl script:

      perl -e "use DBI; $dbh = DBI->connect(qw(dbi:mysql:host=myhost;database=mydb myuser mypass)); $dbh->do(qq{select sleep(30+0*?)}, undef, join '', map chr, 0..255);"

      While the sleep is still asleep, use

      select * from information_schema.processlist;
      show full processlist;

      or look at the info column specifically:

      select info from information_schema.processlist where state = 'User sleep';

      . With

      select char_length(info), length(info) from information_schema.processlist where state = 'User sleep';

      which results in

      154 154

      you can verify that the values are truncated inside the server, this is not a problem with the client connection or the client.

      For comparison use

      perl -e "use DBI; $dbh = DBI->connect(qw(dbi:mysql:host=myhost;database=mydb myuser mypass)); $dbh->do(qq{select sleep(30+0*?)}, undef, join '', map chr, map $_ % 128, 0..255);"

      and

      select char_length(info), length(info) from information_schema.processlist where state = 'User sleep';

      with the result

      291 291

      The character set settings are:

      'character_set_client', 'utf8'
      'character_set_connection', 'utf8'
      'character_set_database', 'latin1'
      'character_set_filesystem', 'binary'
      'character_set_results', 'utf8'
      'character_set_server', 'latin1'
      'character_set_system', 'utf8'

      Attachments

        Activity

          ralfneubauer2 Ralf Neubauer created issue -
          serg Sergei Golubchik made changes -
          Field Original Value New Value
          Description When I start a statement with binary data the column INFO of information_schema.processlist is truncated at the first byte > 128. show full processlist shows the full query in it's Info column.


          My test query is generated and run by this perl script:

          perl -e "use DBI; $dbh = DBI->connect(qw(dbi:mysql:host=myhost;database=mydb myuser mypass)); $dbh->do(qq{select sleep(30+0*?)}, undef, join '', map chr, 0..255);"

          While the sleep is still asleep, use

          select * from information_schema.processlist;
          show full processlist;

          or look at the info column specifically:

          select info from information_schema.processlist where state = 'User sleep';

          . With

          select char_length(info), length(info) from information_schema.processlist where state = 'User sleep';

          which results in

          154 154

          you can verify that the values are truncated inside the server, this is not a problem with the client connection or the client.

          For comparison use

          perl -e "use DBI; $dbh = DBI->connect(qw(dbi:mysql:host=myhost;database=mydb myuser mypass)); $dbh->do(qq{select sleep(30+0*?)}, undef, join '', map chr, map $_ % 128, 0..255);"

          and

          select char_length(info), length(info) from information_schema.processlist where state = 'User sleep';

          with the result

          291 291


          The character set settings are:

          'character_set_client', 'utf8'
          'character_set_connection', 'utf8'
          'character_set_database', 'latin1'
          'character_set_filesystem', 'binary'
          'character_set_results', 'utf8'
          'character_set_server', 'latin1'
          'character_set_system', 'utf8'
          When I start a statement with binary data the column INFO of information_schema.processlist is truncated at the first byte > 128. show full processlist shows the full query in it's Info column.

          My test query is generated and run by this perl script:
          {noformat}
          perl -e "use DBI; $dbh = DBI->connect(qw(dbi:mysql:host=myhost;database=mydb myuser mypass)); $dbh->do(qq{select sleep(30+0*?)}, undef, join '', map chr, 0..255);"
          {noformat}
          While the sleep is still asleep, use
          {noformat}
          select * from information_schema.processlist;
          show full processlist;
          {noformat}
          or look at the info column specifically:
          {noformat}
          select info from information_schema.processlist where state = 'User sleep';
          {noformat}
          . With
          {noformat}
          select char_length(info), length(info) from information_schema.processlist where state = 'User sleep';
          {noformat}
          which results in
          {noformat}
          154 154
          {noformat}
          you can verify that the values are truncated inside the server, this is not a problem with the client connection or the client.

          For comparison use
          {noformat}
          perl -e "use DBI; $dbh = DBI->connect(qw(dbi:mysql:host=myhost;database=mydb myuser mypass)); $dbh->do(qq{select sleep(30+0*?)}, undef, join '', map chr, map $_ % 128, 0..255);"
          {noformat}
          and
          {noformat}
          select char_length(info), length(info) from information_schema.processlist where state = 'User sleep';
          {noformat}
          with the result
          {noformat}
          291 291
          {noformat}

          The character set settings are:
          {noformat}
          'character_set_client', 'utf8'
          'character_set_connection', 'utf8'
          'character_set_database', 'latin1'
          'character_set_filesystem', 'binary'
          'character_set_results', 'utf8'
          'character_set_server', 'latin1'
          'character_set_system', 'utf8'
          {noformat}

          I think that's what happening: your character_set_client is utf8, the query is expected to be a valid utf8 string. But it is not. The 'info' column of the information_schema.processlist is also utf8. When the server tries to convert an invalid utf8 string to a valid utf8 string — you get a truncation at the first invalid character.

          serg Sergei Golubchik added a comment - I think that's what happening: your character_set_client is utf8, the query is expected to be a valid utf8 string. But it is not. The 'info' column of the information_schema.processlist is also utf8. When the server tries to convert an invalid utf8 string to a valid utf8 string — you get a truncation at the first invalid character.
          serg Sergei Golubchik made changes -
          Due Date 2015-04-20
          elenst Elena Stepanova made changes -
          Description When I start a statement with binary data the column INFO of information_schema.processlist is truncated at the first byte > 128. show full processlist shows the full query in it's Info column.

          My test query is generated and run by this perl script:
          {noformat}
          perl -e "use DBI; $dbh = DBI->connect(qw(dbi:mysql:host=myhost;database=mydb myuser mypass)); $dbh->do(qq{select sleep(30+0*?)}, undef, join '', map chr, 0..255);"
          {noformat}
          While the sleep is still asleep, use
          {noformat}
          select * from information_schema.processlist;
          show full processlist;
          {noformat}
          or look at the info column specifically:
          {noformat}
          select info from information_schema.processlist where state = 'User sleep';
          {noformat}
          . With
          {noformat}
          select char_length(info), length(info) from information_schema.processlist where state = 'User sleep';
          {noformat}
          which results in
          {noformat}
          154 154
          {noformat}
          you can verify that the values are truncated inside the server, this is not a problem with the client connection or the client.

          For comparison use
          {noformat}
          perl -e "use DBI; $dbh = DBI->connect(qw(dbi:mysql:host=myhost;database=mydb myuser mypass)); $dbh->do(qq{select sleep(30+0*?)}, undef, join '', map chr, map $_ % 128, 0..255);"
          {noformat}
          and
          {noformat}
          select char_length(info), length(info) from information_schema.processlist where state = 'User sleep';
          {noformat}
          with the result
          {noformat}
          291 291
          {noformat}

          The character set settings are:
          {noformat}
          'character_set_client', 'utf8'
          'character_set_connection', 'utf8'
          'character_set_database', 'latin1'
          'character_set_filesystem', 'binary'
          'character_set_results', 'utf8'
          'character_set_server', 'latin1'
          'character_set_system', 'utf8'
          {noformat}
          When I start a statement with binary data the column INFO of information_schema.processlist is truncated at the first byte > 128. show full processlist shows the full query in it's Info column.


          My test query is generated and run by this perl script:
          {noformat}
          perl -e "use DBI; $dbh = DBI->connect(qw(dbi:mysql:host=myhost;database=mydb myuser mypass)); $dbh->do(qq{select sleep(30+0*?)}, undef, join '', map chr, 0..255);"
          {noformat}
          While the sleep is still asleep, use
          {code:sql}
          select * from information_schema.processlist;
          show full processlist;
          {code}
          or look at the info column specifically:
          {code:sql}
          select info from information_schema.processlist where state = 'User sleep';
          {code}
          . With
          {code:sql}
          select char_length(info), length(info) from information_schema.processlist where state = 'User sleep';
          {code}
          which results in
          {noformat}
          154 154
          {noformat}
          you can verify that the values are truncated inside the server, this is not a problem with the client connection or the client.

          For comparison use
          {noformat}
          perl -e "use DBI; $dbh = DBI->connect(qw(dbi:mysql:host=myhost;database=mydb myuser mypass)); $dbh->do(qq{select sleep(30+0*?)}, undef, join '', map chr, map $_ % 128, 0..255);"
          {noformat}
          and
          {code:sql}
          select char_length(info), length(info) from information_schema.processlist where state = 'User sleep';
          {code}
          with the result
          {noformat}
          291 291
          {noformat}

          The character set settings are:
          {noformat}
          'character_set_client', 'utf8'
          'character_set_connection', 'utf8'
          'character_set_database', 'latin1'
          'character_set_filesystem', 'binary'
          'character_set_results', 'utf8'
          'character_set_server', 'latin1'
          'character_set_system', 'utf8'
          {noformat}

          There is also a warning when you select from information_schema.processlist:

          MariaDB [test]> show warnings;
          +---------+------+----------------------------------------------------------------------------------+
          | Level   | Code | Message                                                                          |
          +---------+------+----------------------------------------------------------------------------------+
          | Warning | 1366 | Incorrect string value: '\x80\x81\x82\x83\x84\x85...' for column 'INFO' at row 1 |
          +---------+------+----------------------------------------------------------------------------------+
          1 row in set (0.00 sec)

          elenst Elena Stepanova added a comment - There is also a warning when you select from information_schema.processlist: MariaDB [test]> show warnings; +---------+------+----------------------------------------------------------------------------------+ | Level | Code | Message | +---------+------+----------------------------------------------------------------------------------+ | Warning | 1366 | Incorrect string value: '\x80\x81\x82\x83\x84\x85...' for column 'INFO' at row 1 | +---------+------+----------------------------------------------------------------------------------+ 1 row in set (0.00 sec)
          ralfneubauer2 Ralf Neubauer added a comment -

          That's what I thought. But notice that:

          • The string is truncated before being sent to the client, as verified by selecting LENGTH(Info) (or OCTET_LENGTH(info)). It LENGTH(Info) varies with the client connection charset, I would consider this a bug.
          • It works with show full processlist.
          • The query is totally legal (binary data is a fact of life – the same would happen with an insert of e.g. a PNG image into a blob column) and it should be possible to monitor any legal query. (In my case the perl script runs with character_set_client latin1 and character_set_connection latin1 – according to documentation 0x80 is a legal character in MySQL's latin1 (the Euro sign), so the query string is not only valid as binary data, but as text).
          • I see the warning, but I didn't do anything wrong, I think. The warning is helpful in diagnosing, what goes wrong in the Info column, but not helpful for seeing the running queries. Or for diagnosing, what queries an application is sending to the server (I detected the problem, because an application sent an UTF-8 byte sequence in a comment field on table creation).
          ralfneubauer2 Ralf Neubauer added a comment - That's what I thought. But notice that: The string is truncated before being sent to the client, as verified by selecting LENGTH(Info) (or OCTET_LENGTH(info)). It LENGTH(Info) varies with the client connection charset, I would consider this a bug. It works with show full processlist. The query is totally legal (binary data is a fact of life – the same would happen with an insert of e.g. a PNG image into a blob column) and it should be possible to monitor any legal query. (In my case the perl script runs with character_set_client latin1 and character_set_connection latin1 – according to documentation 0x80 is a legal character in MySQL's latin1 (the Euro sign), so the query string is not only valid as binary data, but as text). I see the warning, but I didn't do anything wrong, I think. The warning is helpful in diagnosing, what goes wrong in the Info column, but not helpful for seeing the running queries. Or for diagnosing, what queries an application is sending to the server (I detected the problem, because an application sent an UTF-8 byte sequence in a comment field on table creation).
          bar Alexander Barkov made changes -
          Assignee Alexander Barkov [ bar ]

          Yes, the string is truncated before it is sent to client. It's truncated when it's stored in the INFORMATION_SCHEMA.PROCESSLIST table (well, not “stored” per se, as there is no real table, but something similar happens anyway). Because the column is utf8 and it can only return valid utf8 data.

          This issue cannot be possibly solved completely. INFORMATION_SCHEMA.PROCESSLIST returns the utf8 representation of the statement. And it is possible to send a statement that cannot be represented in utf8. We will never be able to claim that INFORMATION_SCHEMA.PROCESSLIST always shows all statements in their original unmodified form.

          But truncation is wrong, the correct behavior is to replace invalid characters with question marks. We have fixed it in 10.1 — the query string in INFORMATION_SCHEMA.PROCESSLIST is not truncated anymore.

          serg Sergei Golubchik added a comment - Yes, the string is truncated before it is sent to client. It's truncated when it's stored in the INFORMATION_SCHEMA.PROCESSLIST table (well, not “stored” per se, as there is no real table, but something similar happens anyway). Because the column is utf8 and it can only return valid utf8 data. This issue cannot be possibly solved completely. INFORMATION_SCHEMA.PROCESSLIST returns the utf8 representation of the statement. And it is possible to send a statement that cannot be represented in utf8. We will never be able to claim that INFORMATION_SCHEMA.PROCESSLIST always shows all statements in their original unmodified form. But truncation is wrong, the correct behavior is to replace invalid characters with question marks. We have fixed it in 10.1 — the query string in INFORMATION_SCHEMA.PROCESSLIST is not truncated anymore.
          ralfneubauer2 Ralf Neubauer added a comment -

          Wouldn't it be more useful for some use cases to add columns Info_binary and Info_character_set with the binary octet string and the corresponding encoding as used by the client having started the query? This way no information would get lost, you could even extract binary data which made the query run slow, as for example a PNG bytestream.

          ralfneubauer2 Ralf Neubauer added a comment - Wouldn't it be more useful for some use cases to add columns Info_binary and Info_character_set with the binary octet string and the corresponding encoding as used by the client having started the query? This way no information would get lost, you could even extract binary data which made the query run slow, as for example a PNG bytestream.
          serg Sergei Golubchik made changes -
          Fix Version/s 10.0 [ 16000 ]

          Yes, you're completely right. We were discussing this issue internally just now and this idea was one of the possibilities. Another one would be to print binary strings in hex, like select sleep(30+0*x'8081828384'). Would that work for you?

          serg Sergei Golubchik added a comment - Yes, you're completely right. We were discussing this issue internally just now and this idea was one of the possibilities. Another one would be to print binary strings in hex, like select sleep(30+0*x'8081828384') . Would that work for you?
          ralfneubauer2 Ralf Neubauer added a comment -

          Hex strings would work technically, but they make it harder to see what is going on, if you encounter them – I envision myself decoding 1000s of hex strings manually, and then maybe the character encoding of this decoded hex strings, while looking at processlists. The contents of strings are as important as the queries themselves, to see what the application is doing now, and as a German whose umlauts tend to poison text as 'binary' (depending on the circumstances), I fear many interesting strings could get 'hex-encrypted' and hardly readable. It certainly helps if the contents are human readable most of the time, even if they contain strange bytes. I think it would be easier to decode the complete statement text than to decode a handful of contained hex strings. But if there would be a 'human readable' column (UTF8 with question marks) and a 'machine readable' column (with hex strings), both would be possible at the same time.

          ralfneubauer2 Ralf Neubauer added a comment - Hex strings would work technically, but they make it harder to see what is going on, if you encounter them – I envision myself decoding 1000s of hex strings manually, and then maybe the character encoding of this decoded hex strings, while looking at processlists. The contents of strings are as important as the queries themselves, to see what the application is doing now, and as a German whose umlauts tend to poison text as 'binary' (depending on the circumstances), I fear many interesting strings could get 'hex-encrypted' and hardly readable. It certainly helps if the contents are human readable most of the time, even if they contain strange bytes. I think it would be easier to decode the complete statement text than to decode a handful of contained hex strings. But if there would be a 'human readable' column (UTF8 with question marks) and a 'machine readable' column (with hex strings), both would be possible at the same time.
          bar Alexander Barkov added a comment - - edited

          Ralf, we are thinking of implementing your idea with adding a new column INFO_BINARY
          into INFORMATION_SCHEMA.PROCESSLIST. This change is fine for 10.1.
          But we're afraid it's too late to change structures of INFORMATION_SCHEMA tables in 10.0, which is GA.

          We can still add the new column into 10.0, under conditional compilation.
          So one will be able to build MariaDB-10.0 with the new column support using an extra command line parameter, like this:

          cmake -DPROCESSLIST_INFO_BINARY=1
          make
          make install

          Would this solve the problem for you? Or, do you use binary packages?

          bar Alexander Barkov added a comment - - edited Ralf, we are thinking of implementing your idea with adding a new column INFO_BINARY into INFORMATION_SCHEMA.PROCESSLIST. This change is fine for 10.1. But we're afraid it's too late to change structures of INFORMATION_SCHEMA tables in 10.0, which is GA. We can still add the new column into 10.0, under conditional compilation. So one will be able to build MariaDB-10.0 with the new column support using an extra command line parameter, like this: cmake -DPROCESSLIST_INFO_BINARY=1 make make install Would this solve the problem for you? Or, do you use binary packages?
          ralfneubauer2 Ralf Neubauer added a comment -

          We use binary packages, but I think we can live with the "show full processlist; and extract the interesting row in the client" workaround or the question marks, until 10.1 is released.

          ralfneubauer2 Ralf Neubauer added a comment - We use binary packages, but I think we can live with the "show full processlist; and extract the interesting row in the client" workaround or the question marks, until 10.1 is released.

          Okey, I'm changing "fix version" to 10.1 then. Thanks.

          bar Alexander Barkov added a comment - Okey, I'm changing "fix version" to 10.1 then. Thanks.
          bar Alexander Barkov made changes -
          Fix Version/s 10.1 [ 16100 ]
          Fix Version/s 10.0 [ 16000 ]
          bar Alexander Barkov made changes -
          Assignee Alexander Barkov [ bar ] Sergei Golubchik [ serg ]
          Status Open [ 1 ] In Review [ 10002 ]
          serg Sergei Golubchik made changes -
          Assignee Sergei Golubchik [ serg ] Alexander Barkov [ bar ]
          Status In Review [ 10002 ] Stalled [ 10000 ]
          bar Alexander Barkov made changes -
          Component/s OTHER [ 10125 ]
          Fix Version/s 10.1.5 [ 18813 ]
          Fix Version/s 10.1 [ 16100 ]
          Resolution Fixed [ 1 ]
          Status Stalled [ 10000 ] Closed [ 6 ]
          serg Sergei Golubchik made changes -
          Component/s Character Sets [ 10801 ]
          Component/s OTHER [ 10125 ]
          ratzpo Rasmus Johansson (Inactive) made changes -
          Workflow MariaDB v2 [ 60167 ] MariaDB v3 [ 67377 ]
          serg Sergei Golubchik made changes -
          Workflow MariaDB v3 [ 67377 ] MariaDB v4 [ 148915 ]

          People

            bar Alexander Barkov
            ralfneubauer2 Ralf Neubauer
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.