Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7807

information_schema.processlist truncates queries with binary strings

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.0.16
    • 10.1.5
    • Character Sets
    • None

    Description

      When I start a statement with binary data the column INFO of information_schema.processlist is truncated at the first byte > 128. show full processlist shows the full query in it's Info column.

      My test query is generated and run by this perl script:

      perl -e "use DBI; $dbh = DBI->connect(qw(dbi:mysql:host=myhost;database=mydb myuser mypass)); $dbh->do(qq{select sleep(30+0*?)}, undef, join '', map chr, 0..255);"

      While the sleep is still asleep, use

      select * from information_schema.processlist;
      show full processlist;

      or look at the info column specifically:

      select info from information_schema.processlist where state = 'User sleep';

      . With

      select char_length(info), length(info) from information_schema.processlist where state = 'User sleep';

      which results in

      154 154

      you can verify that the values are truncated inside the server, this is not a problem with the client connection or the client.

      For comparison use

      perl -e "use DBI; $dbh = DBI->connect(qw(dbi:mysql:host=myhost;database=mydb myuser mypass)); $dbh->do(qq{select sleep(30+0*?)}, undef, join '', map chr, map $_ % 128, 0..255);"

      and

      select char_length(info), length(info) from information_schema.processlist where state = 'User sleep';

      with the result

      291 291

      The character set settings are:

      'character_set_client', 'utf8'
      'character_set_connection', 'utf8'
      'character_set_database', 'latin1'
      'character_set_filesystem', 'binary'
      'character_set_results', 'utf8'
      'character_set_server', 'latin1'
      'character_set_system', 'utf8'

      Attachments

        Activity

          Yes, you're completely right. We were discussing this issue internally just now and this idea was one of the possibilities. Another one would be to print binary strings in hex, like select sleep(30+0*x'8081828384'). Would that work for you?

          serg Sergei Golubchik added a comment - Yes, you're completely right. We were discussing this issue internally just now and this idea was one of the possibilities. Another one would be to print binary strings in hex, like select sleep(30+0*x'8081828384') . Would that work for you?
          ralfneubauer2 Ralf Neubauer added a comment -

          Hex strings would work technically, but they make it harder to see what is going on, if you encounter them – I envision myself decoding 1000s of hex strings manually, and then maybe the character encoding of this decoded hex strings, while looking at processlists. The contents of strings are as important as the queries themselves, to see what the application is doing now, and as a German whose umlauts tend to poison text as 'binary' (depending on the circumstances), I fear many interesting strings could get 'hex-encrypted' and hardly readable. It certainly helps if the contents are human readable most of the time, even if they contain strange bytes. I think it would be easier to decode the complete statement text than to decode a handful of contained hex strings. But if there would be a 'human readable' column (UTF8 with question marks) and a 'machine readable' column (with hex strings), both would be possible at the same time.

          ralfneubauer2 Ralf Neubauer added a comment - Hex strings would work technically, but they make it harder to see what is going on, if you encounter them – I envision myself decoding 1000s of hex strings manually, and then maybe the character encoding of this decoded hex strings, while looking at processlists. The contents of strings are as important as the queries themselves, to see what the application is doing now, and as a German whose umlauts tend to poison text as 'binary' (depending on the circumstances), I fear many interesting strings could get 'hex-encrypted' and hardly readable. It certainly helps if the contents are human readable most of the time, even if they contain strange bytes. I think it would be easier to decode the complete statement text than to decode a handful of contained hex strings. But if there would be a 'human readable' column (UTF8 with question marks) and a 'machine readable' column (with hex strings), both would be possible at the same time.
          bar Alexander Barkov added a comment - - edited

          Ralf, we are thinking of implementing your idea with adding a new column INFO_BINARY
          into INFORMATION_SCHEMA.PROCESSLIST. This change is fine for 10.1.
          But we're afraid it's too late to change structures of INFORMATION_SCHEMA tables in 10.0, which is GA.

          We can still add the new column into 10.0, under conditional compilation.
          So one will be able to build MariaDB-10.0 with the new column support using an extra command line parameter, like this:

          cmake -DPROCESSLIST_INFO_BINARY=1
          make
          make install

          Would this solve the problem for you? Or, do you use binary packages?

          bar Alexander Barkov added a comment - - edited Ralf, we are thinking of implementing your idea with adding a new column INFO_BINARY into INFORMATION_SCHEMA.PROCESSLIST. This change is fine for 10.1. But we're afraid it's too late to change structures of INFORMATION_SCHEMA tables in 10.0, which is GA. We can still add the new column into 10.0, under conditional compilation. So one will be able to build MariaDB-10.0 with the new column support using an extra command line parameter, like this: cmake -DPROCESSLIST_INFO_BINARY=1 make make install Would this solve the problem for you? Or, do you use binary packages?
          ralfneubauer2 Ralf Neubauer added a comment -

          We use binary packages, but I think we can live with the "show full processlist; and extract the interesting row in the client" workaround or the question marks, until 10.1 is released.

          ralfneubauer2 Ralf Neubauer added a comment - We use binary packages, but I think we can live with the "show full processlist; and extract the interesting row in the client" workaround or the question marks, until 10.1 is released.

          Okey, I'm changing "fix version" to 10.1 then. Thanks.

          bar Alexander Barkov added a comment - Okey, I'm changing "fix version" to 10.1 then. Thanks.

          People

            bar Alexander Barkov
            ralfneubauer2 Ralf Neubauer
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.