Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-28221

User variables lose coercibility information, leading to unexpected collation errors

Details

    • Task
    • Status: Closed (View Workflow)
    • Minor
    • Resolution: Won't Fix
    • N/A
    • Variables
    • None

    Description

      User variables seem to have a COERCIBILITY() of 2, the same as table columns, even when assigned values with higher COERCIBILITY. This can lead to unexpected errors:

      CREATE TEMPORARY TABLE t (v varchar(100) COLLATE latin1_swedish_ci, w varchar(100) COLLATE latin1_general_ci);
      INSERT INTO t (v,w) VALUES ('foo','bar');
      SELECT v,w INTO @v,@w FROM t;
      SELECT v='foo',v='bar',w='foo',w='bar' FROM t;  # comparison of table columns with literals:  OK because columns have lower COERCIBILITY than literals
      SELECT @v='foo',@v='bar',@w='foo',@w='bar';     # comparison of user variables with literals:  OK because user variables have lower COERCIBILITY than literals
      SET @f := 'foo', @b := 'bar';
      SELECT v=@f,v=@b,w=@f,w=@b FROM t;              # comparison of table columns with user variables; FAILS WITH ERROR:  Illegal mix of collations
      SELECT @v=@f,@v=@b,@w=@f,@w=@b;                 # comparison of user variables with each other; FAILS WITH ERROR:  Illegal mix of collations
      DROP TEMPORARY TABLE t;
      

      Or perhaps more succinctly, the following look like they ought to be equivalent, but they aren't:

      1. SELECT col='value' FROM table;
      2. SET @v := 'value'; SELECT col=@v FROM table;

      I would like to suggest that they SHOULD BE... and that the way to make this work correctly would be for user variables to retain/remember the COERCIBILITY of whatever value they're assigned.
      i.e. in the following:

      SET @a := (SELECT column FROM table), @b := 'literal value';
      SELECT COERCIBILITY(@a), COERCIBILITY(@b);

      Currently the returned coercibility values are (2,2)... but I am suggesting that they should be (2,4) instead, which are the coercibilities of the values that have been assigned to @a and @b respectively.

      Attachments

        Issue Links

          Activity

            I've moved it to a "Task", as the current behavior is clearly intentional (so, not a bug), user variables have "implicit" coercibility, like columns.

            serg Sergei Golubchik added a comment - I've moved it to a "Task", as the current behavior is clearly intentional (so, not a bug), user variables have "implicit" coercibility, like columns.
            Dean T Dean Trower added a comment -

            Thanks Sergei. I was looking for "Feature Request", but didn't see an option for it.

            Dean T Dean Trower added a comment - Thanks Sergei. I was looking for "Feature Request", but didn't see an option for it.
            bar Alexander Barkov added a comment - - edited

            Hello Dean T,

            The current implementation is intentional.
            Rationale:

            • We'd like user variables to work as close as possible to stored routine variables, so they can easily be replaced to each other without having to rewrite statements.
            • Stored routine variables of character string data types have collation derivation (aka coercibility) of IMPLICIT, which in MariaDB is referenced to as the number 2.
            • Fixing user variables to remember the collation derivation from the assigned expressions will lead to unexpected results in many contexts. When an SQL script consist of multiple files including each other with the "source" command, it will be absolutely hard to follow what the derivation of a certain variable is.

            Instead of fixing user variables to have a flexible collation derivation, I'd propose adding a new special syntax to set a collation derivation of ANY expression (including a user variable reference) to a given value.

            Something like this:

            SELECT * FROM t1 WHERE v=@f COERCIBILITY 4;
            

            (the exact syntax is a subject for discussions)

            What do you think? Would this solution be useful for you?

            Thanks.

            bar Alexander Barkov added a comment - - edited Hello Dean T , The current implementation is intentional. Rationale: We'd like user variables to work as close as possible to stored routine variables, so they can easily be replaced to each other without having to rewrite statements. Stored routine variables of character string data types have collation derivation (aka coercibility) of IMPLICIT, which in MariaDB is referenced to as the number 2. Fixing user variables to remember the collation derivation from the assigned expressions will lead to unexpected results in many contexts. When an SQL script consist of multiple files including each other with the "source" command, it will be absolutely hard to follow what the derivation of a certain variable is. Instead of fixing user variables to have a flexible collation derivation, I'd propose adding a new special syntax to set a collation derivation of ANY expression (including a user variable reference) to a given value. Something like this: SELECT * FROM t1 WHERE v=@f COERCIBILITY 4; (the exact syntax is a subject for discussions) What do you think? Would this solution be useful for you? Thanks.
            Dean T Dean Trower added a comment -

            Hi Alexander.

            I suspect that most of the time

            SELECT * FROM t1 WHERE v=@f COERCIBILITY 4;
            

            would not provide too much of an advantage over simply writing:

            SELECT * FROM t1 WHERE v=(@f COLLATE whatever_the_collation_of_v_is);
            

            unless perhaps you're trying to output generic SQL where you don't know in advance the collation of v.

            So while it still may be helpful in some situations to have that syntax available (and I think anything that increases the programmer's fine-grained control is a good idea!), the real issue for me is that the in-equivalency between:

            SELECT * FROM t1 WHERE v=(_something_);

            and

            SET @vFilter:=(_something_);SELECT * FROM t1 WHERE v=@vFilter;

            violates the "principle of least surprise", in that I'd look at them and expect them to work identically.
            (I've been coding in SQL professionally for around 20 years, yet was still caught out by this, which prompted this feature request).

            While the above just generates an unexpected error, you can create (admittedly contrived) situations where there is no error, but the results of the statements might actually be different (which I think is very much worse, in being much harder to detect). e.g.

            SELECT USER()<'Bob'; # collation of USER() used in comparison

            vs

            SET @n:='Bob';SELECT USER()<@n; # collation of @n used in comparison: no error, but result may differ from the above!!!

            Also, as far as I can see, stored routine variables suffer from exactly the same issues, so as far as your first two bullet points, it might be worth considering making those remember the coercibility of whatever's assigned to them as well: i.e. this feature request should apply to both session variables (@var) and stored procedure variables (DECLARE v varchar(20)) also.

            I guess the real question is then the weighing between "How useful a change is this in getting rid of unexpected collation bugs?" vs "How likely is this change to break existing code?"

            If you MySQL developers think that the latter would outweigh the former, then I'll happily defer to your greater wisdom and accept that my suggestion isn't workable.

            (In that case, maybe just update the docs to include an example of the problem, to make developers more aware of it? Perhaps on the "Comparison Operators" page?)

            Dean T Dean Trower added a comment - Hi Alexander. I suspect that most of the time SELECT * FROM t1 WHERE v=@f COERCIBILITY 4; would not provide too much of an advantage over simply writing: SELECT * FROM t1 WHERE v=(@f COLLATE whatever_the_collation_of_v_is); unless perhaps you're trying to output generic SQL where you don't know in advance the collation of v. So while it still may be helpful in some situations to have that syntax available (and I think anything that increases the programmer's fine-grained control is a good idea!), the real issue for me is that the in-equivalency between: SELECT * FROM t1 WHERE v=(_something_); and SET @vFilter:=(_something_); SELECT * FROM t1 WHERE v=@vFilter; violates the "principle of least surprise", in that I'd look at them and expect them to work identically. (I've been coding in SQL professionally for around 20 years, yet was still caught out by this, which prompted this feature request). While the above just generates an unexpected error, you can create (admittedly contrived) situations where there is no error, but the results of the statements might actually be different (which I think is very much worse, in being much harder to detect). e.g. SELECT USER ()< 'Bob' ; # collation of USER () used in comparison vs SET @n:= 'Bob' ; SELECT USER ()<@n; # collation of @n used in comparison: no error, but result may differ from the above!!! Also, as far as I can see, stored routine variables suffer from exactly the same issues, so as far as your first two bullet points, it might be worth considering making those remember the coercibility of whatever's assigned to them as well: i.e. this feature request should apply to both session variables (@var) and stored procedure variables (DECLARE v varchar(20)) also. I guess the real question is then the weighing between "How useful a change is this in getting rid of unexpected collation bugs?" vs "How likely is this change to break existing code?" If you MySQL developers think that the latter would outweigh the former, then I'll happily defer to your greater wisdom and accept that my suggestion isn't workable. (In that case, maybe just update the docs to include an example of the problem, to make developers more aware of it? Perhaps on the "Comparison Operators" page?)

            Unfortunately, stored routine variables cannot remember the coercibility. Their behavior is specifically described by the SQL standard, and we have to follow the standard here.

            User variables are not part of the SQL standard, but, as bar wrote above, we tried to make them similar to stored routine variables.

            serg Sergei Golubchik added a comment - Unfortunately, stored routine variables cannot remember the coercibility. Their behavior is specifically described by the SQL standard, and we have to follow the standard here. User variables are not part of the SQL standard, but, as bar wrote above, we tried to make them similar to stored routine variables.
            Dean T Dean Trower added a comment -

            Fair enough.
            Maybe just update the docs then, to include an example of the problem and to make developers more aware of it?
            Perhaps on the "Comparison Operators" page?

            Dean T Dean Trower added a comment - Fair enough. Maybe just update the docs then, to include an example of the problem and to make developers more aware of it? Perhaps on the "Comparison Operators" page?

            People

              serg Sergei Golubchik
              Dean T Dean Trower
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.