[MDEV-3795] WRONG ORDER BY Created: 2012-10-04  Updated: 2013-09-10  Resolved: 2012-10-29

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 5.5.25-galera
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: roberto spadim Assignee: Unassigned
Resolution: Not a Bug Votes: 1
Labels: None
Environment:

linux


Issue Links:
Relates
relates to MDEV-5007 wrong order by moving from mysql to m... Closed

 Description   

hi, i'm running two queries but they return diferent order by results when they should be the same i will attach files to create tables



 Comments   
Comment by roberto spadim [ 2012-10-04 ]

wrong query:

SELECT tmp_tbl22.* FROM (
SELECT a.pessoa_tipo,a.pessoa_id,a.spavendas_vendedor_tipo,a.spavendas_vendedor_id,a.spavendas_tbl_preco,a.cfop_vendas_id,a.cfop_vendas_id_red,a.spacompra_vendedor_tipo,a.spacompra_vendedor_id,a.cfop_compra_id,a.cfop_compra_id_red,(
select plano_conta_numero FROM ROTAS_ca49da4cb6a34b20343ffc942dd7e8ae WHERE plano_conta_id=a.rota_distribuicao_id AND plano_conta_id_red=a.rota_distribuicao_id_red) AS rota_numero,(
SELECT descricao FROM ROTAS_ca49da4cb6a34b20343ffc942dd7e8ae WHERE plano_conta_id=a.rota_distribuicao_id AND plano_conta_id_red=a.rota_distribuicao_id_red) AS rota_desc,a.rota_distribuicao_sequencia,'j' AS distribuidor_tipo,id AS distribuidor_id,IF(a.pessoa_tipo='f',(
SELECT (pais_entrega || uf_entrega || municipio_entrega || bairro_entrega || endereco_entrega_rua || endereco_entrega_numero || endereco_entrega_complemento) AS tmp_end FROM pessoa_fisica WHERE id=a.pessoa_id),(
SELECT (pais_entrega || uf_entrega || municipio_entrega || bairro_entrega || endereco_entrega_rua || endereco_entrega_numero || endereco_entrega_complemento) AS tmp_end FROM pessoa_juridica WHERE id=a.pessoa_id)) AS end_2,IF(a.pessoa_tipo='f',(
SELECT IF(habilitada!='a','Y','N') FROM pessoa_fisica WHERE id=a.pessoa_id),(SELECT IF(habilitada!='a','Y','N') FROM pessoa_juridica WHERE id=a.pessoa_id)) AS desabilitado,IF(a.pessoa_tipo='f',(SELECT vendas_bloqueada FROM pessoa_fisica WHERE id=a.pessoa_id),(SELECT vendas_bloqueada FROM pessoa_juridica WHERE id=a.pessoa_id)) AS vendas_bloqueada,(SELECT plano_conta_numero FROM ROTAS_ca49da4cb6a34b20343ffc942dd7e8ae WHERE plano_conta_id=a.rota_comercial_id AND plano_conta_id_red=a.rota_comercial_id_red) AS rota_comercial_numero,(SELECT descricao FROM ROTAS_ca49da4cb6a34b20343ffc942dd7e8ae WHERE plano_conta_id=a.rota_comercial_id AND plano_conta_id_red=a.rota_comercial_id_red) AS rota_comercial_descricao,a.rota_comercial_sequencia,a.data_ultima_venda

FROM pessoa_juridica_mov AS a WHERE id=1 AND spavendas_vendedor_id=0

ORDER BY data_ultima_venda
) AS tmp_tbl22 LIMIT 30 OFFSET 0

right query:

SELECT tmp_tbl22.* FROM (
SELECT a.pessoa_tipo,a.pessoa_id,a.spavendas_vendedor_tipo,a.spavendas_vendedor_id,a.spavendas_tbl_preco,a.cfop_vendas_id,a.cfop_vendas_id_red,a.spacompra_vendedor_tipo,a.spacompra_vendedor_id,a.cfop_compra_id,a.cfop_compra_id_red,(
select plano_conta_numero FROM ROTAS_ca49da4cb6a34b20343ffc942dd7e8ae WHERE plano_conta_id=a.rota_distribuicao_id AND plano_conta_id_red=a.rota_distribuicao_id_red) AS rota_numero,(
SELECT descricao FROM ROTAS_ca49da4cb6a34b20343ffc942dd7e8ae WHERE plano_conta_id=a.rota_distribuicao_id AND plano_conta_id_red=a.rota_distribuicao_id_red) AS rota_desc,a.rota_distribuicao_sequencia,'j' AS distribuidor_tipo,id AS distribuidor_id,IF(a.pessoa_tipo='f',(
SELECT (pais_entrega || uf_entrega || municipio_entrega || bairro_entrega || endereco_entrega_rua || endereco_entrega_numero || endereco_entrega_complemento) AS tmp_end FROM pessoa_fisica WHERE id=a.pessoa_id),(
SELECT (pais_entrega || uf_entrega || municipio_entrega || bairro_entrega || endereco_entrega_rua || endereco_entrega_numero || endereco_entrega_complemento) AS tmp_end FROM pessoa_juridica WHERE id=a.pessoa_id)) AS end_2,IF(a.pessoa_tipo='f',(
SELECT IF(habilitada!='a','Y','N') FROM pessoa_fisica WHERE id=a.pessoa_id),(SELECT IF(habilitada!='a','Y','N') FROM pessoa_juridica WHERE id=a.pessoa_id)) AS desabilitado,IF(a.pessoa_tipo='f',(SELECT vendas_bloqueada FROM pessoa_fisica WHERE id=a.pessoa_id),(SELECT vendas_bloqueada FROM pessoa_juridica WHERE id=a.pessoa_id)) AS vendas_bloqueada,(SELECT plano_conta_numero FROM ROTAS_ca49da4cb6a34b20343ffc942dd7e8ae WHERE plano_conta_id=a.rota_comercial_id AND plano_conta_id_red=a.rota_comercial_id_red) AS rota_comercial_numero,(SELECT descricao FROM ROTAS_ca49da4cb6a34b20343ffc942dd7e8ae WHERE plano_conta_id=a.rota_comercial_id AND plano_conta_id_red=a.rota_comercial_id_red) AS rota_comercial_descricao,a.rota_comercial_sequencia,a.data_ultima_venda

FROM pessoa_juridica_mov AS a WHERE id=1 AND spavendas_vendedor_id=0

) AS tmp_tbl22
ORDER BY data_ultima_venda
LIMIT 30 OFFSET 0

Comment by roberto spadim [ 2012-10-04 ]

wrong result (HTML):
www.spadim.com.br/wrong.html

right result (HTML):
www.spadim.com.br/right.html

Comment by roberto spadim [ 2012-10-04 ]

files attached
please check
the result should be many "0000-00-00 00:00:00" after others datetimes, but it's returning some 0000, and some dates, after some 000 again... something is wrong

but it execute nice if order by is outside of
select * from ( SELECT ORDER BY) as tmp_tbl
like:
select * from ( SELECT ) as tmp_tbl ORDER BY

Comment by Elena Stepanova [ 2012-10-29 ]

Hi Roberto,

Please point at any MariaDB/MySQL documentation that suggests that the queries should provide identical results.

Comment by roberto spadim [ 2012-10-29 ]

well 0000-00-00 00:00:00 is lower than 2010-08-30 14:36:05, right?
why the 'wrong result' give me a 2010-08-30 14:36:05 before 0000-00-00 00:00:00? that's the wrong part that i'm talking about, the order by is wrong if the ORDER BY is inside the () table alias

check that the first query, is:
SELECT * FROM (SELECT fields FROM tables WHERE where_part) AS tmp_table ORDER BY order_by_part LIMIT limit_part
and the second is:
SELECT * FROM (SELECT fields FROM tables WHERE where_part ORDER BY order_by_part) AS tmp_table LIMIT limit_part

in others words, the diference is the ORDER BY at the end of select, or inside the tmp_table
point 1) this should reproduce diferent ORDER BY results? i think not, i'm wrong? does mariadb read the 'tmp_table' in random order?

check that the explain values are different, for the first (the right one, with order by outside tmp_table) mariadb is show in extra: "Using index condition; Using where; Using filesort"
for the second (with wrong order, with order by inside () tmp_table) mariadb is showing in extra: "Using where"

the wrong isn't well ordered since 0000-00-00 00:00:00 is the lowest possible datetime value (i don't have NULL values), it (0000-00-00 00:00:00) should be before any datetime value, and after it the others datetime values should continue ascending like this (from right result):
0000-00-00 00:00:00
2009-09-18 12:54:21
2009-12-11 17:38:35
2010-08-30 14:36:05
2011-05-23 23:11:57
2011-12-07 10:21:07
2012-05-15 07:58:21
2012-06-26 12:58:34

and not like this (from wrong result):
0000-00-00 00:00:00
2011-05-23 23:11:57 <---- OK it's bigger than 0000-00-00
2012-06-26 12:58:34
2009-09-18 12:54:21 <---- wrong, it's lower than 2012-06-26 12:58:34!
0000-00-00 00:00:00 <---- wrong, it's lower than 2009-09-18 12:54:21!
2011-12-07 10:21:07
0000-00-00 00:00:00 <---- wrong, it's lower than 2011-12-07 10:21:07!
0000-00-00 00:00:00
0000-00-00 00:00:00

understood the wrong result of order by? i didn't found anything in manual that could explain this, the only situation that i can understand that a order by should reproduce diferent results is the random read of tmp_table instead the ordered result

Comment by Dmitriy Pavlichenko [ 2012-10-29 ]

Hello!

I got same situtaion. Our service got a bug after replacing MySQL with MariaDB.
We have an automaitc engine that takes any query and wraps it with something like

SELECT t.* FROM ( ... ) t LIMIT 0, 10

to extract a certain page of data.
Of course, a query can use ORDER BY..
Now all these queries work wrong.
However, it all was OK in MySQL.

MariaDB is positioned like 100% replacement for MySQL.
And this is really a bug, isn't it?

Comment by roberto spadim [ 2012-10-29 ]

in my opnion, it's not a bug, but a miss of documentation
the last query don't tell if data should be ordered, in other words if the query inside ( ... ) return a bad order or a good one, the last select don't know and could reorder (if it's a performace optimization)...
i'm rewriting the queries but could be nice an 'optimization' to get the order inside ( .. ) continue without reorder
check the explain... it don't use file order or index or other comment telling that the query inside (..) is being ordered, in other words, i think it remove the ORDER BY to get 'more speed'... (it's nice, but the query must be rewrite to get the right order)

Comment by Dmitriy Pavlichenko [ 2012-10-29 ]

Sometimes documentation doesn't covery every particular case of functionality.
At the same time if they say MariaDB is a drop-in replacement for MySQL then (as mininum!) my queries must return same result in both databases.
Where am I wrong?

This is the same situation as with browsers: different browsers implement a common standard, but web-programmer usually have to consider which browser is used by current user to show him more or less same interface

Comment by Elena Stepanova [ 2012-10-29 ]

Hi Dmitriy,

MariaDB must return the same result when the result is deterministic (and correct), while your query is non-deterministic by nature. The fact that it works in MySQL just like you want it to is your pure luck which will wear off sooner or later, and in your terms MySQL will stop being a drop-in replacement to itself.

Requiring MariaDB to return the exact same result in this case is the same as claiming that it should always return the same order of records as MySQL for queries without order by, like select * from t1.

Comment by roberto spadim [ 2012-10-29 ]

from mysql manual: http://dev.mysql.com/doc/refman/5.6/en/select.html

If ORDER BY occurs within a subquery and also is applied in the outer query, the outermost ORDER BY takes precedence. For example, results for the following statement are sorted in descending order, not ascending order:

(SELECT ... ORDER BY a) ORDER BY a DESC;
Use of column positions is deprecated because the syntax has been removed from the SQL standard.

ok the outermost order by works, but the manual don't explain what happen without the outermost order by, i think it should work with the inner most don't?

if you try
1)select * from (select ... order by fields) as tmp
and
2)select * from (select ....) as tmp

they are the same explain and the same data, there's no order by...

Comment by roberto spadim [ 2012-10-29 ]

the point elena is:

select * from ( a table where the data is explicity ordered)

should return
(a table where the data is explicity ordered)

and not
(a table where the data can or not be ordered)

check that we are talking about:
select * from (select * from t order by c)
and not
select * from (select * from t)

the first one should be the same as
select * from (select * from t) order by c

but it's not, why? that's a miss documented thing that i think could be a otimization of how mariadb 'read' results from the (select * from t) part

i don't know if mysql is wrong or right, but it's more human readable to undestant that
select * from (select * from t order by c)
will return results that will be ordered by c column, and not without order like i'm seeing

ok theres a work around rewriting the query to
select * from (select * from t) order by c

but users must rewrite queries... (that's the problem)

Comment by Dmitriy Pavlichenko [ 2012-10-29 ]

I agree, that If you look from the point of view of the pure SQL,
then SELECT statement doesn't have to care about order if it is not specified explicitly.
That's because pure SQL works with pure sets of rows that don't have any order by default.

But can someone tell me what to do in following cases?

1. I got an arbitrary SQL-query. I need to extract a given range (LIMIT x, dx) of data from it.
This query can already contain a LIMIT and an ORDER BY parts.

2. Let we have a query like this:
SELECT * FROM (SELECT a, b FROM t1 ORDER BY b) GROUP BY a

I know then while grouping MySQL uses first met value for those columns that are not grouped.
So this query in MySQL will give me the least value of "b" corresponding to each "a". While
MariaDB's behaviour is unpredictable. The problem is that subquery is also quite arbitrary and
it's very hard to modify it on the fly. It will need to do a parsing,

Thanks

Comment by roberto spadim [ 2012-10-29 ]

yes, that's the point
i see it as a miss of documentation and not as a 'bug', it's easier to tell users to use order by on 'last' select, and not in the subquery

but i seed this feature as a easy way to execute html page
user give you a select, and you create many pages with LIMIT 0,30... LIMIT 30,30... LIMIT 60,30 ... etc
like in php:

$SQL="SELECT * FROM ( $USER_SQL ) LIMIT $page_start,30";

that's why is more human readable to see the subquery materialization, like:
create table tmp select * from t order by c
and after a
select * from tmp

a work around to my sql example could be
1) create table
2) select from temporary table with limits
3) drop table

Comment by Elena Stepanova [ 2012-10-29 ]

>> I know then while grouping MySQL uses first met value for those columns that are not grouped.

No, it doesn't. Again, you rely on pure luck; and now, this is explicitly documented in MySQL manual:

MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. <..> You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.

http://dev.mysql.com/doc/refman/5.6/en/group-by-hidden-columns.html

Comment by Dmitriy Pavlichenko [ 2012-10-29 ]

Elena.

First of all thank you, because I really didn't know that.
But here is one more question.

Roberto suggests this way:

create temporary table tmp select * from t order by c
select * from tmp

The last SELECT really gives an ordered set, but now I'm not sure - maybe I can't rely on it too?
Is it guaranteed that SELECT will return rows from a table in their phisical order?

Comment by roberto spadim [ 2012-10-29 ]

no, it's not guaranteed that select will return in the phisical order, who can tell this is the table engine
but when you have a query inside another query, well i think it could be in the 'original' order, example:
select * from (select * from t order by c)
i think it could be in the original order, but... maybe optimizer want a 'faster' result, and show it in any other order... no problem, just explain it to users, and show how to get the right order
check that in the example of php:
$SQL="SELECT * FROM ($USER_SQL) AS tmp_tbl";
i don't know the order, and user may give a ORDER BY, that should be in the right order

Comment by roberto spadim [ 2012-10-29 ]

may be the point here should be a option to user to tune optimizer to materialize the $USER_SQL and execute it in the 'right' orders instead of execute it in last SELECT order (no order)

Comment by Elena Stepanova [ 2012-10-29 ]

Hi Dmitriy,

>> Is it guaranteed that SELECT will return rows from a table in their phisical order?

In general, you cannot rely on it, either. I think it's better to make an effort and convert the logic so it's truly deterministic and you don't have to worry about it again.

Comment by Dmitriy Pavlichenko [ 2012-10-29 ]

Ok, I see, thanks...

But now i've got a really hard work to do

And I don't see a way to do minimum modifications without syntactic analysis,
otherwise I will have to rewrite a lot of queries that are already using this logic. Awful.

Comment by roberto spadim [ 2012-10-30 ]

from other comment...
just a opnion , i don't know if its the today result...
for this query:
SELECT * FROM (SELECT a, b FROM t1 ORDER BY b) GROUP BY a
the result of mariadb should be near this:
SELECT * FROM (SELECT a, b FROM t1) GROUP BY a ORDER BY a

in other words, doesn't matter the "order by b", since you put a 'explicit' group by without order, in this case a 'implicit' order by is assumed like the group by order
in this case "ORDER BY b" is useless, and a 'optimization' could be done here removing the order by b without problem... but a order by a is assumed (since grouping normally make a order by... but it can't do too since the order isn't explicit...)

but for the others cases where we don't have group by or order by outside the select... well the inside order by should work...
i didn't tested but maybe some queries that use GROUP_CONCAT may have problem with this order by being removed...

Comment by roberto spadim [ 2012-10-30 ]

for now...
i think a explain about 'implicit' remove or add of order by could be nice to be documented
this is nice, since we don't know if only doing group by will order the result or not, since it's not a explicit order by, maybe optimizer can avoid order... but in this cases the 'implicit' database option could/should be explained to users understand the problem of don't explicity explain how result should be ordered

Comment by Dmitriy Pavlichenko [ 2012-10-30 ]

Iteresting that this query work exactly as expected:

SELECT * FROM
( SELECT a, b FROM t1 GROUP BY a, b DESC
) t
LIMIT 0, 10

Obviously, because GROUP BY is not "optimized" by optimizer

Comment by roberto spadim [ 2012-10-30 ]

i know it´s not right place, but could you test it with
SELECT * FROM
( SELECT a, b FROM t1 GROUP BY a, b DESC ORDER BY b ASC
) t
LIMIT 0, 10

so we could check if the order by is know by the LIMIT 0,10 query

Comment by Dmitriy Pavlichenko [ 2012-10-30 ]

Works fine. But in fact there's no need for addional ORDER BY with GROUP BY in this case.

From docs (http://dev.mysql.com/doc/refman/5.6/en/select.html):

If you use GROUP BY, output rows are sorted according to the GROUP BY columns as if you had an ORDER BY for the same columns. To avoid the overhead of sorting that GROUP BY produces, add ORDER BY NULL:

SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY NULL;

MySQL extends the GROUP BY clause so that you can also specify ASC and DESC after columns named in the clause:

SELECT a, COUNT(b) FROM test_table GROUP BY a DESC;

Comment by roberto spadim [ 2012-10-30 ]

and it returned with order by b? or by a? if it returned from by b, the parser didn't ignored the implicity order by of group by... in other words why a order by is ignored and a order by with group by isn't?

Comment by Dmitriy Pavlichenko [ 2012-10-30 ]

SELECT * FROM
( SELECT a, b FROM t1 GROUP BY a, b DESC ORDER BY b ASC
) t
LIMIT 0, 10

Result is ordered by "b ASC". So ORDER BY is not ignored.

Comment by roberto spadim [ 2012-10-30 ]

now try this
alter table t1 order by a,b desc;
SELECT * FROM
( SELECT a, b FROM t1 ORDER BY b ASC
) t
LIMIT 0, 10

and test if it's ordered by a,b desc or b asc

Comment by Dmitriy Pavlichenko [ 2012-10-30 ]

It's ordered by a,b desc.
ORDER BY b ASC is ignored

Comment by roberto spadim [ 2012-10-30 ]

that's the point
why it's ignored, if it's explicit? ok it's not explicit for the out select, but it's explicit for one select...
i think it's as undocumented, or a bug... i prefer think as a undocumented, and show a workaround to make it work... but for some guy that changed from mysql to mariadb, will consider it as a bug...

Generated at Thu Feb 08 06:51:15 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.