[PonyORM-list] A simple query going wrong

Alexander Kozlovsky alexander.kozlovsky at gmail.com
Tue May 19 15:05:26 UTC 2015


Hi Matthew!

Thanks for the reporting - this was a bug, and I just fixed it and upload
the new version of code to GitHub.

> Why do these generate different SQL?

In the first query Pony tried to optimize the query by replacing subselect
with LEFT JOIN+GROUP BY+HAVING, because for many databases subselects are
less effective then joins. Pony can do such optimization if an expression
inside the aggregate function is just an attribute chain, like
sum(customer.orders.products.price). If the subquery is expressed  as a
full-blown generator, Pony cannot apply such optimization yet, and this is
the reason why the second query still uses subselect instead of the left
join.

As it turns out, this optimization had a bug which manifests itself when
the query requires the second level of aggregation. In your query you have
two different count() - one is for k.users and other is for the entire
query result. In such case it is incorrect to put both counts into the same
query. Instead of this, it is necessary to wrap the main query into another
query, which do second level of aggregation.

Now, after the bug is fixed, the first (correctly optimized) SQL query
should look something like

SELECT COUNT(*)
FROM (
  SELECT "k".id
  FROM "keyword" "k"
    LEFT JOIN "keyword_user" "t-1"
      ON "k"."id" = "t-1"."keyword"
  WHERE "k"."scheduled" < %(p1)s
  GROUP BY "k"."id"
  HAVING COUNT(DISTINCT "t-1"."user") > 0
)

and the second (not optimized) query should still looks as before:

SELECT COUNT(*)
FROM "keyword" "k"
WHERE (
    SELECT COUNT(DISTINCT "t-1"."user")
    FROM "keyword_user" "t-1"
    WHERE "k"."id" = "t-1"."keyword"
    ) > 0
  AND "k"."scheduled" < %(p1)s

But the result of these queries should be the same. Maybe some time in the
future Pony will be able to optimize the second query as well.


Regards,
Alexander Kozlovsky



On Tue, May 19, 2015 at 10:06 AM, Matthew Bell <matthewrobertbell at gmail.com>
wrote:

> Hi,
>
> The problematic query I have is:
>
> In [9]: count(k for k in Keyword if count(k.users) > 0 and k.scheduled <
> datetime.date.today())
> SELECT COUNT(*)
> FROM "keyword" "k"
>   LEFT JOIN "keyword_user" "t-1"
>     ON "k"."id" = "t-1"."keyword"
> WHERE "k"."scheduled" < %(p1)s
> GROUP BY "k"."id"
> HAVING COUNT(DISTINCT "t-1"."user") > 0
> {'p1':datetime.date(2015, 5, 19)}
>
> It is always only returning one result, I think because of the distinct.
>
> On the other hand, this query works fine:
>
> In [3]: count(k for k in Keyword if count(u for u in k.users) > 0 and
> k.scheduled < datetime.date.today())
> GET CONNECTION FROM THE LOCAL POOL
> SWITCH TO AUTOCOMMIT MODE
> SELECT COUNT(*)
> FROM "keyword" "k"
> WHERE (
>     SELECT COUNT(DISTINCT "t-1"."user")
>     FROM "keyword_user" "t-1"
>     WHERE "k"."id" = "t-1"."keyword"
>     ) > 0
>   AND "k"."scheduled" < %(p1)s
> {'p1':datetime.date(2015, 5, 19)}
>
> Why do these generate different SQL?
>
> Thanks
>
> --
> Regards,
>
> Matthew Bell
>
> _______________________________________________
> ponyorm-list mailing list
> ponyorm-list at ponyorm.com
> /ponyorm-list
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: </ponyorm-list/attachments/20150519/ba9890cb/attachment.html>


More information about the ponyorm-list mailing list