[PonyORM-list] Counting appearances of a row value

Alexander Kozlovsky alexander.kozlovsky at gmail.com
Wed Nov 5 08:56:27 UTC 2014


Let me know if index-only scan will give you significant performance
increase.

Regards,
Alexander

On Wed, Nov 5, 2014 at 11:49 AM, Alexander Kozlovsky <
alexander.kozlovsky at gmail.com> wrote:

> Hi, Matthew!
>
> > Is this the most efficient method?
>
> Almost. You can use `models.count()` instead of `models.count(r)`, this
> will produce `COUNT(*)` instead of `COUNT(DISTINCT "r"."id")`, which may
> be a bit more efficient here.
>
> In order to speed up this query, you can do the following:
>
> 1) Add index on r.date column. This can be done in pony by adding
> `index=True` option to the attribute and doing
> db.generate_mapping(create_tables=True). Or you can execute "CREATE INDEX
> ON MyTable(date)" manually.
>
> 2) Perform vacuum on you PostgreSQL database. If you use PostgreSQL>=9.2,
> this will allow PostgreSQL to use index-only scan:
> https://wiki.postgresql.org/wiki/Index-only_scans
>
>
> On Wed, Nov 5, 2014 at 11:27 AM, Matthew Bell <matthewrobertbell at gmail.com
> > wrote:
>
>> More followup:
>>
>> SELECT "r"."date", COUNT(DISTINCT "r"."id")
>> FROM "result" "r"
>> WHERE "r"."date" >= %(p1)s
>> GROUP BY "r"."date"
>> {'p1':datetime.date(2014, 10, 6)}
>>
>> Is the SQL generated, and it's taking 51 seconds to run.
>>
>> On 5 November 2014 08:12, Matthew Bell <matthewrobertbell at gmail.com>
>> wrote:
>>
>>> Of course, as soon as I send the email, I find something that works:
>>>
>>> models.select((r.date, models.count(r)) for r in models.Result if r.date
>>> >= datetime.date.today() - datetime.timedelta(days=10))[:]
>>>
>>> Is this the most efficient method?
>>>
>>> On 5 November 2014 08:02, Matthew Bell <matthewrobertbell at gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have naively tried to execute this query:
>>>>
>>>> models.select((r.date, models.count(r.date)) for r in models.Result if
>>>> r.date >= datetime.date.today() -
>>>> datetime.timedelta(days=10)).without_distinct()
>>>>
>>>> With the aim of counting how many times each date appears. I was doing
>>>> this with collections.Counter, but this doesn't scale to millions of
>>>> results (far too much CPU and memory used). What query can I use to make
>>>> the database do the work?
>>>>
>>>> Thanks
>>>> --
>>>> Regards,
>>>>
>>>> Matthew Bell
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>>
>>> Matthew Bell
>>>
>>
>>
>>
>> --
>> Regards,
>>
>> Matthew Bell
>>
>> _______________________________________________
>> ponyorm-list mailing list
>> ponyorm-list at ponyorm.com
>> /ponyorm-list
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: </ponyorm-list/attachments/20141105/7874112d/attachment.html>


More information about the ponyorm-list mailing list