[PonyORM-list] Counting appearances of a row value
    Matthew Bell 
    matthewrobertbell at gmail.com
       
    Wed Nov  5 09:05:16 UTC 2014
    
    
  
I already had an index on date, the change to models.count() made it very
fast, thank you!
On 5 November 2014 08:56, Alexander Kozlovsky <alexander.kozlovsky at gmail.com
> wrote:
> Let me know if index-only scan will give you significant performance
> increase.
>
> Regards,
> Alexander
>
> On Wed, Nov 5, 2014 at 11:49 AM, Alexander Kozlovsky <
> alexander.kozlovsky at gmail.com> wrote:
>
>> Hi, Matthew!
>>
>> > Is this the most efficient method?
>>
>> Almost. You can use `models.count()` instead of `models.count(r)`, this
>> will produce `COUNT(*)` instead of `COUNT(DISTINCT "r"."id")`, which may
>> be a bit more efficient here.
>>
>> In order to speed up this query, you can do the following:
>>
>> 1) Add index on r.date column. This can be done in pony by adding
>> `index=True` option to the attribute and doing
>> db.generate_mapping(create_tables=True). Or you can execute "CREATE INDEX
>> ON MyTable(date)" manually.
>>
>> 2) Perform vacuum on you PostgreSQL database. If you use PostgreSQL>=9.2,
>> this will allow PostgreSQL to use index-only scan:
>> https://wiki.postgresql.org/wiki/Index-only_scans
>>
>>
>> On Wed, Nov 5, 2014 at 11:27 AM, Matthew Bell <
>> matthewrobertbell at gmail.com> wrote:
>>
>>> More followup:
>>>
>>> SELECT "r"."date", COUNT(DISTINCT "r"."id")
>>> FROM "result" "r"
>>> WHERE "r"."date" >= %(p1)s
>>> GROUP BY "r"."date"
>>> {'p1':datetime.date(2014, 10, 6)}
>>>
>>> Is the SQL generated, and it's taking 51 seconds to run.
>>>
>>> On 5 November 2014 08:12, Matthew Bell <matthewrobertbell at gmail.com>
>>> wrote:
>>>
>>>> Of course, as soon as I send the email, I find something that works:
>>>>
>>>> models.select((r.date, models.count(r)) for r in models.Result if
>>>> r.date >= datetime.date.today() - datetime.timedelta(days=10))[:]
>>>>
>>>> Is this the most efficient method?
>>>>
>>>> On 5 November 2014 08:02, Matthew Bell <matthewrobertbell at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have naively tried to execute this query:
>>>>>
>>>>> models.select((r.date, models.count(r.date)) for r in models.Result if
>>>>> r.date >= datetime.date.today() -
>>>>> datetime.timedelta(days=10)).without_distinct()
>>>>>
>>>>> With the aim of counting how many times each date appears. I was doing
>>>>> this with collections.Counter, but this doesn't scale to millions of
>>>>> results (far too much CPU and memory used). What query can I use to make
>>>>> the database do the work?
>>>>>
>>>>> Thanks
>>>>> --
>>>>> Regards,
>>>>>
>>>>> Matthew Bell
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>>
>>>> Matthew Bell
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>>
>>> Matthew Bell
>>>
>>> _______________________________________________
>>> ponyorm-list mailing list
>>> ponyorm-list at ponyorm.com
>>> /ponyorm-list
>>>
>>>
>>
>
> _______________________________________________
> ponyorm-list mailing list
> ponyorm-list at ponyorm.com
> /ponyorm-list
>
>
-- 
Regards,
Matthew Bell
-------------- next part --------------
An HTML attachment was scrubbed...
URL: </ponyorm-list/attachments/20141105/57bf261c/attachment.html>
    
    
More information about the ponyorm-list
mailing list