<div dir="ltr">Hi, Matthew!<br><br><span style="font-family:arial,sans-serif;font-size:13px">> Is this the most efficient method?<br></span><br>Almost. You can use `<span style="font-family:arial,sans-serif;font-size:13px">models.count()` instead of `</span><span style="font-family:arial,sans-serif;font-size:13px">models.count(r)`, this will produce `COUNT(*)` instead of `</span><font face="arial, sans-serif">COUNT(DISTINCT "r"."id")`, which may be a bit more efficient here.</font><br><br><font face="arial, sans-serif">In order to speed up this query, you can do the following:</font><br><font face="arial, sans-serif"><br></font><div><font face="arial, sans-serif">1) Add index on r.date column. This can be done in pony by adding `index=True` option to the attribute and doing db.generate_mapping(create_tables=True). Or you can execute "CREATE INDEX ON MyTable(date)" manually.</font><br><font face="arial, sans-serif"><br></font></div><div><font face="arial, sans-serif">2) Perform vacuum on you PostgreSQL database. If you use PostgreSQL>=9.2, this will allow PostgreSQL to use index-only scan: </font><font face="arial, sans-serif"><a href="https://wiki.postgresql.org/wiki/Index-only_scans">https://wiki.postgresql.org/wiki/Index-only_scans</a></font><br><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Nov 5, 2014 at 11:27 AM, Matthew Bell <span dir="ltr"><<a href="mailto:matthewrobertbell@gmail.com" target="_blank">matthewrobertbell@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>More followup: <br><br>SELECT "r"."date", COUNT(DISTINCT "r"."id")<br>FROM "result" "r"<br>WHERE "r"."date" >= %(p1)s<br>GROUP BY "r"."date"<br>{'p1':datetime.date(2014, 10, 6)}<br><br></div>Is the SQL generated, and it's taking 51 seconds to run.<br></div><div class="gmail_extra"><div><div class="h5"><br><div class="gmail_quote">On 5 November 2014 08:12, Matthew Bell <span dir="ltr"><<a href="mailto:matthewrobertbell@gmail.com" target="_blank">matthewrobertbell@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Of course, as soon as I send the email, I find something that works:<br><br>models.select((r.date, models.count(r)) for r in models.Result if r.date >= datetime.date.today() - datetime.timedelta(days=10))[:]<br><br></div>Is this the most efficient method?<br><div><div class="gmail_extra"><div><div><br><div class="gmail_quote">On 5 November 2014 08:02, Matthew Bell <span dir="ltr"><<a href="mailto:matthewrobertbell@gmail.com" target="_blank">matthewrobertbell@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hi, <br><br>I have naively tried to execute this query:<br><br>models.select((r.date, models.count(r.date)) for r in models.Result if r.date >= datetime.date.today() - datetime.timedelta(days=10)).without_distinct()<br><br></div>With the aim of counting how many times each date appears. I was doing this with collections.Counter, but this doesn't scale to millions of results (far too much CPU and memory used). What query can I use to make the database do the work?<br><br>Thanks<span><font color="#888888"><br><div><div>-- <br><div>Regards,<br><br>Matthew Bell<br></div>
</div></div></font></span></div>
</blockquote></div><br><br clear="all"><br></div></div><span><font color="#888888">-- <br><div>Regards,<br><br>Matthew Bell<br></div>
</font></span></div></div></div>
</blockquote></div><br><br clear="all"><br></div></div><span class="HOEnZb"><font color="#888888">-- <br><div>Regards,<br><br>Matthew Bell<br></div>
</font></span></div>
<br>_______________________________________________<br>
ponyorm-list mailing list<br>
<a href="mailto:ponyorm-list@ponyorm.org">ponyorm-list@ponyorm.org</a><br>
<a href="/ponyorm-list" target="_blank">/ponyorm-list</a><br>
<br></blockquote></div><br></div>