[PonyORM-list] trying to optimize the cache

Matthew Bell matthewrobertbell at gmail.com
Tue May 27 11:35:14 UTC 2014

In my experience, this is likely to be a gevent + requests + lxml leak.

Here's the easy way to get around it: remove grequests and set up rq
(http://python-rq.org/), which is very easy. Write a simple function that
takes an ID and does the scraping, then loop over
ids = xrange(12210, 150000) and schedule a job for each ID. Run as
many workers as you wish. It may use a little more memory, but it
won't leak, because rq cleans up properly (it forks for each job).
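
A minimal sketch of the approach above, not the actual crawler code. The
file name tasks.py and the names scrape and schedule_all are made up for
illustration, the body of scrape is left as a placeholder, and Redis is
assumed to be running on localhost with default settings. It uses range,
the Python 3 spelling of xrange.

```python
# tasks.py (hypothetical file name). The workers must be able to import
# this module, e.g. by starting `rq worker` from the same directory.


def scrape(page_id):
    """Placeholder job: fetch, parse and store the page for one ID."""
    raise NotImplementedError("put the requests/lxml/Pony code for one ID here")


def schedule_all(enqueue, ids, job="tasks.scrape"):
    """Call enqueue(job, page_id) once per ID; return how many jobs were queued.

    The job is passed as a dotted-path string so that running this file as a
    script does not hand rq a function that lives in __main__.
    """
    count = 0
    for page_id in ids:
        enqueue(job, page_id)
        count += 1
    return count


def main():
    # Imported here so the module can be loaded without redis/rq installed.
    from redis import Redis
    from rq import Queue

    queue = Queue(connection=Redis())  # assumption: Redis on localhost:6379
    queued = schedule_all(queue.enqueue, range(12210, 150000))
    print("queued %d jobs" % queued)


if __name__ == "__main__":
    main()
```

Run the script once to fill the queue, then start as many `rq worker`
processes as you want; each one picks jobs off the same queue.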

Pony and MySQL will have no problems with you doing it this way. It's
sensible to run the rq workers under supervisor.


You can easily scale it to multiple machines if you wish: just point the
workers at the same Redis and database :)

On 27 May 2014 10:37, Роман Рубан <ryr1986 at gmail.com> wrote:

> hello,
> https://gist.github.com/ryr/6a2d8997057a70be7eb3
> it's my working site crawler.
> it consumes 20gb+ of memory
> how to optimize the cache usage?


Matthew Bell