[PonyORM-list] trying to optimize the cache
matthewrobertbell at gmail.com
Tue May 27 11:41:01 UTC 2014
Also, when using gevent you should use a pool to limit concurrency.
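For illustration, a bounded gevent pool might look like the sketch below; `fetch()` and the ID range are hypothetical stand-ins for the real scraping job:

```python
# Sketch: limiting gevent concurrency with gevent.pool.Pool.
# fetch() is a hypothetical placeholder for the real scraping function.
import gevent
from gevent.pool import Pool

def fetch(page_id):
    # stand-in for a real network-bound job
    gevent.sleep(0)
    return page_id * 2

pool = Pool(10)  # at most 10 greenlets run concurrently
jobs = [pool.spawn(fetch, n) for n in range(100)]
pool.join()      # wait for every job to finish
results = [job.value for job in jobs]
```

With a `Pool` of size 10, the 100 spawned jobs run at most ten at a time instead of all at once, which keeps memory and open connections bounded.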
On 27 May 2014 12:35, Matthew Bell <matthewrobertbell at gmail.com> wrote:
> In my experience, this is likely to be a gevent + requests + lxml leak.
> Here's an easy way around it: remove grequests and set up rq -
> http://python-rq.org/ - (it's very easy). Create a simple function that
> takes an ID and does the scraping, then loop over
> ids = xrange(12210, 150000) and schedule a job for each ID, running as many workers as you wish. It may use a little more memory, but it won't leak, because rq cleans up properly (it forks for each job).
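A sketch of that setup is below; `scrape_one()` is a hypothetical stand-in for the real scraping code, and a running redis server is assumed for the enqueue step:

```python
# Sketch of the rq-based approach described above.
# scrape_one() is a hypothetical placeholder for the real scraping job.
def scrape_one(page_id):
    # fetch + parse + store one page; each rq job runs in a fork,
    # so any memory leaked here is reclaimed when the job process exits
    return page_id

if __name__ == "__main__":
    # rq and a reachable redis server are assumed here
    from redis import Redis
    from rq import Queue

    q = Queue(connection=Redis())
    for page_id in range(12210, 150000):
        q.enqueue(scrape_one, page_id)
```

Workers are then started separately with the `rqworker` command, one process per worker, and each pulls jobs from the queue independently.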
> Pony / mysql will have no problems with you doing it this way. It's
> sensible to run the rq workers under supervisor, with a config like:
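A minimal supervisor config along those lines might look like this; the program name, working directory, and process count are placeholder assumptions, not the author's actual config:

```ini
[program:rqworker]
command=rqworker
directory=/path/to/project
numprocs=4
process_name=%(program_name)s_%(process_num)s
autostart=true
autorestart=true
stopsignal=TERM
```

`numprocs=4` runs four workers; supervisor restarts any worker that dies, so the queue keeps draining unattended.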
> You can easily scale it to multiple machines if you wish, just point the
> workers to the same redis and database :)
> On 27 May 2014 10:37, Роман Рубан <ryr1986 at gmail.com> wrote:
>> It's my working site crawler.
>> It consumes 20GB+ of memory.
>> How can I optimize the cache usage?
> Matthew Bell