[PonyORM-list] trying to optimize the cache
Matthew Bell
matthewrobertbell at gmail.com
Tue May 27 11:41:49 UTC 2014
I misread, disregard the last message.
On 27 May 2014 12:41, Matthew Bell <matthewrobertbell at gmail.com> wrote:
> Also, you should use a pool when using gevent, to limit concurrency.
>
>
> On 27 May 2014 12:35, Matthew Bell <matthewrobertbell at gmail.com> wrote:
>
>> In my experience, this is likely to be a gevent + requests + lxml leak.
>>
>> Here's the easy way to get around it: remove grequests and set up rq -
>> http://python-rq.org/ - (very easy). Create a simple function that takes
>> an ID and does the scraping. Then loop over
>>
>> ids = xrange(12210, 150000)
>>
>> and schedule a job for each ID. Run as many workers as you wish. It may
>> use a little more memory, but it won't leak, because rq cleans up
>> properly (it forks for each job).
>>
>> Pony / mysql will have no problems with you doing it this way. It's
>> sensible to run the rq workers under supervisor, with a config like:
>>
>> [program:rq]
>> directory=/app_folder/
>> command=rqworker
>> process_name=%(process_num)02d
>> numprocs=6
>> autostart=true
>> autorestart=true
>> stopsignal=TERM
>>
>> You can easily scale it to multiple machines if you wish, just point the
>> workers to the same redis and database :)
>>
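The rq setup described above could look roughly like the sketch below. It is an illustration under assumptions, not the code from the gist: the module name `crawl_jobs`, the functions `scrape_page`, `enqueue_all`, and `main`, and the Redis location (localhost, default port) are all hypothetical, and the job body is a placeholder where the real scraping and Pony code would go. It uses Python 3's `range` in place of `xrange`.

```python
"""crawl_jobs.py (hypothetical module name; rq workers must be able to import it)."""


def scrape_page(page_id):
    """Placeholder job: the real version would fetch, parse and store one page.

    The scraping and Pony ORM code from the crawler would go here; this stub
    only returns the ID so the sketch stays self-contained.
    """
    return page_id


def enqueue_all(queue, ids, func_path="crawl_jobs.scrape_page"):
    """Schedule one job per ID and return the number of jobs scheduled.

    `queue` is any object with an rq-style enqueue(func, *args) method.
    The job is passed as a dotted path string so the worker imports it itself.
    """
    count = 0
    for page_id in ids:
        queue.enqueue(func_path, page_id)
        count += 1
    return count


def main():
    # Imported here so the module loads even where redis/rq are not installed.
    from redis import Redis
    from rq import Queue

    # Assumption: Redis runs on localhost with the default port.
    queue = Queue(connection=Redis())
    total = enqueue_all(queue, range(12210, 150000))
    print("scheduled %d jobs" % total)
```

With the file saved in the directory the workers start from (`/app_folder/` in the supervisor config above), the jobs could be scheduled once with `python -c "import crawl_jobs; crawl_jobs.main()"`. Passing the job as a string path avoids rq's restriction on enqueuing functions defined in `__main__`.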
>> On 27 May 2014 10:37, Роман Рубан <ryr1986 at gmail.com> wrote:
>>
>>> Hello,
>>>
>>> https://gist.github.com/ryr/6a2d8997057a70be7eb3
>>> This is my working site crawler.
>>> It consumes 20 GB+ of memory.
>>>
>>> How can I optimize the cache usage?
>>>
--
Regards,
Matthew Bell