[PonyORM-list] Controlling session caching
Alexander Kozlovsky
alexander.kozlovsky at gmail.com
Mon Feb 13 07:05:22 UTC 2017
Hi Matthew,
The problem with these approaches is that Pony doesn't know which objects
are referenced from application code and should not be purged from the
cache.
Maybe we can add an explicit `obj.purge()` method which could be applied
to specific objects you want to evict from the cache.
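If we do, usage might look something like this (just a sketch of the
hypothetical API; nothing here is implemented yet):

with db_session:
    for obj in select(o for o in MyObjects):
        do_something_with(obj)  # placeholder for your own processing
        obj.purge()  # hypothetical: evict obj from the session cache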
But you can also try another approach. Pony allows you to pickle and unpickle
objects between different db sessions. If your code has some big loop, you
can make each loop iteration a separate db_session. At the end of an iteration
you pickle all the useful objects, and then unpickle them at the start of
the next iteration. The code will look something like this:
import cPickle as pickle

@db_session
def get_initial_objects():
    objects = select(obj for obj in MyObjects if obj.level == 1)[:]
    return pickle.dumps(objects)

@db_session  # you can try to add (strict=True) here
def loop_iteration(prev_objects):
    prev_objects = pickle.loads(prev_objects)
    new_objects = set()
    for obj in prev_objects:
        new_objects.update(obj.children)
    return pickle.dumps(new_objects)

pickled_objects = get_initial_objects()
while True:
    pickled_objects = loop_iteration(pickled_objects)
    if <some condition>:
        break
This approach may be more convenient than the hypothetical `obj.purge()`
function, because it lets you keep the useful objects instead of purging the
useless ones.
Regards,
Alexander
On Mon, Feb 13, 2017 at 1:01 AM, Matthew Bell <matthewrobertbell at gmail.com>
wrote:
> Hi Alexander,
>
> Strict mode is useful, thanks. However, I believe there's a use case where
> it isn't sufficient:
>
> with db_session:
>     main_object = Model.get(id)
>     # some process that references main_object every time, but always
>     # using different entities from another model
>
>
> This happens with background processes I have which are "refining" large
> parts of my data set into new objects. Due to the nature of the problem, it
> has to be solved by pulling lots of objects in many queries, where each
> query depends on the result of the previous one. This eats more and more
> RAM as the total database size grows; it's currently at tens of GB used.
>
> There are two ways I can think of that would solve this: an explicit
> pony.orm.trim_cache() function to be called inside the loop, or a
> db_session(max_cache_objects=1000) parameter. With either of these, memory
> usage should be on the order of 100MB.
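>
> To illustrate, usage could look something like this (purely hypothetical;
> neither API exists in Pony today):
>
> for x in list_of_ints:
>     with db_session(max_cache_objects=1000):  # hypothetical parameter
>         process(x)  # placeholder for the real per-item work
>         pony.orm.trim_cache()  # hypothetical: evict excess cached objects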
>
> What do you think?
>
> Thanks,
>
> Matt
>
> On 13 May 2016 at 19:16, Alexander Kozlovsky <
> alexander.kozlovsky at gmail.com> wrote:
>
>> Hi Matthew!
>>
>> I just added an experimental `strict` parameter to db_session. By default
>> it is set to False. If it is set to True, then the cache will be cleared
>> upon exit from db_session:
>>
>> with db_session(strict=True):
>>     do_something()
>> # cache will be cleared here
>>
>> or:
>>
>> @db_session(strict=True)
>> def do_something():
>>     ...
>>
>> Can you take the code from the PonyORM GitHub repository and check whether
>> it fixes your problem? If not, then maybe you just retrieve too many objects
>> during the same db_session. Or maybe we need to optimize the cascade delete
>> behavior that we discussed earlier.
>>
>> Regards,
>> Alexander
>>
>>
>> On Thu, May 12, 2016 at 5:13 PM, Matthew Bell <
>> matthewrobertbell at gmail.com> wrote:
>>
>>> Hi Alexander,
>>>
>>> With regards to:
>>>
>>> In principle, I'm for pruning cache more aggressively on db_session
>>> exit, but unfortunately some people like to continue working with objects
>>> after exiting from db_session (for example, generate HTML content using
>>> some template engine, etc.), although in my opinion it is more correct to
>>> perform such actions inside db_session.
>>>
>>> I think it's important to be strict about dropping the cache on session
>>> exit; it's a pain to have scripts that do decent amounts of work on lots of
>>> objects blow up to many gigabytes of RAM usage. I then have to produce
>>> workarounds, like restarting processes often, which is a big pain.
>>>
>>> Thanks for your work.
>>>
>>> Matt
>>>
>>> On 15 April 2016 at 19:28, Alexander Kozlovsky <
>>> alexander.kozlovsky at gmail.com> wrote:
>>>
>>>> Thanks for the suggestion, I'll think how to implement cache pruning.
>>>>
>>>> Regarding timing of queries, Pony already collects such information,
>>>> regardless of the `sql_debug` state. A Database object has a thread-local
>>>> property `db.local_stats` which contains statistics about the current
>>>> thread, and can also be used in a single-threaded application. The
>>>> property value is a dict, where keys are SQL queries and values are
>>>> special QueryStat objects. Each QueryStat object has the following
>>>> attributes:
>>>>
>>>> - `sql` - the text of the SQL query
>>>> - `db_count` - the number of times this query was sent to the database
>>>> - `cache_count` - the number of times the query result was taken
>>>> directly from the db_session cache (for cases when a query was called
>>>> repeatedly inside the same db_session)
>>>> - `min_time`, `max_time`, `avg_time` - the time required for the database
>>>> to execute the query
>>>> - `sum_time` - total time spent (should be equal to `avg_time` *
>>>> `db_count`)
>>>>
>>>> So you can do something like this:
>>>>
>>>> from operator import attrgetter
>>>>
>>>> query_stats = sorted(db.local_stats.values(), reverse=True,
>>>>                      key=attrgetter('sum_time'))
>>>> for qs in query_stats:
>>>>     print(qs.sum_time, qs.db_count, qs.sql)
>>>>
>>>> If you call the method `db.merge_local_stats()`, the content of
>>>> `db.local_stats` will be merged into `db.global_stats`, and `db.local_stats`
>>>> will be cleared. If you are writing a web application, you can
>>>> call `db.merge_local_stats()` when you finish processing an HTTP request, in
>>>> order to clear `db.local_stats` before processing the next request. The
>>>> `db.global_stats` property can be used in a multi-threaded application to
>>>> get total statistics over all threads.
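>>>>
>>>> For example, with Flask it could look roughly like this (the exact hook
>>>> depends on your framework; this is just a sketch):
>>>>
>>>> @app.teardown_request
>>>> def merge_stats(exc=None):
>>>>     # move this request's stats into db.global_stats and reset local ones
>>>>     db.merge_local_stats()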
>>>>
>>>> Hope that helps
>>>>
>>>>
>>>> On Fri, Apr 15, 2016 at 8:49 PM, Matthew Bell <
>>>> matthewrobertbell at gmail.com> wrote:
>>>>
>>>>> Hi Alex,
>>>>>
>>>>> I don't believe any objects were leaking out of the session, the only
>>>>> thing i store between sessions is integers (object IDs). I have solved this
>>>>> problem myself by doing the work in python-rq jobs, rather than in one big
>>>>> script, however it would be great to have some sort of "force clear the
>>>>> cache" functionality - ideally, as you say having it strictly happen upon
>>>>> leaving session scope.
>>>>>
>>>>> Also useful for some niche situations would be having the option to
>>>>> disable caching for a given session.
>>>>>
>>>>> Another suggestion which is unrelated - an option or default of the
>>>>> timing of queries when using sql_debug(True) - this would make performance
>>>>> profiling much simpler, especially in web apps where many queries happen on
>>>>> a given request.
>>>>>
>>>>> Thanks for your work!
>>>>>
>>>>> Matt
>>>>>
>>>>> On 15 April 2016 at 16:28, Alexander Kozlovsky <
>>>>> alexander.kozlovsky at gmail.com> wrote:
>>>>>
>>>>>> Hi Matthew!
>>>>>>
>>>>>> At first sight it looks like a memory leak. Also, is it possible that
>>>>>> bigger values of x in your loop retrieve a larger number of objects and
>>>>>> hence require more memory?
>>>>>>
>>>>>> Regarding a memory leak: after a db_session is over, Pony releases the
>>>>>> pointer to the session cache, and in the best case all cache content will
>>>>>> be collected by the garbage collector. But if your code still holds a
>>>>>> pointer to some object in the cache, that will prevent garbage collection,
>>>>>> because objects inside a cache are interconnected. Are you holding
>>>>>> pointers to objects from previous db sessions?
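>>>>>>
>>>>>> For example, a pattern like this would keep an entire session cache
>>>>>> alive (illustrative sketch only):
>>>>>>
>>>>>> kept = []
>>>>>> for x in list_of_ints:
>>>>>>     with db_session:
>>>>>>         obj = MyObjects[x]
>>>>>>         kept.append(obj)  # this reference pins obj, and through it
>>>>>>                           # the whole interconnected cache, in memory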
>>>>>>
>>>>>> It is possible that we have some memory leak inside Pony, but right
>>>>>> now we are not aware of it.
>>>>>>
>>>>>> You mentioned in one of your previous messages that your code performs
>>>>>> cascade deletion of multiple objects, which are all loaded into memory.
>>>>>> Does your current program do something like that?
>>>>>>
>>>>>> In principle, I'm for pruning cache more aggressively on db_session
>>>>>> exit, but unfortunately some people like to continue working with objects
>>>>>> after exiting from db_session (for example, generate HTML content using
>>>>>> some template engine, etc.), although in my opinion it is more correct to
>>>>>> perform such actions inside db_session.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Alexander
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 14, 2016 at 10:46 PM, Matthew Bell <
>>>>>> matthewrobertbell at gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have code like:
>>>>>>>
>>>>>>> for x in list_of_ints:
>>>>>>>     with db_session:
>>>>>>>         # do lots of database processing tied to x
>>>>>>>
>>>>>>> I am doing it like this to stop the Pony cache from using a lot of
>>>>>>> memory, but cache usage still grows over time. How can I stop this
>>>>>>> from happening?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Matt
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>>
>>>>>>> Matthew Bell
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>>
>>>>> Matthew Bell
>>>>>
>>>>
>>>
>>>
>>> --
>>> Regards,
>>>
>>> Matthew Bell
>>>
>>
>
>
> --
> Regards,
>
> Matthew Bell
>