[PonyORM-list] Controlling session caching

Matthew Bell matthewrobertbell at gmail.com
Mon Feb 13 14:35:18 UTC 2017


Hi Alexander,

Is the only negative consequence of a cache missing an object that the
object has to be re-fetched from the database?

If so, would a strictly opt-in LRU or counting cache be able to offer the
tradeoff of more overhead (for tracking object accesses) versus memory
usage?
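
To make the idea concrete, here is a rough sketch of the bookkeeping I have
in mind (the class and its names are made up for illustration, not Pony
internals):

    from collections import OrderedDict

    # Illustration only: a bounded identity map that evicts the least
    # recently used entry once a size limit is exceeded.
    class LRUIdentityMap(object):
        def __init__(self, max_objects=1000):
            self.max_objects = max_objects
            self._objects = OrderedDict()  # (entity, pk) -> object, oldest first

        def get(self, key):
            obj = self._objects.pop(key, None)
            if obj is not None:
                self._objects[key] = obj  # re-insert: now most recently used
            return obj

        def put(self, key, obj):
            self._objects.pop(key, None)
            self._objects[key] = obj
            while len(self._objects) > self.max_objects:
                self._objects.popitem(last=False)  # evict least recently used

An evicted object would then just be re-fetched on its next access, which is
why I'm asking whether that re-fetch is the only cost.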

I think the pickling example you gave would work, but it seems rather
inelegant compared to letting Pony handle things automatically.

Thanks,

Matt

On 13 February 2017 at 07:05, Alexander Kozlovsky <
alexander.kozlovsky at gmail.com> wrote:

> Hi Matthew,
>
> The problem with these approaches is that Pony doesn't know which objects
> are referenced from application code and should not be purged from the
> cache.
>
> Maybe we can add explicit `obj.purge()` function which may be applied
> explicitly to specific objects you want to purge.
>
> But you can also try another approach. Pony allows you to pickle and
> unpickle objects across different db sessions. If your code has some big
> loop, you can make each loop iteration a separate db_session. At the end of
> an iteration you pickle all useful objects, then unpickle them at the start
> of the next iteration. The code will look something like this:
>
>
> from pony.orm import db_session, select
>
> import cPickle as pickle  # Python 2; on Python 3, use the stdlib `pickle`
>
> @db_session
> def get_initial_objects():
>     objects = select(obj for obj in MyObjects if obj.level == 1)[:]
>     return pickle.dumps(objects)
>
> @db_session  # you can try to add (strict=True) here
> def loop_iteration(prev_objects):
>     # unpickling re-binds the objects to the current db_session
>     prev_objects = pickle.loads(prev_objects)
>     new_objects = set()
>     for obj in prev_objects:
>         new_objects.update(obj.children)
>     return pickle.dumps(new_objects)
>
> pickled_objects = get_initial_objects()
> while True:
>     pickled_objects = loop_iteration(pickled_objects)
>     if <some condition>:
>         break
>
>
> This approach may be more convenient than the hypothetical `obj.purge()`
> function, because it allows you to keep the useful objects instead of
> purging the useless ones.
>
>
> Regards,
> Alexander
>
> On Mon, Feb 13, 2017 at 1:01 AM, Matthew Bell <matthewrobertbell at gmail.com
> > wrote:
>
>> Hi Alexander,
>>
>> Strict mode is useful, thanks. However, I believe there's a use case where
>> it isn't sufficient:
>>
>> with db_session:
>>     main_object = Model.get(id)
>>     # some process that references main_object every time, but always
>>     # using different entities from another model
>>
>>
>> This happens with background processes I have which are "refining" large
>> parts of my data set into some new objects. Due to the nature of the
>> problem, it has to be solved by pulling lots of objects across many
>> queries, where each query depends on the result of the previous one. This
>> eats more and more RAM as the total database size grows; usage is currently
>> in the tens of gigabytes.
>>
>> There are two ways I can think of to solve this: an explicit
>> pony.orm.trim_cache() function to be called inside the loop, or a
>> db_session(max_cache_objects=1000) parameter. With either of these, memory
>> usage should be on the order of 100 MB.
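>>
>> To sketch how these would look from the calling side (both names are only
>> proposals and do not exist in Pony today; `step` and `done` are
>> placeholders for my refining logic):
>>
>>     with db_session(max_cache_objects=1000):  # proposed parameter
>>         main_object = Model.get(id)
>>         while not done:
>>             step(main_object)  # queries that each depend on the last one
>>             # or, with the other proposal, trim explicitly in the loop:
>>             # pony.orm.trim_cache()  # proposed function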
>>
>> What do you think?
>>
>> Thanks,
>>
>> Matt
>>
>> On 13 May 2016 at 19:16, Alexander Kozlovsky <
>> alexander.kozlovsky at gmail.com> wrote:
>>
>>> Hi Matthew!
>>>
>>> I just added an experimental `strict` parameter to db_session. By
>>> default it is set to False. If it is set to True, then the cache will be
>>> cleared upon exit from db_session:
>>>
>>>     with db_session(strict=True):
>>>         do_something()
>>>     # cache will be cleared here
>>>
>>> or:
>>>
>>>     @db_session(strict=True)
>>>     def do_something():
>>>         ...
>>>
>>> Can you take the code from the PonyORM GitHub repository and check whether
>>> it fixes your problem? If not, then maybe you just retrieve too many
>>> objects during the same db_session, or maybe we need to optimize the
>>> cascade delete behavior that we discussed earlier.
>>>
>>> Regards,
>>> Alexander
>>>
>>>
>>> On Thu, May 12, 2016 at 5:13 PM, Matthew Bell <
>>> matthewrobertbell at gmail.com> wrote:
>>>
>>>> Hi Alexander,
>>>>
>>>> With regards to:
>>>>
>>>> In principle, I'm for pruning cache more aggressively on db_session
>>>> exit, but unfortunately some people like to continue working with objects
>>>> after exiting from db_session (for example, generate HTML content using
>>>> some template engine, etc.), although in my opinion it is more correct to
>>>> perform such actions inside db_session.
>>>>
>>>> I think it's important to be strict about dropping the cache on session
>>>> exit; it's a pain to have scripts that do decent amounts of work on lots
>>>> of objects blow up to many gigabytes of RAM usage. I then have to produce
>>>> workarounds, like restarting processes often, which is a big pain.
>>>>
>>>> Thanks for your work.
>>>>
>>>> Matt
>>>>
>>>> On 15 April 2016 at 19:28, Alexander Kozlovsky <
>>>> alexander.kozlovsky at gmail.com> wrote:
>>>>
>>>>> Thanks for the suggestion, I'll think how to implement cache pruning.
>>>>>
>>>>> Regarding timing of queries, Pony already collects such information,
>>>>> regardless of the `sql_debug` state. A Database object has a
>>>>> thread-local property `db.local_stats` which contains statistics for the
>>>>> current thread, and it can also be used in a single-threaded application.
>>>>> The property value is a dict, where keys are SQL queries and values are
>>>>> special QueryStat objects. Each QueryStat object has the following
>>>>> attributes:
>>>>>
>>>>> - `sql` - the text of the SQL query
>>>>> - `db_count` - the number of times this query was sent to the database
>>>>> - `cache_count` - the number of times the query result was taken
>>>>> directly from the db_session cache (for cases when a query was called
>>>>> repeatedly inside the same db_session)
>>>>> - `min_time`, `max_time`, `avg_time` - the minimum, maximum, and average
>>>>> time the database took to execute the query
>>>>> - `sum_time` - the total time spent (should be equal to `avg_time` *
>>>>> `db_count`)
>>>>>
>>>>> So you can do something like this:
>>>>>
>>>>>     from operator import attrgetter
>>>>>
>>>>>     query_stats = sorted(db.local_stats.values(),
>>>>>                          key=attrgetter('sum_time'), reverse=True)
>>>>>     for qs in query_stats:
>>>>>         print(qs.sum_time, qs.db_count, qs.sql)
>>>>>
>>>>> If you call the method `db.merge_local_stats()`, the content of
>>>>> `db.local_stats` will be merged into `db.global_stats`, and
>>>>> `db.local_stats` will be cleared. If you are writing a web application,
>>>>> you can call `db.merge_local_stats()` when you finish processing an HTTP
>>>>> request, in order to clear `db.local_stats` before processing the next
>>>>> request. The `db.global_stats` property can be used in a multi-threaded
>>>>> application to get total statistics over all threads.
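>>>>>
>>>>> For example, in a web application it could look like this (the
>>>>> `teardown_request` hook is Flask-style and only an assumption about
>>>>> your framework):
>>>>>
>>>>>     @app.teardown_request  # Flask-style hook, assumed for illustration
>>>>>     def flush_db_stats(exc=None):
>>>>>         # merge this request's stats into db.global_stats and
>>>>>         # clear db.local_stats before the next request
>>>>>         db.merge_local_stats()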
>>>>>
>>>>> Hope that helps
>>>>>
>>>>>
>>>>> On Fri, Apr 15, 2016 at 8:49 PM, Matthew Bell <
>>>>> matthewrobertbell at gmail.com> wrote:
>>>>>
>>>>>> Hi Alex,
>>>>>>
>>>>>> I don't believe any objects were leaking out of the session; the only
>>>>>> thing I store between sessions is integers (object IDs). I have worked
>>>>>> around the problem by doing the work in python-rq jobs rather than in
>>>>>> one big script, but it would be great to have some sort of "force clear
>>>>>> the cache" functionality - ideally, as you say, having it strictly
>>>>>> happen upon leaving session scope.
>>>>>>
>>>>>> Also useful for some niche situations would be an option to disable
>>>>>> caching for a given session.
>>>>>>
>>>>>> Another unrelated suggestion: an option (or default) to report query
>>>>>> timings when using sql_debug(True) - this would make performance
>>>>>> profiling much simpler, especially in web apps where many queries
>>>>>> happen on a given request.
>>>>>>
>>>>>> Thanks for your work!
>>>>>>
>>>>>> Matt
>>>>>>
>>>>>> On 15 April 2016 at 16:28, Alexander Kozlovsky <
>>>>>> alexander.kozlovsky at gmail.com> wrote:
>>>>>>
>>>>>>> Hi Matthew!
>>>>>>>
>>>>>>> At first sight it looks like a memory leak. Or is it possible that
>>>>>>> bigger values of x in your loop retrieve a larger number of objects
>>>>>>> and hence require more memory?
>>>>>>>
>>>>>>> Regarding a memory leak: after a db_session is over, Pony releases its
>>>>>>> pointer to the session cache, and in the best case all cache content
>>>>>>> will be collected by the garbage collector. But if your code still
>>>>>>> holds a pointer to some object in the cache, that will prevent garbage
>>>>>>> collection, because objects inside a cache are interconnected. Are you
>>>>>>> holding pointers to objects from previous db sessions?
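>>>>>>>
>>>>>>> For example, a pattern like this would keep cache content alive
>>>>>>> between sessions (using the MyObjects entity just as a placeholder):
>>>>>>>
>>>>>>>     kept = []
>>>>>>>     with db_session:
>>>>>>>         obj = MyObjects[1]  # fetched into the session cache
>>>>>>>         kept.append(obj)    # this reference survives the session and,
>>>>>>>                             # through object links, keeps much of the
>>>>>>>                             # cache from being garbage collected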
>>>>>>>
>>>>>>> It is possible that we have some memory leak inside Pony, but right
>>>>>>> now we are not aware of it.
>>>>>>>
>>>>>>> You mentioned in one of your previous messages that your code performs
>>>>>>> cascade deletion of multiple objects, all of which are loaded into
>>>>>>> memory. Does your current program do something like that?
>>>>>>>
>>>>>>> In principle, I'm for pruning cache more aggressively on db_session
>>>>>>> exit, but unfortunately some people like to continue working with objects
>>>>>>> after exiting from db_session (for example, generate HTML content using
>>>>>>> some template engine, etc.), although in my opinion it is more correct to
>>>>>>> perform such actions inside db_session.
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Alexander
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Apr 14, 2016 at 10:46 PM, Matthew Bell <
>>>>>>> matthewrobertbell at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have code like:
>>>>>>>>
>>>>>>>> for x in list_of_ints:
>>>>>>>>     with db_session:
>>>>>>>>         # do lots of database processing tied to x
>>>>>>>>
>>>>>>>> I am doing it like this to stop the Pony cache from using a lot of
>>>>>>>> memory, but cache usage still grows over time. How can I stop this
>>>>>>>> from happening?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Matt
>>>>>>>>


-- 
Regards,

Matthew Bell