[PonyORM-list] Controlling session caching

Matthew Bell matthewrobertbell at gmail.com
Sun Feb 12 22:01:47 UTC 2017


Hi Alexander,

Strict mode is useful, thanks, however I believe there's a use case where
this isn't sufficient:

with db_session:
        main_object = Model.get(id)
        # some process that references main_object every time, but always
        # using different entities from another model


This happens with background processes I have which are "refining" large
parts of my data set into some new objects. Due to the nature of the
problem, it has to be solved by pulling lots of objects in many queries.
Each query depends on the result of the previous query. This is currently
eating more and more RAM as the total database size grows; it's at tens of
GB used.

There are two ways I can think of that would solve this: an explicit
pony.orm.trim_cache() function to be called inside the loop, and a
db_session(max_cache_objects=1000). With either of these, memory usage
should be on the order of 100MB.
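
Until something like that exists, a workaround along these lines might keep
memory bounded. It is only a sketch: `run_in_sessions` and `step` are
hypothetical names, not part of Pony, and it assumes that the work can be
expressed as a step function whose state consists of plain values (such as
integer IDs) rather than entity instances, so that nothing from an old cache
is kept alive between sessions.

```python
from contextlib import nullcontext


def run_in_sessions(step, state, session_factory=nullcontext,
                    steps_per_session=1000):
    """Call state = step(state) until step returns None.

    A fresh session is opened every `steps_per_session` calls, so whatever
    the session cached can be dropped at that point. `state` should hold
    plain values only (e.g. integer IDs); entities such as main_object
    would have to be re-fetched by ID inside each new session.
    """
    calls = 0
    while state is not None:
        # Hypothetical usage with Pony: pass
        #   session_factory=lambda: db_session(strict=True)
        with session_factory():
            for _ in range(steps_per_session):
                state = step(state)
                calls += 1
                if state is None:
                    break
    return calls
```

The default `nullcontext` only makes the helper usable without a database;
with Pony the factory would be the strict db_session described in the quoted
message below.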

What do you think?

Thanks,

Matt

On 13 May 2016 at 19:16, Alexander Kozlovsky <alexander.kozlovsky at gmail.com>
wrote:

> Hi Matthew!
>
> I just added an experimental `strict` parameter to db_session. By default
> it is set to False. If it is set to True, then the cache will be cleared
> upon exit from db_session:
>
>     with db_session(strict=True):
>         do_something()
>     # cache will be cleared here
>
> or:
>
>     @db_session(strict=True)
>     def do_something():
>         ...
>
> Can you take the code from the PonyORM GitHub repository and check if it fixes
> your problem? If not, then maybe you just retrieve too many objects during
> the same db_session. Or maybe we need to optimize the cascade delete thing that
> we discussed earlier.
>
> Regards,
> Alexander
>
>
> On Thu, May 12, 2016 at 5:13 PM, Matthew Bell <matthewrobertbell at gmail.com>
> wrote:
>
>> Hi Alexander,
>>
>> With regards to:
>>
>> In principle, I'm for pruning cache more aggressively on db_session exit,
>> but unfortunately some people like to continue working with objects after
>> exiting from db_session (for example, generate HTML content using some
>> template engine, etc.), although in my opinion it is more correct to
>> perform such actions inside db_session.
>>
>> I think it's important to be strict with dropping the cache on session
>> exit. It's a pain to have scripts that do decent amounts of work on lots of
>> objects blow up to many gigabytes of RAM usage. I then have to produce
>> workarounds, like restarting processes often, which is a big pain.
>>
>> Thanks for your work.
>>
>> Matt
>>
>> On 15 April 2016 at 19:28, Alexander Kozlovsky <
>> alexander.kozlovsky at gmail.com> wrote:
>>
>>> Thanks for the suggestion, I'll think how to implement cache pruning.
>>>
>>> Regarding timing of queries, Pony already does collect such information,
>>> regardless of the `sql_debug` state. A Database object has a thread-local
>>> property `db.local_stats` which contains statistics information about
>>> current thread, and also can be used for a single-threaded application. The
>>> property value is a dict, where keys are SQL queries and values are special
>>> QueryStat objects. Each QueryStat object has the following attributes:
>>>
>>> - `sql` - the text of the SQL query
>>> - `db_count` - the number of times this query was sent to the database
>>> - `cache_count` - the number of times the query result was taken directly
>>> from the db_session cache (for cases when a query was called repeatedly
>>> inside the same db_session)
>>> - `min_time`, `max_time`, `avg_time` - the time required for the database to
>>> execute the query
>>> - `sum_time` - total time spent (should be equal to `avg_time` *
>>> `db_count`)
>>>
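>>> So you can do something like this:
>>>
>>>     from operator import attrgetter
>>>
>>>     # Sort the per-thread query statistics by total time, slowest first
>>>     query_stats = sorted(db.local_stats.values(), reverse=True,
>>>                          key=attrgetter('sum_time'))
>>>     for qs in query_stats:
>>>         print(qs.sum_time, qs.db_count, qs.sql)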
>>>
>>> If you call the method `db.merge_local_stats()` then the content of
>>> `db.local_stats` will be merged into `db.global_stats`, and `db.local_stats`
>>> will be cleared. If you are writing a web application you can
>>> call `db.merge_local_stats()` when you finish processing an HTTP request in
>>> order to clear `db.local_stats` before processing the next request.
>>> The `db.global_stats` property can be used in a multi-threaded application
>>> to get total statistics over all threads.
>>>
>>> Hope that helps
>>>
>>>
>>> On Fri, Apr 15, 2016 at 8:49 PM, Matthew Bell <
>>> matthewrobertbell at gmail.com> wrote:
>>>
>>>> Hi Alex,
>>>>
>>>> I don't believe any objects were leaking out of the session; the only
>>>> thing I store between sessions is integers (object IDs). I have solved this
>>>> problem myself by doing the work in python-rq jobs, rather than in one big
>>>> script, however it would be great to have some sort of "force clear the
>>>> cache" functionality - ideally, as you say having it strictly happen upon
>>>> leaving session scope.
>>>>
>>>> Also useful for some niche situations would be having the option to
>>>> disable caching for a given session.
>>>>
>>>> Another suggestion which is unrelated - an option or default of the
>>>> timing of queries when using sql_debug(True) - this would make performance
>>>> profiling much simpler, especially in web apps where many queries happen on
>>>> a given request.
>>>>
>>>> Thanks for your work!
>>>>
>>>> Matt
>>>>
>>>> On 15 April 2016 at 16:28, Alexander Kozlovsky <
>>>> alexander.kozlovsky at gmail.com> wrote:
>>>>
>>>>> Hi Matthew!
>>>>>
>>>>> At first sight it looks like a memory leak. Or is it possible that
>>>>> bigger values of x in your loop retrieve a larger number of objects and
>>>>> hence require more memory?
>>>>>
>>>>> Regarding a memory leak: after db_session is over, Pony releases its
>>>>> pointer to the session cache, and in the best case all cache content will
>>>>> be collected by the garbage collector. But if your code still holds a
>>>>> pointer to some object in the cache, that will prevent garbage collection,
>>>>> because objects inside a cache are interconnected. Are you holding some
>>>>> pointers to objects from previous db sessions?
>>>>>
>>>>> It is possible that we have some memory leak inside Pony, but right
>>>>> now we are not aware of it.
>>>>>
>>>>> You mentioned in one of your previous messages that in your code you
>>>>> perform cascade deletion of multiple objects, which are all loaded into
>>>>> memory. Does your current program perform something like that?
>>>>>
>>>>> In principle, I'm for pruning cache more aggressively on db_session
>>>>> exit, but unfortunately some people like to continue working with objects
>>>>> after exiting from db_session (for example, generate HTML content using
>>>>> some template engine, etc.), although in my opinion it is more correct to
>>>>> perform such actions inside db_session.
>>>>>
>>>>>
>>>>> Regards,
>>>>> Alexander
>>>>>
>>>>>
>>>>> On Thu, Apr 14, 2016 at 10:46 PM, Matthew Bell <
>>>>> matthewrobertbell at gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have code like:
>>>>>>
>>>>>> for x in list_of_ints:
>>>>>>   with db_session:
>>>>>>      # do lots of database processing tied to x
>>>>>>
>>>>>> I am doing it like this to stop the pony cache from using a lot of
>>>>>> memory, but cache usage still grows over time. How can I stop this
>>>>>> happening?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Matt
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>>
>>>>>> Matthew Bell
>>>>>>
>>>>>> _______________________________________________
>>>>>> ponyorm-list mailing list
>>>>>> ponyorm-list at ponyorm.com
>>>>>> /ponyorm-list
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>>
>>>> Matthew Bell
>>>>
>>>
>>
>>
>> --
>> Regards,
>>
>> Matthew Bell
>>
>


-- 
Regards,

Matthew Bell