[PonyORM-list] Controlling session caching

Alexander Kozlovsky alexander.kozlovsky at gmail.com
Fri May 13 11:16:01 UTC 2016


Hi Matthew!

I just added an experimental `strict` parameter to db_session. By default
it is set to False. If it is set to True, then the cache will be cleared
upon exit from db_session:

    with db_session(strict=True):
        do_something()
    # cache will be cleared here

or:

    @db_session(strict=True)
    def do_something():
        ...
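
For example, applied to the loop from your first message, the cache should
now be released after each iteration (just a sketch; `process_item` stands
for your per-item work):

    for x in list_of_ints:
        with db_session(strict=True):
            process_item(x)  # objects loaded here are evicted on exit
    # the session cache is cleared after every iteration,
    # so memory usage should stay flat across the loop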

Can you take the code from the PonyORM GitHub repository and check whether it
fixes your problem? If not, then maybe you are just retrieving too many
objects during the same db_session, or maybe we need to optimize the cascade
delete behavior that we discussed earlier.

Regards,
Alexander


On Thu, May 12, 2016 at 5:13 PM, Matthew Bell <matthewrobertbell at gmail.com>
wrote:

> Hi Alexander,
>
> With regards to:
>
> In principle, I'm in favor of pruning the cache more aggressively on
> db_session exit, but unfortunately some people like to continue working with
> objects after exiting from db_session (for example, to generate HTML content
> using some template engine), although in my opinion it is more correct to
> perform such actions inside db_session.
>
> I think it's important to be strict with dropping the cache on session
> exit; it's a pain to have scripts that do decent amounts of work on lots of
> objects blow up to many gigabytes of RAM usage. I then have to produce
> workarounds, like restarting processes often, which is a big pain.
>
> Thanks for your work.
>
> Matt
>
> On 15 April 2016 at 19:28, Alexander Kozlovsky <
> alexander.kozlovsky at gmail.com> wrote:
>
>> Thanks for the suggestion, I'll think about how to implement cache pruning.
>>
>> Regarding the timing of queries, Pony already collects such information,
>> regardless of the `sql_debug` state. A Database object has a thread-local
>> property `db.local_stats` which contains statistics about the current
>> thread, and it can also be used in a single-threaded application. The
>> property value is a dict, where keys are SQL queries and values are special
>> QueryStat objects. Each QueryStat object has the following attributes:
>>
>> - `sql` - the text of the SQL query
>> - `db_count` - the number of times this query was sent to the database
>> - `cache_count` - the number of times the query result was taken directly
>> from the db_session cache (for cases when a query was called repeatedly
>> inside the same db_session)
>> - `min_time`, `max_time`, `avg_time` - the minimum, maximum, and average
>> time the database needed to execute the query
>> - `sum_time` - the total time spent (should be equal to `avg_time` *
>> `db_count`)
>>
>> So you can do something like this:
>>
>>     from operator import attrgetter
>>
>>     query_stats = sorted(db.local_stats.values(), reverse=True,
>>                          key=attrgetter('sum_time'))
>>     for qs in query_stats:
>>         print(qs.sum_time, qs.db_count, qs.sql)
>>
>> If you call the method `db.merge_local_stats()`, the content of
>> `db.local_stats` will be merged into `db.global_stats`, and `db.local_stats`
>> will be cleared. If you are writing a web application, you can
>> call `db.merge_local_stats()` when you finish processing an HTTP request,
>> in order to clear `db.local_stats` before processing the next request. The
>> `db.global_stats` property can be used in a multi-threaded application in
>> order to get total statistics over all threads.
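>>
>> For example, in a web application you might do something like this at the
>> end of each request (just a sketch; `finish_request` is a placeholder for
>> whatever teardown hook your framework provides):
>>
>>     def finish_request():
>>         # fold this thread's per-request stats into db.global_stats
>>         # and start the next request with an empty db.local_stats
>>         db.merge_local_stats()
>>
>>     # later, e.g. on an admin page, inspect the totals over all threads
>>     # (assuming db.global_stats has the same dict-of-QueryStat shape
>>     # as db.local_stats):
>>     for qs in sorted(db.global_stats.values(),
>>                      key=attrgetter('sum_time'), reverse=True):
>>         print(qs.sum_time, qs.db_count, qs.sql)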
>>
>> Hope that helps
>>
>>
>> On Fri, Apr 15, 2016 at 8:49 PM, Matthew Bell <
>> matthewrobertbell at gmail.com> wrote:
>>
>>> Hi Alex,
>>>
>>> I don't believe any objects were leaking out of the session; the only
>>> thing I store between sessions is integers (object IDs). I have solved this
>>> problem myself by doing the work in python-rq jobs, rather than in one big
>>> script, but it would still be great to have some sort of "force clear the
>>> cache" functionality - ideally, as you say, having it strictly happen upon
>>> leaving session scope.
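>>>
>>> For reference, the pattern I mean is roughly this (`Item` and `process`
>>> are just placeholders for my entity and per-object work):
>>>
>>>     from pony.orm import db_session, select
>>>
>>>     with db_session:
>>>         ids = select(i.id for i in Item)[:]  # keep only plain ints between sessions
>>>
>>>     for item_id in ids:
>>>         with db_session:
>>>             process(Item[item_id])  # re-fetch the object inside a fresh session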
>>>
>>> Also useful for some niche situations would be having the option to
>>> disable caching for a given session.
>>>
>>> Another, unrelated suggestion - an option (or default) to show the timing
>>> of queries when using sql_debug(True) - this would make performance
>>> profiling much simpler, especially in web apps where many queries happen on
>>> a given request.
>>>
>>> Thanks for your work!
>>>
>>> Matt
>>>
>>> On 15 April 2016 at 16:28, Alexander Kozlovsky <
>>> alexander.kozlovsky at gmail.com> wrote:
>>>
>>>> Hi Matthew!
>>>>
>>>> At first sight it looks like a memory leak. Also, is it possible that
>>>> bigger values of x in your loop retrieve a larger number of objects and
>>>> hence require more memory?
>>>>
>>>> Regarding the memory leak: after a db_session is over, Pony releases its
>>>> pointer to the session cache, and in the best case all of the cache
>>>> content will be collected by the garbage collector. But if your code still
>>>> holds a pointer to some object in the cache, that will prevent garbage
>>>> collection, because objects inside a cache are interconnected. Are you
>>>> holding any pointers to objects from previous db sessions?
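>>>>
>>>> For example, code like this keeps the whole cache of the finished session
>>>> alive (`Order` is just an illustrative entity name):
>>>>
>>>>     kept = []
>>>>     with db_session:
>>>>         kept.append(Order[123])  # object is loaded into the session cache
>>>>     # db_session is over, but kept[0] still references the object;
>>>>     # cached objects are interconnected, so the whole cache from that
>>>>     # session stays reachable and cannot be garbage collected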
>>>>
>>>> It is possible that we have some memory leak inside Pony, but right now
>>>> we are not aware of it.
>>>>
>>>> You mentioned in one of your previous messages that your code performs
>>>> cascade deletion of multiple objects, which are all loaded into memory.
>>>> Does your current program do something like that?
>>>>
>>>> In principle, I'm in favor of pruning the cache more aggressively on
>>>> db_session exit, but unfortunately some people like to continue working
>>>> with objects after exiting from db_session (for example, to generate HTML
>>>> content using some template engine), although in my opinion it is more
>>>> correct to perform such actions inside db_session.
>>>>
>>>>
>>>> Regards,
>>>> Alexander
>>>>
>>>>
>>>> On Thu, Apr 14, 2016 at 10:46 PM, Matthew Bell <
>>>> matthewrobertbell at gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have code like:
>>>>>
>>>>>     for x in list_of_ints:
>>>>>         with db_session:
>>>>>             # do lots of database processing tied to x
>>>>>
>>>>> I am doing it like this to stop the pony cache from using a lot of
>>>>> memory, but cache usage still grows over time. How can I stop this
>>>>> happening?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Matt
>>>>>
>>>>> --
>>>>> Regards,
>>>>>
>>>>> Matthew Bell
>>>>>
>>>>
>>>
>>>
>>> --
>>> Regards,
>>>
>>> Matthew Bell
>>>
>>
>
>
> --
> Regards,
>
> Matthew Bell
>

