[PonyORM-list] implementing new types ?

Alexander Kozlovsky alexander.kozlovsky at gmail.com
Mon Sep 29 19:43:08 UTC 2014


On Sat, Sep 27, 2014 at 1:12 AM, stephane martin <stef.martin at gmail.com>
wrote:

> Hello,
> i'd like to be able to use more complicated postgresql datatypes with pony
> (ENUM, ARRAYS, JSON...)
> I still dont know much of pony code. Could you provide a few hints about
> what i need to do to get those other types working with the usual pony "attr_name
> = kind(type)" ?
>
>
Hello!

Yes, it is possible to add such types. The task of adding a new type to Pony
can be separated into two sub-tasks:
(1) Adding support for the new type in entity definitions, so you can write
"attr_name = kind(type)" and perform basic CRUD operations.
(2) Adding support to the translator, so you can write queries like "ARRAY[2,7]
<@ ARRAY[1,7,4,2,6]" in PostgreSQL.

Task (2) is hard, because it requires a deep understanding of how the Pony
translator works. But task (1) is relatively easy, and I'll describe it
here. Maybe it can satisfy your needs without extending the translator.
In any case, translator support can always be added later. So, for now
I'll concentrate on adding new types to entity definitions.


1. SELECTING CORRESPONDING PYTHON TYPES
############################################################

The first step is to decide which Python type will represent the
corresponding PostgreSQL type when you write "attr_name = kind(type)". One
option is to use standard Python types; the other is to define new types.
With the first option you would use the Python `list` type for the PostgreSQL
ARRAY type and the Python `dict` type for the PostgreSQL JSON type. This
will probably look nice in entity definitions. On the other hand, JSON
values are not necessarily dictionaries; they may be arrays as well. Because
of this, another option is to define a new Python type Json and use it in
attribute definitions. I haven't yet come to a conclusion about which way
is better.

Regarding ENUM, there are two options here too:
1) Use the standard Python Enum class, introduced in Python 3.4 and
backported to Python 2.7.
2) Write a custom Enum class whose semantics match the PostgreSQL ENUM
datatype better.

Depending on which way you choose, the entity definitions will look
either like this:

    class Color(Enum):
        red = 1
        green = 2
        blue = 3

    class MyEntity(db.Entity):
        name = Required(unicode)
        colors = Required(Color)  # ENUM
        items = Required(list, int, 10)  # integer ARRAY[10]
        data = Required(dict)  # JSON

or like this:

    Color = SomeEnumClassFactory("Color", "red green blue")

    class MyEntity(db.Entity):
        name = Required(unicode)
        colors = Required(Color)  # ENUM
        items = Required(Array(int, 10))  # integer ARRAY[10]
        data = Required(Json)  # JSON
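
For the second style, Array and Json would be small marker types that you
define yourself. A minimal sketch of such an Array marker (these names are
illustrative, not existing Pony API):

```python
class Array(object):
    # Hypothetical marker type for the second definition style:
    # it just carries the item type and an optional fixed size,
    # so that a converter can later generate an SQL type
    # like "INTEGER ARRAY[10]".
    def __init__(self, item_type, size=None):
        self.item_type = item_type
        self.size = size
```

The marker itself holds no data; its only job is to be recognized during
attribute definition and to pass its options on to the converter.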


2. DEFINING CONVERTER CLASSES
############################################################

After you have decided which Python types you want to use, you need to tell
Pony how to work with these types. To do that, you should define converter
classes inside the database provider's module. Pony has a separate database
provider module for each supported database; these modules are placed inside
the `pony.orm.dbproviders` package. Currently there are four provider
modules: `pony.orm.dbproviders.sqlite`, `pony.orm.dbproviders.mysql`,
`pony.orm.dbproviders.postgres` and `pony.orm.dbproviders.oracle`. Each
provider module contains converter classes for all supported types.

A converter class should inherit from the base Converter class defined in
the `pony.orm.dbapiprovider` module. This module represents an abstract
database provider which contains a basic implementation of everything
required to implement a specific database provider. A specific provider,
such as `pony.orm.dbproviders.postgres`, imports classes from
`pony.orm.dbapiprovider` and redefines what is necessary to implement all
database-specific details.

In the future, when we emulate ENUM, ARRAY and JSON types in all
databases, we can place base implementations of the corresponding converters
(EnumConverter, ArrayConverter and JSONConverter) in the
`pony.orm.dbapiprovider` module and then inherit from these classes in all
specific provider modules. But since right now we are not going to support
these types in any database except PostgreSQL, you can define these
converter classes (EnumConverter, ArrayConverter and JSONConverter) right
inside the `pony.orm.dbproviders.postgres` module and inherit them from the
base `pony.orm.dbapiprovider.Converter` class.


2.1. RELATION BETWEEN CONVERTERS, TYPES AND ATTRIBUTES
############################################################

Each converter CLASS must know how to handle a specific Python TYPE. You can
see implementations of such classes as IntConverter, DateConverter, etc. A
converter INSTANCE is typically created to handle a specific entity ATTRIBUTE
of the corresponding type. You can inspect the converter instance defined
for a specific attribute:

    >>> from pony.orm import *
    >>> db = Database('sqlite', ':memory:')
    >>> class Person(db.Entity):
    ...     name = Required(unicode)
    ...
    >>> db.generate_mapping(create_tables=True)
    >>> print Person.name.converters
    [<pony.orm.dbapiprovider.UnicodeConverter object at 0x04763E10>]

In this case, the SQLite database provider uses the base UnicodeConverter
class without overriding any methods, because that class is already suitable
for SQLite's needs. Note that an attribute can have more than one converter
if it represents a composite foreign key, but a typical attribute has just
one converter.

Most converter instances are tied to a specific entity attribute, which can
be accessed as `converter.attr`, but some converters are not tied to any
attribute. Such converters represent query parameters. For example, the
query

    select(p for p in Person if p.name == x)

has a UnicodeConverter which represents the external parameter `x`. For such
converters, the value of `converter.attr` is None.


2.2. CONVERTER RESPONSIBILITIES
############################################################

The converter instances should be able to perform the following tasks:

1) Processing the additional positional and keyword options of the
attribute. For example, when we define the attribute

    salary = Required(Decimal, 10, 2, min=Decimal("100"), max=Decimal("100000"))

the converter is responsible for parsing all attribute arguments after the
Decimal type.
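
As a rough illustration, the argument parsing a DecimalConverter-like
init() might perform could look like this (a hypothetical helper, not
Pony's actual code; the default precision and scale are made up):

```python
from decimal import Decimal

def parse_decimal_args(args, kwargs):
    # Positional args after the type: precision and scale,
    # as in Required(Decimal, 10, 2, min=..., max=...)
    precision = args[0] if len(args) > 0 else 12   # assumed default
    scale = args[1] if len(args) > 1 else 2        # assumed default
    min_val = kwargs.pop("min", None)
    max_val = kwargs.pop("max", None)
    if kwargs:  # anything left over is an unrecognized option
        raise TypeError("unrecognized options: %r" % (kwargs,))
    return precision, scale, min_val, max_val
```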

2) If necessary, converting values from the Python type to the corresponding
database format and vice versa. For example, when time values are stored in
a MySQL database, they must be converted to timedelta values before being
stored, and converted back from timedelta to time values after being read
from the database.
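
A sketch of that MySQL-style round trip (hypothetical helpers; Pony's
actual converter code may differ):

```python
from datetime import time, timedelta

def time_to_timedelta(t):
    # py2sql direction: time -> timedelta before storing
    return timedelta(hours=t.hour, minutes=t.minute, seconds=t.second)

def timedelta_to_time(td):
    # sql2py direction: timedelta -> time after reading
    total = int(td.total_seconds())
    return time(total // 3600, (total % 3600) // 60, total % 60)
```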

3) Conversion to the appropriate type when a value is assigned to the
attribute. Besides the main Python type specified in the attribute, the
converter can also accept values of other types which can be converted to
the specified attribute type. For example, DecimalConverter can also accept
integer and float values. Also, each converter should accept text values in
an appropriate format and convert them to normal values of the specified
type. This is useful if the attribute values are received as GET or POST
parameters of an HTTP request. For example, IntConverter is able to accept
the value "123" and convert it to the number 123, and TimeConverter is able
to accept the value "12:34:56" and convert it to time(12, 34, 56).
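
A hypothetical sketch of that text-to-value conversion, similar in spirit
to what a TimeConverter's validate method does:

```python
from datetime import time

def validate_time(val):
    # Accept either a ready time value or an "HH:MM:SS" string,
    # as might arrive in a GET or POST parameter.
    if isinstance(val, time):
        return val
    if isinstance(val, str):
        h, m, s = (int(part) for part in val.split(":"))
        return time(h, m, s)
    raise TypeError("expected time or 'HH:MM:SS' string, got %r" % (val,))
```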

It is not necessary for the real value of the attribute to have the type
specified in the attribute. The type used in the attribute definition is
just a marker to find the desired converter class. For example, if you
define an attribute as Required(list) and have registered a converter for
the list type, the real value stored in the attribute does not need to be an
instance of list. You can define some custom type MyArray if this is
necessary to reflect the semantics of the database type correctly, and store
values of that type in the attribute.

4) The next task is the validation of values assigned to the object
attribute. It is possible to specify constraints, such as min and max
values for numeric attribute types. The converter is responsible for
checking whether the provided value satisfies all specified constraints.

5) The last converter task is the generation of the SQL type definition for
the attribute. For example, if an attribute is specified as
Required(Decimal, 10, 2), the converter of the attribute should be able to
generate SQL type "DECIMAL(10, 2)".


2.3. CONVERTER METHODS
############################################################

Each converter class should define the following methods:

    class MySpecificConverter(Converter):

        def init(self, kwargs):
            # Override this method to process additional positional
            # and keyword arguments of the attribute

            if self.attr is not None:
                # self.attr.args can be analyzed here
                self.args = self.attr.args

            self.my_optional_argument = kwargs.pop("kwarg_name")
            # You should take all valid options from this kwargs;
            # whatever is left in it is regarded as unrecognized options

        def validate(self, val):
            # convert the value to the necessary type (e.g. from a string)
            # and check all necessary constraints (e.g. min/max bounds)
            return val

        def py2sql(self, val):
            # prepare the value (if necessary) for storing in the database
            return val

        def sql2py(self, val):
            # convert the value (if necessary) after reading it from the db
            return val

        def sql_type(self):
            # generate the corresponding SQL type, based on attribute options
            return "SOME_SQL_TYPE_DEFINITION"


You can study the code of the existing converters to see how these methods
are implemented.
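To make this concrete, here is a rough sketch of what an ArrayConverter for
integer ARRAY columns might look like. The base Converter stand-in below is
just enough to make the sketch self-contained; the real base class lives in
`pony.orm.dbapiprovider`, and the details here are assumptions, not Pony's
actual implementation:

```python
class Converter(object):
    # Minimal stand-in for pony.orm.dbapiprovider.Converter,
    # included only so this sketch runs on its own.
    def __init__(self, py_type, attr=None):
        self.py_type = py_type
        self.attr = attr

class ArrayConverter(Converter):
    # Hypothetical converter for integer ARRAY columns.
    def init(self, kwargs):
        # Take the item type and optional fixed size from the
        # attribute args, e.g. Required(list, int, 10)
        args = self.attr.args if self.attr is not None else ()
        self.item_type = args[0] if args else int
        self.size = args[1] if len(args) > 1 else None

    def validate(self, val):
        # Accept any iterable; store an immutable tuple so the
        # value cannot change behind Pony's back (see the caveats
        # section below)
        items = tuple(self.item_type(item) for item in val)
        if self.size is not None and len(items) > self.size:
            raise ValueError("array value is longer than %d" % self.size)
        return items

    def py2sql(self, val):
        # psycopg2 adapts Python lists to PostgreSQL arrays
        return list(val)

    def sql2py(self, val):
        return tuple(val)

    def sql_type(self):
        if self.size is not None:
            return "INTEGER ARRAY[%d]" % self.size
        return "INTEGER ARRAY"
```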


2.4. REGISTERING THE CONVERTER
############################################################

After the converter class is implemented, it should be registered inside
the provider class. Each provider module has a corresponding provider class.
For example, the `pony.orm.dbproviders.postgres` module has the `PGProvider`
class. In this class you can see the list of all PostgreSQL converters with
the corresponding Python types:

    class PGProvider(DBAPIProvider):
        ...
        converter_classes = [
            (bool, dbapiprovider.BoolConverter),
            (unicode, PGUnicodeConverter),
            (str, PGStrConverter),
            (long, PGLongConverter),
            ...

You can add new converters to the end of this list:

            (list, ArrayConverter),
            (dict, JSONConverter),
            (Enum, EnumConverter),
        ]

After that, the new types can be used in entity definitions. When Pony
searches for the converter of an attribute, it checks the Python type
specified in the attribute against each registered pair, and returns the
first converter class for which isinstance(attr_type, converter_py_type)
returns True.
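
A sketch of that lookup (Pony's actual code may differ in details, e.g. it
may also handle attribute types given as classes rather than instances):

```python
def find_converter_class(attr_type, converter_classes):
    # Return the first converter whose registered Python type
    # matches the attribute type, in registration order.
    for py_type, converter_cls in converter_classes:
        if isinstance(attr_type, py_type) or (
                isinstance(attr_type, type) and issubclass(attr_type, py_type)):
            return converter_cls
    raise TypeError("no converter found for %r" % (attr_type,))
```

Since the first match wins, the order of the converter_classes list matters
when registered types overlap.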


3. SOME CAVEATS
############################################################

Most of the Python types which can be used in entity definitions are
immutable. This means that an attribute value can be changed only by
assigning a new value directly to the attribute. But the new ARRAY and JSON
types which you want to implement will not be immutable if you implement
them as plain Python list and dict values. This is undesirable, because Pony
wants to track each attribute in order to know which attribute values were
read and which were written. For example, the following code may lead to
problems:

    a_list = my_object.my_array
    a_list.append(new_value)
    # oops, Pony doesn't know that the value
    # of the attribute has changed as a side-effect

When Pony processes the commit() command, it stores changed attribute values
in the database. If an attribute can change without Pony's knowledge, the
change may be missed and not stored in the database. So I think that
attributes of ARRAY and JSON types should hold not the usual Python list and
dict values, but some immutable objects which cannot be changed as a side
effect of any operation except direct assignment of a new value to the
attribute. This way Pony can always track which attributes were read and
which were written.
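
One possible shape for such an immutable value (a hypothetical type, shown
only to illustrate the idea):

```python
class TrackedArray(tuple):
    # Hypothetical immutable value type for ARRAY attributes.
    # Because it is a tuple subclass, it cannot change behind
    # Pony's back; any "modification" must build a new value,
    # which is then assigned back to the attribute.
    def append(self, item):
        raise TypeError("TrackedArray is immutable; assign a new "
                        "value to the attribute instead")
```

With a type like this, instead of `my_object.my_array.append(x)` one would
write `my_object.my_array = TrackedArray(my_object.my_array + (x,))`, which
goes through normal attribute assignment and is therefore visible to Pony.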


As you can see, adding a new type is not an easy task, but it is
doable. Don't hesitate to ask additional questions.

Regards,
Alexander Kozlovsky