Обсуждение: Advices on custom data type and extension development

Поиск
Список
Период
Сортировка

Advices on custom data type and extension development

От
Luciano Coutinho Barcellos
Дата:
Dear friends,
    I'm planning to develop an extension, and I'm here for getting some 
help. But I would like to share the problem I intend to solve. Maybe my 
desired solution is not a good option.
    What I have:
        * a lot of data being generated every day, which are mainly 
queried by an immutable column of type date or timestamp;        * as a standard, almost every table has a bigserial id
column
 
as a primary key;        * data is huge enough to demand table partitioning, which is 
implemented as suggested in Postgres documentation, by using triggers 
and table inheritance. A function called by cron deal with creation of 
new partitions.
    What I would like to develop first is a custom type (let's call it 
datedserial) for replacing bigserial as the primary key:
        * the type would be 8 bytes long, being 4 dedicated to storing 
the Date, and 4 dedicated to storing a serial within that day;        * the text representation of the type would show
itsdate and 
 
its serial number (something like '2015-10-02.0000007296' as a canonical 
form, but which could accept inputs like '20151002.0000007296');        * as a consequence of this internal
representation,the serial 
 
part could not be greater than 4 billion and some;        * support for operator classes allowing the type being used
in
 
GIN and GIST indexes would be optional for now.
    That would allow me to have a compact primary key which I could use 
to partition the table based on the object's date. That would also allow 
me to partition detail tables on the foreign key column having this data 
type. Besides that, just by examining the value, mainly when used as a 
foreign key, I could infer where the record belongs to.
    When I have a working custom data type, I would go to the next and 
harder part. I would like to create a new structure like a sequence, and 
it should behave exactly like sequences, but separated by a date space. 
So I would have functions similar to the following:
        * createsequencegroup(sequence_group_name text): create a new 
named structure for managing the sequence group;        * nextval(sequence_group_name text, context_date date): return

next value of the sequence (as a datedserial) belonging to the sequence 
group and associated with the context date. The value returned have the 
context_date in its date part and the next value for that date in the 
sequence part. The first call for a specific date would return 1 for the 
sequence part. Concerning to concurrency and transactions, the function 
behaves exactly like nextval(sequence_group_name text);        * currval(sequence_group_name text, context_date date):
the
 
currval function counterpart;        * setval(sequence_group_name text, context_date date, int4 
value): the setval function counterpart;        * freeze_before(sequence_group_name text, freeze_date date): 
disallow using the sequence group with context dates before the freeze_date.
    I would consider extending the data type to allow including 
information about the cluster which generated the values. This way, the 
user could set a configuration entry defining a byte value for 
identifying the cluster among others involved in replication, so that 
the sequence group could have different sequences not only for different 
dates, but for different nodes as well.
    As I've said, I would like to package the resulting work as an 
extension.
    For now, I would like some help about where to start. I've 
downloaded the postgres source code and have successfully compiled it 
using my Ubuntu desktop, although have not tested the resulting binary. 
Should I create a folder in the contrib directory and use another 
extension as a starting point? Is this the recommended path? Or is this 
too much and I should create a separate project?
    Thanks in advance.
    Best regards,    Luciano Barcellos




Re: Advices on custom data type and extension development

От
Kevin Grittner
Дата:
On Mon, Jan 18, 2016 at 9:36 PM, Luciano Coutinho Barcellos
<luciano@geocontrol.com.br> wrote:

>         * a lot of data being generated every day, which are mainly queried
> by an immutable column of type date or timestamp;
>         * as a standard, almost every table has a bigserial id column as a
> primary key;
>         * data is huge enough to demand table partitioning, which is
> implemented as suggested in Postgres documentation, by using triggers and
> table inheritance. A function called by cron deal with creation of new
> partitions.
>
>     What I would like to develop first is a custom type (let's call it
> datedserial) for replacing bigserial as the primary key:
>
>         * the type would be 8 bytes long, being 4 dedicated to storing the
> Date, and 4 dedicated to storing a serial within that day;

Seriously, you should consider having a primary key with two
columns, of type date and int.  It would take exactly the same
space as your current plan, and performance should be very close to
what you propose.  As long as you aren't using some ORM that is too
dumb to deal with this, it should be far easier than creating the
custom type.

If you can't overcome the limitations of the "standard" or your
development framework any other way, you plan sounds like the next
best thing.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Advices on custom data type and extension development

От
Corey Huinker
Дата:
Seriously, you should consider having a primary key with two
columns, of type date and int.  It would take exactly the same
space as your current plan, and performance should be very close to
what you propose.  As long as you aren't using some ORM that is too
dumb to deal with this, it should be far easier than creating the
custom type.

+1 

Most ORMs cannot handle ENUMs, let alone user defined composite types.

That, or they flood the database with SELECT * FROM pg_type WHERE ... queries. I'm looking at you, Cake.

You're far better off trying a (date,integer) key as Kevin said.

If the ORM doesn't allow that, I'd suggest a custom function that encodes the date bit-shifted to the high 4 bytes, and then adds in the four bytes from a cycling sequence. At least then you've got a shot at partitioning, though the lower/upper bounds of the partitions would not make sense to the casual observer.

Re: Advices on custom data type and extension development

От
Alvaro Herrera
Дата:
Luciano Coutinho Barcellos wrote:

>         * the type would be 8 bytes long, being 4 dedicated to storing the
> Date, and 4 dedicated to storing a serial within that day;

Another thing to consider -- have you carefully defined what the
"current day" is?  This might sound a stupid question, but as far as I
remember Brazil has at least two timezones, which means that you could
have one date while far east and a different one at the west border of
the country.  If you misplace an order that a customer filed after 11pm,
they will be pretty pissed.  (This of course becomes more pressing if
you have things outside the country.)

If you simply state that dates are whatever is current in UTC zone,
you're covered.  (Pray you never get an order during a leap second.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services