Re: [HACKERS] LONG
From | wieck@debis.com (Jan Wieck) |
---|---|
Subject | Re: [HACKERS] LONG |
Date | |
Msg-id | m11xOws-0003kGC@orion.SAPserv.Hamburg.dsh.de |
In reply to | Re: [HACKERS] LONG (Bruce Momjian <pgman@candle.pha.pa.us>) |
Responses | Re: [HACKERS] LONG (wieck@debis.com (Jan Wieck)); Re: [HACKERS] LONG (Bruce Momjian <pgman@candle.pha.pa.us>) |
List | pgsql-hackers |
Bruce Momjian wrote:

> > > No need for attno in there anymore.
> >
> > I still need it to explicitly remove one long value on
> > update, while the other one is untouched. Otherwise I would
> > have to drop all long values for the row together and
> > reinsert all new ones.
>
> I am suggesting the longoid is not the oid of the primary or long*
> table, but a unique id we assigned just to number all parts of the long*
> tuple. I thought that's what your oid was for.

It's not even an Oid of any existing tuple, just an identifier to quickly find all the chunks of one LONG value by (non-unique) index.

My idea is this now:

The schema of the expansion relation is

    value_id        Oid
    chunk_seq       int32
    chunk_data      text

with a non-unique index on value_id.

We change heap_formtuple(), heap_copytuple() etc. not to allocate the entire thing in one palloc(). Instead the tuple portion itself is allocated separately, and the current memory context is remembered in the HeapTuple struct too (this is required below).

The long value reference in a tuple is defined as:

    vl_len          int32;     /* high bit set, low 31 bits = 18 */
    vl_datasize     int32;     /* real vl_len of long value */
    vl_valueid      Oid;       /* value_id in expansion relation */
    vl_relid        Oid;       /* Oid of "expansion" table */
    vl_rowid        Oid;       /* Oid of the row in "primary" table */
    vl_attno        int16;     /* attribute number in "primary" table */

The tuple given to heap_update() (the most complex one) can now contain usual VARLENA values of the format

    high-bit=0|31-bit-size|data

or, if the value is the result of a scan, possibly

    high-bit=1|31-bit=18|datasize|valueid|relid|rowid|attno

Now there are a couple of different cases.

1.  The value found is a plain VARLENA that must be moved off.

    To move it off, a new Oid for value_id is obtained, the value itself is stored in the expansion relation, and the attribute in the tuple is replaced by the above structure with the values 1, 18, original VARSIZE(), value_id, "expansion" relid, "primary" tuple's Oid and attno.

2.  The value found is a long value reference that has our own "expansion" relid and the correct rowid and attno. This would be the result of an UPDATE without touching this long value.

    Nothing to be done.

3.  The value found is a long value reference of another attribute, row or relation, and this attribute is enabled for move off.

    The long value is fetched from the expansion relation it is living in, and the same as for 1. is done with that value. There's space for optimization here, because we might have room to store the value plain. This can happen if the operation was an INSERT INTO t1 SELECT FROM t2, where t1 has a few small attributes plus one varsize attribute, while t2 has many, many long varsize attributes.

4.  The value found is a long value reference of another attribute, row or relation, and this attribute is disabled for move off (either per column or because our relation does not have an expansion relation at all).

    The long value is fetched from the expansion relation it is living in, and the reference in our tuple is replaced with this plain VARLENA.

This in-place replacement of values in the main tuple is the reason why we have to make a separate allocation for the tuple data and remember the memory context in which it was made. Due to the above process, the tuple data can expand, and we then need to switch into that context and reallocate it.

What heap_update() must further do is examine the OLD tuple (which it has already grabbed by CTID for header modification) and delete, by their value_id, all long values that are no longer present in the new tuple.

The VARLENA arguments to type-specific functions can now also have both formats. The macro

    #define VAR_GETPLAIN(arg) \
        (VARLENA_ISLONG(arg) ? expand_long(arg) : (arg))

can be used to get a pointer to an always plain representation, and the macro

    /* wrapped in do/while so it behaves as a single statement */
    #define VAR_FREEPLAIN(arg,userptr) \
        do { if ((arg) != (userptr)) pfree(userptr); } while (0)

is to be used to tidy up before returning.
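For illustration, the four heap_update() cases listed above can be sketched as a small classifier. This is only a sketch under assumptions, not backend code: the LongRef struct follows the field list given earlier, VARLENA_ISLONG is assumed to test the high bit of vl_len, and the LongAction enum, classify_long_attr() and its plain_too_big flag (standing in for whatever rule decides that a plain value "must be moved off") are hypothetical names.

```c
#include <stdint.h>

typedef uint32_t Oid;                 /* stand-in for the backend's Oid */

#define VARLENA_LONGBIT 0x80000000u   /* assumed: high bit of vl_len */
#define VARLENA_ISLONG(v) (((v)->vl_len & VARLENA_LONGBIT) != 0)

/* Long value reference, fields as listed in the text above. */
typedef struct LongRef
{
    uint32_t    vl_len;       /* high bit set, low 31 bits = 18 */
    int32_t     vl_datasize;  /* real size of the long value */
    Oid         vl_valueid;   /* value_id in expansion relation */
    Oid         vl_relid;     /* Oid of "expansion" table */
    Oid         vl_rowid;     /* Oid of the row in "primary" table */
    int16_t     vl_attno;     /* attribute number in "primary" table */
} LongRef;

/* Hypothetical names for what heap_update() would do per attribute. */
typedef enum
{
    LONG_LEAVE_PLAIN,         /* plain value that may stay in the tuple */
    LONG_MOVE_OFF,            /* case 1: store value, put reference in tuple */
    LONG_KEEP_REF,            /* case 2: our own reference, nothing to do */
    LONG_REFETCH_AND_STORE,   /* case 3: foreign reference, move off again */
    LONG_FETCH_AND_INLINE     /* case 4: foreign reference, store plain */
} LongAction;

/*
 * Decide the action for one attribute.  For plain values only vl_len is
 * examined; the remaining LongRef fields are read for references only.
 */
static LongAction
classify_long_attr(const LongRef *v, Oid my_exp_relid, Oid my_rowid,
                   int16_t my_attno, int move_off_enabled, int plain_too_big)
{
    if (!VARLENA_ISLONG(v))
    {
        if (move_off_enabled && plain_too_big)
            return LONG_MOVE_OFF;
        return LONG_LEAVE_PLAIN;
    }

    /* A reference that already points at this very attribute is untouched. */
    if (v->vl_relid == my_exp_relid &&
        v->vl_rowid == my_rowid &&
        v->vl_attno == my_attno)
        return LONG_KEEP_REF;

    /* Reference to another attribute, row or relation. */
    return move_off_enabled ? LONG_REFETCH_AND_STORE : LONG_FETCH_AND_INLINE;
}
```

The sketch deliberately ignores the optimization mentioned for case 3 (storing the fetched value plain when there is room); that would be one more size check before returning LONG_REFETCH_AND_STORE.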
With these macros, a function like smaller(text,text) would look like

    text *
    smaller(text *t1, text *t2)
    {
        /* expand long references so the data can be compared */
        text   *plain1 = VAR_GETPLAIN(t1);
        text   *plain2 = VAR_GETPLAIN(t2);
        text   *result;

        if ( /* whatever to compare plain1 and plain2 */ )
            result = t1;
        else
            result = t2;

        /* free the expanded copies; the original argument is returned */
        VAR_FREEPLAIN(t1,plain1);
        VAR_FREEPLAIN(t2,plain2);

        return result;
    }

The LRU cache used in expand_long() will make the repeated expansion cheap enough. The benefit would be that huge values resulting from table scans will be passed around in the system as references (in and out of sorting, grouping etc.) until they are modified or really stored/output.

And the LONG index stuff should be covered here already (free lunch)! Index_insert() MUST always be called after heap_insert()/heap_update(), because it needs the CTID assigned there. So at that time, the moved-off attributes are replaced in the tuple data by the references. These will be stored instead of the values that originally were in the tuple. This should also work with hash indices, as long as the hashing functions use VAR_GETPLAIN as well.

If we want to use auto compression too, no problem. We code this into another bit of the first 32-bit vl_len. The question of whether to call expand_long() now changes to "is one of these bits set". This way, we can store both compressed and uncompressed values in either the "primary" tuple or the "expansion" relation. expand_long() will take care of it.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #