Thread: a question about relkind of RelationData handed over to heap_update function

a question about relkind of RelationData handed over to heap_update function

From
노홍찬
Date:
Dear hackers,

I'm modifying the backend source code of pgsql.

While inspecting the heap_update function (src/backend/access/heap/heapam.c),
I found that the relkind field of every RelationData handed over to heap_update is 'r'.

I want to distinguish a normal relation (an actual table) from a primary index relation (the primary index of a table).

As you know, there are several different relkinds (i, r, S, u, t, v, c).

I guess a primary index relation's relkind would be the same as a normal relation's (i.e. 'r').

Is there any way I can distinguish a normal relation from a primary index relation in the heap_update function?

In the following code, I want to make 'doIcl = false' for the primary index relation.

Thank you for reading this.

------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------

heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
            ItemPointer ctid, TransactionId *update_xmax,
            CommandId cid, Snapshot crosscheck, bool wait)
{
    HTSU_Result result;
    TransactionId xid = GetCurrentTransactionId();
    Bitmapset  *hot_attrs;
    ItemId      lp;
    HeapTupleData oldtup;
    HeapTuple   heaptup;
    Page        page;
    Buffer      buffer, newbuf;
    bool        need_toast, already_marked;
    Size        newtupsize, pagefree;
    bool        have_tuple_lock = false;
    bool        iscombo;
    bool        use_hot_update = false;
    bool        all_visible_cleared = false;
    bool        all_visible_cleared_new = false;

    /* hongs added; variables */
#ifdef USE_ICL
    bool        doIcl = false, newDoIcl = false;
    BufferDesc *bufHdr = NULL;
    BufferDesc *newBufHdr = NULL;   //for inserting icl log of PageSetLSN
    Page        newpage;            //for inserting icl log of PageSetLSN
    ItemId      newlp;

    if (relation->rd_rel->relkind != 'r') {
        doIcl = true;
    }
    else
        doIcl = false;
#endif

------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------

- Best Regards
  Hongchan
  (fallsmal@cs.yonsei.ac.kr, (02)2123-7757) -

Re: a question about relkind of RelationData handed over to heap_update function

From
Tom Lane
Date:
노홍찬 <fallsmal@cs.yonsei.ac.kr> writes:
> I found that the relkind fields of all RelationData which is handed over to
> heap_update are all the same as 'r'.

Well, yeah: heap_update is applied to heaps (ordinary tables).  Not indexes.
The indexes are generally updated in a separate operation afterwards.
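
As a rough illustration of the point above, the following stand-alone sketch models the relkind check. It assumes nothing beyond the relkind letters themselves: the RELKIND_* macro names mirror those in PostgreSQL's src/include/catalog/pg_class.h and are repeated here only so the sketch compiles on its own, while the three helper functions are hypothetical names invented for this example. Note that a primary-key index carries the same relkind 'i' as any other index; whether an index is primary is recorded in pg_index (indisprimary), not in relkind.

```c
#include <assert.h>
#include <stdbool.h>

/* Values of pg_class.relkind. PostgreSQL defines macros with these names
 * in src/include/catalog/pg_class.h; they are redefined here only to keep
 * the sketch self-contained. */
#define RELKIND_RELATION        'r'  /* ordinary table (heap) */
#define RELKIND_INDEX           'i'  /* any index, primary-key or not */
#define RELKIND_SEQUENCE        'S'  /* sequence */
#define RELKIND_TOASTVALUE      't'  /* TOAST table */
#define RELKIND_VIEW            'v'  /* view */
#define RELKIND_COMPOSITE_TYPE  'c'  /* composite type */

/* Hypothetical helper: true only for ordinary tables. */
static bool
relkind_is_ordinary_table(char relkind)
{
    return relkind == RELKIND_RELATION;
}

/* Hypothetical helper: true for indexes of every kind. */
static bool
relkind_is_index(char relkind)
{
    return relkind == RELKIND_INDEX;
}

/* Hypothetical guard for code that sits on the heap update path and
 * expects to be handed ordinary tables only. */
static void
assert_heap_relation(char relkind)
{
    assert(relkind_is_ordinary_table(relkind));
}
```

In the real backend the value tested would be relation->rd_rel->relkind, as in the original post.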

> I want to distinguish normal relation (actual table) from primary index
> relation  (primary indexes of some tables).

Perhaps you should take about three steps back and explain what it is
you want to do, because heap_update is probably not the right place
to be doing it.
        regards, tom lane


Re: a question about relkind of RelationData handed over to heap_update function

From
노홍찬
Date:
Dear tom lane and hackers,

I am sorry, I should have explained the reason.

Actually, I'm not modifying the backend source code.

Since I am not a native speaker, I am not good at writing in English.

I'm just trying to make my own version of the pgsql code for my research.

Later, if my research turns out to be successful, I can contribute to enhancing pgsql

by implementing it concretely.


I'm researching DBMS I/O performance issues regarding flash memory and flash SSDs.

Flash memory has asymmetric read/write latency, and so do flash SSDs.

Therefore, reducing random writes to flash-based storage is quite an issue.


What I am trying to do now is to examine the real dirty portion of buffer pages to be flushed like the following.
  page 1
-------------
|           |     dportion1 (real dirty portion 1) ranges between 20 ~ 80
| dportion1 |
|           |    dportion2 (real dirty portion 2) ranges between 8190 ~ 8192
|           |
| dportion2 |
-------------

Since there are many different kinds of page updates, such as updates to local buffers, temp relations, indexes, toasted
attributes, and so forth,

it would be a big burden for me to inspect all of that code.

Therefore, I decided to start by inspecting only updates to ordinary tables.

I added a log array field to the BufferDesc struct, and I add logs to the BufferDesc of the updated buffer

when it comes to ordinary table updates (the logs specify the real dirty portion ranges of the buffer).


So far, I have covered (at least I think I have covered) several functions: heap_update, heap_insert, heap_delete,
heap_inplace_update, heap_lock_tuple, heap_page_prune, heap_page_prune_execute, PageAddItem, PageRepairFragmentation,
and RelationPutHeapTuple.

Until now I haven't cared about vacuum-related functions, since I turned off the autovacuum option in the conf file.


I think it's too early to tell how my idea is going to work. When I am ready to say confidently that my idea

can enhance pgsql's performance a little at little cost to other features, I will submit a proposal.


It is certainly not easy to grasp how the backend works, though.

Several articles and wiki pages helped me a lot, and the well-annotated code was the most helpful of all.

What I have been going through has helped me a lot in understanding the internals of DBMSs, and it was actually fun to read

the real working code of a DBMS.

This remarkable open-source DBMS code is so well maintained and continuously enhanced by this community

that many people, including me, can study it and participate, and for that I really thank you and the hackers.


About the question, I think I am a little confused. I don't know why, but the debug routine of my code still says that

the log inserted in heap_update belongs to a primary index relation. I will figure it out.


- Best Regards Hongchan Roh -


-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, October 26, 2009 12:07 AM
To: 노홍찬
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] a question about relkind of RelationData handed over to heap_update function




Re: a question about relkind of RelationData handed over to heap_update function

From
Greg Smith
Date:
On Mon, 26 Oct 2009, 노홍찬 wrote:

> What I am trying to do now is to examine the real dirty portion of 
> buffer pages to be flushed like the following.

You can trivially use pg_buffercache to view this, and its code in 
contrib/pg_buffercache will show you how to navigate the buffer cache data 
too.  There's an example of how to use it in the documentation for that 
module, and I've got some additional ones on my web page at 
http://www.westnet.com/~gsmith/content/postgresql in the slides and 
examples for "Inside the PostgreSQL Buffer Cache".

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


Re: a question about relkind of RelationData handed over to heap_update function

From
노홍찬
Date:
Dear Greg Smith,

Thank you for letting me know about the presentations on your homepage.

They are going to be very helpful for understanding the internals of PostgreSQL further.


- Best Regards Hongchan Roh -


-----Original Message-----
From: Greg Smith [mailto:gsmith@gregsmith.com]
Sent: Monday, October 26, 2009 5:32 AM
To: 노홍찬
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] a question about relkind of RelationData handed over to heap_update function




Re: a question about relkind of RelationData handed over to heap_update function

From
Greg Stark
Date:
On Sun, Oct 25, 2009 at 9:37 AM, 노홍찬 <fallsmal@cs.yonsei.ac.kr> wrote:
> What I am trying to do now is to examine the real dirty portion of buffer pages to be flushed like the following.
>
>   page 1
> -------------
> |           |   dportion1 (real dirty portion 1) ranges between 20 ~ 80
> | dportion1 |
> |           |   dportion2 (real dirty portion 2) ranges between 8190 ~ 8192
> |           |
> | dportion2 |
> -------------
>
> Since there are many different kinds of page-updates such as updates to local buffer, temp relation, indexes, toasted
> attributes, and so forth.
>
> It would be a big burden to me if I inspect all that codes.
>
> Therefore, I decided to make a start point as inspecting only updates to the ordinary tables.
>
> I added a log array field to BufferDesc struct, and added logs to the designated bufferDesc of the updated buffer
>
> when it comes to ordinary table updates (The logs specifies the real dirty portion ranges of the buffer).
>

I would think you would want to modify MarkBufferDirty to take a start
and end point and store that in your log. Then modify every existing
MarkBufferDirty operation that you can to specify the range that the
subsequent operation is going to modify. You're going to run into
problems where you have code which looks like:

- mark buffer dirty
- do some work which modifies a predictable portion
- if (some rare condition)
   - do some more work which modifies other parts of the buffer

The "some more work" may be some function call which doesn't usually
do much either.

So you may end up having to restructure a lot of code so that every
function is responsible for marking the buffer range dirty itself
instead of assuming it's already been marked.
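
A minimal sketch of that idea, under stated assumptions: this is a stand-alone model, not PostgreSQL's MarkBufferDirty (which takes only a Buffer). The names PageDirtyLog, DirtyRange, mark_range_dirty and modify_page, the half-open [start, end) convention, and the cap of 20 ranges are all hypothetical choices made for illustration. It shows each piece of work marking its own byte range, so the rare extra step cannot modify the page without being recorded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_BYTES     8192  /* default PostgreSQL block size */
#define RANGE_LOG_MAX  20    /* hypothetical cap on ranges kept per page */

/* One modified byte range within a page, half-open: [start, end). */
typedef struct DirtyRange
{
    int start;
    int end;
} DirtyRange;

/* Hypothetical per-buffer bookkeeping. */
typedef struct PageDirtyLog
{
    bool       dirty;       /* page needs to be written out */
    bool       overflowed;  /* more ranges were reported than could be kept */
    int        nranges;
    DirtyRange ranges[RANGE_LOG_MAX];
} PageDirtyLog;

/* Mark the page dirty and remember which bytes are about to change. */
static void
mark_range_dirty(PageDirtyLog *log, int start, int end)
{
    assert(log != NULL);
    assert(0 <= start && start < end && end <= PAGE_BYTES);

    log->dirty = true;
    if (log->nranges < RANGE_LOG_MAX)
    {
        log->ranges[log->nranges].start = start;
        log->ranges[log->nranges].end = end;
        log->nranges++;
    }
    else
        log->overflowed = true;  /* fall back to treating the whole page as dirty */
}

/* The problematic pattern from the message, restructured so that the
 * rarely executed extra work reports its own range. */
static void
modify_page(PageDirtyLog *log, bool rare_condition)
{
    mark_range_dirty(log, 20, 80);          /* the predictable portion */
    if (rare_condition)
        mark_range_dirty(log, 8190, 8192);  /* the "some more work" portion */
}
```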


--
greg


Re: a question about relkind of RelationData handed over to heap_update function

From
노홍찬
Date:
Dear Greg Stark,

Totally right. I want to record all of the updated regions.
So "doing some work" is not just doing a little work.

But I am trying not to touch the existing code as much as I can.
Therefore, I mostly added my own code. I didn't change the MarkBufferDirty function at all, but, of course, I have
created a function that is supposed to work similarly to what you mentioned.

I am sorry that I couldn't understand the meaning of the following sentence: (The "some more work" may be some function
call which doesn't usually do much either.).
What did you mean by that sentence? Please excuse my poor understanding of English; it would be great if you could
explain the meaning once more.


So far, it looks like this: I have appended several fields to the BufferDesc structure, and my own structure
(IclNewLog) is used for recording those dirty regions.

------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
------------------------ ------------ 

typedef struct sbufdesc {
    BufferTag   tag;                    /* ID of page contained in buffer */
    BufFlags    flags;                  /* see bit definitions above */
    uint16      usage_count;            /* usage counter for clock sweep code */
    unsigned    refcount;               /* # of backends holding pins on buffer */
    int         wait_backend_pid;       /* backend PID of pin-count waiter */

    slock_t     buf_hdr_lock;           /* protects the above fields */

    int         buf_id;                 /* buffer's index number (from 0) */
    int         freeNext;               /* link in freelist chain */

    LWLockId    io_in_progress_lock;    /* to wait for I/O to complete */
    LWLockId    content_lock;           /* to lock access to buffer contents */

    /* hongs added */
#ifdef USE_ICL
    bool        isBufferPageNewOrXlogRead;
    int         icl_length;
    IclNewLog   icl_logs[ICL_LEN_LIMIT];
#endif
    /* hongs added */

} BufferDesc;

typedef struct IclNewLog {
    int         change_start;
    int         change_end;
    uint32      file;                   //for ICL_DEBUG
    int         line;                   //for ICL_DEBUG
    int         icl_log_global_seq;     //for ICL_DEBUG
} IclNewLog;

------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
------------------------ ------------ 

* a part of heap_update function *

Line number: 2761: oldtup.t_data->t_ctid = heaptup->t_self;

/* hongs added; ICL logs oldtuple's tupleheader */
#ifdef USE_ICL
if (doIcl) {
    //buffer header lock and buffer content lock is separate, so I guess the buffer header lock is needed
    LockBufHdr(bufHdr);
    if (bufHdr->icl_length < ICL_LEN_LIMIT - 1) {
        bufHdr->icl_logs[bufHdr->icl_length].change_start = lp->lp_off;
        bufHdr->icl_logs[bufHdr->icl_length].change_end = lp->lp_off + sizeof(HeapTupleHeaderData);
        bufHdr->icl_logs[bufHdr->icl_length].file = HEAPAM;
        bufHdr->icl_logs[bufHdr->icl_length].line = 3003;
        //making sure of the correctness of the log size
        IclAssert( IsIclLogValid(bufHdr->icl_logs[bufHdr->icl_length]) );
        bufHdr->icl_length++;
    }
    UnlockBufHdr(bufHdr);
}
#endif
/* hongs added end */

Line number: 2762:    if (newbuf != buffer)
Line number: 2763:    MarkBufferDirty(newbuf);
Line number: 2764:    MarkBufferDirty(buffer);
------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
------------------------ ------------ 

I named the log "icl log".
The above code records "the update to the old tuple's tuple header" into the log array field of the buffer
descriptor whose buffer page is about to be marked dirty.

I'm not interested in buffers that are frequently updated. I'm interested in buffers to be flushed that have only a
very small amount of genuinely updated area.
Since pgsql's update policy uses the MVCC time-snapshot model, every update causes an update of the old tuple's header
(changing its xmax field).
There might be some buffer pages to be flushed which have only one or two small regions of genuine updates, like an
updated xmax field or an updated XLogRecPtr.
I think, purely in my opinion, that flush operations with such small genuinely updated regions are inefficient.
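
To make "small amount of genuinely updated area" concrete, here is a small stand-alone sketch that counts how many distinct bytes of an 8192-byte page a set of logged ranges covers. It is only an illustration: the names ByteRange, count_dirty_bytes and is_sparse_flush, the half-open [start, end) convention, and the byte threshold are assumptions made for this example and do not come from the code discussed in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_BYTES 8192  /* default PostgreSQL block size */

/* One logged byte range within a page, half-open: [start, end). */
typedef struct ByteRange
{
    int start;
    int end;
} ByteRange;

/* Count the distinct bytes covered by the ranges; overlapping ranges are
 * counted once because each byte is tracked in a per-page bitmap. */
static int
count_dirty_bytes(const ByteRange *ranges, int nranges)
{
    bool touched[PAGE_BYTES];
    int  total = 0;
    int  i;
    int  b;

    memset(touched, 0, sizeof(touched));
    for (i = 0; i < nranges; i++)
    {
        assert(0 <= ranges[i].start);
        assert(ranges[i].start <= ranges[i].end);
        assert(ranges[i].end <= PAGE_BYTES);
        for (b = ranges[i].start; b < ranges[i].end; b++)
            touched[b] = true;
    }
    for (b = 0; b < PAGE_BYTES; b++)
        if (touched[b])
            total++;
    return total;
}

/* Hypothetical test for a flush that would write a whole page although
 * at most threshold_bytes of it really changed. */
static bool
is_sparse_flush(const ByteRange *ranges, int nranges, int threshold_bytes)
{
    return count_dirty_bytes(ranges, nranges) <= threshold_bytes;
}
```

With the two ranges from the earlier diagram (20 ~ 80 and 8190 ~ 8192, read as half-open), 62 of the 8192 bytes are covered.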

However, this is not a problem of pgsql alone. The in-place update operations of every DBMS have similar
problems.
I think pgsql's update logic is less problematic than others,
since the main updates (not the old tuple's header update but the real tuples) can be piled up in one buffer page (not in
scattered pages),
and the HOT update mechanism addresses the earlier problems of time-snapshot MVCC well in pgsql.

Therefore, I limited the maximum log array size to 20. If I apply some log merge logic (because there would be many logs
which can be merged together, like 8152 ~ 8172 and 8162 ~ 8192 -> 8152 ~ 8192),
then the array size would be enough to locate the buffers having small genuine update regions. I don't care about the
buffers which have more logs than the maximum log array size.
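
The merge logic mentioned above could look roughly like the following sketch. It is not the code from the thread: the names Span, cmp_span_start and merge_spans are hypothetical, and it assumes that spans which overlap or touch should be combined. It sorts the spans by start offset and then folds each one into its predecessor whenever they overlap, which is the standard interval-merging approach.

```c
#include <assert.h>
#include <stdlib.h>

/* One logged change range within a page (hypothetical stand-in for the
 * change_start/change_end pair of a log entry). */
typedef struct Span
{
    int start;
    int end;
} Span;

/* qsort comparator: order spans by their start offset. */
static int
cmp_span_start(const void *a, const void *b)
{
    const Span *sa = (const Span *) a;
    const Span *sb = (const Span *) b;

    return (sa->start > sb->start) - (sa->start < sb->start);
}

/* Merge overlapping or touching spans in place and return the new count. */
static int
merge_spans(Span *spans, int n)
{
    int out = 0;
    int i;

    if (n <= 0)
        return 0;
    assert(spans != NULL);

    qsort(spans, (size_t) n, sizeof(Span), cmp_span_start);
    for (i = 1; i < n; i++)
    {
        if (spans[i].start <= spans[out].end)
        {
            /* Overlaps the span being built: extend it if needed. */
            if (spans[i].end > spans[out].end)
                spans[out].end = spans[i].end;
        }
        else
        {
            /* Gap before this span: start a new merged span. */
            out++;
            spans[out] = spans[i];
        }
    }
    return out + 1;
}
```

Applied to 8152 ~ 8172 and 8162 ~ 8192, this yields the single span 8152 ~ 8192, so merged logs occupy fewer slots of the fixed-size array.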

This is an example; the current code doesn't look exactly like this.
I am trying not to touch the existing code but only to append my logic, so that later my code can be applied as an
additional module for a specific purpose such as flash-based storage.

I want to emphasize this once more: this attempt is not for a pgsql patch or a pgsql enhancement but for my own
research purposes, at least for now.
Besides, this attempt is just preparation for implementing my research idea.
Therefore, if you see much inefficiency and stupidity in this attempt, please understand.
Later, when I am confident enough to show the whole picture of my idea and working code (at least after passing the
regression tests and my own tests using the dbt2 benchmark),
I'll present it to you and the hackers.

I really appreciate your interest in my attempt.

As for the original question, I found my mistake. I confused the relation OID with relNode (of RelFileNode). Sorry for
the hasty question.

Thank you for reading this.

- Best Regards Hongchan Roh -



-----Original Message-----
From: gsstark@gmail.com [mailto:gsstark@gmail.com] On Behalf Of Greg Stark
Sent: Tuesday, October 27, 2009 2:22 AM
To: 노홍찬
Cc: pgsql-hackers@postgresql.org
Subject: Re: a question about relkind of RelationData handed over to heap_update function
