Re: polyphase merge?
От | Don Marvick |
---|---|
Тема | Re: polyphase merge? |
Дата | |
Msg-id | d18e24870902041852t4f69a91ay54409fcf3297ab26@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: polyphase merge? (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-hackers |
This is what I understand from existing implementation:<br />- fix the merge fan in F, and number of tapes=F+1<br />- foreach sorted run generated, we have a formula to determine which tape it belongs to<br />- the sorted run is added to theend of the tape<br /> - all sorted runs have been added to tape. There is always an empty tape called output tape<br />-add dummy runs if necessary, to each tape<br />- merge the input tapes, write result to output tape, until one of theinput tape is empty<br /> - repeat this process until 1 sorted run remains<br /><br /><br />I think the main differenceis that at each step, the current impl does not merge the smallest k-runs. It merges from the beginning of eachtape. For 1 pass, this does not make any difference. For 2 passes onwards there are some differences. In some complexquery, where the memory is eaten by some other operators, more passes are needed and the differences become larger.This is the only improvement that can be achieved. Let me know if this does not make any sense.<br /><br />Regardingdisk layout, I am open to suggestion whether we should reuse the tape module or not. <br /><br /><br /><br /><divclass="gmail_quote">On Wed, Feb 4, 2009 at 11:20 PM, Tom Lane <span dir="ltr"><<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>></span>wrote:<br /><blockquote class="gmail_quote" style="border-left:1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="Ih2E3d">Don Marvick<<a href="mailto:donmarvick@gmail.com">donmarvick@gmail.com</a>> writes:<br /> > Since nobody replied, Iwould give it a try. I am going to implement the<br /> > merge pattern described in Knuth Page 365 (5.4.9), essentiallyit is as<br /> > follow:<br /> > - create initial runs using replacement selection (basically this is asin<br /> > the current implementation)<br /> > - add enough dummy runs of size 0 so that the number of sorted runminus one<br /> > can be divided by k-1 (k is merge fan in)<br /> > - repetitively merge k smallest runs until only1 run left<br /><br /></div>How is this better than, or even different from, what is there now?<br /><br /> regards, tom lane<br /></blockquote></div><br />
В списке pgsql-hackers по дате отправления: