Re: RAID arrays and performance
От | Mark Mielke |
---|---|
Тема | Re: RAID arrays and performance |
Дата | |
Msg-id | 47557B12.5040809@mark.mielke.cc обсуждение исходный текст |
Ответ на | Re: RAID arrays and performance (James Mansion <james@mansionfamily.plus.com>) |
Ответы |
Re: RAID arrays and performance
|
Список | pgsql-performance |
James Mansion wrote: > Mark Mielke wrote: >> This assumes that you can know which pages to fetch ahead of time - >> which you do not except for sequential read of a single table. > Why doesn't it help to issue IO ahead-of-time requests when you are > scanning an index? You can read-ahead > in index pages, and submit requests for data pages as soon as it is > clear you'll want them. Doing so can allow > the disks and OS to relax the order in which you receive them, which > may allow you to process them while IO > continues, and it may also optimise away some seeking and settle > time. Maybe. Sorry to be unclear. To achieve a massive speedup (12X for 12 disks with RAID 0) requires that you know what reads to perform in advance. The moment you do not, you only have a starting point, your operations begin to serialize again. For example, you must scan the first index, to be able to know what table rows to read. At a minimum, this breaks your query into: 1) Preload all the index pages you will need, 2) Scan the index pages you needed, 3) Preload all the table page you will need, 4) Scan the table pages you needed. But do you really need the whole index? What if you only need parts of the index, and this plan now reads the whole index using async I/O "just in case" it is useful? Index is a B-Tree for a reason. In Matthew's case where he has an IN clause with thousands of possibles (I think?), perhaps a complete index scan is always the best case - but that's only one use case, and in my opinion, an obscure one. As soon as additional table joins become involved, the chance that whole index scans are required would probably normally reduce, which turns the index scan into a regular B-Tree scan, which is difficult to perform async I/O for, as you don't necessarily know which pages to read next. It seems like a valuable goal - but throwing imaginary numbers around does not appeal to me. I am more interested in Gregory's simulations. I would like to understand his simulation better, and see his results. Speculation about amazing potential is barely worth the words used to express it. The real work is in design and implementation. :-) Cheers, mark -- Mark Mielke <mark@mielke.cc>
В списке pgsql-performance по дате отправления: