New/Revised TODO? Gathering actual read performance data for use by planner
From | Michael Nolan |
---|---|
Subject | New/Revised TODO? Gathering actual read performance data for use by planner |
Date | |
Msg-id | BANLkTi=tNr6EBAObv_t-KLTwREZWKAhTYw@mail.gmail.com |
Replies | Re: New/Revised TODO? Gathering actual read performance data for use by planner |
List | pgsql-hackers |
In the TODO list is this item:

**Modify the planner to better estimate caching effects**

Tom mentioned this in his presentation at PGCON, and I also chatted with Tom about it briefly afterwards.

Based on last year's discussion of this TODO item, it seems thoughts have been focused on estimating how much data is being satisfied from PG's shared buffers. However, I think that's only part of the problem.

Specifically, read performance is going to be affected by:

1. Reads fulfilled from shared buffers.
2. Reads fulfilled from system cache.
3. Reads fulfilled from disk controller cache.
4. Reads from physical media.

#4 is further complicated by the type of physical media for that specific block. For example, reads that can be fulfilled from a SSD are going to be much faster than ones that access hard drives (or even slower types of media.)

System load is going to impact all of these as well.

Therefore, I suggest that an alternative to the above TODO may be to gather performance data without knowing (or more importantly without needing to know) which of the above sources fulfilled the read.

This data would probably need to be kept separately for each table or index, as some tables or indexes may be mostly or fully in cache or on faster physical media than others, although in the absence of other data about a specific table or index, data about other relations in the same tablespace might be of some use.

Tom mentioned that the cost of doing multiple system time-of-day calls for each block read might be prohibitive, it may also be that the data may also be too coarse on some systems to be truly useful (eg, the epoch time in seconds.)
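
As a rough illustration of the idea above, here is a minimal sketch of per-relation read timing that does not care which layer (shared buffers, OS cache, controller cache, or media) served the read. Everything in it is an assumption made for illustration: `ReadLatencyTracker`, `timed_read`, `estimate_ns`, and the `sample_every` and `alpha` parameters are hypothetical names, not PostgreSQL APIs. It times only one read in N to limit clock calls, uses a nanosecond monotonic clock rather than a coarse time-of-day value, and keeps a short-lived exponentially weighted average per relation.

```python
import time
from collections import defaultdict


class ReadLatencyTracker:
    """Hypothetical per-relation read-latency statistics (not a PostgreSQL API)."""

    def __init__(self, sample_every=16, alpha=0.2, clock=time.perf_counter_ns):
        self.sample_every = sample_every  # time only 1 read in N to limit clock calls
        self.alpha = alpha                # weight given to the newest sample
        self.clock = clock                # monotonic nanosecond clock, injectable for tests
        self.reads = defaultdict(int)     # relation name -> number of reads seen
        self.avg_ns = {}                  # relation name -> smoothed latency in ns

    def timed_read(self, relation, read_block):
        """Call read_block() and return its result, timing only sampled calls."""
        self.reads[relation] += 1
        if self.reads[relation] % self.sample_every != 0:
            return read_block()           # unsampled read: no clock calls at all
        start = self.clock()
        result = read_block()
        self._record(relation, self.clock() - start)
        return result

    def _record(self, relation, elapsed_ns):
        """Fold one sample into the relation's exponentially weighted average."""
        old = self.avg_ns.get(relation)
        if old is None:
            self.avg_ns[relation] = float(elapsed_ns)
        else:
            self.avg_ns[relation] = old + self.alpha * (elapsed_ns - old)

    def estimate_ns(self, relation, default=None):
        """Return the smoothed latency, or default if nothing was sampled yet."""
        return self.avg_ns.get(relation, default)
```

A caller would wrap each block read, for example `tracker.timed_read("orders_pkey", fetch)` where `fetch` is whatever function performs the read, and a cost model could then consult `tracker.estimate_ns("orders_pkey")`. The `default` argument is where a tablespace-wide figure could be supplied for relations with no samples of their own.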

If this data were available, that could mean that successive plans for the same query could have significantly different plans (and thus actual performance), based on what has happened recently, so these statistics would have to be relatively short term and updated frequently, but without becoming computational bottlenecks.

The problem is one I'm interested in working on.
--
Mike Nolan