Обсуждение: Gist consistent and compression
Hello list, An idea for possible improvement of query speed on gist indexes with a costly compression function. We have a gist index that uses a compressed internal datatype. The compression is with some cost (it involves a syscache lookup). The functions called by the consistent functions are binary functions on the internal type. Since the query parameter passed to consistent is the 'external' datatype, it must be compressed by the consistent function. When performing a query, the same data value (the query) is compressed repeatedly while traversing the nodes in the index tree. The query speed increased substantially after building in an ad hoc caching (a hash) in the compression function. We believe that every gist index that uses compression could benefit from a cache, even a small one. So it would make sense to add the caching to the gist support in the backend, instead of having to build one in the user defined functions gist functions for every datatype. Any thoughts? regards, Yeb Havinga
Hi, Le 1 mai 09 à 23:04, Yeb Havinga a écrit : > An idea for possible improvement of query speed on gist indexes with > a costly compression function. We have a gist index that uses a > compressed internal datatype. The compression is with some cost (it > involves a syscache lookup). [...] > small one. So it would make sense to add the caching to the gist > support in the backend, instead of having to build one in the user > defined functions gist functions for every datatype. > > Any thoughts? It seems to me that what you're asking for is addressed indirectly in the possibility to make your internal data type a full SQL visible datatype. Then you store this new datatype directly in the table and index that. Instead of converting from external to internal type at consistent() time in a query, provide an implicit CAST for external to internal for queries to "just work" without editing. The CAST will get called once per literal in the query. See prefix and the prefix_range datatype as an example of this: http://blog.tapoueh.org/prefix.html I'm not sure if the caching facility should be added to -core PostgreSQL or not, I'm just proposing another way to solve your problem without requiring it... Regards, -- dim
Hi Dimitri, list > It seems to me that what you're asking for is addressed indirectly in > the possibility to make your internal data type a full SQL visible > datatype. Then you store this new datatype directly in the table and > index that. Instead of converting from external to internal type at > consistent() time in a query, provide an implicit CAST for external to > internal for queries to "just work" without editing. The CAST will get > called once per literal in the query. Thanks for your reply. If there would be a way that the user never sees the 'internal' type in query results (with some rules perhaps), it might work. But somehow it feels 'not right'; the user datatype is the external one, and how the gist index works internally should not be put as burden on the user. The compression is also lossy, so recheck is necessary and I think that will not work if only the internal type is stored in the relation. > See prefix and the prefix_range datatype as an example of this: > http://blog.tapoueh.org/prefix.html I wish I'd seen the slides on that link half a year ago, since I think they are helpful to anyone that wants to start writing a gist index. I remember using google translate on a russion page about gist in the search for information. Besides that also reading Guttmans r-tree paper was good, to have any idea what picksplit, penalty and union are supposed to do. regards, Yeb Havinga