Hi,
I have some questions regarding the indexing and sampling API.
My aim is to implement a variant of progressive indexing as described in this paper (link). To summarize,
I want to implement a variant of online aggregation, where an aggregate query (like SUM, AVG, etc.) is answered in real time, with the result becoming more and more accurate as tuples are consumed.
I thought I could perhaps use a custom sampling routine to consume table samples until I have seen the whole table, with no duplicate tuples.
Meanwhile, with every consumed sample and returned partial answer, I want to add the tuples consumed to a progressively evolving index.
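To make the idea concrete, here is a rough sketch of the loop I have in mind (illustrative Python only, not Postgres code; the integer row ids stand in for whatever unique row identifier the server would provide):

```python
import random

def progressive_avg(rows, batch_size):
    """Yield increasingly accurate estimates of AVG(rows) while
    consuming the table in random batches without duplicates."""
    order = list(range(len(rows)))   # unique row identifiers
    random.shuffle(order)            # sampling without replacement
    seen_sum = 0.0
    seen_ids = []                    # the progressively evolving "index"
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        seen_ids.extend(batch)       # consumed tuples join the index
        seen_sum += sum(rows[i] for i in batch)
        yield seen_sum / len(seen_ids)   # partial answer per batch

table = [float(i) for i in range(1, 101)]   # true average is 50.5
estimates = list(progressive_avg(table, batch_size=10))
print(estimates[-1])   # exact once every row has been seen: 50.5
```

The point being that each partial answer is usable immediately, and the final one, having consumed every tuple exactly once, is exact.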
This would mean I would have to be able to uniquely identify each row in order to add it to the growing index, right? Since OIDs are deprecated / phased out, I am still unsure how to solve this.
Does this sound reasonable or is there an obvious flaw in my thinking?
I would also be thankful for any material beyond the Postgres documentation that would help me get started modifying the source to realize something like this.
Regards
Michael H.