neon: a functional database, git for structured data
От | Amirouche Boubekki |
---|---|
Тема | neon: a functional database, git for structured data |
Дата | |
Msg-id | 49ebfb078a1b415b9ebcc580a185e7b5@hypermove.net обсуждение исходный текст |
Список | pgsql-hackers |
Hello, I think, I found a way to create a performant versioned triple store. Neon achieves versioning *almost* without copy of a set of tuples that looks like: (graph, subject, predicate, object) I described the implementation in GNU Guile using wiredtiger in the attached (poorly rendered) pdf document. The project is AGPLv3. for the time being, the repository is private, send me your github handle if you want to lurk around. Basically, the triples are attached to unsigned 64 bits integers called 'transaction' that draws a direct-acyclic-graph named 'branch history'. That is, neon stores tuples that looks like the following: (graph, subject, predicate, object, transaction) transaction are mapped in another table. That other table stores the following tuples (transaction) → (parent1, parent2, sha-256) This builds DAG where vertex can have at most two incoming edges but as many outgoings edges. outgoings edges represent branches; and vertices have in general a single incoming edge. Vertices with two incomings edges are merge transactions. Given a 'transaction' it's possible to know the added and deleted triples. Based on the 'branch history', a mapping is computed associating 'transaction' to another unsigned 64 bits integer that denotes its 'history significance'. The order procedure use a topological sort that preserves 'history significance'. To compute the value of given (graph, subject, predicate) I use a code equivalent to the following: SELECT graph, subject, predicate, object, HMAX(trasaction, 'master') FROM store WHERE graph='wikipedia', subject='Database', predicate='See Also' GROUP BY graph, subject, predicate, object Where HMAX poke at the 'branch history' mapping to compute the latest transaction. Special transactions called 'merge' introduce the necessary adds and deletes that reconcile history between two branch. The tuples introduced by a 'merge' transaction will have a bigger 'history significance' shadowing the conflicting tuples. There is a bit of copying in case of updates and deletes. The above mapping is implemented with an immutable datastructure called simply 'history' which can be shared between readers and writers. Because it would be intractable to compute the history otherwise, it's not possible for two transactions to have the same parent while being in the same branch. Otherwise anonymous branches are forbidden (unlike mercurial). There is write lock on a branch history that serialize write making the database 'single writer on a branch'. That said, it's possible to create a branches concurrently. A requested feature is to have a PostgreSQL backend. Q: Is there a way to speed up the computation of HMAX? Maybe via a specific index on transaction(parent1, parent2)? Q: Does an index (or primary key?) speed up range queries over multiple columns? Feedback welcome! -- Amirouche ~ amz3 ~ http://www.hyperdev.fr
Вложения
В списке pgsql-hackers по дате отправления: