Discussion: An Idea for OID conflicts
Folks,

I would like to have your thoughts on a solution for the duplicate OIDs problem.

I wanted to apply the uuid patch to a newly downloaded source tree from CVS. Patching and "make install" went just fine, but "make check" and initdb failed, to my surprise. A quick look at duplicate_oids showed that almost all of the OIDs I had gotten from unused_oids were already in use. So naturally I went "!!?! segmentation fault **!!?".

I can only imagine how time-consuming it must be for the committers to correct these manually.

I think we can solve this problem with the combination of the following three steps:

1. When using new OIDs, always start from a fixed number, for example 10000. This way the new OIDs are easy to recognize and the developer can continue the work.
2. Always use the new OIDs sequentially.
3. Make a small utility that goes through a patch, finds the new OIDs, and changes them back to a value specified by the committer(s).

Would this be workable?

Regards,
Gevik.
Gevik Babakhani wrote:
> Folks,
>
> I would like to have your thoughts on a solution for the duplicate OIDs.
>
> I wanted to apply the uuid patch on a newly download source from the
> CVS. Patching and "make install" went just okay but "make check" and
> initdb failed to my surprise. A quick look at duplicate_oids showed that
> almost all of the OIDs I got available from unused_oids were already in
> use. So naturally I went "!!?! segmentation fault **!!?".
>
> I can only imagine how time consuming this can be for the committers to
> correct these manually.
>
> I think we can solve this problem with the combination of the following
> three steps.
>
> 1. When using new OIDs always start from a fixed number. For example
> 10000. This way the new OIDs are easy to recognize and the developer can
> continue the work.
> 2. Always use the new OIDs sequentially.
> 3. Make a small utility that goes through a patch, finds the new OIDs
> and changes them back to a value specified by the committer(s).
>
> Would this be workable?

My idea was to have a file called reserved_oids.h which would contain lines like:

    #error "do not include this file anywhere"
    CATALOG(reserved_for_foo_module,9876) /* 2006-09-18 */

and which would be examined by the unused_oids script.

To get oids reserved, a committer would add lines to that file for you. When your patch was done, it would include stuff to remove those lines from the file.

That way you will be working with the oids your patch will eventually use - no later fixup necessary.

Every release the file would be cleaned of old entries if necessary.

cheers

andrew
Gevik Babakhani <pgdev@xs4all.nl> writes:
> 3. Make a small utility that goes through a patch, finds the new OIDs
> and changes them back to a value specified by the committer(s).
>
> Would this be workable?

That utility sounds AI-complete to me ...

            regards, tom lane
Gevik Babakhani <pgdev@xs4all.nl> writes:
> 1. When using new OIDs always start from a fixed number. For example
> 10000. This way the new OIDs are easy to recognize and the developer can
> continue the work.

Reserving a range of OIDs for experimentation seems like a good idea, since it means nobody's development tree would ever be broken by a cvs update - at least not because their OIDs were pulled out from under them.

But I had another thought. It seems like a lot of the catalog include files don't really need to be defined so laboriously. Just because types and operators are in the core doesn't mean they need to be bootstrapped using fixed OIDs and C code.

Those types, functions and operators that aren't used by system tables could be created by a simple SQL script instead. It's a hell of a lot easier to write a CREATE OPERATOR CLASS call than to get all the OIDs in four different include files to line up properly.

This would make it a whole lot more practical to define all those smallfoo data types we were discussing, too, with all the cross-data-type comparison operators as well.

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
On Mon, 2006-09-18 at 14:36 -0400, Tom Lane wrote:
> Gevik Babakhani <pgdev@xs4all.nl> writes:
> > 3. Make a small utility that goes through a patch, finds the new OIDs
> > and changes them back to a value specified by the committer(s).
>
> > Would this be workable?
>
> That utility sounds AI-complete to me ...

Maybe I am missing something here, but if the developer used the OIDs sequentially, it wouldn't be that difficult to find and replace the used range within the patch.
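That sequential find-and-replace could be quite small. A Python sketch, under the assumption that the patch author used one contiguous placeholder range; the function and its interface are invented for illustration and are not an existing PostgreSQL tool:

```python
import re

def renumber_oids(patch_text, start, count, new_start):
    """Shift every OID in the placeholder range [start, start+count)
    down into [new_start, new_start+count); other numbers untouched."""
    def repl(match):
        oid = int(match.group(0))
        if start <= oid < start + count:
            return str(oid - start + new_start)
        return match.group(0)
    # Word boundaries keep us from rewriting digits embedded in longer
    # numbers, though not magic numbers that happen to fall in the range.
    return re.sub(r"\b\d+\b", repl, patch_text)
```

Renumbering the whole range in one pass keeps cross-references consistent across the catalog headers, which is the part that is tedious to do by hand.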
Gregory Stark <stark@enterprisedb.com> writes:
> Those types, functions and operators that aren't used by system tables could
> be created by a simple SQL script instead.

Only if you don't need to know their OIDs anywhere in the C code. I'm not certain offhand how many of the non-core objects are so referenced.

We have in any case got a problem with changing OIDs for any built-in types, because there are a number of clients out there with hard-wired knowledge of what the type OIDs returned in query results mean.

'Twould be nice to simplify the operator class setup though ...

            regards, tom lane
On Mon, 2006-09-18 at 14:10 -0400, Andrew Dunstan wrote:
> My idea was to have a file called reserved_oids.h which would contain
> lines like:
>
> #error "do not include this file anywhere"
> CATALOG(reserved_for_foo_module,9876) /* 2006-09-18 */
>
> and which would be examined by the unused_oids script.

Or you needn't even use a header file -- presumably it wouldn't actually be included by any C source files, so you could just keep the reserved OIDs as a newline-separated text file. We'd probably want to design the file format to avoid unnecessary patch conflicts when reserved OID ranges are removed.

I wonder if the additional committer burden of maintaining this file is worth it -- personally this problem rarely occurs in practice for me, and when it does, it is easily resolved via the existing scripts for finding duplicate and unused OIDs.

-Neil
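A plain text file like that would be trivial for the unused_oids script to consume. A sketch with a made-up one-entry-per-line format (oid, then owner/date), where `#` starts a comment; the layout is hypothetical, not an agreed design:

```python
def parse_reservations(lines):
    """Collect reserved OIDs from a hypothetical text format:
    '<oid> <owner/patch> <date>' per line; '#' starts a comment."""
    reserved = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        oid, rest = line.split(None, 1)
        reserved[int(oid)] = rest
    return reserved
```

One line per reservation means removing a reservation deletes exactly one line, which keeps patch conflicts to a minimum.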
Whoops, I hadn't read this far when I sent the last message on the other thread.

Gevik Babakhani wrote:
>> 3. Make a small utility that goes through a patch, finds the new OIDs
>> and changes them back to a value specified by the committer(s).

This is effectively what I ended up suggesting in the aforementioned post. There's no reason that the OIDs would have to start at 10000, though; that would probably necessitate a change in what I understand is the policy of starting a db cluster at OID 10000.

OTOH, while waiting for patch acceptance, there's probably a benefit in using OIDs well above the current first free OID: you don't have to repeatedly run the script. If you started at 9000, say, you'd only have to run the update script when you were about to submit the patch. If you're right on the line, you'd have to run it every week or whatever.

>> Would this be workable?

Well, *I* think so, with the suggestions made above.

Andrew Dunstan wrote:
> My idea was to have a file called reserved_oids.h which would contain
> lines like:
>
> #error "do not include this file anywhere"
> CATALOG(reserved_for_foo_module,9876) /* 2006-09-18 */
>
> and which would be examined by the unused_oids script.
>
> To get oids reserved, a committer would add lines to that file for you.
> When your patch was done, it would include stuff to remove those lines
> from the file.
>
> That way you will be working with the oids your patch will eventually
> use - no later fixup necessary.

This sounds like a recipe for gaps in the OID list to me. If the patch gets rejected or dropped, you've got a gap. If the author doesn't use all the allocated OIDs, you've got a gap. What happens if the patch author decides that they need more OIDs? More time from a committer to reserve some. Plus it has the downside that the author has to ask for some to be reserved up front, they need to have a very good idea of how many they'll need, and a committer has to agree with them.

If you'd asked me how many I thought I'd need when I started the enum patch, I would have told you a number much smaller than the number I ended up using. :)

Of course, if you don't care about gaps, then half of the objections above go away, and the other half can be solved by reserving more than you'd need. Given the fairly contiguous nature of OIDs in the catalogs currently, it would appear that we do care about gaps, though.

I like the script idea much better. It wouldn't be bad to even allow patches to be submitted with OIDs in the high 9000s or whatever range. The committer responsible for committing the patch could just run the update script before committing the code.

Cheers

Tom
On Mon, Sep 18, 2006 at 07:46:41PM +0100, Gregory Stark wrote:
> Gevik Babakhani <pgdev@xs4all.nl> writes:
> > 1. When using new OIDs always start from a fixed number. For example
> > 10000. This way the new OIDs are easy to recognize and the developer can
> > continue the work.
>
> Reserving a range of OIDs for experimentation seems like a good idea since it
> means nobody's development tree would ever be broken by a cvs update. At least
> not because their OIDs were pulled out from under them.
>
> But I had another thought. It seems like a lot of the catalog include files
> don't really need to be defined so laboriously. Just because types and
> operators are in the core doesn't mean they need to be boostrapped using fixed
> OIDs and C code.
>
> Those types, functions and operators that aren't used by system tables could
> be created by a simple SQL script instead. It's a hell of a lot easier to
> write a CREATE OPERATOR CLASS call than to get all the OIDs in four
> different include files to line up properly.

If there are 4 different files involved, ISTM it'd be best to have a script that generates those 4 files from a master list. This is something I should be able to create if there's interest.

Though I do agree that moving things to SQL where possible probably makes the most sense...

--
Jim Nasby                                    jimn@enterprisedb.com
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
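The master-list idea could look something like this sketch: one table of entries expanded into the per-header DATA lines that must otherwise be kept in sync by hand. The entry layout is invented for illustration; the real catalog rows have many more columns:

```python
def generate_catalog_lines(master):
    """Expand one master list of (oid, target_header, columns) tuples
    into per-header DATA() lines for the bootstrap catalogs."""
    files = {}
    for oid, header, columns in master:
        files.setdefault(header, []).append(
            "DATA(insert OID = %d ( %s ));" % (oid, columns))
    return files
```

The point of the single source of truth is that an OID appears in exactly one place, so it cannot go out of sync between the generated files.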
Gregory Stark wrote:
> Those types, functions and operators that aren't used by system tables could
> be created by a simple SQL script instead. It's a hell of a lot easier to
> write a CREATE OPERATOR CLASS call than to get all the OIDs in four
> different include files to line up properly.

No kidding.

Just FYI, that wouldn't have worked for the enums patch, though, because of the anyenum pseudo-type. That stuff really did need to be in the backend. For more common user-defined types like uuid that are being discussed, it might work well.

Heck, a bunch of the existing casts etc. could probably be changed to SQL, and would become a great deal more readable in the process. Not that I'm advocating fixing a non-broken thing... :)

Cheers

Tom
Tom Dunstan <pgsql@tomd.cc> writes:
> [ some good arguments snipped ]
> I like the script idea much better. It wouldn't be bad to even allow
> patches to be submitted with OIDs in the high 9000 or whatever range.
> The committer responsible for committing the patch could just run the
> update script before comitting the code.

The scary thing about a script is the assumption that it will make all and only the changes needed. Four-digit magic numbers are not that uncommon in C code. I think it might be safer if we made the arbitrary OID range for an uncommitted patch be large, say eight digits (maybe "12345xxx"). The script would knock these down to 4-digit numbers, ie removing one tab stop, so it wouldn't be too hard on your formatting.

Given the current OID generation mechanism, the presence of large OIDs in the initial catalogs shouldn't be a problem for testing patches, even assuming that the OID counter gets that high in your test database.

            regards, tom lane
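The "12345xxx" scheme makes the rewrite much safer to automate, since an eight-digit prefix is far less likely to collide with unrelated constants than a four-digit number. A sketch of what such a knock-down script could do; the function name and interface are my own invention:

```python
import re

def assign_oids(text, free_oids, prefix="12345"):
    """Replace each distinct placeholder OID of the form <prefix>NNN
    with the next free real OID, consistently across the whole text."""
    mapping = {}
    def repl(match):
        placeholder = match.group(0)
        if placeholder not in mapping:
            # First sighting of this placeholder: grab the next free OID.
            mapping[placeholder] = str(next(free_oids))
        return mapping[placeholder]
    return re.sub(r"\b" + re.escape(prefix) + r"\d{3}\b", repl, text)
```

Because every occurrence of a given placeholder maps to the same real OID, cross-references between the catalog headers stay aligned automatically.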
Tom Lane wrote:
> The scary thing about a script is the assumption that it will make all
> and only the changes needed. Four-digit magic numbers are not that
> uncommon in C code. I think it might be safer if we made the arbitrary
> OID range for an uncommitted patch be large, say eight digits (maybe
> "12345xxx"). The script would knock these down to 4-digit numbers,
> ie removing one tab stop, so it wouldn't be too hard on your formatting.
> Given the current OID generation mechanism, the presence of large OIDs
> in the initial catalogs shouldn't be a problem for testing patches,
> even assuming that the OID counter gets that high in your test database.

Well, the only files that a script would touch would be in src/include/catalog, since any other reference to them should be through a #define anyway, IMO. And I figured that the 9000 range was as good a magic number as any, at least in that directory.

The nice thing about 9000-range numbers is that they're still under the 10000 magic starting point for initdb, so you're guaranteed not to run into that range when inserting data while testing your patch. I'm a little shady on how OID uniqueness works, though, so maybe it's not a problem. Are OIDs guaranteed to be unique across a whole cluster, or just in a given relation (including wraparound scenarios)?

The other point about the script scariness is that obviously you'd have to a) pass make check, including whatever tests exercise your patch (and there should be some if you've added a new proc or type or whatever), and b) the patch would have to survive review, where OID weirdness should leap out when the patch is viewed stripped of all the surrounding catalog noise. Maybe. :)

Anyway, if having OIDs up above 10000 isn't a problem, then it doesn't really matter either way, so having them stand out by being longer seems just as good to me as my suggestion.

Thanks

Tom