Обсуждение: Domains and subtypes, a brief proposal
As many of you know I've been contemplating the implementation of Domains and subtypes. DISCLAIMER: This is a proposal only. The actual work needs to be picked up by someone in a better place to work on the code than I am. For various reasons, I can only be an active reference and tester on this projects. DISCLAIMER: This is a proposal, not a detailed spec. A more detailed spec can be written with the help of a developer. Domains and subtypes. * Create new child type from values in parent type. * Maintain only checks for constraints * Create implicit casts fromchild to parent With this model all of the type checking that exists should work and domain checking of types in any place except for constraint checking can be removed. * Add proper SQL standard CREATE childtype UNDER parenttype This should include all of the ordinary create type options. Constraints on types: * Change the pg_types to hold a NULLABLE constraint text column OR add a type constraint lookup table (pg_domains?) This is a big deal, I know. Constraints for domains would be in that field. All domain checking should be done from that source. This would theoretically enable type constraint checking for any type if we chose to add this feature to create type. This would be appropriate for base types only, including UDTs. There may be an argument for complex types, though, for example certain types of integer-ish arrays may only contain unique values. Usually types have the constraint definition encoded in their input routines. This gives the type developers an easier option to validate their types since constraints can be written in pl languages. Problems and Issues: There are a few known issues with the current implementation of domains with overriding operators. * Creating the table with an domain PRIMARY KEY did not use the subtype comparison function. It was necessaryto create a unique index which explicitly used the domain operator class. This should be fixed by having thedomain as a proper type. No domain checking should be necessary. * ORDER BY requires USING op clause. This may befixed by having the domain as a proper type. No domain checking should be necessary. * LIKE requires explicit castingof the second argument to text. (I need to double check this. It may work OK on cvs head.) * COPY has a problemwhich still need investigating. Why do we want this. * We need subtypes * Domains are *almost* subtypes because you can override the operators as you can with any type. *Use the types as designed The current code jumps through hoops to check for domain types. Ideally this change should remove a lot of that code in favor of checking just for the existence of a constraint and then only where constraint checking is needed. By using this technique Illustra easily added subtypes, but did not add the domains with their additional complexities. No special type checking for simple subtypes was necessary. I may have missed some stuff here. Obviously. For example how to divide and conquer the various aspects of the issues raised here. But this is a high, high level proposal at this time. Comments, volunteers are welcome. --elein -------------------------------------------------------------- elein@varlena.com Varlena, LLC www.varlena.com (510)655-2584(o) (510)543-6079(c) PostgreSQL Consulting, Support & Training --------------------------------------------------------------
Elein, > I may have missed some stuff here. Obviously. For example how to divide > and conquer the various aspects of the issues raised here. But this is a > high, high level proposal at this time. I'm not quite clear on what in your proposal is different from current Domain behavior. Or are you just looking to remove the limitations on where Domains can be used? -- Josh Berkus PostgreSQL @ Sun San Francisco
On Thu, Sep 07, 2006 at 07:12:17PM -0700, Josh Berkus wrote: > Elein, > > > I may have missed some stuff here. Obviously. For example how to divide > > and conquer the various aspects of the issues raised here. But this is a > > high, high level proposal at this time. > > I'm not quite clear on what in your proposal is different from current Domain > behavior. Or are you just looking to remove the limitations on where Domains > can be used? > > -- > Josh Berkus > PostgreSQL @ Sun > San Francisco > I'm looking to make domains proper types and eliminate the need for special domain checking in the case where it is not a check constraint check. I'm trying to influence a more logical implementation of domains that removes some code and gains us features. Also it should eliminate some (all?) of the current limitations with regards to using domains as subtypes. It will also pave the way for implementation of create type under type. --elein elein@varlena.com
elein <elein@varlena.com> writes: > Domains and subtypes. > * Create new child type from values in parent type. > * Maintain only checks for constraints > * Create implicit casts from child to parent This seems a bit content-free, because it's not clear how it differs from what we do now. We already have implicit child-to-parent casts. > Constraints on types: > * Change the pg_types to hold a NULLABLE constraint text column > OR add a type constraint lookup table (pg_domains?) I understand that you are arguing to allow constraints to be associated with any type not only domains, but (a) I don't see why we should want to add that overhead, and (b) I don't see what that has to do with the problem you actually need to solve, specifically limiting the application of implicit domain-to-base-type casts. regards, tom lane
On Fri, Sep 08, 2006 at 02:33:13PM -0400, Tom Lane wrote: > elein <elein@varlena.com> writes: > > Domains and subtypes. > > > * Create new child type from values in parent type. > > * Maintain only checks for constraints > > * Create implicit casts from child to parent > > This seems a bit content-free, because it's not clear how it differs > from what we do now. We already have implicit child-to-parent casts. I guess the key point here was to treat the domains as proper udt types except where constraint checking is required. And yes, this is already done, but it needed to be included for context. > > > Constraints on types: > > > * Change the pg_types to hold a NULLABLE constraint text column > > OR add a type constraint lookup table (pg_domains?) > > I understand that you are arguing to allow constraints to be associated > with any type not only domains, but > (a) I don't see why we should want to add that overhead, and > (b) I don't see what that has to do with the problem you actually > need to solve, specifically limiting the application of implicit > domain-to-base-type casts. This is a new feature idea, derived from the implementation of domains. Usually people store type checking in the input functions, but this is a nice addition to UDTs that require a constraint checking model. It allows the constraints to be in plperl which is nice for parsing complex object stored at strings. (My example was email and the constraint was a plperl function that validated the format and legitimacy of the value.) Other complex objects (stored as strings) such as key value lists and your ordinary weirdly constructed values can use the (more expensive) constraint at constraint time only instead of the input function which should remain fast and may be a borrowed or inherited input function. This is not a drop dead required feature but it should flow from the cleaner implementation of domains. Changing the check from domain type to constraint exists on any type should be cleaner. Changing the SQL for CREATE TYPE should be the added work to get this feature available. It just seems simpler and cleaner. We want to treat all types the same and maintain a type blind database server. a) if subtypes/domains can have constraints then the model should not be different for domains only but for all types. Constraint checking would only occur at check constraint points--and there for any type. You already check for the existanceof a domain. Change that test to the existence of a constraint only and eliminate domain specific code. b) It is not part of the problem but a logical stretch given the changes required. It will also reduce the domain checking. Other than my existing tests (published previously) I do not have a good idea code wise the extent of the changes. This discussion may help us get to that point. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster > elein elein@varlena.com
elein <elein@varlena.com> writes: > a) if subtypes/domains can have constraints then the model should > not be different for domains only but for all types. Constraint > checking would only > occur at check constraint points--and there for any type. You > already check for the existance of a domain. Change that test > to the existence of a constraint only and eliminate domain specific > code. Au contraire, the test whether a constraint actually exists occurs at runtime, not at the time we check for domain-ness. Your proposal would force such checks to be introduced into every single expression evaluation. It's not feasible at all without plan invalidation, and even with that I foresee fairly enormous added overhead. Our experience with domains so far is that looking up those constraints is *expensive*. regards, tom lane
On Fri, Sep 08, 2006 at 03:47:23PM -0400, Tom Lane wrote: > elein <elein@varlena.com> writes: > > a) if subtypes/domains can have constraints then the model should > > not be different for domains only but for all types. Constraint > > checking would only > > occur at check constraint points--and there for any type. You > > already check for the existance of a domain. Change that test > > to the existence of a constraint only and eliminate domain specific > > code. > > Au contraire, the test whether a constraint actually exists occurs at > runtime, not at the time we check for domain-ness. Your proposal would > force such checks to be introduced into every single expression > evaluation. It's not feasible at all without plan invalidation, and > even with that I foresee fairly enormous added overhead. Our experience > with domains so far is that looking up those constraints is *expensive*. For domain checking isn't expression evaluation required anyway? email := email_value || email_value should fail on a constraint check for result value at assignment time. I think what you are saying is that the domain checking (proposed constraint existence checking) would need to be done in more places and I'm not sure I understand this. I believe constraints checking should done less often than input types. And checking for constraint <> NULL should be equivalent to the current check *typtype != 'd'. I could be wrong base on the current implementation. There may be more to it, but I suspect making sure the constraint value is available when you fetch a type would be necessary. Turn the thing around a bit. The contraint is an attribute on anytype. All type code, only where appropriate, should check for existence of the constraint attribute. This is different from saying domains as special types and need special casing in places (other than constraint checking). I'm trying to remove the specialness from domains so that the type code can pretty well work as is in all places execpt checking for the constraint attribute of a type. This should solve some of the existing domain problems. We're pretty close to this as is, but there is still a lot of special casing going on. The ability to add constraints to any type should only be considered as a logical extension made easier by the change in what you test when you test for constraints. elein elein@varlena.com
elein <elein@varlena.com> writes: > I think what you are saying is that the domain checking (proposed constraint > existence checking) would need to be done in more places and I'm not sure I > understand this. What I'm complaining about is the need to search the catalogs to see if a datatype has constraints. At the moment we need to do that only for operations yielding domain types. Your proposal appears to require that it be done for *every* operation on *every* datatype, right down to int4 and bool (which at the very least creates some interesting circularity issues). I'm not willing to accept that much overhead on the strength of what is frankly a pretty weak case. If you want a constraint, what's wrong with putting a domain on your base type to enforce it? > And checking for constraint <> NULL should be equivalent to the > current check *typtype != 'd'. Not without an amazingly complicated substructure to the "constraint" column (multiple entries, names as well as expressions, etc). At the very least that's a violation of relational theory, and I'm not sure how we're going to handle dependencies of the constraint expressions at all if they aren't separate catalog entries. regards, tom lane
On Fri, Sep 08, 2006 at 05:20:18PM -0400, Tom Lane wrote: > elein <elein@varlena.com> writes: > > I think what you are saying is that the domain checking (proposed constraint > > existence checking) would need to be done in more places and I'm not sure I > > understand this. > > What I'm complaining about is the need to search the catalogs to see if > a datatype has constraints. At the moment we need to do that only > for operations yielding domain types. Your proposal appears to require > that it be done for *every* operation on *every* datatype, right down > to int4 and bool (which at the very least creates some interesting > circularity issues). I'm not willing to accept that much overhead on > the strength of what is frankly a pretty weak case. If you want a > constraint, what's wrong with putting a domain on your base type to > enforce it? > > > And checking for constraint <> NULL should be equivalent to the > > current check *typtype != 'd'. > > Not without an amazingly complicated substructure to the "constraint" > column (multiple entries, names as well as expressions, etc). At the > very least that's a violation of relational theory, and I'm not sure how > we're going to handle dependencies of the constraint expressions at all > if they aren't separate catalog entries. > > regards, tom lane I'm seeing the constraint column as an attribute of the type. It would at least flag existence of a constraint on a type. This is necessary for domains as they work now (but we check typtype). This would also be a catalog change, i.e. non-trivial. It would link types to constraints only and replace the 'd' value of typtype. Perhaps my ignorance is showing and I'm missing something. How you have the ability to select the constraint from the domain name now is fuzzy to me. But I'm trying to move "domains" from a type of type to just an attribute of type with the goal of simplifying the behaviour. The single value result of an expression, on assignment is the only candidate for constraint checking. Other expression evaluation would behave as is until assignment to a final result. Theoretically, you would not have to add constraints other than those defined now. And this should already be working though the implementation would change slightly with my proposal. elein elein@varlena.com