Re: Reducing tuple overhead
From | Peter Geoghegan |
---|---|
Subject | Re: Reducing tuple overhead |
Date | |
Msg-id | CAM3SWZS0GyUaiFx97oYJuirmcW1MsojmEAtoEF7WCgxdppNOXg@mail.gmail.com |
In response to | Re: Reducing tuple overhead (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
On Thu, Apr 30, 2015 at 6:54 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> The other, related problem is that the ordering operator might start
> to return different results than it did at index creation time. For
> example, consider a btree index built on a text column. Now consider
> 'yum update'. glibc gets updated, collation ordering of various
> strings change, and now you've got tuples that are in the "wrong
> place" in the index, because when the index was built, we thought A <
> B, but now we think B < A. You would think the glibc maintainers
> might avoid such changes in minor releases, or that the Red Hat guys
> would avoid packaging and shipping those changes in minor releases,
> but you'd be wrong.

I would not think that. Unicode Technical Standard #10 states:

"""
Collation order is not fixed. Over time, collation order will vary: there may be fixes needed as more information becomes available about languages; there may be new government or industry standards for the language that require changes; and finally, new characters added to the Unicode Standard will interleave with the previously-defined ones. This means that collations must be carefully versioned.
"""

Also, in the paper "Modern B-Tree Techniques", by Goetz Graefe, page 238, it states:

"""
In many operating systems, appropriate functions are provided to compute a normalized key from a localized string value, date value, or time value. This functionality is used, for example, to list files in a directory as appropriate for the local language. Adding normalization for numeric data types is relatively straightforward, as is concatenation of multiple normalized values. Database code must not rely on such operating system code, however. The problem with relying on operating systems support for database indexes is the update frequency. An operating system might update its normalization code due to an error or extension in the code or in the definition of a local sort order; it is unacceptable, however, if such an update silently renders existing large database indexes incorrect.
"""

Unfortunately, it is simply not the case that we can rely on OS collations being immutable. We have *no* contract with any C standard library concerning collation stability whatsoever. I'm surprised that we don't hear more about this kind of corruption.

--
Peter Geoghegan
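To make the failure mode discussed above concrete, here is a minimal sketch in Python. It is not PostgreSQL code. The functions `old_key` and `new_key` are hypothetical stand-ins for a collation before and after a library update, and a sorted list searched with binary search stands in for a btree index.

```python
from bisect import bisect_left

def old_key(s):
    # Hypothetical "old" collation: plain code point order,
    # so uppercase letters sort before lowercase ones.
    return s

def new_key(s):
    # Hypothetical "new" collation after a library update:
    # case-insensitive first, code point order as a tiebreaker.
    return (s.lower(), s)

def build_index(values, key):
    # Stand-in for building an index: store the values in sorted order.
    return sorted(values, key=key)

def index_lookup(index, value, key):
    # Binary search that trusts the index to be sorted under `key`.
    keys = [key(v) for v in index]
    pos = bisect_left(keys, key(value))
    return pos < len(index) and index[pos] == value

values = ["apple", "Banana", "cherry", "Date"]

# The index is built while the old ordering is in effect.
index = build_index(values, old_key)

# While the ordering is unchanged, lookups work.
found_before = index_lookup(index, "apple", old_key)

# After the "update", the comparator changes but the stored order does not.
# The search descends the wrong way and misses a value that is present.
found_after = index_lookup(index, "apple", new_key)

# Rebuilding under the new ordering (the analogue of a reindex) fixes it.
rebuilt = build_index(values, new_key)
found_rebuilt = index_lookup(rebuilt, "apple", new_key)

print(found_before, found_after, found_rebuilt)
```

Nothing raises an error in the middle case: the lookup simply reports that the value is absent. Not every lookup fails, either. In this sketch "cherry" is still found under the new ordering, which may be part of why this kind of corruption can go unnoticed.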