Re: [v9.2] make_greater_string() does not return a string in some cases
From | Tom Lane |
---|---|
Subject | Re: [v9.2] make_greater_string() does not return a string in some cases |
Date | |
Msg-id | 21348.1316706403@sss.pgh.pa.us |
In response to | Re: [v9.2] make_greater_string() does not return a string in some cases (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: [v9.2] make_greater_string() does not return a string in some cases |
List | pgsql-hackers |
Robert Haas <robertmhaas@gmail.com> writes:
> One thing I was thinking about is that it would be useful to have some
> metric for judging how well any given algorithm that we might pick
> here actually works.

Well, the metric that we were indirectly using earlier was the number of characters in a given locale for which the algorithm fails to find a greater one (excluding whichever character is "last", I guess, or you could just recognize there's always at least one).

> For example, if we were to try all possible
> three character strings in some encoding and run make_greater_string()
> on each one of them, we could then measure the failure percentage. Or
> if that's too many cases to crank through then we could limit it some
> way -

Even in UTF8 there's only a couple million assigned code points, so for test purposes anyway it doesn't seem like we couldn't crank through them all. Also, in many cases you could probably figure it out by analysis instead of brute-force testing every case.

A more reasonable objection might be that a whole lot of those code points are things nobody cares about, and so we need to weight the results somehow by the actual popularity of the character. Not sure how to take that into account.

Another issue here is that we need to consider not just whether we find a greater character, but "how much greater" it is. This would apply to my suggestion of incrementing the top byte without considering lower-order bytes --- we'd be skipping quite a lot of code space for each increment, and it's conceivable that that would be quite hurtful in some cases. Not sure how to account for that either. An extreme example here is an "incrementer" that just immediately returns the last character in the sort order for any lesser input.

regards, tom lane
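The two measurements discussed above (how often an incrementer fails, and how far it jumps when it succeeds) can be illustrated with a small brute-force harness. This is only a sketch under stated assumptions, not the PostgreSQL code: it works on single characters in UTF-8, compares by code point (roughly the C locale case, not a real collation), treats every valid code point as a character whether or not it is assigned, and the names `increment_last_byte`, `jump_to_last`, and `evaluate` are hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple

LAST_CHAR = chr(0x10FFFF)  # assumed "last" character under code point ordering


def increment_last_byte(ch: str) -> Optional[str]:
    """Bump only the final UTF-8 byte until the bytes decode again, else give up."""
    b = bytearray(ch.encode("utf-8"))
    while b[-1] < 0xFF:
        b[-1] += 1
        try:
            return b.decode("utf-8")
        except UnicodeDecodeError:
            continue  # not valid UTF-8 yet, keep bumping
    return None


def jump_to_last(ch: str) -> Optional[str]:
    """The extreme incrementer: always answer with the last character."""
    return LAST_CHAR if ch < LAST_CHAR else None


def evaluate(
    incrementer: Callable[[str], Optional[str]],
    code_points: Iterable[int],
) -> Tuple[float, float]:
    """Return (failure rate, mean code point gap over the successful cases)."""
    failures = successes = total_gap = 0
    for cp in code_points:
        ch = chr(cp)
        nxt = incrementer(ch)
        if nxt is None or nxt <= ch:
            failures += 1
        else:
            successes += 1
            total_gap += ord(nxt) - cp
    total = failures + successes
    mean_gap = total_gap / successes if successes else 0.0
    return failures / total, mean_gap


if __name__ == "__main__":
    two_byte = range(0x80, 0x800)  # all two-byte UTF-8 characters
    for name, fn in [("last byte", increment_last_byte), ("jump to last", jump_to_last)]:
        rate, gap = evaluate(fn, two_byte)
        print(f"{name}: failure rate {rate:.4f}, mean gap {gap:.1f}")
```

On the two-byte range the last-byte incrementer fails only for characters whose final byte is already 0xBF, but always lands on the immediate successor when it succeeds; the jump-to-last incrementer never fails on this range yet skips nearly the whole code space, which is why a failure count alone is not a sufficient metric. Weighting by character popularity is not modeled here.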