Re: working around JSONB's lack of stats?
От | Jim Nasby |
---|---|
Тема | Re: working around JSONB's lack of stats? |
Дата | |
Msg-id | 54CC3130.5030604@BlueTreble.com обсуждение исходный текст |
Ответ на | Re: working around JSONB's lack of stats? (Josh Berkus <josh@agliodbs.com>) |
Список | pgsql-performance |
On 1/30/15 2:26 PM, Josh Berkus wrote: > On 01/28/2015 03:50 PM, Peter Geoghegan wrote: >> On Wed, Jan 28, 2015 at 3:42 PM, Josh Berkus <josh@agliodbs.com> wrote: >>> jsonb_col @> '[ "key1" ]' >>> or jsonb_col ? 'key1' >>> if in MCE, assign % from MCE >>> otherwise assign 1% of non-MCE % >>> >>> jsonb_col @> '{ "key1": "value1" }' >>> if in MCE, assign MCE% * 0.1 >>> otherwise assign 0.01 of non-MCE % >>> >>> Does that make sense? >> >> I suspect it makes a lot less sense. The way people seem to want to >> use jsonb is as a document store with a bit of flexibility. Individual >> JSON documents tend to be fairly homogeneous in structure within a >> table, just like with systems like MongoDB. Strings within arrays are >> keys for our purposes, and these are often used for tags and so on. >> But Strings that are the key of an object/pair are much less useful to >> index, in my estimation. > > Yeah, I see your point; except for arrays, people are usually searching > for a key:value pair, and the existence of the key is not in doubt. > > That would make the "element" the key:value pair, no? But > realistically, we would only want to do that for simple keys and values. > > Although: if you "flatten" a nested JSON structure into just keys with > scalar values (and array items as their own thing), then you could have > a series of expanded key:value pairs to put into MCE. > > For example: > > { house : { city : San Francisco, > sqft: 1200, > color: blue, > occupants: [ mom, dad, child1 ] > } > occupation: programmer > } > > ... would get flattened out into the following pairs: > > city: san francisco > sqft: 1200 > color: blue > occupants: [ mom ] > occupants: [ dad ] > occupants: [ child1 ] > occupation: programmer > > This would probably work because there aren't a lot of data structures > where people would have the same key:value pair in different locations > in the JSON, and care about it stats-wise. Alternatetly, if the same > key-value pair appears multiple times in the same sample row, we could > cut the MC% by that multiple. Even if there were multiple occurrences, this would probably still be an improvement. Another idea... at one time in the past when discussing statistics on multiple columns, one idea was to build statistics on indexes. If we built that, we could also do the same thing for at least JSONB (not sure about JSON). Obviously doesn't help for stuff you haven't indexed, but presumably if you care about performance and have any significant size of data you've also indexed parts of the JSON, yes? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
В списке pgsql-performance по дате отправления: