Re: OT: Data structure design question: How do they count
От | Richard Huxton |
---|---|
Тема | Re: OT: Data structure design question: How do they count |
Дата | |
Msg-id | 443A2424.4020702@archonet.com обсуждение исходный текст |
Ответ на | OT: Data structure design question: How do they count so fast? (Brendan Duddridge <brendan@clickspace.com>) |
Ответы |
Re: OT: Data structure design question: How do they count so fast?
|
Список | pgsql-performance |
Brendan Duddridge wrote: > > Now, initially I thought they would just pre-compute these counts, but > the problem is, when you click on any of the above attribute values, > they reduce the remaining possible set of matching products (and set of > possible remaining attributes and attribute values) by the amount > displayed next to the attribute value selected. You can click on any > combination of attribute values to filter down the remaining set of > matching products, so there's a large combination of paths you can take > to arrive at a set of products you might be interested in. > > Do you think they are pre-computed? Or do you think they might use a > query similar to the following?: Pre-computed almost certainly, but at what level of granularity? And with application-level caching? > select pav.attribute_value_id, count(p.product_id) > from product_attribute_value pav, > attribute a, > product p > where a.attribute_id in (some set of attribute ids) and > pav.product_id = p.product_id and > pav.attribute_id = a.attribute_id and p.product_id in > (select product_id > from category_product > where category_id = some category id) and > p.is_active = 'true' > group by pav.attribute_value_id; > > It would seem to me that although the above query suggests a normalized > database structure, that joining with 3 tables plus a 4th table in the > sub-query with an IN qualifier and grouping to get the product counts > would take a VERY long time, especially on a possible result set of > 1,260,658 products. Hmm - I'm not sure I'd say this was necessarily normalised. In the example you gave there were three definite types of attribute: 1. Price range (< 20, 20-50, ...) 2. Product type (lighting, rugs, ...) 3. Store (art.com, homeannex, ...) Your example discards this type information. I'm also not sure it lets store A sell widgets for 19.99 and B for 25.99 So - let's look at how we might break this down into simple relations: product_types (product_id, prod_type, prod_subtype) product_availability (product_id, store_id, price_range) and so on for each set of parameters. Then, if PG isn't calculating fast enough I'd be tempted to throw in a summary table: product_counts(store_id, price_range, prod_type, prod_subtype, ..., num_products) Then total over this for the top-level queries. I'd also cache common top-level queries at the applicaton level anyway. -- Richard Huxton Archonet Ltd
В списке pgsql-performance по дате отправления: