Re: OT: Data structure design question: How do they count

Поиск

Список

Период

Сортировка

От	Richard Huxton
Тема	Re: OT: Data structure design question: How do they count
Дата	10 апреля 2006 г. 06:24:06
Msg-id	443A2424.4020702@archonet.com обсуждение исходный текст
Ответ на	OT: Data structure design question: How do they count so fast? (Brendan Duddridge <brendan@clickspace.com>)
Ответы	Re: OT: Data structure design question: How do they count so fast?
Список	pgsql-performance

Дерево обсуждения

Brendan Duddridge wrote:
>
> Now, initially I thought they would just pre-compute these counts, but
> the problem is, when you click on any of the above attribute values,
> they reduce the remaining possible set of matching products (and set of
> possible remaining attributes and attribute values) by the amount
> displayed next to the attribute value selected. You can click on any
> combination of attribute values to filter down the remaining set of
> matching products, so there's a large combination of paths you can take
> to arrive at a set of products you might be interested in.
>
> Do you think they are pre-computed? Or do you think they might use a
> query similar to the following?:

Pre-computed almost certainly, but at what level of granularity? And
with application-level caching?

> select pav.attribute_value_id, count(p.product_id)
> from product_attribute_value pav,
>      attribute a,
>      product p
> where a.attribute_id in (some set of attribute ids) and
> pav.product_id = p.product_id and
> pav.attribute_id = a.attribute_id and p.product_id in
>     (select product_id
>      from category_product
>      where category_id = some category id) and
> p.is_active = 'true'
> group by pav.attribute_value_id;
>
> It would seem to me that although the above query suggests a normalized
> database structure, that joining with 3 tables plus a 4th table in the
> sub-query with an IN qualifier and grouping to get the product counts
> would take a VERY long time, especially on a possible result set of
> 1,260,658 products.

Hmm - I'm not sure I'd say this was necessarily normalised. In the
example you gave there were three definite types of attribute:
  1. Price range (< 20, 20-50, ...)
  2. Product type (lighting, rugs, ...)
  3. Store (art.com, homeannex, ...)
Your example discards this type information.

I'm also not sure it lets store A sell widgets for 19.99 and B for 25.99

So - let's look at how we might break this down into simple relations:
  product_types (product_id, prod_type, prod_subtype)
  product_availability (product_id, store_id, price_range)
and so on for each set of parameters.

Then, if PG isn't calculating fast enough I'd be tempted to throw in a
summary table:
  product_counts(store_id, price_range, prod_type, prod_subtype, ...,
num_products)
Then total over this for the top-level queries.

I'd also cache common top-level queries at the applicaton level anyway.

--
   Richard Huxton
   Archonet Ltd

В списке pgsql-performance по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: OT: Data structure design question: How do they count