24x7x365 high-volume ops ideas
| From | Ed L. |
|---|---|
| Subject | 24x7x365 high-volume ops ideas |
| Date | |
| Msg-id | 200411031810.18093.pgsql@bluepolka.net |
| Replies | Re: 24x7x365 high-volume ops ideas |
| List | pgsql-general |
I have a few high-volume, fairly large clusters that I'm struggling to keep running 24x7x365, and I want to ask for advice from anyone with similar experience or hard-won wisdom. Generally these are clusters handling 100-200 queries/second against maybe 10GB-30GB of data (always increasing), with roughly 10% writes. A little regular, routine downtime for maintenance would do wonders for these systems, but unfortunately the requirement is 100% uptime, all the time, and any downtime at all is a liability. Here are some of the issues (I've appended a few sketches below my sig to make them concrete):

1) Big tables. When the tables grow large enough, it takes too long to vacuum them. In some cases there's simply too much data; in other cases it's dead space, but both REINDEX and VACUUM FULL block production queries (a lesser version of downtime). In the past we have taken a PR hit for the downtime to dump/reload (we've found it to be faster than VACUUM FULL). Async replication helps with cluster moves from one server to another, but we still don't have a low-to-zero-downtime solution for regular maintenance.

2) Big tables, part 2. Of course, customers want all data that ever existed to stay online and quickly available via sub-second queries. I assume at some point this data will be too much for one table (how much is too much?). This is a little vague, I know, but what sorts of segmenting strategies do folks employ to deal with data that cannot be retired but gets too expensive to vacuum, etc.?

3) Simple restarts for configuration changes (shared_buffers, max_connections, etc.). When we have to break client connections, we have to notify the customer and take a PR hit. Maybe pgpool is a possible solution?

Are these issues for Oracle, DB2, etc. as well?

TIA,
Ed
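P.S. A few sketches of what I mean, in case it helps frame answers; all table and column names below are hypothetical, and I'm assuming a roughly 7.4-era installation.

For point 1, the non-blocking route we've been leaning on is plain (lazy) VACUUM during our lowest-traffic window. Unlike VACUUM FULL and REINDEX, it doesn't take an exclusive lock, though it only marks dead space for reuse rather than returning it to the OS:

    -- "orders" is a hypothetical hot table. Lazy VACUUM takes only a
    -- non-exclusive lock, so reads and writes continue while it runs;
    -- VERBOSE gives per-table dead-tuple stats to size the problem.
    VACUUM VERBOSE ANALYZE orders;

The catch, as noted above, is that once the table is big enough even this takes too long, and it never shrinks the file the way dump/reload does.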
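For point 2, one segmenting idea I've been toying with (again, names hypothetical) is splitting a big table by month using inheritance, so each segment can be vacuumed, reindexed, or retired on its own while SELECTs against the parent still see everything:

    -- The parent holds no rows itself; children inherit its columns.
    CREATE TABLE events (
        id       bigint,
        ts       timestamp,
        payload  text
    );
    CREATE TABLE events_2004_10 () INHERITS (events);
    CREATE TABLE events_2004_11 () INHERITS (events);

    -- The application inserts into the current month's child...
    INSERT INTO events_2004_11 VALUES (1, now(), 'example');

    -- ...but queries the parent, which scans parent plus all children.
    SELECT count(*) FROM events WHERE ts >= '2004-11-01';

    -- Maintenance can then target one segment at a time.
    VACUUM VERBOSE ANALYZE events_2004_10;

The obvious caveat is that the planner still scans every child for queries against the parent, so this helps the maintenance problem more than query speed unless the application targets segments directly. I'd be glad to hear what segmenting schemes others actually run.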
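For point 3, the painful part is that the two parameters I named only take effect on a full postmaster restart, not on a reload, which is exactly what breaks client connections. A postgresql.conf excerpt (values illustrative only):

    # postgresql.conf
    # Both of these take effect only on a full restart, not on a
    # SIGHUP/reload -- hence the connection-breaking problem.
    shared_buffers  = 16384      # in 8kB pages here, i.e. 128MB
    max_connections = 400

Hence the pgpool question: if clients connected through a pool instead of directly to the postmaster, could a backend restart be hidden from them?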