Re: pg_dump / copy bugs with "big lines" ?
From: Alvaro Herrera
Subject: Re: pg_dump / copy bugs with "big lines" ?
Date:
Msg-id: 20160229183023.GA286012@alvherre.pgsql
In reply to: Re: pg_dump / copy bugs with "big lines" ? (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: pg_dump / copy bugs with "big lines" ?
List: pgsql-hackers
A customer of ours was hit by this recently and I'd like to get it fixed for good.

Robert Haas wrote:
> On Mon, Apr 6, 2015 at 1:51 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> > In any case, I don't think it would be terribly difficult to allow a bit
> > more than 1GB in a StringInfo. Might need to tweak palloc too; ISTR there's
> > some 1GB limits there too.
>
> The point is, those limits are there on purpose. Changing things
> arbitrarily wouldn't be hard, but doing it in a principled way is
> likely to require some thought. For example, in the COPY OUT case,
> presumably what's happening is that we palloc a chunk for each
> individual datum, and then palloc a buffer for the whole row.

Right. There's a serious problem here already, and users are being bitten by it in existing releases, so I think we should provide a non-invasive fix for the back branches which we could apply as a bug fix.

Here's a proposed patch for the localized fix: it just adds a flag to StringInfo allowing the string to grow beyond the 1GB limit, but only for strings which are specifically marked. That way we avoid having to audit all the StringInfo-using code. In this patch, only the COPY path is allowed to grow beyond 1GB, which is enough to close the current problem -- or at least my test case for it. (A rough sketch of the idea appears below the message.) It would be great if others could try this patch to verify that it lets pg_dump work on their databases. (There's one known buglet in this patch: the UINT64_FORMAT embedded directly in the error message string breaks translatability.)

Like Robert, I don't like this approach for the long term, however. I think it would be saner for CopySendData not to keep a single gigantic string in memory; it would be better to get the bytes out to the client sooner than end-of-row. To avoid causing a performance hit, we would only flush early when the size of the output buffer is about to reach the allocation limit (MaxAllocSize); for regular-sized tuples we would still flush only at end-of-row, keeping the current behavior. I don't have a patch for this yet; I'm going to try that next. (A sketch of this idea also follows below.)

-- 
Álvaro Herrera                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
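For illustration, here is a minimal C sketch of the "marked strings" idea, modeled on stringinfo.c's enlargeStringInfo(). The struct is simplified and the allow_long field and function names are assumptions for this sketch, not necessarily what the attached patch uses; MaxAllocSize, MaxAllocHugeSize, and repalloc_huge() are real PostgreSQL symbols from utils/memutils.h.

    #include "postgres.h"
    #include "utils/memutils.h"    /* MaxAllocSize, MaxAllocHugeSize, repalloc_huge */

    /* Simplified stand-in for StringInfoData; "allow_long" is hypothetical. */
    typedef struct LongStringInfoData
    {
        char       *data;
        Size        len;        /* bytes used; may exceed 1GB when marked */
        Size        maxlen;     /* bytes allocated */
        bool        allow_long; /* set only for marked strings, e.g. by COPY */
    } LongStringInfoData;

    static void
    enlarge_long_string_info(LongStringInfoData *str, Size needed)
    {
        /* Marked strings may grow up to the "huge" limit instead of 1GB. */
        Size        limit = str->allow_long ? MaxAllocHugeSize : MaxAllocSize;
        Size        newlen;

        if (needed >= limit - str->len)
            elog(ERROR,    /* UINT64_FORMAT here is the translatability buglet */
                 "string of " UINT64_FORMAT " bytes cannot grow by " UINT64_FORMAT,
                 (uint64) str->len, (uint64) needed);

        needed += str->len + 1;     /* room for the trailing NUL */
        if (needed <= str->maxlen)
            return;                 /* enough space already */

        /* Double the buffer until sufficient, clamping at the limit. */
        newlen = (str->maxlen > 0) ? 2 * str->maxlen : 1024;
        while (needed > newlen)
            newlen = 2 * newlen;
        if (newlen > limit)
            newlen = limit;

        /* repalloc_huge() accepts requests larger than MaxAllocSize. */
        str->data = (char *) repalloc_huge(str->data, newlen);
        str->maxlen = newlen;
    }

Keeping the limit check per-string means every existing StringInfo keeps its 1GB ceiling untouched, which is what lets us skip auditing the rest of the callers.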
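And a sketch of the longer-term flush-early idea. CopyState's fe_msgbuf and appendBinaryStringInfo() are real copy.c/stringinfo.c interfaces; CopyFlushOutput() is a hypothetical helper standing in for "send fe_msgbuf's current contents to the destination and reset it", which the wire protocol permits mid-row since CopyData messages may split the stream at arbitrary boundaries.

    /* Hypothetical early-flush variant of copy.c's CopySendData(). */
    static void
    CopySendData(CopyState cstate, const void *databuf, int datasize)
    {
        StringInfo  buf = cstate->fe_msgbuf;

        /*
         * Flush early only when appending would push the row buffer up to
         * the allocation limit; regular-sized tuples never take this branch
         * and are still sent at end-of-row, exactly as today.
         */
        if ((Size) buf->len + (Size) datasize >= MaxAllocSize)
            CopyFlushOutput(cstate);    /* hypothetical: send and reset buf */

        appendBinaryStringInfo(buf, (const char *) databuf, datasize);
    }

The point of comparing against MaxAllocSize rather than some smaller threshold is that only pathological rows ever pay for the extra flush.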
Attachments