Обсуждение: [PATCH] Log CSV by default

Поиск
Список
Период
Сортировка

[PATCH] Log CSV by default

От
David Fetter
Дата:
This makes it much simpler for computers to use the logs while not
making it excessively difficult for humans to use them.

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 03594e77fe..21a0f4f86c 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1544,7 +1544,7 @@ static struct config_bool ConfigureNamesBool[] =
             NULL
         },
         &Logging_collector,
-        false,
+        true,
         NULL, NULL, NULL
     },
     {
@@ -3727,13 +3727,13 @@ static struct config_string ConfigureNamesString[] =
     {
         {"log_destination", PGC_SIGHUP, LOGGING_WHERE,
             gettext_noop("Sets the destination for server log output."),
-            gettext_noop("Valid values are combinations of \"stderr\", "
-                         "\"syslog\", \"csvlog\", and \"eventlog\", "
+            gettext_noop("Valid values are combinations of \"csvlog\", "
+                         "\"stderr\", \"syslog\", and \"eventlog\", "
                          "depending on the platform."),
             GUC_LIST_INPUT
         },
         &Log_destination_string,
-        "stderr",
+        "csvlog",
         check_log_destination, assign_log_destination, NULL
     },
     {
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 1fa02d2c93..c68ac6868c 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -411,13 +411,13 @@
 
 # - Where to Log -
 
-#log_destination = 'stderr'        # Valid values are combinations of
-                    # stderr, csvlog, syslog, and eventlog,
-                    # depending on platform.  csvlog
-                    # requires logging_collector to be on.
+#log_destination = 'csvlog'        # Valid values are combinations of
+                    # csvlog, stderr, syslog, and eventlog,
+                    # depending on platform.  Options other than csvlog
+                    # do not require that logging_collector be on.
 
 # This is used when logging to stderr:
-#logging_collector = off        # Enable capturing of stderr and csvlog
+#logging_collector = on        # Enable capturing of stderr and csvlog
                     # into log files. Required to be on for
                     # csvlogs.
                     # (change requires restart)

-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: [PATCH] Log CSV by default

От
Andres Freund
Дата:
Hi,

On 2018-11-30 19:53:18 +0100, David Fetter wrote:
> This makes it much simpler for computers to use the logs while not
> making it excessively difficult for humans to use them.

While perhaps not excessively so, I think it increases the difficulty
sufficiently that I'm against such a proposal.  Yes, a conversion into
something more readable can be done programatically, but that's another
step before analyzing a problem. And besides, a lot of log fragments are
embedded in email, screenshotted etc.

Secondarily, csvlog doesn't contain all the interesting information.

Thirdly, it often increases the log volume noticably.


I think having a bin/pg_logparse tool that can parse postgres' config
file and attempt to parse the log contents in whatever format they are
would be much much more useful. Obviously not every log_line_prefix can
be parsed unambiguously, but a lot of formats can, and a lot more
formats can be made unambiguous (e.g. adding escape logic to application
name logging would be very useful).

- Andres


Re: [PATCH] Log CSV by default

От
Isaac Morland
Дата:

I think having a bin/pg_logparse tool that can parse postgres' config
file and attempt to parse the log contents in whatever format they are
would be much much more useful. Obviously not every log_line_prefix can
be parsed unambiguously, but a lot of formats can, and a lot more
formats can be made unambiguous (e.g. adding escape logic to application
name logging would be very useful).

Aren't application names set by the client? If they are currently logged without escaping, couldn't a client insert arbitrary binary content such as terminal control sequences into log files, with potentially unpleasant results? Or is there some sort of filter already in place somewhere? Should I investigate?

Re: [PATCH] Log CSV by default

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2018-11-30 19:53:18 +0100, David Fetter wrote:
>> This makes it much simpler for computers to use the logs while not
>> making it excessively difficult for humans to use them.

> While perhaps not excessively so, I think it increases the difficulty
> sufficiently that I'm against such a proposal.

I don't like this either.  People who prefer CSV format can select it
trivially.  Those who don't are going to be annoyed at us for changing
behavior that's stood for many years.

Also, in addition to the objections you noted, there's the problem that
this change requires changing logging_collector to default to "on".
That's an *enormous* compatibility break, because of the effects on
where the log output goes.  Not to mention having performance impacts
that can be significant.

I think we should reject this out of hand.

> I think having a bin/pg_logparse tool that can parse postgres' config
> file and attempt to parse the log contents in whatever format they are
> would be much much more useful. Obviously not every log_line_prefix can
> be parsed unambiguously, but a lot of formats can, and a lot more
> formats can be made unambiguous (e.g. adding escape logic to application
> name logging would be very useful).

Yeah, it might be possible to make some progress in those directions.

            regards, tom lane


Re: [PATCH] Log CSV by default

От
Stephen Frost
Дата:
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-11-30 19:53:18 +0100, David Fetter wrote:
> >> This makes it much simpler for computers to use the logs while not
> >> making it excessively difficult for humans to use them.
>
> > While perhaps not excessively so, I think it increases the difficulty
> > sufficiently that I'm against such a proposal.
>
> I don't like this either.  People who prefer CSV format can select it
> trivially.  Those who don't are going to be annoyed at us for changing
> behavior that's stood for many years.

Agreed.  Frankly, even if we made this the default, I doubt packagers
would decide to go with it because of the issues raised here.

> I think we should reject this out of hand.

Agreed.

Thanks!

Stephen

Вложения

Re: [PATCH] Log CSV by default

От
Michael Paquier
Дата:
On Fri, Nov 30, 2018 at 06:11:03PM -0500, Stephen Frost wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> I think we should reject this out of hand.
>
> Agreed.

+1.
--
Michael

Вложения

Re: [PATCH] Log CSV by default

От
David Fetter
Дата:
On Fri, Nov 30, 2018 at 04:55:30PM -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-11-30 19:53:18 +0100, David Fetter wrote:
> >> This makes it much simpler for computers to use the logs while not
> >> making it excessively difficult for humans to use them.
> 
> > While perhaps not excessively so, I think it increases the difficulty
> > sufficiently that I'm against such a proposal.
> 
> I don't like this either.  People who prefer CSV format can select it
> trivially.  Those who don't are going to be annoyed at us for changing
> behavior that's stood for many years.
> 
> Also, in addition to the objections you noted, there's the problem that
> this change requires changing logging_collector to default to "on".
> That's an *enormous* compatibility break, because of the effects on
> where the log output goes.  Not to mention having performance impacts
> that can be significant.
> 
> I think we should reject this out of hand.

It's been far too long since I got one of these!

> > I think having a bin/pg_logparse tool that can parse postgres' config
> > file and attempt to parse the log contents in whatever format they are
> > would be much much more useful. Obviously not every log_line_prefix can
> > be parsed unambiguously, but a lot of formats can, and a lot more
> > formats can be made unambiguous (e.g. adding escape logic to application
> > name logging would be very useful).
> 
> Yeah, it might be possible to make some progress in those directions.

So application names need better handling, and possibly reviews for
security considerations, and pg_logparse ?

OK.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: [PATCH] Log CSV by default

От
Vik Fearing
Дата:
On 30/11/2018 21:15, Andres Freund wrote:
> Secondarily, csvlog doesn't contain all the interesting information.

It doesn't?  What is it missing?
-- 
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: [PATCH] Log CSV by default

От
Arkhena
Дата:
Hi all,
 
Also, in addition to the objections you noted, there's the problem that
this change requires changing logging_collector to default to "on".
That's an *enormous* compatibility break, because of the effects on
where the log output goes. 

As a consultant with so many customers that simply install Postgres "out of the box", I think that changing log_destination to 'csvlog' and logging_collector to 'on' needs changing at least log_truncate_on_rotation (but certainly also log_filename and log_rotation_age) if we don't want my customers to run out of space sooner or later.

So, that's a "-1" for me to change the log_destination default to 'csvlog'.

Personally, I use one of my colleague's python script that "change" the standard text output to csv (and even create the table and insert rows in it by connecting to my database), so I can use my SQL queries to perform log analyzes.
So I think the idea of a 'pg_logparse' is wonderfull.

Cheers,

Lætitia

Le sam. 1 déc. 2018 à 01:03, David Fetter <david@fetter.org> a écrit :
On Fri, Nov 30, 2018 at 04:55:30PM -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-11-30 19:53:18 +0100, David Fetter wrote:
> >> This makes it much simpler for computers to use the logs while not
> >> making it excessively difficult for humans to use them.
>
> > While perhaps not excessively so, I think it increases the difficulty
> > sufficiently that I'm against such a proposal.
>
> I don't like this either.  People who prefer CSV format can select it
> trivially.  Those who don't are going to be annoyed at us for changing
> behavior that's stood for many years.
>
> Also, in addition to the objections you noted, there's the problem that
> this change requires changing logging_collector to default to "on".
> That's an *enormous* compatibility break, because of the effects on
> where the log output goes.  Not to mention having performance impacts
> that can be significant.
>
> I think we should reject this out of hand.

It's been far too long since I got one of these!

> > I think having a bin/pg_logparse tool that can parse postgres' config
> > file and attempt to parse the log contents in whatever format they are
> > would be much much more useful. Obviously not every log_line_prefix can
> > be parsed unambiguously, but a lot of formats can, and a lot more
> > formats can be made unambiguous (e.g. adding escape logic to application
> > name logging would be very useful).
>
> Yeah, it might be possible to make some progress in those directions.

So application names need better handling, and possibly reviews for
security considerations, and pg_logparse ?

OK.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



--
Adoptez l'éco-attitude
N'imprimez ce mail que si c'est vraiment nécessaire

Re: [PATCH] Log CSV by default

От
Tom Lane
Дата:
Arkhena <Arkhena@gmail.com> writes:
> As a consultant with so many customers that simply install Postgres "out of
> the box", I think that changing log_destination to 'csvlog' and
> logging_collector to 'on' needs changing at least log_truncate_on_rotation
> (but certainly also log_filename and log_rotation_age) if we don't want my
> customers to run out of space sooner or later.

Yeah, that's another issue that we'd have to confront if we turn
logging_collector on by default: the default settings for log
rotation aren't exactly foolproof, and we'd need to make them so.

(Now having said that, the default setup *without* logging_collector
isn't exactly foolproof either, since wherever you send stderr to
will grow without bound.  But maybe people are used to dealing with
that.)

Independently of changing logging_collector's default, perhaps we could
agree on changes to the rotation-related defaults so that it's less
risky to just turn logging_collector on without other changes?

            regards, tom lane


Re: [PATCH] Log CSV by default

От
David Fetter
Дата:
On Sun, Dec 02, 2018 at 12:09:57PM -0500, Tom Lane wrote:
> Arkhena <Arkhena@gmail.com> writes:
> > As a consultant with so many customers that simply install Postgres "out of
> > the box", I think that changing log_destination to 'csvlog' and
> > logging_collector to 'on' needs changing at least log_truncate_on_rotation
> > (but certainly also log_filename and log_rotation_age) if we don't want my
> > customers to run out of space sooner or later.
> 
> Yeah, that's another issue that we'd have to confront if we turn
> logging_collector on by default: the default settings for log
> rotation aren't exactly foolproof, and we'd need to make them so.
> 
> (Now having said that, the default setup *without* logging_collector
> isn't exactly foolproof either, since wherever you send stderr to
> will grow without bound.  But maybe people are used to dealing with
> that.)
> 
> Independently of changing logging_collector's default, perhaps we could
> agree on changes to the rotation-related defaults so that it's less
> risky to just turn logging_collector on without other changes?

Would rotating daily and keeping a week's worth be reasonable as
defaults? If not, what do you have in mind?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: [PATCH] Log CSV by default

От
Gilles Darold
Дата:
Le 30/11/2018 à 19:53, David Fetter a écrit :
> This makes it much simpler for computers to use the logs while not
> making it excessively difficult for humans to use them.


I'm also not very comfortable with this change. For a pgBadger point of
view I don't know an existing library to parse csv log in multi-process
mode. The only way I know is to split the file or load chunks in memory
which is not really recommended with huge log files. I am not saying
that this is not possible to have a Perl CSV library allowing
multi-process but this require some additional days of work. Also at
this time I have not received any feature request for that, maybe the
use of csvlog format is not so common.


Regards,

-- 
Gilles Darold
http://www.darold.net/



Re: [PATCH] Log CSV by default

От
David Fetter
Дата:
On Mon, Dec 03, 2018 at 07:18:31PM +0100, Gilles Darold wrote:
> Le 30/11/2018 à 19:53, David Fetter a écrit :
> > This makes it much simpler for computers to use the logs while not
> > making it excessively difficult for humans to use them.
> 
> I'm also not very comfortable with this change. For a pgBadger point
> of view I don't know an existing library to parse csv log in
> multi-process mode. The only way I know is to split the file or load
> chunks in memory which is not really recommended with huge log
> files. I am not saying that this is not possible to have a Perl CSV
> library allowing multi-process but this require some additional days
> of work.

How about other structured log formats? Would one for JSON work
better?

> Also at this time I have not received any feature request for that,
> maybe the use of csvlog format is not so common.

Part of the reason it's uncommon is that it's not the default we ship
with, and defaults influence this very much.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: [PATCH] Log CSV by default

От
Gilles Darold
Дата:
Le 03/12/2018 à 19:25, David Fetter a écrit :
> On Mon, Dec 03, 2018 at 07:18:31PM +0100, Gilles Darold wrote:
>> Le 30/11/2018 à 19:53, David Fetter a écrit :
>>> This makes it much simpler for computers to use the logs while not
>>> making it excessively difficult for humans to use them.
>> I'm also not very comfortable with this change. For a pgBadger point
>> of view I don't know an existing library to parse csv log in
>> multi-process mode. The only way I know is to split the file or load
>> chunks in memory which is not really recommended with huge log
>> files. I am not saying that this is not possible to have a Perl CSV
>> library allowing multi-process but this require some additional days
>> of work.
> How about other structured log formats? Would one for JSON work
> better?


pgBadger is able to parse jsonlog
(https://github.com/michaelpq/pg_plugins/tree/master/jsonlog) output in
multi-process mode because it do not use an external library to parse
the json. In this minimalist implementation I assume that each log line
is a full json object. Honestly I do not have enough feedback on this
format to be sure that my basic jsonlog parser is enough, maybe it is
not. What is really annoying for a log parser is the preservation of
newline character in queries, if the format outputs each atomic log
messages on a single line each, this is ideal.


>
>> Also at this time I have not received any feature request for that,
>> maybe the use of csvlog format is not so common.
> Part of the reason it's uncommon is that it's not the default we ship
> with, and defaults influence this very much.

I agree. I also think that syslog or stderr are more human readable than
csvlog or jsonlog, which have probably contributed to their large
adoption too.



--
Gilles Darold
http://www.darold.net/




Re: [PATCH] Log CSV by default

От
Michael Paquier
Дата:
On Mon, Dec 03, 2018 at 09:20:01PM +0100, Gilles Darold wrote:
> pgBadger is able to parse jsonlog
> (https://github.com/michaelpq/pg_plugins/tree/master/jsonlog) output in
> multi-process mode because it do not use an external library to parse
> the json. In this minimalist implementation I assume that each log line
> is a full json object. Honestly I do not have enough feedback on this
> format to be sure that my basic jsonlog parser is enough, maybe it is
> not. What is really annoying for a log parser is the preservation of
> newline character in queries, if the format outputs each atomic log
> messages on a single line each, this is ideal.

jsonlog generates one JSON object for each elog call, and per line.
Newlines are treated as they would for a JSON object by being replaced
with \n within the query strings, the same way te backend handles JSON
strings as text.
--
Michael

Вложения

Re: [PATCH] Log CSV by default

От
Gilles Darold
Дата:
Le 04/12/2018 à 03:37, Michael Paquier a écrit :
> On Mon, Dec 03, 2018 at 09:20:01PM +0100, Gilles Darold wrote:
>> pgBadger is able to parse jsonlog
>> (https://github.com/michaelpq/pg_plugins/tree/master/jsonlog) output in
>> multi-process mode because it do not use an external library to parse
>> the json. In this minimalist implementation I assume that each log line
>> is a full json object. Honestly I do not have enough feedback on this
>> format to be sure that my basic jsonlog parser is enough, maybe it is
>> not. What is really annoying for a log parser is the preservation of
>> newline character in queries, if the format outputs each atomic log
>> messages on a single line each, this is ideal.
> jsonlog generates one JSON object for each elog call, and per line.
> Newlines are treated as they would for a JSON object by being replaced
> with \n within the query strings, the same way te backend handles JSON
> strings as text.

This was what I remember and this is why it is easy for pgbadger to
process these log files in multiprocess mode.


-- 
Gilles Darold