Tuesday, April 17, 2007

Log files in XML, YAML, or JSON?

Currently, log files are almost always in a form that is hard for machines to parse. They are either comma-separated or in an arbitrary proprietary format. Why is that? The primary assumption behind log files is that a human will read them. But usually no human reads them unless something goes wrong, and then only reactively.

Of course, no human wants to look at log files all day long. This is the kind of thing that machines would be great at... if only they could read them. What we can do to help log file processing is to put logs into formats that are easily transferable and readable by both humans and machines. Isn't that the primary goal of data formats such as XML, YAML, and JSON? A machine that can read log files can monitor them and analyze them to present information to users that wouldn't be apparent from just reading the log file straight through.
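As a rough sketch of what this could look like, the Python below writes each log entry as one JSON object per line, then reads the entries back and counts them by level. The field names (`time`, `level`, `msg`) and the helper functions are my own assumptions for illustration, not an established log schema.

```python
import io
import json
from collections import Counter


def write_entry(stream, time, level, msg, **extra):
    """Append one log entry to the stream as a single line of JSON."""
    entry = {"time": time, "level": level, "msg": msg}
    entry.update(extra)
    stream.write(json.dumps(entry) + "\n")


def read_entries(stream):
    """Yield each non-empty line of the stream parsed back into a dict."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_by_level(entries):
    """Count how many entries were logged at each level."""
    return Counter(entry["level"] for entry in entries)


# An in-memory stream stands in for a real log file here.
log = io.StringIO()
write_entry(log, "2007-04-17T10:00:00", "INFO", "server started")
write_entry(log, "2007-04-17T10:00:05", "ERROR", "db timeout", request_id=42)
write_entry(log, "2007-04-17T10:00:09", "INFO", "request served", request_id=43)

log.seek(0)
counts = count_by_level(read_entries(log))
```

Each line is still readable by a person, but the analysis side needs no custom parser: `json.loads` does all the work, and extra fields such as `request_id` come along for free.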

And yet, most of our log files are in proprietary formats, especially for web servers and web applications. This might not be as much of a problem for long-standing programs like Apache. They've been around long enough that their log formats have stabilized, and there are specialized programs to parse and analyze those log files.

In addition, I think (correct me if I'm wrong) the JSON format allows you to carry code as if it were data, since it grew out of JavaScript object syntax. Strictly speaking, JSON itself defines only data values, so the code would have to travel as a string or as the name of a transformation that the processing program knows how to run. Having that code perform specific transformations on the log data during processing could be useful. It would be like having transformed data transparently available to the parsing/analysis program. It would also cut down on the extra programming needed for the analysis program, since the log would know how to generate specific pieces of information not explicitly written in it.
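Here is a minimal sketch of that idea. It assumes each entry names its transformations rather than embedding executable code, because JSON has no function type. The `derive` field, the `TRANSFORMS` table, and the transformation names are all hypothetical.

```python
import json

# Transformations the analysis program knows about (hypothetical names).
TRANSFORMS = {
    "ms_to_seconds": lambda entry: entry["elapsed_ms"] / 1000.0,
    "is_slow": lambda entry: entry["elapsed_ms"] > 500,
}


def expand(line):
    """Parse one JSON log line and add the derived fields it asks for."""
    entry = json.loads(line)
    # "derive" maps a new field name to the name of a known transformation.
    for field, name in entry.pop("derive", {}).items():
        entry[field] = TRANSFORMS[name](entry)
    return entry


line = (
    '{"msg": "request served", "elapsed_ms": 750, '
    '"derive": {"elapsed_s": "ms_to_seconds", "slow": "is_slow"}}'
)
result = expand(line)
```

Looking names up in a fixed table, instead of evaluating code found in the log, keeps the analysis program from running arbitrary code that happens to end up in a log file.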
