
Organizing the Captain’s Log

Monday, April 18, 2011 @ 08:04 AM
Author: krusta80

Alright, I admit it!  I am a pretty big Star Trek fan.  While most of my friends growing up daydreamed about scoring a winning touchdown or hitting a game-ending shot at the buzzer, I would spend hours exploring the galaxy with Captain Kirk and Mr. Spock.

Once in a while…when the weather was particularly bad and I had run out of video games to keep myself occupied…I would “coax” my younger brother into watching Star Trek marathons with me (sorry again for that, bro).

And while the dates would change, almost every episode started with the same narrative spoken by William Shatner or Patrick Stewart: “Captain’s Log, stardate 43125.8.” And just like that, I was hooked again…transported back to the final frontier.

While I’m sure it is no stretch of the imagination to think that an IT professional would also be a Star Trek fan, there actually is a point to my mentioning the ultimate science fiction show today:  log files.   Just as it is for the captain of the Enterprise, keeping a recorded history of important events and problems is critical to maintaining a healthy and robust office network.

All too often, log files get lost in the shuffle.  After all, why would anyone need to investigate them if everything is working properly?  And why wouldn’t we expect everything to work properly?  We ARE pros after all, aren’t we?

And then, of course, there is the real world to consider:  Murphy’s Law in action.  Something goes wrong one day…often months after the initial setup is done!  By then we have fixed dozens of other issues and have written hundreds, if not thousands, of lines of code for other projects.  Our mindset has since moved on, and we are forced to relearn much of what we had created.  But of course, none of that matters in an emergency situation, and it often is just that:  an emergency situation.

This is when a properly-generated (AND properly organized) log file comes into play.  It can mean the difference between spending a few minutes pinpointing the issue and wasting hours through trial and error.  And as we all know, during a critical outage for a client, there is nothing worse than wasting time on something that “should have been working in the first place anyway.”

Simply put, the client normally won’t know about… and CERTAINLY won’t and shouldn’t care about an unexpected change caused by the latest Windows patch or a problem with the local DNS server.  It is up to you to dig up that log file and find out what is causing the failure.

So, without further ado, I give you Pandora’s top tips for creating and managing your log files:

1.  Whether developing a program from scratch or simply implementing a set of out-of-the-box programs and/or scripts, print everything to the screen first.  You will likely need to do this anyway for the initial setup, but it is also a good way of visualizing exactly which messages are critical to the program’s success and/or failure.

2.  Once the new process is configured properly, it is time to redirect all of those important print statements to files.  This is when I like to run through the basic questions:  what, when, who, where, and why/how.

a.  What:  Remember, this wasn’t your first program and it won’t be your last.  So identify the process by name, both in the file name and in the opening line(s) of the log file!

b.  When:  The more precise the better!  Every line in a log file (and the file name itself) should have a timestamp.  The human mind is amazing at pattern recognition, so give yourself a chance at spotting potential timeouts or other anomalies when investigating a problem.  Particularly with network issues, timing is everything.

c.  Who:  Every operating system…especially these days…makes security a priority.  It is possible to have dozens of levels of access, both per user and per group.  Therefore, it is very important to keep track of what permissions were used when running the process.  Was it run as “root” or “administrator”, or was it something written for “Bob”?  If the latter, were Bob’s credentials recently changed for any reason?

d.  Where:  The price of data storage is plummeting, and the average hard drive size is rapidly expanding.  It is therefore more important than ever to keep track of where your files are read from and written to.  It is equally important to keep track of each active sub-process (with absolute path info included) at all times.  During troubleshooting, this info comes into play mostly when system updates or some subtle change made by a seemingly independent process is to blame.

e.  Why/How:  These are basically the same question when it comes to computers, and unfortunately answering why or how something is not working is often left up to us.  That said, there are still ways to help ourselves here, and this is where multi-level debugging comes into play.  For example, CUPS, the open-source printing subsystem used by most Unix/Linux servers, allows for two levels of debugging:  1 and 2.  While level one is normally sufficient to handle the most basic of issues, sometimes you need to dig a little deeper!

3.  Determine how critical this new program is to operations, as well as whether it is intended to be completely automated or part of a manual process run by a user.  The more critical and automated the program, the more you need to consider setting up a notification system.

This is where things get a bit dicey for us IT people.  The more processes we deem “critical” and send ourselves notifications for, the more clogged our inboxes get with automated messages and log-file attachments.  No matter how hard we try, it is human nature to apply some sort of visual filter when overwhelmed.  Our eyes and minds will naturally gloss over messages over time, especially if there is little difference between a harmless warning email and one indicating a critical system failure!

So while it may seem counterintuitive at first, DO NOT send yourself an email for every little thing out of the ordinary when you create or configure a new system process.  Separate the critical errors from the warnings, and remember that there is no better notification system than your end users!  If the script you set up is part of a daily routine run by a person, let THEM be your notification system…trust me, they will let you know when something is wrong!
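One way to draw that line, again sketched with Python’s logging module: route everything to the log file, but only errors and above to the notification channel.  The process name is made up, and a second file stands in here for a real alert mechanism such as an email handler:

```python
import logging

log = logging.getLogger("invoice-import")  # hypothetical process name
log.setLevel(logging.DEBUG)

# Everything, harmless warnings included, still goes to the log file...
file_handler = logging.FileHandler("invoice-import.log")
file_handler.setLevel(logging.DEBUG)

# ...but only genuine errors reach the notification channel. A plain file
# stands in here for a real alert path such as logging.handlers.SMTPHandler.
alert_handler = logging.FileHandler("invoice-import.alerts")
alert_handler.setLevel(logging.ERROR)

formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
for handler in (file_handler, alert_handler):
    handler.setFormatter(formatter)
    log.addHandler(handler)

log.warning("input file arrived 10 minutes late")  # logged, but no alert
log.error("input file missing entirely")           # logged AND alerted
```

The design choice is simply two destinations with two thresholds: the file keeps the full history for later digging, while the alert channel stays quiet enough that you actually read it.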

Note that limiting notifications does NOT mean limiting log files altogether.  After all, even for the less severe issues, you will need to dig into the logs eventually anyway.  But by keeping your email notifications limited, you are in effect streamlining your efforts for your clients.

4.  Bad news first:  log files can often take up lots of disk space.  Since they are constantly written to, it is not uncommon for log files to quickly gobble up hundreds of megabytes of space.  The good news?  Well, since log files normally consist of repeated text messages, they are highly compressible…so what was 100 megs before compression can easily be converted to only a handful of megabytes as a zip file.

Be sure to archive your log files carefully!  While it is best to have the source program assign a separate log file to each hour, day, or other predetermined time period, you can always accomplish the same task independently if need be.  In fact, I will actually be exploring how to do just this in a future post dealing with “grep” and other powerful parsing tools.

As alluded to before, be sure to compress your archived log files accordingly.  Whatever the cutoff date is for when to archive a log file, make sure you have one.  Your server’s hard drive will thank you for it.
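As a rough sketch of that archiving step, here is one way to gzip any log file older than a chosen cutoff using only Python’s standard library.  The directory, the “.log” naming convention, and the seven-day cutoff are all assumptions; adjust them to fit your environment:

```python
import gzip
import os
import shutil
import time

# Assumptions: logs live in the current directory, end in ".log", and anything
# untouched for a week is ready to archive.
LOG_DIR = "."
CUTOFF_DAYS = 7

def archive_old_logs(log_dir=LOG_DIR, cutoff_days=CUTOFF_DAYS):
    """Gzip every .log file whose last modification predates the cutoff."""
    cutoff = time.time() - cutoff_days * 86400
    for name in os.listdir(log_dir):
        path = os.path.join(log_dir, name)
        if name.endswith(".log") and os.path.getmtime(path) < cutoff:
            with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
                shutil.copyfileobj(src, dst)  # repeated text compresses very well
            os.remove(path)  # the compressed copy replaces the original

archive_old_logs()
```

Run from a scheduler (cron on Unix/Linux, Task Scheduler on Windows), a script like this enforces the cutoff automatically, so the archiving happens whether or not anyone remembers it.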

5.  As with everything you do, it is critical to keep your client’s logs organized!  Whether dedicating a directory on the hard drive for storing all log files on the system or simply including each process’s log file location as part of your documentation, make sure you know where to begin when troubleshooting a process.

6.  Finally, we need to know how best to navigate a log file when the time comes.  Log files can often seem like mountain faces of information, and they are best scaled with powerful parsing tools (again, more about this in a later post).
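Until that post arrives, even a few lines of Python can stand in for grep.  This sketch (the log excerpt and its format are made up) pulls only the ERROR lines, so a repeated failure and its timing stand out immediately:

```python
import re

# A made-up excerpt in the timestamped format described above.
sample = """\
2011-04-18 08:04:01 INFO  nightly-backup starting
2011-04-18 08:04:02 ERROR cannot reach DNS server 10.0.0.53
2011-04-18 08:04:05 INFO  retrying in 30 seconds
2011-04-18 08:04:35 ERROR cannot reach DNS server 10.0.0.53
"""

def matching_lines(text, pattern):
    """Return every line matching the pattern: grep, in miniature."""
    return [line for line in text.splitlines() if re.search(pattern, line)]

# Two identical failures, 30 seconds apart: the timestamps expose a retry loop.
errors = matching_lines(sample, r"\bERROR\b")
```

Notice how the per-line timestamps from tip 2b pay off here: once the noise is filtered away, the spacing between the remaining lines tells its own story.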

There is no room for guesswork in our business; we are the ones others look to for answers.  With the right tools in place, log files are the blueprints of problem solving.  Treat them as such, and they will never let you down.


2 Responses to “Organizing the Captain’s Log”

  1. Steve P says:

    Illuminating article! As a support person for an investment bank, I work with log files often. I wish that the companies for which I have worked followed your strategies.

    I do have a question not entirely off topic. Do you have any tips for finding the relevant content in a log file quickly?

  2. krusta80 says:

    Thanks for the compliment, Steve! We are glad to know that our tips have gone to good use.

    As for your question regarding finding content from within a log file, we will be providing a follow-up article later this week. In it, we will be exploring some of the powerful tools available both in Unix/Linux environments and in Windows. For now, however, I would start by researching “grep” and “tail” if you are using Unix or Linux.

    –John Gruska
