Three Rules for Effective Naming - ECG

Written by Mark Lindsey | Jun 13, 2009 4:00:00 AM

1. ONE OF THE RULES that has served me well is this: things should have unique names so I can distinguish them. Filenames are easy examples; I've noticed life is a lot easier if I never re-use filenames for output I don't intend to edit.

Output I don't intend to edit includes things like the output of mysqldump (which dumps the contents of a database) for backup purposes; the output of wget (as it downloads a web page or site); or the output of "show tech-support" on a network box.

At the moment, I put a timestamp in the filename, such as "show_tech_support_200906131557".
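Here is a minimal Python sketch of that habit. The function name timestamped_name is my own invention for illustration, and the YYYYMMDDHHMM layout is simply read off the example above; it uses local time, as I do.

```python
from datetime import datetime


def timestamped_name(base, now=None):
    """Append a YYYYMMDDHHMM local-time stamp to a base filename.

    `now` is optional so the function can be called with a fixed time
    (handy for testing); by default it uses the current local time.
    """
    if now is None:
        now = datetime.now()
    return f"{base}_{now.strftime('%Y%m%d%H%M')}"


# Reproduces the example name for 13 June 2009, 15:57.
print(timestamped_name("show_tech_support", datetime(2009, 6, 13, 15, 57)))
```

Because the stamp runs from year down to minute, a plain alphabetical sort of these names is also a chronological sort.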

This raises some questions:

(a) Can't we depend on the filesystem for this? After all, every filesystem has time and date stamps. BUT: Files don't just live on filesystems: they're often sent through email, or attached to ticket systems on web sites.

(b) What timezone should be used? I just use the timezone I'm actually in, but I'm not sure that's "ideal". It *CAN* be confusing when a system (such as a server) is in one timezone and I'm working in another timezone. I probably need a better rule for this case.
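One possible answer to the timezone question, offered as an option rather than a rule I've settled on, is to stamp everything in UTC and say so in the name. In this sketch the trailing "Z" marker and the function name utc_stamp are assumptions of mine, not an established convention of this blog.

```python
from datetime import datetime, timedelta, timezone


def utc_stamp(moment=None):
    """Return a YYYYMMDDHHMM stamp in UTC with a trailing 'Z'.

    `moment` should be a timezone-aware datetime; it defaults to the
    current time. Whatever zone it carries, it is converted to UTC first.
    """
    if moment is None:
        moment = datetime.now(timezone.utc)
    return moment.astimezone(timezone.utc).strftime("%Y%m%d%H%M") + "Z"


# 15:57 at UTC-4 is 19:57 in UTC.
local = datetime(2009, 6, 13, 15, 57, tzinfo=timezone(timedelta(hours=-4)))
print(utc_stamp(local))
```

The price is that the stamp no longer matches the clock on my wall, which is its own kind of confusing.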

2. FILESYSTEM HIERARCHIES ARE GREAT, but unfortunately they're not useful once the file leaves the filesystem (such as when it's sent by email). So if your file is named "tech-support.txt", you only know the least bit about its content. If you add the timestamp as I suggest above, you get "tech-support_200906131607.txt". It's also smart to add your organization to the name if it's going to be crossing organizational boundaries, such as when you email the output to a vendor. It's also smart to include the name of the system that generated the output.

So you'd end up with "acmecorp_router_1_tech-support_200906131607.txt".
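Assembling such a name by hand gets tedious, so here is a rough Python sketch of one way to do it. The function portable_name and its parameter names are hypothetical; the field order (organization, system, description, timestamp) is just the order used in the example above.

```python
from datetime import datetime


def portable_name(org, system, description, when, ext="txt"):
    """Build a self-describing filename that still makes sense
    after the file has left its directory (email, ticket system)."""
    stamp = when.strftime("%Y%m%d%H%M")
    return f"{org}_{system}_{description}_{stamp}.{ext}"


# Reproduces the example name for 13 June 2009, 16:07.
print(portable_name("acmecorp", "router_1", "tech-support",
                    datetime(2009, 6, 13, 16, 7)))
```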

Occasionally, I'll get a complaint that the files named this way are too long. I attribute this to sloth on the part of the complainer, but perhaps other theories may explain the complaints.

3. ANOTHER RULE is keeping a unique tag on datasets introduced to inter-mingled sets. For example, I happened to harvest some MP4 audio files as they were flying by one month, and I added those to iTunes. Later on I decided the quality was too low on those files, so I wanted to delete them all.

Fortunately, I had included an underscore "_" in the song names when I added those files to the iTunes library, so it's trivial to select and delete them all.

Or take a less-trivial case: suppose you're managing SIP VoIP Phones, and you're adding entries to the database for 10,000 of them. The database may already have 5,000 entries. After you've added these entries to the database, you may later need to go back and work with these entries. (Maybe you didn't get it perfect the first time.) In this case, it's convenient to have some marker in the database entries for the SIP phones you added in this way. If the database has a way of marking the source of the entries explicitly, then that's good to use. If not, then you can sometimes add a tag to the name. At the very least, you can keep a list of the individual entries added, which you can use later to access the new data set.
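To make the explicit-marker case concrete, here is a small sketch using Python's built-in sqlite3 module. Everything about the schema is made up for illustration: the phones table, the batch column, the MAC addresses, and the tag value are all hypothetical, and a real provisioning database will look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phones (mac TEXT PRIMARY KEY, name TEXT, batch TEXT)"
)

# Entries that were already in the database, with no batch marker.
conn.executemany(
    "INSERT INTO phones VALUES (?, ?, NULL)",
    [("0004f2000001", "lobby"), ("0004f2000002", "frontdesk")],
)

# Tag every row of the bulk load with one unique marker.
BATCH = "import_200906131607"
new_phones = [
    ("0004f2000101", "sales-1"),
    ("0004f2000102", "sales-2"),
    ("0004f2000103", "sales-3"),
]
conn.executemany(
    "INSERT INTO phones VALUES (?, ?, ?)",
    [(mac, name, BATCH) for mac, name in new_phones],
)


def rows_in_batch(conn, batch):
    """Return exactly the rows that were added under the given tag."""
    query = "SELECT mac, name FROM phones WHERE batch = ? ORDER BY mac"
    return conn.execute(query, (batch,)).fetchall()


added = rows_in_batch(conn, BATCH)

# Didn't get it perfect the first time? Remove only that batch.
conn.execute("DELETE FROM phones WHERE batch = ?", (BATCH,))
remaining = conn.execute("SELECT COUNT(*) FROM phones").fetchone()[0]
print(len(added), remaining)
```

The pre-existing rows are untouched because they never carried the tag, which is the whole point of marking the new ones.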