B.A.T.T.L.E. Against The Three-Letter Extension

Copyright by Karim D. Ghantous 2003

File extensions might be small pieces of information but the quality of information they tell you about a file at a glance is most valuable. By knowing what kind of file you have (document, movie, image, etc.) you'll also know if you have the right software to view the file.

Whether we've liked it or not (and I do believe 100% of people do not like it) we've had to put up with MS-DOS's limiting file naming system of 8+3 characters. Whether we're Amiga or Unix or Mac users, and especially if we're Atari users we've had to put up with poorly described files on our computer that others have created. Atari users of course have had to put up with this limitation locally as the file system is almost identical to MS-DOS, at least superficially.

The most aggravating thing is that even after the Windows 95 take-up, people still hung on to naming their files with limited characters. And stupidly enough, even though the file name itself could now be longer than eight characters, the extension would still be limited to three. For example, once DOS and Windows users got the message that they could name their files with more freedom (even the Commodore 64, an 8-bit computer, allowed 11+4 file names) they happily named their files like 'fromrussiawithlove.doc' or some such stuff.

The problem was and still remains that the file extension, the last group of characters, appearing after the dot was still limited to three characters. Now, there is some logic in this because Windows 95's file system still recognised files by their DOS names, not their Windows names. A Windows 95 file like fromrussiawithlove.doc would be something like fromru~1.doc underneath the window-dressing. In other words, the Windows 95 filenames were not 'real' file names. But hey, good fruit never came from bad trees, and Microsoft is in the eyes of many a very bad tree.
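To make the 'fromru~1.doc' idea concrete, here is a rough Python sketch of how such a short alias could be derived from a long name. It is a deliberate simplification and not Microsoft's actual algorithm, which also deals with name collisions and a wider set of illegal characters. The function name short_alias and its index parameter are made up for this illustration, and the result is upper-cased because DOS-style short names are stored in capitals on disk.

```python
def short_alias(long_name: str, index: int = 1) -> str:
    """Return a simplified 8.3-style alias for a long file name.

    Hypothetical helper for illustration only: it ignores collisions
    and most of the special cases a real file system has to handle.
    """
    base, dot, ext = long_name.rpartition(".")
    if not dot:
        # No dot in the name, so the whole thing is the base.
        base, ext = long_name, ""

    # Short names cannot contain spaces or extra dots.
    base = base.replace(" ", "").replace(".", "")

    tail = f"~{index}"
    if len(base) > 8 or len(ext) > 3:
        # The name does not fit 8+3, so cut the base and add the numeric tail.
        base = base[: 8 - len(tail)] + tail

    alias = base.upper()
    if ext:
        alias += "." + ext[:3].upper()
    return alias


if __name__ == "__main__":
    for name in ("fromrussiawithlove.doc", "index.html", "readme.txt"):
        print(name, "->", short_alias(name))
```

Note what happens to 'index.html' in this sketch: the four-letter extension alone is enough to push the file into tilde territory, even though the base name is short.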

But now, where NT-based versions of Windows seem to be more prolific, Windows users have finally caught up to Mac and Unix users in being able to have genuine, long filenames with no restrictions lurking underneath the surface. So now, in Mac OS or in Windows you can name your file 'Indiana_Jones_and_the_Last_Crusade.html' and if you transferred it from Mac to Windows, the file name would retain its name and not be truncated, not even underneath the surface.

And yet, even on the Web, when all Web servers are either Unix, Mac OS X or Windows NT based, you still find files with three letter extensions. Stop for a minute and think: what's the point? We don't have to handicap ourselves for Windows users anymore (not that we ever should have but that time has passed). There is no need to play it 'safe'. In fact, why should it bother us if we, on our NT or Unix systems, create a file that a Windows 95/98 user may have to truncate on his system?

If there were a lot of file formats in the past - which needed their own file extension to help identify them - there are plenty more now. The limits placed upon useful file description by the three-letter extension are not acceptable. For years now I have made a conscious effort to use four or even more letters after the dot in a file name wherever necessary. I invite and encourage you to do the same.

But there are some things to keep in mind. Firstly, some file extensions are fine as they are. 'doc', 'ogg', 'mp3', 'rtf' etc are all fine. Some like 'htm' and 'jpg' are not so fine. With these latter names you can see how horrible this ridiculous convention of file naming really is. It's time to act.

But wait a minute, you might say. Okay, I'm patient... You might feel that changing the way you name files might cause some sort of problem - 'incompatibility' or something like that. Or that other people might not be able to read your files. Or maybe you may feel that things are like they are for a reason and it's not for ordinary people like you to change them. My answer to all that is simple: it's all bluff, people, pure bluff and nothing more.

Back in the days when we had many computers to choose from some of us bought Macs, some C64s, some IBMs. One reason why people said they bought IBMs was the fact that 'everybody else has one'. Or that it's the 'industry standard' (which industry was that again?). Or that 'I want to be compatible with everyone else'. All those reasons were caused by IBM and later Microsoft succeeding in their bluff. And millions believed them. When you think about it, there was no good reason to choose an IBM over an Amiga or Atari ST. Even the C64 trumped the IBM compatible in features and usability for years after the latter was introduced.

Now we're dealing with a more subtle kind of bluff. The bluff that says that file extensions must be three characters. Which is why HTML files are sadly cut to 'htm' by some Web authors. But we don't have to put up with this nonsense any longer. Especially now that no operating system sold today has any major restrictions with regard to file naming (though you should try to follow the conventions detailed below to make life easier). Here are the file extensions that I use:


Replace this...    with this
.htm               .html
.jpg               .jpeg
.txt               .text
.tif               .tiff
.mpg               .mpeg
.jp2               .jpeg2
.mov               .moov

There are others of course, but some shouldn't always be fixed. For example, if you save a movie in the DivX format it will often have a default extension of '.avi' which indicates the Windows multimedia container file. An AVI file can have movies encoded with different codecs, and DivX is one of them. Changing the extension to '.divx' in this case is a good and safe thing to do. But if the file is encoded with the more usual codecs like Indeo or Microsoft RLE, then stick with AVI.

It's similar with QuickTime but it pays to just keep the 'moov' extension. In fact 'moov' is the default extension for QuickTime files but was sometimes contracted to 'mov' for the handicapped MS-DOS file system. No need to do so now, that's for sure.

The least we can say about the less restrictive (but just as concise) longer file extensions is that they give everyone room to move. There are only so many useful contractions that you can make with three characters. Having four or even five allows more flexibility and distinction in identifying files and what's in them.

Oh, and if you have multitudes of files already on your system and you don't want to bother renaming their extensions manually (and who does?) you can use Rename if you're on Windows. Linux and OS X users should have similar utilities (I'll post links here when I've found them).
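If you have Python installed, a few lines will do the same job on any platform. What follows is only a sketch: the names LONGER, longer_name and rename_tree are invented for this example, and the mapping is simply the replacement table from earlier in this article. It defaults to a dry run that lists what would change without touching anything, and it skips any file whose new name is already taken.

```python
from pathlib import Path

# The replacement table from this article, keyed by the old extension.
LONGER = {
    ".htm": ".html",
    ".jpg": ".jpeg",
    ".txt": ".text",
    ".tif": ".tiff",
    ".mpg": ".mpeg",
    ".jp2": ".jpeg2",
    ".mov": ".moov",
}


def longer_name(path: Path) -> Path:
    """Return the path with its longer extension, or unchanged if not in the table."""
    new_suffix = LONGER.get(path.suffix.lower())
    return path.with_suffix(new_suffix) if new_suffix else path


def rename_tree(root: Path, dry_run: bool = True) -> list[tuple[Path, Path]]:
    """Walk root and rename matching files; return the (old, new) pairs found."""
    planned = []
    for old in sorted(root.rglob("*")):
        if not old.is_file():
            continue
        new = longer_name(old)
        if new == old or new.exists():
            # Nothing to do, or the target name is already in use.
            continue
        planned.append((old, new))
        if not dry_run:
            old.rename(new)
    return planned


if __name__ == "__main__":
    # Dry run over the current directory: print the plan, change nothing.
    for old, new in rename_tree(Path("."), dry_run=True):
        print(old, "->", new)
```

Run it once as shown to check the list, then call rename_tree with dry_run=False when you are happy with it. Remember that any Web pages linking to the old names will need their links updated too.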

File naming conventions

Now that we've been through some of the file extensions we can improve, let's quickly deal with naming files in general. The first, golden rule is to never use spaces in file names. This helps cross-platform compatibility, especially in networked environments. Mac and Windows users often use spaces in their filenames, but Mac users are slightly more liberal in this respect. I remember seeing a system administrator trying to download Mac filenames (i.e. with spaces) onto his NT machine and having an awful time of it. Already filename conscious by this stage, I lamented that nobody had established some kind of 'house rules' for this kind of thing. Below are the other characters in addition to the space that you should avoid in the file name:

< > ' " *
{ } ^ ! \
[ ] # | &
( ) $ ? ~

This table from Unix for Dummies p.29, by Levine & Young, IDG Books
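A check like this is easy to automate. The sketch below, in Python, tests a name against the space plus the twenty characters in the table above. The names AVOID, characters_to_avoid and is_clean are invented for this example and are not taken from any standard tool.

```python
# The space plus the twenty characters listed in the table above.
AVOID = set("<>'\"*{}^!\\[]#|&()$?~ ")


def characters_to_avoid(name: str) -> list[str]:
    """Return the distinct problem characters found in a file name, sorted."""
    return sorted({ch for ch in name if ch in AVOID})


def is_clean(name: str) -> bool:
    """True if the name contains none of the characters to avoid."""
    return not characters_to_avoid(name)


if __name__ == "__main__":
    for name in ("Indiana_Jones_and_the_Last_Crusade.html",
                 "my file (final)?.text"):
        print(repr(name), "clean" if is_clean(name) else characters_to_avoid(name))
```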

This is important for transferring files from one platform to another. Avoid using the space and the above characters and you shouldn't have any problem putting in an IBM or Macintosh formatted disk into a Unix computer and copying the files to a hard drive. In a pervasively networked world, it's a good idea to keep things 'clean'. By the way, these characters are assigned special meaning especially in the Unix operating system (which, if we include Windows NT and especially Mac OS X, is virtually every operating system in use).

An important note, though, on single and double quotation marks. They can be used in pairs within a filename but not individually. In addition, they can be used to surround an otherwise reserved character, like an asterisk or tilde (pronounced 'til-deh', like the 'e' in 'Porsche'). For example, a filename like bullsh*t.text is not a good idea; but bullsh"*"t.text is okay.
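You can see how quoting behaves with Python's standard shlex module, which parses text using Unix shell-style quoting rules (it does not expand wildcards; it only handles the quotes). The helper name as_shell_sees is made up for this sketch. One thing the output makes plain, at least under these rules: the paired quotes are consumed by the shell and protect the asterisk, so the name that actually reaches the disk is bullsh*t.text. An unpaired quote is rejected outright.

```python
import shlex


def as_shell_sees(word: str) -> str:
    """Return the single argument a Unix-style shell would pass on after removing quotes."""
    parts = shlex.split(word)
    if len(parts) != 1:
        raise ValueError(f"{word!r} does not parse as exactly one argument")
    return parts[0]


if __name__ == "__main__":
    # Paired quotes around the asterisk: accepted, and the quotes vanish.
    print(as_shell_sees('bullsh"*"t.text'))

    # Going the other way: let shlex add the quoting for a risky name.
    print(shlex.quote("bullsh*t.text"))

    # A single, unpaired quote cannot be parsed at all.
    try:
        as_shell_sees("it's.text")
    except ValueError as err:
        print("rejected:", err)
```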