
On bad coding


Bad coding is a term for any source code that isn't easy to read or can't be easily maintained by another programmer without reference to the original author. It also describes the habit of making changes that destroy the readability of the original code and thus make its maintenance more difficult.

Writing well-laid-out code with carefully chosen names and intelligent comments gives it the best chance of being easy to understand and maintain. I prefer blocks of well-formatted, carefully written comments placed ahead of the code they describe to any other comment style.

I've certainly seen some memorable examples of badly documented or too-clever code in the past.

If you want to improve your coding style, I thoroughly recommend reading "The Practice of Programming" by Brian Kernighan and Rob Pike, published by Addison-Wesley. The advice it contains is applicable to most modern programming languages and reading it can be of benefit to programmers of almost all skill levels.

Poor use of comments

One of my favourite examples of bad documentation was actually quite good code: it was just the commenting style that was naff. This was TSC's MC 6809 assembler source for utilities that supported their Flex-09 operating system. The house style was to add a comment alongside every instruction whether it needed it or not. The machine had INC and DEC instructions to add or subtract 1 from a register. So, did writing "BUMP THE B REGISTER" alongside every INC B instruction make the program easier to read? Really?

Lack of comments combined with deliberately obscure code

As it happens, my worst ever example was also assembler: this time in PLAN 3 assembler for an ICL 1900 mainframe. In this case the offensive code was a subroutine that nearly filled a lineprinter page, i.e. it was no more than 60 instructions long. It calculated the last day of the current month so that date could be printed on a set of financial statements. It worked from the current date, which was input as a string in "dd/mm/yy" format. The source of this subroutine had only two comments: the first said "Calculate the last day of the month" and the second said "Works until Feb 2100". The rest of the page contained a wodge of register-to-register operations and relative branches. There wasn't a single label or variable name in the whole mess. I never met anybody who could follow it past its first dozen steps, and that includes myself. Bear in mind that I used to be fairly good at programming in PLAN, which was one of the better macro-assemblers back then.

A better approach

The silly thing was that the date calculation was trivial on a 1900, which held dates internally as days since 31 Dec 1899 and provided a good set of date manipulation subroutines. In summary, the idea is this: say the current date is 21/06/13. First transform it into the first day of the next month by setting the day to 1 and adding one to the month, so the string becomes 01/07/13. Convert that to days since a base date, subtract one and convert it back, and you get 30/06/13, the desired last day of the month.

The implementation, given the 1900's date manipulation subroutines and writing the algorithm in a Java-like pseudocode rather than PLAN, looks like this:


public class DateManipulator
{
   public String lastDayOfMonth(String today)
   {
      // today is "dd/mm/yy"; start from the first day of the following month
      int day = 1;
      int month = parseInt(today.substring(3,5)) + 1;
      int year = parseInt(today.substring(6));
      if (month > 12)
      {
         year++;
         month = 1;
      }

      // convert to days since 31/12/1899, step back one day and convert back
      String s = format("%02d/%02d/%02d", day, month, year);
      int days = gdatebin(s);
      return gbindate(days - 1);
   }
}

... and wasn't much longer in assembler.

gdatebin() converts a date string into an integer holding the number of days since 31/12/1899 and gbindate() does the reverse. Both were written in the late 1960s, so both expected a date to occupy eight characters formatted as dd/mm/yy. I have no idea why the bloke thought his mess of assembler would fall over after Feb 2100: the 1900 held dates in a signed 24-bit number, so its Y2K moment never arrived in its lifetime; it isn't due until some time in 24866 AD.
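
If you want to see roughly what those two subroutines did, here is a sketch of the same conversions in modern Java using java.time. It is purely an illustration, not the original 1900 library code, and note that java.time's "yy" pattern assumes years in 2000-2099 rather than the 19xx the originals would have expected.

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// hypothetical modern stand-in for the 1900's date library routines
public class LegacyDates
{
   // 31/12/1899 was the 1900's base date for binary dates
   private static final LocalDate BASE = LocalDate.of(1899, 12, 31);
   private static final DateTimeFormatter DDMMYY = DateTimeFormatter.ofPattern("dd/MM/yy");

   // gdatebin(): "dd/mm/yy" string to days since the base date
   public static long gdatebin(String date)
   {
      return ChronoUnit.DAYS.between(BASE, LocalDate.parse(date, DDMMYY));
   }

   // gbindate(): days since the base date back to a "dd/mm/yy" string
   public static String gbindate(long days)
   {
      return BASE.plusDays(days).format(DDMMYY);
   }
}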

And, yes, I do know my algorithm worked correctly despite leap years and all: it ran for years as part of an accounting package and anybody could understand it even though it was written in COBOL.

I've shown the algorithm as working on dates without a century in the interest of historical accuracy: when I wrote the COBOL version back in the very early 70s, nobody ever used dates with a century[1]. But the code would still work correctly: convert 31/12/1999 to days since a base date (31/12/1899 is as good as any), add one to it and convert the result back to a character string, and you get 01/01/2000, which, with the century omitted, becomes 01/01/00, the correct result.
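
If you'd like to check that round trip for yourself, a few lines of modern Java (again an illustration, not the original COBOL) will do it:

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class CenturyRollover
{
   public static void main(String[] args)
   {
      // the day after 31/12/1999, printed without a century
      LocalDate next = LocalDate.of(1999, 12, 31).plusDays(1);
      System.out.println(next.format(DateTimeFormatter.ofPattern("dd/MM/yy"))); // prints 01/01/00
   }
}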

Admittedly, without using a four-digit year it would eventually have got the end of February wrong but, since 2000 was a leap year, the first of these mistakes would not have shown up until Feb 2100, because the pattern of leap years within a century, e.g. 1904 to 1996 and 2004 to 2096, is always the same. However, if a four-digit year is used, so the method knows whether the first year of the century is a leap year or not, then my algorithm is correct until the 1900's own Y2K moment.

Finally, if this were being written in Java for current use, the date manipulation would be done in a GregorianCalendar instance and conversion to and from a Date would use SimpleDateFormat, though the sequence of date adjustments is identical to that shown above. I've not walked through those classes in detail because some of their foibles would obscure the algorithm rather than clarify it, but a bare-bones sketch is shown below. I have, in fact, written a Java version of the algorithm and it works as predicted.
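
For the curious, here is the sort of bare-bones sketch I mean, assuming a four-digit year and glossing over foibles such as SimpleDateFormat's lenient parsing:

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;

public class ModernDateManipulator
{
   public String lastDayOfMonth(String today) throws ParseException
   {
      SimpleDateFormat fmt = new SimpleDateFormat("dd/MM/yyyy");
      fmt.setLenient(false);

      GregorianCalendar cal = new GregorianCalendar();
      cal.setTime(fmt.parse(today));

      // same sequence as before: first day of the next month, then back one day
      cal.set(Calendar.DAY_OF_MONTH, 1);
      cal.add(Calendar.MONTH, 1);
      cal.add(Calendar.DAY_OF_MONTH, -1);

      return fmt.format(cal.getTime());
   }
}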

A lightbulb moment

As I was writing this I suddenly understood the nameless spaghetti coder's "Works until Feb 2100" comment, though I won't begin to guess whether, like my method, his would have been permanently fixed by using a four digit year.


[1] Four-digit dates in the late '60s and early '70s.

There was a curious blind spot about including the century in dates back then, partly due to hardware limitations and partly aided and abetted by the COBOL language specification.

The CODASYL specification for COBOL required the receiving field in an "ACCEPT field FROM DATE." statement, which is how a program gets the current date from the operating system, to be a six digit non-computational field, i.e. exactly 6 characters long and that the received date would be in the form "yymmdd". There was no provision for accepting a century, at least as late as the mid '80s, and apparently in IBM COBOL/400 as late as 2006, though I see that by 2010 their mainframe COBOL used four digit years. The same applied if you required the current date in Julian format ("ACCEPT field FROM DAY" which returned "yyddd").

Early hardware was expensive and physically large, but very limited in capability. In the early days saving storage and clock cycles was all-important. In the 60s and early 70s a small mainframe might have as little as 16 KB of ferrite core RAM, a clock speed of 12 µs (30K instructions/sec) and would have stored its data on 1200 or 2400 foot magnetic tapes at 556 bits/inch, i.e. 8 MB or 16 MB per reel at the outside, and would have read and written data on tape at no more than 20 KB/sec. These numbers apply to the ICL 1901, which is what I learnt to write assembler on.

Consequently, you didn't bother with storing the century unless you really had to, e.g. you ran a payroll and had people on the staff who had been born in the 1890s.

Slovenly maintenance

The most important rule for a maintenance programmer is to make changes using the same style that the program was originally written in, even if you hate it. This means choosing new names that will be similar to those used when the program was written and laying out the code you write in the same style as the original: going against this rule instantly reduces the code's readability.

A particularly glaring example of how not to do it was provided by a COBOL programmer called Ken Eagle: I never met him and don't particularly want to call attention to him, but his choice of data names makes his identity rather obvious. His approach was to avoid any attempt at choosing names that describe the use of the data item by calling any new numeric variable KEN1, KEN2,... while calling new non-numeric variables EAGLE1, EAGLE2,... The numeric suffixes had no meaning, being assigned as he needed new variables. I think the only justification for this way of assigning data names, and a particularly lazy one, is that any responsible programmer was unlikely to have used either KEN or EAGLE as part of a data name in the past.