Looking for bugs in all the wrong places

When I took Earlham’s Networks and Networking class, we implemented Dijkstra’s algorithm.

Dijkstra’s algorithm is an algorithm for finding the shortest paths between nodes in a graph, which may represent, for example, road networks. It was conceived by computer scientist Edsger W. Dijkstra in 1956 and published three years later.

The algorithm exists in many variants; Dijkstra’s original variant found the shortest path between two nodes, but a more common variant fixes a single node as the “source” node and finds shortest paths from the source to all other nodes in the graph, producing a shortest-path tree.

Wikipedia

I got my implementation (in Python) close, but not quite right, by the time the deadline hit for submission.

And more deeply than I have for any coding project up till now, I always felt bad about falling short on this one. I trained much of my perfectionism out of me to become a CS major and decent programmer, but this particular hangup hit hard. In hours of work, I couldn’t find where my implementation was going wrong, or why it was going wrong so consistently.

I submitted my code for grading unhappily, then put it down to focus on other things. I felt like I’d reached my upper limit as a programmer (though I knew in my mind that this was probably not the case). The source code lay quietly in a directory for a couple of years.

Today I’m happy to report that – judged exclusively my own irrational metric, success in implementing Dijkstra’s algorithm – I underestimated myself.

This semester I’m helping teach the same networks class. Since we may assign Dijkstra’s algorithm at some point, I decided to review my old code and maybe try to make it work.

I spent about two hours today, Sunday, reading that rusty old code, tweaking it, running the new version, and parsing its output. I added debug statement after debug statement. I ran it on different input files.

Then I noticed a mistake in the output. Somehow, an edge of weight 1 was being read as an edge of weight 100000000 (the value I used to approximate infinite cost, i.e. the cost of moving directly between two nodes that do not share an edge). In effect, that edge would never be part of a shortest-path between any combination of source and destination. This was bad, because in fact that edge was part of many such shortest-paths in this network.

I went back to some of the most basic pieces of the code and found a possible problem. It was small, easy to fix but hard to detect. I edited a single line of code and ran the program.

It worked.

As it turns out, I’d gotten the implementation right. The core of the assignment, Dijkstra’s algorithm itself, had worked on the input it received.

Visually, here’s the network I had:

And here’s the network the program thought I had:

So what did I get wrong?

Believe it or not: counting.

You see, I had set a variable for the number of nodes N in the network graph. I also had a two-dimensional list describing the network, where each item in the list was an edge in the graph, itself represented by a list containing two nodes and the weight to go between them. Crucially, there are at most N^2 edges in such a graph.

My fatal flaw: rather than saying “for each possible edge in the network, read a line from the file”, I said, for each node in the network read a line in the file. In other words, for my graph with up to N^2 edges, I would only be loading the data about N of them. In this case, the program read only 4 lines, and the edge of weight 1 was described on the 5th line.

(This might have been obvious had I tested the code more thoroughly on one of the larger network files we had. Alternatively, the combination of edges being missed might have obscured the result a lot. A copy of the same input file, but with the lines reversed, would have been the most useful second test case.)

After switching the variable that the index would be checked against, everything worked as I expected.

The code still has problems. I intend to clean it up and streamline it. But the implementation now consistently returns correct output.

The concrete lessons of this experience for me are:

  • Don’t just write debug statements. Write clear and meaningful debug statements. Be specific.
  • Check your I/O, indices, and other such basic features of the code. You can have the greatest algorithm of all time (though I did not!), but if the program isn’t handling exactly what you expect it to, you won’t get the results you want.
  • Vary the input. Vary the input. Vary the input.
  • Don’t let one project, however important or complex or valuable, determine your feelings about your personal skillset.

Finally, while I emphasized the specific and silly programming error here, failure to count correctly wasn’t a root cause of my mistake. The root causes were factors removed from coding altogether: rushing to completion and getting too tangled in the weeds to think holistically about the problem. I don’t think it’s a coincidence that I solved this problem after spending a lot of time in my life disciplining those tendencies.

Leave a Comment