Misadventures in source control

Or, what did Present!Me ever do to Past!Me?

I observed a while ago on Twitter that learning git (for all its headaches) was valuable for me.

This was on my mind because I recently set about curating (and selecting for either skill review or public presentation) all the personal software projects I worked on as a student. It was a vivid reminder of how much I learned then and in the few years since.

Every day since then I have observed more version control errors of mine, and at some point I thought it worth gathering my observations into one post. Here is a non-comprehensive list of the mistakes I observed in my workflows from years past:

  • a bunch of directories called archive, sometimes nested two or three deep
  • an inconsistent naming scheme, so that directories called archive and old, in multiple capitalization flavors, sat side by side
  • combinations of the first two: I kid you not, cs350/old-string-compare/archive/archive/old is a path to some files in my (actual, high-level, left-as-it-was-on-final-exam-day) archive
  • multiple versions OF THE SAME REPO with differing levels of completion, features, etc. (sure, branching is tricky but… really?)
  • no apparent rhyme or reason in the sorting at all – a program to find the area under a curve by dividing it up into trapezoids and summing the trapezoid area was next to a program to return a list of all primes less than X, and next to both of those was a project entirely about running software through CUDA, which is a platform not a problem
  • timestamps long since lost because I copied files through various servers without preserving metadata when I was initially archiving
  • inconsistent use of READMEs that would inform me of, say, how to compile a program with mpicc rather than gcc or how to submit a job to qsub
  • files stored on different servers with no real reason for any of them to be in any particular place
  • binaries in some directories but not others
  • Makefiles in some directories but not others
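
For the curious, the first few items on that list are easy to detect mechanically. Here is a minimal sketch in Python, assuming the only offending directory names are archive and old in any capitalization. The `SUSPECT_NAMES` set and the `find_suspect_dirs` function are hypothetical names made up for illustration, not part of any tool I actually used at the time.

```python
import os
import tempfile

# Hypothetical list of "junk drawer" directory names; adjust to taste.
SUSPECT_NAMES = {"archive", "old"}


def find_suspect_dirs(root):
    """Return sorted (relative_path, depth) pairs for suspect directories.

    depth counts how many components of the path are suspect names,
    so 'archive/archive/old' has depth 3.
    """
    hits = []
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            # Compare case-insensitively to catch Archive, OLD, etc.
            if name.lower() in SUSPECT_NAMES:
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                parts = rel.split(os.sep)
                depth = sum(1 for p in parts if p.lower() in SUSPECT_NAMES)
                hits.append((rel, depth))
    return sorted(hits)


if __name__ == "__main__":
    # Build a throwaway tree resembling the path mentioned above.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(
            root, "cs350", "old-string-compare", "archive", "archive", "old"))
        os.makedirs(os.path.join(root, "cs350", "Archive"))
        for rel, depth in find_suspect_dirs(root):
            print(depth, rel)
```

Anything reported with a depth of 2 or more is a nested archive of the kind described above, and probably a sign that version control should have been doing that job instead.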

(You may have noticed that parallelism is a recurring theme here, and that’s because it was in a parallel and distributed computing course that I realized my workflows weren’t right. I didn’t learn how to fix that problem in time to go from a B to an A in the course, but after that class I did start improving my efficiency and consistency.)

To be fair to myself and to anyone who might find this eerily familiar: I never learned programming before college, so much of my time in college was spent catching up on the basics that a lot of people already knew when they got there. Earlham is a place that values experimentation, learn-by-doing, jumping into the pool rather than (or above and beyond) reading a book about swimming, etc. Which is good! I learned vastly more that way than I might have otherwise.

What’s more, I understand that git isn’t easy to pick up quickly and poses problems for accessibility to newcomers. Still, I can’t help but look at my own work and consider git vastly superior to trying to make this up as you go. It’s well worth the time to learn.

Git and related software carpentry were not something I learned until quite a while into my education. And that’s a bit of a shame, to me: if you’re trying to figure out (as I clearly was) how to manage a workflow, do appropriate file naming, etc. concurrently with learning to code, you end up in a thicket of barely-sorted, unhelpfully-named, badly-organized code.

And then neither becomes especially fun, frankly.

I’ve enjoyed the coding I’ve done since about my junior year in college much more than before that, because I finally learned to get out of my own way.