Misadventures in source control

Or, what Present!Me ever do to Past!Me?

I observed a while ago on Twitter that learning git (for all its headaches) was valuable for me:

This was on my mind because I recently set about curating (and selecting for either skill review or public presentation) all the personal software projects I worked on as a student. It was a vivid reminder of how much I learned then and in the few years since.

Every day since then I have observed more version control errors of mine, and at some point I thought it worth gathering my observations into one post. Here is a non-comprehensive list of the mistakes I observed in my workflows from years past:

  • a bunch of directories called archive, sometimes nested two or three deep
  • inconsistent naming scheme so that archive and old in multiple capitalization flavors were together
  • combinations of the first two: I kid you not, cs350/old-string-compare/archive/archive/old is a path to some files in my (actual, high-level, left-as-it-was-on-final-exam-day) archive
  • multiple versions OF THE SAME REPO with differing levels of completion, features, etc. (sure, branching is tricky but… really?)
  • no apparent rhyme or reason in the sorting at all – a program to find the area under a curve by dividing it up into trapezoids and summing the trapezoid area was next to a program to return a list of all primes less than X, and next to both of those was a project entirely about running software through CUDA, which is a platform not a problem
  • timestamps long since lost because I copied files through various servers without preserving metadata when I was initially archiving
  • inconsistent use of README‘s that would inform of me of, say, how to compile a program with mpicc rather than gcc or how to submit a job to qsub
  • files stored on different servers with no real reason for any of them to be in any particular place
  • binaries in some directories but not others
  • Makefiles in some directories but not others

(You may have noticed that parallelism is a recurring theme here, and that’s because it was a parallel and distributed computing course where I realized that my workflows weren’t right. I didn’t learn how to fix that problem in time to go from a B to an A in the course, but after that class I did start improving my efficiency and consistency.)

To be fair to myself and to anyone who might find this eerily familiar: I never learned programming before college, so much of my college years were spent catching up on the basics that a lot of people already knew when they got there. Earlham is a place that values experiment, learn-by-doing, jumping into the pool rather than (or above and beyond) reading a book about swimming, etc. Which is good! I learned vastly more that way than I might have otherwise.

What’s more, I understand that git isn’t easy to pick up quickly and poses problems for accessibility to newcomers. Still I can’t help but look at my own work and consider it vastly superior to trying to make this up as you go. It’s well worth the time to learn.

Git and related software carpentry were not something I learned until quite a while into my education. And that’s a bit of a shame, to me: if you’re trying to figure out (as I clearly was) how to manage a workflow, do appropriate file naming, etc. concurrently with learning to code, you end up in a thicket of barely-sorted, unhelpfully-named, badly-organized code.

And then neither becomes especially fun, frankly.

I’ve enjoyed the coding I’ve done since about my junior year in college much more than before that, because I finally learned to get out of my own way.

What counts as making?

As I pursue becoming a maker of things, the two things I want to make are software and writing. If I can spend time making those two things, I’ll be happy.

Obviously you can make more than those things, and for anyone reading this I want to be generous with my definition of “making”. Here’s a non-exhaustive list of what I think we can include:

  • writing, which I mean in the almost limitless fashion that includes “the Great American Novel”, screenplays, and 10,000-word blog posts but also fan fiction, appliance manuals, etc.
  • coding
  • running a home media server
  • starting a business
  • film or video
  • music, including performances of pieces you didn’t yourself write
  • a podcast
  • handcrafts
  • woodworking
  • metalworking
  • social media posts are borderline, but if you’re doing it in a systematic way and not just for marketing, chit-chat, or blowing off steam, it counts
  • photography (it’s not all phone snaps)
  • illustration
  • learning to cook new meals or improve your skills in what you already know how to cook
  • science experiments
  • home decoration and design

I would also consider the following as making, or at least making-adjacent. Their inclusion may be controversial as they arguably don’t produce some external artifact for others’ enjoyment, but for many people they offer the same types of satisfaction that making does: purpose, fulfillment, legacy, impact, sense of progress, agency, a valuable way to spend finite time on this earth.

  • athletics, or at least fitness with emphasis on improvement, technique, and maybe participation in some group activity
  • teaching

This should go without saying in the age of open-source software and ubiquitous blogging/microblogging, but whether or not you make money from something does not factor into whether it’s “making.”

A plan for making

In my last post I discussed why I’ve concluded that I need to make more. This post looks forward, and it will explain how I plan to start and continue creating.

In that pile of articles I’ve saved up, I can find a lot of advice about habits, systems, workflows, etc. But I’m mostly drawing from the work of Cal Newport, whose Deep Work and So Good They Can’t Ignore You both changed my outlook on work within the last year or two.

I’d note that all, some, or none of this may be broadly applicable:

  • If you’re already hold a job where making is what you do from 9-5 (the sort of job I’d like to have!), this probably won’t be especially useful.
  • Since this is a post about getting started in a creative habit, it’s also focused on projects conducted by an individual, not a team. The patterns in those two kinds of work are different.
  • I’m posting this mostly as a public expression of a private commitment to myself.
  • This is by no means binding. I’ll be revisiting this list in the future and adjusting my methods based on how well it actually works for me.

The short version, as I currently imagine it, is this:

  • Clear distractions.
  • Make the time and space.
  • Do the work.
  • Share it, with 0 or more people.

That’s it!

Details follow on how to go from that design to an implementation that can last.

0. Clear distractions.

This is item 0 because it’s not actually part of the creative process. That’s very important to say: clearing distractions is about getting out of your own way, not a method of creating. By itself it does not add value (economic, social, personal, spiritual, etc.) to the world. It’s about making space for what you actually want.

Action steps:

  • Freedom: Run the Freedom app. I’ve experimented with it for a year now, and the way that works best for me is to run a fairly strict blocker on a repeating schedule – but not to disable quit during sessions. This lets me look things up if, say, I’ve blocked Reddit but there might be good information on /r/learnprogramming about something I’m learning. I recommend blocking social media, news, and all sites you mindlessly click to when you want to waste time while you’re working.
  • Pick simple tools or tools you know well. Choosing the “perfect” tools can easily chew up a lot of your time. (Of course, if there is a tool that is THE tool for what you’re doing, learn that.)
  • Have your environment ready. The time to pick your instrumental Spotify playlist for audio while you’re working is before the work session, not during it. Ditto lighting, seating, temperature, colors in your terminal, choice of word processor, etc.

1. Before making anything else, make time and space.

Put creating on your calendar. If necessary, consider it an appointment. A creative work session is a time during which you are unavailable except in emergencies. No email, no phone, etc.

Actions to take:

  • Add it to the calendar. I’m going to start with 30 to 60 minutes each day, longer when I need more depth. I’d like to go for 90 on, say, weekends. If you can do this in such a way that it accomplishes tasks for your job, that’s excellent and you could probably do it during work hours. If not, make sure to reserve the time outside work, at least a few days a week. Either way, add it to your calendar and stick to that appointment.
  • Be prepared to fill that time. Don’t arrive unprepared to make. “Write”, for example, might be on the calendar, but if you don’t know what to write, the path of least resistance is to hem and haw and not do anything. Prevent that by knowing what you’re going to work on in advance.

2. Do the work.

This is the only stage that counts. It’s the only non-subjective item on this list.

None of the rest of this matters, for purposes of this project, if it doesn’t ultimately help make this happen.

At the chosen time, in the chosen environment, and for the duration of the chosen time, create. Words and drawings onto paper and screen. Code into editor. Sound waves into air.

Action steps:

  • Make!
  • If you have 60 to 90 minutes, use the time to do deep work: focusing on a single cognitively-demanding task for an extended period of time to improve your own skillset. This may require a little more advance planning in the previous step.

3. Share your work, with 0 or more people.

For me, this is the hardest stage. Part of my motivation for publishing this list is to let me isolate it, in my mind, from the other stages.

All those questions I listed in my previous post about managing feedback, responses, comments, traffic, etc.? They all crop up now, when you “put it out there” for others to see.

This is also the most difficult stage to turn into action steps, because it varies so much by your medium. But here’s my attempt:

  • Create a site or blog for each part of your work. I’ve got my personal digital home right here, a developer blog hosted at GitHub, both private and public code repos on different platforms, and a blog for my writing.
  • Automate sharing tools, if you’d like. I’d rather pick and choose as I go, but the option to share across social networking accounts exists on most web tools you’d want to use.

Some caveats

There are a ton of lifestyle factors that matter to making as well: manage time well, stay healthy, exercise, get outside away from screens, etc. But for the specific work of making, this is the path that makes most sense to me: clear, schedule, work, share.

I also believe readers ought to willfully disregard any of this that won’t work for them because of personality, lifestyle, schedule, etc. But if you’re wondering how to start putting in the work, this is one possible framework that might at least provide some useful tactics.

On making

For a long time I’ve been operating in one mode of learning and personal growth, and I’m in the process of pivoting to another.

Since sometime in college, I have followed what we might call a consume-curate-archive model of learning. I read online frequently, so I have hundreds of articles saved to Instapaper about a range of topics. I read books. I guzzle podcasts, and I use Tweetdeck to follow a lot of carefully-refined Twitter lists about politics, tech, culture, and more.

This is a process of finding inputs, highlighting or annotating them, and stashing them away. It’s a kind of orderly digital hoarding, and I’ve been doing it for a long time.

At the same time, my creative output slowed to a crawl. I’m not sure if there was a direct 1-to-1 tradeoff of those two (any writer will tell you that reading is essential to better writing), but certainly I was gathering a lot of inputs and not using them to produce rare and valuable outputs.

There are a lot of reasons for that. Basic lifestyle is one factor, especially after college. Unsure what I wanted to do, I felt directionless and frustrated. After being across the country for college, I moved back home and worked for myself for about a year. (A contemporary indicator of the simmering desire to make was the joy I felt designing and creating my solopreneur business.) After a few months on my current job, at Earlham, I feel better consistently. Maybe I should have created in spite of my mood, but I didn’t, and now I feel like I can again.

I also experience typical human anxieties about reactions. Some of my work can be private, but some should be public, and how do I sort which is which? We know deliberate practice, “learn by doing” guided by consistent feedback, is the only path to mastering something, but how do I implement that? Say I decide writing is what I want to do. Do I have to spend all my time keeping track of website comments and worrying about traffic? Can I make money from it? Should I? I feel like I have a lot to say, but am I then violating my own privacy just to get attention or make a buck? If I succeed, do I have to be a “public figure” (I don’t want to be)? There’s a halting problem to be modeled here, and my conjecture is that this kind of thinking never terminates on its own.

All of that was enough to smother my creativity for a long time.

That’s no way to live, to work, or to grow professionally. That’s why, in the last few months, I’ve been gradually switching into a creation-first model of learning – a model in which producing some artifact (at a high level it doesn’t much matter what kind) is the guiding principle, rather than accumulating a lot of neatly-arranged inputs.

This is the first in a series of posts I’ll be writing about making. In my next post, I’ll share how I’m planning to orient myself, on a regular basis, into a pattern of creativity.

Generate an idea list

It’s been a while since I had a side project going, so I’ve been itching do to something creative. But I’ve been in the developer’s equivalent of writer’s block for a while, hung up on a lack of ideas or at least a lack of good ideas.

I think I’ve now replenished the idea supply, thanks to this exercise.

I spent a few hours tonight generating 101 ideas for software projects. Half or more of them will never become anything. A few are absolute garbage. A bunch of them might be good afternoon projects, possibly worth sharing, but aren’t especially inspired.

A few seem actually good, and they’re what the exercise is all about.

If you’re in a creative rut, especially if (like me) you’re fairly young or early in your career, I highly recommend this.

Here’s how I did it:

  • Pick a number, let’s say N. I picked N=101 because it’s (in this context) a big number that would be a stretch. I also like that it’s JUST over 100, so your 100th isn’t a complete fluke.
  • Open a notes app, spreadsheet, word processor, text file with line counters, dead-tree notepad, etc.
  • Start writing your list of N items.
  • Fill the list in one sitting exactly, if at all possible. (You can always revisit this!)
  • Put it down afterward. I just finished mine and don’t intend to look it over until at least tomorrow. I want to see it with fresh eyes to choose the best prospects.

I suggest these criteria for whether to add an idea to the list:

  • Be specific…: An item should be a short but complete description of a project. One sentence or phrase is enough if it tells you everything you need to know.
  • … but vague enough: Stay away from describing programming languages, environments, web servers, etc. – those questions can come if you decide a project is worth actually implementing.
  • Make the project scope appropriate for team size: I was thinking up individual side projects, so I wanted ideas I could implement on my own using mostly the Internet for external information. A group/team/company project would be different.
  • Otherwise reserve judgment: If you must, add a parenthetical statement afterward on the problem (e.g., a few of my ideas raised concerns about copyright, but I added them anyway because as a technical matter they would be doable). You can also bold an idea you feel especially good about in the moment. If possible don’t get any more judgy than that.
  • It can be based on something you’ve seen, but modify it and be mindful of law and ethics: Inspiration is inevitable, especially at the idea stage. Be sure to make it your own, though, or it’s not much of an “idea”. Worry about copyright, terms of use, etc., if you ever consider publishing or distributing what you make. For now, add the idea if you think it would work for you given your context and purpose.

Fundamentally, this is about quantity not quality.

This is my first time doing such an exercise in the context of software, so by all means change the rules to whatever works for you. These worked for me, though, so I suspect they may work for others.

A final cautionary note: I worked alone, and I suspect I would have created a totally different list working in a group (i.e. brainstorming). Your mileage may vary there.

Looking for bugs in all the wrong places

When I took Earlham’s Networks and Networking class, we implemented Dijkstra’s algorithm.

Dijkstra’s algorithm is an algorithm for finding the shortest paths between nodes in a graph, which may represent, for example, road networks. It was conceived by computer scientist Edsger W. Dijkstra in 1956 and published three years later.

The algorithm exists in many variants; Dijkstra’s original variant found the shortest path between two nodes, but a more common variant fixes a single node as the “source” node and finds shortest paths from the source to all other nodes in the graph, producing a shortest-path tree.

Wikipedia

I got my implementation (in Python) close, but not quite right, by the time the deadline hit for submission.

And more deeply than I have for any coding project up till now, I always felt bad about falling short on this one. I trained much of my perfectionism out of me to become a CS major and decent programmer, but this particular hangup hit hard. In hours of work, I couldn’t find where my implementation was going wrong, or why it was going wrong so consistently.

I submitted my code for grading unhappily, then put it down to focus on other things. I felt like I’d reached my upper limit as a programmer (though I knew in my mind that this was probably not the case). The source code lay quietly in a directory for a couple of years.

Today I’m happy to report that – judged exclusively my own irrational metric, success in implementing Dijkstra’s algorithm – I underestimated myself.

This semester I’m helping teach the same networks class. Since we may assign Dijkstra’s algorithm at some point, I decided to review my old code and maybe try to make it work.

I spent about two hours today, Sunday, reading that rusty old code, tweaking it, running the new version, and parsing its output. I added debug statement after debug statement. I ran it on different input files.

Then I noticed a mistake in the output. Somehow, an edge of weight 1 was being read as an edge of weight 100000000 (the value I used to approximate infinite cost, i.e. the cost of moving directly between two nodes that do not share an edge). In effect, that edge would never be part of a shortest-path between any combination of source and destination. This was bad, because in fact that edge was part of many such shortest-paths in this network.

I went back to some of the most basic pieces of the code and found a possible problem. It was small, easy to fix but hard to detect. I edited a single line of code and ran the program.

It worked.

As it turns out, I’d gotten the implementation right. The core of the assignment, Dijkstra’s algorithm itself, had worked on the input it received.

Visually, here’s the network I had:

And here’s the network the program thought I had:

So what did I get wrong?

Believe it or not: counting.

You see, I had set a variable for the number of nodes N in the network graph. I also had a two-dimensional list describing the network, where each item in the list was an edge in the graph, itself represented by a list containing two nodes and the weight to go between them. Crucially, there are at most N^2 edges in such a graph.

My fatal flaw: rather than saying “for each possible edge in the network, read a line from the file”, I said, for each node in the network read a line in the file. In other words, for my graph with up to N^2 edges, I would only be loading the data about N of them. In this case, the program read only 4 lines, and the edge of weight 1 was described on the 5th line.

(This might have been obvious had I tested the code more thoroughly on one of the larger network files we had. Alternatively, the combination of edges being missed might have obscured the result a lot. A copy of the same input file, but with the lines reversed, would have been the most useful second test case.)

After switching the variable that the index would be checked against, everything worked as I expected.

The code still has problems. I intend to clean it up and streamline it. But the implementation now consistently returns correct output.

The concrete lessons of this experience for me are:

  • Don’t just write debug statements. Write clear and meaningful debug statements. Be specific.
  • Check your I/O, indices, and other such basic features of the code. You can have the greatest algorithm of all time (though I did not!), but if the program isn’t handling exactly what you expect it to, you won’t get the results you want.
  • Vary the input. Vary the input. Vary the input.
  • Don’t let one project, however important or complex or valuable, determine your feelings about your personal skillset.

Finally, while I emphasized the specific and silly programming error here, failure to count correctly wasn’t a root cause of my mistake. The root causes were factors removed from coding altogether: rushing to completion and getting too tangled in the weeds to think holistically about the problem. I don’t think it’s a coincidence that I solved this problem after spending a lot of time in my life disciplining those tendencies.

The perks of being a VM

Several of the CS department’s servers are virtual machines. While running VM’s adds complexity, it also lets you do things like octuple* system RAM in five minutes from your laptop.

For context, Earlham CS runs a Jupyterhub server for the first- and second-semester CS students. We want to provide a programming environment (in this case Python, a terminal, and a few other languages) so students can focus on programming instead of administration, environment, etc. Jupyter is handy for that purpose.

The issue: Each notebook takes a relatively large amount of RAM. There are 60 or so intro CS students here. The Xen virtual machine hosting Jupyter was simply not equipped for that load. So at the request of my colleagues teaching the course, I visited a lab today. After observing the problem, we took five minutes to shut the server down, destroy the volume, change a single number in a single config file, and bring it all back to life with a boosted configuration. We’ve had no additional problems – so far. 🙂

Running a VM is frequently more complex than running on bare hardware. But the alternative is this:

I wish I had some of the “upcoming maintenance” email notifications we sent out in my ECCS sysadmin days for comparison. They were basically “no email, no websites for several days while we rebuild this server from parts, mmmkay?”

@chrishardie

Because we do so much of our administration in software, we’ve mostly avoided that problem in recent years. The closest we’ve gotten to scrambling hardware lately was recovering from disk failures after a power outage over the summer. We had to send a lot of “sorry, X is down” emails. I wouldn’t want that to be our approach to managing all servers all the time.

(Of course there are many other alternatives, but running Xen VM’s serves our purposes nicely. It’s also, for many reasons, good practice for our student system administrators.)

*I tweeted about this originally and said we quadrupled the RAM. In fact, a previously-arranged RAM doubling had been specified in the VM’s config file but not implemented. Before we restarted the machine, we decided to boost it even more. Ultimately we quadrupled the double of the previous RAM amount.

Categories of helpfulness

A math professor once mentioned to a group of students and faculty, among whom was me, that you can be any of the following:

  • Helpful!
  • Unhelpful.
  • Anti-helpful.

You may not fall into the first category all the time.

But try above all else to stay out of the third.

Broken links

I was looking up some old articles and blog posts (circa-2009-2012, so not that old). I was saddened to be reminded how littered the modern Internet is with broken links.

The excellent archive.org, at its Wayback Machine, saves as much of the public Internet as it can, but it doesn’t quite capture everything.

If the link is broken, and the Wayback Machine doesn’t have it? Try your luck with a search engine and hope someone copy-pasted it, or the author cross-posted it to a website that still links it.

This highlights some significant modern problems:

  • Slipping through the cracks this way seems like about the only way to actually disappear from the Internet.
  • The web today, which has gotten pretty bad (but not irredeemably so), dominates search engine results in part because of recency bias and in part because those pages do, in fact, still exist.
  • The central challenge of the age of ubiquitous “information” is that it’s hard to find real, good information.

The web is a god of chaos. Broken links are the debris of its manic, arbitrary antics.

Review: In the Beginning

I read Neal Stephenon’s 1999 essay bundle In the Beginning… Was the Command Line this weekend. It’s flawed but good. It’s about operating systems and their relationship with users, developers, business, and society – ambitious for such a short book.

First its flaw: anytime it veers into talk of “culture” it gets weaker – the chapter “The Interface Culture”, for example, is a trainwreck. Its analysis of society and history is – and, to be fair, half-admits to being – condescending and painted with too broad a brush. (I haven’t read enough of Stephenson’s current work to know how his perspective has evolved since.)

The second half of the book, mostly about the world of Linux OS’s, is much better than the first. In fact, despite its age, there are chapters I would consider required reading for new tech geeks. I know when I was learning Linux a few years ago it would have solved a lot of mysteries at once.

The book does suffer from its own success: much of its message can be picked up from snippets of blog posts, tweets, and podcasts in the geek Internet today. With several years of basic coding and systems administration experience, I didn’t learn much that was new to me from it. Still, it’s handy to find it all in one place – and being such a major influencer as to become a part of tech’s very bloodstream is hardly a criticism.

And, definitely in spite of myself, I was really delighted by the last chapter.

ITBWTCL is uneven and its cultural commentary doesn’t hold up well, but on net it’s good. I appreciate this book for what it is. New geeks should read the second half for sure.