Tech improves pandemic life

I can’t imagine going through the COVID-19 pandemic without computers. Tech improves pandemic life, and it helps us make good decisions.

For reasons of both personal caution and what I see as a moral duty, I am probably in the 80th percentile for cautious behavior during the pandemic. I live alone, and my job lends itself to remote work for almost everything. What’s more, my workplace is a socially-conscious liberal arts college. As a result, I interact with very few people (those I do see are always masked-up).

That lifestyle is only sustainable because of computer technology. I buy and pick up groceries through an app. Meetings take place over video chats. Songs or podcasts play in the background while I cook. I can stream almost anything I want to see. I’ve continued to learn and to work using some excellent rectangles.

Ron Swanson: "This is an excellent rectangle."

There are tradeoffs, of course, but I have lived basically this way since March. In doing so, I have weathered the pandemic as well as I could hope (so far).

The national dialogue now includes a lot of chatter about how to stay safe for the holidays. I’m cautious and want to model good behavior. That means I’ll be on FaceTime for Thanksgiving, Christmas, and New Year’s Eve. That’s not great, and it’ll be sad not to visit family in person.

But for people like me, the alternative to a FaceTime holiday isn’t an in-person holiday, but a canceled holiday, spent in isolation. Thanks to the people in my industry, I don’t have to do that. Technology brings people together. It’s one reason I remain idealistic about the work I do.

Amidst the tragedies and terrors of 2020, pause to appreciate the age we live in and the cool things we’ve invented. Tech improves pandemic life – and improves life in general. There’s lots to worry about if you want (conspiracy theories, AI risk, etc.), but I’m happy to live in a technologically advanced society.

Jupyterhub user issues: a 90% improvement

Jupyter errors are not to be confused with Jupiter errors.

At Earlham Computer Science we have to support a couple dozen intro CS students per semester (or, in COVID times, per 7-week term). We teach Python, and we want to make sure everyone has the right tools to succeed. To do that, we use the Jupyterhub notebook environment, and we periodically respond to user issues related to running notebooks there.

A couple of dozen people running Python code on a server can gobble up resources and induce problems. Jupyter has historically been our toughest service to support, but we’ve vastly improved. In fact, as I’ll show, we have reduced the frequency of incidents by about 90 percent over time.

Note: we only recently began automatically tracking uptime, so that data is almost useless for comparisons over time. This is the best approximation we have. If new information surfaces that discredits any of my methods, I’ll update this post, but my colleagues have confirmed that this analysis is at least plausible.

Retrieving the raw data

I started my job at Earlham in June 2018. In November 2018, we fixed an archiving issue with our help desk/admin mailing list, which gives us our first dataset.

I ran a grep for the “Messages:” string in the thread archives:

grep 'Messages:' */thread.html # super complicated

I did a little text processing to generate the dataset: regular expression find-and-replace in an editor. That reduced the data to a column of YYYY-Month values and a column of message counts.
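
For the curious, that editor work amounted to something like this one-liner. This is a sketch, not my literal history: it assumes pipermail-style YYYY-Month archive directories and grep output like “2018-November/thread.html: <b>Messages:</b> 21”, and messages-by-month.dat is a name I’m making up here:

grep 'Messages:' */thread.html \
  | sed -E 's|^([0-9]{4}-[A-Za-z]+)/thread\.html:.*Messages:[^0-9]*([0-9]+).*|\1 \2|' \
  > messages-by-month.dat   # one "YYYY-Month count" line per month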

Then I searched the subject.html files for all subject lines matching “jupyter” (case-insensitively):

grep -i jupyter {2018,2019,2020}*/subject.html 

I saved the output to jupyter-messages-18-20.dat. After some more text processing – again, regex find-and-replace – I decided that follow-up messages are not what we care about and ran uniq against that file. A few quick wc -l commands later, we find:

  • 21 Jupyter requests in 2018
  • 17 Jupyter requests in 2019
  • 19 Jupyter requests in 2020
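
For completeness, the dedupe-and-count step looked roughly like this. Again a sketch: I’ve added a sort, since uniq only collapses adjacent duplicates, and jupyter-unique.dat is a hypothetical intermediate file:

# Collapse follow-up messages sharing a subject line into one entry.
sort jupyter-messages-18-20.dat | uniq > jupyter-unique.dat

# Count requests per year; each line starts with the archive
# directory name, which begins with the year.
grep '^2018' jupyter-unique.dat | wc -l   # 21
grep '^2019' jupyter-unique.dat | wc -l   # 17
grep '^2020' jupyter-unique.dat | wc -l   # 19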

One caveat is that in 2020 we moved a lot of communication to Slack, which adds some uncertainty to the data. However, I know from context that Jupyter requests have continued to flow disproportionately through the mailing list. As such, the Slack messages are likely the sort of redundant information already collapsed by uniq in the text processing.

Another qualifier is that about a year ago we began using GitLab’s Issues as a ticket-tracking system. Searching it turned up 11 more Jupyter issues, all from 2020. Fortunately, only 1 of those was a problem that did not overlap with a mailing list entry.

Still, I think those raw numbers are a good baseline. At one level, it looks bad. The 2020 number has barely budged from 2018 and in fact it’s worse than 2019. That’s misleading, though.

Digging deeper into the data

Buried in that tiny dataset is some good news about the trends.

For one thing, those 21 Jupyter requests arrived in just 4 months of the year – in other words, we were wildly misconfigured and putting out a lot of unnecessary technical fires. (That’s nobody’s fault – it’s primarily because my position did not exist for about a year before I arrived, so our systems atrophied.)

What’s more, by inspection, about half of this year’s 19 requests are password resets or feature requests rather than genuine problems – unlike the 17 we saw in 2019, which I think were real.

So in terms of Jupyter problems in the admin list, I find:

  • around 20 in the latter third of 2018
  • 17 in ALL OF 2019
  • only 2 (granted, one was a BIG problem, but still only 2) in 2020

That’s a 90% reduction in Jupyterhub user issues over three years, by my count.

“That’s amazing, how’d you do it?”

Number one: thank you, imaginary reader, you’re too kind.

Number two: a lot of ways.

In no particular order:

  1. We migrated off of a VM that, given our hardware constraints, was not conducive to a resource-intensive service like Jupyterhub.
  2. Over time, we’ve upgraded our storage hardware, some of which was old and (it turns out) failing.
  3. We added RAM. When it comes to RAM, some is good, more is better, and too much is just enough.
  4. We manage user directories better. We export them over NFS but have done all we can to reduce network dependencies (see the sketch after this list). That significantly reduces the amount of time the CPU spends twiddling its thumbs waiting on the network.
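
To make item 4 concrete: much of it comes down to how clients mount the NFS-exported home directories. Here’s a hypothetical /etc/fstab entry illustrating the kind of options involved – not our exact configuration, and “fileserver” and the paths are made up:

# Large rsize/wsize move more data per round trip; noatime avoids
# extra write traffic on every read of a user's files.
fileserver:/export/home  /home  nfs  rw,hard,rsize=1048576,wsize=1048576,noatime  0 0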

What’s more, we’re not stopping here. We’re currently exploring load-balancing options – for example, running Jupyter notebooks through a batch scheduler like Slurm, or potentially a containerized environment like Kubernetes. There are several solutions, but we haven’t yet determined which is best for our use case.

This is the work of a team of people, not just me, but I wanted to share it as an example of growth and progress over time. It’s incremental but it really does make a difference. Jupyterhub user issues, like so many issues, are usually solvable.

An inspirational place

I graduated from Earlham in December 2016. I returned to work for the Computer Science Department here in June 2018. Like so many in the community, I relate to it as more than an alma mater or an employer: it’s an institution and a community I hold in high esteem.

For all that, I don’t think I’ve ever been more inspired by this place than I was by this:

It’s an incredible display. I wasn’t able to attend the event myself, but I watched some of its organization unfold on social media in the hours beforehand. It was a breathtaking, awe-inspiring achievement.

This is a time of a lot of fear, heartbreak, and frustration. To briefly lapse into politics, it is horrifying to check the news and to see the President of the United States so thoroughly, spectacularly, and dangerously fail in guiding this nation through the crisis. The effects of COVID-19 may hover over our heads for a long time to come.

But this is also a moment of profound social solidarity. I need look no further than this small liberal arts college to see it. It’s wonderful to be part of a community where this could materialize.

We’re dispersing geographically for the rest of the semester, but we carry this spirit with us wherever we go. I can only hope the people of America and the world rise to the occasion as this community did.

Small resolutions for 2020

I have a lot coming up in 2020, so I don’t want to make any major resolutions. But I do see a few obvious, relatively simple places for improvement in the new year:

  • Use social media better. I’ve cut back quite a bit, but Twitter, Facebook, and LinkedIn each have benefits. I want to put them to good use.
  • Listen to more variety in music. I’ve expanded my taste in movies significantly in the last couple of years and want to nurture my musical taste as well.
  • Read fewer articles, more books.
  • More intense workouts. I’ve been coasting on light-to-moderate walking and jogging, and I’d like to push myself more. HIIT and strength training are on my mind currently.

This is all in addition to continuing to take the next steps in my career and skills growth.

Happy New Year, friends!

Christmas trees and trip cost vs item cost

When building software for large datasets or HPC workflows, we talk a lot about the trip cost versus the item cost.

The item cost is the expense (almost always measured in time) to run an operation on a single unit of data – one member of a set, for example. The trip cost is the total expense of running a series of operations on some subset (possibly the whole set) of the data. The trip cost incorporates overhead, so it’s not just N times the item cost.
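
To put that in rough symbols (my framing, not a standard formula): processing a batch of N items costs about overhead + N × item cost, so the effective per-item cost is overhead/N + item cost – and the overhead term shrinks toward zero as the batch grows.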

This is a key reason the computers, algorithms, and data structures that support high-performance computing are so important: by analyzing as many items per trip as is feasible, you can often minimize time wasted on unnecessary setup and teardown.

Thus trip cost versus item cost is an invaluable simplifying distinction. It can clarify how to make many systems perform better.

Yes, Virginia, there is a trip cost


Christmas trees provide a good and familiar example.

Let’s stipulate that you celebrate Christmas and that you have a tree. You’ve put up lights. Now you want to hang the ornaments.

The item cost for each ornament is very small: unbox and hang it. That takes a couple of seconds, max – not a lot for humans. It also parallelizes extremely well, so everyone in the family gets to hang one or more ornaments.

The trip cost is at least an order of magnitude (minutes rather than seconds) more expensive, so you only want to do it once:

  • Find the ornament box
  • Bring the box into the same room as the tree
  • Open the box
  • Unbox and hang N ornaments
  • Close the box
  • Put the box back

Those overhead steps don’t parallelize well, either: we see no performance improvement and possibly a performance decline if two or more people try to move the box in and out of the room instead of just one.

It’s plain to see that you want to hang as many ornaments as possible before putting away the ornament box. This matches our intuition (“let’s decorate the tree” is treated as a discrete task typically completed all in one go), which is nice.
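
The same tradeoff shows up all over computing. Here’s a sketch in shell terms – copying a directory of photos to a server, where the host and paths are made up:

# One trip per item: every scp pays connection setup and teardown
# ("find the box, open the box") once per file.
for f in photos/*.jpg; do
  scp "$f" user@example.com:/backup/photos/
done

# One trip for the whole batch: pay the overhead once, then stream
# all N items through a single connection.
tar czf - photos | ssh user@example.com 'tar xzf - -C /backup'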

Whether Christmas is your holiday or not, I wish you the best as the year draws to a close.

Going to Iceland

The title is the tl;dr. (!)

Barring a catastrophe (a non-trivial assumption!), I’m traveling to Iceland with the field science program at Earlham in June 2020. I’ll be one of the faculty members on the student-faculty research team for the annual expedition. It’s a thrilling opportunity.

I’ve been working toward this for a while. I’ve acted as “ground control” for several summers, both as a student and in my current role as a member of the CS faculty. Between trips, I’ve been part of the team that’s engineering and coding software to do fascinating things:

  • DNA analysis
  • in-field mobile data collection
  • drone flight planning
  • image analysis

But for a variety of reasons, it’s never been feasible for me to take the trip. Finally, the path seems clear.

Of course, anything could happen. This could fall through, or it could turn out amazingly well. Wherever it ultimately falls on that spectrum, I’ll spend the next few months wrangling various software projects, mentoring students, and assisting the official leaders of the trip as we prepare to go do science. My focus will be on building software, but that will be just one task among many.

Text is not a perfect medium for communicating emotion, but I’m quite excited. I wanted to flag this as a personal and professional milestone. I’m certain to be posting more about it over time.

Some twilight Americana

My alma mater-turned-employer Earlham College has a back-campus area with some trails and buildings, and it’s where I usually go to exercise. I took a long walk yesterday evening to savor the first day of cooler weather here, and I took a few photos there (iPhone 7 camera). These are some of my favorites.

Moon code

I’m one institution removed from the person who – with her team – developed the software that took us to the moon fifty years ago.

NASA photo

Margaret Hamilton is an Earlham alum (class of 1958) who wrote much of the code for the Apollo missions, coined the term “software engineering”, and a lot more. If you’re uninitiated, check out her Wikipedia page.

I’d note that she’s had an incredible career since the photo was taken, so I mean to use it only for its connection to the events we’re celebrating today, not as an encapsulation of an entire career or life.

If you’re interested, the Apollo 11 guidance code is here. You’ll see Hamilton in the “Attribution” section.

Happy Moon Landing Day, friends.

T minus 2 hours 24 minutes

The Iceland crew departs at 1500 today.

They’ve been sent away with a lot of field gear (which I don’t know much about), along with a laptop stuffed with a virtual machine (which I know quite a lot about) running a loose collection of services they depend on when gathering scientific data in the field without Internet access – like when they’re out on a big damn glacier.

In theory, it’s a great use of virtualization. In practice, I’m confident about it, but we’ll soon have a lot more clarity. At the least, we’ll gather a lot of new data about the pluses and minuses of the approach.