Small resolutions for 2020

I have a lot coming up in 2020, so I don’t want to make any major resolutions. But I do see a few obvious, relatively simple places for improvement in the new year:

  • Use social media better. I’ve cut back quite a bit, but Twitter, Facebook, and LinkedIn each have benefits. I want to put them to good use.
  • Listen to more variety in music. I’ve expanded my taste in movies significantly in the last couple of years and want to nurture my musical taste as well.
  • Read fewer articles, more books.
  • More intense workouts. I’ve been coasting on light-to-moderate walking and jogging, and I’d like to push myself more. HIIT and strength training are on my mind currently.

This is all in addition to continuing to take the next steps in my career and skills growth.

Happy New Year, friends!

Christmas trees and trip cost vs item cost

When building software for large datasets or HPC workflows, we talk a lot about the trip cost versus the item cost.

The item cost is the expense (almost always measured in time) to run an operation on a single unit of data – one member of a set, for example. The trip cost is the total expense of running a series of operations on some subset (possibly the whole set) of the data. The trip cost incorporates overhead, so it’s not just N times the item cost.

This is a key reason that computers, algorithms, and data structures that support high-performance computing are so important: by analyzing as many items in one trip as is feasible, you can often minimize time wasted on unnecessary setup and teardown.

Thus trip cost versus item cost is an invaluable simplifying distinction. It can clarify how to make many systems perform better.

Yes, Virginia, there is a trip cost

Image of a Christmas tree

Christmas trees provide a good and familiar example.

Let’s stipulate that you celebrate Christmas and that you have a tree. You’ve put up lights. Now you want to hang the ornaments.

The item cost for each of the ornaments is very small: unbox and hang the ornament. It takes a couple of seconds, max – not a lot for humans. It also parallelizes extremely well, so everyone in the family gets to hang one or more ornaments.

The trip cost is at least an order of magnitude (minutes rather than seconds) more expensive, so you only want to do it once:

  • Find the ornament box
  • Bring the box into the same room as the tree
  • Open the box
  • Unbox and hang N ornaments
  • Close the box
  • Put the box back

Those overhead steps don’t parallelize well, either: we see no performance improvement and possibly a performance decline if two or more people try to move the box in and out of the room instead of just one.

It’s plain to see that you want to hang as many ornaments as possible before putting away the ornament box. This matches our intuition (“let’s decorate the tree” is treated as a discrete task typically completed all in one go), which is nice.
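If you like seeing it in code, here’s a minimal sketch of the tradeoff in Python. The numbers are hypothetical – roughly two seconds per ornament and a few minutes of box overhead – chosen only to show how the overhead comes to dominate when you take many trips:

# Minimal sketch of trip cost vs. item cost, with made-up but plausible numbers.
ITEM_COST_S = 2        # hypothetical: seconds to unbox and hang one ornament
TRIP_OVERHEAD_S = 300  # hypothetical: seconds to fetch, open, close, and return the box

def total_time(n_ornaments: int, n_trips: int) -> int:
    """Total seconds to hang n_ornaments spread evenly across n_trips."""
    return n_trips * TRIP_OVERHEAD_S + n_ornaments * ITEM_COST_S

print(total_time(60, 1))   # one trip:           420 seconds (7 minutes)
print(total_time(60, 60))  # one trip per item:  18120 seconds (just over 5 hours)

Amortized over a single trip, the overhead barely registers; paid once per ornament, it is nearly the entire cost.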

Whether Christmas is your holiday or not, I wish you the best as the year draws to a close.

My new laptop

Now that Apple fixed the keyboard, I finally upgraded my Mac.

I am a non-combatant in the OS wars. I have my Mac laptop and a Windows desktop. I have an iPhone and build for Android at work. I run Linux servers professionally. I love everybody.

My little 2015 MacBook Air is a delightful machine. But it wasn’t keeping up with my developer workloads, in particular Android app builds. I bought the $2799 base model MacBook Pro 16″.

I started from scratch with a fresh install. Generally I try to avoid customizing my environments too much, on the principle of simplicity, so I didn’t bother migrating most of my old configs (one exception: .ssh/config).

Instead I’ve left the default apps and added my own one by one. I migrated my data manually – not as daunting as it sounds, given that the old laptop was only 128GB and much of it was consumed by the OS. I closed with an initial Time Machine backup to my (aging) external hard drive.

Now I’ve had a couple of weeks to actually use the MacBook Pro. Scattered observations:

  • WOW this screen.
  • WOW these speakers.
  • WOW the time I’m going to save building apps (more on that later).
  • I’m learning zsh now that it’s the Mac’s default shell.
  • Switching from MagSafe to USB-C for charging was ultimately worth the tradeoff.
  • I was worried about the footprint of this laptop (my old laptop is only 11-inch!), but I quite like it. Once I return to working at my office, I think it will be even better.
  • I am running Catalina. It’s fine. I haven’t seen some of the bad bugs people have discussed – at least not yet.
  • I’m holding on to my old Mac as a more passive machine or as a fallback if something happens to this one.

Only one of those really matters, though.

Much better for building software

The thing that makes this laptop more than a $2799 toy is the boon to my development work. I wanted to benchmark it, not in a strictly scientific way (there are websites that will do that) but in a comparative way in the actually existing use case for me: building Android apps.

The first thing I noticed: a big cut in the time taken to actually launch Studio. It’s an immediate lifestyle improvement.

I invalidated caches and restarted Studio on both machines. The two apps opened at the same time (not optimal performance-wise, but not uncommon when I’m working on these apps intensively).

I then ran and recorded times for three events, on each machine, for both of the apps I regularly build:

  • Initial Gradle sync and build
  • Build but don’t install (common for testing)
  • Build and install
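If you wanted to capture similar timings from the command line, a rough sketch might look like the following. It assumes the standard Gradle wrapper tasks (assembleDebug builds without installing; installDebug builds and installs to a connected device) and is an illustration of the idea rather than a record of exactly how I captured my numbers:

# Rough sketch: timing Gradle tasks from the command line (run from the
# project directory). assembleDebug and installDebug are the standard
# Android Gradle wrapper tasks for build-only and build-and-install.
import subprocess
import time

def time_task(task: str) -> float:
    """Run one Gradle task and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(["./gradlew", task], check=True)
    return time.perf_counter() - start

for task in ("assembleDebug", "installDebug"):
    print(f"{task}: {time_task(task):.1f} s")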

Shock of all shocks, the 2019 pro computer is much better than the 2015 budget-by-Apple’s-standards computer (graphs generated with a Jupyter notebook on the new laptop, smaller bars are better; code here):

Yeah. I needed a new computer. 🙂

I expect 2020 to be a big year for me for a number of reasons I’ll share over time, and my old laptop just couldn’t keep up. This one will, and I’m happy with it.

I sure hope there’s enough to do…

I begin the month-long winter break this weekend. Students and teaching faculty finish about a week later. When classes reconvene in January, we’ll start spending a lot of time on Iceland and its spinoff scientific computing projects.

To lay the groundwork, we’ve spent the last few weeks clearing brush:

  • Updating operating systems and apps on a dozen laptops, handsets, and tablets
  • Syncing accounts with Apple and Google
  • Sitting in on planning/logistics meetings
  • Coordinating with the students who will do most of the actual research and development
  • Producing the list of software improvements we need to make

The last item is the most substantial from the perspective of my colleagues and students in CS. We build a lot of software in-house to collect and visualize data, capture many gigabytes of drone photos, and run datasets through complex workflows in the field.

It takes a lot of work locally to make this succeed on-site. Students have to learn how to both use and develop the software in a short time. Since the entire Iceland team (not just CS students) depends on everything we build, these projects provide real and meaningful stakes.

All of this has come together in the last few weeks in a satisfying way. We’re up to 62 GitLab issues based on our experiences using the software. That’s a good enough list to fill a lot of time in the spring, for both students and faculty.

We’ll hit the ground running in January, when the clock officially begins ticking.

Going to Iceland

The title is the tl;dr. (!)

Barring a catastrophe (a non-trivial assumption!), I’m traveling to Iceland with the field science program at Earlham in June 2020. I’ll be one of the faculty members on the student-faculty research team for the annual expedition. It’s a thrilling opportunity.

I’ve been working toward this for a while. I’ve acted as “ground control” for several summers, both as a student and in my current role as a member of the CS faculty. Between trips, I’ve been part of the team that’s engineering and coding software to do fascinating things:

  • DNA analysis
  • in-field mobile data collection
  • drone flight planning
  • image analysis

But for a variety of reasons, it’s never been feasible for me to take the trip. Finally, the path seems clear.

Of course, anything could happen. This could fall through, or it could turn out amazingly well. Wherever it ultimately falls on that spectrum, I’ll spend the next few months wrangling various software projects, mentoring students, and assisting the official leaders of the trip as we prepare to go do science. My focus will be on building software, but that will be just one task among many.

Text is not a perfect medium for communicating emotion, but I’m quite excited. I wanted to flag this as a personal and professional milestone. I’m certain to be posting more about it over time.

Computing lessons from DNA analysis experiments

I’ve been working with my colleagues in Earlham’s Icelandic Field Science program on a workflow for DNA analysis, about which I hope to have other content to share later. (I’ve previously shared my work with them on the Field Day Android app.)

My focus has been heavily experimental and computational: run one workflow using one dataset, check the result, adjust a few “dials”, and run it again. When we’re successful, we can often automate the work through a series of scripts.

At the same time, we’ve been trying to get our new “phat node” working to handle jobs like this faster in the future.

Definitions vary by location, context, etc. but we define a “phat node” or “fat node” as a server with a very high ratio of (storage + RAM)/(CPU). In other words, we want to load a lot of data into RAM and plow through it on however many cores we have. A lot of the bioinformatics work we do lends itself to such a workflow.

All this work should ultimately redound to the research and educational benefit of the college.

It’s also been invaluable for me as a learning experience in software engineering and systems architecture. Here are a few of the deep patterns that experience illustrated most clearly to me:

  • Hardware is good: If you have more RAM and processing power, you can run a job in less time! Who knew?
  • Work locally: Locality is an important principle of computer science – basically, keep your data as close to your processing power as you can given system constraints. In this case, we got a 36% performance improvement just by moving data from NFS mounts to local storage.
  • Abstractions can get you far: To wit, define a variable once and reuse it. We have several related scripts that refer to the same files, for example, and for a while we had to update each script with every iteration to keep them consistent. We took a few hours to build and test a config file, which resolved a lot of silly errors like that (a minimal sketch of the pattern follows this list). This doesn’t help time for any one job, but it vastly simplifies scaling and replicability.
  • Work just takes a while: The actual time Torque (our choice of scheduler) spends running our job is a small percentage of the overall time we spend shaping the problem:
    • buying and provisioning machines
    • learning the science
    • figuring out what questions to ask
    • consulting with colleagues
    • designing the workflow
    • developing the data dictionary
    • fiddling with configs
    • testing – over, and over, and over again
    • if running a job at a bigger supercomputing facility, you may also have to consider things like waiting for CPU cycles to become available; we are generally our systems’ only users, so this wasn’t a constraint for us
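To make the abstraction point concrete, here’s a minimal sketch of the shared-config pattern in Python. The file names and layout are hypothetical, not our actual config, but the idea is the same: every script reads the same file, so a path changes in exactly one place.

# pipeline_config.ini (hypothetical) – the one place paths get defined:
#
#   [paths]
#   reads = /local/scratch/run42/reads.fastq
#   reference = /local/scratch/shared/reference.fasta
#   output_dir = /local/scratch/run42/out
#
# Every script in the workflow loads the same values instead of hard-coding them.
import configparser

config = configparser.ConfigParser()
config.read("pipeline_config.ini")

reads = config["paths"]["reads"]
reference = config["paths"]["reference"]
output_dir = config["paths"]["output_dir"]

print(f"Aligning {reads} against {reference}, writing results to {output_dir}")

When a dataset moves or a file is renamed, the config changes once and every script stays consistent.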

A lot of this is (for computer scientists, software engineers, etc.) common sense, but taking care to apply that common sense can be critical for doing big interesting work.

The punchline of it all? We managed to reduce the time – walltime, for fellow HPC geeks – required to run this example workflow from a little over 8 hours to 3.5 hours. Just as importantly we developed a bunch of new knowledge in the process. (I’ve said almost nothing here about microbiology, for example, and learning a snippet of that has been critical to this work.) That lays a strong foundation for the next several steps in this project.

If you read all this, here’s a nice picture of some trees as a token of my thanks (click for higher-resolution version):

Image of trees starting to show fall color
Relevance: a tree is a confirmed DNA-based organism.

Some twilight Americana

My alma mater-turned-employer Earlham College has a back-campus area with some trails and buildings, and it’s where I usually go to exercise. I took a long walk there yesterday evening to savor the first day of cooler weather and snapped a few photos along the way (iPhone 7 camera). These are some of my favorites.

A tale of two large-ish app updates

This week I spent some time working on Earlham CS’s Field Day Android application. It’s the app used by our student-faculty field science researchers to collect data on trips to, say, a glacier in Iceland. I made two substantial changes.

The first was updating our system dependencies. At the start of the summer, Field Day wasn’t a fully modern application. That’s mostly because its development is contingent on the interest levels of students and faculty who (correctly!) have other priorities during the academic year. We experience our only consistent spikes in development during preparation for a trip to Iceland. Even then, we tend to focus on adding or fixing features, rather than major design choices or boring updates. Whatever their benefits, such changes always risk eating up precious time in the short run.

As a result, we had long neglected to update SDK versions, themes, and other app fundamentals. I wanted to fix that before classes resumed this month.

Not being an Android expert (yet?), I relied on a mix of automated tools in Android Studio, manual code tweaks, and careful testing to push the update process forward. Here’s how I described it in my merge request:

I wanted to make us a “grownup” application, by which I mean that I wanted to move us away from as many deprecated tools and dependencies as possible, as far in advance of a field trip as possible. (EDIT: With one exception: these changes do not attempt to resolve the [looming] Google Drive [API] deprecation.)

To that end, this merge request involves substantial changes to build fundamentals like the Gradle version, as well as some Lint cleanup and general tidying. Much of it was done following a simple pattern:

– run a built-in Android Studio update tool (e.g. “Update to AppCompat”)

– change a bunch of details in the code so it builds

– test on the device

– lather, rinse, repeat

Field Day merge request 9

After some testing by a colleague and me, I approved the merge.

To reward myself for accomplishing that admittedly tedious process (which followed a long, slow battery testing process), I did something more fun.

For a long time I’d wanted to improve Field Day’s UI to streamline the navigation. I made a batch of changes, then submitted the following merge request:

[Field Day’s original creative developers] created a great design palette for Field Day: fun fonts, bright colors, intuitive icons.

I wanted to keep that but update the navigation to reflect the current understanding of our usage model. To that end, this merge centralizes everything onto one screen, miniaturizes our less-used buttons, and puts database and sensors at the forefront.

No specific activities or fragments other than the main screen (and the deletion of the obsolesced sensor screen) have been changed.

I can foresee a future where we do more data analysis and aggregation through the lab notebook, so I’ve preserved the notebook icon for future use.

Field Day merge request 10

The changes in that request took us from this set of two main screens:

Previous main screen (“Sampling” takes user to the second screen)
Previous second screen, containing our sensor and database features

… to this one screen:

Our most commonly-used buttons are on the main screen and fill the entire screen width.

I again checked with my colleague and then approved the request. I’m now working on other issues and have already found the changes to be substantial boosts to the user experience.

This is a sample of my own personal work, but of course building software is a team sport. And it relies on iteration. The original designers of Field Day – current and former colleagues of mine – did a lot of the heavy lifting over a few years building the core logic and aesthetic of the app. As I made my changes in the last few months, I’ve worked to maintain their original design palette while improving usability, performance, and the underlying data model. It’s a useful, specialized, and dare I say fun application, and I want it to keep getting better.

As a closing note about process, I find it sharpens my skills development when I have to summarize my work into prose, as in these merge requests. Writing them requires more precision than a quick chat in a hallway. That’s to say nothing of possible benefits to future developers trying to retrace changes and intentions.

IDE vs. text editing app vs. editor in a shell

I’ve been doing software development of various kinds for a few years now, using a variety of tools. In my experience, these are the tradeoffs between an integrated development environment (IDE), a standalone text editing app, and a text editor in the terminal. (I haven’t listed any examples here that you need to pay for.)

IDE

Examples: Android Studio, Xcode

An integrated development environment is exactly what it says on the tin: a complete development environment for a project featuring a mix of tools that work together to help you build your application. An IDE’s features go far beyond reading and writing files, and might include…

  • managing your project in its version control system
  • scanning for possible mistakes in code
  • displaying a preview of your application next to the text file that describes it
  • serial port monitoring
  • debugging
  • automated building

IDEs are powerful and can provide real boosts to productivity and insight. The tradeoff is lock-in: learning one IDE doesn’t mean you automatically know how to use another, and each editor has a learning curve.

In practice that’s usually fine. If you are exclusively, say, an Android app developer, you’ll be well-served to learn Android Studio. It just won’t teach you much about using Eclipse, let alone Xcode.

Text editor, standalone app

Examples: Atom, Notepad++

Standalone text-editing apps can be extremely handy. They’re not as heavy (in terms of system resources and user experience) as an IDE, but they’re great for all the common tasks we’d expect of an editor: find-and-replace, viewing the outline of a project in the filesystem, remote editing by SFTP, and more. Some editing apps, like Atom (which I use), support installing extensions that let you supercharge them to an almost arbitrary degree.

Good text editing apps can do all you need them to do and more. They may be the best choice for many users in many cases. That hasn’t been the case for me, though: I tend to experience them as an unhappy medium between horsepower and simplicity, and I don’t use them as often as the other two categories in this post.

Text editor in the shell

Examples: Vi, Nano

Vim is my fondest friend in the world of dev environments. I learned it within my first semester studying computing. While it has a learning curve, its keyboard shortcuts are simple and powerful. It is a widely-used example of an in-shell text editor – one that you run by typing its name in your shell followed by a space and the name of the file you intend to edit.

vim helloworld.txt     # 2 tokens, no waiting

This category tends to attract strong partisans. Stern computer science people on the Internet often emphasize vi for its near-guaranteed presence on Linux systems, for example. I like vim. Others like nano. Try them, and see what works for you.

There are some great advantages to using an editor in the shell:

  • Simplicity: An in-shell editor is sparse. You don’t get error checking, colorful little GUI buttons, simulators, design previews, or any other IDE feature, but that’s often unnecessary. (In most cases you can add font coloring, which is handy.) What you get instead is the ability to open and examine any file you can imagine – even if it’s a binary file and all you see is gibberish – in a predictable way.
  • Fast: From the POV of the user, an in-shell editor typically loads as fast as you can type. By comparison, IDEs and editing apps have long load times – and sometimes long read and write times. This speed has the side effect of making shell editors operate similarly no matter your hardware, something that’s not true of either of the other two options.
  • Real big files: A shell editor can handle gigantic files better than GUI applications, including IDEs, can.
  • Skills that scale: An editor in a shell makes you start learning how to use a shell. Knowing at least the basics of operating at that level is a solid upgrade to your development skillset. (The biggest confounding variable here is the type of shell you work with. I, like most people, just use the Bash shell. Experience could be quite different using ZSH or TCSH.)

Shell editors aren’t for every person working on every project, but I use mine several times daily because “it just works”.

(I’ll throw in a kind word for GNU Emacs here. It can run either in the terminal or as a standalone app. I haven’t used it for years, but some people really love it.)

If the environment isn’t an IDE, building and/or running the edited software can be done by running your compiler or interpreter in the terminal. This isn’t as good an approach for something like an Android app, which needs to run on a device or at least a simulator, but it works for many applications.
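As a trivial illustration (hypothetical file name), the whole edit-and-run loop for an interpreted language fits comfortably in the shell:

# hello.py – a hypothetical script edited in vim and run from the same shell:
#
#   $ vim hello.py        # edit
#   $ python3 hello.py    # run the interpreter directly; no IDE required
#
print("Hello from the terminal")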

So which do I pick?

I’m a fan of simple rules for choices like this. When it comes to development environments, try (in this order):

  1. The one everyone else working on the project with you uses. This is how I started working with both vim and Android Studio.
  2. The one you’re most comfortable with.
  3. If you’re early in your dev career and don’t yet have a comfort level with any of these, don’t be afraid to try several or all of them – it’s served me well to be able to shift between them from time to time.

Your mileage may vary on all of this, but these are the patterns in my experience.

Searching, sorting, and “Inbox Zero”

Two fundamental computer science problems that anyone who’s studied the topic will recognize are searching and sorting. These are intuitive enough at a high level:

  • Search algorithms scan a bunch of data to find what you’re looking for
  • Sorting algorithms take a bunch of data and re-arrange it into some order (that may be alphabetizing, grouping, labeling, whatever)
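In code, the high-level distinction is about as simple as it sounds. A tiny Python illustration with toy data:

# Toy illustration of the two operations.
subjects = ["budget draft", "server outage", "picnic RSVP", "grade question"]

# Sorting: re-arrange the whole collection up front (here, alphabetically).
print(sorted(subjects))
# ['budget draft', 'grade question', 'picnic RSVP', 'server outage']

# Searching: scan for the one thing you want, when you want it.
print([s for s in subjects if "outage" in s])
# ['server outage']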

We use these principles outside the classroom as well. Consider email. The modern “knowledge worker” is inundated with emails, and there’s a certain subculture that constantly strives for “Inbox Zero” as a solution to the problem. I would argue that’s not the optimal strategy for most people, most of the time.

Inbox Zero is a strategy that’s heavy on sort and light on search. You want to classify emails, put some in folders, capture the information from as many as necessary, and delete as much as possible. That’s often useful, but a little bit of upfront work can handle a lot of that for you, and when you take those steps the Inbox Zero approach becomes much less attractive than it superficially appears.

As an example, here’s my strategy, which consists of a couple of sorting strategies and then (mostly) letting search handle the rest.

Sorting

The first sorting mechanism I have is the simplest one: unsubscribe from as many lists as you can. That’s great, and I’ve done it for most commercial services, but I also receive emails from…

  • the Beowulf cluster mailing list
  • auto-pay services
  • the Earlham CS admin mailing list
  • the Earlham faculty mailing list
  • the ECCS GitLab instance if someone pushes an update
  • our wiki when someone makes a change

I don’t need every single message in all of those categories, but I know I’ll need or want some of them, so I can’t just delete them. That’s where filters come in.

I rely on (entirely too many) email filters in my Mac Mail client to handle preprocessing for me. Each filter applies a set of rules to incoming emails based on certain information in the email, often sender or subject contents. The initial setup takes anywhere from many minutes to a few hours, but adding to it incrementally afterward is quick and simple. For that reason, it’s most of the sorting I ever do.
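The real rules live in Mail’s preferences, but conceptually each filter is just a predicate plus a destination. A rough sketch of the idea in Python, with hypothetical rules and folder names:

# Conceptual sketch of rule-based mail filtering (the real filters live in the
# Mac Mail client). Each rule pairs a predicate with a destination folder.
rules = [
    (lambda msg: "beowulf" in msg["list"].lower(),     "Lists/Beowulf"),
    (lambda msg: "gitlab" in msg["from"].lower(),      "Notifications/GitLab"),
    (lambda msg: msg["from"].endswith("@example.edu"), "Campus"),
]

def file_message(msg: dict) -> str:
    """Return the folder assigned by the first matching rule, else Inbox."""
    for predicate, folder in rules:
        if predicate(msg):
            return folder
    return "Inbox"

print(file_message({"from": "gitlab@git.example.edu", "list": ""}))
# Notifications/GitLab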

Those two tactics – cutting subscriptions and running filters – account for all of the sorting I do more than once every couple of months. That’s because I want the power of…

Searching

The major downside of deleting emails (especially if you then clear the trash, as I imagine most Inbox Zero people want to do) is that you can no longer search the content. Even if you think you’ve written everything down and downloaded every document, there’s a chance that something you consider unimportant now will become important later. Your ability to remember the context of an exchange depends on how well you’ve captured the details in your notes, and you may not know all the details you need for a long time.

The ability to search every message a human being has sent me in the last several months is handy. It’s not something I do all the time, but it has high value on the occasions when I do it. I’ve recovered many documents, resolved ambiguities during meetings, and revived forgotten but important conversations by doing this.

Why Inbox Zero?

Space is cheap, (human) time is expensive, search is extremely useful, sorting is only modestly useful… and yet we spend a lot of human time rearranging messages. Why?

I think the answer is obvious: It looks and feels nice to have a squeaky-clean inbox. An inbox full of messages feels like clutter, and clutter is a stressor.

That’s a perfectly valid reason! It’s why, a couple of times a year, I reach for Inbox Zero myself. I’ll apply a bunch of sorting to my inbox: deleting, archiving, moving things to new folders, etc. It’s just not necessary most of the time. My attention, like most people’s, is much better devoted to getting work done than to maintaining a clean inbox.

Most of the time, I’m happy to search 773 emails rather than sort 773 emails. Go to Inbox Zero because you want to, not because you think you must.

UPDATE: I promise I wrote and scheduled this post before I saw the most recent (Friday July 26) XKCD cartoon, but I’m happy with the serendipity: