The long xkcd climate change post, split

I wanted to share the xkcd climate change cartoon “Earth Temperature Timeline” on social media, but I had two problems: (1) it’s a very tall comic that doesn’t lend itself to viewing on one screen; and (2) most of the big sites punish external links, so just pointing to the source would narrow the reach.

So, taking advantage of xkcd’s generous license, I downloaded the cartoon and cut it up into pieces. Sharing my result here.

They are here, and the ImageMagick commands I ran divided the comic relatively cleanly (e.g., no blocks of text sprawling across two images, as far as I can see). Those commands are below.

Commands I used:

# resize to 1080px wide, then cut into 1080x1350 (4:5 portrait) tiles
magick input.png -resize 1080x +repage -crop 1080x1350 -quality 95 +repage slide_%02d.png
# stack the final two (short) slices vertically into a single last image
magick slide_15.png slide_16.png -append slide15_combined_last.png

I did use an LLM to help tune the first command a bit.
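
If you want to sanity-check the slices, ImageMagick’s identify command will report each one’s dimensions:

# confirm every slice is 1080 wide and at most 1350 tall
magick identify -format "%f: %wx%h\n" slide_*.png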

My memory was that this cartoon generated a fair amount of discourse, but I do think it makes a striking point in isolation about the extent of the climate problem. Anyway – let’s move to renewables and more distributed electrical generation.

Living through powers of 2

I’m unreasonably excited that my age at my birthday this year (32) is a power of 2.

I’ve been aware of, and idly amused by, this fact for a while, but my excitement comes from noticing recently just how rare that is.

Think about it.

The first power of 2 you experience is 1 (2^0). I know an infant who is almost one year old. He doesn’t know anything about powers of 2. He is too young to appreciate it.

The second power of 2 you experience is 2 (2^1). The aforementioned infant has a two-year-old toddler for a brother. Based on my observations, this older sibling is also too early in brain development to appreciate this pattern.

The third power of 2 in your life is 4 (2^2). This one isn’t impossible to appreciate at the time, I suppose, but it’s out of reach for all but the most comically precocious four-year-olds.

Your fourth power of 2 is 8 (2^3). It’s been so long since I thought about the elementary school math curriculum that I had to look up whether they teach this concept in those years. They do not. Evidently that’s too early to be teaching powers of 2. [1] Your very ahead-of-the-curve eight-year-olds might see it, but it has to be a rarity.

So the first power of 2 most people might meaningfully experience is age 16 (2^4). At this point it is entirely plausible to know and appreciate the milestone! You have to be a bit of a math geek, sure, but that rules! Sadly, I confess I have no memory of caring about this at the time.

From here on, the gap between consecutive powers of 2 doubles with each step. You have to wait sixteen more years for your next power-of-2 birthday, the easily-overlooked 32 (2^5). That’s a big jump from 16 mathematically – probably an even bigger one in life.

Finally, you have reasonable hope for one more, 64 (2^6). I say one more because the oldest person we know of passed away at 122 years old – six years shy of 128 (2^7). Miracle anti-aging cures, potions, and technologies are in a permanent state of hype, but I humbly venture that we’re not about to crack 2^7 anytime soon.

So realistically, a person who lives a common human lifespan gets three power-of-2 birthdays they might appreciate: 16, 32, and 64. I’m closing in on my second, age 32 (2^5). I intend to live that year – and, if I’m so blessed or fortunate, all the years after it – to the fullest.

Aging to a Power of Two: Because if we’re not making up arbitrary reasons to celebrate, what are we even doing here?

A quick afterword: It occurred to me while writing this that someone must have done a post like this before. Sure enough, a quick search turned up a few links. I promise I didn’t read them before I drafted mine – but if you enjoyed this little musing, you might also like these:

  • In 2005 there was a fun little blog post here on the same concept, which the author framed as “base-2 birthdays”. The post also highlights the marketing value to Big Greeting Card of writing the years in binary (e.g., calling them your 1st, 10th, 100th, and 1000th birthdays instead of your 1st, 2nd, 4th, and 8th).
  • A 2020 issue of Communications of the ACM included the playful article “Birthday Bit Boundaries”, from a more computer science-oriented angle.

[1] You will note the use of “too”, “to”, and “two” in this sentence. 🙂

The WSL file that was supersizing my UrBackup incremental backups

I was doing a backup of my data from my Windows 11 desktop to an UrBackup server on my new NAS.

I was getting ~200GB incremental backups on a desktop with about 500GB in total use. I was 100% sure I was not making 200GB of changes on the system – not over the course of a few hours barely touching the computer.

I also knew from past experience it was a WSL problem, but my $searchEngine incantations were failing me. I am a Linux sysadmin who occasionally uses Windows with power user (but not system administrator) proficiency – or, more to the point, I was out of my depth on this. I did some poking and thought I could maybe save someone else some trouble by sharing my conclusion.

I figured out that the WSL filesystem is saved as a vhdx file in my AppData directory. I’ve adjusted the underlying path a bit for anonymity, but the pattern is the following:

C:\path\to\Users\my_username\AppData\Local\Packages\MyOSName\LocalState\ext4.vhdx

This would appear to be because the local state of the WSL filesystem is, from the point of view of Windows, a single giant file, which UrBackup then re-copies every single time any change of any kind is made within WSL. That is not ideal for incremental backups.
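
If you want to confirm the culprit on your own machine, you can check the file’s size from inside WSL via the Windows drive mount. A sketch: the package directory name varies by distro (hence the glob), and your Windows username may differ from your WSL $USER.

# locate and size the backing vhdx on the Windows side
ls -lh /mnt/c/Users/$USER/AppData/Local/Packages/*/LocalState/ext4.vhdx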

The fix at that point was simple – in the UrBackup client settings, I excluded the relevant path. (I actually decided, for other reasons, to exclude all of AppData, but you could exclude just the LocalState directory.) That reduced the next incremental backup to a few tens of megabytes, which is much more tractable and plausible for the actual changes. A decrease of four orders of magnitude is nothing to sneeze at.

I’ve since set up rsync backups of the WSL content I actually care about. A minimal sketch of the idea, run from inside WSL (both paths are hypothetical examples):
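
# mirror the directories worth keeping to a location UrBackup still sees
# (both paths are hypothetical; adjust to taste)
rsync -a --delete ~/projects/ /mnt/c/Users/YourWindowsUser/wsl-backup/projects/

All set – hope this is helpful.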

CASC meeting for spring 2024

Last week I had the pleasure of attending the spring 2024 conference of the Coalition for Academic Scientific Computation (CASC) in Washington, DC. I was especially treated to attend (in-person) the Cyberinfrastructure Leadership Academy (CILA) 2024 the day before CASC. [1]

It’s an opportunity to learn about the state of research computing at academic institutions in the U.S. today. Along with SC, it’s also a chance to see in person a lot of people I mostly encounter over email or as boxes in Zoom meetings.

One memorable event was a talk by “HPC Dan” Reed, incorporating information from this blog post and a lot more (in 2024, it wouldn’t do to ignore LLMs). This preceded the release of the Indicators report by the National Science Board, which he presented at the White House on the 13th.

[1] Enough acronyms yet?

Pride reflection

June is Pride Month. I have a few thoughts.

Pride street painting, Boulder, CO, 2021

To me, celebrating pride is about celebrating different modes of pursuing happiness. More to the point, it’s about the breaking of arbitrary expectations for gender presentation, identity, and expression. That includes the right to fall in love with someone of the same sex, but it goes well beyond that.

I’m gay, so I’m very much a participant in this month’s celebrations. I’m also a cis male and to outward appearances basically gender-conforming. That’s neither a good thing nor a bad thing – just where I’ve landed. But I like the idea that others enjoy the freedom to be otherwise, that if I felt compelled to change or redefine some aspect of my identity or presentation tomorrow I could, and that the realm of personal freedom keeps expanding.

The opposition is loud and destructive, and it’s reached a fever pitch in the last few years. Transgender people in particular are the targets du jour. I see conservatives trying to drive a wedge between gay/bi and trans people. I see Republicans attacking Pride Month merchandise in stores, shuttering programs promoting diversity, and banning LGBTQ books. Worse, they’re isolating queer kids and queer families in school. They’re making it harder for people to just live as they see fit without doing a bit of harm to anyone else.

In the face of this, my fellow queer people make me proud. These are people living happy, interesting, loving, fulfilling lives despite intimidation and scapegoating. This community gives me hope for the future when it sometimes feels in short supply.

It’s inspiring, and not just in theory or for each person individually. We truly have accomplished a lot for the improvement of our society. On a scale of decades, and with plenty of setbacks, America has become more accepting of the wide variety of people who live here. If we (and now I’m including straight folks) can empathize with each other and make a bit of room for other people’s differences, we can continue on that path. To me, that’s what all those rainbow flags and parades are about: celebrating how far we’ve come, and looking ahead to how much better we can still do.

Happy Pride. 🏳️‍🌈🏳️‍⚧️

The author, windswept, smiling at Seyðisfjörður, Iceland, chapel by rainbow cobblestones, 2021

Fun with the Slurm reservation MAINT and REPLACE flags

Note: as of version 23.02.1 at least, Slurm no longer permits the configuration described below:

scontrol: error: REPLACE and REPLACE_DOWN flags cannot be used with STATIC_ALLOC or MAINT flags

I’m preserving this post for posterity, as it shows why that restriction is a very good idea.

***

Late this fine Saturday morning I noticed the work Slack blowing up. Uh-oh.

Turns out that earlier in the week I had introduced an issue with our compile functionality, which rests on the logic of Slurm reservations. It’s now fixed, and I wanted to share what we learned in the event that it can help admins at other HPC centers who encounter similar issues.

See, on CU Boulder Research Computing (CURC)’s HPC system Alpine, we have a floating reservation for two nodes to allow users to access a shell on a compute node to compile code, with minimal waiting. Any two standard compute nodes are eligible for the reservation, and we use the Slurm replace flag to exchange the nodes over time as new nodes become idle.

But on Saturday morning we observed several bad behaviors:

  • The reservation, acompile, had the maint flag.
  • Nodes that went through acompile ended up in a MAINTENANCE state that, upon their release, rendered them unusable for standard batch jobs.
  • Because nodes rotate in and out, Slurm was considering more and more nodes to be unavailable.
  • A member of our team attempted to solve the issue by setting flags=replace on the reservation. This seemed to work briefly, but the issue quickly resurfaced.
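
For diagnosis, something like the following should surface the problem by listing any nodes Slurm considers to be in a maintenance state (a sketch, not the exact commands we ran):

# show nodes currently in a maintenance state, plus active reservations
sinfo -t maint -o "%N %T"
scontrol show res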

I think I have a sense of the proximate cause and an explainer, and I also think I know the underlying cause and possible fixes.

Proximate cause: Slurm reservations (at least as of version 22.05.2) are conservative about how they update the maint flag. In this example, to remove maint from a reservation with flags=maint,replace, it’s not sufficient to set flags=replace – the flag must be explicitly removed, with something like flags-=maint.

Allow me to demonstrate.

This command creates a reservation with flags=maint,replace:

$ scontrol create reservation reservationName=craig_example users=crea5307 nodeCnt=1 starttime=now duration=infinite flags=maint,replace
Reservation created: craig_example

Slurm creates it as expected:

$ scontrol show res craig_example
ReservationName=craig_example StartTime=2023-04-08T11:58:44 EndTime=2024-04-07T11:58:44 Duration=365-00:00:00
   Nodes=c3cpu-a9-u1-2 NodeCnt=1 CoreCnt=64 Features=(null) PartitionName=amilan Flags=MAINT,REPLACE
   TRES=cpu=64
   Users=crea5307 Groups=(null) Accounts=(null) Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a
   MaxStartDelay=(null)

We (attempt to) update the reservation using flags=replace, intending for replace to become the only flag. That would seem to be the logical behavior.

$ scontrol update reservation reservationName=craig_example flags=replace
Reservation updated.

However, despite an apparently satisfying output message, this fails to achieve our goal. The maint flag remains:

$ scontrol show res craig_example
ReservationName=craig_example StartTime=2023-04-08T11:58:44 EndTime=2024-04-07T11:58:44 Duration=365-00:00:00
   Nodes=c3cpu-a9-u1-2 NodeCnt=1 CoreCnt=64 Features=(null) PartitionName=amilan Flags=MAINT,REPLACE
   TRES=cpu=64
   Users=crea5307 Groups=(null) Accounts=(null) Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a
   MaxStartDelay=(null)

Then, using minus-equals, we actually remove the maint flag:

$ scontrol update reservation reservationName=craig_example flags-=maint
Reservation updated.

Lo, the flag is gone:

$ scontrol show res craig_example
ReservationName=craig_example StartTime=2023-04-08T11:58:44 EndTime=2024-04-07T11:58:44 Duration=365-00:00:00
   Nodes=c3cpu-a9-u1-2 NodeCnt=1 CoreCnt=64 Features=(null) PartitionName=amilan Flags=REPLACE
   TRES=cpu=64
   Users=crea5307 Groups=(null) Accounts=(null) Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a
   MaxStartDelay=(null)

Based on this example, which replicates the behavior we observed, it very much appears to me that the failure to properly remove the maint flag was the proximate cause of our problem. I’ve made this exact mistake in other contexts before, so I at least had a sense of it already.

That’s all well and good, but the proximate cause is not really what we care about. It’s more important how we got to this point. As it happens, the underlying cause is that the maint flag was set on acompile in the first place. I’ll describe why I set it initially and what we will do differently in the future.

An important fact is that Slurm (sensibly) does not want two reservations scheduled for the same nodes at the same time unless you, the admin, are REAL SURE you want that. The maint flag is one of only two documented ways to create overlapping reservations. We use this flag all the time for its prime intended purpose, reserving the system for scheduled maintenance. So far so good.

However, at our last planned maintenance (PM), on April 5, we had several fixes to make to our ongoing reservations, including acompile. For simplicity’s sake, we chose to delete and rebuild them according to our improved designs, rather than updating them in place. When I first attempted the rebuild step with acompile, I was blocked from creating it because of the existing (maint) reservations, so I added that flag to my scontrol create reservation command. From my bash history:

# failed
  325  alpine scontrol create reservation ReservationName=acompile StartTime=now Duration=infinite NodeCnt=2 PartitionName=acompile Users=-root Flags=REPLACE

# succeeded
  329  alpine scontrol create reservation ReservationName=acompile StartTime=now Duration=infinite NodeCnt=2 PartitionName=acompile Users=-root Flags=REPLACE,MAINT

What I had not realized was that, by setting the maint flag in acompile and never removing it, I was leaving every node that cycled through acompile in the MAINTENANCE state – hence the issues above.

I can imagine other possible solutions to this issue – scontrol reconfigure or systemctl restart slurmctld may have helped, though I don’t like to start there. In any case, I think what I’ve described did reveal an issue in how I rebuilt this particular reservation.

For the future I see a few followup steps:

  1. Document this information (see: this blog post, internal docs).
  2. Revisit the overlap flag for Slurm reservations, which in my experience is a little trickier than maint but may prevent this issue if we implement it right (see the sketch after this list).
  3. Add reservation config checks as a late step in post-PM testing, perhaps the last thing to do before sending the all-clear message.
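
For item 2, the rebuild would presumably look like my failed attempt above with overlap swapped in for maint – a sketch we haven’t battle-tested yet:

# hypothetical rebuild: OVERLAP permits coexistence with other reservations
# without placing cycled nodes in a MAINTENANCE state
scontrol create reservation ReservationName=acompile StartTime=now Duration=infinite NodeCnt=2 PartitionName=acompile Users=-root Flags=REPLACE,OVERLAP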

This was definitely a mistake on my part, but I (and my team) learned from it. I wrote this post to share the lesson, and I hope it helps if you’re an HPC admin who encounters similar behavior.

Alpine updates and RMACC 2022

This week I had the opportunity to speak at the 2022 RMACC Symposium, hosted by my own institution, about the Alpine supercomputer. My presentation and the others from my CU colleagues are available here.

In summary, Alpine has been in production since our launch event in May. After some supply chain issues (the same ones that have affected the entire computing sector), we are preparing to bring another round of nodes online within weeks. That will put Alpine’s total available resources (about 16,000 cores) on par with those of the retiring Summit system. It’s an exciting step for us at CURC.

As for RMACC: I’d never attended the symposium before. After three days, I came away with a lot of new information, new contacts, and ideas for how to better support our researchers. A few topics I paid particular attention to:

  • Better and more scalable methods of deploying HPC systems and software
  • How the community will navigate the transition from XSEDE to ACCESS
  • The companies, organizations, and universities (like mine!) building the future of this space
  • Changes in business models for the vendors and commercial developers we work with

Academic HPC is a small niche in the computing world, and gatherings like this can be valuable as spaces to connect and share our best ideas.

New supercomputer just dropped

server cabinets in the aisle of a data center

Today marks the launch of CU Boulder’s shiny new research supercomputer, Alpine. Text of the university press release:

The celebratory event signals the official launch of CU Boulder’s third-generation high performance computing infrastructure, which is provisioned and available to campus researchers immediately.

On May 18, numerous leaders from on- and off-campus will gather to celebrate, introduce and officially launch the campus’s new high-performance computing infrastructure, dubbed “Alpine.”

Alpine replaces “RMACC Summit,” the previous infrastructure, which has been in use since 2017. Comparable to systems now in use at top peer institutions across the country, Alpine will improve upon RMACC Summit by providing cutting-edge hardware that enhances traditional High Performance Computing workloads, enables Artificial Intelligence/Machine Learning workloads, and provides user-friendly access through tools such as Open OnDemand.

“Alpine is a modular system designed to meet the growing and rapidly evolving needs of our researchers,” said Assistant Vice Chancellor and Director of Research Computing Shelley Knuth. “Alpine addresses our users’ requests for faster compute and more robust options for machine learning.”

Notable among the technical specifications that will make Alpine an invaluable tool in research computing for researchers, industry partners and others, Alpine boasts: 3rd generation AMD EPYC CPUs, which provide enhanced energy efficiency per cycle compared to the Intel Xeon E5-2680 CPUs on RMACC Summit; Nvidia A100 GPUs; AMD MI100 GPUs; HDR InfiniBand; and 25 Gb Ethernet.

The kick-off event on May 18 will celebrate the Alpine infrastructure being fully operational and allow the community to enjoy a 20-minute tour, including snacks, an introduction to Research Computing, and a tour of the supercomputer container. The opportunity is open to the public and free of charge, and CU Boulder Research Computing staff will be on site to answer questions. CU Boulder Chief Information Officer Marin Stanek, Chief Operating Officer Patrick O’Rourke, and Acting Vice Chancellor for Research and Innovation Massimo Ruzzene will offer remarks at 1:30 p.m.

In addition to the main launch event, Research Computing is offering a full slate of training and informational events the week of May 16–20.

Researchers seeking to use Research Computing resources, which includes not only the Alpine supercomputer, but also large scale data storage, cloud computing and secure research computing, are invited to visit the Research Computing website to learn about more training offerings, the community discussion forum, office hours and general contact information.

Alpine is funded by the Financial Futures strategic initiative.

This is the biggest project I have ever worked on. It was in the works months before I arrived but has consumed most of my professional time since September. It’s exciting that we can finally welcome our researchers to use it.

What’s next

Some personal news… 🙂

I have been at Earlham College for almost seven years, including my time as a student and as CS faculty. Today is my last day there.

It’s been an incredible place to grow as a person, deepen my skills, collaborate with talented people from all walks of life, and try to make the world a little bit better. I’ve seen a few generations of the community cycle through and watched us withstand everything up to and including a literal pandemic. I capped it with the trip of a lifetime, spending a month doing research in Iceland – on a project I hope to continue working on in the future.

To the Earlham Computer Science community in particular I owe a big thanks. I have had a supportive environment in which to learn and grow for virtually the entirety of those years. The value they’ve added to my life can’t be quantified. I am deeply grateful.

What’s next?

I am elated to announce that in mid-September I will go to work as a Research Computing HPC Cluster Administrator at the University of Colorado Boulder! I’m excited to take the skills I’ve built at Earlham and apply them at the scale of CU Boulder. Thanks to the many people who’ve helped make this opportunity possible.

Highlights of an amazing trip

Today is the last day most of us are in Iceland for this trip. As I started this post, we were completing a tour of the Golden Circle after a few days in beautiful Reykjavik. Now we are preparing for departure.

Our view of the volcano

I wanted to post some of the highlights of our trip. There’s a rough order to them, but don’t take the numbering too seriously – it’s been a great experience all-around. Without further ado:

  1. The volcano is truly incredible. It was not uncommon for people to spontaneously shout “Wow!” and “Oh my god!” as the lava burst up from the ground.
  2. We woke up every day for a few weeks with a view of a fjord.
  3. We did a glacier hike on Sólheimajökull, with two awesome guides.
  4. This was a historically successful round of data collection, both on the drone side and on the biology side. We’ll write and share a lot more about this in the next few months.
  5. We shared space with a group of phenomenal students from the University of Glasgow. We also collaborated with them on multiple occasions, learning a lot about different ways to study wildlife and local sites.
  6. THE FOOD – you probably don’t associate Iceland with food culture (I certainly didn’t), but our meals were delicious.
  7. The architecture and decorations are so distinctly Icelandic.
  8. Amazing photography and video – in high quality and high quantity.
  9. Walking along the boundary between the North American and European plates.
  10. Guided tour from our Skalanes hosts – who incidentally are awesome people – of a stretch of eastern Iceland.
Getting the rundown about glaciers at Solo

Some of my personal honorable mentions include:

  • Trail running at Skalanes is breathtaking.
  • Blue glacier ice is real neat.
  • The National Museum of Iceland is fascinating and well-done.
  • Rainbow roads in both Seyðisfjörður and Reykjavik highlight what a welcoming place this country is – also perfect reminders of Pride Month in the U.S.!
  • My first-in-my-lifetime tour of a beautiful country happened alongside people I admire who teach me things every single day. What more could I ask for?
A drone photo of the coast by the fjord

If you haven’t already, check out this interview with Charlie and Emmett, conducted by Cincinnati Public Radio.

Davit and Tamara flying

In addition to our success this year, we’ve also set up some great new opportunities for future years. With our long-time friend and collaborator Rannveig Þórhallsdóttir, we’ve added the cemetery in Seyðisfjörður to our list of sites to survey. We believe there may be historically significant artifacts to be found there, and our drone work lends itself well to finding out.

The fjord at Skalanes

Finally, here’s the trip by the numbers:

  • 7 Earlhamites
  • 26 days
  • 183 GB of initial drone images and initial assemblies
  • 2 great hosts at Skalanes
  • 6 outstanding co-dwellers
  • 4 guides at 2 sites
  • 1 perfect dog
  • N angry terns
  • 1 amazing experience
Admiring the view

And that’s a wrap. Hope to see you again soon, Iceland!

Cross-posted at the Earlham Field Science blog.