Give yourself the gift of quality control

If you spend any time at all in the tech chatter space, you have probably heard a lot of discontent about the quality of software these days.

I can’t do anything about the cultural, economic, and social environment that cultivates these issues. (So maybe I shouldn’t say anything at all? 🙂 )

I can say that, if you’re in a position to do something about it, you should treat yourself to quality control.

The case I’d like to briefly highlight is about our infrastructure rather than a software package, but I think this principle can be generalized.

Case study: bringing order to a data center

After a series of (related) service outages in the spring of 2020, shortly before the onset of the COVID-19 crisis, we cut back on some expansionary ambitions to get our house in order.

Here’s a sample, not even a comprehensive list, of the things we’ve fixed in the last couple of months:

  • updated every OS we run such that most of our systems will need only incremental upgrades for the next few years
  • transitioned to the Slurm scheduler for all of our clusters and compute nodes, which has already made it easier to track and troubleshoot batch jobs
  • modernized hardware across the board, including upgraded storage and network cards
  • retired unreliable nodes
  • implemented comprehensive monitoring and alerts
  • replaced our old LDAP server and map with a new one that will better suit our authentication needs across many current and future services
  • fixed the configuration of our JupyterHub instances for efficiency

Notice: None of those are “let’s add a new server” or “let’s support 17 new software packages”. It’s all about improving the things we already supported.

There are a lot of institutional reasons our systems needed this work, primarily the shortage of staffing that affects a lot of small colleges. But from a pragmatic perspective, to me and to the student admins, these reasons don’t matter. What matters is that we were in a position to fix them.

By consciously choosing to do so, we think we’ve reduced future overhead and downtime risk substantially. Quantitatively, we’ve gone from a few dozen open issue tickets to 19 as of this writing, and six of those are advancing rapidly toward resolution.

How we did it and what’s next

I don’t have a dramatic reveal here. We just made the simple (if not always easy) decision to confront our issues and make quality a priority.

Time is an exhaustible, non-renewable resource. We decided to spend ours on making existing systems work much, much better rather than on adding new features. This kind of focus can feel boring, because of how strictly it blocks out distractions, but the results speak for themselves.

After all that work, now we can pivot to the shiny new thing: installing, supporting, and using new software. We’ve been revving up support for virtual machines and containers for a long time. HPC continues to advance and discover new applications. The freedom to explore these domains will open up a lot of room for student and faculty research over time. It may also help as we prepare to move into our first full semester under COVID-19, which is likely to have (at minimum) a substantial remote component.

Some thoughts on moving from Torque to Slurm

This is more about the process than the feature set.

Torque moved out of open-source space a couple of years ago. This summer we are finally making the full shift to Slurm. I’m not going to trash the old thing here. Instead I want to celebrate the new thing and reflect on the process of installing it.

  1. I haven’t researched the lineage of Slurm as a project, but the UI seems engineered to make this shift easier. There are tables all over the Internet (including on our wiki!) of the Torque<->Slurm translations; a few of the ones we used most are sketched after this list.
  2. Slurm’s accounting features were the trickiest part of this all to configure, but taking the time was worth it. Even at the testing stage, the sacct command’s output is super-informative.
  3. SchedMD’s documentation is among the best of any large piece of software I’ve worked with. If you’re doing this and you feel like you’re missing something, double-check their documents before flogging Stack Overflow etc.
  4. You can in fact do a single-server install as well as a cluster install. We did both, the latter in conjunction with Ansible. Neither is actually much more difficult than the other. That’s because the same three pieces of software (the controller slurmctld, the database daemon slurmdbd, and the worker daemon slurmd) have to run no matter the topology. It’s just that the worker runs on every compute node while the controller and database run only on the head node.
  5. We’ve been successful in using an A -> AB -> B approach to this transition. Right now both schedulers run side by side on each of these systems. That will remain the case for a few weeks, until we confirm we’ve done Slurm right.
  6. Schedulers have the most complicated build process of any piece of software I’ve worked with – except gcc, the building of which sometimes makes one want to walk into the ocean.
  7. Dependencies and related programs (e.g. your choice of email tool) are as much a complexity as the scheduler itself.
  8. From a branding perspective, Slurm managed to pull off an impressive feat. Its name is clear and distinctive in the software space, but a fun Easter egg if you have a certain geek pop culture interest/awareness.
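
As promised, here’s a sketch of the Torque -> Slurm translations we reached for most often. The sbatch flags below are illustrative of a small test job, not a recommendation for any particular site:

# Torque -> Slurm equivalents:
#   qsub job.sh        ->  sbatch job.sh
#   qstat              ->  squeue
#   qdel <jobid>       ->  scancel <jobid>
#   #PBS -l nodes=2    ->  #SBATCH --nodes=2
sbatch --nodes=1 --ntasks=4 --time=01:00:00 job.sh   # submit a small test job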

This has been successful so far. We’ve soft-launched Slurm installs on our scientific computing servers. We should be all-Slurm by the time classes and researchers return.

How to enable a custom systemd service unit without disabling SELinux

We do a lot of upgrades in the summer. This year we’re migrating from the Torque scheduler (which is no longer open-source) to the Slurm scheduler (which is). It’s a good learning experience in addition to being a systems improvement.

First I installed it on a cluster, successfully. That took a while: it turns out schedulers have complicated installation processes with specific dependency chains. To save time in the future, I decided to attempt to automate the installation.

This has gone better than you might initially guess.

I threw my command history into a script, spun up a VM, and began iterating. After a bit of work, I’ve made the installation script work consistently with almost no direct user input.

Then I tried running it on another machine and ran headfirst into SELinux.

The problem

The installation itself went fine, but the OS displayed this message every time I tried to enable the Slurm control daemon:

[root@host system]# systemctl enable slurmctld.service
Failed to enable unit: Unit file slurmctld.service does not exist.

I double- and triple-checked that my file was in a directory that systemctl expected. After that I checked /var/log/messages and saw a bunch of errors like this …

type=AVC msg=audit(1589561958.124:5788): avc:  denied  { read } for  pid=1 comm="systemd" name="slurmctld.service" dev="dm-0" ino=34756852 scontext=system_u:system_r:init_t:s0 tcontext=unconfined_u:object_r:admin_home_t:s0 tclass=file permissive=0

… and this:

type=USER_END msg=audit(1589562370.317:5841): pid=3893 uid=0 auid=1000 ses=29 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:session_close grantors=pam_keyinit,pam_limits,pam_systemd,pam_unix acct="slurm" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success'UID="root" AUID="[omitted]"

Then I ran ls -Z on the service file’s directory to check its SELinux context:

-rw-r--r--. 1 root root unconfined_u:object_r:admin_home_t:s0                  367 May 15 13:01 slurmctld.service
[...]
-rw-r--r--. 1 root root system_u:object_r:systemd_unit_file_t:s0               337 May 11  2019 smartd.service

Notice that the smartd file has a different context (system_u...) than does the slurmctld file (unconfined_u...admin_home_t). The admin_home_t type is a hint that the unit file was probably written under /root and then moved into place: mv preserves a file’s SELinux context, while cp would have picked up the target directory’s. My inference was that the slurmctld file’s context was a (not-trusted) default, and that the solution was to make its context consistent with the context of the working systemd unit files.

The solution

Here’s how to give the service file a new context in SELinux:

chcon system_u:object_r:systemd_unit_file_t:s0 slurmctld.service 

To see the appropriate security context, check ls -Z. Trust that more than my command, because your context may not match mine.
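
One caveat: chcon changes can be undone by a filesystem relabel. For a label that persists, the semanage/restorecon route is more durable. A minimal sketch, assuming the unit file lives at /etc/systemd/system/slurmctld.service (adjust the path to match your system):

# Record a persistent context rule, then apply it to the file
semanage fcontext -a -t systemd_unit_file_t /etc/systemd/system/slurmctld.service
restorecon -v /etc/systemd/system/slurmctld.service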

Concluding remarks

I am early-career and have done very little work with SELinux, so this is not a specialty of mine right now. As such, this may or may not be the best solution. But, mindful of some security advice, I think it is preferable to disabling SELinux altogether.

Between the files and the disks of your server

I recently took a painful and convoluted path to understanding management of disks in Linux. I wanted to post that here for my own reference, and maybe you will find it useful as well. Note that these commands should generally not be directly copy-pasted, and should be used advisedly after careful planning.

Let’s take a plunge into the ocean, shall we?

Filesystem

You’ve definitely seen this. This is the surface level, the very highest layer of abstraction. If you’re not a sysadmin, there’s a good chance this is the only layer you care about (on your phone, it’s likely you don’t even care about this one!).

The filesystem is where files are kept and managed. There are tools to mount either the underlying device (e.g. /dev/mapper/myvg-mylv, or equivalently /dev/myvg/mylv) or the filesystem itself to mount points – for example, over NFS. You can also use the filesystem on a logical volume (see below) as the disk for a virtual machine.

This is where ext2, ext3, ext4, xfs, and more come in. This is not a post about filesystems (I don’t know enough about filesystems to credibly write that post) but they each have features and associated utilities. Most of our systems are ext4 but we have some older systems with ext2 and some systems with xfs.

Commands (vary by filesystem)

  • mount and umount; see /etc/fstab and /etc/exports
  • df -h can show you whether your mounted filesystems are running out of space
  • fsck (e2fsck, fsck.ext4, and similar filesystem-specific checkers do the real work; the fsck command is a front-end that dispatches to them)
  • resize2fs /dev/lv/home 512G # resize a filesystem to be 512G, might accompany lvresize below
  • xfsdump/xfsrestore for XFS filesystems
  • mkfs /dev/lvmdata/device # make a filesystem on a device
  • fdisk -l isn’t technically a filesystem tool, but it operates at a high level of abstraction and you should be aware of it

LVM

Filesystems are made on top of underlying volumes in LVM, or “logical volume manager” – Linux’s partitioning system. (Actually manipulating LVM rather than passively using simple defaults is technically optional, but it’s widely used.)

LVM has three layers of abstraction within itself, each with its own set of associated utilities. This closely mirrors the abstraction patterns we see in the rest of the stack.

LVM logical volumes

A volume group can then be organized into logical volumes. The commands here are incredibly powerful and give you the ability to manage disk space with ease (we’re grading “easy” on a curve).

If you want to resize a filesystem, there’s a good chance you’ll also need to resize the logical volume underneath it; you can even do both in one step, as sketched after the command list below.

Commands:

  • lvdisplay
  • lvscan
  • lvcreate -L 20G -n mylv myvg # create a 20GB logical volume called mylv in group myvg
  • lvresize -L 520G /dev/lv/home # make the logical volume at /dev/lv/home 520GB in size
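
Here’s the one-step variant mentioned above, as a minimal sketch, assuming an ext4 filesystem and an illustrative device path:

# Grow the logical volume by 20GB and resize the filesystem on it in one step;
# -r/--resizefs invokes the appropriate filesystem resizer (resize2fs for ext4)
lvresize -r -L +20G /dev/myvg/home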

LVM volume groups

A logical volume is created from devices/space within a volume group. It’s a collection of one or more LVM “physical” volumes (see below).

Commands:

  • vgscan
  • vgdisplay
  • pvmove /dev/mydevice # to get stuff off of a PV and move it to available free space elsewhere in the VG

LVM physical volumes

At the lowest LVM layer there are “physical” volumes. These might actually correspond to physical disks or partitions (if you have no hardware RAID), or they might be other /dev objects in the OS (/dev/md127, a software RAID device, would be a physical volume in this model).

These are the LVM analog to disk partitions.

Commands:

  • pvscan
  • pvdisplay
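
To tie the three LVM layers together, here’s a hedged end-to-end sketch; every device and volume name is illustrative:

# disk partition -> PV -> VG -> LV -> filesystem, bottom to top
pvcreate /dev/sdb1                 # mark the partition as an LVM physical volume
vgcreate myvg /dev/sdb1            # create a volume group containing it
lvcreate -L 20G -n mylv myvg       # carve out a 20GB logical volume
mkfs.ext4 /dev/myvg/mylv           # put a filesystem on top
mount /dev/myvg/mylv /mnt/mydata   # and mount it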

Software RAID (optional)

RAID combines multiple disks into a single logical device for redundancy, performance, or both. There are both “hardware” and “software” implementations of RAID, and software RAID sits at a higher level of abstraction. It’s convenient for a (super-)user to manage. Our machines (like many) use mdadm, but there are other tools.

Commands:

  • mdadm --detail --scan
  • mdadm -D /dev/mdXYZ # details
  • mdadm -Q /dev/mdXYZ # short, human-readable
  • cat /proc/mdstat
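
For context, a sketch of how an array like /dev/md127 comes to exist in the first place. The device names are placeholders, and this operation is destructive to the disks named:

# Create a two-disk mirror (RAID1); the array then appears as a /dev object
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
cat /proc/mdstat   # watch the initial sync progress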

Devices in the OS

“In UNIX, everything is a file.” In Linux that’s mostly true as well.

The /dev directory contains the files that correspond to each particular device detected by the OS. I found these useful mostly for reference, because everything refers to them in some way.

If you look closely, things like /dev/mapper/devicename are often symlinks (pointers) to other devices.

All the other layers provide you better abstractions and more powerful tools for working with devices. For that reason, you probably won’t do much with these directly.

(The astute will observe that /dev is a directory so we’ve leapt up the layers of abstraction here. True! However, it’s the best lens you as a user have on the things the OS detects in the lower layers.)

Also: dmesg. Use dmesg. It will help you.

Hardware RAID (optional)

If you use software RAID for convenience, you use hardware RAID for performance and information-hiding.

Hardware RAID presents the underlying drives to the OS at boot time by way of a RAID controller on the motherboard. At boot, you can access a tiny bit of software (with a GUI that’s probably older than me) to create and modify hardware RAID volumes. In other words, the RAID volume(s), not the physical drives, appear to you as a user.

At least some, and I presume most, RAID controllers have software that you can install on the operating system that will let you get a look at the physical disks that compose the logical volumes.

Relevant software at this level:

  • MegaCLI # we have a MegaRAID controller on the server in question
  • smartctl --scan
  • smartctl -a -d megaraid,15 /dev/bus/6 # substitute the identifying numbers from the scan command above
  • not much else – managing hardware RAID carefully requires a reboot; for this reason we tend to keep ours simple

Physical storage

We have reached the seafloor, where you have some drives: SSDs, spinning disks, etc. Those drives are the very lowest level of abstraction: they are literal, physical machines. Because of this, we don’t tend to work with them directly except at installation and removal.

Summary and context

From highest to lowest layers of abstraction:

  1. filesystem
  2. LVM [lv > vg > pv]
  3. software RAID
  4. devices in the OS
  5. hardware RAID
  6. disks

The origin story of this blog post (also a wiki page, if you’re an Earlham CS sysadmin student!): necessity’s the mother of invention.

I supervise a sysadmin team. It’s a team of students who work part-time, so in practice I’m a player-coach.

In February, we experienced a disk failure that triggered protracted downtime on an important server. It was a topic I was unfamiliar with, so I did a lot of on-the-job training and research. I read probably dozens of blog posts about filesystems, but none used language that made sense to me in a coherent, unified, and specific way. I hope I’ve done so here, so that others can learn from my mistakes!

I didn’t know there was a full First Letters Capitalized list

While reading during the early days of the pandemic, I discovered that someone (L. Peter Deutsch and colleagues at Sun, as it turns out) had made a list of Eight Fallacies of Distributed Computing:

  • The network is reliable.
  • Latency is zero.
  • Bandwidth is infinite.
  • The network is secure.
  • Topology doesn’t change.
  • There is one administrator.
  • Transport cost is zero.
  • The network is homogeneous.

I’ve seen all those assumptions fail personally. That’s with only a few years of experience running small distributed systems.

The originating article/talk is here.

Christmas trees and trip cost vs item cost

When building software for large datasets or HPC workflows, we talk a lot about the trip cost versus the item cost.

The item cost is the expense (almost always measured in time) to run an operation on a single unit of data – one member of a set, for example. The trip cost is the total expense of running a series of operations on some subset (possibly the whole set) of the data. The trip cost incorporates overhead, so it’s not just N times the item cost.

This is a key reason that computers, algorithms, and data structures that support high-performance computing are so important: by analyzing as many items in one trip as is feasible, you can often minimize time wasted on unnecessary setup and teardown.

Thus trip cost versus item cost is an invaluable simplifying distinction. It can clarify how to make many systems perform better.

Yes, Virginia, there is a trip cost


Christmas trees provide a good and familiar example.

Let’s stipulate that you celebrate Christmas and that you have a tree. You’ve put up lights. Now you want to hang the ornaments.

The item cost for each of the ornaments is very small: unbox and hang the ornament. It takes a couple of seconds, max – not a lot, for humans. It also parallelizes extremely well, so everyone in the family gets to hang one or more ornaments.

The trip cost is at least an order of magnitude (minutes rather than seconds) more expensive, so you only want to do it once:

  • Find the ornament box
  • Bring the box into the same room as the tree
  • Open the box
  • Unbox and hang N ornaments
  • Close the box
  • Put the box back

Those overhead steps don’t parallelize well, either: we see no performance improvement and possibly a performance decline if two or more people try to move the box in and out of the room instead of just one.

It’s plain to see that you want to hang as many ornaments as possible before putting away the ornament box. This matches our intuition (“let’s decorate the tree” is treated as a discrete task typically completed all in one go), which is nice.
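
The same logic shows up in code. Here’s a hedged shell sketch, with hypothetical process_one/process_all commands standing in for real tools:

# N trips: one connection per item, so setup/teardown dominates
# (assumes data/ is also visible on the worker, e.g. over NFS)
for f in data/*.csv; do
    ssh worker "process_one '$f'"
done

# One trip: a single connection amortized across all N items
tar cf - data/*.csv | ssh worker "tar xf - && process_all data/"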

Whether Christmas is your holiday or not, I wish you the best as the year draws to a close.

Computing lessons from DNA analysis experiments

I’ve been working with my colleagues in Earlham’s Icelandic Field Science program on a workflow for DNA analysis, about which I hope to have other content to share later. (I’ve previously shared my work with them on the Field Day Android app.)

My focus has been heavily experimental and computational: run one workflow using one dataset, check the result, adjust a few “dials”, and run it again. When we’re successful, we can often automate the work through a series of scripts.

At the same time, we’ve been trying to get our new “phat node” working to handle jobs like this faster in the future.

Definitions vary by location, context, etc. but we define a “phat node” or “fat node” as a server with a very high ratio of (storage + RAM)/(CPU). In other words, we want to load a lot of data into RAM and plow through it on however many cores we have. A lot of the bioinformatics work we do lends itself to such a workflow.

All this work should ultimately redound to the research and educational benefit of the college.

It’s also been invaluable for me as a learning experience in software engineering and systems architecture. Here are a few of the deep patterns that experience illustrated most clearly to me:

  • Hardware is good: If you have more RAM and processing power, you can run a job in less time! Who knew?
  • Work locally: Locality is an important principle of computer science – basically, keep your data as close to your processing power as you can given system constraints. In this case, we got a 36% performance improvement just by moving data from NFS mounts to local storage.
  • Abstractions can get you far: To wit, define a variable once and reuse it. We have several related scripts that refer to the same files, for example, and for a while we had to update each script with every iteration to keep them consistent. We took a few hours to build and test a config file, which resolved a lot of silly errors like that (there’s a sketch of the pattern after this list). This doesn’t reduce the time for any one job, but it vastly simplifies scaling and replicability.
  • Work just takes a while: The actual time Torque (our choice of scheduler) spends running our job is a small percentage of the overall time we spend shaping the problem:
    • buying and provisioning machines
    • learning the science
    • figuring out what questions to ask
    • consulting with colleagues
    • designing the workflow
    • developing the data dictionary
    • fiddling with configs
    • testing – over, and over, and over again
    • if running a job at a bigger supercomputing facility, you may also have to consider things like waiting for CPU cycles to become available; we are generally our systems’ only users, so this wasn’t a constraint for us
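
The config-file pattern mentioned above, as a minimal sketch; every name and path here is a hypothetical stand-in for our actual setup:

# config.sh: the single source of truth, sourced by every script in the workflow
DATA_DIR=/scratch/dna-run
REF_GENOME="$DATA_DIR/reference.fasta"
THREADS=32

# In each workflow script:
#   source ./config.sh
#   some_tool --threads "$THREADS" --reference "$REF_GENOME" ...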

A lot of this is (for computer scientists, software engineers, etc.) common sense, but taking care to apply that common sense can be critical for doing big interesting work.

The punchline of it all? We managed to reduce the time – walltime, for fellow HPC geeks – required to run this example workflow from a little over 8 hours to 3.5 hours. Just as importantly we developed a bunch of new knowledge in the process. (I’ve said almost nothing here about microbiology, for example, and learning a snippet of that has been critical to this work.) That lays a strong foundation for the next several steps in this project.

If you read all this, here’s a nice picture of some trees as a token of my thanks:

[Image: trees starting to show fall color. Relevance: a tree is a confirmed DNA-based organism.]

A -> AB -> B

I was reading a recent Rachel By The Bay post in my RSS reader and this struck me:

Some items from my “reliability list”

It should not be surprising that patterns start to emerge after you’ve dealt with enough failures in a given domain. I’ve had an informal list bouncing around inside my head for years. Now and then, something new to me will pop up, and that’ll mesh up with some other recollections, and sometimes that yields another entry.

Item: Rollbacks need to be possible

This one sounds simple until you realize someone’s violated it. It means, in short: if you’re on version 20, and then start pushing version 21, and for some reason can’t go back to version 20, you’ve failed. You took some shortcut, or forgot about going from A to AB to B, or did break-before-make, or any other number of things.

That paragraph struck me because I’m about one week removed from making that very mistake.

Until last week, we’d been running a ten-year-old version of the pfSense firewall software on a ten-year-old server (32-bit architecture CPU! in a server!). I made a firewall upgrade one of our top summer priorities.

The problem was that I got in a hurry. We tried to upgrade without taking careful enough notes about how to reset to our previous configuration. We combined that with years’ worth of lost knowledge about the interoperability of the Computer Science Department’s subnets with the Earlham ITS network. That produced a couple of days of downtime and added stress.

We talked with ITS. We did research. I sat in a server room till late at night. Ultimately we reverted to the old firewall, allowing our mail and other queues to be processed while we figured out what went wrong in the new system.

The day after that we started our second attempt. We set up and configured the new one alongside the old, checking and double-checking every network setting. Then we simply swapped network cables. It was almost laughably anticlimactic.

In short, attempting to move directly from A to B generated hours of downtime, but when we went from A to AB, and then from AB to B, it was mere seconds.

We learned a lot from the experience:

  1. The A->AB->B pattern
  2. ECCS and ITS now understand our network connections much more deeply than we did three weeks ago.
  3. Said network knowledge is distributed across students, staff, and faculty.
  4. We were vindicated in our wisest decision: trying this in July, when only a handful of people had a day-to-day dependence on our network and we had time to recover.

A more big-picture lesson is this: We in tech often want to get something done real fast, and it’s all too easy to conflate that with getting it done in a hurry. If you’re working on something like this, take some time to plan a little bit in advance. Make sure to allow yourself an A->AB->B path. A little work upfront can save you a lot later.

Or, as one mentor of mine has put it in the context of software development:

Days of debugging can save you from hours of design!

Fixing mail as a troubleshooting case study

We recently upgraded our firewall, and after much ado we’re in good shape again with regard to network traffic and basic security. The most recent bit of cleanup was that our mail stack wasn’t working off-campus. This post is the text of the message I sent to the students in the sysadmin group after fixing it today. I’ve anonymized it as best I can but otherwise left it unaltered.

tl;dr the firewall rule allowing DNS lookups on the CS subnet allowed only TCP requests, not TCP/UDP. Now it allows both.

Admins, here’s how I deduced this problem:

  • Using a VPN, I connected to an off-campus network. (VPNs as a privacy instrument are overrated, but they’re a handy tool for a sysadmin for other reasons.)
  • I verified what $concernedParty observed, that mail was down when I was on that network and thus apparently not on-campus.
  • I checked whether other services were also unavailable. While pinging cs dot earlham dot edu worked, nothing else seemed to (Jupyter was down, website down, etc.)
  • I tried pinging and ssh-ing tools via IP address instead of FQDN. That worked. That made me think of DNS.
  • I checked the firewall rules, carefully. I observed that our other subnet, the cluster subnet, had a DNS pass rule that was set to allow both TCP and UDP traffic, so I tried ssh’ing to cluster (by FQDN, not IP address) and found that it worked.
  • I noticed that, strangely, the firewall rule allowing DNS lookups on the CS subnet via our DNS server allowed only TCP connections, not TCP/UDP. (I say “strange” not because it didn’t use both protocols but because, of the two, it accepted TCP instead of DNS’s more common protocol of choice, UDP.)
  • I updated the appropriate firewall rule to allow both TCP and UDP. (A quick way to verify both transports is sketched after this list.)
  • It seemed to work so I sent a followup message to $concernedParty. And now here we are.
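
For future troubleshooters, here’s a hedged sketch of checking both DNS transports directly; the hostnames are placeholders:

# Query the DNS server over UDP (dig's default), then force TCP
dig @ns.example.edu cs.example.edu
dig +tcp @ns.example.edu cs.example.edu
# If UDP times out but TCP answers, suspect a firewall rule like ours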

This approach – searching for patterns to understand the scope of the problem, followed by narrowing down to a few specific options, and making small changes to minimize external consequences – has often served me well in both my sysadmin work and my work developing software.

Some reflections on guiding a student sysadmin team

How does a team of students administer a powerful data center for education and research at a small undergraduate liberal arts college?

Success at my job is largely dependent on how well I can answer that question and implement the answer.

Earlham CS, under the banner of the Applied Groups, has a single team of students running our servers:

The Systems Admin Group’s key functions include the maintenance of the physical machines used by the Earlham Computer Science Department. They handle both the hardware and software side of Earlham’s Computer Science systems. The students in the sysadmin group configure and manage the machines, computational clusters, and networks that are used in classes, for our research, and for use by other science departments at Earlham. 

The students in that group are supervised by me, with the invaluable cooperation of a tenured professor.

The students are talented, and they have a range of experience levels spanning from beginner to proficient. Every semester there’s some turnover because of time, interest, graduations, and more.

And students are wonderful and unpredictable. Some join with a specific passion in mind: “I want to learn about cybersecurity.” “I want to administer the bioinformatics software for the interdisciplinary CS-bio class and Icelandic field research.” Others have a vague sense that they’re interested in computing – maybe system administration but maybe not – but no specific focus yet. (In my experience the latter group is consistently larger.)

In addition to varieties of experience and interest, consider our relatively small labor force. To grossly oversimplify:

  • Say I put 20 hours of my week into sysadmin work, including meetings, projects, questions, and troubleshooting.
  • Assume a student works 8 hours per week, the minimum for a regular work-study position. We have a budget for 7 students. (I would certainly characterize us as two-pizza-compliant.)
  • There are other faculty who do some sysadmin work with us, but it’s not their only focus. Assume they put in 10 hours.
  • Ignore differences in scheduling during winter and summer breaks. Also ignore emergencies, which are rare but can consume more time.

That’s a total of 86 weekly person-hours (20 + 7×8 + 10) to manage all our data, computation, networking, and sometimes power. That number alone limits what we can accomplish in a given week.

Because of all those factors, we have to make tradeoffs all the time:

  • interests versus needs
  • big valuable projects versus system stability/sustainability
  • work getting done versus documenting the work so future admins can learn it
  • innovation versus fundamentals
  • continuous service versus momentary unplanned disruption because someone actually had time this week to look at the problem and they made an error the first time

I’ve found ways to turn some of those tradeoffs into “both/and”, but that’s not always possible. When I have to make a decision, I tend to err on the side of education and letting the students learn, rather than getting it done immediately. The minor headache of today is a fair price to pay for student experience and a deepened knowledge base in the institution.

In some respects, this is less pressure than a traditional company or startup. The point is education, so failure is expected and almost always manageable. We’re not worried about reporting back to our shareholders with good quarterly earnings numbers. When something goes wrong, we have several layers of backups to prevent real disaster.

On the other hand, I am constantly turning the dials on my management and technical work to maximize for something – it’s just that instead of profit, that something is the educational mission of the college. Some of that comes from teaching the admins directly, some from continuing our support for interesting applications like genome analysis, data visualization, web hosting, and image aggregation. If students aren’t learning, I’m doing something wrong.

In the big picture, what impresses me about the group I work with is that we have managed to install, configure, troubleshoot, upgrade, retire, protect, and maintain a somewhat complex computational environment with relatively few unplanned interruptions – and we’ve done it for quite a few years now. This is a system with certain obvious limitations, and I’m constantly learning to do my job better, but in aggregate I would consider it an ongoing success. And at a personal level, it’s deeply rewarding.