Simple battery testing for common use cases of an Android application

While developing the Android app we use to collect field data, I recently ran a series of tests. We wanted to determine how worried we should be about the Android “Location” tab’s judgment that Field Day is a “High battery usage” app, and we wanted to do so without installing and running any additional apps.

Protocol

The simple (if slow) protocol we developed to meet those requirements goes something like this. It’s best when run on multiple, comparable devices at the same time. (If you want to automate the record-keeping, see the sketch after the list.)

  1. Fully charge the device.
  2. Restart the device.
  3. Close all applications other than the app you’re testing.
  4. Disable sleep, screen turn-off, and any other battery-saving features that apply to your phone but do not fit the use case of Field Day as we actually run it.
  5. Do any setup tasks – in our case, linking to a remote database and an Arduino sensor platform to collect data for us.
  6. Run it for hours, not minutes. In our case this was convenient because our app and platform can run without much user intervention and still produce valid data.
  7. Close the app.
  8. Immediately go to the phone settings and find battery info using the built-in Android battery analyzing tools.
    1. Check “Battery” (possibly a submenu) as well as “Location”.
    2. Write down the numbers someplace as data.
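
If you have a computer handy, the record-keeping can be scripted instead of hand-copied. Here’s a minimal sketch using adb’s `dumpsys battery` (a real command, though its output fields vary a little by Android version); the polling interval and log path are arbitrary choices. Note that it requires the device to stay tethered to the computer, which may itself not fit your use case.

```python
import re
import subprocess
import time

def battery_level() -> int:
    """Read the current battery percentage from a USB-attached device via adb."""
    out = subprocess.run(
        ["adb", "shell", "dumpsys", "battery"],
        capture_output=True, text=True, check=True,
    ).stdout
    # dumpsys battery prints a line like "  level: 64"
    return int(re.search(r"level:\s*(\d+)", out).group(1))

# Append a timestamped reading every five minutes for the length of the test.
with open("battery_log.csv", "a") as log:
    while True:
        log.write(f"{int(time.time())},{battery_level()}\n")
        log.flush()
        time.sleep(300)
```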

I did my day of tests using my cheap second-hand Samsung phone and a Nexus tablet. I connected a Bluetooth sensor platform and collected data from it once every five seconds for five hours (this is of course automated in the app). I kept the screens turned on most of the time.

Findings

You’ll observe below that different Android versions (including the extra software a manufacturer installs) cause apps and services to report battery usage somewhat differently. My phone, for example, doesn’t list Field Day itself, but does list screen and Bluetooth usage, which are (nearly) 100% attributable to Field Day because of the constraints we imposed on device use in the early steps of the protocol. (The Nexus did list Field Day, conveniently.)

Craig’s phone

  • 64% remaining after 5 hours
  • Bluetooth (97%)
    • Time on 5h 7m 35s
    • CPU total 19 sec
    • Stay awake 1 sec
    • Computed power usage 38232 mAh
  • Screen (2%)
    • Time on 4h 4m 20s
    • Computed power usage 1059 mAh
    • (adaptive display, mid-to-low brightness)

Nexus

  • 40% remaining after 5 hours
  • Screen (22%)
    • Time on 5h 5m 0s
    • Computed power usage 753 mAh
    • (adaptive display, max brightness)
  • Field Day (9%)
    • CPU Total 6m 19s
    • CPU Foreground 6m 9s
    • Keep awake 3m 25s
    • GPS 5h 2m 23s
    • WiFi packets received 19
    • WiFi packets sent 30
    • Computed power use 317 mAh
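
Since step 1 of the protocol starts each device at a full charge, those headline percentages convert to average drain rates with simple arithmetic:

```python
# Average drain rates over the five-hour run, from the figures above.
for device, remaining_pct in [("Craig's phone", 64), ("Nexus", 40)]:
    drained = 100 - remaining_pct
    print(f"{device}: {drained}% drained, about {drained / 5:.1f}% per hour")
# Craig's phone: 36% drained, about 7.2% per hour
# Nexus: 60% drained, about 12.0% per hour
```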

In both cases, the “Location” menu continues to report Field Day as “high battery usage”, as the issue that prompted these tests describes. In practice, battery usage appears to be about what we might expect: bright screens and Bluetooth make your battery work harder (“I’m shocked, shocked…”).

This test wasn’t the only one we’ve run, but it was the most systematic. It also tested a relatively power-greedy case – normally, we do not keep the screen on at all when we collect a data stream.

Next steps

I’m going to do another round of tests tomorrow, with the screens off, to check the more common usage case. However, based on current observations, the app’s battery usage is in line with what I would expect.

There are pieces of this process we should refine – for example, we should have controlled for screen brightness rather than letting it be such a dominant player in the first place. But after some discussion, we are confident enough to conclude that our battery usage conforms to what should be expected from an app that heavily uses Bluetooth, GPS, cellular, and/or WiFi. We expect and hope that tomorrow’s screen-off test will confirm this.

A -> AB -> B

I was reading a recent Rachel By The Bay post in my RSS reader and this struck me:

Some items from my “reliability list”

It should not be surprising that patterns start to emerge after you’ve dealt with enough failures in a given domain. I’ve had an informal list bouncing around inside my head for years. Now and then, something new to me will pop up, and that’ll mesh up with some other recollections, and sometimes that yields another entry.

Item: Rollbacks need to be possible

This one sounds simple until you realize someone’s violated it. It means, in short: if you’re on version 20, and then start pushing version 21, and for some reason can’t go back to version 20, you’ve failed. You took some shortcut, or forgot about going from A to AB to B, or did break-before-make, or any other number of things.

That paragraph struck me because I’m about one week removed from making that very mistake.

Until last week, we’d been running a ten-year-old version of the pfSense firewall software on a ten-year-old server (32-bit architecture CPU! in a server!). I made a firewall upgrade one of our top summer priorities.

The problem was that I got in a hurry. We tried to upgrade without taking careful enough notes about how to revert to our previous configuration. We combined that with years’ worth of lost knowledge about the interoperability of the Computer Science Department’s subnets with the Earlham ITS network. The result was a couple of days of downtime and added stress.

We talked with ITS. We did research. I sat in a server room till late at night. Ultimately we reverted to the old firewall, allowing our mail and other queues to be processed while we figured out what went wrong in the new system.

The day after that we started our second attempt. We set up and configured the new one alongside the old, checking and double-checking every network setting. Then we simply swapped network cables. It was almost laughably anticlimactic.

In short, attempting to move directly from A to B generated hours of downtime, but when we went from A to AB, and then from AB to B, it was mere seconds.
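
The pattern is worth writing down in its general form. Here’s a minimal sketch, with every step a hypothetical stand-in for whatever your system’s equivalent is – in our case, “deploy B alongside A” meant configuring the new firewall next to the old one, and “cut over” meant swapping network cables.

```python
def migrate(deploy_b_alongside_a, b_is_healthy, cut_over_to_b, roll_back_to_a, retire_a):
    """A -> AB -> B: make-before-break, with a rollback path at every step."""
    deploy_b_alongside_a()       # A -> AB: B comes up next to A; A still carries traffic
    if not b_is_healthy():       # verify B while it doesn't matter yet
        return "stayed on A"     # nothing to undo: A was never touched
    cut_over_to_b()              # AB -> B: the one risky moment, kept as small as possible
    if not b_is_healthy():
        roll_back_to_a()         # A is still intact, so going back is cheap
        return "rolled back to A"
    retire_a()                   # only after B has proven itself under real traffic
    return "running on B"
```

The break-before-make version deletes that first guarantee: once A is gone, every failure in B is a production outage instead of a non-event.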

We learned a lot from the experience:

  1. The A->AB->B pattern works.
  2. ECCS and ITS now understand our network connections much more deeply than we did three weeks ago.
  3. Said network knowledge is distributed across students, staff, and faculty.
  4. We were vindicated in our wisest decision: trying this in July, when only a handful of people had a day-to-day dependence on our network and we had time to recover.

A more big-picture lesson is this: We in tech often want to get something done real fast, and it’s all too easy to conflate that with getting it done in a hurry. If you’re working on something like this, take some time to plan a little bit in advance. Make sure to allow yourself an A->AB->B path. A little work upfront can save you a lot later.

Or, as one mentor of mine has put it in the context of software development:

Days of debugging can save you from hours of design!

GitLab

I have a GitHub account, but most of my coding activity is on Earlham CS’s GitLab instance. That includes most of the code I write for use in production, as well as much of the commentary and communication about it.

Today, for example, I submitted a merge request for some work on the Field Day application, one of my favorite projects and one of ECCS’s most successful multi-year collaborative creations.

Software development like this is only one part of my job, and as you’ll see by the squares on my profile I don’t get to do it every single day. But it’s quite rewarding when I do get to dedicate some time to it.

Moon code

I’m one institution removed from the person who – with her team – developed the software that took us to the moon fifty years ago.

NASA photo

Margaret Hamilton is an Earlham alum (class of 1958) who wrote much of the code for the Apollo missions, coined the term “software engineering”, and a lot more. If you’re uninitiated, check out her Wikipedia page.

I’d note that she’s had an incredible career since the photo was taken, so I mean to use it only for its connection to the events we’re celebrating today, not as an encapsulation of an entire career or life.

If you’re interested, the Apollo 11 guidance code is here. You’ll see Hamilton in the “Attribution” section.

Happy Moon Landing Day, friends.

Fixing mail as a troubleshooting case study

We recently upgraded our firewall, and after much ado we’re in good shape again with regard to network traffic and basic security. The most recent bit of cleanup was that our mail stack wasn’t working off-campus. This post is the text of the message I sent to the students in the sysadmin group after fixing it today. I’ve anonymized it as best I can but otherwise left it unaltered.

tl;dr the firewall rule allowing DNS lookups on the CS subnet allowed only TCP requests, not TCP/UDP. Now it allows both.

Admins, here’s how I deduced this problem (a sketch reproducing the key check follows the list):

  • Using a VPN, I connected to an off-campus network. (VPNs as privacy instruments are overrated, but they’re a handy sysadmin tool for other reasons.)
  • I verified what $concernedParty observed, that mail was down when I was on that network and thus apparently not on-campus.
  • I checked whether other services were also unavailable. While pinging cs dot earlham dot edu worked, nothing else seemed to (Jupyter was down, website was down, etc.).
  • I tried pinging and ssh-ing tools via IP address instead of FQDN. That worked. That made me think of DNS.
  • I checked the firewall rules, carefully. I observed that our other subnet, the cluster subnet, had a DNS pass rule that was set to allow both TCP and UDP traffic, so I tried ssh’ing to cluster (by FQDN, not IP address) and found that it worked.
  • I noticed that, strangely, the firewall rule allowing DNS lookups on the CS subnet via our DNS server allowed only TCP connections, not TCP/UDP. (I say “strange” not because it didn’t use both protocols but because, of the two, it accepted TCP instead of DNS’s more common protocol of choice, UDP.)
  • I updated the appropriate firewall rule to allow both TCP and UDP.
  • It seemed to work, so I sent a follow-up message to $concernedParty. And now here we are.
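
If you want to reproduce the key check yourself, the (third-party) dnspython library can send the same query over each transport explicitly. The server address below is a placeholder, not our real one:

```python
# pip install dnspython
import dns.message
import dns.query

RESOLVER = "192.0.2.53"   # placeholder for your DNS server's address
NAME = "cs.earlham.edu"

query = dns.message.make_query(NAME, "A")

for label, transport in [("UDP", dns.query.udp), ("TCP", dns.query.tcp)]:
    try:
        response = transport(query, RESOLVER, timeout=3)
        print(f"{label}: ok ({len(response.answer)} answer section entries)")
    except Exception as exc:  # a timeout here is what a blocked transport looks like
        print(f"{label}: failed ({exc})")
```

Against the old rule set, the UDP attempt would hang and time out while TCP sailed through – exactly the asymmetry described above.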

This approach – searching for patterns to understand the scope of the problem, narrowing down to a few specific options, and making small changes to minimize external consequences – has often served me well in both my sysadmin work and my work developing software.

Learn the rules. And their exceptions.

As I learn more about how to develop software and manage systems, I’m struck that there are many things we learn at some point – only to discover a pile of exceptions later.

A good example: one of the first things you learn as a coder, whether self-taught or in an intro course in college, is to make every function small, readable, and easily testable; to have it do as little as possible; and to have it return a result predictably.

So now we know how to code. We just write a lot of small functions and duct-tape them together – that is, call them in a sensible order.

Of course, life’s not that simple.

Henry Neeman in Supercomputing in Plain English notes that in high-performance computing (HPC) applications, you want a function to do as much as possible so you can stuff it into a loop, which many compilers can optimize nicely:

How many of you love loops? Hate loops? Do not have an emotional reaction to loops? Okay I want you to love loops.

[In many programming contexts,] we write little routines on little pieces of data because it’s easier to debug and – after all – bugs are bad, right? We don’t want a lot of bugs.

The problem with that approach is then you don’t have enough for the compiler to chew on and the hardware to chew on, and so you get terrible performance.

So the way you want to write code to get good performance is literally the exact opposite of what we were taught when we were wee lads and lasses in our first programming course. You want to write big routines on big chunks of data.

It’s still probably best, in many cases, to develop small bits of code that are easily tested and iterated. (Most applications are not high-performance computing applications.) But it’s not always the right idea.
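
To make the contrast concrete, here’s a toy sketch in Python rather than an HPC language – NumPy is assumed to be installed, and the absolute timings will vary by machine – but the shape of the tradeoff is visible even here:

```python
import time
import numpy as np  # assumed available; any array library makes the same point

def add_one(x):
    """A tiny routine on a tiny piece of data, as the intro courses teach."""
    return x + 1

n = 10_000_000

data = list(range(n))
start = time.perf_counter()
small = [add_one(x) for x in data]      # one function call per element
print("little routines:", time.perf_counter() - start, "seconds")

arr = np.arange(n)
start = time.perf_counter()
big = arr + 1                           # one routine over the whole chunk of data
print("big chunk:      ", time.perf_counter() - start, "seconds")
```

The second version wins by an order of magnitude or more, for roughly the reason Neeman gives: the loop runs in optimized compiled code with plenty to chew on, instead of bouncing through the interpreter once per element.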

The takeaway for me is that domain knowledge is as important for building software as good intuition about coding best practices, precisely because “best practices” are downstream of the type of problem you’re trying to solve.

A summer in upgrade mode

Much like facilities and maintenance, CS/IT work is often best done when services are most lightly used. At a college, that’s in the summer. For that reason we spent May and June performing maintenance and upgrades.

Our biggest achievement is the near-complete rebuild of two of our computing clusters. They’re modest, and both are a few years old, but we gave one a complete OS upgrade and reconfigured the other with a new head node that will let us better use the systems we already have.

We’re running CentOS 7 on all nodes of the newly-upgraded cluster (up from CentOS 5!), and we’re adopting Ansible to configure it. It’s similar to the c3 tools we’ve previously run on our three clusters, but it’s vastly more powerful. We’re all learning its vocabulary and some new syntactic sugar, but it’s already paying returns in time and labor.

In addition to those upgrades:

  • We’ve racked and de-racked some servers, a task largely handled by the students, who should get the experience.
  • Several bits of software that were long overdue for updates finally got attention.
  • We’re ready to upgrade to a new server to host our firewall, also long overdue.

Other than sysadmin work, I lent support to the Icelandic field studies group while they were on-site for a few weeks. Development on Field Day tends to slow once the crew returns, but I found it quite enjoyable and fulfilling to build the app, so I hope to have the chance to continue developing it (alongside some other thickets of code I’ve wandered into).

Finally, to my great relief, an annual scheduled power outage didn’t induce downtime, let alone a repeat of last year’s hard crash. That’s thanks to some fixes we made to hardware and software in the wake of that incident, one I hope no one in this department ever repeats.

It’s been a successful first half of the summer. I’ve supervised all of it, but we wouldn’t be half as far along without the hard work of the summer admin students. I continue to be optimistic that we’re setting ourselves up for some interesting things in the next academic year.

Phase 2 for the Admins

I entered my current position last June, after it had gone unfilled for a year or so. That gap gave me a lot to do immediately as faculty supervisor of the sysadmins: upgrades, problem-solving, server migrations, deprecation of old hardware, and more, all on top of maintaining existing services. The admins spent a lot of time on those issues, to good effect.

Starting my second year in the role, I’m basically satisfied that we’ve finished that work. Maintenance and tracking will keep us up to speed for a while.

We’re now exploring options for how to make better use of the great resources we have. I see opportunities for growth in three areas:

  • Virtualization, containers, and other cloud-style features
  • High-performance computing (HPC) and specifically scientific computing, a longstanding strength of ours
  • Security

We’re working out the specifics as a group over the next few months, but I’m pretty excited about what we can accomplish at this point.

T minus 2 hours 24 minutes

The Iceland crew departs at 1500 today.

They’ve been sent away with a lot of field gear (which I don’t know much about), along with a laptop stuffed with a virtual machine (which I know quite a lot about) running a loose collection of services they depend on when gathering scientific data in the field without Internet access – like when they’re out on a big damn glacier.

In theory, it’s a great use of virtualization. In practice, I’m confident about it, but we’ll soon have a lot more clarity. At the least, we’ll gather a lot of new data about the pluses and minuses of the approach.

Ground control for the Iceland expedition

I’m not going to Iceland with the Earlham field science people this week –

For context, every year the Earlham CS Department plays a major role in the Icelandic field studies program, in which a group of students goes out and gathers data on/near a glacier in Iceland for about three weeks. It’s multidisciplinary research that equips students to work on big problems in areas spanning climate change, genome sequencing, and more. Over the years they’ve developed the data model, protocols, plans, schedules, and educational process to make it as effective – and inclusive of newcomers – as possible.

– but I will be around Earlham for the three weeks they’re gone, and a lot of what I do right now revolves around supporting their efforts.

My contributions fall into two major categories. One is helping set up and configure a computational environment that can support their needs when they’re on a glacier with minimal access to the global Internet. That includes a de facto centralized source code lab, support for OpenDroneMap, and fully-functional databases, all managed in a virtual server.

My other area of focus is – to me – the more intellectually and creatively fulfilling of the two: I’ve become a contributor to the software stack they use to gather their datasets. There are several components, and I (along with dozens of other people) have worked on all of them (a sketch of the database piece follows the list):

  • Arduino platforms with sensors to collect the samples – samples of elevation, or air quality, or soil…
  • a PostgreSQL database on the server side to store the data according to the data model we’ve developed
  • an Android app to provide the interface between the platforms, database, and user
  • a web interface to display data points on a map for QA purposes
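
To give a flavor of how those pieces meet, here’s a minimal sketch of the database side of one reading’s journey, using the third-party psycopg2 driver. The connection details, table, and columns are hypothetical stand-ins – the real data model is richer than this:

```python
# pip install psycopg2-binary
import psycopg2

# Illustrative connection settings, not our production ones.
conn = psycopg2.connect(host="localhost", dbname="fieldday", user="fieldday")

def store_reading(sensor_id: str, taken_at: str, value: float) -> None:
    """Insert one sensor reading, as the Android app does after polling a platform."""
    with conn, conn.cursor() as cur:   # the connection context manager commits on success
        cur.execute(
            "INSERT INTO readings (sensor_id, taken_at, value) VALUES (%s, %s, %s)",
            (sensor_id, taken_at, value),
        )

store_reading("air_quality_01", "2019-07-20T12:00:00Z", 42.0)
```

The web interface then reads the same table back out to put each point on a map for QA.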

Working on these earned me the informal title, bestowed by a colleague, of “Full-Stack Field Science Developer”.

To me, this is one of the gems of Earlham College. I’m going to share more about it here as we continue working on it.