iBeacon Bug: Hunting for a needle in a haystack


Envoy has already signed in more than 17 million visitors around the world. To help frequent visitors, we built Envoy Passport with BLE technology. When you approach the iPad, the Passport app on your iPhone detects its BLE signal and lets the iPad know you’re here.

This feature may look trivial, but it’s not! A bug here took us several months to track down. Today we’re sharing the story of how we hunted it, which felt like looking for a needle in a haystack.

It just works! Until it doesn’t 😅

After we launched the Passport app, some users reported that it occasionally didn’t work. The hardest bugs to find aren’t the ones that happen all the time, but the ones that happen only sporadically.

On top of that, several other factors made this bug even harder to find:

  • It involved real devices interacting with each other, so it couldn’t be reproduced in the simulator.
  • It was hard to tell whether the iPad app or the iPhone app was causing the problem.
  • Because iBeacon is handled by iOS itself, we didn’t know whether the bug was in iOS or in our app.
  • The Passport app itself is very complex.

With so many moving parts, this wasn’t an easy bug to fix. Here’s how we hunted it down, step by step.

Round 1. Simplify

At the very beginning, we had no clue whether it was Apple’s bug or ours, and finding out was our first priority: if it was Apple’s bug, all we could do was report it and wait for a fix.

Passport is not a simple app: it talks to an API server, caches the data it gets back, updates the UI, and responds to user interaction. With that many factors involved, it’s worth remembering what Sherlock Holmes’s creator wrote:

When you have eliminated the impossible, whatever remains, however improbable, must be the truth

— Arthur Conan Doyle

To get to the truth, we needed to eliminate as much of the impossible as we could up front. If it really was an iOS bug, we should be able to reproduce the glitch with nothing but basic BLE functionality; everything else was just noise. So the strategy we adopted was to simplify.

We built two very simple apps, one for iPad and one for iPhone, that do nothing but broadcast and monitor BLE signals. The iPad app has an “I’m here!” button.

We set the iPad up at the front desk and asked our co-workers to help. They installed the simple iPhone app, and every day when they came to work, they tapped the button and selected their name to sign in. With apps that did nothing but broadcast and listen for BLE signals in place, the next step toward the truth was to collect data.

Round 2. Logs, logs, and more logs

As a kid, I loved watching a TV show called Air Crash Investigation.

It’s a show about investigating airplane crashes, and there are many invaluable engineering lessons to learn from those tragedies. One of the most important lessons I learned from it: write logs.

When an airplane crashes and the black box is lost, it’s very hard to find out what happened. Software runs and crashes too, which is why we should keep logging what it’s doing.

To understand what was going on in the BLE stack, we started logging the data relevant to our investigation, such as:

  • Start broadcasting signals
  • Entering an iBeacon region
  • Exiting an iBeacon region
  • …and so on

We could then read through that data to see why BLE wasn’t working as expected.
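To make the idea concrete, here is a minimal Objective-C sketch of such a log on the monitoring (iPhone) side. It assumes region monitoring through Core Location’s “CLLocationManagerDelegate” callbacks; the class name “BLEEventLogger”, the event names, and the one-line text format are hypothetical, not what Passport actually logged. Broadcasting events on the iPad could be recorded the same way from Core Bluetooth’s “peripheralManagerDidStartAdvertising:error:” callback.

```objc
#import <Foundation/Foundation.h>
#import <CoreLocation/CoreLocation.h>

// Formats one log line as "<ISO 8601 timestamp> <event> <region>".
// The format and the event names are hypothetical.
static NSString *BLELogLine(NSDate *date, NSString *event, NSString *regionID) {
    // NSISO8601DateFormatter formats in GMT by default.
    NSISO8601DateFormatter *formatter = [[NSISO8601DateFormatter alloc] init];
    return [NSString stringWithFormat:@"%@ %@ %@",
            [formatter stringFromDate:date], event, regionID ?: @"-"];
}

// Region-monitoring delegate that keeps entry/exit events in memory
// and echoes each one to the console with NSLog.
@interface BLEEventLogger : NSObject <CLLocationManagerDelegate>
@property (nonatomic, strong, readonly) NSMutableArray<NSString *> *lines;
- (void)record:(NSString *)event region:(NSString *)regionID;
@end

@implementation BLEEventLogger

- (instancetype)init {
    if ((self = [super init])) {
        _lines = [NSMutableArray array];
    }
    return self;
}

- (void)record:(NSString *)event region:(NSString *)regionID {
    NSString *line = BLELogLine([NSDate date], event, regionID);
    [_lines addObject:line];
    NSLog(@"%@", line);
}

- (void)locationManager:(CLLocationManager *)manager didEnterRegion:(CLRegion *)region {
    [self record:@"enter_region" region:region.identifier];
}

- (void)locationManager:(CLLocationManager *)manager didExitRegion:(CLRegion *)region {
    [self record:@"exit_region" region:region.identifier];
}

@end
```

Every line carries a timestamp, which is what later makes it possible to line the BLE events up against other evidence about the same moment.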

Round 3. Centralize the log

To make the investigation easier at a larger scale, we built a simple API server that writes BLE activity events into a database.

Now, we are in the big data business! With all the events available in a database, we can query and find exactly the pattern we are looking for.
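A minimal sketch of the client side of such a pipeline, assuming events are batched as dictionaries and posted as JSON. The endpoint URL, the {"events": [...]} payload shape, and both function names are hypothetical, since the post doesn’t describe Envoy’s actual server API.

```objc
#import <Foundation/Foundation.h>

// Builds a JSON POST request for a batch of event dictionaries.
// Returns nil if the events can't be encoded as JSON.
static NSURLRequest *BLEEventsUploadRequest(NSURL *endpoint,
                                            NSArray<NSDictionary *> *events) {
    NSDictionary *payload = @{@"events": events};
    // dataWithJSONObject: throws on invalid input, so validate first.
    if (![NSJSONSerialization isValidJSONObject:payload]) {
        return nil;
    }
    NSData *body = [NSJSONSerialization dataWithJSONObject:payload options:0 error:NULL];
    if (body == nil) {
        return nil;
    }
    NSMutableURLRequest *request = [NSMutableURLRequest requestWithURL:endpoint];
    request.HTTPMethod = @"POST";
    [request setValue:@"application/json" forHTTPHeaderField:@"Content-Type"];
    request.HTTPBody = body;
    return request;
}

// Sends one batch asynchronously. In this sketch failures are only logged.
static void BLEUploadEvents(NSURL *endpoint, NSArray<NSDictionary *> *events) {
    NSURLRequest *request = BLEEventsUploadRequest(endpoint, events);
    if (request == nil) {
        NSLog(@"Skipping upload: events are not valid JSON");
        return;
    }
    NSURLSessionDataTask *task =
        [[NSURLSession sharedSession] dataTaskWithRequest:request
                                        completionHandler:^(NSData *data, NSURLResponse *response, NSError *error) {
            if (error != nil) {
                NSLog(@"BLE event upload failed: %@", error);
            }
        }];
    [task resume];
}
```

Keeping request construction separate from sending makes the payload easy to check without a network connection; a failed upload here is logged, not retried.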

Round 4. Suspect found, but not the murderer

With the BLE data, we quickly identified and confirmed a bug in the iOS BLE stack and published a report on it: iBeacon region monitoring sometimes stops after the device reboots. Unfortunately, this didn’t look like the main cause of the unreliable BLE behavior, since people don’t usually reboot their iPhones. We reported the bug to Apple, they fixed it, and we kept looking.

To learn why it sometimes worked and sometimes didn’t, we needed more details, such as:

  • When does the user enter the building?
  • When do they take out their iPhone?
  • When do they tap the sign-in button?
  • How do they carry their iPhone?
  • Do they keep it in a metal case that blocks the radio signal?

To find out, we reviewed the office security camera footage.

After a long stretch of data collection and analysis, we came to a conclusion:

A few iPhones seemed to be malfunctioning and tended not to respond to iBeacon signals, but almost all the others always worked as expected. It would be easy to conclude that a few faulty devices were to blame, but the problem is that some of our co-workers had reported that Passport wasn’t working for them, even though their devices always worked with the simple experimental app.

Hmmm, okay…. 🤔

Sorry, no final conclusion yet. But at least we could stop blaming Apple: there had to be a bug in our code, or at least something that differed from the simple BLE app was causing the malfunction.

Round 5. Sysdiagnose is here to help!

We contacted Apple for help. Apple asked us to install the Bluetooth logging profile and run sysdiagnose. We followed the instructions and found the sysdiagnose tool super helpful for investigating the root cause, as it logs detailed Bluetooth activity at the OS level.

To run sysdiagnose, visit this page, find the area you want to debug (Bluetooth for iOS, in our case), then download the profile and install it on your iOS device. Normally iOS doesn’t log this much detail, but the profile turns it on. Next, use your device as normal, and when you run into the problem, press and hold

Volume Up + Volume Down + Power

and then release. You’ll feel a short vibration, which means iOS has started collecting sysdiagnose logs. Wait a few minutes, then connect your iOS device to your Mac and sync it with iTunes; you should then find the sysdiagnose log file at

~/Library/Logs/CrashReporter/MobileDevice/[Your_Device_Name]/DiagnosticLogs/sysdiagnose

Extract the archive you find there, then open the “system_logs.logarchive” file in the Console app, and you should see the log entries start loading.

Once all the entries are loaded, scroll to the time the issue occurred and look for anything unusual.
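One technique the post doesn’t cover, offered as a hedged suggestion: if the app writes its own events through Apple’s unified logging (“os_log”), they land in the same “system_logs.logarchive” as the Bluetooth entries, so Console can show both on one timeline. In this sketch the subsystem “com.example.passport”, the category “beacon”, and the helper names are placeholders.

```objc
#import <Foundation/Foundation.h>
#import <os/log.h>

// A single shared log handle. The subsystem and category are placeholders;
// they become searchable fields in Console.
static os_log_t BeaconLog(void) {
    static os_log_t beaconLog;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        beaconLog = os_log_create("com.example.passport", "beacon");
    });
    return beaconLog;
}

// os_log writes at the default level, which the system persists (unlike
// debug-level messages), so these lines can show up in a sysdiagnose.
// Dynamic strings are redacted as <private> by default unless marked {public}.
static void LogRegionEvent(NSString *event, NSString *regionID) {
    os_log(BeaconLog(), "%{public}@ region=%{public}@", event, regionID);
}
```

In Console, these entries can be found by searching for the subsystem, and their timestamps compared with the Bluetooth entries around the same moment.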

Caught in the act!

With the help of sysdiagnose, we finally found something promising. It turned out iOS was doing a great job of detecting the iBeacon region.

After entering the iBeacon region, iOS should launch the Passport app and notify it about this event. However, instead of our app being notified, we found a dead body:

Yep, that’s our app, and it crashed! Looking into the crash report, we found the exception code 0x8badf00d (“ate bad food”). A quick search told us this means the app took too long to launch and stopped responding, so it was killed by the iOS watchdog.

With that in mind, we looked into how the app launches and realized that loading your previous sign-ins from RealmDB takes a long time when you have many of them. That’s a pretty ironic finding: the more you use the app, the more likely it is to fail. Fortunately, now we knew, and we had caught this serial killer in the act!

Solution

Because the app was being killed for taking too long to launch in response to an iBeacon event, we decided to make that launch as lightweight as possible. We changed the code slightly to check for “UIApplicationLaunchOptionsLocationKey” in “launchOptions”:

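    - (BOOL)application:(UIApplication *)application
        didFinishLaunchingWithOptions:(NSDictionary *)launchOptions
    {
        if (launchOptions[UIApplicationLaunchOptionsLocationKey]) {
            // Launched in the background for a location (iBeacon) event:
            // do not load the UI or any other heavy-duty data here.
            // Kick-start iBeacon here, then return as quickly as possible.
            return YES;
        }
        // ... regular launch: build the UI, load previous sign-ins, etc.
        return YES;
    }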

So when the app is launched for a location event, it runs only the code needed to set up iBeacon. Since we fixed the problem, Passport has worked like a charm!
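The post doesn’t show what the “Kick start iBeacon here” step contains. Here is a minimal sketch, assuming the regions were already registered with “startMonitoringForRegion:” on an earlier launch; “BeaconLaunchHandler” and “startBeaconMonitoring” are hypothetical names. Apple’s Core Location guidance is that when the app is relaunched for a region event, it should create a location manager and assign its delegate right away so the pending event can be delivered.

```objc
#import <Foundation/Foundation.h>
#import <CoreLocation/CoreLocation.h>

// Hypothetical helper: everything needed to receive the pending
// region event on a location launch, and nothing else.
@interface BeaconLaunchHandler : NSObject <CLLocationManagerDelegate>
@property (nonatomic, strong) CLLocationManager *locationManager;
- (void)startBeaconMonitoring;
@end

@implementation BeaconLaunchHandler

- (void)startBeaconMonitoring {
    // Create the manager and set its delegate immediately; on a relaunch
    // for a region event, iOS delivers that event to this delegate.
    self.locationManager = [[CLLocationManager alloc] init];
    self.locationManager.delegate = self;
    // Monitored regions persist across launches, so they are not
    // re-registered on this lightweight path.
}

- (void)locationManager:(CLLocationManager *)manager didEnterRegion:(CLRegion *)region {
    // Keep this lightweight too: notify the iPad (not shown) and defer
    // anything heavy, such as loading previous sign-ins.
    NSLog(@"Entered region %@", region.identifier);
}

@end
```

The app delegate would keep a strong reference to the handler and call “startBeaconMonitoring” from the location-launch branch before returning YES, since “CLLocationManager” does not retain its delegate.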

Takeaways

That was a long story, and there are still some details we didn’t cover. To keep it short, here are the lessons we learned:

  • If your app takes too long to launch and becomes unresponsive, it may get killed by the watchdog, and of course it won’t see the iBeacon event.
  • If your app launches quickly but doesn’t set up iBeacon in time, it may also miss the iBeacon event.
  • If the monitored region is defined only by its UUID, moving from one beacon into another with the same UUID won’t trigger a new notification, even if the major or minor number differs, because both beacons belong to the same region. iOS only notifies your app when it enters or exits a region.
  • An app running from TestFlight may take longer to launch, as iOS seems to need to check with a server that the test build is still valid.
  • Avoid heavy-duty loading when the app wakes up for a location event.
  • Set up iBeacon as soon as possible when the app is launched for a location event.
  • Logging software activity is good practice; it helps a lot when debugging.
  • Simplify the code as much as possible when trying to reproduce a very hard-to-reproduce bug.
  • Luke, use sysdiagnose!
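The region takeaway depends on how the monitored region is defined. This sketch contrasts a UUID-only “CLBeaconRegion” with one pinned to a specific major/minor pair; the UUID, the identifiers, and the helper names are placeholders. It uses the pre-iOS 13 initializers, which newer SDKs deprecate in favor of “initWithUUID:identifier:” and “initWithUUID:major:minor:identifier:”.

```objc
#import <Foundation/Foundation.h>
#import <CoreLocation/CoreLocation.h>

// Matches every beacon with this UUID, whatever its major/minor values,
// so moving from one such beacon to another produces no new entry event.
static CLBeaconRegion *AnyBeaconRegion(NSUUID *uuid) {
    return [[CLBeaconRegion alloc] initWithProximityUUID:uuid
                                              identifier:@"any-front-desk"];
}

// Matches only the beacon with this exact UUID/major/minor combination.
static CLBeaconRegion *SpecificBeaconRegion(NSUUID *uuid,
                                            CLBeaconMajorValue major,
                                            CLBeaconMinorValue minor) {
    NSString *identifier = [NSString stringWithFormat:@"front-desk-%u-%u",
                            (unsigned)major, (unsigned)minor];
    return [[CLBeaconRegion alloc] initWithProximityUUID:uuid
                                                   major:major
                                                   minor:minor
                                              identifier:identifier];
}
```

With the first kind of region, iOS reports one entry for the whole group of beacons; if the app then needs to know which beacon is nearby, Core Location’s ranging API reports the major and minor values of the beacons it sees.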

Finally, we hope you enjoyed our bug-hunting story. If, like us, you get excited about finding the root cause of bizarre bugs, strive to build great products, and pursue it-just-works quality, Envoy is hiring!


iBeacon Bug: Hunting for a needle in a haystack was originally published in Envoy Engineering on Medium, where people are continuing the conversation by highlighting and responding to this story.

Source: Envoy