Why do you even need the IMEI?
The International Mobile Equipment Identity, or IMEI, is a number that uniquely identifies each mobile phone. It is used whenever the phone registers on a mobile network. It can also be used to blacklist a phone from the network, which happens, for example, when it is reported as stolen. Blacklisting keeps phones off the network even if the SIM card or telephone number changes. For this to be effective, the IMEI needs to be immutable, which it generally is: the IMEI is implemented in hardware. In fact, in the UK, the Mobile Telephones (Re-programming) Act makes it illegal to change the IMEI or even to possess equipment to do it! That means there’s a legal process to change your name, but not your phone’s IMEI. As a side effect, the IMEI can be used as a persistent identifier for tracking purposes.
Android’s IMEI Protections
Android recognizes the importance of protecting the IMEI. To access it, apps must be granted the READ_PHONE_STATE permission (see figure above). The problem is that this permission is vague and covers a wide variety of other uses: READ_PHONE_STATE gives access not only to the phone numbers, cellular network connections, and of course the IMEI, but also to things like the phone number of whoever is calling you or whom you are calling. Surprisingly, a large proportion of apps request it (though they don’t necessarily transmit the IMEI): as of this year, 22,708 of the 92,326 unique apps we’ve tested request it. That’s nearly a quarter of all the apps we’ve tested, which is staggering.
Just because an app has the READ_PHONE_STATE permission, however, doesn’t mean it actually needs to read the IMEI or send it to third parties. (Indeed, it’s hard to think of any reason to send the IMEI to advertisers that isn’t a little bit creepy.) In fact, Google’s policies explicitly prohibit using the IMEI for advertising purposes. So are apps doing this in practice? Of the 92,326 apps we had tested at the time of this measurement, about 5.7% (5,306 apps) were transmitting the IMEI. However, when accounting for app popularity, there is roughly a 1 in 6 chance of installing an app that exfiltrates the IMEI. The three most popular destinations for a phone’s IMEI are Unity (1,360 apps), Tapjoy (713 apps), and AppsFlyer (651 apps). Tapjoy and AppsFlyer are explicitly advertising companies (and Unity has an advertising division), which makes their behavior hard to reconcile with Google’s policies.
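Why would a 5.7% per-app rate become a roughly 1 in 6 chance per install? When each app is weighted by how often it is installed, a few popular leaking apps count for much more than many obscure clean ones. A minimal sketch, assuming popularity is measured by install counts; the app names and numbers below are made up purely to illustrate the effect and are not our data:

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str          # hypothetical app name
    installs: int      # ASSUMPTION: popularity measured as install count
    sends_imei: bool   # whether the app was observed transmitting the IMEI


def per_app_rate(apps: list[App]) -> float:
    """Fraction of apps that send the IMEI, each app counted once."""
    return sum(a.sends_imei for a in apps) / len(apps)


def install_weighted_rate(apps: list[App]) -> float:
    """Fraction of all installs that belong to IMEI-sending apps."""
    total = sum(a.installs for a in apps)
    leaking = sum(a.installs for a in apps if a.sends_imei)
    return leaking / total


# Made-up numbers: one popular app that leaks, nineteen less popular ones that don't.
apps = [App("popular_game", 2_000_000, True)] + [
    App(f"niche_app_{i}", 500_000, False) for i in range(19)
]
print(f"per-app rate: {per_app_rate(apps):.1%}")                    # 5.0%
print(f"install-weighted rate: {install_weighted_rate(apps):.1%}")  # 17.4%
```

With these invented numbers, 1 app in 20 leaks, yet that one app accounts for about 17% of all installs.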
We found a total of 142,323 packets sending the IMEI across all of our testing. Of those, 91,438 sent it unencrypted over port 80 (plain HTTP)! More than half of the IMEI-sending apps (2,996) send the IMEI to at least one domain unencrypted, and the destinations receiving it this way include Unity, AppsFlyer, and Tapjoy. The developers are not taking simple, reasonable precautions to safeguard transmissions of extremely persistent identifiers that they shouldn’t even be collecting in the first place. The popular destinations that do not require encryption are api.greedygame.com, alog.umeng.com, androidha.vascogames.com, and sdk.stats-locations.com.
(Our automated tool didn’t catch that last one at first, because they used the clever trick of reversing the base64-encoded data before sending it. This domain appears to be associated with MobKnow, whose SDK has this domain hardcoded, and which bills itself as a way of monetizing apps “without ads,” presumably by harvesting and reselling sensitive user data. Curiously, 82 of the 118 apps that contact it were created by Tiny Lab Productions, a developer of kids’ games that we have previously written about, although this behavior has stopped in their apps’ most recent versions, since we wrote that article and they found themselves subject to litigation.)
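The reversed-base64 trick is simple to reproduce, and it shows why a detector has to try more than one decoding. A minimal sketch, assuming the payload is base64-encoded and the resulting string is then reversed; the payload format and function names are hypothetical, not MobKnow’s actual code, and the IMEI is a commonly cited example value:

```python
import base64
import binascii


def obfuscate(payload: bytes) -> bytes:
    """Hypothetical reconstruction of the trick: base64-encode, then reverse."""
    return base64.b64encode(payload)[::-1]


def decodings(blob: bytes, try_reversed: bool) -> list[bytes]:
    """Raw bytes plus whatever base64 decodings succeed."""
    orientations = [blob, blob[::-1]] if try_reversed else [blob]
    results = [blob]
    for candidate in orientations:
        try:
            results.append(base64.b64decode(candidate, validate=True))
        except binascii.Error:
            pass  # not valid base64 in this orientation
    return results


def contains_imei(blob: bytes, imei: bytes, try_reversed: bool) -> bool:
    """Search the raw bytes and each successful decoding for the IMEI."""
    return any(imei in d for d in decodings(blob, try_reversed))


imei = b"490154203237518"          # commonly cited example IMEI, not a test device
sent = obfuscate(b"imei=" + imei)  # hypothetical payload format

print(imei in sent)                                   # False: not visible in plaintext
print(contains_imei(sent, imei, try_reversed=False))  # False: forward base64 fails
print(contains_imei(sent, imei, try_reversed=True))   # True: reverse, then decode
```

Here the reversed string starts with the base64 padding character, so a strict forward decode is rejected outright; a detector that only decodes forward sees nothing.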
The Vasco Games apps, however, weren’t sending the IMEI directly. Instead, they sent the MD5 hash of the IMEI. A cryptographic hash function takes any message (like an IMEI) and produces a random-looking output, called a hash or digest, that is for all practical purposes unique to that message. A particular IMEI will always generate the same hash. The cryptographic part is that the function can only be computed in one direction: you can’t take the random-looking output and work out the message that hashes to it (i.e., you can’t determine the IMEI from its MD5 hash). The hash is effectively unique for each value, but hides what that value is. In other words, a hashed persistent identifier is still a persistent identifier.
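The “same input, same output” property is easy to see by hashing an IMEI twice. A minimal sketch using Python’s standard `hashlib`, assuming the IMEI is hashed as its 15-character ASCII string (how a given SDK formats the IMEI before hashing is an assumption here); 490154203237518 is a commonly cited example IMEI, not one of our devices:

```python
import hashlib


def digests(imei: str) -> dict[str, str]:
    """Hex digests of an IMEI under MD5, SHA1 and SHA256."""
    data = imei.encode("ascii")  # ASSUMPTION: hashed as its 15-character ASCII string
    return {name: hashlib.new(name, data).hexdigest() for name in ("md5", "sha1", "sha256")}


imei = "490154203237518"  # commonly cited example IMEI, not one of our devices

# Same input, same output: the hash is as stable as the IMEI itself.
print(digests(imei) == digests(imei))  # True

# A one-digit change produces completely unrelated digests.
print(digests("490154203237519")["md5"] == digests(imei)["md5"])  # False
```

Anyone who sees the same digest in two places knows it came from the same phone, which is exactly what makes it usable for tracking.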
MD5 is one such hash function: 2,512 apps sent the MD5 hash of the IMEI, compared to 3,278 apps that sent the IMEI outright (or with trivial alterations, such as Baidu reversing the IMEI string). Our previous numbers included apps sending hashes of the IMEI, for reasons that will become clear. Two other widely used hash functions are SHA1 and SHA256, but it turns out they aren’t as popular: 217 apps sent the SHA1 hash of the IMEI, whereas 36 sent the SHA256 hash. The following is an incredible example of a transmission made by an app (a Mattel children’s game that has since been removed from the Google Play Store) that sent all of them to api.geo.kontagent.net (Kontagent is now part of Upsight, a marketing company). We’ve added some line breaks to make it a bit clearer, but basically, it sent the following string:
So what’s happening here? Well, the first number is one of our phone’s IMEIs. The next values are its MD5 hash twice, its SHA1 hash twice, and its SHA256 hash twice. This is one risk-averse exfiltrator! (Or they’re just too lazy to compute the hashes on the server.) But still, why does each hash appear twice?
Well, the next line is another hardware identifier, followed by its MD5 hash, another number, its SHA1 hash, another number, its SHA256 hash, and another number. It turns out that these other numbers are the hashes of the identifier in upper case! So one is the hash of b07adb5ec8d1818c and the other is the hash of B07ADB5EC8D1818C! Take the hexadecimal string, convert it to uppercase, then hash that, to be absolutely sure you’ve exfiltrated the identifier and all possible hashed variations! So why do the IMEI’s hashes appear twice? Because an uppercased string of digits is no different from the original! Numbers don’t have capitals, yet Kontagent’s SDK cluelessly hashes both the upper- and lowercase versions of a number and sends both. If you know how to read .smali (Dalvik assembly) files, you can actually see it:
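The effect of that pattern is easy to reproduce. Here is a minimal Python sketch, a hypothetical mimic rather than Kontagent’s actual code, assuming each identifier is hashed as ASCII text both as-is and upper-cased, with MD5, SHA1, and SHA256 in turn (the order mirrors the transmission described above):

```python
import hashlib


def case_variant_hashes(identifier: str) -> list[str]:
    """Hypothetical mimic: hash the ID as-is and upper-cased, for each algorithm."""
    hashes = []
    for name in ("md5", "sha1", "sha256"):
        for variant in (identifier, identifier.upper()):
            hashes.append(hashlib.new(name, variant.encode("ascii")).hexdigest())
    return hashes


imei = "490154203237518"     # digits only: .upper() returns the same string
hex_id = "b07adb5ec8d1818c"  # the hexadecimal identifier quoted above

print(len(set(case_variant_hashes(imei))))    # 3: each hash sent twice
print(len(set(case_variant_hashes(hex_id))))  # 6: all distinct
```

For the hexadecimal identifier the two variants differ, so six distinct hashes come out; for the all-digit IMEI, each pair collapses into one value sent twice.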
Now at this point you may think, “well, who cares about hashed values?” We already said that hashes can’t be reversed, so it’s not a big deal. The problem with this thinking is, first, that if you think of the IMEI only as a dangerous persistent identifier (because it cannot be reset), then it’s already a big deal. For advertisers tracking people across apps so they can profile them for behavioural advertising, all that is needed is an identifier that is reasonably unique and cannot be changed. For this purpose, it doesn’t matter whether the phone’s IMEI or some function of the IMEI is used: both have these two properties, so both serve the same purpose.
But actually, hashing in this case does nothing to protect the IMEI. It’s like hashing the password “password123” and claiming no one can figure out the password from the hash. A hash can easily be inverted by brute force when there aren’t many possible inputs to begin with. The trick is that if you have the hash, you just try all possible IMEIs (i.e., all 15-digit numbers) and see which one generates the hash you are looking at: there’s your IMEI! And there aren’t many possible IMEIs. Keep in mind that we are talking about what a computer can do, so even a thousand trillion (10¹⁵) is still “not that many.” For example, commodity computer hardware can compute on the order of hundreds of millions of MD5 hashes per second, so hashing every possible IMEI (about 10¹⁴ candidates, since the last digit is a check digit) would take a few days, and only needs to be done once. However, as we show next, even less time is needed in practice, because there are actually many fewer than 10¹⁵ possible IMEIs.
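The “few days” figure is easy to check. A back-of-the-envelope sketch, assuming a rate of 500 million MD5 hashes per second as a stand-in for “hundreds of millions” (actual throughput varies a lot by hardware). Treating all 15 digits as free takes a few weeks at that rate; noting that the last digit is a check digit fixed by the other fourteen leaves 10¹⁴ candidates and brings it down to a few days:

```python
# Back-of-the-envelope cost of hashing every possible IMEI once.
# ASSUMPTION: 500 million MD5 hashes per second; real throughput varies by hardware.
HASHES_PER_SECOND = 500_000_000
SECONDS_PER_DAY = 86_400


def days_to_hash(candidates: int, rate: int = HASHES_PER_SECOND) -> float:
    """Days needed to hash `candidates` inputs at `rate` hashes per second."""
    return candidates / rate / SECONDS_PER_DAY


print(f"all 15 digits free: {days_to_hash(10**15):.1f} days")  # 23.1 days
print(f"check digit fixed:  {days_to_hash(10**14):.1f} days")  # 2.3 days
```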
So why are there so few possible IMEIs? Well, the IMEI is a 15-digit number. The first eight digits are called the Type Allocation Code (TAC) and identify the phone’s model and manufacturer. Since you may know that it’s an Android phone to begin with, there are only a few thousand possibilities for the first eight digits, which is basically nothing. You can find partial lists of TACs online.
The next six digits are the serial number. Even assuming any value is possible here (i.e., it’s truly random), there are only a million possibilities. The last digit is a check digit computed with the Luhn algorithm, so given the rest of the IMEI, there’s only one possibility for it.
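The structure described above, and the check digit that pins down the fifteenth digit, can be sketched in a few lines. IMEI check digits use the Luhn algorithm; 490154203237518 is a commonly cited example IMEI, and the helper names are illustrative:

```python
def luhn_check_digit(payload: str) -> int:
    """Luhn check digit for a string of decimal digits (IMEIs use Luhn)."""
    total = 0
    # Walk right to left; the rightmost payload digit is doubled,
    # because the check digit will be appended to its right.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of d
        total += d
    return (10 - total % 10) % 10


def split_imei(imei: str) -> tuple[str, str, str]:
    """Split a 15-digit IMEI into (TAC, serial number, check digit)."""
    if len(imei) != 15 or not all(c in "0123456789" for c in imei):
        raise ValueError("expected 15 decimal digits")
    return imei[:8], imei[8:14], imei[14]


def is_valid_imei(imei: str) -> bool:
    """True if the last digit matches the Luhn check digit of the first 14."""
    tac, serial, check = split_imei(imei)
    return luhn_check_digit(tac + serial) == int(check)


tac, serial, check = split_imei("490154203237518")
print(tac, serial, check)                # 49015420 323751 8
print(is_valid_imei("490154203237518"))  # True
```

Once the TAC is known, only the six serial digits are free, and the check digit follows from them.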
Most of our phones have the same TAC. It took 13 seconds on a commodity ThinkPad laptop to compute the MD5 hash of every possible IMEI with that TAC. The resulting file is 50 MiB, which you can keep handy to unhash new IMEIs instantly. Hashing the IMEI doesn’t make it any safer. More importantly, none of these apps need to be doing anything with the IMEI; really, only the phone company should know it.
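Building that kind of lookup table takes only a few lines. A minimal sketch, assuming the app hashes the IMEI as its 15-character ASCII string and that the TAC is already known; the function names are illustrative, and the TAC in the usage note is the one from the commonly cited example IMEI, not our phones’ TAC:

```python
import hashlib
from typing import Optional


def luhn_check_digit(payload: str) -> int:
    """Luhn check digit for a string of decimal digits."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:  # rightmost payload digit is doubled
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def build_md5_table(tac: str, serials: range = range(10**6)) -> dict[bytes, str]:
    """Map MD5 digest -> IMEI for every serial number under one TAC.

    ASSUMPTION: the IMEI is hashed as its 15-character ASCII string.
    Raw 16-byte digests are used as keys to save memory over hex strings.
    """
    table = {}
    for serial in serials:
        body = f"{tac}{serial:06d}"                # 8-digit TAC + 6-digit serial
        imei = body + str(luhn_check_digit(body))  # append the check digit
        table[hashlib.md5(imei.encode("ascii")).digest()] = imei
    return table


def unhash_md5(md5_hex: str, table: dict[bytes, str]) -> Optional[str]:
    """Recover the IMEI behind a leaked hex MD5, or None if it isn't in the table."""
    return table.get(bytes.fromhex(md5_hex))
```

For example, `table = build_md5_table("49015420")` covers all million serial numbers for that TAC, after which `unhash_md5(hashlib.md5(b"490154203237518").hexdigest(), table)` returns `"490154203237518"` with a single dictionary lookup. This hypothetical sketch keeps the table in memory rather than in a file, and it is not the code behind the 13-second timing above.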