In October, as part of its Digital Platform Services Inquiry and ongoing efforts to protect consumers and promote competition, the Australian Competition and Consumer Commission (ACCC) published our report detailing the behaviors of 1,000 of the most popular apps available in the Australian Android app market, including 103 Health-related apps, 100 Kids’ apps, and 797 other popular apps (see the report for more information on how these apps were chosen).
Our objective was to observe data collection en masse to see a snapshot of what data is collected, how it is collected, and where it is sent. To do so, our team extended our testing infrastructure to Australia, adding another continent to the list of regions supported by our system, and provisioned modern devices with improved instrumentation. We also built an in-browser, remote-testing environment to support human app testers physically located elsewhere. This allows us to automate setup and data collection, while also allowing testers to interact with apps in order to explore as much app behavior as possible.
While you can find much more detail in our report on the ACCC website, here are a few highlights from the report, based on the behaviors we observed during the testing period and the analysis we performed. (Note: where relevant, we have provided extra references and material that may not have been mentioned in the report.)
- Of the recipients of user information that we were able to identify by observing data transmissions during the testing period, the top ten third-party parent companies were:
  - Facebook (40% of apps tested)
  - AppsFlyer (15%)
  - Unity (12%)
  - Google (11%)
  - Adjust (11%)
  - Twitter (8%)
  - Verizon (8%)
  - Branch (7%)
  - Amazon (4%)
  - Liftoff (4%)
- The top ten companies with which we observed apps communicating—with or without user information—were:
  - Google (68%)
  - Facebook (45%)
  - AppsFlyer (16%)
  - Unity (12%)
  - Liftoff (11%)
  - Adjust (11%)
  - Twitter (11%)
  - Verizon (9%)
  - Oracle (9%)
  - AppLovin (7%)
- The two most common types of user information we observed in our testing were transmitted far more frequently than others, and are commonly used to identify and track users for advertising and analytics purposes:
  - Android Advertising ID (AAID): designed to enable advertising and analytics while providing some privacy control, in that it can be reset. It is easy to reset, if you know where to look, through a system-wide setting.
  - Android ID: originally designed to uniquely identify a device for its lifetime, but rescoped to be more privacy-protective on Android 8 and above (in use by more than 75% of Android installs worldwide, as of September 2020, according to statcounter).
- Far fewer apps (12% Health, 7% Other, 2% Kids) collected and transmitted GPS location data, but several apps transmitted user information relating to the WiFi hotspot (SSID and BSSID) to which the testing devices were connected—a known surrogate for location data, since hotspots are usually stationary and public databases exist that map them to GPS coordinates. While the majority of the apps that collected location-related information had the relevant Android permission to do so, the CNN Breaking US & World News app was observed to have transmitted the WiFi router’s SSID, using a side-channel which does not invoke the operating system’s location permission check. (This issue has been fixed in Android 10, which is in use by roughly 38% of Android phones worldwide, according to statcounter as of Nov 2020.)
- A very small percentage of apps (1.9% Health, 4.5% Other, 1% Kids) were observed to have transmitted the MAC address of the test device’s WiFi radio, a unique and extremely persistent identifier, even though programmatic access to it has been prohibited since Android 6. In most cases, this data was sent to Umeng; because it was accessed via a Java vulnerability exploited by Umeng’s SDK, the app’s developers were likely unaware that it was being accessed and transmitted at all.
- While Google’s policies restrict associating the AAID with other persistent identifiers in an attempt to preserve the privacy protections it provides, 32% of apps were observed to have transmitted other persistent identifiers alongside it, without explicitly requesting consent from the user. In the majority of cases, the additional identifier was the Android ID—an extremely common trend—but a handful of apps transmitted WiFi MAC addresses, IMEI numbers, and device serial numbers in conjunction with the AAID.
- Often, privacy considerations are enforced by invoking the notion of “consent,” in which users must effectively acknowledge and agree to the collection and use of their data. However, in practice, it is difficult to say that consent has been effectively acquired or adequately sought. We observed 18 apps transmitting the AAID within 1 second of the app being launched, and 19 apps transmitting other identifiers—such as GSF ID, email address, WiFi MAC, and Bluetooth name—within 5 seconds of opening the app. In other words, data is transmitted before a user can practically accept privacy terms. Some apps, such as Yoho Sports, show privacy-related dialogs before users can use the app, but transmit identifiers within seconds of launching, regardless of user input.
- Regarding the security of transmissions, 107 apps (5 Health, 5 Kids, 97 Other) transmitted user information without using TLS—the widely accepted (and trivially applied) standard for encrypting data in transit on the Internet. The vast majority of these insecure transmissions included the AAID, but 79 apps transmitted other identifiers to endpoints owned by companies such as Alibaba (11 apps), Amber Weather (6 apps), UnderArmour (3 apps), Verizon (3 apps), and comScore (2 apps). For example, Messenger for SMS, an app with 10,000,000 installs (according to Google Play), was observed to have transmitted the Android ID and WiFi MAC to Umeng (Alibaba) insecurely, effectively allowing any computers, routers, people, companies, and service providers involved in relaying traffic between the phone and the data recipients to read and record this data in large quantities.
- Of the 407 SDKs we identified embedded in the apps tested, the 10 most popular (in order of frequency) were:
  - Google Mobile Services (91% of apps)
  - Firebase (83%)
  - Facebook (62%)
  - Google Ads (55%)
  - Google DoubleClick (48%)
  - Crashlytics (46%)
  - Facebook Login (40%)
  - Facebook Share (38%)
  - Facebook Analytics (36%)
  - Facebook Ads (32%)
- Across all apps, READ_EXTERNAL_STORAGE and WRITE_EXTERNAL_STORAGE were the most commonly requested and used permissions, followed by ACCESS_FINE_LOCATION and CAMERA. The first two govern access to the phone’s shared internal storage, which any app or user can access. This ability has many legitimate uses (e.g., accessing a photo library on the shared filesystem). However, these permissions can be abused (and, in other situations, have previously been observed to be abused) to exfiltrate data for which the relevant permissions are not granted. We observed that 54% of Health apps and 41% of Other apps requested the ACCESS_FINE_LOCATION permission, but only 19% and 18%, respectively, made use of it at runtime. The gap between requests and observed uses suggests one of four things: the apps only use this data internally, they obscure it before transmitting it, they transmit it using a technique that we haven’t yet uncovered, or our testing failed to exercise the specific app functionality that makes use of it during the testing period.
- The number of permissions requested by each app ranged from a handful to more than a hundred. For example, Parallel Space requested 102 permissions (21 of which are labeled “dangerous” by Google), which may be appropriate given its complex functionality.
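As a rough illustration of three of the checks described in the list above (transmissions without TLS, the AAID sent alongside another persistent identifier, and identifiers sent within seconds of launch), here is a minimal Python sketch. It is not the tooling used for the report: the `Transmission` record, the identifier labels, the example apps, and the endpoints are all hypothetical, and treating a plain `http://` URL as "no TLS" is a simplifying assumption.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

# Hypothetical labels for identifier types; not the report's actual schema.
AAID = "aaid"
OTHER_PERSISTENT_IDS = {"android_id", "wifi_mac", "imei", "serial", "gsf_id"}


@dataclass(frozen=True)
class Transmission:
    app: str                     # name of the app under test
    url: str                     # destination URL of the observed request
    seconds_since_launch: float  # time elapsed since the app was opened
    data_types: frozenset        # identifier types found in the payload


def is_cleartext(t):
    """True when the request used plain HTTP rather than TLS (assumption)."""
    return urlsplit(t.url).scheme.lower() == "http"


def bridges_aaid(t):
    """True when the AAID travels together with another persistent identifier."""
    return AAID in t.data_types and bool(t.data_types & OTHER_PERSISTENT_IDS)


def sent_within(t, limit_seconds):
    """True when any identifier left the device within limit_seconds of launch."""
    return bool(t.data_types) and t.seconds_since_launch <= limit_seconds


def apps_matching(transmissions, predicate):
    """Names of apps with at least one transmission satisfying predicate."""
    return {t.app for t in transmissions if predicate(t)}


# Made-up observations, for illustration only.
observed = [
    Transmission("app.one", "http://tracker.example/c", 0.4,
                 frozenset({"aaid", "android_id"})),
    Transmission("app.two", "https://ads.example/e", 12.0,
                 frozenset({"aaid"})),
    Transmission("app.three", "https://stats.example/i", 3.2,
                 frozenset({"wifi_mac"})),
]

cleartext_apps = apps_matching(observed, is_cleartext)
bridging_apps = apps_matching(observed, bridges_aaid)
early_apps = apps_matching(observed, lambda t: sent_within(t, 5.0))
```

Counting apps rather than individual transmissions mirrors how the figures above are reported (for example, "107 apps" or "32% of apps"); a single matching transmission is enough to include an app in a set.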
We hope this report sheds new light on the behaviors of apps in the Android ecosystem, and how they relate to user privacy and corporate data practices. It is, however, only a snapshot at one point in time, and more work needs to be done to track behaviors as they change in response to policy and feature changes. We continue to believe that putting information like this in the hands of consumers, developers, and regulators will have a positive impact on the privacy and security landscape.
For more details about the apps we examined, their specific behaviors, and more in-depth analysis, read our report published on the ACCC website.