Understanding Worldwide Private Information Collection on Android
Submitted by NortonLifeLock
Data has become the commodity that sustains much of the digital ecosystem. As smart devices, especially smartphones, become more central to our daily lives, they turn into reliable sources of rich information about us (e.g., where we go and what we do).
Most mobile apps request access to some information about you and obtain certain permissions on the device you are using. In most cases, information is shared and device permissions are enabled with your explicit consent. Once consent is given, however, it is impractical for users to recall which app collects what information, let alone trace where the information is transmitted and which actors may further process, use, and control it.
Therefore, it remains very challenging to obtain a comprehensive view of the information collected by mobile apps. It is natural to ask, for instance, how many apps on my smartphone collect private information, what kind of private information they collect, and which companies process and store it. In a study conducted together with researchers from Boston University, we investigated 22 categories of information that may affect the user’s privacy. We list them in Table 1. Our goal is to address the above questions and understand worldwide private information collection on Android phones by analyzing the flows of information (i.e., which app sends what information to which domain) generated by 2.1M unique apps installed by 17.3M users over 21 months between 2018 and 2019.
Is Private Information Collection Pervasive in Mobile Apps?
It is now common practice for the apps installed on your smartphone to request information about you and the device (e.g., your name, your email address, your location) before you can use them. We set out to understand whether private information collection is pervasive in mobile apps.
By analyzing our dataset, we discover that, on average, a mobile app sends private information to 2 unique domains. We also observe that over 57.6K apps (installed on 12.8M devices collectively) collect at least 5 unique categories of private information and send them to at least 5 unique domains. Our findings confirm that private information collection in mobile apps is both widespread and diverse, highlighting the need for an additional security and privacy layer on the device.
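To make the two measurements above concrete, here is a minimal sketch of how they could be computed from flow records. The record layout (app, category, domain), the sample data, and the function name summarize_flows are assumptions made for illustration; they are not taken from the study's actual pipeline.

```python
from collections import defaultdict

# Hypothetical flow records: (app_id, info_category, destination_domain).
# The real dataset and its schema are not described in this article.
flows = [
    ("app.a", "location", "ads.example.com"),
    ("app.a", "device", "ads.example.com"),
    ("app.a", "device", "stats.example.net"),
    ("app.b", "settings", "ads.example.com"),
]


def summarize_flows(flows, min_categories=5, min_domains=5):
    """Return (average unique domains per app, apps meeting both thresholds)."""
    categories = defaultdict(set)  # app -> categories it collects
    domains = defaultdict(set)     # app -> domains it sends data to
    for app, category, domain in flows:
        categories[app].add(category)
        domains[app].add(domain)

    if not domains:
        return 0.0, []

    avg_domains = sum(len(d) for d in domains.values()) / len(domains)
    heavy = sorted(
        app
        for app in domains
        if len(categories[app]) >= min_categories
        and len(domains[app]) >= min_domains
    )
    return avg_domains, heavy


# Thresholds lowered to 2 only so the tiny sample produces a match.
avg_domains, heavy_collectors = summarize_flows(
    flows, min_categories=2, min_domains=2
)
print(avg_domains, heavy_collectors)
```

Using sets per app means repeated flows to the same domain, or repeated collection of the same category, are counted once, which matches the "unique" wording in the findings.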
Who collects and processes private information?
We further analyze who ultimately obtains and processes the information collected by the mobile apps. We leverage our patented technology to uncover the ownership of the domains to which the private information was transmitted. These domains were then ranked by the fraction of devices they collect private information from. Figure 1 depicts the top 25 data processors and controllers. Together, they accumulate private information from 13.9M devices. Notably, 2 out of 3 devices have their information collected by either Facebook or Alphabet. Figure 2 depicts the top 12 types of private information collected by the global top 20 domains. We observe that the companies behind these domains consistently collect four types of private information from users: device, SIM card, location, and settings information. Such information enables them to track users more systematically.
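The ranking step can be sketched as follows. This is an illustration only: the domain-to-owner mapping is a hand-written dictionary standing in for the ownership resolution described above, and all device IDs, domains, company names, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical observations: (device_id, destination_domain).
device_flows = [
    ("dev1", "graph.example-social.com"),
    ("dev1", "ads.example-search.com"),
    ("dev2", "cdn.example-social.com"),
    ("dev3", "ads.example-search.com"),
    ("dev3", "track.example-ads.net"),
    ("dev4", "graph.example-social.com"),
]

# Hypothetical ownership table; a stand-in for real ownership resolution.
domain_owner = {
    "graph.example-social.com": "SocialCo",
    "cdn.example-social.com": "SocialCo",
    "ads.example-search.com": "SearchCo",
    "track.example-ads.net": "AdCo",
}


def devices_per_owner(device_flows, domain_owner):
    """Group the devices seen in the flows by the owner of each domain."""
    by_owner = defaultdict(set)
    all_devices = set()
    for device, domain in device_flows:
        all_devices.add(device)
        by_owner[domain_owner.get(domain, "unknown")].add(device)
    return by_owner, all_devices


def rank_owners_by_reach(device_flows, domain_owner):
    """Rank owners by the fraction of devices they receive data from."""
    by_owner, all_devices = devices_per_owner(device_flows, domain_owner)
    if not all_devices:
        return []
    total = len(all_devices)
    ranking = [(owner, len(devs) / total) for owner, devs in by_owner.items()]
    # Highest reach first; ties broken alphabetically for stable output.
    ranking.sort(key=lambda item: (-item[1], item[0]))
    return ranking


ranking = rank_owners_by_reach(device_flows, domain_owner)

# Reach of "either owner A or owner B" is the size of the union of their
# device sets, not the sum of the two individual fractions.
by_owner, all_devices = devices_per_owner(device_flows, domain_owner)
combined = len(by_owner["SocialCo"] | by_owner["SearchCo"]) / len(all_devices)
print(ranking, combined)
```

Counting devices rather than flows keeps a chatty app on a single phone from inflating an owner's rank, and the set union avoids double-counting devices that report to both companies.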
GDPR and its impact on private information flow
The European Union’s (EU) General Data Protection Regulation (GDPR) entered into effect on May 25th, 2018. The implementation of GDPR did not substantially change the flow of personal data originating from EU countries to countries outside the EU (see Figure 3). Our observations of these data flows show that confinement within the EU is low. Germany and Ireland are the only two European countries that host a notable portion of the private information originating from Europe, while the United States dominates the collection of private information from the EU.
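One way to quantify confinement is the share of EU-originating flows whose destination server is also located in the EU. The sketch below assumes each flow has already been reduced to an (origin country, hosting country) pair; the country subset, the sample flows, and the function name eu_confinement are illustrative assumptions, not the study's method.

```python
from collections import Counter

# Illustrative subset of EU country codes, NOT the full membership list.
EU_SAMPLE = frozenset({"DE", "FR", "IE", "NL"})

# Hypothetical flows: (device_country, server_hosting_country).
geo_flows = [
    ("FR", "US"),
    ("FR", "DE"),
    ("DE", "US"),
    ("NL", "IE"),
    ("DE", "US"),
]


def eu_confinement(flows, eu_countries=EU_SAMPLE):
    """Return (fraction of EU-origin flows hosted in the EU, share per host)."""
    destinations = [dst for src, dst in flows if src in eu_countries]
    if not destinations:
        return 0.0, {}
    inside = sum(1 for dst in destinations if dst in eu_countries)
    shares = {
        country: count / len(destinations)
        for country, count in Counter(destinations).items()
    }
    return inside / len(destinations), shares


confinement, host_shares = eu_confinement(geo_flows)
print(confinement, host_shares)
```

Comparing this ratio for flows observed before and after a given date would show whether a regulation changed where data ends up, which is the kind of comparison Figure 3 summarizes.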
Why do you see intrusive ads?
Potentially harmful applications (PHAs) are apps that could put users, user data, or devices at risk (e.g., trojans and spyware). We identify 1.2M PHAs installed on 3.8M devices. On a global scale, we find that 116K PHAs (installed on 393K devices) collect operator information and 63K PHAs (installed on 280K devices) also collect running app information. As we can see in Figure 4, such aggressive private information collection enables adversaries to better profile users and may lead to intrusive monetization. For example, we find that 590K devices with PHAs present are affected by notification bar ads (i.e., ads displayed as app notifications) and 317K devices suffer from shortcut ads (i.e., targeted ads placed on the home screen).
Implications for the research community and policymakers
Our findings highlight a number of challenges faced by the research community when studying private information collection on Android. We show that looking at device penetration is critical to observing the distribution of information collection actors in the wild. We also hope that our study will encourage policymakers to think critically about how private information is used by and shared among companies, and how accountability and customer choice can be truly guaranteed.
Implications for consumers
Protecting your private information can help reduce your risk of identity theft. We have the following recommendations for users who want to take more control over their privacy on their mobile devices.
NortonLifeLock Inc. (NASDAQ: NLOK) is a global leader in consumer Cyber Safety. We are dedicated to helping secure the devices, identities, online privacy, and home and family needs of nearly 50 million consumers, providing them with a trusted ally in a complex digital world.