1. Background
In 2022, Internet Safety Labs (ISL) conducted an extensive benchmark of EdTech apps used in schools across the United States. We sampled 13 schools in each state and the District of Columbia and identified 1,722 unique apps in use in K-12 schools. During the benchmark, the apps were evaluated and scored on their safety-related behaviors. As part of the safety evaluation, the SDKs in each app were identified, and researchers collected network traffic for 1,357 apps. In total, there were 275 unique SDKs in the apps, and the network traffic contained 8,168 unique subdomains and 3,211 unique domains.
A key research question of the 2022 EdTech benchmark was how accurately SDKs serve as a proxy for actual third-party data sharing, since collecting network traffic data is somewhat labor-intensive. This report shares the results of the analysis.
2. Analysis
The analysis compared the “expected” third parties, based on the companies that own the SDKs, with the companies observed in the network traffic. This required identifying the owner companies for both the SDKs and all the subdomains observed in the aggregate network traffic [1].
Researchers first identified which SDKs were in use in apps by using AppFigures as a resource. In total, 275 unique SDKs were found in use across all apps. Next, researchers identified the companies that published these SDKs. For each app, the number of unique companies that own the SDKs found in that app is referred to as the “expected” number of companies to receive data.
Next, researchers performed a similar analysis on the subdomains observed in the network traffic (1,175 total apps). Each subdomain was resolved to an “owner” company. Subdomains were identified from HTTP POST/GET requests captured in the network traffic.
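As a rough illustration of the subdomain-to-owner step, the sketch below extracts the host from a captured request URL and looks it up in an owner dictionary by trying progressively shorter domain suffixes. The dictionary contents, the company names, and the function names are hypothetical placeholders, not ISL's actual tooling or data.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# Hypothetical owner dictionary: domain -> owner company (illustrative only).
OWNER_BY_DOMAIN: Dict[str, str] = {
    "example-analytics.com": "Example Analytics Inc.",
    "example-ads.net": "Example Ads LLC",
}


def host_from_request(url: str) -> str:
    """Return the lowercase hostname of a captured GET/POST request URL."""
    return (urlsplit(url).hostname or "").lower()


def owner_for_host(host: str,
                   owners: Dict[str, str] = OWNER_BY_DOMAIN) -> Optional[str]:
    """Resolve a subdomain to an owner by matching ever-shorter suffixes."""
    labels = host.split(".")
    # Stop before the bare top-level label (e.g. "net"), which is never an owner key.
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in owners:
            return owners[candidate]
    return None  # Unresolved hosts would need manual research.
```

For example, `owner_for_host(host_from_request("https://cdn.eu.example-ads.net/v1/track"))` resolves to the hypothetical "Example Ads LLC". A production version would likely rely on a maintained public-suffix list rather than plain suffix matching.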
We then performed two quantitative analyses: (1) we examined the network traffic of apps with at least one SDK (n=1,083 apps), and (2) we examined the network traffic of apps with no SDKs (n=92 apps).
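The per-app comparison can be sketched with set operations: the companies that own an app's SDKs form the "expected" set, and the companies that own the observed subdomains form the "seen" set. The class, field names, and sample data below are assumptions made for illustration. Whether the app developer itself is counted in either set is not specified in this report, and the sketch does not treat it specially.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Set


@dataclass
class AppObservation:
    name: str
    sdk_owners: Set[str]      # companies that own the app's SDKs ("expected")
    traffic_owners: Set[str]  # companies that own subdomains seen in traffic


def compare(app: AppObservation) -> Dict[str, int]:
    """Count expected, expected-and-seen, unexpected, and total seen companies."""
    return {
        "expected": len(app.sdk_owners),
        "expected_seen": len(app.sdk_owners & app.traffic_owners),
        "unexpected_seen": len(app.traffic_owners - app.sdk_owners),
        "total_seen": len(app.traffic_owners),
    }


def summarize(apps: List[AppObservation]) -> Dict[str, float]:
    """Average each per-app count across a non-empty group of apps."""
    rows = [compare(app) for app in apps]
    return {key: mean(row[key] for row in rows) for key in rows[0]}


# Made-up sample data, not taken from the benchmark.
sample = [
    AppObservation("app_a", {"A", "B", "C"}, {"A", "X", "Y"}),
    AppObservation("app_b", {"A"}, {"Z"}),
]
summary = summarize(sample)
```

By construction, each app's total seen equals its expected-seen count plus its unexpected-seen count, which matches how the columns in Table 1 add up.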
2.1 Apps With at Least One SDK
Apps with at least one SDK communicated with an average of 10.1 companies based on observed network traffic (Table 1).
2.1.1 “Expected” Companies in Network Traffic
In apps with at least one SDK, there was an average of 4.7 unique companies represented by the SDKs; thus, 4.7 companies were “expected” to receive data. However, on average, only 1.7 (or 36.2%) of the “expected” companies were seen in the network traffic of apps with at least one SDK (Table 1).
Note that there are several contributing factors that could account for this, including:
- The manual testing performed by the researchers was unstructured and therefore had inconsistencies across researchers.
- The manual testing didn’t exercise all functions in the app. For instance, the testers did not make any optional purchases or upgrade to a premium version.
Table 1. Average number of companies per app for apps with at least one SDK.
App Group | Average # Expected Companies | Average # Expected Companies Seen | Average # Unexpected Companies Seen | Average Total # of Companies Seen |
Webview – With (n=609) | 5.0 | 1.9 | 12.6 | 14.5 |
Webview – Without (n=474) | 4.3 | 1.4 | 2.6 | 4.0 |
Advertisements – With (n=189) | 5.6 | 2.1 | 24.0 | 26.1 |
Advertisements – Without (n=894) | 4.5 | 1.6 | 5.0 | 6.6 |
Behavioral Advertisements – With (n=105) | 5.4 | 2.1 | 33.7 | 35.8 |
Behavioral Advertisements – Without (n=978) | 4.6 | 1.6 | 5.5 | 7.1 |
ALL Tested Apps With 1+ SDK (n=1083) | 4.7 | 1.7 | 8.4 | 10.1 |
2.1.2 “Unexpected” Companies in Network Traffic
Additionally, as seen in Table 1, these apps communicated with an average of 8.4 unexpected companies.
As expected, apps that used Webview [2] or had advertisements or behavioral ads all had even higher average numbers of unexpected companies; apps with behavioral ads had the highest, at 33.7 unexpected companies on average [3]. The ISL app score rubric regards the use of Webview and the inclusion of advertising as very high risks for K-12 students, and the data in Table 1 reinforces the rubric.
- Apps with at least one SDK that use Webview communicated with 14.5 third parties on average, versus 4.0 for those that don’t use Webview: about 3.6 times as many (2.6 times more).
- Apps with at least one SDK that include ads communicated with 26.1 third parties on average, versus 6.6 for those that don’t include ads: about 4.0 times as many (3.0 times more).
- Apps with at least one SDK that include behavioral ads communicated with 35.8 third parties on average, versus 7.1 for those that don’t include behavioral ads: about 5.0 times as many (4.0 times more).
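The multipliers in the list above can be checked against the "Average Total # of Companies Seen" column of Table 1. The sketch below computes both the plain ratio (with versus without) and the relative increase (ratio minus one). The multipliers 2.6, 3.0, and 4.0 correspond to the relative increase. The function and variable names are illustrative.

```python
from typing import Dict, Tuple

# Average total companies seen (with feature, without feature), from Table 1.
TABLE_1_TOTAL_SEEN: Dict[str, Tuple[float, float]] = {
    "Webview": (14.5, 4.0),
    "Advertisements": (26.1, 6.6),
    "Behavioral Advertisements": (35.8, 7.1),
}


def ratio_and_increase(with_feature: float,
                       without_feature: float) -> Tuple[float, float]:
    """Return (times as many, times more) for two group averages."""
    ratio = with_feature / without_feature
    return ratio, ratio - 1.0


for feature, (with_avg, without_avg) in TABLE_1_TOTAL_SEEN.items():
    ratio, increase = ratio_and_increase(with_avg, without_avg)
    print(f"{feature}: {ratio:.1f} times as many, {increase:.1f} times more")
```

Because the inputs are table values already rounded to one decimal place, the results may differ slightly from figures computed on the unrounded data.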
2.2 Apps with No SDKs
There were 92 apps in the data set that had no SDKs and for which we had network traffic. Since these apps had no SDKs, there were no “expected” companies to receive data from the app (other than the app developer, of course).
Apps with no SDKs averaged 4.6 companies observed in network traffic, compared with 10.1 for apps with at least one SDK (Table 1). The gap persists among apps that use Webview or include advertising or behavioral advertising: in each group, apps with no SDKs had markedly fewer observed companies than their counterparts with at least one SDK (Table 2).
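- Apps with no SDKs that use Webview had 44.1% fewer observed companies than apps with at least one SDK that use Webview (8.1 vs. 14.5).
- Apps with no SDKs that include advertising had 40.6% fewer observed companies than apps with at least one SDK that include advertising (15.5 vs. 26.1).
- Apps with no SDKs that include behavioral advertising had 21.0% fewer observed companies than apps with at least one SDK that include behavioral advertising (28.3 vs. 35.8).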
Table 2. Average number of companies observed per app for apps with no SDKs.
App Group | Average # of Companies Seen |
Webview – With (n=43) | 8.1 |
Webview – Without (n=49) | 1.6 |
Advertising – With (n=11) | 15.5 |
Advertising – Without (n=81) | 3.2 |
Behavioral Ads – With (n=4) | 28.3 |
Behavioral Ads – Without (n=88) | 3.6 |
All Tested Apps Without SDKs (n=92) | 4.6 |
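The "fewer observed companies" percentages listed before Table 2 compare each Table 2 average with the corresponding "Average Total # of Companies Seen" value in Table 1. A minimal sketch of that arithmetic follows. The names are illustrative, and the inputs are the rounded table values, so the behavioral-ads result comes out near 20.9% rather than 21.0%, presumably because the reported figure was computed from unrounded data.

```python
from typing import Dict, Tuple

# (average with 1+ SDK from Table 1, average with no SDKs from Table 2)
WITH_FEATURE_AVERAGES: Dict[str, Tuple[float, float]] = {
    "Webview": (14.5, 8.1),
    "Advertising": (26.1, 15.5),
    "Behavioral Ads": (35.8, 28.3),
}


def pct_fewer(with_sdk_avg: float, no_sdk_avg: float) -> float:
    """Percentage by which the no-SDK average falls below the 1+ SDK average."""
    return (with_sdk_avg - no_sdk_avg) / with_sdk_avg * 100.0


for feature, (with_sdk, no_sdk) in WITH_FEATURE_AVERAGES.items():
    print(f"{feature}: {pct_fewer(with_sdk, no_sdk):.1f}% fewer observed companies")
```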
3. Conclusion
3.1 SDKs as a Proxy for Third Party Sharing
As the data shows, SDKs aren’t a useful proxy for the actual number of third parties receiving data from the app: on average, only 1.7 of the 4.7 expected companies appeared in network traffic, while 8.4 unexpected companies did. Moreover, apps that include ads or that use Webview will likely have significantly more third parties than apps without.
This means that viable measurement of third parties receiving data from apps requires testing and observation of network traffic. ISL used mostly manual methods to collect this data, but automated methods would be extremely beneficial for ongoing and pervasive measurement of app third-party sharing.
SDKs do provide value in identifying potential omissions in the manual testing process. Can we account for the specific SDKs that don’t appear in the network traffic? Did we miss a particular functional branch of the app that we should go back and test? Or might it be an indication of an error in the SDK database? So while SDKs don’t serve as a perfect indication of the third parties communicating with the app, they still provide valuable information, and as such, they will remain in our app safety labels (see https://appmicroscope.org/).
3.2 Validation of ISL App Scoring Rubric
As shown in Section 2, use of Webview and the inclusion of advertising substantially increase user exposure to data sharing with more third parties. This finding reinforces the ISL app scoring rubric, in which the use of Webview and the presence of advertising are indicators of very high risk.
4. Helpful Links
Footnotes:
[1] See the SDK Risk Dictionary and the Subdomain Risk Dictionary for details.
[2] Note: researchers determined the use of Webview manually, by observing third-party pages opening within the app. Thus, the presence of Webview as tagged in ISL’s AppMicroscope.org may not accurately assess Webview use for first-party web pages.
[3] It would be interesting to study how many apps have behavioral ads and don’t use Webview.