WebXray domain ownership list
The database is hosted on GitHub.
What the database contains
The list is a JSON file. Each entry in the list has the following fields:
id: a numeric identifier (integer) for the entry. It changes whenever the list is expanded and reindexed, so do not rely on it remaining stable
parent_id: if the entity has a parent owner, the id of the parent
owner_name: a string giving the name of the service (e.g. 'Google Analytics') or the company (e.g. 'Google') that owns the domain
aliases: an array of strings representing possible alternate spellings of the owner_name (e.g. 'YouTube' and 'You Tube')
homepage_url: a string giving the URL of the homepage of the service or company
privacy_policy_url: a string giving the URL of the privacy policy of the service or company
notes: a string with pertinent information on why a domain was assigned to a given owner
country: the ccTLD for the country in which the service or company is based
uses: what a first party uses the service for. Note that the first-party use may differ from the ultimate third-party use: for example, a site may use audience measurement tools from a third party to gain insights into its traffic, but the third party may use this data for marketing.
platforms: where the domain has been observed; so far the values are 'web', 'mobile', and 'email'
domains: an array of domain names (strings) which are owned by the given service or company
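
To make the field list above concrete, here is a minimal Python sketch that indexes entries by domain and follows parent_id upward to list the owners of a domain. It is an illustration under stated assumptions, not the project's own tooling: the sample entries are invented, the helper names build_indexes and ownership_chain are hypothetical, and it assumes that parent_id is null for top-level owners and that uses and platforms are arrays of strings.

```python
import json

# Hypothetical sample data that follows the fields described above.
# These are NOT real entries from the list. Assumptions: parent_id is
# null for top-level owners; uses and platforms are arrays of strings.
SAMPLE = """
[
  {"id": 1, "parent_id": null, "owner_name": "Example Corp",
   "aliases": ["ExampleCorp"],
   "homepage_url": "https://example.com/",
   "privacy_policy_url": "https://example.com/privacy",
   "notes": "", "country": "us",
   "uses": ["marketing"], "platforms": ["web"],
   "domains": ["example.com"]},
  {"id": 2, "parent_id": 1, "owner_name": "Example Analytics",
   "aliases": [],
   "homepage_url": "https://analytics.example.com/",
   "privacy_policy_url": "https://example.com/privacy",
   "notes": "", "country": "us",
   "uses": ["measurement"], "platforms": ["web", "mobile"],
   "domains": ["example-analytics.net", "example-cdn.org"]}
]
"""


def build_indexes(entries):
    """Return (by_id, by_domain) lookup dicts for a list of entries."""
    by_id = {entry["id"]: entry for entry in entries}
    by_domain = {}
    for entry in entries:
        for domain in entry.get("domains", []):
            by_domain[domain.lower()] = entry
    return by_id, by_domain


def ownership_chain(domain, by_id, by_domain):
    """List owner names for a domain, from direct owner up to the top parent."""
    entry = by_domain.get(domain.lower())
    chain = []
    seen = set()  # guards against accidental parent_id cycles
    while entry is not None and entry["id"] not in seen:
        seen.add(entry["id"])
        chain.append(entry["owner_name"])
        parent_id = entry.get("parent_id")
        entry = by_id.get(parent_id) if parent_id is not None else None
    return chain


entries = json.loads(SAMPLE)
by_id, by_domain = build_indexes(entries)
print(ownership_chain("example-analytics.net", by_id, by_domain))
# prints ['Example Analytics', 'Example Corp']
```

Because id values change when the list is reindexed, the sketch rebuilds both indexes from one loaded copy of the file and never stores an id between versions.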
Why it is important
- adtech sucks!
- third party tracking is the worst
Why it is important to build it collaboratively
- can do SARs, and build the whole pipeline of support around that
- can do visualisations, maybe not for the general public but at least for super users of PDIO to better understand progress
What needs to be done for PDIO
- define format for the adtech entries
- extend the SAR tool so that it can handle third-party situations
Wishlist on the WebXray side
- ???