WebXray domain ownership list
Revision as of 10:52, 31 July 2019
What the database contains
The list is a JSON file. Each entry in the list has the following fields:
id: an integer identifier for the entry. It changes whenever the list is expanded and reindexed, so do not rely on it remaining stable.
parent_id: the id of the entity's parent owner, if it has one
owner_name: a string giving the name of the service (e.g. 'Google Analytics') or company (e.g. 'Google') that owns the domain
aliases: an array of strings listing alternate spellings of owner_name (e.g. 'YouTube' and 'You Tube')
homepage_url: a string containing the URL of the homepage of the service or company
privacy_policy_url: a string containing the URL of the privacy policy of the service or company
notes: a string with pertinent information on why a domain was assigned to a given owner
country: the ccTLD of the country in which the service or company is based
uses: what a first party uses the service for. Note that first-party use may differ from the ultimate third-party use: for example, a site may use a third party's audience measurement tools to gain insight into its traffic, while the third party may use the same data for marketing.
platforms: where the domain has been observed; so far 'web', 'mobile', and 'email'
domains: an array of domain names (strings) owned by the given service or company
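A minimal Python sketch of how the fields above fit together: it loads the list, maps each listed domain to its entry, and follows parent_id links up to the top-level owner. It assumes that the file's top level is a JSON array of entries, that parent_id is null for entities without a parent, and that a hostname such as www.google-analytics.com should match a listed parent domain (google-analytics.com). The sample entries and the helper names build_index, find_owner and ownership_chain are hypothetical illustrations, not part of the list or of WebXray.

```python
import json
from typing import Dict, List, Optional

# Hypothetical two-entry sample in the format described above (assumption:
# top level is a JSON array; parent_id is null when there is no parent).
# homepage_url, privacy_policy_url, notes and uses are omitted for brevity.
SAMPLE = """
[
  {"id": 1, "parent_id": null, "owner_name": "Google", "aliases": [],
   "country": "us", "platforms": ["web"],
   "domains": ["google.com", "doubleclick.net"]},
  {"id": 2, "parent_id": 1, "owner_name": "Google Analytics", "aliases": [],
   "country": "us", "platforms": ["web"],
   "domains": ["google-analytics.com"]}
]
"""


def build_index(entries: List[dict]) -> Dict[str, dict]:
    """Map every listed domain (lower-cased) to the entry that owns it."""
    index: Dict[str, dict] = {}
    for entry in entries:
        for domain in entry["domains"]:
            index[domain.lower()] = entry
    return index


def find_owner(host: str, index: Dict[str, dict]) -> Optional[dict]:
    """Return the owning entry for host, trying the full hostname first and
    then each parent domain (assumed matching rule, e.g.
    www.google-analytics.com, then google-analytics.com)."""
    labels = host.lower().rstrip(".").split(".")
    # Stop before the bare top-level label (e.g. "com").
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in index:
            return index[candidate]
    return None


def ownership_chain(entry: dict, by_id: Dict[int, dict]) -> List[str]:
    """Follow parent_id links up to the top-level owner.

    ids change when the list is reindexed, so by_id must be built from
    the same copy of the file that entry came from."""
    chain = [entry["owner_name"]]
    seen = {entry["id"]}
    while entry.get("parent_id") is not None:
        entry = by_id[entry["parent_id"]]
        if entry["id"] in seen:  # guard against an accidental cycle
            break
        seen.add(entry["id"])
        chain.append(entry["owner_name"])
    return chain


entries = json.loads(SAMPLE)
by_id = {e["id"]: e for e in entries}
index = build_index(entries)

owner = find_owner("www.google-analytics.com", index)
if owner is not None:
    print(" -> ".join(ownership_chain(owner, by_id)))  # Google Analytics -> Google
```

Because id values are not stable across versions of the list, the sketch resolves parent_id only against entries loaded from the same file and never persists ids.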
Why it is important
- adtech sucks!
- third party tracking is the worst
Why it is important to build it collaboratively
- we can send SARs (subject access requests) and build the whole support pipeline around them
- we can build visualisations, perhaps not for the general public, but at least for PDIO super users, to help them understand progress
What needs to be done for PDIO
- define a format for the adtech entries
- extend the SAR tool so that it can handle third-party situations
Wishlist (WebXray side)
- ???