WebXray domain ownership list

From Wikibase Personal data

Revision as of 07:51, 2 August 2019

Drawn from the webXray project, this list provides a hierarchical accounting of which entities own commonly found third-party domains on the web.

The database is hosted on GitHub.

What the database contains

The list is a JSON file. Each entry in the list has the following fields:

id: a numeric identifier (integer) for the entry. It changes whenever the list is expanded and reindexed, so do not count on it remaining stable.

parent_id (parent (P123)): if the entity has a parent owner, the id of that parent entry

owner_name: a string giving the name of the service (e.g. 'Google Analytics') or of the company (e.g. 'Google') that owns the domain

aliases: an array of strings listing possible alternate spellings of the owner_name (e.g. 'YouTube' and 'You Tube')

homepage_url: a string giving the URL of the homepage of the service or company

privacy_policy_url: a string giving the URL of the privacy policy of the service or company

notes: a string with pertinent information on why a domain was assigned to a given owner

country: the country-code top-level domain (ccTLD) of the country in which the service or company is based

uses: what the first party uses the service for. Note that first-party use may differ from the ultimate third-party use: for example, a site may use a third party's audience measurement tools to gain insights into its traffic, while the third party may use that same data for marketing.

platforms: where the domain has been observed; so far the values are 'web', 'mobile', and 'email'

domains: an array of domain names (strings) owned by the given service or company
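The fields above support two lookups that tools built on the list are likely to need: from a domain observed on a page to the entry that owns it, and from a free-text company name to an entry via owner_name and aliases. The Python sketch below shows one way to build both. It is not webXray code: the sample entries are hypothetical, the assumption that the file's top level is a JSON array is ours, and both the rule that a subdomain belongs to the owner of the listed domain and the name normalisation (case-folding and dropping non-alphanumerics) are illustrative choices.

```python
import json
from typing import Optional

# Hypothetical excerpt in the shape described above. ids and values are
# illustrative (real ids change on reindexing); only the fields used below
# are filled in. Assumption: the file's top level is a JSON array.
SAMPLE = """
[
  {"id": 1, "parent_id": null, "owner_name": "Google",
   "aliases": [], "domains": ["doubleclick.net"]},
  {"id": 2, "parent_id": 1, "owner_name": "YouTube",
   "aliases": ["You Tube"], "domains": ["youtube.com"]}
]
"""


def normalise(name: str) -> str:
    """Case-fold and keep only letters and digits, so 'You Tube' matches 'YouTube'."""
    return "".join(ch for ch in name.casefold() if ch.isalnum())


def build_indexes(entries: list[dict]) -> tuple[dict[str, dict], dict[str, dict]]:
    """Index entries by listed domain and by normalised owner_name/aliases."""
    by_domain: dict[str, dict] = {}
    by_name: dict[str, dict] = {}
    for entry in entries:
        for domain in entry["domains"]:
            by_domain[domain.lower()] = entry
        for name in [entry["owner_name"], *entry.get("aliases", [])]:
            by_name.setdefault(normalise(name), entry)  # first entry wins on a clash
    return by_domain, by_name


def owner_of_host(host: str, by_domain: dict[str, dict]) -> Optional[dict]:
    """Return the entry owning host, trying the host itself, then each parent domain.

    Assumption: a subdomain belongs to whoever owns the listed domain.
    """
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels) - 1):  # stop before the bare TLD, e.g. "com"
        candidate = ".".join(labels[i:])
        if candidate in by_domain:
            return by_domain[candidate]
    return None


entries = json.loads(SAMPLE)
by_domain, by_name = build_indexes(entries)
print(owner_of_host("m.youtube.com", by_domain)["owner_name"])  # YouTube
print(by_name[normalise("you tube")]["id"])  # 2
print(owner_of_host("example.org", by_domain))  # None
```

Because id values change whenever the list is reindexed, indexes like these are best rebuilt from the current file rather than cached by id.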

Why it is important

  • The domain ownership list needs to be populated so that it offers a simple index of where personal data is being sent and who legally owns the receiving domains.
  • Without these fields, filing subject access requests (SARs) is far more time-consuming, and the results of mass inspections of websites are barely intelligible.
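The "index of where personal data is being sent" amounts to rolling observed third-party domains up the parent_id hierarchy to the company at the top. The Python sketch below groups a page's observed domains that way, which could serve as a starting point for deciding which organisation a SAR concerns. The entries are hypothetical, top-level owners are assumed to have a null parent_id and no cycles, and hosts are assumed to have already been reduced to the domain form used in the list.

```python
from collections import defaultdict

# Hypothetical excerpt: ids, names and domains are illustrative only.
# Assumption: top-level owners have parent_id set to None (null in JSON).
ENTRIES = [
    {"id": 1, "parent_id": None, "owner_name": "Google",
     "domains": ["doubleclick.net"]},
    {"id": 2, "parent_id": 1, "owner_name": "Google Analytics",
     "domains": ["google-analytics.com"]},
    {"id": 3, "parent_id": None, "owner_name": "ExampleAds",
     "domains": ["exampleads.test"]},
]


def top_owner(entry: dict, by_id: dict[int, dict]) -> dict:
    """Follow parent_id links to the top-level owner (assumes no cycles)."""
    while entry["parent_id"] is not None:
        entry = by_id[entry["parent_id"]]
    return entry


def group_by_top_owner(domains: list[str], entries: list[dict]) -> dict[str, list[str]]:
    """Group observed third-party domains under their top-level owner.

    Assumption: each observed host has already been reduced to the domain
    form used in the list, so an exact match is enough here.
    """
    by_id = {e["id"]: e for e in entries}
    owner_of = {d: e for e in entries for d in e["domains"]}
    groups: dict[str, list[str]] = defaultdict(list)
    for domain in domains:
        entry = owner_of.get(domain)
        name = top_owner(entry, by_id)["owner_name"] if entry else "(unlisted)"
        groups[name].append(domain)
    return dict(groups)


observed = ["google-analytics.com", "doubleclick.net", "unlisted.example"]
print(group_by_top_owner(observed, ENTRIES))
# {'Google': ['google-analytics.com', 'doubleclick.net'], '(unlisted)': ['unlisted.example']}
```

Keeping unlisted domains in their own bucket, rather than dropping them, makes gaps in the list visible, which is where new contributions would help most.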

The third-party tracking ecosystem is currently free to operate, and to monetise individuals' personal data, without transparency. The domain ownership list is a step towards addressing this.

Why it is important to build it collaboratively

  • makes SARs possible, and lets us build the whole support pipeline around them
  • makes visualisations possible, perhaps not for the general public but at least for PDIO super users, to help them understand progress

What needs to be done for PDIO

  • define a format for the adtech entries
  • extend the SAR tool so that it can handle third-party situations

Wishlist (webXray side)

  • A framework for contributions and an easy submission format.
  • contributions!