WebXray domain ownership list

From Wikibase Personal data

Revision as of 10:59, 31 July 2019

Drawn from the webXray project (https://www.webxray.org), this list provides a hierarchical accounting of which entities own commonly found third-party domains on the web.

The database is hosted on GitHub (https://github.com/timlib/webXray_Domain_Owner_List).

What the database contains

The list is a JSON file. Each entry in the list has the following fields:

id: an integer identifier for the entry. It changes whenever the list is expanded and reindexed, so do not rely on it remaining stable.

parent_id: if the entity has a parent owner, the id of the parent

owner_name: a string giving the name of the service (e.g. 'Google Analytics') or the company (e.g. 'Google') that owns the domain

aliases: an array of strings giving alternate spellings of the owner_name (e.g. 'YouTube' and 'You Tube')

homepage_url: a string containing the URL of the homepage of the service or company

privacy_policy_url: a string containing the URL of the privacy policy of the service or company

notes: a string with pertinent information on why a domain was assigned to a given owner

country: the ccTLD (country-code top-level domain) of the country in which the service or company is based

uses: what a first party uses the service for. Note that first-party use may differ from the ultimate third-party use: for example, a site may use a third party's audience measurement tools to gain insights into its traffic, while the third party may use the same data for marketing.

platforms: where the domain has been observed; so far 'web', 'mobile', and 'email'

domains: an array of domain names (strings) owned by the given service or company
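The fields above are enough to map a third-party hostname to its owner and then walk up the ownership hierarchy. The Python sketch below is illustrative only and rests on three assumptions that this page does not confirm: the file's top level is a JSON array of entry objects, parent_id is null for entries without a parent, and a subdomain belongs to the same owner as a listed parent domain (so cdn.example.test matches example.test). The function names are hypothetical, and because id values change on reindexing, the lookup tables are rebuilt from each copy of the file rather than stored.

```python
import json
from typing import Optional


def load_entries(json_text: str) -> dict:
    """Parse the list and index its entries by numeric id.

    Assumes the top level of the file is a JSON array of entry objects.
    Ids change when the list is reindexed, so rebuild this for every copy.
    """
    return {entry["id"]: entry for entry in json.loads(json_text)}


def build_domain_index(entries: dict) -> dict:
    """Map every listed domain (lower-cased) to the id of its owning entry."""
    index = {}
    for entry in entries.values():
        for domain in entry.get("domains", []):
            index[domain.lower()] = entry["id"]
    return index


def find_owner_id(hostname: str, domain_index: dict) -> Optional[int]:
    """Return the owning entry id for a hostname, or None if unlisted.

    Tries the full hostname, then each parent domain (cdn.a.test, a.test).
    Assumption: a subdomain has the same owner as its listed parent domain.
    """
    labels = hostname.lower().rstrip(".").split(".")
    for i in range(len(labels) - 1):  # stop before the bare TLD
        candidate = ".".join(labels[i:])
        if candidate in domain_index:
            return domain_index[candidate]
    return None


def ownership_chain(entry_id: Optional[int], entries: dict) -> list:
    """Follow parent_id links upward, returning owner names from leaf to root.

    Assumption: parent_id is null (None) for entries without a parent.
    """
    chain = []
    seen = set()  # guards against accidental cycles in the data
    current = entry_id
    while current is not None and current not in seen:
        seen.add(current)
        entry = entries.get(current)
        if entry is None:  # dangling parent reference; stop here
            break
        chain.append(entry["owner_name"])
        current = entry.get("parent_id")
    return chain


# Invented toy data for illustration; these are not entries from the real list.
SAMPLE = """[
  {"id": 1, "parent_id": null, "owner_name": "Example Holdings", "domains": []},
  {"id": 2, "parent_id": 1, "owner_name": "Example Corp",
   "domains": ["example-corp.test"]},
  {"id": 3, "parent_id": 2, "owner_name": "Example Analytics",
   "domains": ["example-analytics.test"]}
]"""

entries = load_entries(SAMPLE)
index = build_domain_index(entries)
owner_id = find_owner_id("cdn.example-analytics.test", index)
print(ownership_chain(owner_id, entries))
# ['Example Analytics', 'Example Corp', 'Example Holdings']
```

The label-stripping match is deliberately naive: it never consults a public suffix list, so it only works when the listed domains are registrable names. A matcher aware of multi-part suffixes such as co.uk would be more robust.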

Why it is important

  • adtech sucks!
  • third-party tracking is the worst

Why it is important to build it collaboratively

  • we can send SARs (subject access requests) and build a whole support pipeline around them
  • we can build visualisations, perhaps not for the general public but at least for PDIO super users, to better understand progress

What needs to be done for PDIO

  • define a format for the adtech entries
  • extend the SAR tool so it can handle situations involving third parties

Wishlist (webXray side)

  • a framework for contributions and an easy submission format
  • contributions!
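A contribution framework might start with an automated check of each submitted entry. The sketch below assumes that submissions use the same JSON fields as the list itself. The helper name check_submission is hypothetical, and so is its rule that owner_name and at least one domain are required. It skips uses and platforms because this page does not state their JSON types.

```python
import json

# Types as described on this page. "uses" and "platforms" are not checked
# because the page does not state their JSON types.
STRING_FIELDS = ("owner_name", "homepage_url", "privacy_policy_url", "notes", "country")
STRING_LIST_FIELDS = ("aliases", "domains")


def check_submission(entry: dict) -> list:
    """Return human-readable problems with one submitted entry (empty if none).

    Hypothetical helper; the "required" rules below are assumptions.
    """
    problems = []
    for name in STRING_FIELDS:
        if name in entry and not isinstance(entry[name], str):
            problems.append(f"{name} should be a string")
    for name in STRING_LIST_FIELDS:
        value = entry.get(name, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{name} should be an array of strings")
    parent = entry.get("parent_id")
    # bool is a subclass of int in Python, so reject it explicitly.
    if parent is not None and (isinstance(parent, bool) or not isinstance(parent, int)):
        problems.append("parent_id should be an integer id or null")
    if not entry.get("owner_name"):
        problems.append("owner_name is required")
    if not entry.get("domains"):
        problems.append("at least one domain is required")
    return problems


submission = json.loads(
    '{"owner_name": "Example Analytics", "parent_id": 2,'
    ' "domains": ["example-analytics.test"]}'
)
print(check_submission(submission))  # []
```

Returning a list of messages instead of raising on the first error lets a contributor fix every problem in one pass.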