Difference between revisions of "WebXray domain ownership list"

From Wikibase Personal data

Revision as of 10:52, 31 July 2019

What the database contains

The list is a JSON file. Each entry in the list has the following fields:

id: a numeric identifier (integer) for the entry. It changes whenever the list is expanded and reindexed, so do not count on it remaining stable

parent_id: if the entity has a parent owner, the id of the parent

owner_name: a string giving the name of the service (e.g. 'Google Analytics') or the company (e.g. 'Google') that owns the domain

aliases: an array of strings representing possible alternate spellings of the owner_name (e.g. 'YouTube' and 'You Tube')

homepage_url: a string giving the URL of the homepage of the service or company

privacy_policy_url: a string giving the URL of the privacy policy of the service or company

notes: a string with pertinent information about why a domain was assigned to a given owner

country: the ccTLD for the country in which the service or company is based

uses: what a first party uses the service for. Note that the first-party use may differ from the ultimate third-party use. For example, a site may use audience measurement tools from a third party to gain insights into its traffic, while the third party may use the same data for marketing.

platforms: where the domain has been observed; the values so far are 'web', 'mobile', and 'email'

domains: an array of domain names (strings) which are owned by the given service or company
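
To illustrate how the fields fit together, here is a minimal Python sketch that looks up the owner of a domain and follows parent_id up to the top-level owner. It rests on assumptions not confirmed by the description above: that the file is a top-level JSON array of entry objects, and that parent_id is null when an entity has no parent. The two sample entries, their ids, and the helper names build_indexes and ownership_chain are illustrative only, not taken from the real list.

```python
import json

# Illustrative entries only: the ids and the choice of domains are
# placeholders for this sketch, not values copied from the real list.
# Assumption: the file is a JSON array and parent_id is null for top-level owners.
SAMPLE_JSON = """
[
  {"id": 1, "parent_id": null, "owner_name": "Google",
   "domains": ["google.com"]},
  {"id": 2, "parent_id": 1, "owner_name": "Google Analytics",
   "domains": ["google-analytics.com"]}
]
"""


def build_indexes(entries):
    """Return (by_id, by_domain) lookup dicts for a list of entries."""
    by_id = {entry["id"]: entry for entry in entries}
    by_domain = {}
    for entry in entries:
        for domain in entry.get("domains", []):
            by_domain[domain.lower()] = entry
    return by_id, by_domain


def ownership_chain(domain, by_id, by_domain):
    """Return owner names from the direct owner up to the top-level parent."""
    entry = by_domain.get(domain.lower())
    chain = []
    seen = set()  # guards against accidental parent_id cycles
    while entry is not None and entry["id"] not in seen:
        seen.add(entry["id"])
        chain.append(entry["owner_name"])
        parent_id = entry.get("parent_id")
        entry = by_id.get(parent_id) if parent_id is not None else None
    return chain


entries = json.loads(SAMPLE_JSON)
by_id, by_domain = build_indexes(entries)
print(ownership_chain("google-analytics.com", by_id, by_domain))
# prints ['Google Analytics', 'Google']
```

Because id can change when the list is reindexed, the indexes in this sketch should be rebuilt each time the file is loaded rather than stored alongside other data.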

Why it is important

  • adtech sucks!
  • third party tracking is the worst

Why it is important to build it collaboratively

  • can do SARs, and build the whole pipeline of support around that
  • can do visualisations, maybe not for the general public but at least for super users of PDIO to better understand progress

What needs to be done for PDIO

  • define format for the adtech entries
  • make the SAR tool more sophisticated so that it can leverage third-party situations

Wishlist WebXRay side

  • ???