Splitting the work

From Wikibase Personal data
Revision as of 13:45, 26 April 2019 by 192.168.0.1

There are many options for storing data and code in the Wikibase system:

  • as structured data in the item/property model;
  • as templates in the Template: namespace;
  • as modules in the Module: namespace, i.e. as Lua scripts (Scribunto);
  • as widgets in the Widget: namespace;
  • as JavaScript, following the user script / common.js model (User: and MediaWiki: namespaces).

The widget namespace probably only makes sense for the operation of the website itself.

Many components can and should be reused from Wikidata. For templates, modules, widgets, and scripts, the ones worth reusing probably also concern the operation of the website itself. For items and properties, reuse can be broader: a database of companies, for instance, is useful, and Wikidata certainly provides a starting point.
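Reusing items starts with fetching them from Wikidata. The sketch below builds a request to the `wbgetentities` action of the Wikidata API and reduces each returned entity to a minimal local stub. The stub format and helper names are hypothetical, and the sample response only mimics the API's JSON shape, with placeholder values.

```python
import json
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def wbgetentities_url(ids: list[str], language: str = "en") -> str:
    """Build a wbgetentities request for the labels and descriptions of some entities."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(ids),
        "props": "labels|descriptions",
        "languages": language,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def to_stub(entity: dict, language: str = "en") -> dict:
    """Reduce one entity to a minimal stub for a local Wikibase (hypothetical format)."""
    label = entity.get("labels", {}).get(language, {}).get("value", "")
    description = entity.get("descriptions", {}).get(language, {}).get("value", "")
    return {"wikidata_id": entity["id"], "label": label, "description": description}


# Abbreviated response in the API's shape; the ID and values are placeholders.
SAMPLE_RESPONSE = json.loads("""
{"entities": {"Q0": {"id": "Q0",
  "labels": {"en": {"language": "en", "value": "Example Corp"}},
  "descriptions": {"en": {"language": "en", "value": "example company"}}}}}
""")

stubs = [to_stub(entity) for entity in SAMPLE_RESPONSE["entities"].values()]
```

Keeping the Wikidata ID in each stub makes it possible to refresh or reconcile the local copy with Wikidata later.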

Beyond reuse, several factors determine how to distribute the work across these options:

  • access control (or the lack of it) can be decided at namespace granularity, with individual items additionally protectable. This is very important for enabling outside contributions as data, which quickly become very useful, and possibly for letting users design their own interfaces;
  • some data will remain on the user side and needs to be semantically aligned with the shared data, so it makes sense to keep some of the information in the database rather than in the code;
  • some data will be written by technical contributors who write data, not code.
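The first point, access decided per namespace with protection of individual items on top, can be modelled roughly as below. This is a sketch, not MediaWiki's actual permission system: the group names, the per-namespace choices, and the `AccessPolicy` class are all hypothetical. As with page protection in MediaWiki, protecting a single page only ever narrows access.

```python
from dataclasses import dataclass, field

# Hypothetical policy: user groups allowed to edit each namespace.
# Group names ("user", "sysop") and the restrictions chosen are assumptions.
NAMESPACE_EDITORS: dict[str, set[str]] = {
    "Item": {"user", "sysop"},      # open, so outside contributors can add data
    "Property": {"sysop"},          # the shared schema changes less freely
    "Template": {"user", "sysop"},
    "Module": {"sysop"},            # Lua code, site operators only
    "Widget": {"sysop"},            # raw HTML/JavaScript, site operators only
    "MediaWiki": {"sysop"},         # site-wide interface messages and scripts
}
DEFAULT_EDITORS = {"user", "sysop"}


@dataclass
class AccessPolicy:
    namespace_editors: dict[str, set[str]]
    # Page-level protection (e.g. of one item): title -> groups still allowed to edit.
    protected_pages: dict[str, set[str]] = field(default_factory=dict)

    def can_edit(self, title: str, groups: set[str]) -> bool:
        namespace = title.partition(":")[0] if ":" in title else ""
        allowed = self.namespace_editors.get(namespace, DEFAULT_EDITORS)
        if not allowed & groups:
            return False  # blocked at namespace granularity
        protection = self.protected_pages.get(title)
        return protection is None or bool(protection & groups)


policy = AccessPolicy(NAMESPACE_EDITORS, protected_pages={"Item:Q1": {"sysop"}})
```

With this policy, an ordinary user can edit most items but not the protected `Item:Q1`, and cannot touch Lua modules at all.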

A concrete example illustrates the last two points. The system might store in the user's personal data store that the "_ga" cookie on the website "example.com" had a given value. Alongside it, the system should also store that the user's email address is "joe@schmo.com". Both are attributes of the individual user, so they should be stored in the same way. The best way to achieve this is to have one item for "email address" and one for "_ga cookie on example.com". We call this principle "internet architecture as semantic data": whatever belongs to the architecture of the ecosystem is not personal data, and should be modelled following semantic principles. In this way, individual-level data can be clearly identified and manipulated in a privacy-preserving way.
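The example can be sketched as two layers: shared items that describe the architecture and carry no personal data, and a user-side store whose values are keyed by those items. The item IDs, the `PersonalDataStore` class, and the cookie value below are hypothetical placeholders; only the two items and the email address come from the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Item:
    """A shared item: part of the internet architecture, not personal data."""
    id: str
    label: str


# Hypothetical IDs; a real Wikibase assigns them when the items are created.
EMAIL_ADDRESS = Item("Q1", "email address")
GA_COOKIE_EXAMPLE_COM = Item("Q2", "_ga cookie on example.com")


@dataclass
class PersonalDataStore:
    """User-side store: every value is keyed by a shared item, whatever the attribute."""
    values: dict[str, str] = field(default_factory=dict)

    def set(self, item: Item, value: str) -> None:
        self.values[item.id] = value

    def get(self, item: Item) -> Optional[str]:
        return self.values.get(item.id)

    def forget(self, item: Item) -> None:
        """Delete one personal attribute; the shared item itself is untouched."""
        self.values.pop(item.id, None)


store = PersonalDataStore()
store.set(EMAIL_ADDRESS, "joe@schmo.com")
store.set(GA_COOKIE_EXAMPLE_COM, "placeholder-cookie-value")  # placeholder value
store.forget(GA_COOKIE_EXAMPLE_COM)  # e.g. the user clears the cookie record
```

Because the email address and the cookie are handled identically, code that lists, exports, or deletes a user's data needs no special cases, and the items themselves can stay public on the shared wiki.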

In general, the code base, distributed over many different languages and modes of operation, should be seen as a toolkit enabling others to craft their own experiments.