Difference between revisions of "Splitting the work"


Revision as of 08:24, 20 May 2019

There are many options for storing data and code in the Wikibase system:

  • as data entering the item/property paradigm;
  • as templates in the Template: namespace;
  • as modules in the Md: namespace, i.e. as Lua scripts;
  • as widgets in the Widget: namespace;
  • as JavaScript in the user script/common.js paradigm (User:/MediaWiki:).

The widget namespace probably only makes sense for the operation of the website itself.

Many components can and should be reused from Wikidata. For templates, modules, widgets, and scripts, the ones worth reusing probably also concern the operation of the website itself. For items and properties, the reuse might be broader: a database of companies is useful, for instance, and Wikidata certainly provides a starting point.

Otherwise, there are factors to take into account to distribute the work:

  • access control, or the lack thereof, can be decided at namespace granularity (bearing in mind that specific items can also be protected). This matters for enabling outside contributions of data that quickly become very useful, and possibly for letting users design their own interfaces;
  • some data will remain user-side, and needs to be semantically aligned. Therefore it makes sense to maintain some of the information within the database rather than the code;
  • some data will be written by technical contributors, who will not write code but data.

A concrete example can illustrate the last two points: the system might want to store in the user's personal data store that the "_ga" cookie on the website "example.com" had a given value. Alongside this, the system should also store that the user's email address is "joe@schmo.com". Both are attributes of the individual user, so they should be stored in the same way. The best way to achieve this is to have one item for "email address" and one for "_ga cookie on example.com". We call this principle "internet architecture as semantic data": whatever sits at the level of ecosystem architecture is not personal data, and should follow semantic principles. In this way, individual-level data can be clearly identified and manipulated in a privacy-preserving way.
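As a minimal sketch of this principle (not the system's actual data model), the two attributes can be represented as shared items, while the personal store only holds statements that point at those items. The item identifiers, the `Statement` class, the `describe` helper and the cookie value below are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass

# Hypothetical item identifiers; the real items and their IDs live in the Wikibase instance.
EMAIL_ADDRESS = "Q_EMAIL_ADDRESS"
GA_COOKIE_EXAMPLE_COM = "Q_GA_COOKIE_EXAMPLE_COM"

# Ecosystem-level descriptions: shared by everyone, not personal data.
ITEMS = {
    EMAIL_ADDRESS: "email address",
    GA_COOKIE_EXAMPLE_COM: "_ga cookie on example.com",
}


@dataclass(frozen=True)
class Statement:
    """One attribute of the individual user."""
    attribute: str  # identifier of a shared item
    value: str      # personal value, kept user-side


def describe(store):
    """Render each statement using the shared item labels."""
    return [f"{ITEMS[s.attribute]} = {s.value}" for s in store]


# The personal data store: both attributes are stored in the same way.
personal_store = [
    Statement(EMAIL_ADDRESS, "joe@schmo.com"),
    Statement(GA_COOKIE_EXAMPLE_COM, "placeholder-cookie-value"),
]

for line in describe(personal_store):
    print(line)
```

The shared `ITEMS` table carries no personal information, so it can be published and edited openly, while `personal_store` can stay on the client.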

In general, the code base, distributed over many different languages and modes of operation, should be seen as a toolkit enabling others to craft their own experiments.

Lessons from "telephone number"

We implemented a quick system for storing telephone number interface button (Q488) locally and interfacing with the server.

The perspective/goal has been refined: smooth distribution of data and computation across server and client.

Computing

The server computing environment could be:

  • gadgets;
  • user scripts;
  • other?

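The client computing environment could be:

  • the browser JavaScript environment;
  • a browser extension (in a background page or content scripts, see https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/Anatomy_of_a_WebExtension).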

The client is free to distribute its data, and the computations it needs to perform, however it pleases. The server serves the code that has to be run.

Static data

For static data, the closer we stay to storing RDF graphs, the better. RDF graphs (i.e. collections of triples) have the advantage that they are directly interoperable at a technical level: with a line-based serialization such as N-Triples, merging two graphs amounts to appending the two files. So whenever we store data, including in the browser, we should stay as close to that as possible.
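The "append the two files" idea can be sketched as follows, assuming both graphs are serialized as N-Triples (one triple per line) and contain no blank nodes, whose labels would otherwise need renaming before a merge. The function name `merge_ntriples` and the `example.org` IRIs are made up for this illustration.

```python
def merge_ntriples(*documents: str) -> str:
    """Merge N-Triples documents by concatenating their lines.

    Blank lines and comments are skipped, and duplicate triples are kept once.
    Assumes the documents contain no blank nodes.
    """
    seen = set()
    merged = []
    for document in documents:
        for line in document.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            if line not in seen:
                seen.add(line)
                merged.append(line)
    return "".join(line + "\n" for line in merged)


# Illustrative graphs: one as served by the server, one held by the client.
server_graph = (
    '<https://example.org/item/email> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "email address" .\n'
)
client_graph = (
    '<https://example.org/item/email> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "email address" .\n'
    '<https://example.org/user/me> '
    '<https://example.org/item/email> "joe@schmo.com" .\n'
)

merged_graph = merge_ntriples(server_graph, client_graph)
print(merged_graph, end="")
```

No schema negotiation is needed for the merge itself, which is the sense in which triples are interoperable at a technical level; agreeing on what the identifiers mean is a separate question.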

Semantics

Beyond mere storage of data, the semantics matter as well. For interoperability reasons (eventually there will be multiple servers), it makes sense to get as far away as possible from the intricacies of Wikibase, and to refer to concepts using external tags, possibly coming from well-known ontologies. This holds on the server side as well as on the client side.
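One way to picture this is an alignment table from local Wikibase identifiers to external terms, applied before data leaves the server or the client. The sketch below is only an illustration: the choice of schema.org terms, the identifier `Q_EMAIL_ADDRESS`, the helper `to_external` and the sample values are assumptions, not an alignment defined by this wiki.

```python
# Assumed alignment from local identifiers to external ontology terms.
EXTERNAL_IRI = {
    "Q488": "https://schema.org/telephone",        # assumed match for the telephone number item
    "Q_EMAIL_ADDRESS": "https://schema.org/email",  # hypothetical local identifier
}


def to_external(triples):
    """Rewrite predicates that have a known external equivalent.

    Predicates without an alignment are left unchanged.
    """
    return [
        (subject, EXTERNAL_IRI.get(predicate, predicate), obj)
        for subject, predicate, obj in triples
    ]


# Locally stored statements, keyed by Wikibase-specific identifiers (placeholder values).
local_triples = [
    ("user:me", "Q488", "+00 000 000 000"),
    ("user:me", "Q_EMAIL_ADDRESS", "joe@schmo.com"),
    ("user:me", "Q_UNALIGNED", "some value"),
]

for triple in to_external(local_triples):
    print(triple)
```

Data exported this way can be read by another server or client without knowing anything about the local item numbering.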

Conclusions