Linked Data on the Move

February 2021

My current main research project is Solid + WebMonetization: How can creators get paid for the content they publish on Solid? A simple first step was linking each Solid pod to a Payment Pointer, but once we did that, we bumped into some more fundamental unanswered questions. Other than looking at tags in the current HTML page, which is quite imprecise, what are ways to discover the creator of content on Solid? And how do you pay one of your Solid friends?

I created some draft building blocks in the Solid OS MoneyPane, but I soon realized a lot of plumbing is missing if we want creators to publish content on Solid in the first place. Our existing Linked Data framework is lacking some functionality if we want to use it for data that moves around between creators, data owners, data hosters, and consumers.

Social Linked Data

The name 'Solid' comes from 'SOcial LInked Data'. Linked data is data with hyperlinks in it that point to related pieces of data, which can in turn be linked again, to form a web. We know the web of documents works best if documents stay at the URL where they were first published. This works well for documentation, but for social and personal data it's not always a reasonable requirement.

Users of any social application will want data portability: the ability to move data around. With IndieHosters we initially set out to offer managed personal server hosting, including a domain name registration, but in a federated network of hosting providers. That way, a user could seamlessly take their domain name and personal server from one provider to the next, at least within the network of IndieHosters-compatible personal server hosting providers. That idea didn't really take off though, and nowadays most Solid users don't own their URLs, mainly because a Solid pod on a subdomain is free, and a domain name costs money.

So when you switch to a different pod provider, all your URLs change. And not only that: if you were given access to data through your WebID, as is the basis of Web Access Control (WAC), then your new WebID will not give you the same access. This means that switching from one WebID or pod provider to another is painful. It's OK if documentation is linked to hard-coded URLs, but for personal data it's less desirable.

Another point where hyperlinks break is when data is sent in a message. The sender and the receiver of a message will both have a record of the message that was sent. And the message itself, even in flight, can also contain relative URLs inside itself.

And last but not least, when data is used socially, the person hosting the data in their pod is not necessarily the authoritative author of that data. This means that whenever we try to build pilot projects with Solid, a lot of use cases end up using verifiable credentials where an additional signature determines authority, instead of straight-up linked data where the hostname determines authority.

So to make Solid a success, we should stop thinking only of linked data, and start thinking of linked data on the move, not least for determining who should be allowed to monetize a given piece of content. In Solid, the owner of the pod where the content is hosted will not automatically be the author of that content. The author's identity should be annotated inside the content as it moves around, gets sent and received, and gets included in news feeds and on web pages. We should decouple the author identity from the hoster identity.

Solid Messaging

Another thing we haven't fully worked out yet, especially when thinking about people who may sometimes change their WebID, is messaging. In each WebID profile on Solid, there is a link to an LDP Inbox, where messages can be delivered with an unauthenticated HTTP POST. When this happens, the recipient has no way of knowing who sent that message. So for contacts with whom you converse repeatedly, you will want to set up a sender-specific inbox. That inbox would then only allow the WebIDs of that sender to POST to it, so you can know for sure who sent the message.
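To make the sender-specific inbox a bit more concrete, here is a minimal sketch of the kind of WAC document that could guard it. It uses the standard acl: vocabulary (acl:agent, acl:accessTo, acl:mode), but the function name, the example URLs, and the choice to give the owner full control are my own assumptions for illustration, not what Solid OS actually writes.

```python
def sender_inbox_acl(inbox_url: str, owner_webid: str, sender_webids: list[str]) -> str:
    """Build a Turtle ACL for an inbox container that only one sender may POST to.

    Hypothetical helper: the rule names (#owner, #sender) are arbitrary.
    """
    if not sender_webids:
        raise ValueError("a sender-specific inbox needs at least one sender WebID")
    # One authorization can list several agents, e.g. an old and a new WebID.
    agents = ", ".join(f"<{webid}>" for webid in sender_webids)
    return f"""@prefix acl: <http://www.w3.org/ns/auth/acl#>.

<#owner>
    a acl:Authorization;
    acl:agent <{owner_webid}>;
    acl:accessTo <{inbox_url}>;
    acl:default <{inbox_url}>;
    acl:mode acl:Read, acl:Write, acl:Control.

<#sender>
    a acl:Authorization;
    acl:agent {agents};
    acl:accessTo <{inbox_url}>;
    acl:mode acl:Append.
"""


acl = sender_inbox_acl(
    "https://alice.example/inbox-bob/",
    "https://alice.example/profile/card#me",
    ["https://bob.example/profile/card#me"],
)
```

The sender only gets acl:Append on the container, which is enough to POST a new message but not to read or overwrite what is already there.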

Also, when you read messages in either your global inbox or one of your sender-specific inboxes, you will want to have a way to mark them as read. The way we currently implemented this in solid-logic is by moving read messages to a date-indexed archive.
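As a rough illustration of what "date-indexed" could mean, the sketch below computes where a read message would end up. The year/month/day folder layout and the function name are assumptions made for this example; the layout solid-logic actually uses may differ.

```python
from datetime import datetime, timezone
from posixpath import basename
from urllib.parse import urlparse


def archive_url(archive_root: str, message_url: str, received: datetime) -> str:
    """Return the archive URL for a message that has been marked as read.

    Assumed layout: <archive_root>/YYYY/MM/DD/<original file name>.
    """
    # Normalize to UTC so the same message always lands in the same folder.
    day = received.astimezone(timezone.utc).strftime("%Y/%m/%d")
    name = basename(urlparse(message_url).path)
    return f"{archive_root.rstrip('/')}/{day}/{name}"


target = archive_url(
    "https://alice.example/archive/",
    "https://alice.example/inbox/msg1.ttl",
    datetime(2021, 2, 3, 12, 0, tzinfo=timezone.utc),
)
# target == "https://alice.example/archive/2021/02/03/msg1.ttl"
```

Marking a message as read then means writing it to the target URL and deleting it from the inbox.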

To discover a person's global inbox, you can just follow the link from one of their WebIDs. But when they assign you a sender-specific inbox, you first need to read and understand the notification they send you about that, and then you have to somehow store the information, as an attribute of that person, in your address book.

When you move from one pod provider to another, if your WebID doesn't change, then you can update the global inbox triple in your profile, and you only need to send notifications about the sender-specific inboxes that move. These senders should then parse these notifications and apply them to their address books, so that next time they message you, the message will go to the new sender-specific inbox which you created at the new pod provider. If your WebID also changes, then the thing to do would be to first send out notifications that you have a new WebID. People can then update their ACL documents and their address book entries about you to reference both your old and your new WebID. After that, you start using only your new one, and finally you can send a message to deprecate the old WebID.
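The receiving side of this migration can be sketched as a small state update on an address book entry. The notification types ("new-webid", "inbox-moved", "webid-deprecated") and the Contact structure are hypothetical names invented for this sketch; no such message vocabulary has been standardized here.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    """One address book entry (hypothetical structure)."""
    webids: set[str] = field(default_factory=set)
    inbox: str = ""  # the sender-specific inbox this contact assigned to us


def apply_notification(contact: Contact, sender: str, note: dict) -> None:
    """Apply a migration notification, but only if it comes from a known WebID."""
    if sender not in contact.webids:
        raise PermissionError("notification is not from a known WebID of this contact")
    kind = note["type"]
    if kind == "new-webid":
        # Keep both WebIDs for a while, so old and new keep working side by side.
        contact.webids.add(note["webid"])
    elif kind == "inbox-moved":
        contact.inbox = note["inbox"]
    elif kind == "webid-deprecated":
        if len(contact.webids) < 2:
            raise ValueError("refusing to remove the only known WebID")
        contact.webids.discard(note["webid"])
    else:
        raise ValueError(f"unknown notification type: {kind}")


old = "https://alice.old.example/profile/card#me"
new = "https://alice.new.example/profile/card#me"
alice = Contact(webids={old}, inbox="https://alice.old.example/inbox-bob/")

apply_notification(alice, old, {"type": "new-webid", "webid": new})
apply_notification(alice, new, {"type": "inbox-moved", "inbox": "https://alice.new.example/inbox-bob/"})
apply_notification(alice, new, {"type": "webid-deprecated", "webid": old})
```

The same list of WebIDs would also be what you put in your ACL documents during the overlap period, so that access granted to the old WebID carries over to the new one.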

All of this is basic plumbing for Social Linked Data, and we're currently implementing it as part of Solid OS.

Federated Bookkeeping

Apart from allowing linked data to be moved, I had to finally admit we should also allow it to be copied. I was always a big advocate of "data at the source", but now that I'm working more and more with Federated Bookkeeping, I realized that you cannot always just link to the system-of-record; you sometimes want to keep a copy for your own administration. This is a huge step away from the basic design of linked data, but I think it's the right one for Solid to take at this point, at least when we use Solid for bookkeeping.

One thing I kept running into is that I wanted to import my bank statements, converting them to a richer data format, but without losing the original. My bank doesn't allow me to retrieve my MT940 bank statement exports with HTTPS+WAC yet, so I need to download them from my bank and upload them to my pod. There, I keep a copy of the original, some metadata about the provenance of the original document, and then my converted version of it.
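A minimal sketch of such a provenance record could look like this. The field names, the paths, and the idea of storing a SHA-256 hash of the original are my assumptions for the example, not an existing Solid vocabulary.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def provenance_record(
    original: bytes,
    source: str,
    original_path: str,
    converted_path: str,
    imported_at: Optional[datetime] = None,
) -> dict:
    """Describe where an imported document came from and where its copies live."""
    if imported_at is None:
        imported_at = datetime.now(timezone.utc)
    return {
        "original": original_path,    # untouched copy of the bank's export
        "converted": converted_path,  # richer version derived from it
        "source": source,             # free-text note on where the file came from
        "sha256": hashlib.sha256(original).hexdigest(),
        "importedAt": imported_at.isoformat(),
    }


statement = b":20:STATEMENT-EXAMPLE\n"  # placeholder bytes, not a real MT940 file
record = provenance_record(
    statement,
    source="manual download from my bank",
    original_path="bookkeeping/originals/2021-01.mt940",
    converted_path="bookkeeping/converted/2021-01.ttl",
    imported_at=datetime(2021, 2, 1, tzinfo=timezone.utc),
)
```

The hash lets you check later that the stored original is still the exact file the converted version was derived from.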

The same happens with peer-to-peer messaging. Last year, I presented SNAP on Solid at the global Solid World meetup. In that implementation, each peer would keep a copy of each message they send in a 'sent box', and a copy of each message they receive in a sender-specific inbox like the ones described above. So this is not linked data in the classical sense. This is data inside a message that was sent at some point in time, and of which different parties (the sender and the receiver) keep a copy. Neither the sender's copy nor the recipient's copy is the "real" main copy of that message.
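The "two copies, no main copy" pattern can be sketched with two in-memory pods. This is only an illustration of the idea, not the SNAP on Solid implementation; the folder names and the dict-based pods are assumptions.

```python
def send(sender_pod: dict, recipient_pod: dict, sender: str, recipient: str,
         msg_id: str, body: str) -> None:
    """Store one copy in the sender's sent box and one in the recipient's inbox."""
    # The sender keeps their own record of what was sent.
    sender_pod.setdefault(f"sent/{recipient}/", {})[msg_id] = body
    # The recipient gets an independent copy in a sender-specific inbox.
    recipient_pod.setdefault(f"inbox/{sender}/", {})[msg_id] = body


alice_pod: dict = {}
bob_pod: dict = {}
send(alice_pod, bob_pod, "alice", "bob", "msg-1", "I owe you 10 euros")

# Both parties now hold the same content, at different URLs, with neither being "the" original.
same = alice_pod["sent/bob/"]["msg-1"] == bob_pod["inbox/alice/"]["msg-1"]
```

In a real deployment the second write would be an HTTP POST to the recipient's sender-specific inbox, and either copy could later be moved or archived without affecting the other.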

Linked but not Stationary

In all of this, I didn't even mention how data moves between apps and pods. A user may use a mobile device to run a local-first app, which asynchronously synchronizes its data to the user's pod. There may be multiple devices competing to edit the same resource. There may even be multiple users combining their incremental edits collaboratively. This is, however, a less fundamental deviation from URL-based data, since the copy on the pod is always the authoritative one. The devices sit around it in a hub-and-spokes formation.

We could use proper device-to-device messaging, for instance via WebRTC, and this may of course be in scope for Web Monetization, for instance when you pay for a video call with a coach or therapist. But that is mainly out of scope for Solid. We have no way to link to data that moves neither from nor to a (pod) server.

In short, I think Solid is an evolution of the semantic web. It sits at the sweet spot between classic web servers and more dynamic ways to use (personal) data. Solid extends the semantic web to allow data not only to link, but also to move. Hence the title of this blog post: Linked data on the move!