GeoPub: ActivityPub for content curation

Mon 06 April 2020

As previously mentioned, openEngiadina is developing a platform for creating, managing and using crowd-sourced data in meaningful ways.

It is my pleasure to announce an initial demonstrator of GeoPub, an application for curating crowd-sourced content.

For the impatient: Link to the demo.

GeoPub is still rough around the edges and limited in functionality, but I hope it is good enough to demonstrate two ideas:

  1. How the ActivityPub protocol can be used to create, share and discuss any kind of structured content.
  2. How a content-management system for crowd-sourced data might work.

This post walks through an extended demo of the initial GeoPub version, takes a peek under the hood, discusses known issues and planned features, and closes with an outlook on next steps.

For more background on the Semantic Web and Linked Data see a previous post.

GeoPub

Demo time! GeoPub is a web application and we have an instance hosted here. You can also get the sources here and compile it yourself.

On startup GeoPub will attempt to load some initial data. This might take a few seconds.

Once loaded, you will see a screen showing recent activity:

Activity

Every line corresponds to an activity performed by an actor (a person, organization or similar):

Create a note

We see four pieces of information:

  1. The actor who performed the activity (openengiadina).
  2. The type of activity performed (Create).
  3. The object the activity was performed on (Note).
  4. When the activity was performed.

The types of activities are defined by the ActivityStreams Vocabulary.
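To make the four pieces of information concrete, here is a minimal sketch in Python (not GeoPub's ClojureScript) of how they could be read from an ActivityStreams activity encoded as JSON. The identifiers, the timestamp and the `summarize` helper are made up for illustration and are not taken from the demo data.

```python
import json

# Hypothetical Create activity, modelled on the "Hello World!" note in the demo.
# All identifiers and the timestamp are invented for this example.
ACTIVITY_JSON = """
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://example.org/activities/1",
  "type": "Create",
  "actor": "https://example.org/users/openengiadina",
  "published": "2020-04-06T10:00:00Z",
  "object": {
    "id": "https://example.org/objects/1",
    "type": "Note",
    "content": "Hello World!"
  }
}
"""


def summarize(activity):
    """Return the four pieces of information shown per line in the activity view."""
    obj = activity["object"]
    # The object may be embedded (a dict) or only referenced by its id (a string).
    if isinstance(obj, dict):
        object_label = obj.get("type", obj.get("id"))
    else:
        object_label = obj
    return {
        "actor": activity["actor"],
        "type": activity["type"],
        "object": object_label,
        "published": activity.get("published"),
    }


summary = summarize(json.loads(ACTIVITY_JSON))
print(summary)
```

The actor and the object are identified by URLs, which is what makes them followable links in the interface.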

Note that the actor, the activity type and the object are links that you can follow. For example you can follow the Note link.

This will take you to the "browse" view:

Browse note

In the center pane you see the content of the Note ("Hello World!") and some additional information such as the identifier and type of the note.

In the right-side view related activities are displayed (for the Note we are viewing there is only the initial Create activity).

On the left there is a menu for browsing different types of content.

Activities as content

The activity itself is just "content": it has its own identifier and can be viewed like the note by following the Create link:

Browse activity

We see that the created note was made public by addressing it to the special public collection (https://www.w3.org/ns/activitystreams#Public).
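A client can detect public addressing by looking for this identifier in the addressing fields of an activity. The sketch below is a simplified illustration in Python: it only inspects `to` and `cc` and only matches the full IRI, whereas a real client would also have to deal with the other addressing fields and with compacted forms of the identifier. The `is_public` helper and the example activities are hypothetical.

```python
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def as_list(value):
    """Addressing fields may hold a single value or a list; normalize to a list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def is_public(activity):
    """Return True if the activity is addressed to the public collection.

    Simplification: only the `to` and `cc` fields are checked.
    """
    for field in ("to", "cc"):
        if PUBLIC in as_list(activity.get(field)):
            return True
    return False


# Hypothetical activities for illustration.
public_note = {"type": "Create", "to": [PUBLIC]}
private_note = {"type": "Create", "to": "https://example.org/users/bob"}

print(is_public(public_note), is_public(private_note))
```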

Events

So far there is not much that sets GeoPub apart from any other ActivityPub client such as the Mastodon or Pleroma frontends (except that GeoPub does not look as nice and seems overly complicated). What makes GeoPub different is that it displays content regardless of its type.

For example the demo user alice created an event:

Alice created an event

When following the link My super cool event we see that the type of the object alice created is an "Event":

An event

In the "Related Activities" pane you see that Alice created the event, announced it and then Bob created a note that is somehow related to the event. The user interface for displaying the related activities is of course very rudimentary. For example you would want to see the content of Bob's note immediately when viewing the event.

Note that we are still in the realm of what is currently done with ActivityPub. Gancio and Mobilizon are applications that are specialized for creating, sharing and interacting with events over ActivityPub.

GeoPub on the other hand does not have any special knowledge about events. It does not treat the event type as a special type. It is not intended for creating content of a specific type. GeoPub is intended for browsing and curating a wide range of different content types. An event, a description of a hiking trail or a toot are equally interesting.

Let's see some content types for which there are no ActivityPub applications yet.

Ice cream shop

The user bob has created something with the name "Mauri Gelateria":

Create Mauri Gelateria

We follow the link and see the created content:

Mauri Gelateria

The type is IceCreamShop. We can even follow the IceCreamShop link and get more information on the type:

IceCreamShop

The type is defined by the schema.org project, a collection of types for defining structured data. Schema.org defines many more types, along with properties that describe them, all of which can be used to describe things such as our Gelateria.

Given the right interface anybody can create structured content on their business, their organization or almost anything, publish it and allow social interactions via ActivityPub on the created content.

For example I can like the created ice cream shop:

Like Mauri

This is the key idea we are trying to develop with openEngiadina: social interaction on arbitrary structured content.
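As a rough illustration of why this works for arbitrary content, the Python sketch below builds a Like activity that refers to its object only by identifier, so the activity looks the same whether the object is a Note, an Event or an IceCreamShop. The identifiers and the `make_like` helper are hypothetical; only the `IceCreamShop` type and the name "Mauri Gelateria" come from the demo.

```python
def make_like(actor_id, object_id):
    """Build a Like activity that points at any object by its identifier."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Like",
        "actor": actor_id,
        "object": object_id,
    }


# Structured content described with a schema.org type (identifier is made up).
ice_cream_shop = {
    "@context": "https://schema.org",
    "@id": "https://example.org/objects/mauri-gelateria",
    "@type": "IceCreamShop",
    "name": "Mauri Gelateria",
}

like = make_like("https://example.org/users/alice", ice_cream_shop["@id"])
print(like)
```

Nothing in `make_like` depends on the type of the liked object, which is the point: the social interaction is independent of the kind of content.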

The Fediverse and beyond

GeoPub is capable of getting data from many different sources.

For example it can get data from the Fediverse and display it:

Framasoft Activity

The content is fetched directly from the Framasoft ActivityPub outbox. We also fetch the actor profile:

Framasoft Activity

This data is loaded on initialization as sample data.

Note that there is an issue with CORS that might prevent you from getting this data. See the CORS section under "Issues and what's planned" below for a way around it.

Embedded Content

Many websites publish embedded structured content in their HTML (using RDFa). GeoPub is capable of getting this data.

For example, this site (inqlab.net) has embedded metadata that can be fetched by entering the URL in the "Enter URL" form:

Enter URL

The data will be loaded and displayed:

inqlab.net

Another nice example is radar.squat.net, an online portal of events and groups. Structured content is embedded in the HTML and can be read by GeoPub:

Events from radar.squat.net

Many websites publish such structured content and GeoPub can fetch it and use it.
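To give a feel for what embedded structured content looks like, here is a deliberately tiny Python sketch that collects `property` attributes from HTML using only the standard library. It is not a real RDFa processor (it ignores `vocab`, `typeof`, prefixes and nesting) and it is not how GeoPub does it; the HTML snippet and the `PropertyCollector` class are invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page with RDFa-style attributes (not copied from any real site).
SAMPLE_HTML = """
<html><head>
<meta property="schema:name" content="Example site">
</head>
<body vocab="https://schema.org/" typeof="Event">
<h1 property="name">Example event</h1>
<time property="startDate" content="2020-04-06">6 April 2020</time>
</body></html>
"""


class PropertyCollector(HTMLParser):
    """Collect (property, value) pairs from elements with a `property` attribute.

    The value is taken from the `content` attribute if present,
    otherwise from the first non-empty text that follows the start tag.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # property name still waiting for element text

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("property")
        if prop is None:
            return
        if "content" in attributes:
            self.pairs.append((prop, attributes["content"]))
        else:
            self._pending = prop

    def handle_data(self, data):
        text = data.strip()
        if self._pending is not None and text:
            self.pairs.append((self._pending, text))
            self._pending = None


collector = PropertyCollector()
collector.feed(SAMPLE_HTML)
collector.close()
print(collector.pairs)
```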

Curating crowd-sourced data

As seen above, the Fediverse and the whole Internet serve as sources of content that can be browsed with GeoPub. The amount of information is almost unlimited, whereas the quality of the data is not always up to standard.

It seems necessary for individuals and organizations to be able to create curated selections of content and share these selections.

On the Fediverse "Announcing" or "Boosting" is one way of curating content ("retweet" on the birdsite) - content that has been previously created is made visible to your followers.

A more elaborate way of doing curation is by creating collections. For example, adding existing content to a collection called "Events to publish on website". The interaction pattern is almost identical to announcing or boosting content, except that you need to choose to which collection the content should be added.

The content of this collection "Events to publish on website" can be made public and can be used by a static-site generator or a WordPress plugin to render on a website.
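One way to picture this interaction is with the `Add` activity from the ActivityStreams vocabulary, which names an object and a target collection. The Python sketch below is an assumption about how such curation could look, not a description of what openEngiadina implements; the identifiers, the in-memory `collections` mapping and both helper functions are hypothetical.

```python
def make_add(actor_id, object_id, collection_id):
    """Build an Add activity: put an existing object into a target collection."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Add",
        "actor": actor_id,
        "object": object_id,
        "target": collection_id,
    }


def apply_add(collections, activity):
    """Apply an Add activity to an in-memory mapping of collection id -> item ids."""
    items = collections.setdefault(activity["target"], [])
    if activity["object"] not in items:  # adding the same item twice has no effect
        items.append(activity["object"])
    return collections


# Hypothetical identifiers for illustration.
WEBSITE_EVENTS = "https://example.org/collections/events-to-publish-on-website"
collections = {}
add = make_add(
    "https://example.org/users/alice",
    "https://example.org/objects/my-super-cool-event",
    WEBSITE_EVENTS,
)
apply_add(collections, add)
apply_add(collections, add)
print(collections[WEBSITE_EVENTS])
```

A static-site generator would then only need to read the items of that one collection.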

We are working on implementing the ability to create collections of content in openEngiadina by implementing a Linked Data Platform - a bit like WebDAV for structured content. This is also the base specification of Solid.

For now we have implemented a simpler version of this: Liking - or in other words - adding things to a collection of liked content.

If authenticated you can like your favorite ice cream shop from GeoPub:

Like an ice cream shop

Under the hood

A quick peek under the hood of GeoPub.

ClojureScript

GeoPub is implemented in ClojureScript. We make extensive use of language-specific functionality and libraries in GeoPub (e.g. core.logic). ClojureScript seems to be very well suited for "data-driven" applications.

Most importantly, ClojureScript is fun. Have a look at the code.

Parsers

A key functionality of GeoPub is the ability to get structured data from various sources in different formats.

This is done using the RDF Parse JavaScript library, which can parse data in Turtle, JSON-LD and RDFa (among others).

core.logic

In order to be able to handle all the data from various sources GeoPub implements a query engine using miniKanren, a very simple and lightweight embedded language for logic programming. We use the core.logic implementation of miniKanren in Clojure.

For example, this allows us to very nicely query for all activities related to something:

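;; Find the ids of all activities that refer to `something`.
(run* [id]
  ;; id has type as:Activity (sub-types such as as:Create are inferred)
  (rdf-logic/graph-typeo graph id (as "Activity"))
  ;; some property p links id to something
  (fresh [p]
    (rdf-logic/graph-tripleo graph id p something)))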

We query for the ids of all activities that have something as object for any property p.

The graph-typeo relation also handles type inference. We do not have to specify all the types of activities but can use the fact that specific activities (such as as:Create) are defined as sub-types of the as:Activity type.
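For readers unfamiliar with logic programming, the same idea can be sketched in plain Python over a set of triples. This is not miniKanren and not GeoPub's implementation: it is a straightforward loop that first computes all sub-types of `as:Activity` and then filters. The prefixed names, the sample triples and both helper functions are assumptions made for the example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def subtypes_of(triples, cls):
    """Return cls together with all of its transitive sub-classes."""
    found = {cls}
    changed = True
    while changed:  # repeat until no new sub-class is discovered
        changed = False
        for s, p, o in triples:
            if p == SUBCLASS_OF and o in found and s not in found:
                found.add(s)
                changed = True
    return found


def related_activities(triples, something):
    """Ids of all activities that have `something` as object of any property."""
    activity_types = subtypes_of(triples, "as:Activity")
    activities = {s for s, p, o in triples if p == RDF_TYPE and o in activity_types}
    return sorted({s for s, p, o in triples if s in activities and o == something})


# Hypothetical graph for illustration.
triples = {
    ("as:Create", SUBCLASS_OF, "as:Activity"),
    ("as:Like", SUBCLASS_OF, "as:Activity"),
    ("ex:create-1", RDF_TYPE, "as:Create"),
    ("ex:create-1", "as:object", "ex:note-1"),
    ("ex:like-1", RDF_TYPE, "as:Like"),
    ("ex:like-1", "as:object", "ex:note-1"),
    ("ex:create-2", RDF_TYPE, "as:Create"),
    ("ex:create-2", "as:object", "ex:note-2"),
    ("ex:note-1", RDF_TYPE, "as:Note"),
    ("ex:alice", "ex:bookmarked", "ex:note-1"),  # not an activity, so ignored
}

print(related_activities(triples, "ex:note-1"))
```

The logic-programming version expresses the same two constraints declaratively and leaves the search strategy to the engine.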

We intend to implement more such nice things (along the lines of Simple and Efficient Minimal RDFS) and also make the query engine directly usable from the interface.

Issues and what's planned

GeoPub is still far from being usable. Some of the issues we are aware of and some things we are planning to implement:

CORS

For security reasons browsers restrict HTTP requests initiated by scripts to sites other than the origin of the script (cross-origin requests).

This makes a lot of sense for the usual untrusted JavaScript that gets loaded from random sources. However, it limits applications that get data from many different sources, such as GeoPub.

Sites can opt in to allowing cross-origin requests via a mechanism called Cross-Origin Resource Sharing (CORS). This needs to be enabled manually on the server side. In practice it is impossible to get all server administrators to enable CORS, which in turn makes it practically impossible to implement a purely client-side Linked Data web application.
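The core of the opt-in is a response header, `Access-Control-Allow-Origin`, that the server must send. The Python sketch below approximates the check a browser performs for a simple request without credentials; real browsers do considerably more (preflight requests, credentials, allowed methods and headers), so treat it as an illustration only. The `cors_allows` helper and the origins are hypothetical.

```python
def cors_allows(origin, response_headers):
    """Approximate whether a response may be read by a script from `origin`.

    Simplification: only Access-Control-Allow-Origin is considered.
    """
    # Header names are case-insensitive, so normalize them first.
    headers = {name.lower(): value for name, value in response_headers.items()}
    allowed = headers.get("access-control-allow-origin")
    if allowed is None:
        return False  # the server did not opt in
    allowed = allowed.strip()
    return allowed == "*" or allowed == origin


# Hypothetical origin and responses for illustration.
APP_ORIGIN = "https://geopub.example.org"

print(cors_allows(APP_ORIGIN, {"Access-Control-Allow-Origin": "*"}))
print(cors_allows(APP_ORIGIN, {"Content-Type": "text/turtle"}))
```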

We are thinking about ways to get around this limitation. If you have ideas, we would love to hear about them. Please get in contact.

If you would like to try the complete "Linked Data" experience now, I recommend using the CORS Everywhere plugin for Firefox.

There are other related issues that make consuming Linked Data from web clients challenging, such as mixed content.

Map

GeoPub has Geo in the name because it was initially conceived to handle geospatial data (such as points of interest, hiking trails, organizations with physical locations).

We intend to implement a map showing content that has location data attached.

Along the same lines we intend to implement a calendar view for data with dates (such as events).

Multilingual

RDF explicitly allows tagging the language of content. A note can have content in four different languages:

@prefix as: <https://www.w3.org/ns/activitystreams#> .

<https://openengiadina.net/objects/sample-note>
    a as:Note ;
    as:content "Good day!"@en ;
    as:content "Guten Tag!"@de ;
    as:content "Grüezi!"@gsw ;
    as:content "Bun di!"@roh .

We intend to use this feature in the data model to pick the most appropriate translation for content.

The example above was encoded in Turtle. Multilingual content can also be encoded in JSON-LD:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Note",
  "contentMap": {
    "en": "Good day!",
    "de": "Guten Tag!",
    "gsw": "Grüezi!",
    "roh": "Bun di!"
  }
}
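How the most appropriate translation gets picked is not decided yet. One simple possibility, sketched here in Python, is to walk through the user's preferred languages in order, try an exact match first and then the primary language subtag, and fall back to any available translation. The `pick_translation` helper and this matching strategy are assumptions for illustration, not GeoPub's behaviour.

```python
def pick_translation(content_map, preferred_languages):
    """Pick the best entry of a contentMap for an ordered list of language tags."""
    for language in preferred_languages:
        # Try the exact tag first ("de-CH"), then its primary subtag ("de").
        for candidate in (language, language.split("-")[0]):
            if candidate in content_map:
                return content_map[candidate]
    # Nothing matched: fall back to the first available translation, if any.
    return next(iter(content_map.values()), None)


content_map = {
    "en": "Good day!",
    "de": "Guten Tag!",
    "gsw": "Grüezi!",
    "roh": "Bun di!",
}

print(pick_translation(content_map, ["roh", "de"]))
print(pick_translation(content_map, ["de-CH"]))
```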

Persistent storage

Currently GeoPub does not have any persistent storage. We intend to implement persistent storage using the IndexedDB API.

This is an important step towards making GeoPub local-first software.

What's next?

Next up is a quick research stint into how Linked Data and RDF can be used in even more decentralized ways.

Unfortunately some aspects of the existing Semantic Web stack require a lot of centralized state manipulation (e.g. Linked Data Platform or Web Access Control) which seem unsuitable for decentralized platforms and also massively increase the complexity of servers.

Our motivation for doing this research is to keep any server logic as simple as possible and allow a transition to a completely decentralized architecture, while being compatible and data-interoperable with existing technology and data.

We want to explore and research:

Acknowledgments

This work was supported by the NLnet Foundation through the NGI0 Discovery Fund. Hosting of the demo instance is provided to us by ungleich.

I am very grateful for feedback, comments and questions.

GeoPub and openEngiadina are a collaborative effort. Join us in our Matrix channel.

pukkamustard