Data Sync: Concept

Real-time data synchronization between a database and its user interfaces is the main goal of Snorky.

We want our user interfaces to reflect the state of the database at every moment, not only at the time the page was initially loaded. We don’t want polling because it has many downsides, as seen in Polling with Meta Refresh. Instead, we need a Publish-Subscribe approach.

Parties involved

We distinguish several parties:

  • The clients, which have a browser with JavaScript support.
  • The data, often stored in a database of any kind.
  • The main web server, which serves HTML and JS code and usually queries and modifies data in the process.
  • The WebSocket server (or any compatible fallback), which acts as the PubSub server.

Any of the parties can be on separate machines.

Process

In order to fetch data and keep up with updates, the following process is used:

  1. The user requests the page from the main web server.

  2. Upon receiving the request, the main web server:

    1. Initiates (authorizes) a subscription with the WebSocket server, specifying what kind of data the user needs to receive updates about.

      This communication does not necessarily use the WebSocket protocol; it may go through another backend channel. Only trusted parties like the main web server can use the backend channel.

      The WebSocket server answers the request with a subscription token.

    2. Queries the database for the current data.

    3. Sends the client both the retrieved data and the subscription token, along with the appropriate HTML code and scripts.

  3. The user's browser renders the received data and establishes a connection with the WebSocket server. It then sends a message to acquire the subscription using the received token.

  4. Each time a party modifies the data, it must notify (publish) the change to the WebSocket server through the backend channel. The notification specifies the type of change (addition, update or deletion) and the affected data before and after the change.

  5. The WebSocket server, upon receipt of the notification, looks up the subscriptions to the affected data and forwards the notification to the subscribed clients, which update the data in their user interfaces.

Simplistic UML diagram showing the subscription and notification process.
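
The five steps above can be sketched with a small in-memory model. This is only an illustration of the message flow, not Snorky's actual API: the names PubSubServer, authorize, acquire, publish and serve_page are hypothetical, a plain dictionary stands in for the database, and a list stands in for the WebSocket connection to the browser.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Subscription:
    topic: str                                  # what kind of data the user wants updates about
    inbox: list = field(default_factory=list)   # stands in for the WebSocket connection
    acquired: bool = False


class PubSubServer:
    """Hypothetical in-memory stand-in for the WebSocket (PubSub) server."""

    def __init__(self):
        self.subscriptions = {}  # token -> Subscription

    # Backend channel: only trusted parties such as the main web server call this.
    def authorize(self, topic):
        token = secrets.token_urlsafe(32)
        self.subscriptions[token] = Subscription(topic)
        return token

    # Client channel: the browser presents the token it received with the page.
    def acquire(self, token):
        sub = self.subscriptions[token]
        sub.acquired = True
        return sub

    # Backend channel: whoever modifies the data publishes the change.
    def publish(self, topic, change_type, before, after):
        for sub in self.subscriptions.values():
            if sub.acquired and sub.topic == topic:
                sub.inbox.append({"type": change_type, "before": before, "after": after})


def serve_page(pubsub, database, topic):
    """Step 2: authorize first, then query, then send both to the client."""
    token = pubsub.authorize(topic)
    rows = list(database[topic])
    return {"rows": rows, "token": token}


pubsub = PubSubServer()
database = {"tasks": [{"id": 1, "title": "Write docs"}]}

page = serve_page(pubsub, database, "tasks")          # steps 1 and 2
client = pubsub.acquire(page["token"])                # step 3

new_row = {"id": 2, "title": "Review docs"}
database["tasks"].append(new_row)                     # step 4: modify the data...
pubsub.publish("tasks", "addition", None, new_row)    # ...and publish the change

print(client.inbox)                                   # step 5: the subscribed client got the change
```

Note that this sketch deliberately ignores changes published before the client acquires its subscription, which is one of the gaps a real implementation has to close.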

Caveats

Of course, the devil is in the details. There are many caveats that require modifying the simplistic approach described above:

  • Subscription tokens may never be used, e.g. if the user closes the page while it is loading or browses the web with JavaScript disabled. Unused tokens consume resources in the WebSocket server, so they must be timed out and cleaned up properly.

  • Changes may occur after the subscription was authorized but before the client acquired it. The WebSocket server must maintain a buffer in order to deliver those notifications to the client upon connection.

  • Subscription authorization and database querying must occur in that order and must not be parallelized. Otherwise, if querying happened before authorization, changes could occur in between and never reach the newly created subscription.

  • Changes may occur after the subscription has been authorized but before the database is queried. In that case, the client would receive updates that act on data older than its own. This is not necessarily a problem: the client can skip those updates if it cannot apply them, as long as they are ordered.

  • The WebSocket server may not receive the change notifications in the same order they occurred in the database, which undermines the previous point. Coordination between the processes that alter the data, or with the database, may be necessary in order to serialize the notifications correctly (e.g. label them with increasing numbers) in cases where it matters.

  • For security reasons, subscription tokens must not be predictable.

  • In order to send updated data to the client, the same query must be coded twice: first in the form of a subscription authorization and later as a database query (e.g. in SQL or with ORM calls).

    Repetition makes code less maintainable and is undesirable. Where it could be avoided, doing so would require language-specific or framework-specific code (e.g. for Django ORM or plain SQL), whose cost may outweigh the benefit.

  • Matching subscriptions with change notifications in a way that scales up to many changes per second, many subscriptions or both can be tricky at times.
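
Several of these caveats can be illustrated together in a short sketch: unpredictable tokens, expiry of tokens that are never used, buffering of changes that arrive between authorization and acquisition, and sequence numbers that let the client skip updates it already has. Everything here is an assumption made for illustration: the BufferingPubSub class, its methods, the token_ttl parameter and the idea that the database query reports the sequence number of the last change it saw (snapshot_seq) are hypothetical and not part of Snorky's interface.

```python
import secrets
import time


class BufferingPubSub:
    """Hypothetical PubSub server that buffers changes and expires unused tokens."""

    def __init__(self, token_ttl=30.0, clock=time.monotonic):
        self.token_ttl = token_ttl
        self.clock = clock
        self.pending = {}   # token -> (topic, expiry time, buffered notifications)
        self.active = {}    # token -> (topic, delivered notifications)

    def authorize(self, topic):
        token = secrets.token_urlsafe(32)   # drawn from a cryptographically secure source
        self.pending[token] = (topic, self.clock() + self.token_ttl, [])
        return token

    def publish(self, topic, seq, change):
        note = {"seq": seq, "change": change}
        for sub_topic, _expiry, buffer in self.pending.values():
            if sub_topic == topic:
                buffer.append(note)         # hold until the client connects
        for sub_topic, delivered in self.active.values():
            if sub_topic == topic:
                delivered.append(note)

    def acquire(self, token):
        self.expire()
        topic, _expiry, buffer = self.pending.pop(token)   # KeyError if unknown or expired
        self.active[token] = (topic, buffer)
        return buffer   # buffered notifications first; later ones are appended to it

    def expire(self):
        now = self.clock()
        for token in [t for t, (_, expiry, _) in self.pending.items() if expiry <= now]:
            del self.pending[token]


def apply_updates(snapshot_seq, notifications):
    """Client side: order the updates and skip those the snapshot already contains."""
    ordered = sorted(notifications, key=lambda n: n["seq"])
    return [n for n in ordered if n["seq"] > snapshot_seq]


now = [0.0]                                     # fake clock so the example is deterministic
server = BufferingPubSub(token_ttl=30.0, clock=lambda: now[0])

used = server.authorize("tasks")
unused = server.authorize("tasks")              # e.g. the user closed the page while loading

server.publish("tasks", seq=7, change="update row 1")   # before the database query
snapshot_seq = 7                                         # assumed: the query already saw change 7
server.publish("tasks", seq=8, change="add row 2")      # after the query, before connecting

inbox = server.acquire(used)
fresh = apply_updates(snapshot_seq, inbox)      # only the change the client has not seen

now[0] = 31.0
server.expire()                                 # the unused token is dropped after its TTL
```

The sequence numbers only help if whoever assigns them is coordinated with the database, which is exactly the serialization problem described above; the sketch simply takes correctly numbered notifications as given.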
