A Web of Sync

09 Dec 2010

I’m excited about how the rise of HTML5 (and related technologies) will change the way we think about the web, and how they will further encourage developers to build richer applications instead of mere document-oriented sites.

(At Sencha, for example, we spend a lot of time thinking about what this really means, and how it changes the skills, tools and frameworks we’ll all need to be effective web practitioners.)

But here, I’d like to consider three specific changes that are under way, and one often-overlooked impact they might have.

The first is the rise of powerful application architectures on the client side. Sencha has added Model-View-Controller support in both Ext JS 4 and Sencha Touch, for example, and this makes it possible to implement entire data schemas, rich business logic, and well-structured user interfaces – entirely on the client side. (Other frameworks, such as SproutCore and JavaScriptMVC, are following a similar path.)

Secondly, HTML5’s support for local and offline storage makes it realistic for these apps to run autonomously. Viewing, filtering and manipulating data can still be performed in the browser, even when it is completely disconnected from the server (or from the entire network, in the case of a mobile device that loses coverage).
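
As a minimal sketch of that autonomy, the JavaScript below keeps a list of records in Web Storage and filters it without touching the network. It assumes the data fits within localStorage’s quota, and it substitutes a small in-memory object exposing the same getItem/setItem/removeItem methods so that the example also runs outside a browser. The helper names (saveRecords, loadRecords, filterRecords) are hypothetical.

```javascript
// In-memory stand-in for the browser's Web Storage API. Like localStorage,
// it only stores strings; in a browser you would pass window.localStorage.
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => { data.set(key, String(value)); },
    removeItem: (key) => { data.delete(key); },
  };
}

// Persist the whole collection as a single JSON string under one key.
function saveRecords(storage, key, records) {
  storage.setItem(key, JSON.stringify(records));
}

// Read the collection back; a missing key yields an empty list.
function loadRecords(storage, key) {
  const raw = storage.getItem(key);
  return raw === null ? [] : JSON.parse(raw);
}

// Filtering runs against the local copy only; no network request is made.
function filterRecords(storage, key, predicate) {
  return loadRecords(storage, key).filter(predicate);
}

// Usage: save once while online, then query freely while offline.
const storage = createMemoryStorage();
saveRecords(storage, "products", [
  { id: 1, name: "Widget", price: 10 },
  { id: 2, name: "Gadget", price: 25 },
]);
const pricey = filterRecords(storage, "products", (p) => p.price > 20);
// pricey → [{ id: 2, name: "Gadget", price: 25 }]
```

Storing the list as one JSON string keeps the sketch simple; a larger data set would more likely be split across keys or held in a client-side database.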

Finally, there is the possibility of using JavaScript on the server side of the web. The rise of interest in node.js, for example, is largely prompted by the opportunity to use the same programming language throughout the entire web stack.

Together, these changes mean that the stack itself is changing shape.

Historically, the entire logic of delivering web-based services to browsers was built on the server-side. Databases, security layers, business logic, user interface creation – all of this was within the realm of the web server, and the browser merely took the resulting HTML and dumbly rendered it for the user.

Of course, this is incredibly inefficient: the entire stack is executed (and a whole page of HTML generated) for almost every single user interaction. To mitigate this, web developers turned to AJAX and AHAH (Asynchronous HTML and HTTP) to place more of the user interface logic in the client’s runtime environment, and to reduce the size of the payloads involved. The stack started to span both the server and client sides of the picture.

But with client-side MVC and support for local storage, we can take this evolution of the stack a whole step further. With Sencha Touch, for example, an application constructs its entire user interface from scratch, and can also contain the business logic, validation rules and storage schema that would traditionally have been encapsulated on the server side.

Now, islands of independent data on each and every user’s device might make for a useful service, but it is far more likely that most applications will continue to operate in conjunction with a server – which, with the appropriate layer of security to protect access to it, will provide a central storage repository.

Our stack has now been neatly split in two.

(Note that, just as we’ve long learnt to run both client-side and server-side validation of data submitted from HTML forms, it will probably remain wise to mirror business logic on both sides of the network to prevent inconsistency. While the rise of JavaScript on the server theoretically makes it possible to do this with one set of common code, in reality most server stacks will remain non-JavaScript for some time.)
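
Where both ends can run JavaScript, the shared-code idea might look like the sketch below: one hypothetical validation function, written so that the same file works as a browser script and as a node.js module. The rules themselves are illustrative assumptions, not taken from any particular application.

```javascript
// Hypothetical validation rules for a product record, written so the same
// file can be loaded by a <script> tag in the browser and by require() in
// node.js. Returns a list of error messages; an empty list means valid.
function validateProduct(product) {
  const errors = [];
  if (typeof product.name !== "string" || product.name.trim() === "") {
    errors.push("name is required");
  }
  if (typeof product.price !== "number" || !Number.isFinite(product.price) || product.price < 0) {
    errors.push("price must be a non-negative number");
  }
  return errors;
}

// Export for node.js (CommonJS) when a module object exists; in the
// browser the function is simply left as a global.
if (typeof module !== "undefined" && module.exports) {
  module.exports = { validateProduct };
}

// Usage: the client checks before saving locally, and the server runs the
// very same check again before accepting the record.
const problems = validateProduct({ name: "", price: -5 });
// problems → ["name is required", "price must be a non-negative number"]
```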

All well and good. But in one very particular respect, an architecture like this is deeply ambitious: it adds a significant new programming challenge to the mix, namely data synchronization.

When the browser is merely displaying read-only HTML or AHAH fragments, the definitive copy of the business data is in one place only: the server. But in this new architecture, we have the same logical data in two places – and, if we allow users to interact with their copy of it, quite possibly off-line, we have to keep the two (or N) data sets consistent.

At a protocol level, this is simple enough. We can easily use JSON to transmit data bidirectionally over an HTTP wire, and Ext JS and Sencha Touch’s model store classes, for example, take care of all the plumbing for us. But when we should do so, what the payload should be – and how we reconcile any conflicts between client and server – can quickly become interesting design questions.
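
To illustrate the payload question, here is one possible answer sketched in JavaScript: send only the fields that actually changed, rather than the whole record. The message shape ({ op, id, fields }) and the function names are assumptions made for illustration, not a Sencha convention.

```javascript
// Compute the fields that differ between the last-synced copy of a record
// and the locally edited copy (shallow comparison only). Fields removed
// from the edited copy are not detected by this sketch.
function diffRecord(original, edited) {
  const changed = {};
  for (const key of Object.keys(edited)) {
    if (original[key] !== edited[key]) {
      changed[key] = edited[key];
    }
  }
  return changed;
}

// Wrap the diff in an illustrative JSON message for the wire.
function buildUpdatePayload(original, edited) {
  return JSON.stringify({ op: "update", id: edited.id, fields: diffRecord(original, edited) });
}

const before = { id: 7, name: "Widget", price: 10, colour: "blue" };
const after = { id: 7, name: "Widget", price: 12, colour: "blue" };
const payload = buildUpdatePayload(before, after);
// payload → '{"op":"update","id":7,"fields":{"price":12}}'
```

Smaller payloads matter on mobile connections, but the trade-off is that the client must keep the last-synced version of each record around to diff against.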

Consider a web-based corporate price list application, built using a client-side MVC pattern, that provides up-to-date information about a company’s products to its mobile salesforce. When it is first launched on a mobile device, say, such an application might easily pull the entire list of product records from the server over JSON and store them locally for fast, possibly off-line, access.
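
A sketch of that first load follows. It assumes the server returns a hypothetical JSON snapshot of all records together with a sync token (for example a timestamp or sequence number) that the client can later quote when asking for changes. Indexing the records by ID makes later incremental updates cheap.

```javascript
// Build the local copy from the full snapshot downloaded on first launch.
// The snapshot shape { records, syncToken } is hypothetical.
function createLocalCopy(snapshot, now) {
  const byId = {};
  for (const record of snapshot.records) {
    byId[record.id] = record; // index by ID for fast lookups and updates
  }
  return { byId, syncToken: snapshot.syncToken, lastSyncAt: now };
}

// localStorage only holds strings, so the copy is stored as JSON.
function serializeLocalCopy(copy) {
  return JSON.stringify(copy);
}

function deserializeLocalCopy(text) {
  return JSON.parse(text);
}

// Usage, with a snapshot as it might arrive from the server's JSON response.
const snapshot = {
  records: [
    { id: "p1", name: "Widget", price: 10 },
    { id: "p2", name: "Gadget", price: 25 },
  ],
  syncToken: 42,
};
const localCopy = createLocalCopy(snapshot, Date.now());
const restored = deserializeLocalCopy(serializeLocalCopy(localCopy));
// restored.byId.p2.price → 25, restored.syncToken → 42
```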

So far, so easy. But of course the records might frequently change on the server: the company creates new products, discontinues old ones, and often changes their prices. How should the ‘create’, ‘delete’ and ‘update’ operations be applied to the mobile device’s version of the data? Should it regularly refresh the whole data set again? Should it request all changes since the last sync and then incrementally apply those changes to its local copy? If certain data is not updated for a certain length of time, should the client application mark it as stale?
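
The incremental option might be sketched as follows: the client asks for changes since its last sync token, applies each ‘create’, ‘update’ or ‘delete’ to its local copy, and records when it last synced so that it can decide whether its data is stale. The change format is hypothetical, and treating the whole copy as stale after a fixed age is only one possible policy.

```javascript
// Apply an incremental change list to a local copy shaped like
// { byId, syncToken, lastSyncAt }. The response shape
// { changes: [{ op, id, fields }], syncToken } is hypothetical, and the
// changes are assumed to arrive in the order the server made them.
function applyChanges(localCopy, response, now) {
  for (const change of response.changes) {
    if (change.op === "delete") {
      delete localCopy.byId[change.id];
    } else if (change.op === "create" || change.op === "update") {
      // Merge the changed fields over any existing record; an update for a
      // record this client never received simply creates it.
      const existing = localCopy.byId[change.id] || { id: change.id };
      localCopy.byId[change.id] = { ...existing, ...change.fields };
    } else {
      throw new Error("Unknown change op: " + change.op);
    }
  }
  localCopy.syncToken = response.syncToken;
  localCopy.lastSyncAt = now;
  return localCopy;
}

// One possible staleness rule: the whole copy is stale when the last
// successful sync is older than maxAgeMs.
function isStale(localCopy, now, maxAgeMs) {
  return now - localCopy.lastSyncAt > maxAgeMs;
}

const copy = {
  byId: { p1: { id: "p1", price: 10 }, p2: { id: "p2", price: 25 } },
  syncToken: 42,
  lastSyncAt: 0,
};
applyChanges(copy, {
  changes: [
    { op: "update", id: "p1", fields: { price: 12 } },
    { op: "delete", id: "p2" },
    { op: "create", id: "p3", fields: { price: 8 } },
  ],
  syncToken: 45,
}, 1000);
// copy.byId now holds p1 (price 12) and p3 (price 8); p2 is gone.
const stale = isStale(copy, 1000 + 60 * 60 * 1000 + 1, 60 * 60 * 1000); // true
```

With a pure changes-since protocol the server never resends unchanged records, which is why this sketch measures staleness from the last successful sync rather than from each record’s last update.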

Even these are decisions that demand careful thought from the application developer.

But now imagine that the company’s product managers are given more powerful rights within the same application, and they can actually edit the product descriptions on their mobile devices. When connected, these changes can be sent straight back to the server. But how can we ensure that the change is then broadcast to other users’ apps?
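
One common pattern for broadcasting is a server-side change log: every accepted edit is appended with an increasing sequence number, and each client periodically asks for entries newer than the last number it saw. The in-memory ChangeLog class below is a hypothetical sketch of that idea. A real server would persist and eventually compact the log, and could push the same entries over long-polling or WebSockets instead of waiting to be polled.

```javascript
// Hypothetical in-memory change log kept on the server.
class ChangeLog {
  constructor() {
    this.entries = [];
    this.nextSeq = 1;
  }

  // Record an accepted edit and return the sequence number it was given.
  append(change) {
    const entry = { ...change, seq: this.nextSeq++ };
    this.entries.push(entry);
    return entry.seq;
  }

  // Return every change after sinceSeq, plus the token the client should
  // send with its next request.
  changesSince(sinceSeq) {
    const changes = this.entries.filter((e) => e.seq > sinceSeq);
    return { changes, syncToken: this.nextSeq - 1 };
  }
}

const log = new ChangeLog();
log.append({ op: "update", id: "p1", fields: { price: 12 } }); // seq 1
log.append({ op: "delete", id: "p2" });                        // seq 2
const reply = log.changesSince(1);
// reply.changes holds only the delete (seq 2); reply.syncToken → 2
```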

What about when the product manager is off-line, perhaps editing the portfolio on a plane? Should the application prohibit updates to the local data? Preferably not, but if changes can be made, how should the application poll to see when it is reconnected, and send those changes back in the most efficient way?
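
For the off-line case, a hypothetical outbox like the one sketched here lets the application accept edits locally, coalesce repeated edits to the same record, and send them in a single batch once it is back online. To keep the sketch short, send is assumed to be synchronous and to return true when the server accepts the batch; real network code would be asynchronous. In a browser, navigator.onLine and the window “online” event can hint that connectivity has returned, though neither guarantees that the server itself is reachable.

```javascript
// Hypothetical outbox that queues local edits while off-line.
class Outbox {
  constructor() {
    this.pending = new Map(); // record id -> queued change
  }

  // Successive updates to one record are merged into a single entry.
  queueUpdate(id, fields) {
    const queued = this.pending.get(id);
    if (queued && queued.op === "delete") return; // deletion wins
    if (queued) {
      Object.assign(queued.fields, fields);
    } else {
      this.pending.set(id, { op: "update", id, fields: { ...fields } });
    }
  }

  queueDelete(id) {
    this.pending.set(id, { op: "delete", id });
  }

  // Upload everything in one batch. Entries are removed only after the
  // (assumed) send function reports success, so a failure can be retried.
  flush(send) {
    if (this.pending.size === 0) return true;
    const batch = Array.from(this.pending.values());
    if (!send(batch)) return false;
    this.pending.clear();
    return true;
  }
}

const outbox = new Outbox();
outbox.queueUpdate("p1", { price: 12 });
outbox.queueUpdate("p1", { name: "Widget Pro" }); // merged into the same entry
outbox.queueDelete("p2");
const offline = outbox.flush(() => false); // upload fails: nothing is lost
const sent = [];
const online = outbox.flush((batch) => { sent.push(...batch); return true; });
// sent → [{ op: "update", id: "p1", fields: { price: 12, name: "Widget Pro" } },
//         { op: "delete", id: "p2" }]
```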

Taking it to the extreme, what happens when two product managers edit the same pieces of data? What happens when each has added a new product, and their client applications have assigned both the same supposedly unique ID? The system needs to reconcile those potentially conflicting changes – and very quickly this becomes a non-trivial task.
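
Both problems in this paragraph have well-known partial remedies, sketched below under stated assumptions. Client-minted IDs can be namespaced by a per-device identifier (assumed to be assigned once, for instance at first launch) so that two devices never produce the same ID. The server can then use optimistic concurrency: each record carries a version number, each edit names the version it was based on, and a mismatch is reported as a conflict rather than silently overwritten. What to do with a conflict (merge, let the last writer win, or ask the user) is left to the application; all names here are hypothetical.

```javascript
// Client side: IDs of the form "<deviceId>:<counter>" cannot collide
// across devices, even when both are off-line.
function createIdGenerator(deviceId) {
  let counter = 0;
  return () => `${deviceId}:${++counter}`;
}

// Server side: optimistic concurrency check against a version number.
function applyEdit(store, edit) {
  const current = store[edit.id];
  if (current === undefined) {
    store[edit.id] = { ...edit.fields, id: edit.id, version: 1 };
    return { status: "created", record: store[edit.id] };
  }
  if (current.version !== edit.baseVersion) {
    // Someone else changed the record first: report, don't overwrite.
    return { status: "conflict", server: current, client: edit };
  }
  store[edit.id] = { ...current, ...edit.fields, id: current.id, version: current.version + 1 };
  return { status: "updated", record: store[edit.id] };
}

const nextIdA = createIdGenerator("deviceA");
const nextIdB = createIdGenerator("deviceB");
const idA = nextIdA(); // "deviceA:1"
const idB = nextIdB(); // "deviceB:1"

const store = {};
applyEdit(store, { id: idA, fields: { name: "Widget", price: 10 } }); // created, version 1
applyEdit(store, { id: idA, baseVersion: 1, fields: { price: 12 } }); // updated, version 2
const clash = applyEdit(store, { id: idA, baseVersion: 1, fields: { price: 11 } });
// clash.status → "conflict"; the stored price stays 12
```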

In the broader computing world, these challenges are common and have been frequently, and successfully, addressed. Many proven solutions exist for synchronization in the realms of distributed file systems, versioned source control, disk, database and site mirroring and so on – even for managing our MP3 collections. What is new is that this will soon become a challenge that web developers need to tackle, and one that might be unfamiliar and daunting to many.

In the general case, of course, there is no single approach that will answer all these questions. Effective synchronization across a ‘split stack’ architecture will depend very much upon the sort of data that is being manipulated and what the application is setting out to achieve.

Frameworks will certainly provide us with the tools we need to implement the choices we make. But considering these synchronization-related scenarios and use cases will be an increasingly important part of bringing compelling applications to users in increasingly diverse contexts. Reconciliation will lie at the heart of how we bring complex applications to life; it cannot be an afterthought.

The web is no longer one of documents or of simple uni-directional data: it is fast becoming a web of synchronization. And I fear it might not be as easy as we think. As developers, we all need to go into this new world with our eyes wide open and be prepared for new challenges around the corner.