Levvel Blog - Offline Data in the Browser

Offline Data in the Browser

In the past, most web applications read and wrote all data directly to the server. The client’s job was simply to fetch and display or capture and send data. That setup was easy for everyone to understand, and JavaScript applications in the browser could remain blissfully ignorant of the data complexities happening on the server.

The rise of HTML5 and client side frameworks like Backbone, Angular, Ember, and React has changed all that. Now client side applications have their own rich model layers and MVC structures. These tools are incredibly powerful and allow us front end developers to build logic on the client that has long been the domain of back end developers. But it all comes with a catch. We now have the power to work with data on the client, but we also have to deal with problems that used to exist only on the server. Lots of questions come up once we add a richer model layer to our client apps:

  • Persistence – Once we have data in a JavaScript app, how do we save it across pages, sessions, or even browsers? Do we just load it all from the server on application start? What do we do if the app is offline?
  • Syncing – How/when do we sync data changes on the client up to a server? What about data changes on the server down to the client? How about schema changes?
  • Querying – Keeping and searching a simple document collection like a list of “todo’s” or “comments” is nice and easy in JavaScript. There is no shortage of examples. But querying real data – the kind managed by ERP systems for example – isn’t as simple as “Give me the last 5 items for this user.” In the real world, we need to answer questions like “What are the IDs of the cheapest contracts per plan category that are available within 30 miles of the customer’s zipcode?” How do we get answers to these kinds of real world questions using client side code?
  • Performance – Server side databases use techniques like query optimizations, indices, and sharding to optimize performance, but what are the performance implications of working with large datasets in the browser? How much data can we store? How fast can we query it?

All of a sudden we’ve got real problems.

In the past year, I’ve come face to face with some of these issues on real client projects. Knowing that these types of issues are only going to come up more frequently in the future, I’ve started researching some of the solutions available. Turns out, there are quite a few options, each with its own strengths and weaknesses.

LocalStorage

The easiest and most widely used data storage option is LocalStorage. LocalStorage and its less widely used counterpart SessionStorage are part of the HTML5 Web Storage spec. Both provide a key-value store with a simple interface for persisting data in the browser.

localStorage.setItem('myShow', 'House of Cards'); //Save value
localStorage.getItem('myShow'); //Get value back

Current implementations only allow saving of strings, but you can easily get around that limitation and save complete JSON objects by serializing and deserializing them before calling localStorage’s set/get methods.

var myShows = [{show: 'House of Cards', seasons: 3, actors: [...]}]
localStorage.setItem('shows', JSON.stringify(myShows))
...
var bestShow = JSON.parse(localStorage.getItem('shows'))[0] //House of Cards!
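If you find yourself repeating that stringify/parse dance, it can be wrapped once. Here’s a minimal sketch of how that might look. The names createJsonStore and createMemoryStorage are made up for this example, and the in-memory stand-in exists only so the snippet also runs outside a browser.

```javascript
// Minimal in-memory stand-in for the Web Storage API (an assumption, used
// only when no browser is present). Like the real thing, it stores strings.
function createMemoryStorage() {
  var data = new Map();
  return {
    getItem: function (key) { return data.has(key) ? data.get(key) : null; },
    setItem: function (key, value) { data.set(key, String(value)); },
    removeItem: function (key) { data.delete(key); }
  };
}

// Hypothetical helper: wraps any Storage-like object with JSON handling.
function createJsonStore(storage) {
  return {
    save: function (key, value) {
      storage.setItem(key, JSON.stringify(value));
    },
    load: function (key, fallback) {
      var raw = storage.getItem(key);
      if (raw === null) { return fallback; } // nothing saved under this key
      try {
        return JSON.parse(raw);
      } catch (e) {
        return fallback; // the stored string was not valid JSON
      }
    }
  };
}

// Use the browser's localStorage when present, otherwise the stand-in.
var backing = (typeof window !== 'undefined' && window.localStorage)
  ? window.localStorage
  : createMemoryStorage();
var store = createJsonStore(backing);

store.save('shows', [{ show: 'House of Cards', seasons: 3 }]);
var bestShow = store.load('shows', [])[0];
console.log(bestShow.show);
```

Returning a fallback instead of throwing is a design choice for this sketch; you may prefer to surface parse errors in your own app.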

LocalStorage checks the persistence box by letting you store arbitrary data and retrieve it on subsequent user sessions. It enjoys excellent browser support, even on mobile, and unlike cookies, data doesn’t automatically expire.

LocalStorage also has the added benefit of being synchronous (so you don’t have to deal with callbacks, promises, generators etc) and is quite performant as far as data access is concerned (35k ops/sec).
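Numbers like that depend heavily on the browser and the machine, so it’s worth measuring for yourself. Below is a rough sketch of a micro-benchmark, not a rigorous one. The measureOpsPerSecond helper and the in-memory stand-in are assumptions made for this example; in a browser the code writes to the real localStorage.

```javascript
// Minimal in-memory stand-in so the sketch also runs outside a browser.
function createMemoryStorage() {
  var data = {};
  return {
    setItem: function (key, value) { data[key] = String(value); },
    getItem: function (key) {
      return Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null;
    }
  };
}

// Hypothetical helper: runs `operation` `iterations` times and returns the
// observed operations per second.
function measureOpsPerSecond(operation, iterations) {
  var start = Date.now();
  for (var i = 0; i < iterations; i++) {
    operation(i);
  }
  var elapsedMs = Date.now() - start;
  // Guard against a zero reading from the millisecond timer on very fast runs.
  return iterations / (Math.max(elapsedMs, 1) / 1000);
}

var storage = (typeof window !== 'undefined' && window.localStorage)
  ? window.localStorage
  : createMemoryStorage();

// Write 10,000 small values, cycling through 100 keys.
var writesPerSecond = measureOpsPerSecond(function (i) {
  storage.setItem('bench-' + (i % 100), 'value ' + i);
}, 10000);

console.log(Math.round(writesPerSecond) + ' writes/sec');
```

Keep in mind this leaves the bench-* keys behind in whatever storage it ran against, so clean them up if you run it on a real page.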

Unfortunately, localStorage also has some drawbacks. The first and most obvious is data size, which is limited to about 5 MB. That’s plenty of space to save more than a few objects, but it’s usually not enough for data-heavy offline applications.

Second, localStorage gives us no mechanisms to query other than the ‘key’ of the objects we saved. We can get a little control by breaking apart our data into collections and saving them under multiple keys in localStorage. Getting at the exact data we need then usually means reading a few collections into memory then using a library like underscore, lodash or ramda in our application logic to filter and map down to the data we really need.
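To make that concrete, here’s a small sketch of “querying” two collections once they’re in memory, using nothing but plain array methods. The collections, field names, and the findWellRatedShows helper are all invented for illustration; in a real app the arrays would come from JSON.parse(localStorage.getItem(...)).

```javascript
// Two collections as they might look after being read out of localStorage.
var shows = [
  { id: 1, name: 'House of Cards', seasons: 3 },
  { id: 2, name: 'Example Show A', seasons: 1 },
  { id: 3, name: 'Example Show B', seasons: 5 }
];
var ratings = [
  { showId: 1, stars: 5 },
  { showId: 1, stars: 4 },
  { showId: 2, stars: 5 },
  { showId: 3, stars: 2 }
];

// Build a lookup of average stars per show: a hand-rolled "index".
function averageStarsByShow(ratingRows) {
  var totals = {};
  ratingRows.forEach(function (r) {
    var t = totals[r.showId] || (totals[r.showId] = { sum: 0, count: 0 });
    t.sum += r.stars;
    t.count += 1;
  });
  var averages = {};
  Object.keys(totals).forEach(function (id) {
    averages[id] = totals[id].sum / totals[id].count;
  });
  return averages;
}

// "Names of shows with at least minSeasons seasons and an average rating
// of minStars or better." Shows without ratings never match.
function findWellRatedShows(showRows, ratingRows, minSeasons, minStars) {
  var averages = averageStarsByShow(ratingRows);
  return showRows
    .filter(function (s) {
      return s.seasons >= minSeasons && averages[s.id] >= minStars;
    })
    .map(function (s) { return s.name; });
}

var result = findWellRatedShows(shows, ratings, 2, 4);
console.log(result);
```

Every query like this scans whole collections, which is fine for a few hundred items and gets painful as the data grows.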

LocalStorage is a native storage mechanism, so if you want to sync data to a server, you’re going to have to roll your own solution.

IndexedDB

Stepping up from localStorage we have IndexedDB, a more robust W3C recommendation meant for offline application use.

“User agents need to store large numbers of objects locally in order to satisfy off-line data requirements of Web applications. [WEBSTORAGE] is useful for storing pairs of keys and their corresponding values. However, it does not provide in-order retrieval of keys, efficient searching over values, or storage of duplicate values for a key.”

Like localStorage, IDB is also essentially a key-value store, but it addresses some of Web Storage’s drawbacks. Data size is technically unlimited, though some vendors will ask users for permission if you cross the 50 MB point. IDB introduces transactions, auto-incrementing keys, and database schema versioning.

Unfortunately, with all these powerful features, IDB also brings with it a lot of complexity. The API is asynchronous and all based on transactions. Let’s say we want to save a show object like we did in the localStorage example above. We’d have to define a store and save an object to it asynchronously using event listeners. The code looks something like this:

var openRequest = indexedDB.open("netflix", 1);

openRequest.onupgradeneeded = function(e) {
  var IDB = e.target.result;
  if (!IDB.objectStoreNames.contains("shows")) {
    IDB.createObjectStore("shows", { autoIncrement: true });
  }
}

var db; // shared handle, also used when reading data back

openRequest.onsuccess = function(e) {
  db = e.target.result;
  addShow(db);
}

openRequest.onerror = function(e) {
  // Do something for the error
}

function addShow(db) {
  var transaction = db.transaction(["shows"], "readwrite");
  var store = transaction.objectStore("shows");

  // Define a show
  var myShow = {
    name: 'House of Cards',
    seasons: 3,
    actors: ['Kevin Spacey', 'Robin Wright']
  }

  // Perform the add
  var request = store.add(myShow);

  request.onerror = function(e) {
    //Error handler
    console.log("Error", e.target.error.name);
  }

  request.onsuccess = function(e) {
    // Success Handler
    console.log('success!');
  }
}

That’s a lot more complex than our localStorage 1-liner. The code for getting the data is similarly verbose.

function getShow(id) {
  var transaction = db.transaction(["shows"], "readonly");
  var store = transaction.objectStore("shows");
  var request = store.get(id);
  request.onsuccess = function(e) {
    var result = e.target.result;
    if (result) {
      console.log(result.name);
    }
  } 
}

And this is only the beginning. We haven’t talked about cursors, indexes, ranges… All this, along with its somewhat spotty browser support, can be enough to send most front end developers running for the hills.
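One common way to tame the event listeners is to wrap each request in a Promise. The sketch below is one way that could look. promisifyRequest and getShowAsync are hypothetical helpers of my own, and the code assumes a “shows” object store like the one created above.

```javascript
// Hypothetical helper: turns any IDBRequest-like object (something that
// fires onsuccess / onerror) into a Promise.
function promisifyRequest(request) {
  return new Promise(function (resolve, reject) {
    request.onsuccess = function (e) { resolve(e.target.result); };
    request.onerror = function (e) { reject(e.target.error); };
  });
}

// Reads one show by key. `db` is assumed to be an open IDBDatabase that
// already contains a "shows" object store.
function getShowAsync(db, id) {
  var store = db.transaction(['shows'], 'readonly').objectStore('shows');
  return promisifyRequest(store.get(id));
}
```

With that in place, a read becomes getShowAsync(db, 1).then(function(show) { ... }), and errors flow through a single .catch() instead of a handler per request.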

Performance also isn’t really one of its strong suits. As of the writing of this post, IndexedDB queries run at ~800 ops/s, and possibly an order of magnitude slower than that when running real world queries on large data. LocalStorage by comparison is over 20x faster. You can run a live performance test in your browser to see how it stacks up today.

Like LocalStorage, IndexedDB is a native storage medium, so it doesn’t offer any sync options to a server.

At this point, I should also mention the existence of WebSQL. WebSQL is a deprecated/retired working draft of a technology meant to solve the same problems as IndexedDB, but in a more relational way. It is still supported in Chrome, Safari, Opera, and some mobile browsers, but Firefox and IE have no love for it. Besides being deprecated, the idea of dynamically constructing SQL queries in a JavaScript environment should be enough to keep you away. If you make one mistake today, don’t let it be using WebSQL.

Personally, I’ve never used IndexedDB directly in a project, though there are some libraries that utilize it as a backing store that are worth looking at.

PouchDB

As we’ve seen, working directly with IndexedDB isn’t for everyone, myself included. Luckily, there are some projects that abstract away the ugly details and make data management much easier. Enter PouchDB.

PouchDB is a JavaScript implementation of the CouchDB NoSQL database. It lets you easily create new databases, save documents, and query documents. Depending on the storage adapter you specify, databases are automatically saved to your persistence store of choice – IndexedDB, LocalStorage, Memory… and yes, even WebSQL.

The syntax is super friendly. Going back to our House of Cards example:

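var db = new PouchDB('shows');
db.put({ _id: 'HOC', name: 'House of Cards', seasons: 3, actors: ['Kevin Spacey', 'Robin Wright'] })
  .then(function() { return db.get('HOC'); }) // read only after the write finishes
  .then(function(doc) {
    console.log(doc); // House of Cards!
  })
  .catch(function(err) { console.log(err); });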

Pretty slick, right? By default, PouchDB will save data to IndexedDB if available. You can specify a different adapter when instantiating the database connection.

new PouchDB('myDB', { adapter: 'websql' }); // Boo!

Check out the list of adapters for more options, including localstorage and memory adapters! That’s all great, but the real killer feature of PouchDB for offline applications is that it gives us data syncing between the client and server for free when using CouchDB. The challenge of calculating deltas and replicating those data changes between client and server for offline apps is huge. It’s the kind of challenge that really should keep you up at night.

Fortunately, PouchDB lets you sync your database up, down, or both ways with a single line of code.

PouchDB.sync('mydb', 'http://myCouchDbInstance/shows');

That has to be one of the most unbelievable lines of code I’ve ever seen. Literally… I didn’t believe it the first time I saw it! You can also use one way syncing, or sync on update events, like when a new document is added. Very powerful stuff.
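For reference, here’s roughly what those variations could look like. This is a sketch under a few assumptions: localDb is an existing PouchDB instance, remoteUrl points at a CouchDB database you control, and the wrapper function names (startLiveSync, pushOnly, pullOnly) are mine, not PouchDB’s.

```javascript
// Two-way sync that keeps running and retries after connection failures.
// `localDb` is assumed to be a PouchDB instance; nothing is created here.
function startLiveSync(localDb, remoteUrl, onChange, onError) {
  return localDb
    .sync(remoteUrl, { live: true, retry: true })
    .on('change', onChange) // fires when documents are pushed or pulled
    .on('error', onError);  // fires on unrecoverable errors
}

// One-way: local changes up to the server.
function pushOnly(localDb, remoteUrl) {
  return localDb.replicate.to(remoteUrl);
}

// One-way: server changes down to the client.
function pullOnly(localDb, remoteUrl) {
  return localDb.replicate.from(remoteUrl);
}
```

The object returned from a live sync can be held on to and cancelled later with .cancel(), which is handy when the user logs out.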

With a simple API, multiple persistence options, and the entire syncing problem taken care of, there’s a lot to love about PouchDB. The only drawback I’ve found so far has been performance. In my initial tests, reading and writing relatively simple data took 100ms or more, even when using the memory adapter. I admit, it could very well be that I had something wrong in my setup, so if you have different performance numbers on PouchDB, I’d love to see them and talk about how you got them.

Bottom line, if you’re looking for a manageable offline database solution, and performance isn’t absolutely critical, you should definitely check out PouchDB. The pros far outweigh the cons.

LokiJS

PouchDB is awesome, but what about performance? What if you’re a total performance junkie and NEED lightning fast speeds? LokiJS might be for you.

“LokiJS is an in-memory database which prioritises performance over everything.”

Loki is an in-memory object database with an API inspired by MongoDB. All operations are synchronous, which makes for very readable code. Creating collections and inserting data is very straightforward.

var db = new loki('loki.json'); // create a database
var shows = db.addCollection('shows') // create a collection
shows.insert({ name:'House of Cards', seasons: 3, actors: ['Kevin Spacey', 'Robin Wright'] }) // insert a document
var myShow = shows.findOne({ name: 'House of Cards' }); // Get the show

If you’re familiar with Mongo, the above code should feel pretty natural. LokiJS supports more powerful queries through an expressive chaining syntax. For example:

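var shows = db.getCollection('shows');
// Get the 2 most popular shows starring Kevin Spacey
shows
  .chain()
  .where(function(show) { // where() takes a filter function; find() takes a query object
    return show.actors.indexOf('Kevin Spacey') !== -1;
  })
  .simplesort('popularity', true) // true = descending, most popular first
  .limit(2)
  .data();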

LokiJS lets you persist data to disk through an adapter interface and ships with an IndexedDB adapter. You can also auto-persist data on an interval and auto restore the database at application startup. Both of these are really handy for an offline app.

Loki also offers some tools to help us sync data to a server through their “changes api.” The changes API option keeps track of all change operations in the local database since the last sync to the server. This helps us calculate deltas and sync our changes up. While the changes API is definitely helpful, it isn’t as fully featured as what we have in PouchDB.
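To show the idea behind change tracking, here’s a generic sketch in plain JavaScript. To be clear, this is not LokiJS’s API. createTrackedCollection and everything inside it is invented for illustration; it simply records each write so that only the deltas need to travel to the server.

```javascript
// Generic change-tracking sketch; NOT LokiJS's actual API.
function createTrackedCollection() {
  var docs = {};
  var changes = [];
  var nextId = 1;
  return {
    insert: function (doc) {
      var stored = Object.assign({}, doc, { id: nextId++ });
      docs[stored.id] = stored;
      changes.push({ operation: 'insert', doc: stored });
      return stored;
    },
    update: function (id, fields) {
      if (!docs[id]) { throw new Error('No document with id ' + id); }
      docs[id] = Object.assign({}, docs[id], fields);
      changes.push({ operation: 'update', doc: docs[id] });
      return docs[id];
    },
    remove: function (id) {
      if (!docs[id]) { return; }
      changes.push({ operation: 'remove', doc: docs[id] });
      delete docs[id];
    },
    // Everything recorded since the last successful sync.
    pendingChanges: function () { return changes.slice(); },
    // Call this only after the server has acknowledged the batch.
    clearChanges: function () { changes = []; }
  };
}

var tracked = createTrackedCollection();
var show = tracked.insert({ name: 'House of Cards', seasons: 3 });
tracked.update(show.id, { seasons: 4 });

var batch = JSON.stringify(tracked.pendingChanges()); // payload for the server
tracked.clearChanges(); // in a real app, only once the upload succeeds
console.log(batch);
```

The hard parts that a sketch like this skips, such as conflict resolution and pulling server changes back down, are exactly what you’d be writing yourself with Loki.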

While the syncing capability is a bit young, where LokiJS really shines is performance. Loki advertises the ability to execute 500,000 ops/sec on a typical dev machine.

I recently used LokiJS for a client project that included a huge offline database (50 MB of data and 200,000 document collections). Query times on my dev machine were often sub-millisecond, with the worst performing queries not breaking 8ms. Really impressive.

Another thing to keep in mind about Loki is that development is still very active and the documentation could be better. If, however, speed is your thing and you’re not afraid to write some custom syncing code, LokiJS could be a great lightweight option.

Conclusion

With the rise of Single Page Apps and client side MV* frameworks, we’re going to find ourselves writing more offline-capable web applications. Just one of the new challenges of writing these sorts of apps is managing and persisting large complex data sets on the client side. Luckily, there are already a host of new tools out there to help us step up to the challenge.

The client side technologies and libraries we outlined here are just a few of the ones available out there, and new libraries are being created all the time. When evaluating new solutions, I typically ask how they stack up in 4 key areas:

  1. Persistence
  2. Syncing
  3. Querying
  4. Performance

Those aren’t the only criteria to measure, just the ones I’ve come up with. I’m interested in hearing what offline data solutions you’ve settled on for some of your projects and why they were the best solution for you.

Assaf Weinberg

VP of Product
