Using local storage

WebExtensions' local storage is used to store settings such as sites' whitelist information and script substitutes.

Avoiding races

Local storage is very race-prone. If two contexts, for example the popup and the settings page, try to modify the value of the same storage key simultaneously, the storage might be left in an inconsistent state. Consider a storage entry which holds a list of values (e.g. script names). To add a value to this list, code has to (asynchronously) fetch the list from the storage, modify it in memory and (asynchronously) store it back. There are no transactions as in SQL databases. If two contexts do this at the same time, one update can overwrite the other, so they need to synchronize in some other way.
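
The lost update described above can be reproduced with a small simulation. This is only a model under stated assumptions: makeFakeStorage is a hypothetical in-memory stand-in with promise-returning get and set, not the WebExtensions storage API, and addToList is an illustrative name.

```javascript
// Hypothetical in-memory stand-in for an asynchronous key-value store.
// It only models the asynchronous behaviour; it is NOT the WebExtensions API.
function makeFakeStorage(initial) {
    const data = new Map(Object.entries(initial));
    const tick = () => new Promise(resolve => setTimeout(resolve, 0));
    return {
        async get(key) {
            await tick();
            // Hand out a copy, the way real storage returns deserialized values.
            return JSON.parse(JSON.stringify(data.get(key)));
        },
        async set(key, value) {
            await tick();
            data.set(key, JSON.parse(JSON.stringify(value)));
        }
    };
}

// Unsynchronized read-modify-write of a list stored under a single key.
async function addToList(storage, key, item) {
    const list = await storage.get(key);   // (1) fetch
    list.push(item);                       // (2) modify in memory
    await storage.set(key, list);          // (3) store back
}

// Two "contexts" start their updates at the same time.
async function demonstrateLostUpdate() {
    const storage = makeFakeStorage({_scripts: []});
    await Promise.all([
        addToList(storage, "_scripts", "first"),
        addToList(storage, "_scripts", "second")
    ]);
    return storage.get("_scripts");
}
```

Both calls fetch the empty list before either one stores its result, so the list that demonstrateLostUpdate() resolves to ends up with one item instead of two.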

Note that while currently (apart from initialization when the extension is installed) only the settings page mutates storage, other pages (such as a popup) might start doing so in the future, and we want to design with that in mind.

Solution idea: use a storage entry as a lock

It is tempting to use a local storage entry as a lock variable. A script that wants to modify something in storage would first need to successfully set the lock variable to some value (say, a generated pseudo-random nonce). This would be more or less an implementation of transactions on top of a transaction-less key-value store.

With this approach there are some difficult corner cases. What if the settings page gets closed by the user while holding the lock? Perhaps we should allow a script to "steal" a lock after some specified timeout?

Solution idea: avoid storing big, updateable values in the storage

It could be argued that WebExtensions' local storage was not meant for things like lists. Currently, Haketilo stores lists of all script names, all bag names and all page URL patterns. This makes it easy to quickly get the information of what is in one of these lists but also requires updates whenever a single item gets added or deleted. Instead, it would be possible to not store such lists at all. We could still re-create a list of all items of each kind programmatically, after fetching the entire contents of local storage. Also, it would be possible to maintain an in-memory cache of script, bag and page lists (just as we're currently doing anyway).
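
A minimal sketch of re-creating the lists programmatically, assuming the whole contents of local storage have already been fetched into a plain object. The function name listsFromStorage and the PREFIXES table are illustrative; the type prefixes themselves are the ones described in "Indicating types of stored objects".

```javascript
// Type prefixes as described in this document; the authoritative
// definitions live in common/stored_types.js.
const PREFIXES = {s: "scripts", b: "bags", p: "pages"};

// Derive the lists of item names from the full storage contents
// instead of reading separately stored lists.
function listsFromStorage(allItems) {
    const lists = {scripts: [], bags: [], pages: []};
    for (const key of Object.keys(allItems)) {
        const kind = PREFIXES[key[0]];
        // Keys with other prefixes (e.g. "_") are not aggregated.
        if (kind !== undefined)
            lists[kind].push(key.substring(1));
    }
    return lists;
}
```

With this approach adding or deleting an item touches only that item's own key, so no shared list needs a read-modify-write cycle.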

Currently employed solution: serialize storage accesses in the background script

Right now neither the settings page nor the popup nor content scripts access local storage directly. Instead, they exchange messages with the background script, which fetches and modifies stored data. This way accesses are effectively serialized, because the background scripts share a single context that runs on a single thread, and accesses within that context are guarded by an asynchronous, promise-aware lock (implemented in common/lock.js).
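
To illustrate what a promise-aware lock can look like, here is a minimal sketch based on chaining promises. It is not the code from common/lock.js; makeLock, withLock and makeListAppender are hypothetical names, and the storage argument is assumed to be any object with promise-returning get(key) and set(key, value).

```javascript
// Hypothetical promise-chain lock; not the implementation in common/lock.js.
function makeLock() {
    let tail = Promise.resolve();
    // Runs `task` only after all previously queued tasks have settled.
    return function withLock(task) {
        const result = tail.then(() => task());
        // Keep the chain usable even if `task` rejects.
        tail = result.catch(() => {});
        return result;
    };
}

// Serialized read-modify-write on top of the lock.
function makeListAppender(storage) {
    const withLock = makeLock();
    return (key, item) => withLock(async () => {
        const list = (await storage.get(key)) || [];
        list.push(item);
        await storage.set(key, list);
    });
}
```

Because every update waits for the previous one to finish, two appends started at the same time both end up in the stored list. This only works when all writers live in the same context, which is why accesses are routed through the background script.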

Caching storage data

The background script maintains a cache of lists of items. This way the whitelist information necessary for CSP header injection is available to the webRequest callback synchronously. This is important because Chromium's implementation of webRequest doesn't allow a promise to be returned from the callback.
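
The idea of a cache that is filled asynchronously but read synchronously can be sketched as follows. This is an illustration, not the extension's actual cache: makePageCache and its methods are hypothetical names, and lookups are by exact pattern, whereas real code would have to match request URLs against the stored patterns.

```javascript
// Hypothetical cache of page entries: written asynchronously, read synchronously.
function makePageCache() {
    const pages = new Map();
    return {
        // Called once at startup with everything read from storage.
        load(allItems) {
            for (const [key, value] of Object.entries(allItems)) {
                if (key.startsWith("p"))
                    pages.set(key.substring(1), value);
            }
        },
        // Called whenever a page entry is written (or deleted, with undefined).
        update(pattern, value) {
            if (value === undefined)
                pages.delete(pattern);
            else
                pages.set(pattern, value);
        },
        // Synchronous lookup, usable from a callback that must not return a promise.
        isAllowed(pattern) {
            const entry = pages.get(pattern);
            return entry !== undefined && entry.allow === true;
        }
    };
}
```

A blocking callback can call isAllowed() directly, with no awaiting, as long as the cache is kept up to date on every write.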

Additionally, it is possible to create yet another such cache, in another context, and synchronize it with the background one through messages. background/storage_server.js and common/storage_client.js do this. The settings page is currently the only part of the extension that uses it.

Such "remote storage" is perhaps not necessary, but it makes it more straightforward to access stored data from places other than the background script.

Data semantics

WebExtensions' local storage only allows us to store key-value pairs and while values can be arbitrary JSON-ifiable data, keys have to be strings. We have to store the data of all substitute scripts (with either their text or URLs), all script bags (sets of scripts, for easy handling of multi-level dependencies), all page settings and some additional, single-instance variables.

Below we describe the semantics of various stored objects. It might also help to look directly at how the data is stored in the browser. In the JavaScript console of one of the extension's pages (for example the settings page) you can enter: // chromium

or: // mozilla

Indicating types of stored objects

The solution to storing various kinds of objects was to construct each object's key by prepending a single letter indicating its type to the object's name. Scripts have the letter "s", bags have the letter "b" and pages have the letter "p". Additionally, other variables that are not to be aggregated are stored with the prefix "_". For example, lists are stored under "_bags", "_pages" and "_scripts". Prefixes are defined in common/stored_types.js.
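
The key scheme can be summed up in two small helpers. These are illustrative only: TYPE_PREFIX, makeKey and parseKey are hypothetical names and are not taken from common/stored_types.js.

```javascript
// Prefix letters as described above (names of the properties are illustrative).
const TYPE_PREFIX = {script: "s", bag: "b", page: "p", variable: "_"};

// Build a storage key from a type prefix and an item name.
function makeKey(prefix, name) {
    return prefix + name;
}

// Split a storage key back into its type prefix and item name.
function parseKey(key) {
    return {prefix: key[0], name: key.substring(1)};
}
```

For example, makeKey(TYPE_PREFIX.script, "hello") gives the key "shello", and parseKey("_bags") gives the prefix "_" with the name "bags".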

Page entry semantics

The key for a page entry is a URL pattern prepended with "p". The value for a page entry should be an object with 2 keys. The first key is "allow"; its value is a boolean that indicates whether native scripts are to be whitelisted on a given site. The second key is "components"; its value is a 2-element array specifying what custom script(s) to run on a given site. The first element of that array is the type prefix of the component to run (either "b" for bag or "s" for script). The second element is the name of that component.

Additionally, if either "allow" or "components" is not set or its value is undefined, it is treated as if scripts were blocked (in case of "allow") or no custom scripts were specified to be loaded (in case of "components").
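
The defaulting rules above could be applied when reading a page entry roughly like this. The function name normalizePageEntry is hypothetical, and two details are assumptions rather than documented behaviour: any "allow" value other than true is treated as blocking, and a missing "components" is represented as null.

```javascript
// Fill in the documented defaults for a page entry read from storage.
function normalizePageEntry(entry) {
    const value = entry || {};
    return {
        // Missing or undefined "allow" means native scripts stay blocked.
        allow: value.allow === true,
        // Missing or undefined "components" means no custom scripts to load.
        components: Array.isArray(value.components) ? value.components : null
    };
}
```

Normalizing once at read time means the rest of the code never has to check for undefined fields.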

This is what an example page entry might look like:

"p" : {
    "allow" : false,
    "components" : ["s", "opencores"]
}

Bag entry semantics

The key for a bag entry is the bag's name prepended with "b". The value for a bag entry should be an array, each element of which indicates one component of that bag (either another bag or a script). The element should itself be a length 2 array containing the component's type prefix and name.
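
Since bags can contain other bags, finding all scripts of a component means walking this structure recursively. Below is a minimal sketch, assuming all stored entries are available in a plain object keyed as described in this document. The name scriptsOfComponent is hypothetical, and treating a missing bag as empty and skipping already visited components are assumptions, not documented behaviour.

```javascript
// Collect the names of all scripts reachable from one component,
// where a component is a [prefix, name] pair as stored in bags and pages.
function scriptsOfComponent(storage, [prefix, name], seen = new Set()) {
    const key = prefix + name;
    // Skip components handled before; this also guards against cyclic bags.
    if (seen.has(key))
        return [];
    seen.add(key);
    if (prefix === "s")
        return [name];
    if (prefix !== "b")
        throw new Error(`unknown component type: ${prefix}`);
    const bag = storage[key] || [];   // assumption: a missing bag counts as empty
    return bag.flatMap(component => scriptsOfComponent(storage, component, seen));
}
```

The scripts come out in depth-first order, each listed once, which is one reasonable choice when bags are used to express dependencies.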

This is what an example bag entry might look like:

"bkerneldevswererudetome" : [
    ["s", "gratisracecondition"],
    ["b", "decompilinginhissleep"],
    ["s", "librejsranting"],
    ["b", "bsdbackrub"]
]

Script entry semantics

The key for a script entry is the script's name prepended with "s". The value for a script entry should be an object. It should have either a "text" key which holds the script's text content or "url" and "hash" keys that hold the script's remote location and the SHA256 hash of the script's text contents. "text" shall take precedence over "url" and "hash" if all are present.
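
The precedence rule and the hash check can be sketched as below. The names scriptSource, matchesHash and sha256Hex are hypothetical. The sketch assumes the hash is stored as a hexadecimal string and uses Node.js's crypto module so that it can be run outside a browser; extension code would more likely rely on the Web Crypto API.

```javascript
// Node.js built-in module, used here only so the sketch runs standalone.
const nodeCrypto = require("node:crypto");

// Hex-encoded SHA256 of a string (encoded as UTF-8).
function sha256Hex(text) {
    return nodeCrypto.createHash("sha256").update(text, "utf8").digest("hex");
}

// Decide where a script's contents come from; "text" takes precedence.
function scriptSource(entry) {
    if (typeof entry.text === "string")
        return {kind: "text", text: entry.text};
    if (typeof entry.url === "string" && typeof entry.hash === "string")
        return {kind: "remote", url: entry.url, hash: entry.hash};
    throw new Error("script entry needs \"text\" or both \"url\" and \"hash\"");
}

// Check fetched contents against the hash stored in the entry.
function matchesHash(entry, fetchedText) {
    return sha256Hex(fetchedText) === entry.hash.toLowerCase();
}
```

Contents fetched from the URL of a "remote" entry would be passed to matchesHash() and used only if the check succeeds.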

This is what an example script entry might look like:

"scrm.ajax" : {
    "hash" : "6401a4e257b7499ae4a00be2c200e4504a2c9b3d6b278a830c31a7b63374f0fe",
    "url" : ""
}

Here's another one:

"shello" : {
    "text" : "console.log(\"hello, every1!\");\n"
}

Updated by jahoti 3 months ago · 6 revisions