Global Storage Version 6

Attention

This document is a proposal. It will likely change.

This document describes version 6 of Sync’s global storage format. This describes not only the technical details of the storage format, but also some semantics for how clients supporting version 6 should interact with the Sync server.

Cryptographic Model

All data on the server (with the exception of a single record containing non-private metadata used to help clients perform sanity checking) is encrypted on the client using AES symmetric encryption with 256 bit keys. Encrypted data is secured against tampering by employing HMAC-SHA256 hashing.

The cryptographic model frequently relies on pairs of 256 bit keys. One key is used for AES symmetric encryption; the other for HMAC verification. We refer to such a pair of keys as a Key Bundle.

Data on the server is organized into collections (e.g. history, bookmarks). Every collection has a single Key Bundle associated with it. We refer to a Key Bundle that is affiliated with a collection as a Collection Key Bundle. A single Collection Key Bundle is used to perform cryptographic operations on every record in the collection to which it is associated.

It is recommended, but not technically required, that each Collection Key Bundle be associated with at most a single collection.

Special records on the server hold mappings of collection names to their respective Collection Key Bundles. The Collection Key Bundles are encrypted using another higher-level Key Bundle before they are stored on the server. We refer to these higher-level Key Bundles as Key-Encrypting Key Bundles. And, the entity that holds the mapping to Collection Key Bundles is referred to as a crypto record (because it is a record stored in the crypto collection).

In the simple case, we have a single Key-Encrypting Key Bundle used to encrypt the collection of all Collection Key Bundles. Each Collection Key Bundle is used to encrypt every record in the collection to which it is associated. In other words, we have a master key used to unlock other keys which in turn unlock data on the server.

In graph form:

digraph {
  ROOT [label="Key-Encrypting Key Bundle"];
  BOOKMARKS [label="Bookmarks Collection Key Bundle"];
  HISTORY [label="History Collection Key Bundle"];

  ROOT -> BOOKMARKS;
  ROOT -> HISTORY;

  BOOKMARKS -> "Bookmark A";
  BOOKMARKS -> "Bookmark B";

  HISTORY -> "History 1";
  HISTORY -> "History 2";
}

This specification establishes no rules for which crypto records exist or for how Key-Encrypting Key Bundles are managed. This is entirely up to the client. In other words, key management is a convention between clients. If you are interested in interoperating with Firefox Sync, see Mozilla’s Sync Service.

This is the essence of the cryptographic model. More details are explained below.

Representation of Key Pairs

While Key Bundles consist of two separate keys, they should be thought of as a single immutable entity. To enforce this, a Sync Key Bundle is represented as a single blob of data.

The blob consists of the 256 bits of the encryption key followed by the 256 bits of the HMAC key followed by optional metadata. The format of the metadata is not currently defined. If no metadata is present, no extra information is recorded and the Key Bundle is represented as a single 512 bit (64 byte) blob.

In pseudo-code:

key_bundle = encryption_key + hmac_key + metadata

When encoded in JSON, key bundles are Base64 encoded. Note that keys are not stored in plaintext, so the Base64 encoding will apply to the ciphertext. See below.

Encrypted Records

Most encrypted records share a common payload format and method for encryption and decryption.

The payload of an encrypted records effectively consists of the following fields:

  • ciphertext: The encrypted version of the underlying data.
  • IV: Initialization vector used by AES encryption.
  • HMAC: HMAC for the encrypted message.

Since these 3 items are all related and all are needed to decrypt and verify individual messages, they are represented by a single entity - a buffer containing all 3 fields concatenated together.

Each binary buffer holds the raw bytes constituting the HMAC signature, followed by the raw bytes of the IV, followed by the raw bytes of the ciphertext.

In pseudo-code:

data = hmac + iv + ciphertext

The HMAC signature is always the length of the HMAC key. Since Sync uses 256-bit HMAC keys, the HMAC signature is 256 bits, or 32 bytes.

The IV is fixed-width at 16 bytes.

The ciphertext is variable length.

We refer to this 3-tuple of encryption matter as Encrypted Data.

When represented in JSON, the raw bytes constituting the Encrypted Data are Base64 encoded.

Encryption

Encryption is the process of taking some piece of data, referred to as cleartext, and converting it to Encrypted Data.

We start with a Key Bundle and cleartext.

In pseudo-code:

# collection_name is the name of the collection this record will be inserted
# into. The called function obtains the appropriate key bundle depending on
# the destination collection of the record.
bundle = getBundleForCollection(collection_name)

# Just some aliasing for readability.
encryption_key = bundle.encryption_key
hmac_key = bundle.hmac_key

iv = randomBytes(16)

ciphertext = AES256Encrypt(encryption_key, iv, cleartext)

# Now compute the HMAC. Be sure to include the IV in the computation.
message = iv + ciphertext
hmac = HMACSHA256(hmac_key, message)

encrypted_data = hmac + message

# When going to JSON, the binary payload buffer is Base64-encoded first.
record.payload = Base64Encode(encrypted_data)

Decryption

Decryption is the process of taking Encryted Data and turning it into cleartext.

Decryption requires Encrypted Data and a Key Bundle.

In pseudo-code:

bundle = getBundleForCollection(collection_name)
encryption_key = bundle.encryption_key
hmac_key = bundle.hmac_key

# If grabbing the record from JSON, it will Base64 encoded.
payload_b64 = record.payload
encrypted_data = Base64Decode(payload_b64)

# HMAC is first 32 bytes of payload.
hmac_remote = encrypted_data[0:31]

# IV is the 16 bytes after the HMAC
iv = encrypted_data[32:47]

# ciphertext is everything that's left.
ciphertext = encrypted_data[48:]

# The first step of decryption is verifying the HMAC. The HMAC is computed
# over both the IV and the ciphertext.
hmac_local = HMACSHA256(hmac_key, iv + ciphertext)

if hmac_local != hmac_remote:
    throw new Error("HMAC verification failed!");

cleartext = AESDecrypt(encryption_key, iv, ciphertext)

Global Metadata Record

The meta/global record exists with the same semantics as version 5, the only difference being that the storageVersion is 6 and the engines key has been renamed to repositories.

TODO carry version 5’s documentation forward.

Example:

{
  "syncID": "7vO3Zcdu6V4I",
  "storageVersion": 6,
  "repositories":{
    "clients":   {"version":1,"syncID":"Re1DKzUQE2jt"},
    "bookmarks": {"version":2,"syncID":"ApPN6v8VY42s"},
    "forms":     {"version":1,"syncID":"lLnCTaQM3SPR"},
    "tabs":      {"version":1,"syncID":"G1nU87H-7jdl"},
    "history":   {"version":1,"syncID":"9Tvy_Vlb44b2"},
    "passwords": {"version":1,"syncID":"yfBi2v7PpFO2"},
    "prefs":     {"version":2,"syncID":"8eONx16GXAlp"}
  }
}

crypto Collection

There exists a special collection on the server named crypto. This collection holds records that contain mappings of collections to Collection Key Bundles.

Each record in the crypto collection has associated with it specific semantics. This specification is intentionally vague as to what records and semantics are defined, as it is up to clients to define those. In other words, the set of records on the server and the specifics of which Collection Key Bundles they contain and/or which Key Encrypting Key Bundle is used to secure them is left to the purview of the client.

The rationale for this is that users may wish to manage their Collection Key Bundles with different levels of access or security. For example, the record containing all the keys may only be decrypted with a highly secure parent key, while another record may contain keys for less-sensitive collections, which can be unlocked using a key derived from a less secure method, such as PBKDF2.

Clients should support the ability to intelligently use different sets of Collection Key Bundles, depending on what the user has provided them access to. This means clients should not be eager to delete collections for which it doesn’t have the Collection Key Bundle, as the user may have purposefully withheld access to that specific collection.

Record Format

The exact format of records in this collection has yet to be decided. We have a few options.

Option 1

The payload of every record is an object containing the following fields:

  • collections - (required) A cleartext wrapping of collection names to Encrypted Data. The decrypted values are Key Bundles used to encrypt the collection to which it is tied.
  • encryptingKey - (optional) An encrypted* Key Bundle used to encrypt other encrypted data in this record.

For example:

{
  "collections": {
     "bookmarks": "ENCRYPTED KEY 0",
     "history": "ENCRYPTED KEY 1"
  },
  "encryptingKey": "ENCRYPTED KEY-ENCRYPTING KEY"
}

The client would – if not delivered out-of-band – decrypt the encrypting key. This would require its parent key and the contents of this record.

The client would then take the decrypted key-encrypting key and decrypt the individual Collection Key Bundles.

Pros:

  • Simple

Cons:

  • Server data reveals which encrypting keys can be used to unlock which collections.

Option 2

This is similar to Option 1 except that the mapping info is itself encrypted.

For example:

{
  "data": "ENCRYPTED DATA",
  "encryptingKey": "ENCRYPTED KEY-ENCRYPTING KEY"
}

The decrypted key encrypting key would first decrypt the data field. This would expose the mapping of collection names to encrypted Key Bundles, just as in Option 1. From there, the same key-encrypting key would decrypt each individual Key Bundle.

Yes, the Key Bundles are encrypted with the same key twice. We do not want the Key Bundles unencrypted after the first unwrapping because we want clients to be designed such that they never have to touch unencrypted key matter. In the case of Firefox, this means Sync can operate in FIPS mode since NSS will be the only entity handling the unencrypted keys.

Pros:

  • Server data does not reveal which keys can unlock which collections

Cons:

  • More complicated than version 1
  • Double encryption involves extra work.

No encryptingKey Variation

There is a variation of the above options where the encrypted key encrypting key is not stored in the record. Instead, it is stored in another record on the server or not stored on the server at all. These variations differ only in the specifics of the record payload.

Changes Since Version 5

Sync Keys Consolidated

The Sync Key has traditionally been 128 bits (often encoded as 26 “friendly” Base32 characters). The historical reason for it being 128 bits is that in early versions of Sync (before J-PAKE), people would need to manually enter the Sync Key to pair other devices. Even with J-PAKE, people may need to manually enter the Sync Key (known as the Recovery Key in UI parlance) into their client. From the 128-bit Sync Key, two 256-bit keys were derived via HKDF.

With the intent to use BrowserID’s key wrapping facility, we feel Sync no longer has the requirement that the Sync Key be manageable to enter from UI. This is because your Sync Key will be accessible merely by logging into BrowserID (your BrowserID credentials will unlock a BrowserID user key and that user key can unwrap an encrypted Sync Key stored on the server).

(We expect that users not using BrowserID will use some other mechanism for key exchange other than keyboard entry.)

Therefore, in version 6, the Sync Key will consist of a pair of 256-bit keys. Each key will be generated from a cryptographically secure random number generator and will not be derived from any other source. This effectively replaces the single 128-bit random key and two 256-bit HKDF-derived keys with two completely random 256-bit keys.

Sync Key Stored on Server

Version 6 supports storing the encrypted Sync Key on the Storage Server.

Key Pair Encoding

In version 5, key pairs (the two 256-bit keys used for symmetric encryption and HMAC verification) were represented in payloads as arrays consisting of two strings, each representing the Base64 encoded version of the key.

In version 6, key pairs are transmitted as a a single string or byte array. The two keys are merely concatenated together to form one 512-bit data chunk. Version 6 also supports additional metadata after the keys. However, the format of this metadata is not yet defined.

IV Included in HMAC Hash

In version 6, the IV is included in the HMAC hash. In previous versions, the IV was not included. This change adds more theoretical security to the verification process.

HMAC Performed Over Raw Ciphertext

In version 6, the HMAC is performed over the raw ciphertext bytes. In version 5, HMAC was performed over the Base64 encoding of the ciphertext.

Representation of Crypto Fields in Records

In version 6, the representation of cryptographic fields has been hidden from the record payload.

In version 5, the payload of encrypted records was the Base64 encoding of the JSON encoding of an object with the fields ciphertext, hmac, and IV.

In version 6, we embed all 3 elements in one opaque field. While the client will need to know how to extract the individual cryptographic components, the transport layer happily deals with a single string of bytes. In the case of JSON encoding, the payload is now the Base64 representation of the single string, not a JSON string.

Requiring Storing Separate Key Pairs for Collections

Version 6 requires that separate Key Bundles be used for each collection.

The previous version had a default Key Bundle that could be used to decrypt multiple collections. Clients would look for a collection-specific key in the crypto/keys record then fall back to the default. In practice, clients (notably Firefox), did not generate multiple keys by default.

Version 6 is dropping support for the default key and requiring that each collection use a separate key.

This change is being made in an effort to be forward compatible with future data recovery and sharing scenarios. The requirement of separate keys per collections effectively requires an extra link in the crypto chain where extra functionality can be inserted for one collection without impacting other collections.

Metaglobal Record Format Change

The engines key in the metaglobal record has been renamed to repositories. Semantics are preserved.