Finish porting existing documentation from within codebase

pull/682/head
Kieran Prasch 2019-01-22 12:44:39 -08:00 committed by Kieran Prasch
parent cd924baedc
commit 0aacd4e703
No known key found for this signature in database
GPG Key ID: 199AB839D4125A62
9 changed files with 78 additions and 260 deletions

View File

@ -1,202 +0,0 @@
NuCypher
========
Dependencies / technologies
=============================
* Python 3.5+
* rpcudp - python3.5 branch
* kademlia - python3.5 branch
* lmdb for persistence
* Rekeys and metadata represented as Python dicts, msgpacked and encrypted,
stored in lmdb
* C bindings to OpenSSL for encryption (?)
* PyCryptodome / PyCrypto for symmetric block ciphers
* buildout for building (more convenient when using custom git dependencies?)
Decentralized network
========================
`Kademlia <https://github.com/bmuller/kademlia>`_ by default (see kademlia.network.server) saves data on multiple nodes,
and every client there also acts as a server.
We need to split up client and server (that is, the get and set methods of the
client must not save data on the current node).
In the first version of the protocol, we will use m-of-n threshold re-encryption
for ECIES. This means that instead of one re-encryption key, we will generate
n re-encryption keys and store each with one node in the network.
By default, Kademlia stores data *copied* to *several* closest nodes. Instead,
we want to find the n closest responding nodes and store rekeys with them, without
duplicating. The methods get() and set() in ``kademlia.network.Server`` are to
be used only as documentation. We will have to write our own ClientServer class.
The protocol (``kademlia.protocol.KademliaProtocol``) is also to be rewritten for
re-encryption rather than returning data.
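The placement rule described above (one rekey share per node, chosen among the n closest responding nodes, with no duplication) can be sketched as follows. This is only an illustration under stated assumptions: ``node_distance``, ``assign_rekey_shares`` and the ``is_responding`` callback are hypothetical names, node ids are modelled as SHA-256 digests, and no real Kademlia networking is involved.

```python
import hashlib


def node_distance(node_id: bytes, key: bytes) -> int:
    """Kademlia-style XOR distance between two equal-length identifiers."""
    return int.from_bytes(node_id, "big") ^ int.from_bytes(key, "big")


def assign_rekey_shares(key, shares, nodes, is_responding):
    """Give each share to a distinct node, closest responding nodes first.

    Unlike stock Kademlia, nothing is replicated: every share is stored
    on exactly one node.
    """
    candidates = sorted(
        (n for n in nodes if is_responding(n)),
        key=lambda n: node_distance(n, key),
    )
    if len(candidates) < len(shares):
        raise RuntimeError("not enough responding nodes for all shares")
    return dict(zip(candidates, shares))


# Hypothetical demo data: ten node ids, two of which do not respond.
nodes = [hashlib.sha256(b"node-%d" % i).digest() for i in range(10)]
key = hashlib.sha256(b"policy-address").digest()
offline = {nodes[0], nodes[3]}
placement = assign_rekey_shares(
    key,
    [b"rk-share-0", b"rk-share-1", b"rk-share-2"],
    nodes,
    lambda n: n not in offline,
)
```

A real implementation would also have to handle nodes that stop responding after the shares were placed, which this sketch ignores.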
When connections are established with nodes, they should tell their pubkeys
(or rather the pubkeys should be used as public nodeids).
New methods should include: ``store_rekey`` (with policy), ``reencrypt``,
``remove_rekey``.
Nodes should be able to state how long they can store
re-encryption keys for (this information will come from metadata written
on the blockchain). Clients will be able to know this in advance.
Each node is identified by its pubkey, and clients will be able to know
in advance which node is available to store the policy for long enough.
Another feature to be implemented here is replicating all the rekeys to a
different node if the node is going to be offline for a long time
(complete shutdown). If this happens, the node passes all its rekeys
to node(s) which are capable of handling them for long enough, and writes
this information to the blockchain.
When a node starts, the key which will be used to decrypt the persisted
data can be generated, read from a file (not very safe!), derived from a
passphrase (safe if the passphrase is long enough and generated),
or stored with delegated access using NuCypher itself.
This kademlia-based protocol is *not* intended to be anonymous; we rely on the
split-key re-encryption properties (e.g. that fewer than m random nodes will be corrupt).
Persistence layer
====================
The persistence layer to be used is lmdb. Rekeys and metadata can be represented
as Python dictionaries. When persisted, they are serialized via msgpack and stored in
lmdb (in encrypted form).
API
=====
First, we create a Python API. This API should allow the caller to:
* generate a new random symmetric key (this is usually implicit)
* encrypt (off-chain, but store meta-information with files)
* grant and revoke access (on chain)
* decrypt_key (query the network)
* decrypt (data using a key from decrypt_key)
We can also add similar functions for signing, rather than just
encryption/decryption, in later versions.
The API should be implemented for: Python (native client),
JSON server (localhost, similar to bitcoind), Javascript (native).
Encryption
=============
Algorithms should be pluggable, so we will record which algorithm
was used for pubkey encryption / re-encryption in the rekey meta-information.
The choices are:
* Normal BBS98 (1-of-n) (debug only);
* Normal ECIES (1-of-n);
* AFGH (n-of-n) (debug only);
* Split-key ECIES (m-of-n, production ready).
As soon as split-key ECIES is available, we immediately switch to it.
The curve should also be specified. It makes sense to use secp256k1, as it has been
well tested with Bitcoin.
We also store which block cipher we used. The choices are:
* AES256-GCM (libsodium-based library for zerodb is the fastest?)
* Other AES modes (maybe not vulnerable to reusing the IV)
* Salsa20 from libsodium
Consumers of the data identify it by the owner's public key and the path. It is
important that no one else can submit re-encryption keys for the same
path. So, at first, we should add digital signatures for hash(path + policy)
(using pycrypto library?). This signature and associated data will then be
recorded on the blockchain so that it is publicly verifiable. The miners
must accept only paths with valid signatures.
The public key should be used as part of the rekey address.
The scheme wouldn't work with anonymity on, so it will have to be redesigned
to be anonymous in later versions of the protocol.
Mapping in the rekey store:
* hash(path) -> (rekey, policy, algorithm, signature, pubkey)
The pubkey here is *not* the encryption key, it's a separate signing key.
Algorithms/libraries to use:
* ECDSA (pycryptodome / pycrypto), secp256k1 curve
* sha3 module for hash functions (let's be future-proof!)
(included in standard hashlib with python3.6+)
Non-anonymous protocol
============================
Owner of the data has signing keypair sk_o/pk_o and encrypting keypair ske_o/pke_o.
ske_o = hash(sk_o)
The path can be a string or a tuple (where a string is equivalent to a tuple with length one).
An example of a tuple-path::
path = ('', 'home', 'ubuntu', 'secret.txt')
When a path contains many elements in the tuple, one can share not only one file, but also whole directories.
If the PRE algorithm is not multihop+unidirectional (there is only one like that), the encryption keys for
files/directories are::
key[i] = hmac(ske_o, '/'.join(path[:i + 1]))
so, key[0] is the (private) key for whole ``/``, key[1] for ``/home`` etc.
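The derivation above can be sketched with the standard library. The original text does not say which hash or HMAC digest is used, so SHA3-256 is an assumption here (picked because the document mentions the sha3 functions in ``hashlib``); ``derive_path_keys`` and the demo secret are hypothetical.

```python
import hashlib
import hmac


def derive_path_keys(sk_o: bytes, path):
    """Return key[0..len(path)-1], one key per prefix of the path.

    A plain string is treated as a tuple of length one, as in the text.
    """
    if isinstance(path, str):
        path = (path,)
    # ske_o = hash(sk_o); SHA3-256 is an assumed choice of hash.
    ske_o = hashlib.sha3_256(sk_o).digest()
    return [
        hmac.new(ske_o, "/".join(path[:i + 1]).encode("utf-8"),
                 hashlib.sha3_256).digest()
        for i in range(len(path))
    ]


# Demo secret only; a real sk_o would be the owner's signing private key.
keys = derive_path_keys(b"owner signing secret (demo only)",
                        ("", "home", "ubuntu", "secret.txt"))
```

Here ``keys[0]`` corresponds to ``/``, ``keys[1]`` to ``/home``, and so on. Sharing ``/home`` only requires a re-encryption key for ``keys[1]``.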
When a file (or object) with ``path`` is encrypted, the owner generates a symmetric key for it,
encrypts it with each key[i], and attaches the results to the file (or just returns the keys if asked).
When attached to the file, the encrypted symmetric keys are stored together with the hashes of the
paths and subpaths, so that we can verify that this file is encrypted for the users of this path.
When a file or a directory is shared with someone with a key pair (sk_b/pk_b), the re-encryption
key is created for a path shared::
rk = rekey(key[i], pk_b)
where key[i] is calculated in-place from the path, and rk might mean also all re-encryption shares
rather than just one rekey.
After the calculation, the rk is stored with the NuCypher network. It will be stored in the following
persistent mapping::
hmac(pk_o + pk_b, '/'.join(path[:i])) -> (rk, policy, algorithm, sign(hash + rk + policy + algorithm, sk_o))
The policy is signed with the owner's signing key sk_o (and verified against pk_o) so that nobody else can submit it.
To protect against resubmission after revocation, the signature can be saved on the blockchain
both when the policy is submitted and when it is revoked, so that no one can use a replay attack to submit it
again (needs to be rethought for the anonymous protocol).
All interactions are encrypted with each node's public key + a symmetric key, so that nobody
except that node can see the rekey. It is usually a one-time interaction over rpcudp, so public key
encryption should be faster than TLS.
When a client requests to re-encrypt data, the request is initiated by a command like::
data = client.decrypt(encrypted_data, pk_o, '/path/to/file/or/directory/where/it/is')
Under the hood, the following is sent to the miner node in a request encrypted
with the miner's public key (on the client side)::

    # Path is transformed into a series of hashes, one per path prefix
    path_split = path.split('/')
    path_pieces = ['/'.join(path_split[:i + 1]) for i in range(len(path_split))]
    path_hashes = [hmac(pk_o + pk_b, piece) for piece in path_pieces]
    # Multiple pieces are returned when m-of-n split-key reencryption is used;
    # if not, there is only one piece
    edata_pieces = low_level_client.reencrypt(encrypted_data, pk_o, path_hashes)
    data = decrypt_m_of_n(edata_pieces, sk_b)

When the server gets a request with all the path_hashes, it looks for a re-encryption key
corresponding to at least one of them, and uses the last one it finds (the most specific path) to re-encrypt
the data::

    def request_handler(encrypted_data, path_hashes):
        # Walk from the most specific path hash to the least specific one
        for p in path_hashes[::-1]:
            if p in storage:
                rk = storage[p]
                return reencrypt(encrypted_data, rk)
        raise KeyNotFound
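
The lookup can be tried end to end with the sketch below. It assumes HMAC with SHA3-256 for the path hashes and a plain dict as the rekey store, keyed by the hash of the shared path prefix; ``path_hashes``, ``find_rekey`` and the placeholder rekey strings are hypothetical, and the re-encryption step itself is left out.

```python
import hashlib
import hmac


class KeyNotFound(Exception):
    """Raised when no rekey is stored for any prefix of the path."""


def path_hashes(pk_o: bytes, pk_b: bytes, path: str):
    """Hash every prefix of the path, least specific first."""
    parts = path.split("/")
    prefixes = ["/".join(parts[:i + 1]) for i in range(len(parts))]
    return [
        hmac.new(pk_o + pk_b, p.encode("utf-8"), hashlib.sha3_256).digest()
        for p in prefixes
    ]


def find_rekey(storage, hashes):
    """Return the rekey stored for the most specific matching prefix."""
    for h in reversed(hashes):
        if h in storage:
            return storage[h]
    raise KeyNotFound("no re-encryption key stored for this path")


pk_o, pk_b = b"owner-pubkey", b"bob-pubkey"
storage = {
    path_hashes(pk_o, pk_b, "/home")[-1]: "rk for /home",
    path_hashes(pk_o, pk_b, "/home/ubuntu")[-1]: "rk for /home/ubuntu",
}
found = find_rekey(storage, path_hashes(pk_o, pk_b, "/home/ubuntu/secret.txt"))
```

Because the hashes are scanned in reverse, a grant on ``/home/ubuntu`` takes precedence over a broader grant on ``/home`` for files below it.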

View File

@ -1,38 +1,105 @@
# Upgradeable contracts
# Upgradeable Contracts
Smart contracts in Ethereum are not really changeable. A contract cannot even be deleted: it still exists on the blockchain after `selfdestruct` (only its storage is cleared).
So fixing bugs and upgrading logic means changing the contract (and its address) while preserving the previous storage values.
A simple way to do this is to create a new contract, copy the storage there, and destruct (mark as deleted) the old contract.
But in this case the client must change the address of the contract it calls, and two versions of the contract will be active during the migration.
A more convenient way is to use a proxy contract with an interface in which each method redirects to the target contract.
This is a good option because the client uses one address most of the time, but it has a drawback: when methods are added, the proxy address needs to change too.
Another way is to use a fallback function in the proxy contract: this function executes on any request, redirects the request to the target, and returns the result value (using some opcodes).
This is almost like the previous option, but the proxy has no interface methods, only a fallback function, so there is no need to change the proxy address when methods change.
This way is not ideal and has some restrictions (only the major ones are listed here):
* Sending Ether from a client's account to the contract uses the fallback function. Such a transaction can consume only 2300 gas (http://solidity.readthedocs.io/en/develop/contracts.html#fallback-function)
* The proxy contract (Dispatcher) holds the storage (it is not in the target contract itself). When upgrading, storage values should be the same or equivalent (see below)
# Sources
## Sources
More examples:
* https://github.com/maraoz/solidity-proxy - a good implementation using libraries (not contracts), but too complex, and some ideas are obsolete after the Byzantium hard fork
* https://github.com/willjgriff/solidity-playground - most of the upgradeable code is taken from this repository
* https://github.com/0v1se/contracts-upgradeable - almost the same, but also has code for verifying upgrades
# Interaction scheme
![Interaction scheme](pics/Dispatcher.png)
## Interaction scheme
![Interaction scheme](../.static/img/Dispatcher.png)
* Dispatcher - proxy contract that redirects requests to the target address.
It explicitly holds its own values (owner and target address) and implicitly stores the values of the target contract.
The client should use the resulting contract's ABI or an interface ABI when sending requests to the Dispatcher address.
The owner can change the target address by using the Dispatcher ABI.
The Dispatcher contract uses `delegatecall` to redirect requests, so `msg.sender` remains the client address and the target contract's methods execute against the dispatcher's storage.
If the target address is not set or the target contract does not exist, the result may be unpredictable, because `delegatecall` will return true.
* Contract - upgradeable contract; each version should keep the same order of storage values.
New versions of the contract can add values, but must contain all the old values (starting with the values from the dispatcher).
This contract acts like a library because its own storage is not used.
If a client sends a request to the contract directly, bypassing the dispatcher, the request may execute without an exception
but with the wrong target address (should be the dispatcher address) and the wrong storage (should be the dispatcher storage).
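
The interaction scheme can be modelled in plain Python to make the storage rule concrete. This is a toy model, not contract code: the class names, the `LAYOUT` tuples and the `call` method are hypothetical stand-ins for the fallback function and `delegatecall`, and the dict stands in for the dispatcher's storage slots.

```python
class LibraryV1:
    """Logic only: operates on whatever storage the dispatcher hands in."""
    LAYOUT = ("owner", "target", "counter")

    @staticmethod
    def increment(storage, sender):
        storage["counter"] = storage.get("counter", 0) + 1
        return storage["counter"]


class LibraryV2:
    """New version: keeps all old values in order and appends one."""
    LAYOUT = ("owner", "target", "counter", "last_sender")

    @staticmethod
    def increment(storage, sender):
        storage["counter"] = storage.get("counter", 0) + 2
        storage["last_sender"] = sender
        return storage["counter"]


class Dispatcher:
    """Holds the storage; the target only supplies the code."""

    def __init__(self, owner, target):
        self.storage = {"owner": owner, "target": target}

    def upgrade(self, sender, new_target):
        if sender != self.storage["owner"]:
            raise PermissionError("only the owner can change the target")
        old_layout = self.storage["target"].LAYOUT
        if new_target.LAYOUT[:len(old_layout)] != old_layout:
            raise ValueError("new version must start with the old storage layout")
        self.storage["target"] = new_target

    def call(self, sender, method, *args):
        # Plays the role of the fallback function plus delegatecall: the
        # target's code runs against the dispatcher's storage.
        return getattr(self.storage["target"], method)(self.storage, sender, *args)


proxy = Dispatcher("owner", LibraryV1)
first = proxy.call("alice", "increment")
proxy.upgrade("owner", LibraryV2)
second = proxy.call("bob", "increment")
```

The counter survives the upgrade because it lives in the dispatcher, which is why each new version has to keep the old values at the start of its layout.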
# Development
## Development
* Use Upgradeable as the base contract for all contracts that will be used with the Dispatcher
* Implement a `verifyState(address)` method which checks that the new version has correct storage
* Implement a `finishUpgrade(address)` method which copies initialization data from the library storage to the dispatcher storage
* Each upgrade should include tests which check storage equivalence
## Desired Properties
* Nodes decide which update should occur;
* Nodes can rollback contract if new version has bugs.
### Approaches
* "Hard-fork"
![Hard-fork](../.static/img/Hard-fork.png)
Each version is a new contract with a separate address and storage.
Nodes should change the contract address that they use.
- Advantages:
- Code is simpler, no special requirements;
- Each node can choose which contract to use.
- Disadvantages:
- There are two versions of the contract during an update, so the contracts must work together.
We can also add another contract (Government) for voting and migration between versions.
* [Dispatcher](README.MD) (proxy)
![Dispatcher](../.static/img/Dispatcher2.png)
Uses a proxy contract that holds the storage and the library address.
Updating means changing only the library address in the proxy contract.
- Advantages:
- Instant update without changing address for nodes.
- Disadvantages:
- Certain rules must be followed when updating the contract storage;
it is better to write additional methods for testing contract storage;
- A voting contract (Government) is required for a legitimate upgrade.
### Implementation
* "Hard-fork"
* Soft updating with two contracts
![Hard-fork-impl1](../.static/img/Hard-fork2.png)
The contracts being updated should contain methods for transferring data (amount of locked tokens, balance, etc.).
For example, change the manager address from old to new in the Wallet contract.
Both versions should also interact for correct mining
(all locked blocks will be the sum from the old and new versions in the current period).
For rollback, it is enough to move data from the new version back to the previous one.
At some point, the new version has to disable the previous contract and move the remaining data to the new version.
* Full update from one contract to another
![Hard-fork-impl2](../.static/img/Hard-fork3.png)
All nodes vote on the update using an additional contract.
After voting ends, the old contract should be blocked and the new version activated (or created).
Then data is copied from the old version to the new one, for example, by the new contract.
Rollback is almost the same: the new version is paused,
data is moved back to the old version, and the old version is activated.
So the main task is adding methods for obtaining data to both the old and new versions.
* Dispatcher
![Dispatcher-impl](../.static/img/Dispatcher3.png)
After voting, the Government contract changes the library address in the proxy.
Rollback means changing the address back from the new library to the old one.
The main goal is to implement voting correctly and to check storage when setting a new address.
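
The Dispatcher variant of the upgrade flow can be sketched as a toy Python model. The text does not define the voting rule, so a simple majority of registered nodes is an assumption, and the `Proxy` and `Government` classes, their methods and the library names are all hypothetical.

```python
class Proxy:
    """Holds the current library address plus earlier ones for rollback."""

    def __init__(self, library):
        self.library = library
        self.previous = []

    def set_library(self, library):
        self.previous.append(self.library)
        self.library = library

    def rollback(self):
        if not self.previous:
            raise RuntimeError("no earlier library to roll back to")
        self.library = self.previous.pop()


class Government:
    """Collects node votes and switches the proxy's library on approval."""

    def __init__(self, proxy, nodes):
        self.proxy = proxy
        self.nodes = set(nodes)
        self.votes = {}

    def vote(self, node, approve):
        if node not in self.nodes:
            raise PermissionError("unknown node")
        self.votes[node] = approve

    def finish(self, candidate):
        # Assumed rule: strictly more than half of all nodes must approve.
        approvals = sum(1 for approved in self.votes.values() if approved)
        accepted = approvals * 2 > len(self.nodes)
        self.votes = {}
        if accepted:
            self.proxy.set_library(candidate)
        return accepted


proxy = Proxy("library-v1")
gov = Government(proxy, ["n1", "n2", "n3"])
gov.vote("n1", True)
gov.vote("n2", True)
gov.vote("n3", False)
upgraded = gov.finish("library-v2")
```

Rollback is then a single `proxy.rollback()` call that restores the previous library. A real contract would also run the storage check before accepting the new address.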

View File

@ -1,47 +0,0 @@
# Desired properties:
* Nodes decide which update should occur;
* Nodes can rollback contract if new version has bugs.
# Approaches
* "Hard-fork"
![Hard-fork](pics/Hard-fork.png)
Each version is a new contract with separate address and storage.
Nodes should change contract address that they use.
- Advantages:
- Code is simpler, no special requirements;
- Each node can choose which contract to use.
- Disadvantages:
- There are two versions of contract while updating, so contracts should work together.
Also we can add another contract (Government) for voting and migration between versions.
* [Dispatcher](README.MD) (proxy)
![Dispatcher](pics/Dispatcher2.png)
Using proxy contract that holds storage and library address.
Updating is changing only one library address in proxy contract.
- Advantages:
- Instant update without changing address for nodes.
- Disadvantages:
- Certain rules for updating the contract storage,
better to write additional methods for testing contract storage;
- A voting contract (Government) is required for a legitimate upgrade.
# Implementation
* "Hard-fork"
* Soft updating with two contracts
![Hard-fork-impl1](pics/Hard-fork2.png)
Updating contracts should contain methods for transfer data (amount of locked tokens, balance etc.).
For example, change manager address from old to new in Wallet contract.
Also both version should interact for correct mining
(all locked blocks will be sum from old and new versions in the current period).
For rollback will be enough to move data from the new version back to the previous.
In some moment, new version have to disable previous contract and move remaining data to the new version.
* Full update from one contract to another
![Hard-fork-impl2](pics/Hard-fork3.png)
All nodes vote for updating using additional contract.
After the end of voting old contract should be blocked and new version is activated (or created).
And then data will be copied from old version to new, for example, by new contract.
Rollback is almost the same: new version is paused,
data is moved back to the old version and old version is activated.
So main task is the addition of methods for obtaining data for old and new versions.
* Dispatcher
![Dispatcher-impl](pics/Dispatcher3.png)
After voting Government contract changes library address in proxy.
Rollback is changing address back from the new library to the old.
Main goal is create right voting and check storage while setting new address.