nucypher/docs/NuCypher_Beachhead.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The NuCypher Beachhead\n",
    "### A starter kit for developers leveraging the NuCypher access management system \n",
    "\n",
    "*by Arjun Hassard, Product Lead & Struggling Writer @ NuCypher*\n",
    "\n",
    "\n",
    "This resource is aimed at developers who have decided to leverage (or are considering leveraging) NuCypher's access management system – located on Github as *nucypher/nucypher*. By focusing on the most straightforward and fundamental functionality in the NuCypher codebase, this notebook serves as a 'beachhead' for one's technical strategy – an initial foothold for developers to: \n",
    "\n",
    "1) understand the principal capabilities of the NuCypher system\n",
    "\n",
    "2) form sensible plans for integrating NuCypher into their application or protocol \n",
    "\n",
    "3) scope out and develop features involving data sharing and access control\n",
    "\n",
    "We recommend using this resource to get comfortable with relatively high-level operations, for instance *ALICE.grant()* and *BOB.join_policy()*, before moving on to more advanced NuCypher libraries, such as *nucypher/pyUmbral* – that afford greater customization, provide opportunities to economize NuCypher network usage, and enable the architecting of more complex delegation workflows.\n",
    "\n",
    "Note: we will hereafter refer to the NuCypher access management system simply as 'NuCypher'."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### A note regarding the data sharing 'narrative'\n",
    "We're going to walk through a typical data sharing journey, or 'narrative', centered around production-ready code snippets. In general terms, the narrative stars a *data controller*, who intends for a *designated recipient* to gain access to her data. The data controller proceeds to connect to the NuCypher network, then creates a data sharing policy specifically for the recipient. Later, when the recipient decides to request access to the data, the NuCypher network performs the necessary permission update. Without spoiling the ending, the recipient's access is delivered swiftly and securely, and without any subsequent action required from the data controller.\n",
    "\n",
    "In order to better illustrate the relationship between the functionality present here and the real-world outcomes one might observe in a digital application (that has integrated NuCypher), we will occasionally refer to two human end-users of a hypothetical *health record management platform*: \n",
    "\n",
    "- A medical patient (this is the aforementioned *data controller*) \n",
    "- A doctor (this is the aforementioned *designated recipient*) \n",
    "\n",
    "These references will mostly appear under the subtitle: *\"Relevance to example real-world application\"*. It's worth noting that there is nothing special about patients or doctors from an access management perspective, other than that the underlying data shared via a medical application is likely to be sensitive, and the data's correct delivery critical. Hence, these hypothetical application users can be swapped out for any other end-user. \n",
    "\n",
    "Over the course of the narrative, we will also dive into important lower-level work occurring in the background, plus abstracted opportunities to customize. Please note that these are primarily conceptual explanations, and do not cover every single background process. Rather, the explanations are what we deem to be relevant to a developer's over-arching technical strategy. These reference will appear under the subtitle: \"Relevant work/optionality abstracted by *function/method/etc.*\"."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### A high-level refresher on NuCypher's underlying cryptography (this section can be skipped)\n",
    "\n",
    "*'Threshold proxy re-encryption' is the primary cryptographic scheme powering NuCypher technology. If you've read our white paper (\"NuCypher KMS: Decentralized key management system\"), or are already familiar with proxy re-encryption, you can safely skip this section. \n",
    "\n",
    "#### *Re-encryption* (traditional approach) \n",
    "\n",
    "Let's begin by establishing what we mean by *re-encryption*. In short, re-encryption is a method to update the permission(s) associated with encrypted data. For instance, say you wish to share a document with a colleague. The document file is encrypted with your public key (i.e. only you can decrypt it) and living in remote storage. In order for your colleague to gain access securely, the document file must be encrypted again – to their public key – that is, it must be *re-encrypted*. Traditionally, files can only be re-encrypted when they are in a plaintext state – i.e. it has been decrypted. Since this prerequisite decryption process tends to occur on a remote server, this is commonly known as *server-side decryption*. Typically, the file itself isn’t subject to server-side decryption. Instead, a cryptographic conduit to the file, such as a symmetric key, undergoes the necessary decryption (and subsequent re-encryption to the colleague's public key). Either way, this approach requires you to trust at least one central authority with direct or indirect access to the contents of your document. \n",
    "\n",
    "#### *Proxy* Re-encryption \n",
    "\n",
    "Proxy re-encryption brings us closer to *true privacy* by replacing the fully trusted central authority (who *can* see your data during the re-encryption process, if they choose to), with a 'semi-trusted' third-party proxy (who *cannot* see your data during the re-encryption process, unless they successfully collude with the recipient). \n",
    "\n",
    "Let's walk through a simplified proxy re-encryption flow. As with traditional re-encryption, we'll begin with your encrypted bulk data (a document), living in remote storage. The 'conduit' to the document – a standard symmetric key – has been encrypted with your public key – and we'll call the output of this encryption *'Ciphertext A'*. \n",
    "\n",
    "As before, you wish to grant a colleague access to the document. To achieve this with proxy re-encryption, your application generates a special kind of key known as a *re-encryption key*. This is ‘constructed’ using your private key, and your colleague’s public key. At no point is your private key transferred anywhere, nor exposed to anyone. Nor is it possible to reverse engineer your private key from the re-encryption key – again, barring collusion between the proxy and the recipient, your colleague.\n",
    "\n",
    "Next, the *re-encryption key* and *Ciphertext A* are sent to an available proxy. The proxy’s job is to transform Ciphertext A using the re-encryption key. What emerges from the transformation is a totally new ciphertext – *'Ciphertext B'* – made especially for your colleague – decryptable using their private key. When they decrypt *Ciphertext B*, they get the original symmetric key in plaintext – which they use to access the original document you shared with them.\n",
    "\n",
    "The presence of a third-party (the proxy) confers a range of benefits to developers, which we'll explore in detail over the course of this notebook. However, there is a deeply significant attribute to note at this stage - despite the fact a third-party is *managing access* to your data, they do not ever *gain access* to the data. Both the document and corresponding symmetric key remained safely encrypted throughout entire the sharing journey. In other words, you didn't have to trust the proxy, or indeed anyone, to get your colleague access to your document. \n",
    "\n",
    "#### *Threshold* Proxy Re-encryption \n",
    "\n",
    "*Threshold proxy re-encryption*, also known as *'Umbral'*, is the official cryptographic foundation on which the NuCypher access management system is built. *Umbral* takes standard *Proxy Re-encryption* and increases the security and performance.\n",
    "\n",
    "At a very high-level, the main improvement Umbral brings to *proxy re-encryption* involves the all-important re-encryption key. Rather than sending it to a single proxy, the re-encryption key for a sharing policy is split into multiple fragments (called *'KFrags'*). These are sent to participating proxies (for example, 20 proxies), each of whom uses their KFrag to transform a ciphertext. This generates a fragment of transformed ciphertext (called a *'CFrag'*). When a requisite number of CFrag outputs are combined, a complete ciphertext emerges (similar to *'Ciphertext B'* from earlier), that the recipient can use to access the underlying data. \n",
    "\n",
    "Umbral is an *'M-of-N'* scheme that employs a form a of Shamir Secret Sharing at the CFrag collation stage. This enables the generation of a complete *Ciphertext B* from a smaller number of CFrags than the number of KFrags originally sent out to proxies. The minimum 'threshold' of CFrags required to complete the sharing process can be set in advance by the developer, hence the scheme's name. This approach brings valuable security and network redunancy benefits. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With all this in mind, let's look at some code!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Importing stuff\n",
    "We'll start by importing two built-in functions. We'll also import *maya*, which makes managing timezones easier."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [
    {
     "ename": "ModuleNotFoundError",
     "evalue": "No module named 'maya'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mModuleNotFoundError\u001b[0m                       Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-4-61648c1aad25>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mdatetime\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      2\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0msys\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0;32mimport\u001b[0m \u001b[0mmaya\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'maya'"
     ]
    }
   ],
   "source": [
    "import datetime\n",
    "import sys\n",
    "import maya"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we'll import some Classes – these bestow specific powers to each 'character' involved in the data sharing narrative. \n",
    "\n",
    "A common misconception made about the characters present in the NuCypher codebase is that they precisely embody the end-users of a digital application – a patient, a doctor, etc. Instead, the NuCypher characters are in fact the *tools*, or *devices*, utilized by end-users to achieve their goals – not the end-users themselves. This distinction will become clearer as we introduce each character (*Alice*, *Bob*, *Ursula* & *Enrico*) in detail."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from nucypher.characters.lawful import Alice, Bob, Ursula\n",
    "from nucypher.characters.lawful import Enrico"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we'll import modules which enable the NuCypher network to function correctly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from examples.sandbox_resources import SandboxNetworkyStuff\n",
    "from nucypher.network.node import NetworkyStuff"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### *Ursula*: the proxy & network node\n",
    "The first character to introduce is *Ursula* – fulfilling the role of *proxy* in *proxy re-encryption*. Generally, we can think of a proxy as a third-party, anonymous, remote machine whose sole purpose is to ingest encrypted text and transform it, thereby performing a permission update. In other words, provide the *re-encryption* in *proxy re-encryption*. \n",
    "\n",
    "The NuCypher network, once launched, will comprise thousands of proxies scattered around the world, collectively performing the re-encryptions necessary to power the data sharing narratives within applications/protocols leveraging NuCypher. Every time a permission needs updating, or a data recipient has their access to granted or revoked, multiple Ursulas are marshalled to make this happen. Ursulas provide this service in exchange for fees, paid by developers like yourself. Ursulas are referred to as nodes, network participants, network nodes, miners, re-encryptors, proxies and service-providers in other contexts. Similar to the other characters, 'Ursula' represents the  device utilized by a human node operator to provide a re-encryption service.\n",
    "\n",
    "We won't be touching Ursula directly over the course of this narrative. Instead, we'll be focusing on the characters most pertinent to an end-user facing application: Alice, Bob & Enrico. Nonetheless, it's critical to have a grasp of Ursula's role, at least on a conceptual level – as there is a great deal of behind-the-scenes interaction with Ursulas occurring in virtually all NuCypher data sharing narratives. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### *Alice*: the data controller\n",
    "Let's now introduce Alice, the second character in our data sharing narrative. Like all NuCypher characters, Alice has a Class which prescribes her various unique abilities/powers. She represents all of the following: \n",
    "- Data producer\n",
    "- Data owner\n",
    "- Data delegator\n",
    "- Data controller (we will use hereafter 'data controller' to signify all of the above) \n",
    "\n",
    "As mentioned previously, it's most precise to think of Alice as the device, or devices, that a real-life data controller would use to manage their data. In our hypothetical application, that would be the  patient, managing their medical data 'through' the Alice character.\n",
    "\n",
    "Let's instantiate Alice and ensure that she is properly connected to the NuCypher network. *Network_middleware* works as a 'bridge' between our application and the network – the details of which are not relevant to this narrative. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ALICE = Alice(network_middleware=network_middleware)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The next step is to set some basic parameters for a sharing policy, which we'll invoke later when we add a designated recipient to it. For this narrative, we'll choose the following four settings:\n",
    "\n",
    "1) We set the policy to expire 5 days from the moment it is created. \n",
    "\n",
    "*Note: This is an example of policy revocation. This feature is powerful because, with NuCypher, policies can be revoked (and granted) based on arbitrary conditions. Revocation is also, uniquely, enforced by the network itself. In this case, the policy is time-bounded. However, a policy could also be programmed to expire based on express input from a user, or the fulfilment (or lack of fulfilment) of other conditions – such as a confirmed payment, receipt of data, receipt of signature, or more complicated rules written into a smart contract. Policy statuses could even be determined by the output of an Oracle in relation to the outcome of a real-world event.*\n",
    "\n",
    "*You will notice that there are no further references to revoking policies after this line. That's because the relevant functions for more advanced policy revocation are not complete, and this notebook only contains working code. Look out for updates on this front.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "policy_end_datetime = maya.now() + datetime.timedelta(days=5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2) We'll set 'n', the number of Ursulas (proxies) to which the forthcoming re-encryption job will be assigned, to 1. \n",
    "\n",
    "*Note: Normally, outside of a demo walkthrough like this, a policy's re-encryption job is always assigned to many more Ursulas than just 1. Setting n > 1 splits the re-encryption key into a corresponding number of fragments, each of which is sent to a different Ursula. Generally, the greater the number of key fragments (& Ursulas), the greater the security of the policy. We'll explain this process in greater detail later, in the section entitled 'Retrieiving the data with Bob'.* "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n = 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "3) We'll set 'm', the minimum number of Ursulas required to complete the re-encryption job for the recipient to access the data, also to 1. \n",
    "\n",
    "*Note: Similarly, the minimum number of re-encrypting Ursulas, or the 'threshold', would always be set to greater than 1. As this is an 'm-of-n' scheme, n should be greater than m. A possible choice is m = 10, n = 20.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "m = 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "4) Finally, we'll add a path – this is the 'label' of the data we intend to share. \n",
    "\n",
    "*Note: the label variable does not refer to the exact file that we intend to share. Rather, it functions more like a tag, where any amount of data can utilize the same label – somewhat like multiple files in a single directory. As we shall see, this means that the data does not necessarily need to exist at this stage, and can be produced later.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "label = b\"secret/files/and/stuff\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before we can begin granting permissions to anyone, we need Alice to connect to the NuCypher network and the Ursulas/proxies that will perform the re-encryption service on her behalf. We bootstrap this process by connecting Alice to one known Ursula, who then connects Alice to other Ursulas in their network. Note: there will be an option to start the connection process with Ursulas run by NuCypher, if this is preferable. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ALICE.network_bootstrap([(\"localhost\", 3601)])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### *Bob*: the designated recipient\n",
    "Bob is the second character in the data sharing narrative. Conceptually, his role is fairly passive – he is simply the chosen recipient. However, as we shall see, there are a number of more complex actions/requirements this character must perform/fulfill in order to securely gain access to the data. Remember, like Alice, Bob represents the device(s) a real-world data recipient – or in our hypothetical application, the doctor – would use to achieve their goals. \n",
    "\n",
    "Right now, we'll just instantiate Bob, so we can add him to Alice's sharing policy in the next section.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "BOB = Bob()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating a sharing policy with *Alice*\n",
    "It's time to create our first sharing policy, using the .*grant()* function. Sharing policies have three distinguishing features, which are defined when a policy is granted: \n",
    "\n",
    " 1) The data controller, *Alice*.\n",
    " \n",
    " 2) The designated recipient, *Bob*.\n",
    " \n",
    " 3) The file(s) location, *label*.\n",
    "\n",
    "In theory we could leave the *label* argument empty at this stage. However, this would mean ceding important security benefits. For example, in the highly unlikely scenario where the designated recipient (Bob) successfully colludes with all the proxies (Ursulas) partaking in this sharing narrative, the colluders would only gain access to the data tagged with this policy's single label. This security feature is defined as *weak perfect forward secrecy* (wPFS), wherein all other data belonging to the data controller, accessible via older sharing policies, is protected from these adversaries – provided that said adversaries did not also actively interfere (i.e. collude in the same way) with those older policies. In this way, labels are a compromise for creating a practical, reliable form of forward secrecy in a proxy re-encryption environment.\n",
    "\n",
    "\n",
    "##### Relevance to example real-world application\n",
    "\n",
    "In our hypothetical real-world application, the granting of a policy is likely to be the first instance where the medical patient proactively does something: namely, decides to add a specific doctor to an approved list of those that can see her health records. Note that Alice does not need to encrypt or send the actual data at this moment. In fact, the data doesn't even need to exist yet – for example, not-yet-available results from a medical exam could be shared automatically under this policy (to all the doctors added to it), once they are ready. More on this scenario below, when we introduce *Enrico*.\n",
    "\n",
    "A few more words on the *label* architecture. Besides the security benefits mentioned above, they are a flexible way to choose per-category sharing parameters for specific data. For example, a patient might have various categories of health data, each with its own unique set of permissions – their oncology results might be shared with a small group of specialists for a limited period, while the cardio data from a wearable device is shared with all their doctors, indefinitely. Labels enable the application developer to stratify the data in this way, without having to store all the data with the same parameters in the same location or file path. \n",
    "\n",
    "\n",
    "##### Relevant work/optionality abstracted by the .grant() method\n",
    "\n",
    "When *.grant()* is run, a great deal of background work is triggered – work which lays a metaphorical path through the NuCypher network for the forthcoming sharing of data. \n",
    "\n",
    "Firstly, the number of Ursulas we requested earlier (n), need to be located. Next, an \"arrangement\" is proposed to those Ursulas, which contains relevant parameters to help each Ursula decide if they want to participate. For example, the arrangement object specifies the funds available to pay for the re-encryption service (known as the \"deposit\" – this can be altered to suit the application's economic requirements) as well as the duration of the policy, which we set earlier. \n",
    "\n",
    "Once the required number (n) of Ursulas have accepted the arrangement, an equivalent number of re-encryption key fragments, known as \"KFrags\", are generated. A quick reminder: a *re-encryption key* is a special key unique to NuCypher. It is constructed, safely and locally, using Alice's private key and Bob's public key. Ursulas use re-encryption keys to transform/re-encrypt ciphertexts such that they are decryptable by Bobs – specifically, the Bobs who supplied their public key. So, to get KFrags, the re-encryption key is split into multiple fragments. Each of those stands ready to be sent to the participating Ursulas – one KFrag for each Ursula. \n",
    "\n",
    "Finally, the *.grant()* method also triggers the generation of a \"TreasureMap\" – this returns the locations of all the participating Ursulas, once they have confirmed their involvement. Bob will use the TreasureMap later, when he wants to retrieve the data Alice has shared with him. \n",
    "\n",
    "Now we have a sense of what's happening under the hood, let's go ahead and create a sharing policy: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "policy = ALICE.grant(BOB, \n",
    "                     label,\n",
    "                     m=m, \n",
    "                     n=n,\n",
    "                     expiration=policy_end_datetime)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our final step involving Alice is to save her public key, such that Bob can easily locate and use it, once the designated recipient is ready to request access. \n",
    "\n",
    "*Note: Alice has multiple public keys, that perform different roles in the data sharing process. The public key we are saving in this step is her signing public key. Later, when we want to encrypt the underlying data, we will employ her encrypting public key.*\n",
    "\n",
    "We can quickly get her *signing* public key via the 'stamp' function, then cast it into a convenient, immutable byte sequence for retrieval later."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "alices_pubkey_saved_for_posterity = bytes(ALICE.stamp)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This juncture marks the end of Alice's involvement in the data sharing narrative, and by extension, the required involvement of a real-life data controller – or medical patient. \n",
    "\n",
    "Of course, if the data controller wished to create another policy, they would need to return online in order to choose the recipient and label, potentially tweak other parameters, and execute the grant function. \n",
    "\n",
    "Nevertheless, with NuCypher, a  medical patient would be able to go permanently offline at this point, and still see data shared with their doctor, provided that it fell under the correct label – for example, diagnosis data that is yet to be finalized. This affords our medical application, or any other application leveraging NuCypher, a great deal of flexibility. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Joining a sharing policy with *Bob*\n",
    "As it stands, we have successfully created a sharing policy for a specific Bob (our recipient's device), and this policy is now sitting on the NuCypher network, readily awaiting for Bob to join it. Once Bob joins the policy, the designated recipient (the doctor) will be able to access future data shared on it.  \n",
    "\n",
    "##### Relevance to example real-world application\n",
    "\n",
    "Before getting into the joining details, it's worth acknowledging the various ways a recipient would actually join a policy. In this example, we have chosen a situation where some time passes in between the creation of the policy and the first time it is used. There is also a sense that the recipient has *decided* to join the policy. However, one could very well design an application where the opposite occurs – the specified recipient is automatically, and immediately, added to any policy bearing their public key. Plus, anything in between, including the joining action being contingent on the fulfilment of specified conditions. \n",
    "\n",
    "##### Relevant work/optionality abstracted by the .join_policy() method\n",
    "\n",
    "When Bob joins the policy, a few important things occur in the background. Information relating to the policy, including the data controller and recipient's respective public keys, and the data's *label*, are hashed and saved as a variable ('hrac') for record-keeping and other uses. Separately, Bob also connects to the NuCypher network, and, using the TreasureMap and *hrac*, finds the participating Ursulas. This gets Bob, and therefore the recipient, ready to receieve data once it is sent. \n",
    "\n",
    "To join the policy, Bob needs certain information to hand: \n",
    "\n",
    "1) The *label* – he needs to know what data he's after.\n",
    "\n",
    "2) *Alice's signing key* – he needs the data controller's signature\n",
    "\n",
    "3) *Signature verification* – he needs to check the signature is legitimate and that he's receiving data from the correct source. \n",
    "\n",
    "4) *List of nodes* – he can optionally connect himself to the network at this time, using the same 'bootstrap' method Alice used earlier on in the narrative."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "BOB.join_policy(label, #1 \n",
    "                alices_pubkey_saved_for_posterity,  #2\n",
    "                verify_sig=True, #3\n",
    "                node_list=[(\"localhost\", 3601)] #4 \n",
    "                )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Preparing the bulk data (this section can be skipped)\n",
    "We're going to prepare some data to play the role of the data that would actually be shared in a real-world application - the 'underlying' or 'bulk' data. The main reason we include this section is to give readers a sense of the data format we'll be working with next, but there's nothing here that's unique to NuCypher. \n",
    "\n",
    "In short, we're going to decompose the novel 'Finnegans Wake' into plaintext lumps, and then share these periodically. We'll also print out some metadata to gauge performance. \n",
    "\n",
    "##### Relevance to example real-world application\n",
    "\n",
    "It's worth pointing out that, in a normal application's architecture, the 'data' shared in a narrative like this is unlikely to a human-readable text. Instead, the literary passage we are sharing here is standing in for a more likely candidate – a symmetric key pertaining to some individual bulk data (e.g. photos, videos, chat message(s), collaborative documents, medical data, etc.), hosted in decentralized (IPFS, Storj etc.) or centralized (S3, etc.) storage somewhere, that the recipient can use to decrypt and view it. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "finnegans_wake = open(sys.argv[1], 'rb')\n",
    "\n",
    "start_time = datetime.datetime.now()\n",
    "\n",
    "for counter, plaintext in enumerate(finnegans_wake):\n",
    "    if counter % 20 == 0:\n",
    "        now_time = datetime.datetime.now()\n",
    "        time_delta = now_time - start_time\n",
    "        seconds = time_delta.total_seconds()\n",
    "        print(\"********************************\")\n",
    "        print(\"Performed {} PREs\".format(counter))\n",
    "        print(\"Elapsed: {}\".format(time_delta.total_seconds()))\n",
    "        print(\"PREs per second: {}\".format(counter / seconds))\n",
    "        print(\"********************************\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Encrypting and sharing on Alice's behalf with *Enrico*\n",
    "\n",
    "We're nearly there! We have a sharing policy, a confirmed recipient, and some underlying/bulk data we want to share. However, the bulk data is currently in plaintext format. To securely transport it to the recipient, we need to encrypt it – in such a way that it can remain encrypted until it reaches Bob, and that he alone can decrypt it.\n",
    "\n",
    "To achieve this, we're going to introduce the final character in our narrative: *'Enrico'*. An important reason Enrico exists is so that Alice (& the data controller) are not needed at this stage in the narrative to perform the encryption themselves. However, this may be doing Enrico a disservice, as the character can be harnessed for more – including 'producing' data on the data controller's behalf. \n",
    "\n",
    "##### Relevance to example real-world application\n",
    "\n",
    "In general, Enrico affords great flexibility to applications leveraging NuCypher, because it widens the range of possible entities that can encrypt for Bob. \n",
    "\n",
    "Imagine a medical scenario where the patient is expecting a regular stream of test results, from a third-party blood-testing lab, at some point in the future, and wants those results to be immediately shared with the doctors on an existing sharing policy. It would be undesirable to stay online and continually grant access to each test-result as it arrived. \n",
    "\n",
    "To avoid this, the application can assign a special role to the blood-testing lab, such that the lab gains the encryption powers of Enrico. Similar to the way 'Alice' represents the patient's device, Enrico can do the same for primary producers of data – in this case, the blood-testing lab. Hence, the lab can write the new data onto the sharing policy, under the specified label. Then, the designated doctors can access it. Thus, Enrico is 'producing' data on the patient's behalf. \n",
    "\n",
    "Note: this does not mean that Enrico, or the lab, can access other data on the sharing policy - including their own test results, once they've been saved. Accessing the data would require a *read* permission, which has only been granted to Bob – through the existence of a sharing policy, and re-encryption key, in his name. Rather, Enrico(s) have been solely granted a *write* permission to the policy. \n",
    "\n",
    "In general, these write permissions can be recorded directly onto a distributed ledger, such that the application can reliably confirm the correct actor is producing/encrypting data for a given sharing policy. As we will encounter below, this data also comes with a signature, which helps the recipient further verify its authenticity. \n",
    "\n",
    "\n",
    "Let's first create a Enrico specifically for our sharing policy.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_source = Enrico(policy_pubkey_enc=policy.public_key)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We're going to use Enrico to encrypt the data we wish to share – the passages from Finnegans Wake. We're also going to generate an artifact known as a 'MessageKit' – this contains a ciphertext (the encrypted version of the passages), plus two unique identifiers: the policy's public key, and the recipient's public key. \n",
    "\n",
    "We'll also take this opportunity to generate a signature. The signature is unique to that message, and, when verified, can confirm that the data has not been corrupted or manipulated while in transit, and that the data did indeed come from the expected, correct Enrico. This further mitigates the risk of incorrect data being added to a label from a malignant source that has gotten hold of the relevant public keys. The signature can be sent to Bob via a side-channel, or published publicly as a evidence of the manner in which the data was shared.\n",
    "\n",
    "Let's go ahead and create a tuple for these:  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "message_kit, _signature = data_source.encapsulate_single_message(plaintext)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We want Bob to be able to verify that MessageKit and signature he will soon access came from the right Enrico, so we'll save its public key, in the same way we did Alice's public key earlier: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_source_public_key = bytes(data_source.stamp)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Retrieiving the data with *Bob*\n",
    "To give the designated recipient even greater independence with regard to when and how they access the data, we're going to include a snippet which reconstructs the Enrico from Bob's perspective. This may not be necessary, if the Enrico remains online and available, but this means there's no obligation to keep Enrico around."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "enrico_as_understood_by_bob = Enrico.from_public_keys(\n",
    "        policy_public_key=policy.public_key,\n",
    "        enrico_public_key=data_source_public_key,\n",
    "        label=label\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We're now in the endzone – just one more method to run before our recipient gets their hands on the 'cleartext' – the data that the data controller intended them to see, decrypted and readable. We'll achieve this with *.retrieve()*, which takes three arguments to compute correctly:  \n",
    "\n",
    "1) The MessageKit – the actual data. \n",
    "\n",
    "2) Enrico – where we got the data from.\n",
    "\n",
    "3) Alice's public key – an identifier for the original delegator of the data. \n",
    "\n",
    "##### Relevance to example real-world application\n",
    "\n",
    "In our hypothetical medical application, this is the stage where the doctor accesses the patient's health records. How the doctor does this is flexible – the application could automatically retrieve the data as soon as it is suitably encrypted by Enrico, or the doctor could be required to proactively decide when they want their access to begin – the latter may be useful if it's desirable to notify the patient that the doctor has now begun their analysis/diagnosis, and/or how much time the doctor spends looking at the data, for example. \n",
    "\n",
    "##### Relevant work/optionality abstracted by the .retrieve() method\n",
    "\n",
    "Although the inputs above are easy to grasp, what happens in the background is more complex. \n",
    "\n",
    "The first thing that needs to be checked is that the number of Ursulas who completed a re-encryption is equal to or greater than the minimum we specified, right at the start (m). We set this to be some fraction of n (the total number of Ursulas involved), to protect against redundancy and ensure that, even if some Ursulas fail to re-encrypt the KFrags they'd received, that the data could reach it's destination. We perform this check using the TreasureMap we generated above, which guides us to the participating Ursulas. \n",
    "\n",
    "Now we know that a sufficient number of re-encryptions occurred, it's time to work with the output of that work – \"CFrags\". In the same way a KFrag is a fragment of key, a CFrag is a fragment of ciphertext. When brought together by Bob, these fragments combine into a complete ciphertext, that can be used to access the underlying data. \n",
    "\n",
    "So, the next step is to gather those CFrags. In order to ensure that the number of collectable CFrags is as expected, the figure is checked against something called a WorkOrder, which was generated previously. It's not worth digging into this, other than to say there is an auditable trail that helps prevent aberrant or malicious behavior by Ursulas.\n",
    "\n",
    "Finally, we're going to attach the gathered CFrags to an artifact called a 'capsule'. The capsule's role, for the purposes of this narrative, is to bring together the MessageKit (the data we want to share) and the collection of ciphertext fragments, such that Bob can get to the cleartext, and hence the designated recipient is able to access the data they were granted. In reality, the capsule fulfils more than just this, but we can think of the capsule as a way of simply protecting the underlying data – for example in the scenario where Bob makes a request without the correct CFrags.\n",
    "\n",
    "\n",
    "Now we understand the essential processes taken care of by *.retrieve()*, let's use it to get our hands on the data we set out to share:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "  delivered_cleartext = BOB.retrieve(message_kit=message_kit,\n",
    "                                       data_source=enrico_as_understood_by_bob,\n",
    "                                       alice_verifying_key=alices_pubkey_saved_for_posterity)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We've done it! Let's quickly check it's correct:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "    assert plaintext == delivered_cleartext\n",
    "    print(\"Retrieved: {}\".format(delivered_cleartext))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Congratulations on making it through the NuCypher Beachhead. This notebook is certainly lengthy, but it exists as a initial point of reference for the entire NuCypher Access Management System. Once you have digested the concepts that we explored here, you are in a great position to plan out an NuCypher integration, and eventually, provide your users with secure, flexible and powerful data sharing functionality. \n",
    "\n",
    "Whether reading this notebook was your first dive into NuCypher, or you've been familiar with our codebase for a while, we're very keen to hear your feedback. We're particularly interested in whether this notebook has the right depth, detail, clarity and coherence. You can email me – arjun [at] nucypher [dot] com. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}