Technical Analysis of why Phala will not be affected by the Intel SGX chip vulnerabilities
2022-12-02
Author: Dr. Shunfan (Shelven) Zhou, lead researcher of Phala Network and one of the authors of the Phala whitepaper. He has worked in security research for over 7 years and is the lead author of An Ever-evolving Game: Evaluation of Real-world Attacks and Defenses in Ethereum Ecosystem (USENIX Security Symposium 2020), as well as other papers on program analysis.
Abstract
On Nov. 30th, security expert Andrew Miller pointed out that vulnerabilities in Intel SGX had posed serious security risks to projects such as Secret Network, sparking extensive discussion in the community. Intel SGX, the most widely adopted TEE implementation, is also used by Phala’s off-chain workers. However, Phala uses a novel system design that reduces the attack surface and mitigates the consequences of potential attacks. Our dev team considers the impact of such vulnerabilities on Phala to be controllable.
This article will explain to readers:
- Why ÆPIC Leak and MMIO vulnerabilities undermined Secret Network’s security
- The reasons why Phala uses Secure Enclave (TEE)
- How Phala ensures it won’t be compromised by SGX vulnerabilities
- Future security mechanisms
Summary of Secret Network Vulnerability
1. What caused the vulnerability in Secret Network?
- Hardware with unpatched vulnerabilities (ÆPIC Leak and MMIO vulnerabilities, announced by Intel on Aug. 9, 2022) was allowed to join the Secret Network and operate nodes. The Secret team froze node registration after whitehat researchers reported this problem;
- The same master decryption key in Secret Network is shared across all nodes.
Combined, these two factors made the secrecy of the whole network depend on its least secure node: once any single node is compromised, the master key is leaked, and so is user data.
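To make the “least secure node” argument concrete, the sketch below computes the chance that a key shared by every node leaks, assuming each node is compromised independently with some probability. The per-node probabilities are purely hypothetical; the point is that one weak node dominates the result no matter how well the others are patched.

```python
from math import prod


def leak_probability(node_compromise_probs: list[float]) -> float:
    """Probability that a key shared by *every* node leaks.

    Assumes (hypothetically) that each node is compromised independently
    with the given probability. The key stays secret only if no node falls.
    """
    return 1.0 - prod(1.0 - p for p in node_compromise_probs)


# Hypothetical numbers: 100 well-patched nodes, then one unpatched node joins.
patched = [0.001] * 100
print(round(leak_probability(patched), 3))
print(round(leak_probability(patched + [0.5]), 3))
```

Adding a single node with a known, unpatched vulnerability raises the leak probability of the shared key from under 10% to over 50% in this toy model.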
2. What could attackers achieve?
To quote https://sgx.fail:
These vulnerabilities could be used to extract the consensus seed, a master decryption key for the private transactions on the Secret Network. Exposure of the consensus seed would enable the complete retroactive disclosure of all Secret-4 private transactions since the chain began.
3. Is Phala Network affected by the same vulnerabilities?
No. Phala applies access control to node (called ‘worker’ in Phala) registration and uses a key hierarchy to manage keys, both of which I will cover later.
Resources:
Design of Phala Trustless Cloud
Why Phala needs Secure Enclave (TEE)
Phala is a permissionless compute cloud that allows any computer to join as a worker, so our threat model assumes that no worker is trusted by default. Bad actors operating workers may try to:
- peek at users’ data;
- provide false execution results, or do no computation at all;
- provide low-quality services like reducing CPU performance or blocking network access.
Among these, Quality-of-Service (the third problem) is ensured by our Supply-end Tokenomic. The first two are addressed by the features of Secure Enclave (a.k.a. Trusted Execution Environment, such as Intel SGX) together with our key management mechanism, which ensure the trustlessness of the whole system.
Secure Enclave provides important hardware-based security guarantees, including:
- Confidentiality: all enclave memory is encrypted;
- Execution integrity: no one can corrupt the correctness of execution even if they control the operating system and the physical computer;
- Remote Attestation: users can remotely verify the hardware and the software running inside the Secure Enclave.
To learn more details about SGX, you can read this article.
These features serve as the trust base that lets us “borrow” computing power from others. It is worth noting that, as a compute cloud, Phala’s core values are the correct execution of users’ programs and the privacy of user data. This differentiates Phala from other projects that focus solely on confidentiality.
Can Phala use Zero-Knowledge Proof, Multi-Party Computation, or Fully Homomorphic Encryption as its workers?
The answers are no, yes, and yes, respectively, because these solutions work in different ways.
- In the ZKP case, users perform the computation themselves and only submit an on-chain proof that they really did the work. This is not cloud computing, where you delegate your computation to others;
- MPC splits a job into parts among multiple executors, so no single executor learns the original input or the final output;
- FHE lets executors compute directly on ciphertext, so they never learn the users’ data.
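The MPC idea can be illustrated with additive secret sharing, one of its basic building blocks. In this minimal sketch, two private inputs are each split into random shares held by three executors; each executor adds only the shares it holds, and only the recombined result reveals the sum. The modulus and input values are arbitrary choices for illustration, and the scheme as written supports only addition, which hints at why general-purpose MPC is costly.

```python
import secrets

MODULUS = 2**61 - 1  # public modulus for the shares; an arbitrary large prime


def share(value: int, n_parties: int) -> list[int]:
    """Split value into n random additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares: list[int]) -> int:
    """Recombine shares; any strict subset looks uniformly random."""
    return sum(shares) % MODULUS


# Two private inputs, each split among three executors.
a_shares = share(42, 3)
b_shares = share(58, 3)

# Each executor adds only the two shares it holds; it never sees 42 or 58.
sum_shares = [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]
print(reconstruct(sum_shares))  # 100
```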
Unfortunately, current MPC and FHE solutions are limited in both the computations they can carry out and their performance, so hardware-based solutions remain the most practical choice. We are also exploring support for TEE solutions from other manufacturers such as AMD and ARM. With proper abstraction of the interfaces, Phala could eventually implement MPC- and FHE-based workers as well.
Access Control on Worker Registration
Joining Phala as a worker involves two prerequisites:
- Workers must provide hardware with Secure Enclave support. Currently we only support Intel SGX, but our investigation into AMD SEV has shown that it is also compatible with our current system;
- Workers must run unmodified Phala-built programs, including the Phala node and the off-chain pRuntime (short for Phala Runtime).
Phala follows the “Don’t Trust, Verify” principle and applies Remote Attestation during worker registration. That is, pRuntime is required to generate RA Quotes, which are produced directly by the trusted hardware and certified by the hardware manufacturer (in this case, Intel). Each quote contains important information about the hardware and software:
- Hardware information
  - Whether pRuntime is running inside SGX;
  - The known vulnerabilities of the current hardware and firmware versions. Based on this, the Phala blockchain rejects hardware with blacklisted vulnerabilities and assigns each worker a Confidence Level.
- Software information
  - The hash of the program binary, which helps ensure that pRuntime is unmodified;
  - The initial memory layout of the program, so that its initial state is determined.
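The registration checks described above can be sketched as a single verification function. This is a minimal illustration, not Phala’s implementation: it assumes the quote’s signature has already been verified against Intel’s attestation service, and the `AttestationReport` type, the allowlisted measurement string, and the 1 to 5 confidence scale are all hypothetical. The two blacklisted IDs are Intel’s security advisories for ÆPIC Leak and MMIO Stale Data.

```python
from dataclasses import dataclass, field

# Hypothetical allowlist of approved pRuntime measurements (the hash of the
# binary together with its initial memory layout).
ALLOWED_PRUNTIME_HASHES = {"example-pruntime-measurement"}

# Illustrative blacklist: Intel security advisory IDs for ÆPIC Leak
# (INTEL-SA-00657) and MMIO Stale Data (INTEL-SA-00615).
BLACKLISTED_ADVISORIES = {"INTEL-SA-00657", "INTEL-SA-00615"}


@dataclass
class AttestationReport:
    """Hypothetical, simplified view of a quote whose signature is already verified."""
    inside_sgx: bool
    enclave_hash: str
    advisory_ids: set[str] = field(default_factory=set)


def register_worker(report: AttestationReport) -> int:
    """Return a confidence level (1 = highest) or raise ValueError to reject.

    The 1..5 scale below is a made-up policy, not Phala's actual one.
    """
    if not report.inside_sgx:
        raise ValueError("pRuntime is not running inside SGX")
    if report.enclave_hash not in ALLOWED_PRUNTIME_HASHES:
        raise ValueError("pRuntime binary is not an approved build")
    if report.advisory_ids & BLACKLISTED_ADVISORIES:
        raise ValueError("hardware has blacklisted vulnerabilities")
    # Each remaining (tolerated) advisory lowers confidence by one level.
    return min(1 + len(report.advisory_ids), 5)


print(register_worker(AttestationReport(True, "example-pruntime-measurement")))  # 1
```

Keeping the blacklist as data rather than code means the network can tighten its policy as new advisories appear, without changing the verification logic itself.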
With all this information, we can verify both the trusted hardware and the program running in it. Furthermore, the RA Quotes and confidence level metric enable us to evaluate the security level of each worker and customize our security policy based on what hardware is allowed to join the network.
Additionally, our Supply-end Tokenomic incentivizes workers to provide high-quality service. This is beyond the scope of this article, but you can learn more on our tokenomics page linked above.
Key Hierarchy Management
The world’s first key hierarchy for a blockchain-TEE hybrid system was proposed in the Ekiden paper in 2019 and serves as the basis of the Oasis project. As a compute cloud, Phala improves on this design to make it viable for a network of ~100k nodes. We also introduce novel mechanisms such as key rotation to further improve the robustness of the cloud.
Before we dig into the details of our contract key management, it’s important to know that every entity in our system has its own identity key. Every user has their own account, and every worker and gatekeeper (gatekeepers are elected by the workers) has its own sr25519 WorkerKey pair, which is generated inside pRuntime (and thus inside SGX); the private key never leaves SGX. The identity key is used to:
- Sign an entity’s messages so that they can be attributed to it;
- Establish an encrypted communication channel between users, workers and gatekeepers with ECDH key agreement. By default, any communication between any entities is encrypted in Phala.
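The key-agreement step can be sketched with classic finite-field Diffie-Hellman, which follows the same principle as ECDH: each side combines its own private key with the peer’s public key and both arrive at the same shared secret, which is then hashed into a symmetric channel key. This is a toy stand-in under stated assumptions: Phala uses ECDH on its sr25519 keys, whereas the modulus and generator below are arbitrary illustrative parameters and are not secure.

```python
import hashlib
import secrets

# Toy finite-field Diffie-Hellman parameters: the Mersenne prime 2**127 - 1.
# Illustration only; these are NOT secure parameters, and Phala itself uses
# ECDH on sr25519 keys rather than this construction.
P = 2**127 - 1
G = 3


def keypair() -> tuple[int, int]:
    """Return (private, public) with private drawn from [2, P - 2]."""
    private = secrets.randbelow(P - 3) + 2
    public = pow(G, private, P)
    return private, public


def channel_key(my_private: int, peer_public: int) -> bytes:
    """Derive a 32-byte symmetric key for the encrypted channel."""
    shared_secret = pow(peer_public, my_private, P)
    return hashlib.sha256(shared_secret.to_bytes(16, "big")).digest()


worker_priv, worker_pub = keypair()
gatekeeper_priv, gatekeeper_pub = keypair()

# Only public keys are exchanged; both sides compute the same channel key.
k_worker = channel_key(worker_priv, gatekeeper_pub)
k_gatekeeper = channel_key(gatekeeper_priv, worker_pub)
print(k_worker == k_gatekeeper)  # True
```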
The MasterKey is the root of trust for the whole network. All contract-related keys, including ClusterKeys and ContractKeys, are derived from it. The MasterKey is generated by the gatekeepers and shared among all of them (through the encrypted communication channel mentioned above), so its security depends entirely on the security of the gatekeepers. This is why gatekeepers are distinguished from other workers:
- Gatekeepers are workers of the highest confidence level: they are not affected by any known SGX vulnerability;
- Unlike normal workers, gatekeepers do not expose public endpoints, and you cannot deploy contracts to them. This limits remote access to gatekeepers;
- Gatekeepers are required to stake larger amounts, which discourages their operators from misbehaving.
In Phala, workers are grouped into clusters to provide serverless compute services. A unique ClusterKey is derived from the MasterKey for each cluster. The derivation is one-way: you cannot reverse it to infer the MasterKey from a ClusterKey. The ClusterKey is shared with all the workers in that cluster.
Finally, when a contract is deployed to a cluster, it is deployed to all the workers in that cluster. Each of these workers follows the same deterministic process to derive the ContractKey from the ClusterKey, so they all obtain the same ContractKey. Different contracts have different ContractKeys.
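The MasterKey → ClusterKey → ContractKey hierarchy can be sketched with any one-way keyed derivation. Here HMAC-SHA256 stands in for Phala’s actual derivation function, and the label format and identifiers are hypothetical. The sketch shows the two properties described above: workers holding the same ClusterKey independently arrive at the same ContractKey, and different contracts get different keys.

```python
import hashlib
import hmac


def derive(parent_key: bytes, label: str) -> bytes:
    """One-way child-key derivation: HMAC-SHA256(parent_key, label).

    Knowing a child key does not reveal the parent key.
    """
    return hmac.new(parent_key, label.encode(), hashlib.sha256).digest()


def cluster_key(master_key: bytes, cluster_id: str) -> bytes:
    return derive(master_key, f"cluster:{cluster_id}")


def contract_key(cluster_key_: bytes, contract_id: str) -> bytes:
    return derive(cluster_key_, f"contract:{contract_id}")


master = bytes(32)  # placeholder MasterKey (all zeros), for illustration only
ck = cluster_key(master, "cluster-0")

# Two workers in the same cluster derive the same ContractKey independently...
worker_a = contract_key(ck, "contract-1")
worker_b = contract_key(ck, "contract-1")
print(worker_a == worker_b)  # True

# ...while a different contract gets a different key.
print(worker_a == contract_key(ck, "contract-2"))  # False
```

Because each level is derived deterministically, the ContractKey never has to be transmitted; only the ClusterKey is distributed to the cluster’s workers.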
What are the consequences if certain keys are leaked?
- If a WorkerKey is leaked, attackers can decrypt all the messages sent to that worker, including the ClusterKey of its cluster, which in turn gives access to the ContractKeys of that cluster. Attackers could even impersonate the worker to provide false results to users. Such malicious activity can be detected by comparing the results from multiple workers, after which the chain would slash the compromised worker and confiscate its staked PHA;
- If a ContractKey is leaked, attackers can decrypt the state and all the historical inputs of that contract;
- If a ClusterKey is leaked, attackers can obtain the same information for all the contracts in that cluster;
- If the MasterKey is leaked, then all historical data is leaked.
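The list above amounts to walking down the key hierarchy from whichever key leaks. The sketch below models only the confidentiality impact (not impersonation) for a small, hypothetical deployment; the cluster, contract, and worker names are made up for illustration.

```python
# Hypothetical deployment used for illustration.
CLUSTERS = {
    "cluster-0": ["contract-1", "contract-2"],
    "cluster-1": ["contract-3"],
}
WORKERS = {"worker-a": "cluster-0", "worker-b": "cluster-1"}


def exposed_contracts(kind: str, key_id: str = "") -> set[str]:
    """Contracts whose state and historical inputs become decryptable
    when a key of the given kind leaks."""
    if kind == "contract":
        return {key_id}
    if kind == "cluster":
        return set(CLUSTERS[key_id])
    if kind == "worker":
        # Messages to the worker include its ClusterKey, so its cluster is exposed.
        return set(CLUSTERS[WORKERS[key_id]])
    if kind == "master":
        # Every ClusterKey, and thus every ContractKey, derives from the MasterKey.
        return {c for contracts in CLUSTERS.values() for c in contracts}
    raise ValueError(f"unknown key kind: {kind}")


print(sorted(exposed_contracts("worker", "worker-a")))  # ['contract-1', 'contract-2']
```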
What can we do if the worst case happens?
- Phala has implemented key rotation for gatekeepers: with the permission of the Council, gatekeepers can update the MasterKey and, correspondingly, the ClusterKeys and ContractKeys.
- If the worst case happens, we will first register new gatekeepers running on the latest hardware, then deregister all the old ones (since they are likely to be vulnerable) and switch to a new MasterKey.
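Why rotation helps follows directly from the derivation chain: a fresh MasterKey changes every downstream key. The sketch below reuses an HMAC-SHA256 stand-in for Phala’s actual derivation function, with hypothetical identifiers; it does not model how existing contract state is migrated to the new keys, and data already encrypted under the old keys remains exposed if the old MasterKey has leaked.

```python
import hashlib
import hmac
import secrets


def derive(parent: bytes, label: str) -> bytes:
    """One-way derivation stand-in (HMAC-SHA256), not Phala's actual scheme."""
    return hmac.new(parent, label.encode(), hashlib.sha256).digest()


def derive_all(master: bytes, clusters: dict[str, list[str]]) -> dict[str, bytes]:
    """Derive every ClusterKey and ContractKey from a given MasterKey."""
    keys = {}
    for cluster_id, contracts in clusters.items():
        ck = derive(master, f"cluster:{cluster_id}")
        keys[cluster_id] = ck
        for contract_id in contracts:
            keys[contract_id] = derive(ck, f"contract:{contract_id}")
    return keys


clusters = {"cluster-0": ["contract-1", "contract-2"]}
old_master = secrets.token_bytes(32)
old_keys = derive_all(old_master, clusters)

# Rotation: new gatekeepers generate a fresh MasterKey and re-derive everything.
new_master = secrets.token_bytes(32)
new_keys = derive_all(new_master, clusters)

# An attacker holding the old MasterKey can only reproduce the old keys.
print(all(old_keys[k] != new_keys[k] for k in new_keys))  # True
```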
Future Security Mechanisms
- Use Multi-Party Computation to manage the MasterKey
Currently, the same MasterKey is shared across all gatekeepers, so it leaks if any one of them is compromised. By managing it with MPC instead, attackers would have to compromise a majority of the gatekeepers to access the MasterKey.
- Enable RA Quotes refresh
Since Phat Contract is not yet supported on mainnet, workers only need to submit RA Quotes once, during registration. When Phat Contract is released, we will require workers to refresh their RA Quotes regularly, so that workers that fail to apply patches after new vulnerabilities are reported can be slashed.
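The majority threshold in the MPC plan can be illustrated with Shamir secret sharing, where any k of n shares recover a secret and fewer reveal nothing about it. This sketch only demonstrates that threshold property: it reassembles the key in one place, whereas a full MPC design would let gatekeepers use the MasterKey without ever reconstructing it. The source does not say which scheme Phala will adopt, and the field modulus and 3-of-5 setup are illustrative choices.

```python
import secrets

PRIME = 2**127 - 1  # field modulus; must exceed the secret


def split(secret: int, n: int, k: int) -> list[tuple[int, int]]:
    """Shamir split: any k of the n shares recover the secret."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def poly(x: int) -> int:
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME

    return [(x, poly(x)) for x in range(1, n + 1)]


def recover(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation of the polynomial at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


master_key = secrets.randbelow(PRIME)
n_gatekeepers = 5
majority = n_gatekeepers // 2 + 1  # 3 of 5
shares = split(master_key, n_gatekeepers, majority)

# Any majority of gatekeepers can recover the key.
print(recover(shares[:3]) == master_key)   # True
print(recover(shares[2:5]) == master_key)  # True
```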