Post-Mortem Analysis: Understanding the Delay in Oracle Responses on the Phala Testnet
During the Phala x DevDAO LensAPI Oracle Hackathon, some participants may have noticed an unexpected delay in Oracle responses. We deeply regret any inconvenience this may have caused and, as a gesture of goodwill, we are offering additional bounties as compensation for the incident. We want to be transparent about what led to this issue and to outline the steps we are taking to prevent similar incidents in future releases.
Root Cause: Rate Limiting By the RPC Service Provider
TL;DR: All deployed Oracles were programmed to trigger simultaneously at 10-second intervals. Regardless of whether there were pending requests, each Oracle initiated a call to the RPC endpoint to check the request queue. This led to spikes in request volume, hitting the rate limit imposed by the RPC service provider.
Why Does the LensAPI Oracle Require an RPC Endpoint?
To serve consumer smart contracts on other blockchain networks, an Oracle built on Phala must be able to send "cross-chain" transactions to those contracts. Such transactions typically require a bridge between the two networks. However, because your Oracle is built with Phat Contract, which has native support for network access, the process is a bit different.
Specifically, a Phat-Contract-based Oracle can directly connect to an RPC endpoint, such as one on the Polygon network. It then constructs a transaction, signs it with a dynamically generated account, and sends it to the RPC service, functioning similarly to a traditional client rather than a contract. To simplify this process for our users, we offer a default RPC endpoint, eliminating the need for additional configuration during Oracle creation.
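To make the "client rather than contract" idea concrete, here is a minimal sketch of the last step: wrapping an already-signed transaction in a JSON-RPC request. It assumes the endpoint speaks the standard Ethereum JSON-RPC API, as EVM-compatible networks such as Polygon do. The helper name build_send_raw_tx_request is hypothetical, the transaction bytes are a placeholder, and signing and the HTTP call are left out. This is an illustration, not the LensAPI Oracle's actual code.

```python
import json


def build_send_raw_tx_request(signed_tx_hex: str, request_id: int = 1) -> str:
    """Return the JSON-RPC 2.0 body that submits a signed transaction.

    Hypothetical helper for illustration only; the caller is assumed to
    have already built and signed the transaction.
    """
    if not signed_tx_hex.startswith("0x"):
        raise ValueError("signed transaction must be 0x-prefixed hex")
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_sendRawTransaction",  # standard Ethereum JSON-RPC method
        "params": [signed_tx_hex],
    }
    return json.dumps(payload)


# Placeholder bytes, not a real signed transaction.
body = build_send_raw_tx_request("0xdeadbeef")
```

The resulting string would be sent as the body of an HTTP POST to the configured RPC endpoint, which is exactly the kind of request a provider counts against its rate limit.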
How Is the Oracle Triggered?
The triggering mechanism is the only centralized component of the LensAPI Oracle. Its primary function is straightforward: to periodically invoke the poll() function (see the source code) within your profile, thereby initiating every project you've deployed.
Currently, this triggering process is managed through a cron job on a centralized server. Designed for complete statelessness and anonymity, this mechanism imposes no requirements for on-chain identity. As a result, anyone can run it, and the trigger itself has no influence on Phat Contract behaviors. In the event that the trigger server is taken offline, the trigger functionality can easily be restored by anyone through the execution of a simple script or by running our open-source phat-poller.
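The shape of such a trigger can be sketched in a few lines. The following is a simplified illustration, not the code of phat-poller: run_trigger and its parameters are hypothetical names, and each poll callable stands in for whatever actually invokes poll() on a profile. Note that the loop keeps no state between rounds, which is what makes it easy for anyone to restart.

```python
import time
from typing import Callable, Sequence


def run_trigger(
    poll_fns: Sequence[Callable[[], None]],
    interval_s: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call every poll function once per round and return the success count.

    Hypothetical sketch: the trigger holds no state between rounds, so a
    failed call is simply attempted again on the next round.
    """
    completed = 0
    for _ in range(rounds):
        for poll in poll_fns:
            try:
                poll()
                completed += 1
            except Exception as exc:  # one failing profile must not stop the rest
                print(f"poll failed, will retry next round: {exc}")
        sleep(interval_s)
    return completed


# Demo with a stand-in poll function and a no-op sleep so it returns at once.
hits = []
completed = run_trigger(
    [lambda: hits.append("profile-a")],
    interval_s=10,
    rounds=3,
    sleep=lambda seconds: None,
)
```

In the demo, the sleep function is injected only so the example finishes immediately; a real trigger would use the default time.sleep or a cron schedule.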
How Was the Rate Limit Triggered?
When the rate limit was reached, we had approximately 50 active Oracles deployed on our testnet. These Oracles were configured to trigger simultaneously at 10-second intervals. Regardless of whether there were any unhandled requests, each Oracle was programmed to call the RPC endpoint to check the request queue. Together, these two factors meant that every 10 seconds all of the roughly 50 Oracles hit the RPC endpoint at the same moment, and this burst of simultaneous requests exceeded the provider's rate limit.
How Has This Been Fixed?
Upon identifying the root cause of the issue, we implemented two key updates to mitigate its impact:
- We promptly updated the user interface to enable users to configure their own RPC endpoints.
- We modified the triggering mechanism to stagger Oracle activations, thereby preventing all Oracles from firing requests simultaneously.
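The effect of staggering can be illustrated with a small simulation. This is a sketch under stated assumptions, not the scheme actually deployed: it assumes exactly one RPC request per Oracle per trigger, and it spreads the Oracles evenly across the interval, which is only one possible way to stagger them. The function names are hypothetical.

```python
from collections import Counter


def peak_requests_per_second(offsets_ms, interval_ms, horizon_ms):
    """Return the largest number of requests landing in any one-second bucket.

    Assumes each Oracle sends exactly one request every time it is triggered.
    """
    buckets = Counter()
    for offset in offsets_ms:
        for t in range(offset, horizon_ms, interval_ms):
            buckets[t // 1000] += 1
    return max(buckets.values())


def staggered_offsets(num_oracles, interval_ms):
    """Spread the Oracles evenly across one interval (an assumed scheme)."""
    return [i * interval_ms // num_oracles for i in range(num_oracles)]


NUM_ORACLES = 50      # approximate number of active Oracles during the incident
INTERVAL_MS = 10_000  # 10-second trigger interval
HORIZON_MS = 60_000   # simulate one minute

synchronized_peak = peak_requests_per_second(
    [0] * NUM_ORACLES, INTERVAL_MS, HORIZON_MS
)
staggered_peak = peak_requests_per_second(
    staggered_offsets(NUM_ORACLES, INTERVAL_MS), INTERVAL_MS, HORIZON_MS
)
```

Under these assumptions the average load is the same in both cases (5 requests per second), but the synchronized peak is 50 requests in a single second while the staggered peak is 5. Rate limits generally react to bursts like the former.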
While this incident was triggered by the rate limits imposed by RPC service providers, it highlights a broader question: What are the best practices for utilizing external services that are prone to failure? This question is particularly relevant in the Web3 ecosystem, where existing smart contracts can only call other on-chain contracts.
To address this, we recommend the following best practices:
- Use a DAO to manage the configuration settings of your deployed Oracles. This protects against denial of service when a malicious operator shuts down an external service that your Oracle depends on.
- Consistently verify Oracle configurations. To facilitate this, we are introducing a configuration report generation feature within the Oracle. This will enable users to verify settings without exposing sensitive information, such as tokens contained within your RPC endpoint URI.
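As an illustration of the second practice, the sketch below shows one way a report could expose an RPC endpoint's scheme and host while hiding anything that might carry a token. It is not the upcoming feature itself: redact_rpc_uri, config_report, and the configuration keys are hypothetical, and the rule used here (mask the entire path, query string, and any user info) is a deliberately conservative assumption.

```python
from urllib.parse import urlsplit


def redact_rpc_uri(uri: str) -> str:
    """Keep the scheme, host, and port; mask everything that may hold a secret."""
    parts = urlsplit(uri)
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    # Path segments, query strings, and user info commonly carry API tokens.
    may_hold_secret = bool(
        parts.username or parts.password or parts.path.strip("/") or parts.query
    )
    suffix = "/<redacted>" if may_hold_secret else ""
    return f"{parts.scheme}://{host}{suffix}"


def config_report(config: dict) -> dict:
    """Build a shareable report from a hypothetical Oracle configuration."""
    return {
        "rpc_endpoint": redact_rpc_uri(config["rpc_endpoint"]),
        "poll_interval_s": config["poll_interval_s"],
    }


report = config_report(
    {
        "rpc_endpoint": "https://rpc.example.com/v2/SECRET_TOKEN",  # made-up URI
        "poll_interval_s": 10,
    }
)
```

With the made-up configuration above, the report lists the endpoint as https://rpc.example.com/<redacted>, which is enough to confirm which provider an Oracle points at without revealing the credential.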