Bridging blockchain and the real world - from Oracle to Hybrid Smart Contracts
What is a blockchain oracle?
In our gentle introduction to blockchain, we mentioned a simple smart contract use case called a prediction market. For example, a smart contract for predicting US presidential elections could let people place bets on the blockchain about who they think will win the election. After the result is out, the smart contract directs rewards to those who picked the winner. A problem with this use case is that the smart contract needs to know who actually won the election to determine who got it right. If all the votes were recorded on the same blockchain, that decision would be easy because the contract could count the votes directly. Today our votes are certainly not available on the blockchain, which means the blockchain has to learn this information from an entity in the external world. That "entity" is what we call an "oracle" in blockchain terminology. The oracle is a broad concept - it refers to any entity that passes information between the blockchain and its external world. The external world could be the physical world or other digital environments.
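To make the dependency on the oracle concrete, here is a minimal Python sketch of the settlement step. It is not real smart contract code: the class name `PredictionMarket`, the pro-rata payout rule, and the bettor and candidate names are all assumptions made for illustration. The point is that `settle` cannot run until something outside the contract supplies the winner.

```python
from dataclasses import dataclass, field


@dataclass
class PredictionMarket:
    """Toy stand-in for a prediction-market smart contract (hypothetical)."""
    # bettor -> (picked candidate, stake)
    bets: dict = field(default_factory=dict)

    def place_bet(self, bettor: str, pick: str, stake: int) -> None:
        self.bets[bettor] = (pick, stake)

    def settle(self, winner_from_oracle: str) -> dict:
        """Split the whole pot among correct bettors, pro rata to their stake.

        The winner is an input: the contract has no way to learn it by itself.
        """
        pot = sum(stake for _, stake in self.bets.values())
        winning = {
            bettor: stake
            for bettor, (pick, stake) in self.bets.items()
            if pick == winner_from_oracle
        }
        total_winning = sum(winning.values())
        if total_winning == 0:
            return {}  # nobody picked the winner; a refund policy is out of scope
        return {bettor: pot * stake / total_winning for bettor, stake in winning.items()}


market = PredictionMarket()
market.place_bet("alice", "candidate_a", 10)
market.place_bet("bob", "candidate_b", 30)
market.place_bet("carol", "candidate_a", 20)

# The argument below is exactly the piece of information an oracle must provide.
payouts = market.settle(winner_from_oracle="candidate_a")
print(payouts)  # {'alice': 20.0, 'carol': 40.0}
```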
When do we need an oracle?
Are there blockchain use cases that do not require external information at all, and thus do not need an oracle? The answer is yes, but the number of such cases is very small and their usage is severely limited. The classic Bitcoin payment application could be considered a pure on-the-blockchain ("on-chain") application if the sole purpose is to transfer some Bitcoin from party A to party B. However, in many cases, the exchange of Bitcoin between two parties is conditioned on an exchange of off-chain assets, i.e., the goods or services the Bitcoin payment is for, which then requires off-chain information.
In fact, blockchain use cases in any real economy sector would require some sort of oracle because they all deal with real-world assets that are inherently off-the-blockchain ("off-chain"). An example could be a flood insurance smart contract that pays farmers if a certain level of flooding occurs in the region. The flooding information has to be provided by an oracle because the blockchain itself does not know about the weather. Another example is a logistics use case, where an oracle may be used to provide information on where a shipment has arrived, or whether the temperature during drug transportation has stayed within its quality assurance threshold. Oracles are also needed in many blockchain finance use cases. For instance, decentralized crypto exchanges or lending and borrowing marketplaces need to consolidate the prices from many off-chain centralized crypto exchanges to obtain the fair market price of cryptocurrencies.
What requirements does an oracle need to satisfy?
Since an oracle works directly with the blockchain and supplies inputs to it, the most important oracle requirement is to ensure the integrity and security of these inputs. However powerful a blockchain is, if it receives fake inputs or if the original inputs were tampered with before reaching the blockchain, the outputs become useless or even harmful - the classic garbage-in, garbage-out situation. Due to the blockchain's immutability property, once fake information is on-chain, it pollutes the blockchain and persists there forever. Even if the information is later corrected, the earlier data trace cannot be erased.
The integrity of the external data needs to be protected both by the data source and the oracle. If the data coming out of the source is already wrong, there is no chance the oracle can function as intended. If the oracle itself manipulates the data inappropriately, the inputs to the blockchain will also suffer.
How can an oracle obtain authentic data?
An oracle first needs to identify authoritative sources from which to retrieve data. For example, if we want to obtain labor-related data, the US Bureau of Labor Statistics could be a high-quality data source.
Once we have authoritative data sources, the oracle also needs to make sure the information is not tampered with during the retrieval process. The data source could provide its digital signature that allows the oracle to verify the sender of the data. But this requires updating the software at the data source. Most of today's authoritative data sources do not support this functionality when they provide data access.
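The verification step can be sketched in Python as follows. One important substitution: to stay within the standard library, this sketch uses an HMAC tag computed with a shared key instead of a true digital signature. A real origin proof would use an asymmetric signature, so that the oracle could verify the data but not forge it. The key, the function names, and the sample reading are all made up for illustration.

```python
import hashlib
import hmac
import json

# Assumption: a key shared by the data source and the oracle. With a real
# digital signature, the source would keep a private key and the oracle
# would only hold the matching public key.
SHARED_KEY = b"demo-key-not-for-production"


def sign_payload(payload: dict, key: bytes) -> str:
    """Data-source side: compute an authentication tag over a canonical encoding."""
    message = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_payload(payload: dict, tag: str, key: bytes) -> bool:
    """Oracle side: recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_payload(payload, key), tag)


# A made-up reading from a hypothetical weather station.
reading = {"station": "demo-station", "temperature_c": 21.5}
tag = sign_payload(reading, SHARED_KEY)

# The same reading, altered in transit.
tampered = dict(reading, temperature_c=35.0)

print(verify_payload(reading, tag, SHARED_KEY))   # True
print(verify_payload(tampered, tag, SHARED_KEY))  # False
```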
If the data source cannot sign its data, there are still other ways we can improve the reliability of the data an oracle obtains. For instance, much of the public data in the real world is available from multiple different sources. We can obtain weather data from numerous weather stations; we can get the presidential election results from US government websites, or from media like The New York Times, CNN, and Fox News. In these cases, the oracle can simultaneously retrieve the same target information from multiple data sources. Then it can combine this information to produce a result. What if the information from different sources is not consistent? The oracle will then have to reconcile those data based on certain criteria. If it is temperature data and all sources have similar reputations, one option might be to take the average of the values from the different sources. If it is about presidential election results, then the level of authority becomes more important in the decision process. Using multiple data sources also removes the single point of failure, a problem that occurs when the only data source goes offline or becomes corrupted.
How can we ensure that an oracle passes authentic data to the blockchain?
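The two reconciliation rules mentioned above can be sketched in a few lines of Python. The function names, source names, and authority weights are hypothetical, and the weighted vote is only one possible reading of "level of authority": it sums a weight per answer and picks the heaviest.

```python
from statistics import mean


def average_reading(readings: dict) -> float:
    """Numeric data from equally reputable sources: take the plain average."""
    return mean(readings.values())


def most_authoritative(reports: dict, authority: dict) -> str:
    """Categorical data: total the authority weights per answer, pick the heaviest."""
    totals = {}
    for source, answer in reports.items():
        # Sources without a known weight count for nothing.
        totals[answer] = totals.get(answer, 0) + authority.get(source, 0)
    return max(totals, key=totals.get)


# Made-up temperature readings from three hypothetical stations.
temperatures = {"station_a": 20.0, "station_b": 22.0, "station_c": 21.0}
print(average_reading(temperatures))  # 21.0

# Made-up, conflicting election reports with assumed authority weights.
reports = {
    "official_site": "candidate_a",
    "outlet_1": "candidate_b",
    "outlet_2": "candidate_b",
}
authority = {"official_site": 5, "outlet_1": 2, "outlet_2": 2}
print(most_authoritative(reports, authority))  # candidate_a
```

Note that with these weights the single official source outweighs two media outlets that agree with each other; a plain majority vote would have chosen differently.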
Assuming an oracle node can obtain authentic information from the data source, we still need to make sure this information is not tampered with at the oracle before it reaches the blockchain.
One natural method is to incentivize honest oracles and penalize the bad ones. To do that, we need to have oracle accountability, e.g., by requiring the oracle to sign the data it provides. The signature of the oracle makes the data supply process non-repudiable, i.e., the oracle cannot deny that it is the sender of the signed data. Combining the non-repudiation property of the digital signature with the immutability of records on the blockchain, we can ensure that oracles providing incorrect information are identifiable. This knowledge could be fed into a reputation system where the quality of the oracle nodes can be ranked. That ranking may further be used as the basis for the reward and penalty system.
If we use only one oracle node for a blockchain application, it always carries a single-point-of-failure risk. A way to further improve the oracle's reliability is to borrow the decentralization concept from blockchain. Instead of a single oracle node, we can adopt an architecture with many redundant oracle nodes. These nodes form a decentralized oracle network. Each of them aggregates information from various external data sources. Eventually, the aggregated values from all these oracle nodes are consolidated to produce the final data relayed to the blockchain. It should be noted that the decentralized oracle network itself is not a blockchain network. The key difference between the two is that a blockchain network needs to reach a consensus about internal transactions among all the blockchain nodes, while an oracle network deals with external data from different sources, so a consensus in that sense is not feasible. For instance, if several oracle nodes retrieved different temperature readings for New York, there is no consensus to be reached about the actual temperature in New York. Instead, the oracle network might use the average of all these values as the most likely actual temperature to supply to the blockchain.
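A reputation system of this kind could take many forms. The Python sketch below shows one very simple possibility: each node's score is its share of reports that fell within a tolerance of the value later accepted as correct. The class, the scoring rule, the neutral starting score of 0.5, and the sample numbers are all assumptions, not a description of any particular oracle network.

```python
from dataclasses import dataclass


@dataclass
class OracleRecord:
    """Running tally of one oracle node's signed, attributable reports."""
    node_id: str
    correct: int = 0
    incorrect: int = 0

    @property
    def reputation(self) -> float:
        """Share of correct reports; 0.5 for a node with no history (arbitrary prior)."""
        total = self.correct + self.incorrect
        return self.correct / total if total else 0.5


def record_report(record: OracleRecord, reported: float, accepted: float, tolerance: float) -> None:
    """Count a report as correct if it is within `tolerance` of the accepted value."""
    if abs(reported - accepted) <= tolerance:
        record.correct += 1
    else:
        record.incorrect += 1


def rank(records: list) -> list:
    """Order nodes from most to least reputable."""
    return sorted(records, key=lambda r: r.reputation, reverse=True)


node_a = OracleRecord("node_a")
node_b = OracleRecord("node_b")

# Two rounds of made-up temperature reports; the accepted value is 21.0 both times.
record_report(node_a, reported=21.0, accepted=21.0, tolerance=0.5)
record_report(node_a, reported=21.2, accepted=21.0, tolerance=0.5)
record_report(node_b, reported=21.1, accepted=21.0, tolerance=0.5)
record_report(node_b, reported=35.0, accepted=21.0, tolerance=0.5)

print([(r.node_id, r.reputation) for r in rank([node_b, node_a])])
# [('node_a', 1.0), ('node_b', 0.5)]
```

The resulting ranking is what a reward and penalty scheme would consume, for example by paying nodes in proportion to their score.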
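The two tiers of aggregation described above can be sketched as follows, using the plain average at both levels as in the New York temperature example. The function names and readings are made up. A real network might prefer a median or another outlier-resistant rule, since an average can be shifted by a single extreme value.

```python
from statistics import mean


def node_value(source_readings: list) -> float:
    """Tier 1: one oracle node aggregates what it read from its data sources."""
    return mean(source_readings)


def network_value(readings_per_node: dict) -> float:
    """Tier 2: the network consolidates the per-node values into one answer."""
    return mean([node_value(readings) for readings in readings_per_node.values()])


# Made-up temperature readings: each node queried two hypothetical sources.
readings_per_node = {
    "node_1": [20.0, 22.0],
    "node_2": [21.0, 23.0],
    "node_3": [19.0, 21.0],
}

# Per-node values are 21.0, 22.0 and 20.0; the network relays their average.
print(network_value(readings_per_node))  # 21.0
```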
Case study: Chainlink Decentralized Oracle Network
Chainlink is a leading oracle provider in the industry. Figure 1 shows an example of how a Chainlink oracle works:
The figure lists three main components: "World's Data Sources", "Decentralized Oracle Network", and "Decentralized Computation", along with their key considerations.
- The data sources used need to be of high quality. Some of these data sources might support origin proofs such as signing the data they produce.
- The decentralized oracle network serves as an abstraction middle layer between the data sources and the decentralized computation component. The blockchain sits in the decentralized computation component. Because the decentralized oracle network handles data delivery and validation, the data sources do not need to know anything about blockchain, and they may talk to the oracle nodes using whatever methods they already have, typically through Application Programming Interfaces (APIs).
- In other words, the oracle node consists of two logical sides. One side interacts with the data source API, e.g., a weather station API for weather-related data. The other side handles all the complexities of the blockchain, including possibly supporting multiple different blockchains, and relays the information from the external sources to each blockchain.
- The crypto-economic guarantees in the oracle network are the incentive mechanism to ensure desired oracle behavior by leveraging crypto tokens and reputation systems.
- In addition, the oracle network can provide computations to enhance the data privacy of blockchain use cases.
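The two logical sides of an oracle node described in the list above can be sketched in Python. This is not Chainlink's actual node software or API: `ChainAdapter`, `InMemoryChain`, `OracleNode`, and `fake_weather_api` are hypothetical names, and the "blockchains" are plain in-memory lists.

```python
from typing import Callable, Protocol


class ChainAdapter(Protocol):
    """Chain-facing side: anything that can accept a keyed value."""

    def submit(self, key: str, value: float) -> None: ...


class InMemoryChain:
    """Hypothetical stand-in for a blockchain client; it just appends records."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records = []

    def submit(self, key: str, value: float) -> None:
        self.records.append((key, value))


class OracleNode:
    def __init__(self, fetch: Callable[[], float], chains: list) -> None:
        self.fetch = fetch    # source-facing side: wraps an existing API call
        self.chains = chains  # chain-facing side: one adapter per blockchain

    def relay(self, key: str) -> float:
        """Read once from the data source, then deliver to every blockchain."""
        value = self.fetch()
        for chain in self.chains:
            chain.submit(key, value)
        return value


def fake_weather_api() -> float:
    """Stand-in for a weather station API; returns a made-up temperature."""
    return 21.5


chain_a, chain_b = InMemoryChain("chain_a"), InMemoryChain("chain_b")
node = OracleNode(fake_weather_api, [chain_a, chain_b])
node.relay("nyc_temperature_c")
print(chain_a.records)  # [('nyc_temperature_c', 21.5)]
```

The data source function knows nothing about blockchains, and supporting another blockchain only means adding another adapter to the list.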
Bi-directional oracles: from blockchain input to blockchain output
Since an oracle interfaces with external data sources on one side and the blockchain on the other side, it does not have to relay information only in one direction. If it can bring data from the external world to the blockchain, nothing prevents it from also taking data from the blockchain to the external world. Let us again consider the flood insurance smart contract mentioned above. The oracle learns from external weather sources about a flood happening and relays that information to the blockchain. The smart contract determines that a payment needs to be made to an insured farmer. However, very likely the farmer can only receive payments through the traditional banking system, i.e., an off-chain payment is required. What the smart contract can do is send a payment instruction through the oracle network back to a traditional bank to execute the payment. This way, the oracle becomes bi-directional. It connects external data sources to the blockchain and also connects the blockchain to external actuators.
The figure below shows an example of Chainlink's Decentralized Oracle Network passing information between the data providers and the blockchain (inside the Decentralized Computation component) in both directions.
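The round trip in the flood insurance example can be sketched in Python. Everything here is a simplification with made-up names and numbers: a real contract would typically emit an event that the oracle watches rather than return a value, and the "bank" is just a list that collects payment instructions.

```python
from dataclasses import dataclass, field


@dataclass
class FloodInsuranceContract:
    """Toy stand-in for the on-chain contract (hypothetical)."""
    threshold_cm: float
    payout: int
    policies: dict = field(default_factory=dict)  # farmer -> bank account
    paid: set = field(default_factory=set)

    def on_flood_report(self, water_level_cm: float) -> list:
        """Inbound data arrives here; outbound payment instructions are returned."""
        if water_level_cm < self.threshold_cm:
            return []
        instructions = []
        for farmer, account in self.policies.items():
            if farmer not in self.paid:  # pay each insured farmer at most once
                self.paid.add(farmer)
                instructions.append({"to_account": account, "amount": self.payout})
        return instructions


class BidirectionalOracle:
    def __init__(self, contract: FloodInsuranceContract, bank_outbox: list) -> None:
        self.contract = contract
        self.bank_outbox = bank_outbox  # stand-in for the traditional bank's API

    def report_water_level(self, water_level_cm: float) -> None:
        # Direction 1: external world -> blockchain.
        instructions = self.contract.on_flood_report(water_level_cm)
        # Direction 2: blockchain -> external actuator.
        self.bank_outbox.extend(instructions)


bank_outbox = []
contract = FloodInsuranceContract(
    threshold_cm=100.0, payout=500, policies={"farmer_1": "ACCT-001"}
)
oracle = BidirectionalOracle(contract, bank_outbox)

oracle.report_water_level(40.0)   # below threshold: nothing happens
oracle.report_water_level(120.0)  # flood: one payment instruction goes out
oracle.report_water_level(130.0)  # already paid: no duplicate instruction
print(bank_outbox)  # [{'to_account': 'ACCT-001', 'amount': 500}]
```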
Hybrid Smart Contracts
So far we have only considered oracle networks as information relays - although they may need to do some calculations such as data aggregation, they primarily pass information directly between the blockchain and the external world. But the oracle nodes are just computers, and they can do much more than that. A hybrid smart contract, introduced by Chainlink in its white paper, leverages the computational capability of the oracle network to support the business use cases that a blockchain-based smart contract is trying to achieve.
Why is a hybrid smart contract a valuable idea? A normal smart contract only performs computations on the blockchain. These on-chain computations are generally slow and costly due to the blockchain's inherent characteristics. If some of the computations can be moved off-chain without affecting the core function of the smart contract, that could greatly improve the speed and efficiency of the application. Using a hybrid smart contract also makes it possible to expand the scope of computations available to the application to those that otherwise may not be possible on-chain, thus extending the use case itself. Since oracle nodes already sit between the blockchain and the external data sources as well as external actuators, they are naturally a good place to offload these computations.
The development of hybrid smart contracts is still at a very early stage. Figure 3 illustrates the conceptual architecture of a hybrid smart contract by Chainlink. In this figure, "SC" stands for smart contract, and "exec" represents the computational part of the Decentralized Oracle Network (DON). "SC" and "exec" combine to form the hybrid smart contract, while the DON bridges the blockchain and external services.
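The split between "SC" and "exec" can be illustrated with a small Python sketch. It does not follow Chainlink's actual design: the names, the rolling-average computation, and the threshold rule are assumptions chosen only to show heavier work happening off-chain while the on-chain part stays minimal.

```python
from statistics import mean


class OnChainContract:
    """The 'SC' part: keeps only a cheap state update and the core decision rule."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.triggered = False

    def submit_result(self, value: float) -> None:
        # A single comparison is all that has to run on-chain.
        self.triggered = value >= self.threshold


def off_chain_exec(raw_readings: list, window: int = 3) -> float:
    """The 'exec' part: heavier computation done on oracle nodes.

    Here: the peak of a rolling average over the raw readings.
    """
    if len(raw_readings) < window:
        raise ValueError("need at least `window` readings")
    averages = [
        mean(raw_readings[i:i + window])
        for i in range(len(raw_readings) - window + 1)
    ]
    return max(averages)


# Made-up raw readings collected off-chain.
readings = [10.0, 20.0, 30.0, 40.0, 20.0]

contract = OnChainContract(threshold=25.0)
contract.submit_result(off_chain_exec(readings))  # only the result goes on-chain
print(off_chain_exec(readings), contract.triggered)  # 30.0 True
```

Only one number crosses from "exec" to "SC", so the amount of on-chain work does not grow with the volume of raw data processed off-chain.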
An oracle shortcut
Using a decentralized oracle network and aggregating information from multiple sources is a highly reliable architecture to bridge the external world with the blockchain in most cases. But there are also scenarios where such an arrangement may not be applicable. Let us consider a use case where the involved data is private and there is only a single data source, e.g., using blockchain to record and control access to patients' medical data. The only data source for the medical data may be the patient's doctor, so we do not have redundancy at the data source level. This creates a single point of failure at the data source and implies that proof-of-data-origin mechanisms, such as a digital signature from the doctor, become critical to ensure authentic data delivery.
While redundancy at the oracle node level with a decentralized oracle network is still possible even with a single data source, its usefulness is limited. If the data source itself is authentic, and if at least some of the oracle nodes are honest, there is a higher chance the authentic data can be delivered through the decentralized oracle network. But if the data source itself is corrupted, then adding multiple oracle nodes cannot correct the data either.
In these scenarios, we can use a simplified interface between the data source and the blockchain. We can give the doctor an application that directly interfaces with the blockchain. When the doctor inputs a record through this application, that record will be automatically uploaded to the blockchain. This way, the application software essentially contains the functions of an oracle. The computer running this application thus becomes the only oracle node in place of the entire decentralized oracle network described earlier. The disadvantage of this oracle shortcut is the need to embed specific oracle functions into the specific application. The advantage is that this data delivery method is relatively simple and efficient. So it can still be very useful in appropriate situations, especially when we need to bring information from a single data source onto the blockchain.
Note that in an actual implementation of such a medical records use case, the blockchain will most likely store only the digest rather than the content of the medical record because storage is expensive on the blockchain. Also, privacy mechanisms need to be employed to protect sensitive data from being exposed.
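The digest-only approach can be sketched with Python's standard `hashlib`. The ledger class, function names, record ID, and record fields below are hypothetical, the in-memory list stands in for the blockchain, and the privacy mechanisms mentioned above are out of scope for this sketch.

```python
import hashlib
import json


class InMemoryLedger:
    """Hypothetical stand-in for the blockchain: an append-only list of entries."""

    def __init__(self) -> None:
        self.entries = []

    def append(self, entry: dict) -> None:
        self.entries.append(entry)


def record_digest(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding, so equal records give equal digests."""
    encoded = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def submit_record(ledger: InMemoryLedger, record_id: str, record: dict) -> str:
    """The doctor's application: only the digest leaves the doctor's computer."""
    digest = record_digest(record)
    ledger.append({"record_id": record_id, "sha256": digest})
    return digest


def matches_ledger(ledger: InMemoryLedger, record_id: str, record: dict) -> bool:
    """Anyone holding the full record can check it against the on-chain digest."""
    digest = record_digest(record)
    return any(
        e["record_id"] == record_id and e["sha256"] == digest for e in ledger.entries
    )


ledger = InMemoryLedger()
record = {"patient": "patient-001", "note": "made-up example entry"}
submit_record(ledger, "rec-1", record)

altered = dict(record, note="edited after the fact")
print(matches_ledger(ledger, "rec-1", record))   # True
print(matches_ledger(ledger, "rec-1", altered))  # False
```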
Conclusions
Blockchain by itself is an isolated digital space. Whether in real economy sectors or finance, the majority of impactful blockchain use cases will involve interacting with information from the external world - including both the physical world and other digital spaces. An oracle is the entity that bridges the blockchain with its external environment in both directions, without sacrificing the inherent blockchain characteristics. The key requirement for an oracle is to maintain the integrity and security of the information that flows through it. To achieve that, the oracle may retrieve and validate the information from multiple reputable data sources whenever possible. A decentralized oracle network of multiple nodes, plus incentive mechanisms encouraging honest behavior, further improves the quality of the oracle.
The decentralized oracle network can also be designed to support computations useful for the blockchain use case, thus introducing both off-chain and on-chain computation to form a hybrid smart contract. The concept and development of hybrid smart contracts are still evolving.
While a decentralized oracle network with multiple data sources is a robust oracle architecture, scenarios such as those with a single data source could adopt a minimized oracle approach by incorporating the oracle function directly into the application interfacing with the data source.
Note: this article is part of my Introduction to Blockchain, Crypto, Metaverse and Web3: Beyond the Hype. You may find the rest of the articles in the series here.