Blockchain applications are moving at lightning speed, but the blockchain databases they run on are very slow. We have a backlog of blockchain applications that are waiting for infrastructure that can handle much higher speed, transaction volume, and storage volume. The current dollar payment system needs to handle about 50,000 transactions per second (TPS), whereas Bitcoin in a good case will handle 100 TPS. If we start tracking micropayments for API calls, a pressing need for many cloud and blockchain businesses, we will want capacities closer to 1M TPS. Proposals to build games and virtual worlds will need this scale, plus much bigger storage capacity.
I am proposing to build a cloud database that can satisfy many blockchain use cases, with much higher volume, and greater simplicity for API users. With “Trust but Verify”, we can implement blockchain applications at the speed and scale of the world’s biggest databases. Please post your comments and recommendations.
Blockchain applications have specific requirements. Why do we use blockchains instead of normal databases? It’s because blockchain apps want specific things like visibility and trustworthiness that they might not get from a centralized database. The real requirement is not for a blockchain mechanism. It’s for a database that delivers what I have described below as “synchronicity”, “immutability”, “verifiability”, “availability”, “invulnerability” and “inclusiveness”. As long as we deliver on those requirements, we can support blockchain apps with new and scalable database architectures.
Objection, not consent. Distributed blockchains require a large number of nodes to come to consensus. We can get similar results by allowing agents to “object” to transactions. This converts the blockchain concept from a system where a large number of participants must reach consensus every time, to a system where a small number of participants object, infrequently. It will provide a similar level of visibility and participation, with much higher efficiency.
“Trust but verify.” The visibility and trustworthiness of a widely distributed blockchain can be replicated in a cloud system with three storage providers, if we add agents to check the input and output of those providers.
Scalable verification. Current blockchains apply the same amount of verification to every transaction, even though some transactions are worth 10 cents and some are worth $1M. This is wasteful and it prevents us from doing small transactions. The fixed miners fee is an obstacle. We should be able to start small and pay agents for more verification for bigger transactions.
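As a rough illustration of scaling verification with transaction value, here is a minimal Python sketch. The tier thresholds, the agent counts, and the name `agents_for_value` are hypothetical and chosen only for illustration; the proposal does not specify them.

```python
# Hypothetical tiers: (minimum transaction value in dollars, agents to hire).
# These numbers are illustrative assumptions, not part of the proposal.
VERIFICATION_TIERS = [
    (0.0, 0),            # tiny payments: rely on the storage providers alone
    (10.0, 1),           # modest payments: one agent checks the transaction
    (10_000.0, 3),       # larger payments: three independent agents
    (1_000_000.0, 10),   # very large payments: ten independent agents
]


def agents_for_value(value_usd: float) -> int:
    """Return how many verifying agents to pay for, given the transaction value."""
    if value_usd < 0:
        raise ValueError("transaction value must be non-negative")
    agents = 0
    for minimum, count in VERIFICATION_TIERS:
        if value_usd >= minimum:
            agents = count  # keep the highest tier whose minimum is met
    return agents
```

With these assumed tiers, a 10 cent payment hires no agents and a $1M payment hires ten, so small transactions are not burdened with a fixed verification cost.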
The problem with a widely distributed blockchain database is that it is slow. Bitcoin can handle 10 transactions per second, globally, and might eventually be pushed to 100 TPS. Ethereum can currently handle about 20. A more hierarchical cloud system such as Visa runs up to 50,000 transactions per second, and the equity trade consolidated tape runs up to 1M transactions per second. Both are at least 1,000 times faster than today’s blockchains. There is a tradeoff between speed and the number of nodes in the consensus process. Bringing 7,000 nodes to consensus and updating them synchronously to include the next block takes a lot longer than bringing three or ten nodes to consensus and then updating them.
In addition to being slow, systems with large consensus networks can only handle small amounts of data, because all of the nodes must receive and store all of the data. So, data beyond small transactions is pushed off into other storage systems, increasing complexity and reducing reliability — reducing the probability that data can be retrieved.
Finally, these systems are more complicated to set up and use when compared with commercial cloud systems, which expose only simple APIs.
“Private” blockchain clusters run faster, because they use a very small number of nodes. However, private blockchain clusters are more complicated than ordinary cluster databases, with limited advantages. These private clusters can simplify existing workflow for existing groups, but it is not clear that they can create the new value that comes from public ecosystems.
In contrast, cloud databases are much faster than distributed blockchains (100,000 TPS versus 100 TPS). They are also simpler to use because they expose only simple APIs to the outside. The complexity is on the inside. On the inside, they scale both horizontally (to more servers accepting data) and vertically (to more layers of specialized servers). They can also store much more data by using layers of storage ranging from fast active storage, to very large pools of storage for older and less frequently used data. With the proper architecture, we can use the intensive engineering investment of cloud providers to create entire virtual worlds.
A blockchain is a distributed database which appends new transactions in a specific way. Some number of network nodes (running open source software) come to consensus on the contents of the next block of transactions, hash and sign it to make sure it doesn’t change, and distribute it to the nodes. This process is designed to yield the following database attributes.
Synchronicity. If two users make the same query, they should get the same result. It is fundamental to most blockchain use cases that they simplify a workflow by creating one shared database or ledger, rather than multiple databases or ledgers that must be reconciled.
Immutability. Those users should get the same results any time in the future.
Verifiability. A user should be able to check the completeness or accuracy of the record, usually by checking a hash.
Availability. Users should have ways to guarantee access to the data. In distributed blockchains, they can achieve this by keeping a complete blockchain.
Invulnerability, in the sense that no one participant can add inaccurate information, or disrupt the operation of the blockchain as a whole. This effect is often described as “trustlessness” (because we do not have to trust any one participant), or “trustworthiness” (because we can trust the system as a whole to be reliable).
Inclusiveness. Any user can add or read transactions, by paying modest fees or making modest contributions. This is an optional feature which private blockchains do not offer as they limit access to improve speed and security. However, inclusiveness is important for most of the truly new use cases, where blockchains organize ecosystems.
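To make immutability and verifiability concrete, here is a minimal Python sketch of a hash-linked, append-only log. It is an illustration under assumptions, not a description of any particular blockchain: the record layout and the names `GENESIS`, `entry_hash`, `append`, and `verify` are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash a payload together with the hash of the entry before it."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + body).hexdigest()


def append(log: list, payload: dict) -> None:
    """Add a new entry whose hash covers the whole history so far."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "hash": entry_hash(prev, payload)})


def verify(log: list) -> bool:
    """Recompute every hash; any altered or missing entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

Because each hash depends on everything before it, a reader who holds only the latest hash can detect a change anywhere in the earlier record.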
There is a backlog of blockchain applications which cannot reach commercial scale because of the limitations of current blockchain databases. I will take examples from the health care industry, which is often mentioned as a user of blockchains.
API Accounting: This is the big one, the application that is launching a thousand micropayment schemes. In the cloud and blockchain era, we have millions of interacting apps that make trillions of API calls every day. Someday, our smart healthcare system will follow us around this way and keep us healthy. Banking, software, advertising … any industry that touches the Internet is already doing this. These API services are valuable and continuously upgraded, and the vendors deserve to be paid a little bit for each use. However, API services are not nearly as accessible as they should be, because every one of those millions of apps needs to maintain accounts with every one of its API vendors. Innovation and growth in this economy would run faster if we had a blockchainy way to keep track of micropayments for each API call, one that works with any customer and any vendor. The volume of API calls is large, and if we are paying for API calls with micropayments, we need a high capacity database (or sidechain) to log them. In the health care case, we are trying to create a demand-driven ecosystem in which people pay for health advice and research, so we would be tracking payments for the use of data. The micropayments system does not need to do fancy accounting, because each charge is small, and we can afford to take the risk of settling them later. However, it must be high volume, and it would be a lot easier for users if it collapsed a million accounting systems into one shared service. This would also help reduce the oligopoly power of Amazon, Microsoft, and Google, which have a competitive advantage in selling API services because they can bundle up the payments. We can build a resource that will set the cloud economy free.
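The "log now, settle later" idea could look something like the following Python sketch. The class `MicropaymentLog` and its methods are hypothetical names invented for this example; a real service would also sign and persist each charge.

```python
from collections import defaultdict
from decimal import Decimal


class MicropaymentLog:
    """Accumulates tiny per-call charges and settles them in bulk later."""

    def __init__(self):
        # (customer, vendor) -> total amount owed since the last settlement
        self._owed = defaultdict(Decimal)

    def record_call(self, customer: str, vendor: str, price: str) -> None:
        """Log one API call. Prices are strings to avoid float rounding."""
        self._owed[(customer, vendor)] += Decimal(price)

    def settle(self) -> dict:
        """Return the accumulated balances and start a new period."""
        balances = dict(self._owed)
        self._owed.clear()
        return balances
```

One shared log like this stands in for the separate account that each app would otherwise keep with each vendor; only the periodic settlement needs to touch a payment system.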
Messaging: In my proposal for blockchain-style messaging, I described how a single shared database would be useful for keeping track of all of the (encrypted) messages surrounding a securities trade or money transfer (replacing, for example, SWIFT). I also noted that we would need a cloud architecture to handle the high volumes of messages. This same architecture applies to the message stream surrounding a health care case, detailing the transactions, data, questions, and answers attaching to a person who is actively receiving health care.
Logging: A lot of proposed blockchain use cases rely on logging. For example, a “logistics” blockchain logs records showing where products are, and who has accepted them or paid for them. A similar blockchain might be useful for logging the location of new health data records, with pointers to the full data.
Authorization: An authorization that says “person X is allowed to get data Y” is a type of signed record that we can add to the log of data locations. With this, we can share data with all of our apps and AI advisors and human advisors.
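A signed authorization record might be sketched as follows in Python. To stay within the standard library, this example uses an HMAC with a shared secret as a stand-in for a signature; that is an assumption of the sketch, and a real system would more likely use public-key signatures so that anyone can check a record. The field names and the functions `sign_authorization` and `is_authorized` are hypothetical.

```python
import hashlib
import hmac
import json


def _message(record: dict) -> bytes:
    """Canonical byte form of a record, used for signing and checking."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def sign_authorization(secret: bytes, grantee: str, data_id: str) -> dict:
    """Build a record saying 'grantee is allowed to get data_id' and sign it."""
    record = {"type": "authorization", "grantee": grantee, "data": data_id}
    signature = hmac.new(secret, _message(record), hashlib.sha256).hexdigest()
    return {**record, "signature": signature}


def is_authorized(secret: bytes, record: dict, grantee: str, data_id: str) -> bool:
    """Check the signature, then check that the record grants this access."""
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    expected = hmac.new(secret, _message(unsigned), hashlib.sha256).hexdigest()
    return (
        hmac.compare_digest(expected, record.get("signature", ""))
        and unsigned.get("grantee") == grantee
        and unsigned.get("data") == data_id
    )
```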
Storage: Current blockchain systems will track the hash of documents and agreements and health records, but because they can only handle small data, they require separate storage for the actual documents, agreements, and health records. This means that life is complicated and unreliable for the people who have to arrange for storage, and the people who have to track down the full objects later. A system with bigger storage capacity will be more useful and more reliable. Even if storage eventually ends up being specialized (for example, in personal health records), the system is much easier to operate if data can be transferred by throwing it into a known place.
Current blockchain architectures do not support these types of applications because of limitations in speed or data size. Cloud databases will support these applications, as simple SaaS and API services, but they will not provide the blockchain attributes listed above. We will design an architecture that meets both requirements.
The goal is to take a set of normal cloud databases and turn them into a TrustButVerify database. BigchainDB uses a similar strategy, taking a distributed MongoDB and adding blockchain attributes such as immutability and accounting. We need to use a slightly different tactic from BigchainDB because we do not control the database. Instead of having voting nodes that validate the transactions going into the immutable store, we will need agents that observe the functioning of the database, and raise objections if anything is handled incorrectly.
Like a standard blockchain, our database will be “write once” with hashed and signed transactions.
Transactions will need to go through three separate states:
- Approved — the cloud provider thinks it is valid and paid for. It could also be Rejected. We may allow agents to look at these and do their own validation before we commit.
Even after commit, agents will want to check that every transaction that went in also came out in some way.
We will ask the cloud providers to provide firehoses of received, approved, and committed entries. Users can subscribe to a full firehose, or to a topic or tag.
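A firehose with per-stage and per-topic subscriptions could be sketched like this in Python. The stage names are taken from the three firehoses mentioned above; the `Firehose` class, its methods, and the `"topics"` field on an entry are hypothetical and exist only for this illustration.

```python
from enum import Enum


class Stage(Enum):
    RECEIVED = "received"
    APPROVED = "approved"
    COMMITTED = "committed"


class Firehose:
    """In-memory publish/subscribe; a real provider would stream over a network."""

    def __init__(self):
        # (stage, topic) -> callbacks; topic None means the full firehose
        self._subscribers = {}

    def subscribe(self, stage, callback, topic=None):
        self._subscribers.setdefault((stage, topic), []).append(callback)

    def publish(self, stage, entry):
        # Full-firehose subscribers see every entry for the stage.
        for callback in self._subscribers.get((stage, None), []):
            callback(entry)
        # Topic subscribers see only entries tagged with their topic.
        for topic in entry.get("topics", []):
            for callback in self._subscribers.get((stage, topic), []):
                callback(entry)
```

An agent that specializes in one topic would subscribe with `topic="..."` at each stage and compare what it sees across stages.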
Because this database will be big, we will need to divide it into topics. Agents can specialize in a particular topic. I think we will probably want to maintain a running hash for each topic, so that we can check the completeness of a topic stream without checking the completeness of the entire indefinitely large database.
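One way to keep a running hash per topic is sketched below in Python. This is only an assumption about how it might work; the class `TopicHashes` and its methods are hypothetical names.

```python
import hashlib


class TopicHashes:
    """Keeps one running SHA-256 hash per topic."""

    def __init__(self):
        self._running = {}  # topic -> hex digest covering all entries so far

    def add(self, topic: str, entry: bytes) -> str:
        """Fold a new entry into the topic's running hash and return it."""
        prev = self._running.get(topic, "")
        digest = hashlib.sha256(prev.encode("ascii") + entry).hexdigest()
        self._running[topic] = digest
        return digest

    def current(self, topic: str):
        """Return the topic's running hash, or None if the topic is empty."""
        return self._running.get(topic)
```

An agent that replays a topic stream through its own `TopicHashes` can compare its result with the provider's published hash for that topic. A mismatch means an entry was dropped, altered, or reordered, and the agent never has to read the other topics.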
Because our database will run at high volume, it will accumulate a backlog of uncommitted transactions. New transactions can be added to the database as long as they do not reference any uncommitted transactions. We already know that transactions without any overlapping topics do not reference each other.
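The topic-overlap rule can be sketched as a conservative admission check in Python. The `Backlog` class is hypothetical. It treats any shared topic with an uncommitted transaction as a possible reference, which may defer some transactions that are in fact independent.

```python
class Backlog:
    """Tracks uncommitted transactions and admits only non-overlapping ones."""

    def __init__(self):
        self._pending = {}  # transaction id -> set of topics it touches

    def can_accept(self, topics) -> bool:
        """True if no uncommitted transaction shares a topic with this one."""
        wanted = set(topics)
        return all(wanted.isdisjoint(t) for t in self._pending.values())

    def submit(self, tx_id: str, topics) -> bool:
        """Admit the transaction to the backlog, or refuse it for now."""
        if not self.can_accept(topics):
            return False
        self._pending[tx_id] = set(topics)
        return True

    def commit(self, tx_id: str) -> None:
        """Remove a transaction from the backlog once it is committed."""
        self._pending.pop(tx_id, None)
```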
The system will rely on “agents” to watch the output streams from each of these stages to ensure that data is flowing through correctly. Agents can be hired by users, or run by the users themselves. We will offer a mechanism for agents to “object” to transactions that are approved but not committed. This converts the blockchain concept from a system where everyone must reach consensus, to a system where a small number of participants can object.
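The objection mechanism might be sketched as follows in Python. The fixed objection window before commit is an assumption of this sketch, as are the class `ApprovedTransaction` and its field names; the proposal only says that agents can object between approval and commit.

```python
from dataclasses import dataclass, field


@dataclass
class ApprovedTransaction:
    tx_id: str
    approved_at: float            # time of approval, in seconds
    window_seconds: float = 5.0   # hypothetical objection window
    committed: bool = False
    objections: list = field(default_factory=list)

    def raise_objection(self, agent: str, reason: str) -> bool:
        """Record an objection; objections are refused once committed."""
        if self.committed:
            return False
        self.objections.append((agent, reason))
        return True

    def try_commit(self, now: float) -> bool:
        """Commit only if the window has passed and nobody objected."""
        window_over = now >= self.approved_at + self.window_seconds
        if window_over and not self.objections:
            self.committed = True
        return self.committed
```

In the common case nobody objects and the transaction commits after the window with no voting at all, which is where the efficiency gain over full consensus would come from.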
If we use multiple cloud providers, we will end up using multiple databases with different underlying implementations. Our database API will end up being a subset that fits on top of any of the underlying databases.
We can pay for this system with (small) fees for submitting transactions, and running queries. We can optionally hire agents to verify our transactions. And, we can pay agents more to do more comprehensive checks on transactions that we value more highly. That leaves open the question of storage. We will need to figure out how to pay for storage.
Then, we must think about invulnerability to problems caused by an administrator, user, or agent. We will need a firehose that agents can use to collect and replicate the complete database, so we cannot lose data to an inadequate provider. And, we will need to be able to switch to new cloud providers, with built-in governance and technology. Essentially, new providers need to be able to subscribe to the existing database, ramp up, and then receive business from users.
There will be some problems to solve. Here are two that I am thinking about.
We need to minimize the cost of signing and checking transactions. We want to know the source of transactions, and that they haven’t been altered since they were submitted. So, we will take the computationally expensive steps to sign every transaction, to check the signatures, and to make a running Merkle hash. This is much cheaper than proof of work mining, but it is far from free. We could distribute this work, so that each contributor of transactions is doing some checking as part of the fee for the transaction, or we can accelerate it with hardware.
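For the hashing step only, here is a minimal Python sketch of a Merkle root over a batch of transactions. It duplicates the last hash when a level has an odd number of nodes, which is one common convention and an assumption here; signing is left out, and the function names are hypothetical.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list) -> bytes:
    """Return the Merkle root of a list of transactions given as bytes."""
    if not leaves:
        return sha256(b"")  # arbitrary choice for an empty batch
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pad odd levels by repeating the last hash
        # Hash each adjacent pair to build the next level up.
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

The cost is roughly two hash computations per transaction, so it grows linearly with volume. That is why distributing the work or using hardware acceleration matters at high TPS.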
We need a way to figure out if the result of a query is correct. For example, if Andy is selling me the Brooklyn Bridge, I need to be fairly sure that Andy is the owner of the Brooklyn Bridge, and I might issue a query to select the current owner. There are lots of reasons that this query could return the wrong result. The database could be out of date, the code could be buggy, or someone could deliberately try to return an incorrect result. In a small blockchain system, I can check my own copy of the database. In the Trust but Verify system, we need other tactics. I might issue the query to each of our three nodes and make sure the result was the same in each case. I might hire agents that specialize in specific topics and keep a complete stream for that topic. I would scale this, so I would do more verification for more valuable queries.
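The first tactic, asking every provider and comparing the answers, could be sketched like this in Python. Each provider is modeled as a plain callable that takes a query and returns a result; that modeling and the name `cross_checked_query` are assumptions made for this example.

```python
def cross_checked_query(query, providers):
    """Send the query to every provider and return the answer only if all agree."""
    answers = [ask(query) for ask in providers]
    if not answers:
        raise ValueError("at least one provider is required")
    if any(answer != answers[0] for answer in answers[1:]):
        # Disagreement means at least one copy is stale, buggy, or dishonest.
        raise RuntimeError(f"providers disagree: {answers!r}")
    return answers[0]
```

For a low-value query, a caller might pass a single provider; for the Brooklyn Bridge, it would pass all three and perhaps add topic-specialist agents to the list as extra callables.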
If this works, it will be a terrific resource for a lot of important applications. Help me get it rolling. I’m looking for feedback on architecture and use cases before going forward to implementation. Please post your comments and suggestions here.