Database architecture refactor: Adapters and Wrappers
See original GitHub issueReplaces https://github.com/ethereum/py-evm/issues/410
What is wrong?
The current architecture uses a single global ChainDB
for each chain.
This is proving problematic because:
- The
ChainDB
is not aware of different VM rules and thus cannot have it’s underlying business logic modified. - Some of the network syncing code needs to interact with the database in ways that aren’t currently supported. See #337
- It has a lot of API surface area.
How can it be fixed
First, we establish the following new terminology.
- Connection: The base database class used to connect to the underlying db. e.g.
MemoryDB
or theLevelDB
. This object is treated as a key/value store. - Wrapper: A class which wraps a connection, adding some extra functionality. e.g.
JournalDB
,HashTrie
- Adapter: A class which implements a high level API over top of an underlying database.
We also establish a new convention/rule regarding the APIs exposed by wrappers. Wrapper APIs may only be used at the same level that the wrapper is applied.
For Example:
Suppose the VM
class applies the JournalDB
wrapper to the database connection, and then passes the resulting object down to the VMState
. The VMState
is not allowed to interact with any of the APIs exposed by the JournalDB
.
Now, we establish the following new Adapter classes.
MetadataDB
:- Only deals with chain metadata (e.g. canonical head, mapping block numbers to canonical block hashes, mapping transaction hashes to canonical block hash in which it was mined).
- Used by the
Chain
class. - Lets look into using SQLAlchemy + sqlite3 for storage of this data.
ChainDB
:- Deals with storage of chain data (e.g. blocks, transactions, receipts)
- Lets look into using SQLAlchemy + sqlite3 for storage of this data.
- Each
VM
may specify their ownChainDB
class.
AccountDB
:- Replaces
evm.db.state.AccountStateDB
- Deals with VM state (e.g. account balances, nonces, storage, code)
- each
VMState
may specify their ownAccountDB
class.
- Replaces
So, responsibility for databases is as follows.
Chain
->MetadataDB
(replaces what we currently refer to asChainDB
VM
->ChainDB
(not to be confused with the existingChainDB
VMState
->AccountDB
(replaces the currentAccountStateDB
With this structure in place, now we explore how wrappers come into play. Each level of the call stack will pass down it’s base database which might be the actual base connection, or might be a wrapped version of the base connection. Everything is to assume that the db
object it receives only exposes the base database API.
In the event that a database connection needs to be wrapped, it should occur at the level at which the wrapper APIs will be used.
For example:
- the
AccountDB
should be responsible for applying theHashTrie
wrapper. - the
VMState
should be responsible for applying theJournalDB
wrapper.
Issue Analytics
- State:
- Created 5 years ago
- Reactions:4
- Comments:7 (5 by maintainers)
Just checking in to say I’m reviewing this now. It’s probably going to be a little while before there’s new code, because I’m still piecing everything together. The direction outlined above looks solid.
I’m also interested in seeing how we can reduce the mutability of things further – specifics TBD.
I’m going to keep digging into designs offline with whiteboards and other contributors, and report back here with major updates.
@carver assigned to you merely to suggest tackling this when you start into this codebase.