Is a Rollup Just a Multisig?
No, it is a small (and important) part of a larger system architecture
Any database that interacts with cryptoassets will someday pick a rollup as its tech stack.
There are many good reasons why developers will make this decision:
Real-time audits,
Proof of solvency by default,
Optional custody of user funds,
One honest party can protect the entire system.
Most importantly, all design and implementation efforts for rollups are focused on protecting the user, their funds, and all of their interactions from a potentially unknown and powerful system operator.
Even in the event that the entire system goes offline, a user is empowered to single-handedly recover their funds.
If rollups can gain widespread deployment as a tech stack, then we may have the ability to break down the barriers of trust and enable anyone in the global community to financially interact with each other, ushering in a new era of global e-commerce, remote hiring and frictionless provision of services.
There is truly a lot on the line to get a rollup implementation done right.
What About The Multisig?
Hold on, stonecoldpat.
That all sounds good, but at the end of the day, there is a multisig underpinning the entire system. If the signers are compromised, or malicious, then they can simply steal all the funds.
So who cares about the rollup?
~ Somewhere on CT
It is true that all rollups (today) have a multisig with the authority to upgrade the underlying smart contracts, but as we will see, it is a reactive mechanism for protecting user funds and it is part of a wider system architecture.
Security Council’s Responsibility
A multisig is a technical term for a system that requires multiple signers to authorise an action. For example, K of N signers must produce a digital signature before a transfer can be approved.
In the context of rollups, the multisig is often called the Security Council and the signers are entrusted with the power to upgrade all relevant smart contracts.
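To make the K-of-N rule concrete, here is a minimal sketch (the signer names and threshold are hypothetical, and a real deployment would verify digital signatures on-chain rather than compare identifiers): an action only goes through once enough distinct, authorised signers have approved it.

```python
# Minimal K-of-N multisig sketch (illustrative only, no real cryptography).
# An action executes only if at least K distinct, authorised signers approve it.

AUTHORISED_SIGNERS = {"alice", "bob", "carol", "dave", "erin"}  # N = 5 (hypothetical)
THRESHOLD_K = 3                                                 # K = 3 (hypothetical)

def is_approved(approvals: set[str], threshold: int = THRESHOLD_K) -> bool:
    """Return True if at least `threshold` distinct authorised signers approved."""
    valid = approvals & AUTHORISED_SIGNERS  # discard approvals from unknown signers
    return len(valid) >= threshold

# Example: two approvals are not enough, three are.
assert not is_approved({"alice", "bob"})
assert is_approved({"alice", "bob", "dave"})
```

The counting logic is all there is to it; the interesting questions are who the signers are and how K and N are chosen.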
Let’s consider the security council in Arbitrum (since I am exceedingly familiar with it) to understand what types of responsibilities a council may take on:
Veto. Cancel a proposal passed by the Arbitrum DAO if the Security Council believes it violates the Arbitrum Constitution and potentially harms the Arbitrum ecosystem. For example, cancel a proposal that was passed due to an attack on governance.
Maintenance. Upgrade the Arbitrum smart contract suite for minor changes that do not justify invoking the Arbitrum DAO process. For example, change a configuration setting that impacts how transaction fees are charged to users.
Emergency events. Respond swiftly during an emergency and upgrade the smart contracts if they believe user funds are at imminent risk.
Of course, above all, the security council’s primary duty is to tackle emergency events and act swiftly to protect user funds.
Acting as a security council member is very much a trusted role.
Signers are trusted to react quickly, trusted to upgrade the smart contracts if an emergency strikes, and trusted to do their utmost to protect the safety of funds held by the smart contracts.
Picking The Right Multisig Threshold
There are two important decisions that must be made when setting up a Security Council:
How many signers are on the multisig?
How many signers are required to approve an action?
It may appear at first to be a trivial problem (after all, it is just two numbers), but there is a balancing act to consider:
Safety violation: K members may collude to change the smart contracts and steal user funds.
Liveness violation: N-K+1 members may collude (or simply be unavailable) to prevent any change to the smart contracts, which is especially problematic if a critical vulnerability is discovered.
The difficulty is to pick a threshold that upholds the safety of funds during times of peace while enabling swift action during an emergency when user funds are under threat.
Let’s consider a concrete example where the threshold is set to 9/10, i.e., 9 of the 10 signers must collectively sign a message. This provides a significant safety margin, as 9 signers must be compromised to steal the funds. However, the downside is that any two signers can prevent the authorisation of any action during an emergency. For example, if two signers are on a transatlantic flight, then the Security Council is rendered unable to perform its duty.
Of course, if the safety threshold is low, let’s say 2/10 signers, then it only takes any two signers to collude (or be compromised) for user funds to be stolen.
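The balancing act can be made concrete with a little arithmetic: safety fails once K signers are compromised, while liveness fails once N-K+1 signers are unavailable or obstructive. A minimal sketch, using the 9/10 and 2/10 examples above:

```python
# Safety vs liveness fault tolerance for a K-of-N multisig (illustrative arithmetic).
def fault_tolerance(k: int, n: int) -> dict[str, int]:
    return {
        "colluders_needed_to_steal_funds": k,           # safety breaks if K signers collude
        "absentees_needed_to_block_action": n - k + 1,  # liveness breaks if N-K+1 are offline/obstructive
    }

print(fault_tolerance(9, 10))  # 9 colluders to steal funds, but only 2 absentees to block an action
print(fault_tolerance(2, 10))  # only 2 colluders to steal funds, but 9 absentees to block an action
```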
Picking the appropriate threshold is more a social than a technical problem, and I’d argue it is more of an art than a science. Security largely depends on the perceived integrity of the individual signers. As we will see shortly, there are methods to reduce trust in the multisig, but they come with their own set of tradeoffs.
Security Council’s Membership
Most rollups have an anonymous set of signers in their security council. We suspect this may be due to:
The rollup’s stage of development,
Caution around protecting members from a $5 wrench attack,
Perception that anonymity is the best option for protecting user funds.
On the other hand, several rollup projects have publicly declared (or announced) the membership of their security council:
Arbitrum. Signers are publicly elected and the current list is available on Tally. Only three signers are associated with the Arbitrum project (two from Offchain Labs, one from the Arbitrum Foundation).
Base. It is a 2/2 multisig that relies on the output of two different multisigs, one controlled by Base and the second controlled by Optimism.
Polygon ZkEVM. Not yet implemented, but they have announced an intent to upgrade their multisig to 10/13, which includes two members from Polygon Labs alongside one advisor to Polygon Labs.
ZkSync Lite. It should not be confused with ZkSync Era, but its security council is publicly announced and it includes no direct affiliates of the rollup project (except for investors in ZkSync).
In Arbitrum, and hopefully soon in Polygon, only a few signers are directly affiliated with the rollup project, and the number is small enough that the affiliates cannot collude to block an action by the security council (a liveness violation). In ZkSync Lite, outside of its investors, the focus has been on appointing signers who are independent of the project.
In all cases, there is a strong emphasis to onboard signers who are not directly affiliated with the project.
Yet, there appears to be a lack of consensus on what makes a good multisig, which brings us to several design questions:
Should anonymous members be allowed?
Should members be geographically diverse?
Should members be individuals or companies?
Should members be appointed or elected?
How many members from the same company (or country) should be allowed?
Is there a minimum size and threshold that is considered appropriate?
The general rule of thumb should be to pick members of high integrity so the public can have confidence that the system will be kept safe. At least, that is what I believe most projects are likely doing, even if it is not always publicly verifiable.
p.s. Six members of the Arbitrum security council are currently pre-appointed, but they will be replaced in the March election.
Curtailing Council’s Authority
So far, we have only considered a security council with the authority to instantly upgrade the smart contracts, but there are methods to curtail the council’s power (a rough sketch of the first two options follows the list):
Time delay. All actions authorised by the security council will only be executed and take effect after time T has passed.
Pause-only. The native bridge holding all the assets can be frozen by the security council. Freezing may pause the ability to:
Pass messages from L2 to L1 (i.e., withdrawals),
Finalise the ordering of pending transactions,
Finalise new checkpoints / attestations,
Accept new rollup data (i.e., pending transactions).
Removed. Abandon the security council and rely on another governance mechanism (like a DAO) to approve upgrades.
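As a minimal sketch of the first two options (the seven-day delay, the function names, and the pause switches are purely illustrative, not any rollup’s actual contracts): a time delay queues council actions with an earliest execution time, while pause-only strips the council down to flipping freeze switches on the native bridge.

```python
import time

UPGRADE_DELAY = 7 * 24 * 3600  # hypothetical 7-day delay on council actions


class TimeDelayedCouncil:
    """Council actions only take effect after UPGRADE_DELAY has elapsed."""

    def __init__(self) -> None:
        self.queue: dict[str, float] = {}  # action_id -> earliest execution time

    def schedule(self, action_id: str) -> None:
        self.queue[action_id] = time.time() + UPGRADE_DELAY

    def can_execute(self, action_id: str) -> bool:
        ready_at = self.queue.get(action_id)
        return ready_at is not None and time.time() >= ready_at


class PauseOnlyCouncil:
    """Council can only freeze bridge functions; it cannot upgrade anything."""

    def __init__(self) -> None:
        self.frozen = {
            "l2_to_l1_messages": False,        # withdrawals / message passing
            "transaction_ordering": False,     # finalising the order of pending transactions
            "checkpoint_finalisation": False,  # new checkpoints / attestations
            "new_rollup_data": False,          # accepting pending transactions
        }

    def pause(self, function_name: str) -> None:
        if function_name not in self.frozen:
            raise KeyError(f"unknown bridge function: {function_name}")
        self.frozen[function_name] = True
```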
Of course, curtailing the security council’s ability to act swiftly comes with a trade-off: it may limit how effectively the council can respond to an emergency event that threatens user funds.
If a vulnerability is privately disclosed to the security council, but it is not yet being actively exploited, then the security council may have the option to upgrade the smart contracts and fix the bug. Adding a time delay to the upgrade increases the risk that an attacker can study the now-public upgrade, find the exploit, and use it before the fix takes effect.
For example, CVE-2018-17144 in Bitcoin was initially advertised as a denial-of-service bug in order to downplay the more serious coin-inflation vulnerability. Upgrade speed was of the essence to prevent its exploitation.
Evaluating Pause-Only as a First-Line Defence
Let’s consider potential scenarios where a vulnerability is actively being exploited by an attacker:
Malicious L2 → L1 message. An attacker can craft an arbitrary message that originates from a smart contract on the rollup and forward the message via the native bridge to smart contracts on Ethereum.
Invalid state transitions. An attacker can execute transactions on the rollup that break the rules of the state transition function and should normally be considered invalid.
Withdrawal exploit. An attacker can withdraw funds from the native bridge by only issuing transactions on Ethereum (the layer-1).
In all three cases, a time delay simply grants the attacker additional time to continue stealing funds and shrinks the security council’s window of opportunity to defend the system. Time-delayed actions are ill-suited to defending against active exploits and should be reserved for routine maintenance and non-time-critical tasks.
We will therefore only evaluate the ability to pause the system and the degree to which it can be paused.
In the case of a malicious L2 → L1 message, the pause functionality can mitigate the attack without interfering with transaction activity. The security council should pause message delivery and/or the ability to finalise new checkpoints. There is an argument that L2 → L1 messages should have a time delay before they are executed, to give the security council time to detect the bug and react to the emergency event.
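As a hedged sketch of that idea (the 24-hour window and the names below are hypothetical, not any rollup’s actual outbox design): each L2 → L1 message only becomes executable after a delay, and a council-controlled pause switch blocks anything still waiting in the queue.

```python
import time

MESSAGE_DELAY = 24 * 3600  # hypothetical 24-hour execution delay for L2 -> L1 messages


class DelayedOutbox:
    """L2 -> L1 messages execute only after a delay, and only while unpaused."""

    def __init__(self) -> None:
        self.paused = False
        self.pending: dict[str, float] = {}  # message_id -> time the message was recorded

    def record(self, message_id: str) -> None:
        self.pending[message_id] = time.time()

    def can_execute(self, message_id: str) -> bool:
        recorded_at = self.pending.get(message_id)
        if self.paused or recorded_at is None:
            return False
        return time.time() >= recorded_at + MESSAGE_DELAY
```

The point of the delay is that a malicious message spotted inside its window never executes once the council flips the pause switch.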
Defending against invalid state transitions is trickier, as transaction finality has different tiers in a rollup. If we only consider transactions on the rollup and not any side-effects, then the best defence for the security council is to halt checkpoint finalisation while continuing to allow pending transactions to be ordered. This can buy time for the bug to be fixed, for checkpoint finalisation to be reactivated, and for the invalid transactions to simply be ignored.
However, if transaction activity on the rollup is not turned off, then the user experience will be messy and the rollup may appear to be in a state of havoc until the client software is upgraded.
This brings us to the next scenario: how should the security council react once we consider how invalid transactions on a rollup may impact other systems that observe it? The best line of defence is to either freeze the native bridge’s ability to finalise the ordering of transactions or to turn off the Sequencer altogether.
This is because some systems, like fast bridges that move funds from one rollup to another, may authorise a transfer once they believe that a rollup’s transaction (including the invalid transaction) is ordered for execution. In this example, it may allow an attacker to exploit a DeFi protocol on the rollup and then quickly escape with the funds by moving them to another rollup via a fast bridge.
By the time the security council can fix the bug and revert the invalid transactions, the damage may already be done. Either the DeFi protocol or the LPs in the fast bridge may bear the losses from the attack.
Finally, if the vulnerability allows the attacker to withdraw funds directly from the native bridge, similar to the Nomad Hack, then the security council may be powerless to stop it.
There is a final, overarching issue with the pause-only approach. We must assume there is a wider governance system that can approve the upgrade and reactivate the rollup. If that governance system is a DAO with an on-chain voting system that runs on the rollup itself, then tricky implementation issues arise.
For example, if the L2 → L1 message bridge is paused, then the DAO’s voting result cannot be passed from the rollup to the native bridge living on Ethereum. An alternative method for the DAO to send its approval and execute the upgrade must be implemented.
Phasing Out Security Council
Some in the community believe that Security Councils should be phased out, but from my perspective, two issues arise:
False sense of security. An attacker with knowledge of an exploit will wait until the security council is phased out before performing the attack. This undermines our ability to gain confidence in the system’s security with the passage of time.
Limited recovery options. Without a security council, there is little the community can do to fight back against an attacker. The only option available is to pursue a parallel white-hat hack and hopefully recover any remaining funds.
I’d argue that security councils will always be needed, but the authority entrusted to them should be gradually curtailed.
With that in mind, the design question should be:
How can we enable a security council to pause the system with minimal impact on users while allowing the wider community’s input on deciding how to fix the bug and reactivate the system?
Put another way, we need an implementation that truly allows the community to step in and recover the situation before curtailing the security council’s power to pause-only.
In layer-1 blockchain land, the community does have direct input via a user-activated fork, but that approach is not applicable to smart contracts on Ethereum (like a rollup) without forking the entirety of Ethereum. There may be cases when the Ethereum community will collectively decide to save a rollup, just as it did for TheDAO in 2016, but a rollup project should NEVER depend on or expect such an outcome.
Another interesting idea along these lines is to implement an Ethereum Supreme Court to decide upon smart contract upgrades and enable a mechanism that looks similar to a user-activated fork.
As mentioned already, if the rollup entrusts its security to a DAO, then there should be an implementation that allows the DAO to cast votes directly on Ethereum. This is very tricky, especially if the voting protocol lives on the rollup.
As a final note, I do believe that a comprehensive review of the types of situations that may warrant a response from the security council is required to aid the discussion around their necessity.
Why Care About The Rollup If There Is A Multisig?
We have spent a considerable amount of time understanding the responsibility, design and need for a security council, but it is important to get back to this article’s original question:
Is a Rollup Just a Multisig?
The answer is no.
To help understand why, it is best to take a step back and consider what a blockchain system is really trying to do.
A blockchain protocol is a tool that allows a user to compute a copy of the database and have confidence that they have the same database as everyone else.
With that in mind, there are two components to any blockchain system:
Blockchain Protocol. A combination of software, cryptography, and distributed systems that enables anyone to have confidence in the integrity of the database.
Governance System. A coordination mechanism that allows all interested parties to collectively work together and agree to change the blockchain protocol.
The goal of any blockchain system, including rollups, is to ensure the blockchain protocol is always running with exceedingly reliable uptime of 99.9999%. There should be little to no interference from a trusted system operator in the day-to-day running of the system. It should be the software, cryptography, and distributed systems that are ultimately responsible for protecting the user’s balance, the smart contract code, and its state.
There are times when the blockchain protocol needs to be changed for the betterment of the users. The community may want to fix a configuration issue, add a new feature, or react to a critical vulnerability that threatens the system’s integrity. This will require human intervention and it should only be invoked 0.0001% of the time.
The governance system is responsible for enabling human intervention and over the years several approaches have emerged:
Centralised party. A single party can single-handedly decide how to upgrade the system (many projects, including Bitcoin, started out this way).
Rough consensus. An economic majority of participants signal that they are ready to deploy the upgrade, a flag day is decided, and then the upgrade executes on the flag day (Bitcoin/Ethereum).
Voting protocol. All parties participate in an election and explicitly cast a vote on whether the upgrade should be approved.
None. The smart contract can be immutable and the system can never be changed.
Alongside the above, the community may decide to appoint a security council as an additional and complementary option to governance, to be used when an emergency strikes and swift action is required.
Security councils do not prevent attacks. They are a reactive mechanism that works alongside governance for when the blockchain protocol is vulnerable to an attack that threatens user funds or the system’s reliability/performance.
Last words
All discussion around blockchain protocols, governance and security councils is critically important. The existence of this discussion is what makes cryptocurrency so special.
It is a wonderful example of Trust Engineering:
An engineering discipline that focuses on identifying, measuring, and reducing/eliminating trusted elements in a system.
In cryptocurrencies, we focus on building systems that not only protect users from an all-powerful system operator, but also run reliably (and safely) in the most adversarial conditions possible.
This is why it is healthy for community members to remain skeptical about the merits of a security council, but the onus is on them to come up with better solutions that can reactively protect user funds during an emergency event.
I hope this article makes it clear why security councils can be useful, why they are somewhat necessary today, and why they are just a small part of the wider architecture of a smart contract system. :)
Thanks to terence.eth for reviewing