Software Development Lifecycle (SDLC)
This document describes our Software Development Lifecycle (SDLC), which is how we ship software to production.
It provides guidance, and is not prescriptive—not every step will be relevant to every project. This is because not all projects are equally security-critical, and therefore not all projects should be treated equally. When in doubt about what's needed to ship your project, ask your manager. If you deviate, note the deviation and rationale somewhere that isn't Slack (e.g. a design doc, github issue, etc.)
> [!IMPORTANT]
> **Guiding Principle: Always Be Release Ready**
>
> - At any time, the latest `develop` should be shippable via our deployment pipeline. Build confidence in correctness and security continually throughout the development lifecycle, not only at the end.
> - Build in vertical slices: split fixes or features into end-to-end increments that can ship independently.
> - Separate feature deployment from activation using feature flags.
Process
```mermaid
flowchart TD
    Start[[ ]] --> Step0[Step 0: Ideation and Planning]
    subgraph P[Steps 0-3 can be partly parallelized]
        direction TB
        Step0 --> Step1[Step 1: Design, Specs, and Risk Modeling]
        Step1 --> Step2[Step 2: Implementation]
        Step2 --> Step3a["Step 3a: Create Superchain Ops tasks<br>(L1 upgrades only)"]
        Step2 -->|"If needed"| Step3b[Step 3b: Audit execution]
    end
    Step3a --> Step4a[Step 4a: Alphanet and Optional Betanets]
    Step3b --> Step4a
    Step4a --> Step4b[Step 4b: Final Betanet]
    Step4b --> Step5[Step 5: Testnet rollout]
    Step5 --> Step7[Step 7: Mainnet rollout]
    Step5 -->|"If needed"| Step6[Step 6: Governance proposal]
    Step6 --> Step7
```
Step 0: Ideation and Planning
Make sure you know, and document, the answers to:
- What is the problem we need to solve?
- What requirements and constraints do we have?
- Who exactly is the customer?
Step 1: Design, Specs, and Risk Modeling
Design Doc
Now that the problem, requirements, and customers are known, create a design doc that describes the solution:
- Create a design doc PR in `design-docs` or `design-docs-private` (using the templates in those repos).
- Share this with customers and design partners to get feedback. Iterate on the design doc until customers are happy with it.
- Announce the design doc PR in #protocol-general. Tag managers from impacted teams so they can choose a representative from their team to review the design doc:
- Product: Sam McIngvale
- DevRel: Matthew Cruz (soyboy)
- Ecosystem: Faina Shalts
- Protocol: Matt Slipper
- Proofs: Paul Dowman
- Platforms: Alfonso Munoz de Laborde
- EVM Safety: Matt Solomon
- Ask Lewej Whitelow or Aaron Levin to schedule a design review. Include the team representatives and any other stakeholders/customers on the invite.
Specs
Once the design is finalized, if it modifies smart contracts, consensus, or protocol functionality, write/update specs in the `specs` repo.
Good specs clearly document assumptions and invariants; see the AnchorStateRegistry specs as an example of well-written specs.
Determine governance impact
As you develop a design for your change, you’ll need to determine if the change requires governance. Changes that affect consensus, touch smart contracts on L1, modify predeploys, or impact transaction ordering will generally require governance approval. If you’re unsure, consult Ben Jones.
For full criteria and examples for determining if governance is needed, refer to the Governance Criteria and the Law of Chains user protections.
Risk Modeling
This will typically, but not always, be in the form of threat modeling. Reach out to the EVM Safety team for guidance or training on threat modeling.
Your initial threat model will inform engineering planning by helping answer the questions of what tests are needed,
what edge cases to cover, what new or updated monitors are needed, what runbooks need to be written or modified, and audit needs.
For more info on determining audit needs, see our audit framework and audits.md.
Once the initial threat modeling is done, extract all answers into issues (or wherever you are tracking project tasks) for tracking purposes.
Be sure to revisit and update your threat model as the project evolves, and as mitigations are implemented.
Step 2: Implementation
At this stage, you can start writing your code. Make sure you follow these standards:
- All consensus code must be behind a feature flag, decoupled from the hardfork name (see https://github.com/ethereum-optimism/design-docs/blob/main/protocol/decoupled-features.md).
- All changes must go through code review, and have test automation. Use coverage tooling and reports to identify testing gaps.
- For new features, add acceptance tests.
- For smart contracts, the specs must clearly define assumptions and invariants as described above, and you must have ~100% test coverage.
- Include any changes to OPCM and VerifyOPCM.
Step 3a: Create Superchain Ops tasks (L1 upgrades only)
If your change modifies L1 smart contracts, you’ll need a `superchain-ops` playbook to execute the multisig transactions.
Contact Blaine Malone for help with this.
Step 3b: Audit execution
- See audits.md for more information on how to get approval for and execute an audit.
- Make sure to leave sufficient lead time for scheduling the audit.
- Only start the audit once code, specs, and tests are complete.
- The audit must be completed, with fixes implemented, before Sepolia rollout.
- If there are high severity issues, do NOT proceed to the next step after fixing them—instead, perform a retro to see how those issues got in, and what else may have been missed as a result.
Step 4: Alphanet/Betanet devnet rollout
> [!WARNING]
> **Prerequisites for the final betanet**
>
> All of the steps above MUST be completed before the final betanet. That includes:
>
> - Specs
> - Completion of risk modeling
> - Implementation of mitigations identified by risk modeling (tests, monitors, runbooks, etc.)
> - Governance impact analysis
> - Audit execution and required fixes
Next, it’s time to roll out to the Alphanet, then the Betanet. See the release process and acceptance testing docs for more details.
You may deploy to multiple betanets if needed, for example to rehearse an upgrade process, but the final betanet must have no known issues and must result in no new issues being discovered.
Step 5: Sepolia rollout
Sepolia is a production network and therefore has the same standards and security requirements as mainnet. This is why the betanet in the prior step must have no known issues and must result in no new issues being discovered.
Coordinate with DevRel and external partners that may be affected. Sepolia must match mainnet as much as possible, including for partner configurations.
Step 6: Governance proposal (if governance is needed)
- Prepare Proposal:
- Reference a stable release candidate.
- Include all relevant info such as risk modeling results, audits, Sepolia performance, and activation schedule.
- Use the standard governance template.
- Review & Post:
- Obtain Foundation and Legal approval.
- Loop in Ben Jones and Bobby Dresser from the Foundation.
- Loop in Eric Van Wart from legal.
- Post the proposal on governance forums.
- Approval & Veto:
- Wait for the vote and veto period to complete.
Step 7: Mainnet rollout
- Remove the `rc` suffixes from your releases.
- Schedule the mainnet upgrade after the veto period expires.
- Coordinate with EVM Safety to schedule the mainnet multisig upgrade.
- The Product/Dev/PMO Leads go through their checklists to ensure all stakeholder needs, documentation, and communications are in place. This includes working with Marketing, DevRel, Data, Finance, Foundation, etc.
- Monitor post-release with on-call coverage, then collect customer feedback on the overall process and track it for future process improvements.
Failure Mode Analysis (FMAs)
Overview
Our Failure Mode Analyses are loosely based on FME(C)As from aerospace engineering. They are intended to shift left the process of thinking through a project's risks, so that mitigations can be planned for and implemented earlier rather than later.
Writing a Failure Modes Analysis
As part of the effort towards working in the open, we have open sourced both the FMA process and the FMA template so protocol developers from the whole collective can adopt this process.
To write the FMA for your project, follow the FMA template. You can use the many existing FMAs as examples to understand how to write yours.
FMAs live in the design-docs or design-docs-private repo.
Determine Audit Requirements
The knowledge obtained in writing the FMA will help you determine the audit requirements for your project. EVM Safety is available if you need advice on this step.
1. Broadly determine the risk of the change. To do that, consider the FMA and the Liveness vs. Safety and Reputational vs. Existential matrix to find the maximum severity of incident that could be caused by the software to be audited. Then determine a subjective risk category by comparing your code to the descriptions below:
   - Low: The feature doesn’t involve any components that can cause a significant incident.
   - Medium: A bug in the feature could lead to a temporary denial of service, a small loss of value across all users, or a large loss of value across a small number of users.
   - High: Bugs in the feature could lead to denials of service lasting days or more, or a significant loss of assets.
2. Determine a subjective complexity category by comparing your code to the examples below:
   - Low: Any code that is easy to explain to a non-technical person, and easy to reason about as a whole.
   - Medium: Code with several components that are each easy to reason about, or with a single feature that is complex.
   - High: A large codebase with several components that are complex in their own right, or use of math, algorithms, architectural patterns, integration patterns, or features that are novel or difficult to explain to a non-technical person.
3. Find the required audits for your risk and complexity in the table below. Read more about audit types.
The table was calibrated to past audits, but it is a statement of minimums and you always have the option to execute more audits than specified.
Internal audits should be executed before external audits.
For internal audits we currently use the Coinbase Protocol Security Team. We also have a Spearbit retainer that can be used for internal audits. EVM Safety doesn’t do internal audits.
Instead of an internal audit, you can also upgrade to a solo external audit or external audit by an up-and-coming team. It is also possible to replace an internal with an external auditor that advises during design and implementation.
For high complexity features, contests are also an option.
| | Low Risk | Medium Risk | High Risk |
|---|---|---|---|
| Low Complexity | Peer Review | Internal Audit | External Audit |
| Medium Complexity | Internal Audit | External Audit | Internal Audit + External Audit |
| High Complexity | External Audit | Internal Audit + External Audit | External Audit x2 |
It is in the interest of the Tech Lead to accurately estimate the complexity and risk of the feature, with all the help provided. A major update to the FMAs in later stages of the SDLC or a High vulnerability found in the last audit will impact delivery times more than preparing for an audit or two from early stages.
Table of Failure Modes Analyses
Audit Process
Context
When teams need an audit, there should be a clear process with owners for all required steps: defining requirements and invariants, getting internal approvals, working with program management, talking to auditors, determining how many audits to get, what kinds of audits, negotiating audit prices, scheduling the audit, determining if a fix review is needed, and what to do with the results of an audit.
This document describes the use of software audits at OP Labs. It includes:
- An itemized step-by-step guide.
- Choosing a provider and preparing for the audit.
- Executing the audit.
- Reacting to the results of the audit.
- Updating this process according to results.
The resulting process integrates with the SDLC and enlists PgM and EVM Safety to help the Tech Lead execute the steps that are common to all audits, so that effort and uncertainty are minimised.
For further context on this process you can read this companion document and the references.
Summary
1. The need for audits is determined during the Risk Modeling in the Design Phase of the SDLC.
2. During the Design Review, start a Security Readiness Review document, which will be continuously updated.
3. Once the design doc is merged with initial risk modeling completed, you can start audit procurement. The suggested path is to use our trusted partner Spearbit by sharing your Security Readiness Review document in the #oplabs-spearbit-external channel and tagging @Sharon Ideguchi & @Marc Nicholson in your request. If you would like to use another provider, or do not have access to the Slack channel, please contact EVM Security.
4. This will result in a SOW from Spearbit, which PMO will handle approval of via the Zip system. Note: this can take several weeks for new vendors.
5. As implementation and testing approach the release date, decide on an audit start date and a final commit to tell the auditors. Make sure the Security Readiness Review document is complete before starting the audit.
6. Make all required fixes and have them reviewed.
7. Publish the deliverables.
8. If any audit findings are high severity and this is the last scheduled audit:
   1. Perform a retro.
   2. Perform another audit; go back to step 2.
Audit Procurement
The audit requirements are established during the project FMAs in the Design Review phase of the SDLC and captured in the Security Readiness Document. Both the audit procurement and the feature implementation can start in parallel once the design is reviewed.
The Security Readiness Document is one of the deliverables from the design review and the primary artifact needed by PMO to schedule an audit. This document will be updated as necessary during the delivery lifecycle. It contains:
- A summary of the project (or a link to a suitable summary if it already exists).
- All relevant links to the project documentation, including specs and FMAs.
- The scope for the audit.
We use Spearbit as our preferred auditing services provider and have established a retainer with them to streamline approval. However, the feature team can choose a different provider from this list, from past engagements, or from any other source if they have a strong reason to go outside of Spearbit. Program Management (PMO) is available in the #pmo slack channel for assistance with anything related to engaging auditor services.
We will agree with auditors on a high-level schedule to confirm availability and ensure they are kept up to date on the implementation timeline and process, choosing an exact audit date close to the release date. Auditors not wishing to agree to this process should not be selected.
Auditors must agree to review the fixes to the vulnerabilities reported. Auditors not wishing to agree to this step should not be selected.
Once the Security Readiness Document and auditor preference have been submitted, a SOW will be obtained from the vendor for approval on Zip by:
- Choosing "Request a Purchase/Vendor Onboarding/Purchase Renewal".
- Under "What are you looking to purchase?" select "Other".
- If the auditors have not been engaged in the past they will need to supply legal agreements, which will be also included in the Zip request.
The audit can only be executed once the Zip request is approved.
Audit Execution
A devnet deployment is a requirement for the audit execution. As the date for the alphanet deployment is known with certainty, a date for the audit can be agreed so that the audit can be executed in parallel with the alphanet and betanet deployments and acceptance testing, and concluded before the testnet deployment.
We prefer to communicate with auditors over Slack during the audit. Questions from auditors should be answered promptly and carefully. These questions reveal gaps in the specifications or the scope, which should be amended accordingly.
Each vulnerability disclosed will be considered separately, fixed on an individual commit, and reviewed again by the auditors on the repo.
For each audit finding that we will fix as part of a later feature, create an issue in the monorepo: the issue title should be the finding title, the description should link to the audit report, and the TBD label should be applied.
After Each Audit
Once all the fixes are applied and reviewed, the project lead should upload the final audit report to our repo.
If a valid high severity vulnerability was found, and this is the last expected audit for the project, **a post-mortem must be conducted and another audit of the same type must be scheduled**. These new audits follow the same process as any other audit.
Emergency Process
The audit process is tied to the SDLC process. A fast-track audit process would only be needed if we find out that we need audits later in the SDLC process, most likely as a result of updates to the risk modelling or excessive vulnerabilities in the last scheduled audit. The process described above is still applicable in these cases.
If the audit process is started in later stages of the SDLC, the documentation will be ready and can be put together as the Security Readiness Document by including a summary of the project, if that didn’t exist yet.
We already know that we need an audit, and we can safely assume that an external audit by Spearbit will fulfil the requirements.
The audit request still needs to be approved via the Zip process above. If time doesn't allow for this, speak with your manager & PMO about your options to fast-track an audit as an exception.
Updating This Process
This process will be reviewed if SEV0 or SEV1 incidents are revealed during production, reported through a bug bounty, or caught in the last audit before production. The post-mortem might recommend updating this process.
Conversely, this process can also be reviewed with the goal of relaxing its requirements if no SEV1 or SEV0 bugs or incidents have happened in production, the bug bounty, or any last audit for at least six months.
References
- Additional context on creating this process
- Calibration of this process against past audits
- Repository with all audit reports
- Our current framework for audits - https://gov.optimism.io/t/op-labs-audit-framework-when-to-get-external-security-review-and-how-to-prepare-for-it/6864
- An attempt to put an audit process in place - https://github.com/ethereum-optimism/wip-private-pm/blob/main/.github/ISSUE_TEMPLATE/audit.md
- EVM Safety docs on managing audits - Security Audits, Audit FAQs, How to Select an Audit Firm
- Audit Requirements for Fault Proof Contracts
- Audits and shipping secure code from @Paul Dowman, summarizing the Proofs team's informal audit framework and adding some ideas.
Audit Post-Mortem
It is not realistic to ask anyone either to build code completely free of bugs, or to catch all bugs in code that has already been written. However, we can demand that no bugs above a certain severity are found after applying a number of security measures. In particular, we want to ensure that SEV1+ bugs are never found in the last pre-production layer or during production.
This is a process to apply when this expectation is not met. It is based on reasonable expectations from all involved, with no one expected to have extraordinary capabilities.
A piece of code is made progressively bug-free by applying layers of security. Unit testing, end-to-end testing, invariant testing, formal verification, peer reviews, internal audits, external audits and bug bounties are all layers of security.
If a SEV1+ bug is found too close to production, it can only be for two reasons:
- At least one security layer underperformed, probably more than one.
- Not enough security layers were applied.
By comparing the bug found against the security layers applied, it should be possible to see whether any of them underperformed, by assessing the kinds of bugs each should reasonably have caught.
- Did the bug pass through some code that should have been covered by unit testing?
- Maybe the bug depended on the interaction between several components, is this a known scenario that is not covered by end-to-end testing?
- If we do invariant testing, did we simply not test the invariant that would have revealed the bug?
- Is the bug known to the security researcher community at large, but the audits missed it?
If a security layer is found to have underperformed, then the solution should be to strengthen it.
However, maybe our existing layers performed reasonably well, but we just didn’t apply enough of them. Maybe the bug was of the kind that would have been caught in an audit, only that we didn’t do one. Maybe the codebase was too complex for all issues to surface in a single audit or contest.
In that case, it might be that we misclassified the risk or complexity of the code. The process should be strengthened so that risk and complexity are correctly identified.
Finally, it might just be that risk and complexity were correctly identified, all security layers performed reasonably well, and we still got a bug. That still means we need more layers: the only remaining conclusion is that the table telling you how many audits you need is not demanding enough. In that case we shift the requirements to the left, so that the same risk and complexity get more security layers than before.
Audit Request Template
Use this template to communicate estimates and get approval for an audit. Please fill out the relevant sections and get approval from the folks listed below. Once you have received approval, you can engage with audit firms on details of the audit and request quotes. A zip request can then be filed to get spend approval.
Overview
Link to Security Readiness Document
Timeline and key stakeholders:

When?
- Audit Dates:
- Planned Release:

Who?
- Auditors:
- OP Labs Facilitators:

Costs
- Anticipated number of weeks:
- Expected cost: ???

Approved
- Not started: Karl Floersch
Action item:
- Create a formal zip request once this document has been reviewed and approved.
References
This template supersedes the Audit Request template.
Security Readiness Template
Release Process
Protocol upgrades run on a regular schedule. This helps resolve some of the challenges we've faced in the past:
- No more waiting 3-4 months between hard forks
- Teams don't need to rush features to "catch" an upgrade
- Everyone knows when the next release is coming
- We have the opportunity to find integration bugs earlier
- Missing a train isn't a big deal - there's always another one coming
The high-level view of our release process is as follows:
- Features are developed according to a stable-trunk development model.
- Features are deployed to an Alphanet for initial acceptance testing.
- If the feature works on Alphanet, it gets deployed to a Betanet for additional testing and upgrade process validation.
- If the feature works on Betanet, it gets deployed to the Sepolia Testnet for governance review.
- If governance passes, the feature is deployed to mainnet.
You will need to budget roughly 6 weeks from the time your feature is code-complete to the time it is deployed on mainnet, exclusive of audit time. Working backwards from mainnet deployment, the rough timeline is as follows:
| Time | Activity |
|---|---|
| T | Mainnet Activation |
| T-1 week | Governance veto starts |
| T-2 weeks | Cut mainnet release, distribute to node operators |
| T-3 weeks | Governance vote starts |
| T-4 weeks | Governance review starts |
| T-4 weeks | Betanet deployment and acceptance testing |
| T-5 weeks | Alphanet deployment and acceptance testing |
| T-6 weeks | Feature is code-complete |
Check out the release calendar for more information on the schedule.
Alphanets
The Alphanet is the initial integration environment for protocol upgrades. Its primary purpose is to validate that new features work correctly on a deployed network running real infrastructure before moving on to broader integration and upgrade testing.
The Alphanet can contain any combination of L1 and L2 upgrades. It is entirely acceptable to have an Alphanet with only L1 upgrades, and vice versa. By decoupling these two types of upgrades, we can increase our throughput and deployment flexibility.
The scope of each Alphanet is finalized during the weekly Protocol Upgrades Call on Tuesdays. To put in a request for an Alphanet once the scope has been finalized, create a new issue on the devnets repo.
Betanets
The Betanet validates a complete upgrade that will be deployed to production networks. Unlike the Alphanet, the Betanet performs the actual upgrade process and confirms that all features work together as intended.
Betanets are deployed every three weeks, and contain the features that passed the Alphanet. If there are no passing features, the Betanet will be cancelled.
The scope of each Betanet is finalized during the weekly Protocol Upgrades Call on Tuesdays. To put in a request for a Betanet once the scope has been finalized, create a new issue on the devnets repo.
Acceptance Testing
Promoting a feature from Alphanet to Betanet and beyond is contingent upon the feature passing automated acceptance tests. See the Acceptance Testing document for more information.
Testnet
The Sepolia Testnet is the first public deployment of protocol upgrades. This allows ecosystem partners to test the upgrades in a stable environment and runs in parallel with the governance process. Unlike Alphanets and Betanets, the Testnet directly impacts external users and applications and is considered "production."
To provide sufficient time for infrastructure providers to upgrade their systems, Testnet releases must be cut at least 1 week in advance of any hardfork activation.
All features must go through an Alphanet and a Betanet before being deployed on Testnet. This means that you should target having your features deployed to the Alphanet and Betanet right before the gov cycle at the very latest. For example:
| Governance Cycle | Latest Alphanet | Latest Betanet |
|---|---|---|
| Cycle 34 (Feb 27 - Mar 13) | Badger (Feb 17) | Balrog (Feb 24) |
| Cycle 35 (Mar 20 - Apr 2) | Cheetah (Mar 10) | Cupid (Mar 17) |
See the release calendar for the most up-to-date information on the release schedule.
Release Calendar
The calendar below shows our planned governance cycles, Alphanets, and Betanets. Each event links out to the GitHub issue describing it in more detail.
Acceptance Testing
Acceptance testing ensures OP Stack networks are feature-complete, reliable, and contain features which are ready for promotion.
The Platforms team will compile a Release Readiness Process (RRP) document, which will outline how to acceptance test devnets. This will include a list of tests to run: the Release Readiness Checklist (RRC). These tests will initially be run manually, but we'll automate them over time.
By automating validation and enforcing quality gates, we reduce risk and increase confidence in releases. Much of this is facilitated by a new tool, op-acceptor, which can run standard Go tests against OP Stack networks and track that network's readiness for promotion. Acceptance testing is a prerequisite for networks to promote from Alphanet, to Betanet, to Testnet.
This is a shared responsibility between the Platforms and the feature teams:
| What Is It | Who Does It |
|---|---|
| Maintains acceptance testing tooling | Platforms Team |
| Writes acceptance tests for network liveness | Platforms Team |
| Runs acceptance tests | Platforms Team |
| Writes acceptance tests for specific features | Feature Team |
| Performs upgrades | Feature Team |
The Platforms team is responsible for running acceptance tests against each network. To coordinate your feature's acceptance testing, contact Stefano (stefano), Platforms Protocol DevX Pod (@Protocol DevX Pod) or Platforms Team (@Platforms Team) on Discord.
Tooling
The acceptance tests themselves are written in Go and are run by op-acceptor within the op-acceptance-tests directory of the optimism monorepo. op-acceptor provides a high-level framework for registering, running and viewing the results of acceptance tests.

Tests
To add new acceptance tests, see the README for instructions.
Release Readiness Process (RRP)
Overview
This document defines the process and expectations for devnet releases in the OP Stack. It establishes a consistent framework for determining when a devnet is ready for release and how pass/fail determinations are made. By following these procedures, we can ensure that devnets meet quality standards before release.
While the Platforms team serves as the primary custodian of this release readiness process, its success relies on collaborative ownership between Platforms and Protocol as well as contributions from across the organization.
Roles and Responsibilities
| Role | Responsibilities |
|---|---|
| Platforms Team | Maintain the Release Readiness Process; run acceptance tests; make final pass/fail determinations |
| Feature Teams | Write and run feature-specific tests; fix identified issues in their features |
Objectives
The primary objectives of the Devnet Release Readiness process are:
- Release production networks without critical bugs
- Ensure feature coverage through comprehensive testing
- Establish a clear process for devnet promotion decisions
Release Readiness Process
Prerequisites
Before a devnet can be considered for release, the following prerequisites must be met:
- All new features must have acceptance testing coverage in op-acceptance-tests
- The acceptance tests, as defined by the Release Readiness Checklist, should be passing on a local kurtosis-based devnet
- The risk modelling for the in-scope features should have been started
Readiness Phases
The devnets are expected to be live for short periods of time. For example, alphanets will be decommissioned after three weeks.
1. Deployment
- A devnet is deployed according to the standard process
- Basic infrastructure checks ensure the network is operational (manually for now; to be automated)
2. Acceptance Testing
We work through the Release Readiness Checklist, which includes:
- Automated acceptance tests (using op-acceptor)
- Manual acceptance tests
- Feature teams run specific feature tests
- Platform runs security and load tests
- Exploratory testing is run by all teams (probing of the system looking for things that we previously missed)
3. Results Analysis
- Each of the test results are categorized, in line with our internal incident severity matrix, by their potential impact had they been on mainnet:
- Catastrophic (SEV 0): Critical to catastrophic issue that would warrant public notification, leadership awareness (and potential involvement), and potential consultation with legal. A large number of users are impacted by complete or severe loss of functionality, and SLAs have been broken
- Critical (SEV 1): Critical issue that would warrant public notification. A large number of users are impacted by severe loss of functionality, and SLAs may have been broken
- Major (SEV 2): A functionality issue that would actively impact many users' ability to transact, or a critical issue impacting a subset of users
- Minor (SEV 3): Stability or minor customer-impacting issues that would require immediate attention from service owners
4. Release Determination
- The Platforms team makes the final pass/fail determination
- A devnet must have ZERO catastrophic or critical issues (SEV 0 or 1) to be considered for promotion
- Major issues must have mitigation plans before promotion
- Minor issues are documented but don't block promotion
5. Release
When ready, the devnet is made live and public.
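The pass/fail rules in step 4 can be sketched as a small shell helper. This is a hypothetical illustration only (`promotion_decision` is not a real tool); the actual determination is made by the Platforms team:

```sh
# Hypothetical sketch of the step-4 promotion rules.
# Arguments: counts of open SEV 0, SEV 1, and SEV 2 issues.
promotion_decision() {
  sev0=$1; sev1=$2; sev2=$3
  if [ "$sev0" -gt 0 ] || [ "$sev1" -gt 0 ]; then
    echo "blocked"            # zero catastrophic/critical issues required
  elif [ "$sev2" -gt 0 ]; then
    echo "needs-mitigation"   # major issues need mitigation plans first
  else
    echo "promote"            # minor issues are documented, not blocking
  fi
}

promotion_decision 0 0 2   # prints "needs-mitigation"
```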
Integration with Existing Release Process
The Devnet Release Readiness process integrates with the existing Release Process as follows:
- Alphanet: Before promotion to Betanet, it must pass the Release Readiness process
- Betanet: Before promotion to Testnet, it must pass the Release Readiness process with stricter criteria
- Testnet: All features must have successfully passed through Alphanet and Betanet before deployment
Enforcement
Devnets shall not be released or promoted without following the release process described in this document. The Platforms team serves as the custodians of this document and guardians of the releases, with authority to block promotion of devnets that do not meet the release readiness criteria.
Tools and Resources
- op-acceptor - The acceptance testing framework
- op-acceptance-tests - Repository of acceptance tests
- devnets - The Optimism devnet environment
- Acceptance Testing - Additional context on acceptance testing process
Future Considerations and Improvements
Here are some ideas for future iterations of this process:
- After each release, a retrospective to identify process improvements
- A Release Coordinator role to coordinate the overall release process, track progress, facilitate communication, and document decisions
- A per-devnet Release Readiness Checklist (RRC) to define specific requirements for each devnet
- Public usage phase to collect feedback from the general public
- Injection testing to see how we can break the network and test incident response runbooks
- Communication through dashboards and weekly calls
- Detailed Release Decision Documentation including summary of test results, list of issues, mitigation plans, and recommendations
- Test Results Reporting through the op-acceptor dashboard and Release Readiness Reports
- Test results comms, including:
- The op-acceptor dashboard, showing test status and results
- A Release Readiness Report documenting all tests, issues, and recommendations
- Updates in the weekly Protocol Upgrades Call
Release Readiness Checklist (RRC)
This document provides a detailed checklist of requirements that devnets must meet to be considered ready for release. These are specific tests, metrics, and criteria that are evaluated as part of the Release Readiness process. The most up-to-date list can be found in the Optimism monorepo's op-acceptance-tests.
The criteria for the checks below apply to all devnets (alphanet, betanet, testnet, etc.) and should be considered a good minimum standard for acceptance.
Sanity Check
A new Kubernetes-based network typically requires about 30 minutes to fully start up and settle. After this, we sanity check the basic network health.
- Check the Superchain Health Dashboard
  - Setup:
    - Select infra_env=dev, infra_network= , security_network1=
  - Checks:
    - Overall Infra reports "Healthy"
    - Overall Chain Progression Health reports "Healthy"
    - Dispute Mon Security Health 1 reports "Healthy"
    - Faultproof Withdrawals Security Health 1 reports "Healthy" (if applicable)
    - OP-Challenger Health reports "Healthy" (if applicable)
- Check the SLA dashboard
  - Setup:
    - Select the correct network
    - SLO Evaluation Window = "10m", Period = "Last 30 minutes"
  - Checks:
    - Overall SLA should be >=99%
- Check the Bedrock Networks dashboard
  - Setup:
    - Select the correct network
    - Period = "Last 30 minutes"
  - Checks:
    - Chain heads are increasing as expected (unsafe, safe, L1 heads, etc.)
    - Divergence < 1 for all nodes
    - Peer counts are nominal
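Alongside the dashboard, the chain-head check can be spot-verified directly against the op-node RPC via its `optimism_syncStatus` method. A minimal sketch, assuming `jq` is installed; the endpoint URL is a placeholder you should swap for your devnet's op-node:

```sh
# Extract the unsafe/safe L2 head numbers from an optimism_syncStatus
# JSON-RPC response read on stdin (requires jq).
extract_heads() {
  jq -r '.result | "\(.unsafe_l2.number) \(.safe_l2.number)"'
}

# Against a live devnet (OP_NODE_RPC is a placeholder URL):
#   curl -s -X POST "$OP_NODE_RPC" -H 'Content-Type: application/json' \
#     -d '{"jsonrpc":"2.0","id":1,"method":"optimism_syncStatus","params":[]}' \
#     | extract_heads
# Run it twice, a minute apart; both head numbers should have increased.

# Demo on a canned response:
echo '{"result":{"unsafe_l2":{"number":120},"safe_l2":{"number":100}}}' | extract_heads
```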
- Check the Batcher Dashboard
  - Setup:
    - Select the correct network
    - Period = "Last 30 minutes"
  - Checks:
    - Block height is strictly increasing
    - RPC errors are 0
    - No more than 1 pending transaction at any one time
- Check the Proposer Dashboard
  - Setup:
    - Select the correct network
    - Cluster = "oplabs-dev-infra-primary", Period = "Last 30 minutes"
  - Checks:
    - Proposed Block Numbers are increasing
    - Publishing error count is zero (no data)
    - Balance (ETH) is non-zero
- Check the Challenger Dashboard
  - Setup:
    - Select the correct network
    - Period = "Last 30 minutes"
  - Checks:
    - Should see games in progress
    - Challenger Error Logs should be empty (no data)
- Check the Dispute Mon Dashboard
  - Setup:
    - Select the correct network
    - Period = "Last 30 minutes"
  - Checks:
    - No incorrect forecasts / incorrect results or alerts
      - Note: it can take a while for these to show up
    - Error Logs should be empty (no data)
- Check the Conductor Mon Dashboard
  - Setup:
    - Select the correct network
    - Period = "Last 30 minutes"
  - Checks:
    - Leader count should be 1
    - Errors should be 0
    - All nodes should be shown, with all conductors unpaused and healthy
- Alerts for the devnet
  - In Slack, check our #notify-devnets channel
    - All P1 alerts have either been addressed or have a remediation plan
Feature Verification
Note: For testing a flashblocks-enabled network, refer to the [Flashblocks RRC](https://www.notion.so/oplabs/Flashblocks-Release-Readiness-Checklist-1faf153ee16280ac80d8cda0162f2392).
Automated Testing
Run automated acceptance tests using op-acceptor.
- Use the appropriate feature gate for the target network. This should not be `base` (although the gate will include it); it should be one of the latest forks, such as `interop` or `flashblocks`, matching what the network is deployed as and is testing.
The command will look something like:

```sh
# Navigate to the acceptance tests in the optimism monorepo
cd optimism/op-acceptance-tests

# Point DEVNET_ENV_URL at the absolute path of
# your target network's devnet-env.json
DEVSTACK_ORCHESTRATOR=sysext \
DEVNET_ENV_URL=/path/to/the/network/devnet-env.json \
$(mise which op-acceptor) \
  --testdir ../optimism \
  --gate interop \
  --validators ../acceptance-tests.yaml \
  --log.level INFO
```
Manual Testing
Manually run any non-automated feature tests. (Note: This is a temporary step until we automate all of our current tests. Going forward we aim to have no manual feature tests.)
Load Testing
Run automated load tests using op-acceptor.
- Use the `load-testing` gate for the network
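The invocation mirrors the acceptance-test command above with the gate swapped; a sketch, with placeholder paths:

```sh
# From the optimism monorepo, run the load-testing gate
cd optimism/op-acceptance-tests

DEVSTACK_ORCHESTRATOR=sysext \
DEVNET_ENV_URL=/path/to/the/network/devnet-env.json \
$(mise which op-acceptor) \
  --testdir ../optimism \
  --gate load-testing \
  --validators ../acceptance-tests.yaml \
  --log.level INFO
```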