We are in the midst of a major architectural change at SRS. As new SRSWP services are created and products begin using our SOA/SaaS platform, it is vital to understand that every service must provide a stable architecture.
SRSWP services do not grow slowly or linearly like many new products. These are foundational elements which will be integrated into other applications. They will grow by entire user-bases at a time! For example:
- Until recently one basic service was handling about 9 requests/second at the top end.
- Over the course of a week, two more applications started using it, and overnight the service jumped to 15 requests/second, then to 47 requests/second!
- One app that will begin using the service in the next few months will bring a load of around 29 additional requests/second.
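To put those numbers in perspective, here is a quick back-of-the-envelope sketch. It assumes the new app's 29 requests/second simply adds to the current peak (that is, the peaks coincide), which is a simplification; the variable names are illustrative only.

```python
# Peak request rates quoted above, in requests/second.
baseline_rps = 9        # peak before the new consumers arrived
current_peak_rps = 47   # peak after two more applications came online
upcoming_app_rps = 29   # expected additional load from the next consumer


def growth_factor(before: float, after: float) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before


# Assumption: the new load lands on top of the current peak.
projected_peak_rps = current_peak_rps + upcoming_app_rps

print(f"current peak:   {current_peak_rps} req/s "
      f"({growth_factor(baseline_rps, current_peak_rps):.1f}x baseline)")
print(f"projected peak: {projected_peak_rps} req/s "
      f"({growth_factor(baseline_rps, projected_peak_rps):.1f}x baseline)")
```

Under that assumption the service goes from 9 to roughly 76 requests/second, more than eight times its original peak, from just three new consumers.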
SRSWP services are not expanding into new markets. We are designing services which will impact all SRS products and customers. At this level, even small mistakes result in major problems. So, before any release there are four major areas that must be considered:
- Security: Mission-critical systems must define and enforce security policies.
  - Have we considered how to secure the service itself?
  - Have we considered how to protect the data this service touches?
  - Have we considered how to provide security at every level (application, software, operational)?
- Availability: Mission-critical systems cannot have downtime.
  - How will this service provide 100% availability?
  - How will we perform scheduled maintenance?
  - How will we handle operational failures (e.g. rogue web head)?
  - How are we backing up data and services?
  - How will we regularly test our ability to restore and recover from backups?
- Stability: Mission-critical systems cannot crash; they cannot change contracts.
  - Is the service 100% stable?
  - How do we prove stability before release and on an ongoing basis?
- Scalability: Mission-critical systems must seamlessly scale to support load.
  - How does this service scale to support a full load of users for the next year?
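One way to make the availability questions concrete is to translate a target into a yearly downtime budget. The sketch below is only illustrative arithmetic: the list of targets is my own choice, not an agreed SRS service level, and it ignores leap years.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_budget_hours(availability_percent: float) -> float:
    """Hours per year a service may be down and still meet the target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Illustrative targets only; these are not official SRS service levels.
for target in (99.0, 99.9, 99.99, 100.0):
    print(f"{target:>6}% -> {downtime_budget_hours(target):.2f} hours/year")
```

Even a 99.9% target leaves under nine hours of downtime per year, and a 100% target leaves none at all, which is why scheduled maintenance and failure handling have to be designed to happen without taking the service offline.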
Any problem with an SRSWP service harms more than just that service and the team who owns it. Problems are magnified as they cascade to every consumer and their users. Problems are reflected across SRS's product lines. This must not happen!
If any service you work on cannot confidently answer all of the questions above, this must become a top priority. Work with your team leads and VPs and make sure this doesn't fall through the cracks.
At Dreamforce '11, a presentation I particularly enjoyed was given by Ryan Smith from Heroku. In "Designing for the Cloud: The 12 Factor App", Ryan discussed some fundamental design patterns and practices that have made Heroku successful.
One interesting analogy from the talk compares applications to Swiss Army knives. The analogy is relevant to SRS and provides a great visual depiction of the work we are doing as part of our SaaS and SRSWP initiatives.
Historically, our applications were designed as large, monolithic beasts. Like an overstuffed Swiss Army knife, every feature that could be imagined was rolled into one of our flagship products. This design meant:
- Duplicated effort because there were no shared components between product lines.
- Intense effort required to join the team due to the large, interconnected designs.
- Even small changes were risky and had the potential of destabilizing an entire product.
- Management of each product line required enormous effort to tightly coordinate development and release of new features.
Contrast that complexity with a design where:
- Components are small, independent apps that work together (like Linux tools).
- Each component delivers specific functionality.
- Touch points between components are well-defined contracts.
The workflow enabled by this component-based architecture is truly liberating.
- Small teams (perhaps even a single-person team) can build on top of the shared platform to quickly create new products.
- Most products do not need to worry about operational infrastructure, databases, etc.
- Products can take advantage of shared services to quickly enable powerful features in innovative ways.
- Products can tap into shared repositories of both customer-generated and catalog data.
- Existing products are simpler to maintain and introducing change is far less risky.
- A smaller codebase means that the project is much easier to grok.
- Well-defined contracts that have robust automated tests written against them mean that each component can be released independently with confidence.
- Teams can work more efficiently by choosing technologies and frameworks that are tailored to fit specific needs.
- Using standards-compliant web services for an API means that apps written in Java, Ruby on Rails or Node.js can access shared services as easily as a legacy .NET application.
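To illustrate what "automated tests written against a contract" can look like, here is a minimal sketch. Everything in it is hypothetical: the customer contract, its field names, and the `fake_get_customer` stand-in are invented for illustration and do not describe any actual SRSWP service.

```python
from typing import Any

# Hypothetical contract for an illustrative "get customer" response:
# field name -> required Python type. These names are assumptions.
CUSTOMER_CONTRACT: dict[str, type] = {
    "id": str,
    "name": str,
    "active": bool,
}


def contract_violations(payload: dict[str, Any],
                        contract: dict[str, type]) -> list[str]:
    """Return readable violations; an empty list means the payload conforms."""
    problems = []
    for field, expected in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            actual = type(payload[field]).__name__
            problems.append(
                f"wrong type for {field}: expected {expected.__name__}, got {actual}"
            )
    return problems


def fake_get_customer(customer_id: str) -> dict[str, Any]:
    """Stand-in for a real service call (hypothetical response shape)."""
    return {"id": customer_id, "name": "Example Co", "active": True, "region": "EU"}


# Extra fields such as "region" pass: additive changes do not break consumers.
assert contract_violations(fake_get_customer("c-1"), CUSTOMER_CONTRACT) == []

# A breaking change (a renamed field) is caught before release.
broken = {"id": "c-1", "full_name": "Example Co", "active": True}
print(contract_violations(broken, CUSTOMER_CONTRACT))  # ['missing field: name']
```

Run against every build, a check like this lets a team add fields freely while failing fast on any change that would break an existing consumer, which is what makes independent releases safe.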
Following a component-based approach will make the creation of new apps a trivial exercise. It will free us to focus on solving interesting problems rather than being bogged down by operational overhead. The quality of our offering will increase as we become much more responsive to customers.
Applying these principles means something different for each of our existing projects and teams. What remains to be done for your team to fully benefit from this component-based design? What new functionality would you like to see exposed by the SRSWP?