Introduction:

IT departments have to deliver new features ever faster, even in the face of more demanding requirements for availability (24/7) and security. DevOps promises to do exactly that by fostering a high degree of collaboration across the full IT value chain (from business through development and operations to IT infrastructure). But there’s a problem. While many software development teams have taken steps toward DevOps practices, most enterprise IT infrastructure organizations still work much as they did in the first decade of this century: they use a “plan-build-run” operating model organized by siloed infrastructure components, such as network, storage, and computing. Disaster recovery (DR) exercises are even more challenging in large companies with these types of legacy systems and processes in place. Performing IT operations tasks manually often leaves the DR site out of sync with the applications deployed in production.

This whitepaper explores the strengths of Cloud Control’s AppZ platform in extending the DevOps model so that application development, deployment, run-time operations, and routine IT infrastructure tasks are performed as an automated unit of work across the primary and DR sites, keeping them always in sync.

The Problem:

Planning and executing Disaster Recovery (DR) is a big challenge for companies. Companies establish a Business Continuity Planning (BCP) department with a charter to set up a comprehensive DR strategy across multiple business units and applications. The BCP team, often in collaboration with business units and application/infrastructure teams, puts together DR test plans of 50+ steps for each production application. Because these DR exercises are manual, a successful one takes hours to conduct. With businesses and applications running 24/7, application and infrastructure teams also struggle to find a time window large enough to conduct the DR exercise.

Even after conducting a complete DR exercise, results often reveal that:

  1. The DR site is not in sync with the primary site because infrastructure or application configurations do not match.
  2. DR steps have become outdated due to ongoing changes within applications.

This often results in DR exercises being rescheduled. Most problematic, these failures erode faith in the viability of the DR site from an application resiliency perspective. Thus, when an actual disaster strikes, application teams try to rectify the issue at the primary site itself, because they are not sure whether the DR site is up to date. The current way infrastructure teams handle application deployment, with a manual control system, makes this complicated, as shown below.

Disaster Recovery in the Cloud

The Solution:

Cloud Control’s AppZ turns DR into a simple, repeatable process that runs on a continuous basis, making business applications more resilient in on-premises or public cloud deployments. AppZ’s DR Drift is a metadata-driven alerting module for client implementations: it tracks all application and infrastructure changes and raises an alert when the primary and DR sites fall out of sync.
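To make the idea of metadata-driven drift alerting concrete, here is a minimal sketch in Python. It is not AppZ code and does not reflect how DR Drift is implemented; the function names, the `MISSING` marker and the sample metadata keys are all assumptions made for illustration. It simply compares a metadata snapshot of the primary site with one of the DR site and reports every key that differs.

```python
from typing import Any, Dict, List, Tuple

MISSING = "<missing>"  # hypothetical marker for a key present on only one site


def find_drift(primary: Dict[str, Any], dr: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (primary_value, dr_value)} for every key that differs."""
    drift = {}
    for key in sorted(set(primary) | set(dr)):
        primary_value = primary.get(key, MISSING)
        dr_value = dr.get(key, MISSING)
        if primary_value != dr_value:
            drift[key] = (primary_value, dr_value)
    return drift


def drift_alerts(primary: Dict[str, Any], dr: Dict[str, Any]) -> List[str]:
    """Format one alert line per drifted key."""
    return [
        f"DRIFT {key}: primary={p!r} dr={d!r}"
        for key, (p, d) in find_drift(primary, dr).items()
    ]


# Illustrative metadata only; the keys and values are made up.
primary_site = {"app_version": "2.4.1", "db_schema": 17, "web_replicas": 3}
dr_site = {"app_version": "2.4.0", "db_schema": 17}

for line in drift_alerts(primary_site, dr_site):
    print(line)
```

In this sketch an empty result means the two snapshots match. A real drift module would also need to decide which keys are allowed to differ between sites (hostnames, for example), which is left out here.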

AppZ can automate both legacy and cloud native application deployment across multiple cloud and IaaS providers for 

AppZ provides clients with out-of-the-box capabilities for increased resiliency and automated DR by simplifying and automating the change control process. Deployment automation is the foundation of a resilient IT infrastructure. AppZ uses a proven method, based on experience from 1400+ successful cloud migrations in regulated environments, to provide the following benefits:

  • Deployment Automation: Application and Infrastructure
  • Disaster Recovery Automation with replication-status-aware probes
  • Application and Infrastructure Drift Management

With the above features in place, clients can confidently rely on their DR sites. Additionally, they can make better use of IT infrastructure by enabling application tiers to run in Active/Active mode.

AppZ enables a true GitOps model, where application and infrastructure changes are traceable to source control systems (e.g. GitHub) with user details, reasons for the changes and timestamps. The AppZ platform enforces separation of duties between application developers, DevOps engineers and infrastructure admins.

With this approach, the build and certification of infrastructure components can be separated from the application development process. This removes many kinds of risk, such as unverified open source vulnerabilities and malware, from the overall application deployment process. It also keeps things simple for the developers and infrastructure teams charged with ensuring resiliency, as summarized below.


In summary, AppZ provides the following capabilities for clients:

  • Web and engine tiers deployed in Active/Active mode with rolling upgrades for 24/7 applications
  • Databases running Active/Passive (read only) with async replication. Database changes are deployed with full automation from the application's point of view
  • Integration with change management
  • Integration with incident management
  • Integration with performance metrics collection, log collection and log analysis
  • Automated DR and Drift watch
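As an illustration of what a replication-status-aware check for an Active/Passive database could look like, the following Python sketch reports whether a passive replica is fit to take over. It is an assumption-laden example, not part of AppZ: the `ReplicaStatus` fields, the `dr_readiness` function and the lag threshold are hypothetical, and a real probe would read these values from the database's own replication status.

```python
from dataclasses import dataclass


@dataclass
class ReplicaStatus:
    """Snapshot of a passive database replica (all fields are hypothetical)."""
    replication_running: bool
    lag_seconds: float


def dr_readiness(status: ReplicaStatus, max_lag_seconds: float) -> str:
    """Classify the replica as READY or NOT_READY for a DR switch-over."""
    if not status.replication_running:
        return "NOT_READY: replication stopped"
    if status.lag_seconds > max_lag_seconds:
        return f"NOT_READY: replica is {status.lag_seconds:.0f}s behind"
    return "READY"


# Example: a replica that is replicating but lagging beyond the threshold.
print(dr_readiness(ReplicaStatus(replication_running=True, lag_seconds=120.0), 30.0))
```

Because replication is asynchronous, some lag is normal. The threshold is a policy choice that would typically be derived from the recovery point objective of the application.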

With its application-centric focus, the AppZ Control Plane offers a single-pane-of-glass view of deployed applications across the entire data center(s), whether running as a private, hybrid or multi-cloud platform.


AppZ offers a single dashboard with distinct views for developers, managers, infrastructure admins, risk/audit personnel and other stakeholders. Some sample views are listed below:

  • Applications under a Business Unit
  • Environments (DEV/UAT/PROD) where each application is running 
  • Infrastructure details for all tiers (Web/Engine/DB) of the applications 
  • Configuration parameters for the applications & infrastructure
  • Deployment history with artifacts 
    • List of changes
    • Link to SCM (Source Code Management) artifacts
    • Approval history, showing who tested and approved changes for the move to production
    • Static Scanning of source code for any known vulnerabilities
    • Performance metrics for running application infrastructure 
  • DR readiness of Applications including backlog and drift
  • Application level Cost Matrix

Detailed Use Case: Health Care Provider Client Portal

This client sought to provide a high-availability website to meet peak member enrollment demand and to adopt a cloud native CI/CD pipeline with a full audit trail and logging capability for enhanced control and audit requirements. AppZ from Cloud Control Solutions provided the basis for the new website and for migrating the legacy applications running on VMs to a Kubernetes cluster. AppZ enabled this transition using the client's existing internal expertise.

AppZ deployment highlights to make the Client Portal resilient, secure, scalable and DR ready:

  1. A Kubernetes cluster was built with a master node and 4 worker nodes. The worker nodes were split between DMZ and private VLANs (2 nodes each). The WordPress front-end application was deployed in the DMZ and the MariaDB database in the private VLAN
  2. The Kubernetes cluster was logically split into DEV, UAT and PROD regions, with 50% of the resource quota allocated to the PROD region
  3. Standard AppZ Stacks (re-usable, curated and scanned images) with client customizations were used to deploy the WordPress and MariaDB applications within a one-week time frame
  4. The AppZ Control Plane was used to demonstrate the true GitOps model to the client. The business unit has since been deploying ongoing enhancements with the full automation and control provided by the Control Plane, without any infrastructure admin support
  5. The Client Portal was up and running in the new production environment within 8 weeks. Since then, the client has successfully performed multiple zero-downtime upgrades of the Client Portal application and infrastructure using the AppZ framework
  6. Automated backup of configuration and data was set up so that the entire Kubernetes cluster can be brought up at the DR site within the required RPO (Recovery Point Objective) and RTO (Recovery Time Objective)
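The RPO and RTO requirement in the last step can be expressed as two simple checks. The Python sketch below is illustrative only: the function names and sample figures are assumptions, not values from this engagement. It treats the RPO as the maximum acceptable age of the most recent backup, and the RTO as the maximum acceptable time to bring the cluster up at the DR site.

```python
from datetime import datetime, timedelta


def meets_rpo(last_backup: datetime, now: datetime, rpo: timedelta) -> bool:
    """True if the data that would be lost (time since last backup) is within the RPO."""
    return now - last_backup <= rpo


def meets_rto(restore_duration: timedelta, rto: timedelta) -> bool:
    """True if the measured restore time at the DR site is within the RTO."""
    return restore_duration <= rto


def dr_objectives_met(last_backup: datetime, now: datetime, rpo: timedelta,
                      restore_duration: timedelta, rto: timedelta) -> bool:
    """Both objectives must hold for the DR site to be considered ready."""
    return meets_rpo(last_backup, now, rpo) and meets_rto(restore_duration, rto)


# Made-up figures: backup taken 20 minutes ago, restore rehearsal took 45 minutes.
now = datetime(2024, 1, 1, 12, 0)
ok = dr_objectives_met(
    last_backup=now - timedelta(minutes=20),
    now=now,
    rpo=timedelta(minutes=30),
    restore_duration=timedelta(minutes=45),
    rto=timedelta(hours=1),
)
print("DR objectives met:", ok)
```

Running such a check after every backup and every restore rehearsal is one way to keep the DR readiness status current rather than discovering a gap during an exercise.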

Other AppZ Case Studies

Data Science Software Analytics Firm.  

A Series A funded data science analytics firm building new cloud native products recognized that it needed an out-of-the-box CI/CD solution with the SOC 2 controls focus its client base would require, and selected CCS’ AppZ. With CCS’ AppZ, the firm immediately had a CI/CD pipeline and a control- and audit-ready solution that allowed its developers to focus on the product and deliver secure, immutable code with a documented build process that eliminates privileged access in hybrid cloud deployments (Private and AWS Cloud).

CCS’ AppZ allowed this client to offer capabilities and automation for a fraction of the time and cost of its competitors on Private and Azure Clouds.

Legacy Transformation. 

To improve controls and scalability and to reduce cost through automation, this large financial services client used CCS’ proven framework to migrate legacy VMware applications to new VMware and Docker deployments on AWS. The migration also reduced privileged access and provided regulatory audit support and integrated DevOps dashboards, with reporting capabilities for all software, network and database changes across the private and public cloud.

Software Defined Network Start-up. 

CCS’ AppZ was selected as the CI/CD pipeline and provides control and audit support for this Series B funded start-up, now in General Availability, which offers state-of-the-art software-configurable hardware and software networking on AWS and Google Cloud. AppZ supports developers who continue to use GitHub, and enforces control by allowing the audit of all product changes, coupled with the ability to monitor for drift with automatic rollback.

Financial Services SAAS Provider. 

This client sought to reduce privileged access and provide an audit trail of all deployments for legacy code, database and network configurations. Using the CCS AppZ framework, Terraform and Ansible scripts were provided to parameterize configuration files while also adding key and secrets management capabilities, without changes to the underlying legacy applications. The same application-as-code approach was used to provide a self-service capability for versioned stored procedure changes and deployments.

About Cloud Control Solutions:

The Cloud Control Solutions (CCS) AppZ Platform consists of proven software, a framework and a methodology to accelerate digital transformation and cloud migration. CCS AppZ has a business-application-centric control plane which is cloud agnostic and can make a private cloud, a public cloud or both immediately available for the business to start consuming with full automation, control and governance.

AppZ enables a true GitOps model, where application and infrastructure changes are traceable to source control systems (e.g. GitHub) with user details, reasons for the changes and timestamps. The AppZ platform enforces separation of duties between application developers, DevOps engineers and infrastructure admins. With this approach, OS/middleware build and certification are separated from the application development process. This removes many kinds of risk, such as unverified open source vulnerabilities and malware, from the overall application deployment process.

The AppZ Control Plane offers out-of-the-box views into deployed software, including privileged access, drift from policy, build and operational status, and resource consumption, from a single pane of glass. It can integrate with enterprise tools and provides governance and control for building, deploying and monitoring applications and infrastructure with least-privilege access, as well as automated alerts and resolution.

Equally important, the framework can be deployed easily for applications residing in a hybrid environment: legacy data centers, private clouds and public clouds. This proven approach is far from a lift and shift, as it can enhance legacy applications to use key and secret management and to take advantage of cloud infrastructure through containerization with Docker and Kubernetes for added security, scalability and resilience.

The Cloud Control team has a proven track record of migrating thousands of complex and intricate legacy application workloads to the Cloud while modernizing the IT infrastructure to provide full control and governance at major financial institutions, healthcare providers and start-ups.

About The Author


Sanjeev Kumar

Head of Product, Cloud Control

Sanjeev has more than 20 years of experience in design, development and architecture of FinTech solutions at BNY Mellon and State Street. He is passionate about automating and reducing the challenges of overall IT implementation. He is a firm believer in IT becoming a utility with major Cloud vendors like AWS, Azure and Google providing the backbone with simple/standardized interfaces for secure and faster application development at reduced cost and complexity.