Automating 10x Faster with Low-Code/No-Code DevOps

Introduction

In order to build, deploy and secure applications in the cloud, many different pieces need to be stitched together. Building a fully automated and compliant infrastructure, even for something as small as a 50-VM footprint in a regulated industry, is an arduous multi-month process that takes many skilled engineers. In this document, we study the typical approach that companies take towards DevOps using infrastructure as code and the challenges around it. We then describe how a Low-Code/No-Code approach to DevOps can speed up automation by 10x while lowering cost by 75%. DuploCloud provides one such solution, which many of our customers in regulated industries are using to realize improved speed, better security and an order-of-magnitude lower operational cost.

The hyper-scale automation techniques described here have in fact been in use inside cloud providers like AWS and Azure, where just a thousand engineers operate millions of workloads across the globe with top-notch availability, scale, security and compliance standards. The DuploCloud team was among the original inventors of these automation techniques in the public cloud and is now democratizing them for mainstream IT.

While the focus of this whitepaper is on DevOps, a similar write-up on security is available at https://portal.duplocloud.net/compliance/Implementation.html.

The Trend Towards Modern Cloud Based Deployments

Three major shifts are happening across all industries today:

  1. Infrastructure is moving to the cloud and becoming 100% software driven, i.e. Infrastructure-as-Code (IaC)
  2. Applications are getting more fragmented and diverse, a.k.a. microservices
  3. An increasing number of application functions have moved out of the developer’s code into platform services completely managed by cloud providers.

With the increasing adoption of public clouds, we see enterprises wanting to move towards a state where cloud operations are 100% software driven. Applications are becoming a combination of micro-services using containers, managed services for databases, messaging, key-value stores, NoSQL stores, Lambda functions and object stores.

For developers, it has become easier to build more complex applications that work at scale using off-the-shelf components and services. While all this provides great flexibility and agility in application development, it creates a lot of fragmentation in the infrastructure. What used to be just a few items around storage, compute and network has now fragmented into scores of configurations around security groups, containers, namespaces, clusters, IAM roles, policies, object stores, NoSQL databases and so forth.

First Step: Application Blueprint

The first step towards the realization of a cloud deployment is to draw out a high level application architecture. This would typically be done by an architect in the organization. An example of this is shown in Figure 1.

Figure 1

Figure 1 shows a deployment architecture for an application in AWS. Here we see a topology in which one set of microservices is packaged as Docker containers running on ECS Fargate. Aurora MySQL is used as the database, and services are exposed to the internet via an external load balancer fronted by a WAF. Another set of microservices uses Lambda functions and API Gateway. Data stores include S3 and DynamoDB in this case.

If an organization is on Azure, it will have a deployment architecture that is Azure-specific. The constructs and terminology may change, but conceptually it is the same. One such topology is shown in Figure 2.

Such a high-level architecture, with say 15-odd constructs, gets passed to DevOps teams, which then translate it into hundreds of lower-level cloud configurations that require thousands of lines of Infrastructure-as-Code. Deep subject matter expertise is required in both operations and programming (a hard-to-find skill: ever heard of a sysadmin who loves Java, or a .NET developer who knows the nuances of security best practices?).


Figure 2

DevOps Lifecycle

Infrastructure configuration to realize the application blueprints is typically done in a series of phases as shown in Figure 3.

Figure 3

  1. Base Infrastructure: This is the starting point, where one picks the regions, brings up VPCs/VNets with the right address spaces, and sets up VPN connectivity and availability zones.
  2. Application Services: This is the area where we have virtual machines, databases, NoSQL, object stores, CDN, Elasticsearch, Redis, Memcached, message queues and other supporting services. Public clouds have directed 90% of their investments into this area, and an increasing number of clients write their applications using these services for faster GTM, higher scale and reliability. Also in this area are DR, backup, image templates, resource management and other such supporting functions.
  3. Application Provisioning: Depending on the application packaging type, different automation techniques and tools can be applied. For example:
    1. Kubernetes, Amazon ECS and Azure Web Apps for containerized workloads.
    2. AWS Lambda and Azure Functions for serverless workloads.
    3. Databricks, EMR, Glue etc. for big data use cases.
    4. SageMaker and Kubeflow for AI use cases.
  4. Logging, Monitoring and Alerts: These are the core diagnostic functions that need to be set up. Centralized logging can be achieved with Elasticsearch, Splunk, SumoLogic or Datadog. For monitoring and APM we have Datadog, CloudWatch, SignalFx and so on. For alerts we have Sentry. Many unified tools like Datadog provide all three functions.
  5. CI/CD: There are probably 25+ good CI/CD tools in the industry, from good old Jenkins to CircleCI, Harness.io, Azure DevOps and so on. In this layer one also needs to put in place security testing pipelines that enforce secure coding practices via static code analysis and penetration testing.

SecOps Lifecycle and Compliance Frameworks

Organizations in non-regulated industries tend to follow a set of best practices determined by in-house DevOps engineers. These are table stakes and include security groups, IAM/AD policies, encryption and some basic user access controls. Regulated industries have published prescriptive frameworks for security in the cloud. While prescriptive, they are exhaustive to implement, and interpreting them in the context of a given cloud deployment requires deep subject matter expertise. An example of such security controls is the PCI guide at https://aws.amazon.com/quickstart/architecture/compliance-pci/. A screenshot of the controls is shown in Figure 4.

Figure 4: PCI Controls on AWS

Similar to PCI, different standards have their own control sets, as shown in the table below.

Figure 5

“A key element of compliance controls that often catches people unaware is that roughly 70% of them must be applied at provisioning time. If they are missed at provisioning time, many require reprovisioning, for example encryption of disks, placement of VMs in the right subnets and so on. This means security is 70% a DevOps function. On the other hand, almost no standard security software (Prisma Cloud, Threat Stack, Lacework et al.) plays any role in provisioning.”
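
To make this concrete, here is a small, generic Terraform illustration (not DuploCloud-specific, and only a sketch): the encryption and subnet placement of an RDS instance are fixed at provisioning time, and changing them later forces the database to be replaced. The identifier, password variable and subnet group are assumed for the example.

# Provisioning-time controls: storage_encrypted and the subnet group must be
# set when the instance is created; changing storage_encrypted later forces
# the database to be recreated.
resource "aws_db_instance" "app_db" {
  identifier           = "app-db"                         # hypothetical name
  engine               = "mysql"
  instance_class       = "db.t3.medium"
  allocated_storage    = 50
  username             = "admin"
  password             = var.db_password                  # assumes a variable defined elsewhere
  storage_encrypted    = true                              # must be chosen at provisioning time
  db_subnet_group_name = aws_db_subnet_group.private.name  # assumes a private subnet group defined elsewhere
  skip_final_snapshot  = true
}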

Current State of DevOps and Infrastructure-as-code

If one were to describe the job of a DevOps engineer in one sentence, it would be:

“A DevOps engineer builds cloud infrastructure by stitching together a multitude of tools using his/her interpretation of best practices and standards.”

Figure 6 shows a few examples of the tools that are stitched together.

Between 2010 and 2015, the most common approach to infrastructure automation was the use of templates, wherein the operator takes a description of a desired configuration and captures it in the form of templates. The key assumption is that the topology will not change, and that when it does change, the changes will be implemented out-of-band. Templates are great for one-time setup, but people soon realized that infrastructure is constantly changing. Fragmented applications, microservices and a plethora of cloud services add further volatility. This led to Infrastructure-as-Code.

To accommodate ever-changing infrastructure specifications, it was determined that the entire configuration should be treated as if one were building a software product. Engineering teams provide high-level specifications to DevOps teams, who then translate them into lower-level details where each and every detail is written down as code, and any change follows a typical SDLC (Software Development Life Cycle) that includes code review, testing and rollout.

This approach has some clear advantages:

  • A single source of truth saved in a Git repository and state file
  • Declarative state
  • Change tracking
  • Repeatability

It also introduces a substantial set of disadvantages. Key among them are:

  • Increased subject matter expertise required for the operator role, which now demands a programmer.
  • Open-ended, requiring the operator to provide the lowest level of detail. IaC is basically a programming language, and the onus is on the user to write correct code. For example, one can create a security group open to the internet and IaC will not complain.
  • Longer change cycles: Many activities in operations are just-in-time and need to be done by junior, less-skilled operators, for example adding an IP to a WAF to block an attack, applying a patch to a server or executing a script. If the user has to go update IaC, get a code review done, do some regression testing and roll out the changes, then most operations teams will fall short, as they will have neither the skill set nor the speed to address the need of the hour.
  • Modularizing changes is hard, leading to centralized control (an anti-pattern to microservices): Pre-IaC, pieces of infrastructure were configured and updated independently by different people in different operations shifts. WAF, VMs, containers, IAM, security groups, DBs and so on are all different functions. But with IaC, while it has concepts like modules, the code base written by even the best DevOps engineers operates like a monolith with a huge surface area of impact. The scope of most state files is quite wide. As a programming language, Terraform is in its infancy: there are no classes, objects, inheritance or threads, and things as basic as loops and user-defined functions are hard to write (see the sketch below). The figure below compares this evolution. Everything is one big block of monolithic code. One wrong “terraform apply” can be catastrophic.
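
As a small illustration of that last point, plain Terraform expresses repetition through count/for_each and index arithmetic rather than a conventional loop body, and offers modules rather than user-defined functions. In this sketch the VPC resource and availability-zone data source are assumed to exist elsewhere in the configuration.

# Repetition in plain Terraform: no loop body or user-defined function,
# just count plus index arithmetic over an input list.
variable "private_subnet_cidrs" {
  type    = list(string)
  default = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
}

resource "aws_subnet" "private" {
  count             = length(var.private_subnet_cidrs)
  vpc_id            = aws_vpc.main.id                                          # assumes a VPC defined elsewhere
  cidr_block        = var.private_subnet_cidrs[count.index]
  availability_zone = data.aws_availability_zones.available.names[count.index] # assumes this data source exists
}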

“Among all of these, the most difficult is finding an engineer who is good at both programming and operations. This is like finding a unicorn. How often do you come across someone talking about objects, classes and functions together with CIS benchmarks, IAM policies and WAF? As of the writing of this article, it is not surprising that there are 60,000 DevOps openings on LinkedIn alone.”

DuploCloud: No-code / Low-Code DevOps

At DuploCloud we set out to address these problems and make IaC better. We envisioned a solution with the following key elements:

  1. Rules-Based Engine: translates a high-level application specification into low-level infrastructure constructs automatically, based on:
    1. The cloud provider (AWS, Azure or GCP) where the application is being deployed. The engine has well-architected framework rules for each supported cloud.
    2. The application architecture, at the level of abstraction shown in Figures 1 and 2.
    3. The desired compliance standard, like PCI, HIPAA, GDPR etc., as prescribed by cloud providers. We have described the security and compliance aspects of the platform in a separate write-up at https://portal.duplocloud.net/compliance/Implementation.html.
  2. State Machine: In cloud infrastructure today, almost nothing is done only once. For ongoing changes, drift detection and remediation we need a state machine.
  3. Application Centric Policy Model: compartmentalizes infrastructure constructs based on application boundaries. Figure 7 shows the high-level DuploCloud policy model.

Figure 7

  4. No-Code UI: Users who do not want to write IaC manually can weave the end-to-end DevOps workflow using a web UI. An E2E demo of DuploCloud in action is below.

  5. Low-Code IaC (Terraform provider/SDK): Using an SDK with built-in functions for best practices and compliance controls, one can reduce the amount of TF code by over 90%.

    An analogy for low-code DevOps: if Terraform is like the C language, where the user has to do all the memory management and implement constructs like hashmaps and dictionaries themselves, then DuploCloud is like Java, which provides an out-of-the-box implementation of these constructs; for example, the user can instantiate a HashMap object with a single line of code. In the same way, DuploCloud provides an SDK on top of Terraform where much of the best practice is built in.

For example, the code snippet in Figure 8 shows how one could create a new host via TF, request all host-level PCI controls with a single flag, and ask for the host to be joined to an EKS cluster with a node selector. A complete example of building a topology using DuploCloud TF is described in the next section.

resource "duplocloud_aws_host" "host1" {  
  tenant_id = duplocloud_tenant.tenant1.tenant_id
  user_account = duplocloud_tenant.tenant1.account_name
  image_id = "ami-062e7f29a4d477f5a"
  capacity = "t2.small"
  agent_platform = eks
 
labels = "app01"
  friendly_name = "host1" 
 
pci = true
}

Figure 8

  6. Self-Hosted Solution: The DuploCloud software deploys completely within the customer’s cloud account, and no data of any sort is managed from outside. It deploys as a virtual machine with admin privileges to the account. Users interface with it in one of three ways (a minimal provider configuration sketch follows this list):
    1. Web Portal
    2. Terraform, using the DuploCloud provider
    3. REST API
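
For the Terraform path, the provider is configured roughly as follows. This is a hedged sketch: the portal URL is hypothetical, and the exact provider arguments should be confirmed against the DuploCloud provider documentation.

terraform {
  required_providers {
    duplocloud = {
      source = "duplocloud/duplocloud"   # registry source, per the provider docs
    }
  }
}

provider "duplocloud" {
  duplo_host  = "https://mycompany.duplocloud.net"  # hypothetical URL of the self-hosted portal
  duplo_token = var.duplo_token                     # API token issued by the portal
}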

Figure 9

Demonstrating a Deployment with Low Code and No-Code

Let’s take an example topology of an application in AWS and see how we can realize it, first using no-code (i.e. the web UI) and then using low-code (a Terraform script with the DuploCloud provider).

Deployment Topology: We have an application that consists of a set of microservices to be deployed on EKS. The environment is to have a VPC and two availability zones, with one public and one private subnet each. The database is hosted in AWS RDS, and S3 is the object store. All instances and containers are to run on EC2 instances in private subnets, and applications are exposed to the internet via a load balancer fronted by a WAF. The environment needs PCI compliance, and the control set defined in the AWS PCI guide at https://aws.amazon.com/quickstart/architecture/compliance-pci/ is to be followed. CloudWatch should be used for metrics.


Figure 10

No-Code Deployment using the web UI: watch the following demo.

Low-Code Implementation: Figure 12 shows a code snippet which demonstrates the following parts of the deployment:

  • Lines 5-13 create an “infrastructure” called “finance” that includes a VPC in us-west-2 with 2 AZs, each with one public and one private subnet, and an EKS cluster.
  • Lines 14-16 create a “tenant” called “invoice” in the above “infrastructure”, which will implicitly create security groups, IAM roles, instance profiles, KMS keys, PEM keys, a namespace in EKS and many other placeholder constructs depending on the compliance framework to be followed. (An illustrative sketch of these two resources follows this list.)
  • Lines 18-26 create a host in the “invoice” tenant and ask for it to be joined to the EKS cluster. The user specifies high-level parameters like name, capacity and enable_pci, and internally the platform applies the right set of security groups, IAM roles, instance profiles, user data to join EKS, IAM policies and a whole set of host-based security software (vulnerability assessment, FIM, intrusion detection), and orchestrates the system to also collect these logs and register the node in a SIEM.
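
The actual Figure 12 code is not reproduced here, but the “infrastructure” and “tenant” portion would look roughly like the sketch below using the DuploCloud Terraform provider. Resource and attribute names follow the provider documentation, and the CIDR block is an assumed value, so treat this as illustrative rather than the exact Figure 12 code.

# Illustrative sketch: an "infrastructure" with a VPC, 2 AZs and an EKS
# cluster, plus a "tenant" carved out of it.
resource "duplocloud_infrastructure" "finance" {
  infra_name        = "finance"
  cloud             = 0              # 0 = AWS
  region            = "us-west-2"
  azcount           = 2              # one public and one private subnet per AZ
  enable_k8_cluster = true           # provisions the EKS cluster
  address_prefix    = "10.50.0.0/16" # assumed VPC CIDR
  subnet_cidr       = 24
}

resource "duplocloud_tenant" "invoice" {
  account_name = "invoice"
  plan_id      = duplocloud_infrastructure.finance.infra_name
}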

Figure 12

  • Lines 27-33: Create an EKS service, or “K8s Deployment”. From simple declarative user specifications, behind the scenes the software translates them into EKS calls to deploy it in the right namespaces and to set labels, node selectors, affinities, etc.
  • Lines 34-49: Expose the service via an Application Load Balancer. The user specifies which ports need to be exposed, with health check URLs. Behind the scenes the platform auto-generates the nuances around node ports, ingress, annotations, VPCs, subnets, security groups and other details. Defaults for health check timeouts, healthy thresholds, etc. were picked but could have been passed in the config. (A hedged sketch of this service and load-balancer portion follows this list.)
  • Lines 51-56: Choose a DNS name for the service and attach it to one of the pre-existing WAF IDs. Behind the scenes, Route 53 programming of ALB CNAMEs, attachment of the WAF, etc. is done.
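
Again as an illustration rather than the exact Figure 12 code, the service and load-balancer portion might look like the following sketch. The image name, ports and health-check path are placeholders, and the exact attribute names should be checked against the DuploCloud provider documentation.

# Illustrative sketch: a containerized service on the EKS-backed hosts,
# exposed through an Application Load Balancer.
resource "duplocloud_duplo_service" "invoice_api" {
  tenant_id      = duplocloud_tenant.invoice.tenant_id
  name           = "invoice-api"
  agent_platform = 7                        # 7 = EKS
  docker_image   = "abcorp/invoice-api:1.0" # placeholder image
  replicas       = 2
}

resource "duplocloud_duplo_service_lbconfigs" "invoice_api_lb" {
  tenant_id                   = duplocloud_tenant.invoice.tenant_id
  replication_controller_name = duplocloud_duplo_service.invoice_api.name

  lbconfigs {
    lb_type          = 1         # 1 = Application Load Balancer
    is_internal      = false
    port             = "8080"    # container port
    external_port    = 443       # port exposed on the ALB
    protocol         = "http"
    health_check_url = "/health" # placeholder health-check path
  }
}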

Some one-time constructs, like the WAF, the certificate in ACM and the hardened AMI, are created out-of-band and made available for the system to consume.

“In the 56 lines of code in the above snippet we have covered 70% of the topology shown in Figure 10. Setup and ongoing operations for this whole environment would still be less than 300 lines of code, which without the DuploCloud provider would take thousands of lines of code.”

Neither a PaaS, nor a Restrictive Abstraction on the Cloud

A common drawback of many cloud management platforms is that while they provide the benefits of abstraction, they can feel restrictive when one has to leverage functions of the cloud provider that the platform has not exposed. In fact, most cloud management platforms are restricted to specific use-case templates that have to be created beforehand by an administrator and then exposed to the user. Any time a change is needed in the topology, the administrator has to step in and update the templates.

“DuploCloud is neither a PaaS, nor does it get in the way of engineering teams and their cloud usage. Think of it as an SDK for Terraform, or a DevOps bot that auto-generates the lower-level DevSecOps nuances and boring compliance controls, while the engineering teams focus on building application logic using cloud-native services.”

There are three reasons why DuploCloud is flexible:

  1. DuploCloud provides the ability to build and operate arbitrary workflows by combining any of the cloud services exposed by the platform. There are no “pre-baked” templates that must be created by an administrator; for example, the blueprints shown in Figures 1 and 2 are arbitrary. Of course, workflows can be saved as templates and reused.
  2. Next is the ability to incorporate new cloud services. DuploCloud’s rules-based engine lets us add support for new cloud features behind the scenes with little effort: for most use cases we just add a set of JSON configurations that auto-generate code per the cloud provider’s configuration specifications and the best-practice guide for that feature set. This is in fact DuploCloud’s core IP. Further, these feature sets are not customer-specific but based on cloud services; for example, once Managed Kafka support is added to DuploCloud, it is available to all customers.
  3. Interlace native Terraform with the DuploCloud SDK. The advantage of a self-hosted platform within one’s own cloud account is that the user is free to use cloud resources directly in case a service is not exposed or the workflow has some custom quirks. Within a single TF file one can invoke the DuploCloud TF provider to provision a set of high-level resources and then add custom configuration using the cloud’s own TF provider. For example, in the code snippet shown below using the DuploCloud TF provider, within a tenant the user adds a load balancer mapped to a service (lines 77-90). Then, on the resulting ALB, using the AWS TF provider (lines 94-112), the user adds a new listener which redirects the URL https://abcorp.com to https://app.abcorp.com, which is provisioned separately as a CloudFront resource. (A hedged sketch of this kind of interlacing follows this list.)
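
The Figure 13 code is not reproduced here, but one way to do such interlacing is sketched below: look up the ALB that DuploCloud provisioned and attach a redirect rule to its HTTPS listener using the plain AWS provider. The load balancer name and domains are placeholders, and this uses a listener rule rather than a brand-new listener, so treat it as an assumption-laden illustration rather than the exact Figure 13 code.

# Look up the ALB that DuploCloud provisioned for the service.
data "aws_lb" "app" {
  name = "duplo-invoice-api-alb"   # hypothetical name of the DuploCloud-managed ALB
}

data "aws_lb_listener" "https" {
  load_balancer_arn = data.aws_lb.app.arn
  port              = 443
}

# Custom behavior added with the native AWS provider: redirect the apex
# domain to the app subdomain.
resource "aws_lb_listener_rule" "apex_redirect" {
  listener_arn = data.aws_lb_listener.https.arn
  priority     = 10

  condition {
    host_header {
      values = ["abcorp.com"]
    }
  }

  action {
    type = "redirect"
    redirect {
      host        = "app.abcorp.com"
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}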

Figure 13

No Platform Lock-in

A common concern that customers have when adopting a powerful technology is whether they are getting locked into a proprietary platform. Fortunately, we have addressed this concern with the ability to export native Terraform code along with the state (state file) of the current infrastructure. In the scenario where a customer wants to wean off the DuploCloud solution for some reason, they retain native Terraform with no proprietary constructs.

“With the ability to export native Terraform (IaC), disengaging from the DuploCloud platform is like terminating a DevOps engineer but still having access to the IaC they wrote, except that one would now have to hire a new engineer to maintain and evolve the automation with all the best practices, efficiency and scale that is desired. The existing workloads are not impacted.”

Features Galore

In this blog we have described some basic concepts and features of the platform. Almost all common services in AWS and Azure are supported. About two or three services are added each month, and any cloud service requested by clients is typically added within a couple of weeks.

Any configuration to be made in the cloud provider (AWS, Azure or GCP), in Kubernetes, or in a natively supported third-party tool (like OSSEC, Wazuh, ELK, ClamAV, Datadog etc.) is within the scope of the platform. Historically, 90% of use cases have been served out-of-the-box by the software. For the rest there are two options: (a) the DuploCloud team trains and updates the software, with a typical turnaround time of 4-5 days; (b) the required configuration can be done directly on the cloud provider, the K8s cluster or the respective tool. An example of the 10% case was Managed Kafka: when it was newly released and a customer requested it be exposed via Duplo, it took us 4-5 days.

The platform has the resume of a skilled DevOps engineer and is as trainable as one. Figure 14 shows representative services around which customers have built workflows.

Conclusion

Every company is going through a digital transformation with a focus on moving to public clouds and achieving faster application delivery. With the growing demand for DevOps expertise, many enterprises are struggling to fill all the open positions needed to achieve their business goals. The skills shortage is slowing down application modernization, cloud migration and automation projects that are critical both for business growth and for remaining competitive.

DuploCloud provides a new no-code based approach to devops automation.

With several dozen customers in regulated industries, across publicly listed enterprises, SMBs and MSPs, we are able to show productivity improvements across their teams. They are able to do a lot more with a lot less, such that in-house DevOps teams can focus on application-related improvements instead of worrying about infrastructure, security and compliance. The three key advantages of using DuploCloud are:

  1. 10X faster automation
  2. Out-of-box secure and compliant application deployment
  3. 70% less cloud operating costs