CI/CD as it applies to common architectures
This first post kicks off a series on how CI/CD is adapting for AI-first applications. We’ll explore why CI/CD matters, the benefits it brings, why we do what we do today, and how it is likely to evolve.
Introduction
Engineering teams have long embraced continuous integration and deployment (CI/CD) to respond to code changes, build, test, and deploy software efficiently. It’s a powerful way to prevent code drift and ensure issues are addressed and rolled out frequently. CI/CD also encourages the development of robust testing suites that teams can trust—enabling them to delegate deployment decisions with confidence.
But does the same approach work for AI/ML? Conceptually, yes. Technically, however, the tools, timelines, and feedback loops differ significantly.
This post aims to highlight the core benefits of CI/CD and uncover some of its hidden value—to establish parallels in AI/ML workflows, identify additional requirements, and explore what CI/CD should look like in the context of AI/ML.
My ultimate goal is to explain how we can pragmatically bridge the gap between traditional CI/CD and AI/ML workflows, while still achieving the objectives and benefits expected from CI/CD in conventional software architectures. That deeper dive, however, will be the focus of a future post.
If you're already familiar with the synergy between CI/CD strategies and major architectural shifts—feel free to skip ahead to the Key Observations for CI/CD strategies section. I also plan to explore newer AI-infused architectures and their impact on CI/CD in an upcoming post.
Background - CI/CD strategies
We’ll begin by exploring the objectives and benefits of CI/CD and how it has evolved over the past 25 years; the first CI product I’m aware of appeared around 2001.
In my view, there have been five major eras of CI/CD, each shaped by the maturity of the IT industry and the dominant architectural paradigms of the time:
Monolithic Applications Dominated by traditional software and early web services through the mid-2000s.
Microservices and A/B Deployments Introduced more granular deployment strategies, many of which remain in practice today.
Macroservices An evolution of microservices, emphasizing simplified API consumption through well-defined aggregators and orchestrators.
Software-Defined VMs Gained traction in the early 2010s with tools like Docker, enabling control over machine and OS configurations.
Software-Defined Infrastructure (IaC) Encompasses network, infrastructure, and security as code—accelerating adoption in the latter half of the 2010s and continuing to reshape industry practices.
While pub-sub and event-driven architectures were widely adopted during the microservices era, I’ve chosen to omit them from this post. From a CI/CD perspective, they align most closely with eras 2 and 3 due to their inherent separation of concerns (e.g., publisher/subscriber or producer/consumer) and the need for versioned schemas for messages and events.
To understand how CI/CD evolved to support these architectural and engineering shifts, let’s dig a bit deeper. Note: I’ve also excluded supply chain risk management topics—such as vulnerability scanning, SBOM creation, and validation—as these are additional layers that can be integrated into CI/CD pipelines and should be supported by all vendors.
Monolithic Applications
Architecture A single unit of deployment—where all code is built and deployed together.
Code Management The simplest approach was using monorepos, but alternatives existed. More complex setups could coordinate multiple codebases using shared strategies such as common branch names, naming conventions, or full-rebuild mechanisms.
Continuous Integration (CI) Build order was determined by intra-dependencies or a statically defined dependency graph. Advanced CI systems supported parallel builds of independent components, while simpler systems executed builds sequentially.
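To make the build-ordering idea concrete, here is a minimal Python sketch, with illustrative component names, of deriving a parallel build schedule from a statically defined dependency graph:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Illustrative intra-dependency graph: component -> components it depends on.
dependency_graph = {
    "web-ui":   {"core-lib", "auth-lib"},
    "auth-lib": {"core-lib"},
    "core-lib": set(),
}

sorter = TopologicalSorter(dependency_graph)
sorter.prepare()

# Components whose dependencies are satisfied can build in parallel;
# simpler CI systems would just iterate each batch sequentially.
while sorter.is_active():
    batch = sorter.get_ready()
    print(f"Building in parallel: {sorted(batch)}")
    sorter.done(*batch)
```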
Artifacts CI outputs were typically snapshotted and included pre-install and post-install scripts to support deployment.
Continuous Deployment (CD) Snapshot artifacts were deployed to pre-configured servers. While strategies existed to minimize impact on live workloads, deployments often occurred during operational windows. Multi-site environments (e.g., DEV, TEST, PROD) were common, with formal maturation steps such as acceptance testing. Deployment scripts were executed as part of the rollout process.
Testing
CI Phase Included unit tests and component-level integration tests. Builds would halt on test failures.
CD Phase Ran integration and acceptance tests on the deployed system. Failures triggered rollbacks to previous versions.
Microservices
Architecture Each service is designed to be independently deployable and upgradeable, with clearly defined minimum version requirements for downstream dependencies. Upstream services may encounter multiple concurrently running versions, but backward compatibility ensures that consumers remain unaffected.
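As a hedged illustration of those minimum version requirements, the sketch below checks reported dependency versions against the minimums a service was tested with; the service names, versions, and the notion of a version-reporting endpoint are assumptions for the example:

```python
# Hypothetical startup check: verify each downstream dependency reports
# at least the minimum version this service was tested against.
MIN_VERSIONS = {"billing": (2, 4, 0), "inventory": (1, 9, 0)}  # illustrative

def parse_version(text: str) -> tuple[int, ...]:
    """Parse 'major.minor.patch' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))

def check_compatibility(reported: dict[str, str]) -> list[str]:
    """Return the names of dependencies below their required minimum."""
    return [
        name
        for name, minimum in MIN_VERSIONS.items()
        if parse_version(reported.get(name, "0.0.0")) < minimum
    ]

# Example: versions as reported by each service's (assumed) version endpoint.
too_old = check_compatibility({"billing": "2.5.1", "inventory": "1.8.0"})
assert too_old == ["inventory"]
```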
Code Management A common pattern is one repository per service, promoting separation of concerns. The challenge lies in managing inter-service dependencies, maintaining a comprehensive test suite, and ensuring backward compatibility—since any exposed interface is effectively “public.”
Continuous Integration (CI) Each repository is built independently, producing versioned, snapshotted artifacts.
Artifacts CI outputs are versioned and snapshotted per service. Shared dependencies may be duplicated across services, depending on build and packaging strategies.
Continuous Deployment (CD) Artifacts are deployed to pre-configured environments. To minimize disruption, deployment strategies typically include incremental approaches such as blue-green or rolling deployments. Multi-environment setups (e.g., DEV, TEST, PROD) are used to validate deployments through structured maturation steps like acceptance tests and one-box validations.
Testing
CI Phase Executes unit tests and component-level integration tests. Builds are halted on failure.
CD Phase Runs integration and acceptance tests on the deployed service. Failures trigger rollback to the previous version—on a per-service basis.
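A minimal sketch of that per-service rollback loop; the three helpers are hypothetical stand-ins for your real deployment and test tooling:

```python
# Per-service deploy-then-verify: only the failing service is rolled back.
def deploy_service(service: str, version: str) -> None:
    print(f"deploying {service}@{version}")        # e.g. call your deploy tool

def run_acceptance_tests(service: str) -> bool:
    print(f"running acceptance tests for {service}")
    return True                                     # replace with a real test run

def rollback_service(service: str, version: str) -> None:
    print(f"rolling back {service} to {version}")   # previous known-good version

def deploy_with_rollback(service: str, new: str, previous: str) -> bool:
    """Deploy one service independently; roll back only that service on failure."""
    deploy_service(service, new)
    if run_acceptance_tests(service):
        return True
    rollback_service(service, previous)             # neighbours keep running
    return False

deploy_with_rollback("auth-service", "1.4.0", "1.3.2")
```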
Macroservices
Architecture A coordinated set of services designed to work together and expose a unified public API. This simplifies consumer interactions across multiple microservices and centralizes orchestration. Upstream consumers interact only with the public API, which may support multiple concurrently running versions. Backward compatibility ensures that consumers remain unaffected.
Code Management An optimized approach often involves one or more monorepos encompassing the tightly coupled services that are deployed together. This reduces complexity and limits the “public” surface area requiring backward compatibility guarantees.
Continuous Integration (CI) Multiple repositories (or components within a monorepo) are built either in batches or incrementally. Aggregated, versioned artifacts are snapshotted for deployment.
Artifacts CI outputs are aggregated and snapshotted. Macroservices benefit from reduced artifact size due to shared dependencies being bundled together.
Continuous Deployment (CD) Aggregated artifacts are deployed to pre-configured environments. Deployment strategies typically include incremental approaches such as blue-green or rolling deployments to minimize disruption. Multi-environment setups (e.g., DEV, TEST, PROD) support structured validation through acceptance tests, one-box tests, and other maturation steps.
Testing
CI Phase Executes unit tests and component-level integration tests. Builds are halted on failure.
CD Phase Runs integration and acceptance tests on the deployed aggregate. Failures trigger rollback to the previous version of the macroservice bundle.
Software-Defined VMs
Architecture Artifacts are packaged into containers (e.g. Docker), which include a predefined OS version and core operating system dependencies, in addition to the produced software bundle. This encapsulation ensures consistency across environments.
Code Management A common practice is to include a Dockerfile alongside the application code. This streamlines the developer workflow—enabling code changes, container builds, and local testing within a single repository.
Continuous Integration (CI) Containers are built, scanned for vulnerabilities, and published to a container registry—typically under a development tag or branch.
Artifacts Containers must be stored immutably in a suitable container repository. Best practice involves separating development-stage containers from officially published versions to maintain integrity and traceability.
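To ground this, here is a hedged sketch of such a CI step in Python; the registry name, the commit-SHA tag, and the choice of Trivy as the scanner are assumptions, not prescriptions:

```python
import subprocess

# Build, scan, and publish a container under a development tag.
REGISTRY = "registry.example.com/dev"    # dev-stage registry, kept separate
IMAGE = f"{REGISTRY}/orders-service"
TAG = "git-3f9c2ab"                      # immutable tag, e.g. the commit SHA

def run(*args: str) -> None:
    subprocess.run(args, check=True)     # fail the pipeline on any error

run("docker", "build", "-t", f"{IMAGE}:{TAG}", ".")
run("trivy", "image", "--exit-code", "1", f"{IMAGE}:{TAG}")  # fail on findings
run("docker", "push", f"{IMAGE}:{TAG}")
```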
Continuous Deployment (CD)
Container Distribution Containers may be redistributed to compute-local caches or mirrored across official registries to improve resiliency and reduce latency during image pulls.
Application Configuration Applications consuming the container must be updated to reference the new version. Even when using the latest tag (not recommended), a deployment cycle is required to ensure the new image is picked up.
Post-Deployment Testing Critical in this model, especially when containers are produced by different teams or used in varied contexts (e.g. jobs, services). Validation ensures compatibility and correct behavior across use cases.
Testing
CI Phase Runs unit and component-level integration tests. Developers are expected to validate container behavior locally before committing changes to the pipeline. Broader testing is deferred to CD stages. Container optimizers and checkers can also be used to reduce the number of layers, shrink image size, and so on.
CD Phase Since containers can be consumed in multiple ways, post-deployment testing is performed within the application context. Acceptance tests are essential, particularly given the lack of standardized practices for testing containers outside their consuming applications.
Software-Defined Infrastructure
Architecture The runtime infrastructure stack and hosting environment are defined through code—commonly using tools like Terraform. There is natural overlap with software-defined VMs: the container or VM image must be published to a repository, and the infrastructure must be configured to reference that version (e.g. a Kubernetes pod pointing to a specific container image).
Code Management A layered design is recommended, often involving nested providers (e.g. cloud infrastructure with Kubernetes on top). These layers may reside in separate repositories or a shared one, depending on ownership boundaries and review workflows. Ideally, tools should support scoped ownership, enabling targeted code reviews, partial builds, and selective deployments of affected layers and their dependents.
Continuous Integration (CI) Infrastructure specifications should be validated for semantic correctness (e.g. terraform plan) and compliance (e.g. Sentinel policies). Advanced setups may use provider-specific CDKs and general-purpose languages like Python to compose the infrastructure and test it.
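For example, a pipeline can gate on terraform plan’s -detailed-exitcode flag (0 means no changes, 1 means error, 2 means changes present); a minimal sketch:

```python
import subprocess

# Gate CI on terraform plan's exit code; whether "changes present" should
# pass or fail depends on the pipeline's intent.
result = subprocess.run(
    ["terraform", "plan", "-detailed-exitcode", "-input=false", "-out=tfplan"],
    capture_output=True, text=True,
)
if result.returncode == 1:
    raise SystemExit(f"plan failed:\n{result.stderr}")
elif result.returncode == 2:
    print("plan is valid and proposes changes; saving tfplan for CD")
else:
    print("no changes: infrastructure already matches the specification")
```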
Artifacts The target-state intent must be captured in a format compatible with the deployment tool—such as kustomized YAML or Terraform state files. These artifacts are critical for rollback scenarios and for understanding the expected infrastructure configuration with all the inputs and transformations.
Continuous Deployment (CD) Infrastructure changes are deployed to target environments using strategies that minimize disruption (e.g. blue-green or rolling deployments). Multiple environments (e.g. DEV, TEST, PROD) are essential to manage risk and validate changes through structured maturation steps like acceptance tests and one-box validations.
Testing
CI Phase Unit tests may be executed if supported by the IaC toolchain.
CD Phase Acceptance tests validate the deployed infrastructure. Failures trigger rollback to the previous known-good state.
Post-CD Monitoring Drift detection should be proactive. Manual changes by SREs or DevOps teams during incident response may alter infrastructure state—these changes should be either incorporated or remediated to maintain consistency.
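One lightweight way to probe for drift, assuming Terraform 0.15.4 or later, is a scheduled refresh-only plan; a sketch:

```python
import subprocess

# A refresh-only plan with -detailed-exitcode returns 2 when the real
# infrastructure has diverged from recorded state (e.g. after manual
# changes made during incident response).
result = subprocess.run(
    ["terraform", "plan", "-refresh-only", "-detailed-exitcode", "-input=false"],
    capture_output=True, text=True,
)
if result.returncode == 2:
    print("drift detected: incorporate or remediate before the next deploy")
    # e.g. page the owning team, attaching result.stdout for context
elif result.returncode == 1:
    raise SystemExit(result.stderr)
else:
    print("no drift detected")
```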
Key Observations for CI/CD strategies
Code as the Primary Input Across all CI/CD strategies, code—and its associated dependencies—is the foundational input. This code is transformed, processed, and tested, producing two critical outputs:
Artifacts and Evidence Includes logs, test results, and other metadata that document the CI/CD process.
Deployable Units Versioned packages, dependencies, and configurations ready for deployment.
CI and CD as Distinct Processes CI and CD are separable stages. The interface between them (often drawn as a dashed line in pipeline diagrams) can be implemented in various ways—push vs. pull models, gated flows, or events and triggers.
Deployment Tracking and Site Management Deployable units can be deployed and redeployed across multiple environments. A single site may host multiple units, making precise tracking essential for rollback, compliance, and regulatory purposes.
Confidence Through Testing and Environments Multiple environments (e.g. DEV, TEST, PROD) and layered testing strategies are essential to build deployment confidence. While not strictly part of CI/CD, they are foundational to achieving “continuous” delivery.
Modular Pipelines and Deployment Strategies Modern architectures benefit from modular pipelines and targeted deployment strategies (e.g. blue-green, rolling updates). A single pipeline to “rule them all” may sound appealing—but it can also “bind” flexibility. (Yes, I’m a LOTR fan.)
So far, these principles should resonate with most developers—even if they are not fans of LOTR.
The Value We Get from CI/CD Systems
While the motivation for CI/CD is well-documented, there are subtler benefits worth highlighting—especially as we begin to consider AI, ML, and analytics workflows, which may require different CI/CD approaches.
Let’s start by asking a few foundational questions:
Why build in a CI system—can’t I just do it locally?
Why deploy from a CD system—can’t I push from my machine?
How do tests in CI/CD contribute to deployment confidence?
And what are the most valuable tests?
Building in a CI System
Historically, builds required significant compute and memory—beyond what developers had locally. Today, commodity hardware can often handle these tasks, but consistency remains a key concern.
Beyond hardware, CI systems ensure:
Consistent Dependency Resolution
Reproducible Outputs and Test Results
Verifiable Artifacts (e.g. SBOMs)
These objectives can be met locally using containerized pipelines, enabling edge-based CI workflows.
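As a sketch of that edge-based approach, the same pinned builder image can run the build step both locally and in CI; the image, mount layout, and build command below are illustrative:

```python
import pathlib
import subprocess

# Run the build inside the same builder container the CI system uses, so
# dependency resolution and outputs stay consistent across machines.
BUILDER = "python:3.12-slim"   # in practice, pin by digest for reproducibility
workdir = pathlib.Path.cwd()

subprocess.run(
    [
        "docker", "run", "--rm",
        "-v", f"{workdir}:/src", "-w", "/src",      # mount the checkout
        BUILDER,
        "pip", "wheel", "--no-deps", "-w", "dist", ".",  # same step as in CI
    ],
    check=True,
)
```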
Summary Objectives:
Must scale and execute builds in reasonable time
Must produce consistent, reproducible, and verifiable deployable units
Deploying from a CD System
CD systems enforce separation of duties—ensuring that deployments to sensitive environments (e.g. PROD) are vetted, planned, and authorized. This includes:
Dry runs in staging environments
Adherence to maintenance windows
Blast radius assessments for critical components
These objectives cannot easily be met when running locally, since local deployments can impact end users.
Summary Objectives:
Must enforce separation of duties for sensitive deployments
Must verify CI/CD evidence meets thresholds for each deployment unit and target
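A minimal sketch of such an evidence gate, with illustrative fields and thresholds:

```python
# Before deploying to a sensitive target, verify the evidence attached to
# the deployable unit meets that environment's thresholds.
THRESHOLDS = {
    "TEST": {"test_pass_rate": 0.95, "requires_approval": False},
    "PROD": {"test_pass_rate": 1.0,  "requires_approval": True},
}

def gate(target: str, evidence: dict) -> None:
    rules = THRESHOLDS[target]
    if evidence["test_pass_rate"] < rules["test_pass_rate"]:
        raise PermissionError(f"{target}: test pass rate below threshold")
    if rules["requires_approval"] and not evidence.get("approved_by"):
        raise PermissionError(f"{target}: deployment not authorized")

# A unit whose CI evidence is complete and approved may proceed to PROD.
gate("PROD", {"test_pass_rate": 1.0, "approved_by": "release-manager"})
```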
Tests During CI/CD
Tests for Deployment Confidence - A robust test suite should form a lattice of whitebox and blackbox tests. These should validate current functionality and guard against regressions (e.g. backward compatibility, serialization).
Effective tests are:
Meaningful (but not overly complex)
Comprehensive (without being burdensome)
Representative (including edge cases)
Deterministically Interpretable (even if results vary slightly)
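To illustrate deterministic interpretation despite slight variance, a test can aggregate repeated measurements and assert against an explicit budget; the latency measurement below is a hypothetical stand-in:

```python
import statistics

def measure_latency_ms() -> float:
    return 42.0  # hypothetical stand-in for a real, slightly noisy measurement

# Raw numbers may move run to run, but the pass/fail decision is unambiguous.
samples = [measure_latency_ms() for _ in range(20)]
p50 = statistics.median(samples)

assert p50 <= 50.0, f"median latency {p50}ms exceeds the 50ms budget"
```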
Most Valuable Tests - Test value hinges on environmental control. Without it, tests become flaky and erode confidence. You have two options:
Control the entire environment
Create enclaves or partitions that ensure consistency
This leads to three key test categories:
Unit and Whitebox Tests Validate internal logic, edge cases, and error handling.
Integration, Acceptance, and Load Tests Detect system-level issues before reaching production.
Post-Deployment Tests Surface real-world issues affecting users—especially valuable when run in production enclaves.
Documenting and tracking what each test aims to cover contributes to a more informed choice of which test strategy to employ to cover a particular aspect of a system in the most valuable way.
These value objectives can be met, including running tests in production, given some rigor and an architecture that addresses at least the noisy-neighbour problem.
Summary Objectives:
Must have a lattice of whitebox and blackbox tests
Tests must be meaningful, comprehensive, representative, and machine-interpretable
Track and optimize test value without sacrificing confidence
Ensure controlled environments for reliable execution
Design systems to support safe testing in production (e.g. enclaves, tenants, routable cells)
Wrap Up
How do these CI/CD principles shift when applied to AI/ML workloads?
Unlike traditional software, AI/ML deployable units rely not only on code, but also on data and—often—pretrained models. These inputs and outputs introduce new challenges:
Data and Model Dependencies Deployable units may require large datasets or models as inputs, and can produce similarly large artifacts as outputs.
Runtime and Compute Constraints Training and inference workloads can be prohibitively expensive or too slow for traditional CI systems. They often require specialized, high-cost compute resources.
Confidence Over Time Deployment confidence isn’t immediate—it’s earned through alignment between predicted and actual outcomes, evaluated against real-world data.
Non-Deterministic Test Results Especially with models like LLMs, test results may not be fully deterministic, or may not be deterministically interpretable by machines.
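One plausible pattern, offered as an assumption rather than a prescription, is to sample the non-deterministic step several times and gate on an aggregate pass rate; evaluate_once is a hypothetical check:

```python
def evaluate_once(prompt: str) -> bool:
    return True  # stand-in: e.g. call the model and score its answer

# Gate on the aggregate pass rate instead of any single run.
RUNS, REQUIRED_PASS_RATE = 10, 0.8
passes = sum(evaluate_once("summarize the release notes") for _ in range(RUNS))

assert passes / RUNS >= REQUIRED_PASS_RATE, (
    f"only {passes}/{RUNS} runs passed; below the {REQUIRED_PASS_RATE:.0%} gate"
)
```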
I’ve dedicated this post to establishing a foundation—deconstructing CI/CD across common software architectures—to prepare for the nuanced conversation around AI-infused systems.
Understanding why we do CI/CD, and how it has evolved, is as fundamental today as using version control. But elevating CI/CD to support AI/ML workloads requires a shift in mindset and tooling.
This post should serve as a useful primer for entry-level engineers. The next one will dive deeper—targeting engineers building AI-infused applications, helping them evolve their CI/CD strategies by understanding the full spectrum of benefits and how to adapt them for AI-driven systems.
Everything I write reflects my own opinions and perspectives and does not represent my past, current, or future employers.