Article by Drew Tabor, an engineer at AddThis
Are you limited in moving to full CI/CD because of resource constraints? Are you having trouble scaling or concerned about performance? Or maybe you’re tired of waiting on your pipelines to run?
To explore how to improve all of the above, let’s dive into one of the core principles from the Agile Manifesto: the art of maximizing the amount of work not done.
What Your Pipeline Really Needs to Do
Take this example of a pipeline that builds and deploys to a test environment:
Seems like a reasonably intuitive pipeline, right? Obviously, you need to install your npm packages. You can’t deploy a build if there isn’t a build. And of course you want to make sure all your tests pass. Sticking your build in a Docker container (like in this example) and deploying it certainly doesn’t seem optional.
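The pipeline diagram isn’t reproduced here, so here is a rough sketch of the kind of .gitlab-ci.yml it implies. Treat the stage and job names, the npm test command, and the placeholder scripts as illustrative guesses rather than the exact configuration:

stages:
  - install_dependencies
  - build
  - test
  - dockerize
  - deploy

install_dependencies:
  stage: install_dependencies
  script:
    - npm ci
  artifacts:
    paths:
      - node_modules/    # passed downstream as an artifact by default

build:
  stage: build
  script:
    - gulp build
  artifacts:
    paths:
      - build

test:
  stage: test
  script:
    - npm test           # assumption: the project's test command

dockerize:
  stage: dockerize
  script:
    - <docker build using the build/ directory, then push the image>

test_deploy:
  stage: deploy
  script:
    - <deploy script that pushes the Docker image to Kubernetes>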
However, if we zoom out, we start to see some redundancies. Here’s a pipeline that deploys to a production environment. Notice that test already doesn’t run. A bit of foreshadowing…
I give you this:
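That screenshot isn’t shown here either, but the idea, as I read it, is that the production deploy becomes just one more job tacked onto the same pipeline, reusing the image that was already built and tested. A sketch (the job name, stage name, and script are placeholders):

deploy_to_prod:
  stage: deploy_to_prod
  script:
    - <deploy script that points production at the already-built Docker image>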
“But Drew, I don’t want my code going straight to production immediately after the test environment!” you scream, shaking your laptop. “I need to do some manual testing first!”
Don’t worry. You can have your cake and eat it, too.
See the following:
click to deploy to prod:
  stage: Begin_deploy_to_prod
  script:
    - <notification that prod is deploying>
  when: manual
  allow_failure: false
Adding “when: manual” with “allow_failure: false” pauses the pipeline at this job and waits for you to resume it. Don’t want or need to deploy a particular commit from the test environment to production? Don’t. It won’t hurt anything.
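To make the ordering concrete: the manual job sits in its own stage ahead of the production deploy, and because allow_failure is false, everything after that stage waits until you click. A sketch of just the tail end of the stage list, with stage names mirroring the job above and the deploy script as a placeholder:

stages:
  - deploy                  # deploy to the test environment
  - Begin_deploy_to_prod    # the manual gate job above lives here
  - deploy_to_prod          # only runs once the gate has been clicked

deploy_to_prod:
  stage: deploy_to_prod
  script:
    - <deploy script that points production at the already-built Docker image>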
Maximize your work not done.
The Tricky Part
GitLab CI has a caching mechanism we can use to do even less work if we set it up the right way.
How often do you update your dependencies? Probably not that often, right? And if you’re not updating them from one commit to the next, `npm install` or `npm ci` is doing the exact same thing from one pipeline to the next. If only there were a way to just run that when it needed to be run…
Well, as of October 2018, this is very easy to implement.
Before:
install_dependencies:
  stage: install_dependencies
  script:
    - npm ci
  artifacts:
    paths:
      - node_modules/
After:
install_dependencies:
  stage: install_dependencies
  cache:
    key: $CI_COMMIT_REF_SLUG-$CI_PROJECT_DIR
    paths:
      - node_modules/
  script:
    - npm ci
  only:
    changes:
      - package-lock.json
In the code block above, the purpose of install_dependencies has shifted from “install all node modules from scratch and pass them downstream” to “only update the cached node modules when they have changed.” For those not familiar with `npm ci`, it does a clean install driven entirely by package-lock.json, which is why the job only needs to run when that file changes; the npm documentation covers the details.
The build job, in turn, reads in the node modules from the cache instead of an artifact:
build:
  stage: build
  cache:
    key: $CI_COMMIT_REF_SLUG-$CI_PROJECT_DIR
    paths:
      - node_modules/
    policy: pull
  script:
    - gulp build
  artifacts:
    paths:
      - build
Note the cache key – this will render out to something like drewsBranch-drewsProject. While it’s certainly possible to get this functional cross-branch, in my opinion, it introduces a lot of potential brittleness and edge cases that aren’t worth dealing with.
Best to keep it simple in this case and stick with one cache per branch. Be sure to specify that you only want to pull the cache in the policy – the build job doesn’t need to upload the cache again after running since nothing in it has changed.
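The other downstream jobs follow the same pattern. For example, a test job (hypothetical shape, assuming npm test is the project’s test command) would also pull the cached node_modules and never push them back:

test:
  stage: test
  cache:
    key: $CI_COMMIT_REF_SLUG-$CI_PROJECT_DIR
    paths:
      - node_modules/
    policy: pull          # download only; never re-upload the unchanged cache
  script:
    - npm test            # assumption: the project's test command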
Squeezing Out More Improvement
One last thing I want to touch on is selectively pulling artifacts. This is a very incremental improvement compared to the previous strategies but can still compound into nice gains depending on the size of your project and the number of stages in your pipeline.
In the example pipeline from above, the “build” job creates an artifact that is passed to downstream jobs. The downstream jobs, by default, download all upstream artifacts before starting the script you provide. However, you can specify which, if any, artifacts a given job actually needs!
For us, the only job that needs the build artifact is “dockerize,” so we can tell the remaining jobs not to download anything, thereby speeding them up even more:
test_deploy:
  stage: deploy
  script:
    - <deploy script that pushes a remote Docker image to Kubernetes>
  dependencies: []
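Conversely, the one job that does need the artifact can name its dependency explicitly instead of pulling everything from every upstream stage. A sketch of how dockerize might do that (the script is a placeholder):

dockerize:
  stage: dockerize
  dependencies:
    - build               # fetch only the build job's artifact
  script:
    - <docker build using the build/ directory, then push the image>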
This principle also applies to the cache – don’t pull it if you don’t need it!
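A deploy job, for instance, has no use for node_modules, so it can opt out of the cache entirely with an empty cache definition, on top of skipping artifacts. A sketch extending the job above:

test_deploy:
  stage: deploy
  cache: {}               # skip the cache download and upload entirely
  dependencies: []        # and skip artifact downloads, as above
  script:
    - <deploy script that pushes a remote Docker image to Kubernetes>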
Show Me the Numbers
This is all well and good in theory, but how about some real numbers to show a real picture? I took one of our projects at AddThis and gave it a CI facelift while writing this article. Here is the before and after:
Install dependencies
Before: ~3 minutes
After: 0*
*Note: For master, without any changes to dependencies. More often than not, this project does not have dependency changes.
Time to deploy to test, from merge to master until finish
Before: ~7 minutes
After: ~4 minutes
Improvement: 43%
Time to deploy to production, from button click until finish
Before: ~7 minutes
After: ~1 minute
Improvement: 86%
Bonus Numbers: Resource Usage
We maintain our own runners, so in addition to caring about speed as users, we care about resource usage as owners. The more efficiently projects use the runners, the fewer resources we need to support the same set of projects.
Before this facelift, two pipelines (one to test, one to prod) ran 11 jobs – Install Dependencies (2), Build (2), Test, Dockerize (2), Deploy (3), and Purge CDN (1).
Afterward, only 7 jobs run – Build, Test, Dockerize, Deploy (3), and Purge CDN. This is a 37% improvement in jobs run! We can now support almost 5 projects instead of 3 with the same amount of resources.
What does your pipeline look like?