I was working with Tekton Pipelines recently & thought about the best way to attach a cache for build dependencies. In a cloud native setting, file storage can be a little tricky to work with, especially with PVCs only being attachable to one Pod at a time and some providers taking ages to move claims between nodes. I wanted a solution that was scalable and made sense, even for use outside of a pipeline.
Approach
Searching for how other people approached this, I came across this fun solution by Chmouel, in which they use a small uploader service that stores tarballs of Go dependencies, which are retrieved & uploaded on every run. This got me thinking: what if I just used container images as dependency caches? This would work similarly to how layers in container builds are cached, but would be reusable across unique pipeline runs.
I set the following conditions to make this realistic & usable in production:
- Containerfiles should still work standalone without the cache image by default
- The cache images shouldn’t make the resulting image bigger
The diagram below illustrates the general idea of what I came up with: why just have build tools in the builder image, when it could hold the dependencies too?
```mermaid
flowchart TD
  subgraph Application[Application Source]
    code(Code)
    pm(Package Manifests)
  end

  subgraph ir[Image Registry]
    cimg(Cache Image)
    rs(Resulting Image)
  end

  runtimestage --> rs
  ccf --> cimg
  cimg --> fcc

  subgraph ccf[Cache Containerfile]
    fbt[FROM build-tools] --> cpm
    cpm[COPY package-manifest] --> rid
    rid[RUN install dependencies]
  end

  subgraph bcf[Build Containerfile]
    subgraph buildstage[Build Stage]
      fcc[FROM cache-image] --> ca
      ca[COPY application] --> rid2
      rid2[RUN check dependencies] --> rub
      rub[RUN build]
    end

    buildstage ----> runtimestage

    subgraph runtimestage[Runtime Stage]
      caf[COPY artifact] --> epa
      epa[ENTRYPOINT artifact]
    end
  end

  ca --> Application
  cpm --> Application
```
Examples
All of the following examples can be found in this repository: https://github.com/konstfish/container-image-build-cache
Java
I set up a Spring app with an unnecessary amount of dependencies (Java Moment) & created two Dockerfiles. `Dockerfile.cache` boils down to me setting Maven as the base image, copying in the pom & then running `mvn dependency:go-offline`. This results in a simple (but quite big) image containing both build tools and dependencies. The best place to store this imo would be some internal registry, either a local one or the internal OpenShift registry.
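To make that concrete, here is a minimal sketch of what such a cache Dockerfile could look like. The Maven tag, working directory and paths are my assumptions; the actual file lives in the linked repository.

```Dockerfile
# Dockerfile.cache (sketch) - build tools plus pre-fetched dependencies
# Base image tag and paths are assumptions, not the exact file from the repo
FROM maven:3-eclipse-temurin-17

WORKDIR /build

# Only the package manifest is needed to resolve dependencies
COPY pom.xml .

# Download everything into the image's local Maven repository
RUN mvn dependency:go-offline
```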
Moving on, the `Dockerfile` is more or less just a boilerplate multi-stage Java build, which ends up with a pretty minimal resulting image containing only the build artifact jars and OpenJDK. Two build arguments are passed, specifying the build-stage image and tag, which default to the same base (Maven) as the cache Dockerfile. This ensures that the image will still build for anyone, even without the cache image. Maven also checks/updates packages before the build, should the `pom.xml` have changed since the last build of the cache image.
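The multi-stage build Dockerfile could then look roughly like the sketch below. The default tags, jar name and runtime image are assumptions, but the shape mirrors what is described above: build args for the build-stage image, a dependency check, and a slim runtime stage.

```Dockerfile
# Dockerfile (sketch) - defaults keep it buildable without the cache image
ARG BUILD_IMAGE=maven
ARG BUILD_TAG=3-eclipse-temurin-17

FROM ${BUILD_IMAGE}:${BUILD_TAG} AS builder
WORKDIR /build
COPY pom.xml .
# Near no-op when the cache image already holds all dependencies,
# otherwise fetches whatever changed in pom.xml
RUN mvn dependency:go-offline
COPY src ./src
RUN mvn package -DskipTests

FROM eclipse-temurin:17-jre
WORKDIR /app
COPY --from=builder /build/target/*.jar app.jar
ENTRYPOINT ["java", "-jar", "app.jar"]
```

Passing `BUILD_IMAGE`/`BUILD_TAG` pointing at the cache image swaps the heavy dependency download for a cache hit, while the defaults keep the file self-contained.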
Building the cache with `docker build --no-cache -f Dockerfile.cache -t bloat-cache .` takes about 27 seconds. Starting a build with `docker build --no-cache -t bloat .` takes 36 seconds. Running the same build using the cache image as a base takes only 7 seconds: `docker build --no-cache --build-arg="BUILD_IMAGE=bloat-cache" --build-arg="BUILD_TAG=latest" -t bloat .`
NodeJS
For this demo I used a Vue app with yarn. The same principles still apply: a cache `Dockerfile` which fetches `node_modules`, & a build `Dockerfile` to check dependencies, build a dist & copy it into an nginx runtime image. Build times went down by 16 seconds.
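For reference, the cache image for the Node example might look something like this; the node tag and lockfile handling are assumptions on my part.

```Dockerfile
# Dockerfile.cache (sketch) for the Vue/yarn example
FROM node:20-alpine

WORKDIR /build

# Manifests only, so the layer is rebuilt just when dependencies change
COPY package.json yarn.lock ./

# Frozen lockfile keeps the cached node_modules reproducible
RUN yarn install --frozen-lockfile
```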
Go
I wanted a challenge with this one, so I opted to make a Go Workspace with a generic Dockerfile. Since `go.work.sum` still lists all required dependencies, a common cache image also works well for this structure. Unfortunately, `Dockerfile.cache` copies in the entire working directory, since I didn't find a dynamic way to include all component `go.mod` files. The main difference in the build `Dockerfile` is the additional build argument `COMPONENT`, which specifies which component to build & package into the resulting image. This means the same Dockerfile can be used for all applications in the Go Workspace. Build times went from 31s without the cache to 8s with it. Go also had the smallest resulting images since I'm pretty much just shipping alpine with a binary!
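A rough sketch of what that generic build Dockerfile could look like is below. The image tags, output paths and the way `COMPONENT` maps to a directory are assumptions; the real file is in the repository.

```Dockerfile
# Dockerfile (sketch) for the Go workspace - one file for every component
ARG BUILD_IMAGE=golang
ARG BUILD_TAG=1.22-alpine

FROM ${BUILD_IMAGE}:${BUILD_TAG} AS builder
# COMPONENT selects which workspace module/package to build (assumed to be a path)
ARG COMPONENT
ENV CGO_ENABLED=0
WORKDIR /build
# The whole workspace is copied, mirroring the cache Dockerfile
COPY . .
# Near no-op when the cache image already has the modules downloaded
RUN go mod download
RUN go build -o /out/app ./${COMPONENT}

FROM alpine:3
COPY --from=builder /out/app /app
ENTRYPOINT ["/app"]
```

Building a hypothetical component `foo` then becomes `docker build --build-arg COMPONENT=foo -t foo .`, with the cache image swapped in via the same `BUILD_IMAGE`/`BUILD_TAG` arguments as before.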
Additional Notes
If you don't want to use a separate cache Dockerfile, you can achieve the same results by tagging the build stage of the main Dockerfile as the cache image. This can be done using `docker build --target builder --no-cache -t bloat-cache .`; note the target being set to the build stage.
Summary
In practice, I build these cache images on a `0 0 * * *` crontab & it has been working pretty well for me, with builds looking faster across the board. Leveraging build stages ensures the cache doesn't add any size to the output images. Checking dependencies during the build stage ensures everything is up to date without taking a hit to speed.
Thank you for reading! ⛴️ Happy Containerizing! :)