I was working with Tekton Pipelines recently & thought about the best way to attach a cache for build dependencies. In a cloud native setting, file storage can be a little tricky to work with, especially with PVCs only being attachable to one Pod at a time and some providers taking ages to move claims between nodes. I wanted a solution that was scalable and made sense, even for use outside of a pipeline.

Approach

Searching for how other people approached this, I came across this fun solution by Chmouel, in which they use a small uploader service that stores tarballs of Go dependencies, which are retrieved & uploaded on every run. This got me thinking: what if I just used container images as dependency caches? This would work similarly to how layers are cached in container builds, but in a way that can be reused across separate pipeline runs.

I set the following conditions to make this realistic & usable in production:

  • Containerfiles should still work standalone without the cache image by default
  • The cache images shouldn’t make the resulting image bigger

The diagram below illustrates the general idea of what I came up with. Why have just build tools in the builder image, when it could hold the dependencies too?

flowchart TD
    subgraph Application[Application Source]
    code(Code)
    pm(Package Manifests)
    end

    subgraph ir[Image Registry]
    cimg(Cache Image)
    rs(Resulting Image)
    end

    runtimestage --> rs
    ccf --> cimg

    cimg --> fcc

    subgraph ccf[Cache Containerfile]
    fbt[FROM build-tools] --> cpm
    cpm[COPY package-manifest] --> rid
    rid[RUN install dependencies]
    end

    subgraph bcf[Build Containerfile]

    subgraph buildstage[Build Stage]
    fcc[FROM cache-image] --> ca
    ca[COPY application] --> rid2
    rid2[RUN check dependencies] --> rub
    rub[RUN build]
    end

    buildstage ----> runtimestage

    subgraph runtimestage[Runtime Stage]
    caf[COPY artifact] --> epa
    epa[ENTRYPOINT artifact]
    end
    end

    ca --> Application
    cpm --> Application

Examples

All of the following examples can be found in this repository: https://github.com/konstfish/container-image-build-cache

Java

I set up a Spring app with an unnecessary number of dependencies (Java Moment) & created two Dockerfiles. Dockerfile.cache boils down to setting maven as the base image, copying in the pom & then running mvn dependency:go-offline. This results in a simple (but quite big) image containing both build tools and dependencies. The best place to store this, imo, would be some internal registry, either a local one or the internal OpenShift registry.
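
A rough sketch of what that cache Dockerfile could look like (the untagged maven base and the paths are my assumptions, not copied from the repo):

    # Dockerfile.cache - build tools plus pre-fetched dependencies
    FROM maven
    WORKDIR /app
    # Copy only the package manifest, so the image only changes when dependencies do
    COPY pom.xml .
    # Resolve & download everything into the local Maven repository baked into the image
    RUN mvn dependency:go-offline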

Moving on, the Dockerfile is more or less just a boilerplate multi-stage Java build, which ends up with a pretty minimal resulting image containing only the build artifact jars and OpenJDK. Two build arguments specify the build stage image and tag; they default to the same base (maven) as the cache Dockerfile. This ensures that the image will still build for anyone, even without the cache image. Maven also checks/updates packages before the build, in case the pom.xml has changed since the last build of the cache image.
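
Sketched out, it could look roughly like this; the stage name, runtime tag and artifact path are assumptions, but the BUILD_IMAGE/BUILD_TAG defaults mirror the cache Dockerfile’s base so it builds standalone:

    # Dockerfile - multi-stage build that can optionally start from the cache image
    ARG BUILD_IMAGE=maven
    ARG BUILD_TAG=latest

    FROM ${BUILD_IMAGE}:${BUILD_TAG} AS builder
    WORKDIR /app
    COPY . .
    # Re-check dependencies; cheap when the cache image already matches pom.xml
    RUN mvn dependency:go-offline
    RUN mvn package -DskipTests

    FROM eclipse-temurin:21-jre
    # Artifact name is an assumption; it depends on the pom
    COPY --from=builder /app/target/app.jar /app/app.jar
    ENTRYPOINT ["java", "-jar", "/app/app.jar"]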

Building the cache with docker build --no-cache -f Dockerfile.cache -t bloat-cache . takes about 27 seconds. A regular build with docker build --no-cache -t bloat . takes 36 seconds. Running the same build using the cache image as a base takes only 7 seconds: docker build --no-cache --build-arg="BUILD_IMAGE=bloat-cache" --build-arg="BUILD_TAG=latest" -t bloat .

NodeJS

For this demo I used a Vue app with yarn. The same principles still apply: a cache Dockerfile which fetches node_modules & a build Dockerfile that checks dependencies, builds a dist & copies it into an nginx runtime image. Build times dropped by 16 seconds.
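
Both Dockerfiles, sketched together for brevity (image tags, file names and the yarn flags are assumptions on my part):

    # Dockerfile.cache - node base image with node_modules pre-installed
    FROM node
    WORKDIR /app
    COPY package.json yarn.lock ./
    RUN yarn install --frozen-lockfile

    # Dockerfile - build on top of the cache image, serve the dist via nginx
    ARG BUILD_IMAGE=node
    ARG BUILD_TAG=latest
    FROM ${BUILD_IMAGE}:${BUILD_TAG} AS builder
    WORKDIR /app
    COPY . .
    # Cheap when node_modules from the cache image already matches yarn.lock
    RUN yarn install --frozen-lockfile
    RUN yarn build

    FROM nginx:alpine
    COPY --from=builder /app/dist /usr/share/nginx/html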

Go

Wanted a challenge with this one, so I opted to make a Go Workspace with a generic Dockerfile. Since go.work.sum still lists all required dependencies, a common cache image also works well for this structure. Dockerfile.cache unfortunately copies in the entire working directory, since I didn’t find a dynamic way to include all component go.mod files. The main difference in the Dockerfile is the additional build argument COMPONENT, which specifies which component to build & package into the resulting image. This means the same Dockerfile can be used for all applications in the Go Workspace. Build times went from 31s without the cache to 8s with it. Go also had the smallest resulting images since I’m pretty much just shipping alpine with a binary!
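
As a sketch, assuming each component lives in a directory named after it (paths, tags and the CGO_ENABLED flag are my additions):

    # Dockerfile.cache - copies the whole workspace to pick up every go.mod
    FROM golang
    WORKDIR /workspace
    COPY . .
    # Download dependencies for the workspace modules (go.work at the root)
    RUN go mod download

    # Dockerfile - generic build, parameterized by COMPONENT
    ARG BUILD_IMAGE=golang
    ARG BUILD_TAG=latest
    FROM ${BUILD_IMAGE}:${BUILD_TAG} AS builder
    ARG COMPONENT
    WORKDIR /workspace
    COPY . .
    RUN go mod download
    # Static binary so it runs on alpine without glibc
    RUN CGO_ENABLED=0 go build -o /app ./${COMPONENT}

    FROM alpine
    COPY --from=builder /app /app
    ENTRYPOINT ["/app"]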

Additional Notes

If you don’t want to use a separate cache Dockerfile, you can achieve the same result by tagging the build stage of the main Dockerfile as the cache image. This can be done with docker build --target builder --no-cache -t bloat-cache ., with --target pointing at the build stage.
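
The only requirement is that the build stage has a name for --target to point at, as in the earlier sketches:

    # "builder" is the stage name --target refers to:
    #   docker build --target builder --no-cache -t bloat-cache .
    FROM ${BUILD_IMAGE}:${BUILD_TAG} AS builder
    # ...rest of the build stage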

Summary

In practice, I build these cache images on a 0 0 * * * crontab & it has been working pretty well for me, with builds faster across the board. Leveraging build stages ensures the cache doesn’t add to the size of the output images. Checking dependencies during the build stage keeps everything up to date without taking a hit to speed.

Thank you for reading! ⛴️ Happy Containerizing! :)
