- Simple Builds
- Caching libs
- What if I need to install some custom libs?
- What if I decide to add docker?
- What if I want to build and run something on docker?
GitHub Actions was a great addition to GitHub. It streamlines the development workflow by keeping continuous integration and deployment processes inside the project’s repository instead of in an external tool. Developers can now build, test, and deploy projects on Linux, macOS, and Windows within their GitHub repository. Like the usual CI/CD platforms, GitHub Actions also supports:
- Containers or virtual machines
- Multiple languages
- Multi-container apps
I have experimented with it in small projects and have no complaints: it’s clean, easy to set up, and practical. However, I have also tested it on bigger projects with complex build requirements. My use case was:
- I have a Python project
- I have a test suite to run
- I have a specific list of requirements (including Python and C libraries)
- I need a database
- I want tests to run on every commit
- I want tests to run fast
Let’s go through how GitHub Actions goes about satisfying these requirements for complex builds.
Simple Builds
Simple builds on GitHub Actions are sweet. This post does not cover how simple builds work, so if you are new to CI/CD or to what actions are, the links below will help you build an understanding of the topic:
- Continuous Integration: https://www.martinfowler.com/articles/continuousIntegration.html
- Continuous Deployment: https://www.atlassian.com/continuous-delivery/continuous-deployment
- GitHub Actions for Python projects: https://dan.yeaw.me/posts/github-actions-automate-your-python-development-workflow
This post extends the simple build described below:
    name: Build
    on:
      push:
        branches:
          - "*"
          - "****/****"
    jobs:
      test-linux:
        runs-on: ubuntu-latest
        strategy:
          matrix:
            python-version: [3.8.6]
        steps:
          - uses: actions/checkout@v2
          - name: Set up Python $
            uses: actions/setup-python@v2
            with:
              python-version: $
          - name: Install dependencies
            run: |
              pip install -r test-requirements.txt
          - run: |
              python3 -m pytest
All it does is run the tests on every push to every branch. This is about as simple as a build can be.
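For reference, a matrix value is read inside a step through an expression. A minimal sketch, assuming the matrix key `python-version` from the workflow above:

```yaml
# Fragment of the steps list; assumes strategy.matrix.python-version is defined on the job
- name: Set up Python ${{ matrix.python-version }}
  uses: actions/setup-python@v2
  with:
    # Resolves to each value listed under strategy.matrix.python-version
    python-version: ${{ matrix.python-version }}
```

With a single entry in the matrix the job runs once; adding more versions to the list would run the same steps once per version.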
Caching libs
The first improvement one can make is to cache dependencies. Since they don’t change very often, we can save time during builds by storing them for later use.
    - name: Cache install test-requirements
      uses: actions/cache@master
      id: cache-pip
      with:
        path: ~/.cache/pip
        key: $-pip-$
        restore-keys: |
          $-pip-
A few comments about how this works on GitHub Actions:
- It searches for a saved cache based on a key you pass
- If the action doesn’t find a cache, the requirements are installed
- The job proceeds to run all the steps
- If everything passes, the cache is saved for the next run
There are two details here. First, when using `~/.cache/pip` as the cache path, the job is not caching the library installation; it is making use of the pip cache and merely caching the wheels that pip installs from. Second, this means the cache only avoids a trip to the remote registry, while the libraries are still installed at run time.
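The lookup order described above can be sketched as follows. The key expressions are assumptions based on the common pattern of combining the runner OS with a hash of the requirements file, and the action version is simply a recent major tag; `test-requirements.txt` is the file used in the build above.

```yaml
- name: Cache pip downloads
  uses: actions/cache@v4
  id: cache-pip
  with:
    path: ~/.cache/pip
    # Assumed key: matches exactly only while the OS and the requirements file are unchanged
    key: ${{ runner.os }}-pip-${{ hashFiles('test-requirements.txt') }}
    # Fallback prefix: restores the most recent cache for this OS when the exact key misses
    restore-keys: |
      ${{ runner.os }}-pip-
- name: Install dependencies
  # pip still runs on every job, but reuses cached downloads instead of fetching them
  run: pip install -r test-requirements.txt
```

Hashing the requirements file means a changed dependency produces a new key, so a stale cache is only ever used as a starting point through the fallback prefix.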
A different approach is to cache the installed packages themselves and, by extension, the complete installation. In our case:
    - name: Cache install test-requirements
      uses: actions/cache@master
      id: cache-pip
      with:
        path: /opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/
        key: $-pip-$
        restore-keys: |
          $-pip-
*OBS: The path changes for each platform [1].
My simple build using only the pip cache took 14s to install all dependencies. After changing the cache to the packages folder, the procedure takes 3s to restore the cache and 3s to install the libraries (pip prints `Requirement already satisfied:` for every library).
Even though the speed improvement is substantial, this approach is still fragile [2] because it is sensitive to paths and OS versions. It also doesn’t scale well: as your requirements grow, this type of cache gets slower to save and restore, to the point that you will have to weigh which is faster, saving and restoring the cache or just using the regular pip cache. In my real-life example, caching the whole `site-packages` folder took around 20 min, while the installation using the pip cache took only 50s.
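One way to reduce the path sensitivity is to ask the interpreter for its own site-packages directory instead of hard-coding it. This is a sketch under assumptions, not something measured in this post: the step ids `site-packages` and `cache-site-packages` and the output name `dir` are made up for illustration, and the key follows the assumed pattern of runner OS plus requirements hash.

```yaml
- name: Locate site-packages
  id: site-packages
  shell: bash
  run: |
    # sysconfig reports where pure-Python packages are installed for the active interpreter
    echo "dir=$(python -c "import sysconfig; print(sysconfig.get_paths()['purelib'])")" >> "$GITHUB_OUTPUT"
- name: Cache installed packages
  uses: actions/cache@v4
  id: cache-site-packages
  with:
    path: ${{ steps.site-packages.outputs.dir }}
    key: ${{ runner.os }}-site-packages-${{ hashFiles('test-requirements.txt') }}
- name: Install dependencies
  # Skipped entirely when the exact key was restored
  if: steps.cache-site-packages.outputs.cache-hit != 'true'
  run: pip install -r test-requirements.txt
```

Because only site-packages is cached, entry-point scripts installed outside that folder are not restored; invoking tools as `python3 -m pytest`, as the build above does, avoids relying on them.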
What if I need to install some custom libs?
What happens if you need to compile and build a library on every job to run your tests? The answer is that you get a major slowdown in build times. The GitHub Actions cache lets us go further by keeping the compiled files and so speeding up the builds. As an example of such a library, I will use libpostal [3], a C library for parsing street addresses around the world:
    - name: Cache libpostal
      uses: actions/cache@v2
      id: libpostal-cache
      with:
        path: $ # custom variable
        key: $-libpostal-cache
    - name: Install libpostal
      if: steps.libpostal-cache.outputs.cache-hit != 'true'
      run: |
        sudo apt-get update
        sudo apt-get install curl autoconf automake libtool pkg-config
        mkdir -p $
        rm -rf $/libpostal
        cd $
        git clone https://github.com/openvenues/libpostal
        cd libpostal
        ./bootstrap.sh
        ./configure --datadir=$
        make -j4
        sudo make install
        sudo ldconfig
    - name: Install test-requirements
      run: |
        cd $/libpostal
        sudo make install
        sudo ldconfig $
        cd $PROJECT_ROOT # custom variable
        pip install -r test-requirements.txt
*OBS: CACHE_DIR and PROJECT_ROOT are custom variables, used only to make the example easier to read.
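As an illustration of how such variables could be defined, here is a sketch using a job-level `env` block. The values, the cache key, and the final step are assumptions for illustration, not the ones used in this build.

```yaml
jobs:
  test-linux:
    runs-on: ubuntu-latest
    env:
      # Hypothetical locations; any stable absolute paths would do
      CACHE_DIR: /home/runner/build-cache
      PROJECT_ROOT: ${{ github.workspace }}
    steps:
      - name: Cache libpostal
        uses: actions/cache@v4
        id: libpostal-cache
        with:
          # Inside expressions, job-level variables are read through the env context
          path: ${{ env.CACHE_DIR }}
          key: ${{ runner.os }}-libpostal-cache
      - name: Show where the build lives
        # Inside run scripts they are ordinary shell environment variables
        run: echo "libpostal is built under $CACHE_DIR/libpostal"
```

Defining the paths once at the job level keeps the cache step and the shell commands pointing at the same directory.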
GitHub Actions exposes the cache-hit result of each cache step, which lets us skip steps when the content is restored from the cache. This allows us to separate the slow part of building libpostal from the quick part, the slow part being the compilation. By caching the compilation result that lives inside the libpostal folder, we always restore the compiled files, so the installation can resume with `sudo make install && sudo ldconfig`, which runs quickly.
What if I decide to add docker?
Docker is extremely useful in most situations. In this case, I’m using it to set up a database for the tests. Docker Compose is a great tool for spinning up these on-demand containers, and we can use it directly by adding:
    - run: |
        docker-compose up --detach postgres
        python3 -m pytest
        docker-compose down
For the code above to work you will need a docker-compose.yml file with a Postgres service specification like this:
    postgres:
      image: postgres:latest
      ports:
        - "5432:5432"
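A fuller sketch of such a file might look like the one below. Two things are assumptions added for illustration: the official `postgres` image needs `POSTGRES_PASSWORD` (or an explicit trust setting) to start, so a throwaway password is set, and a healthcheck based on `pg_isready` gives the job a way to tell when the database accepts connections. The password value and the timing numbers are placeholders.

```yaml
# docker-compose.yml
services:
  postgres:
    image: postgres:latest
    environment:
      # Throwaway credential for CI only (placeholder value)
      POSTGRES_PASSWORD: postgres
    ports:
      - "5432:5432"
    healthcheck:
      # pg_isready ships with the image and exits 0 once the server accepts connections
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 2s
      timeout: 5s
      retries: 15
```

Assuming Docker Compose v2 is available on the runner, `docker compose up --detach --wait postgres` would block until the healthcheck passes, so the tests do not start against a database that is still booting.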
What if I want to build and run something on docker?
Building and running images is a good fit for occasional jobs, but for frequent builds (such as test suites) it may turn out to be a major slowdown. Depending on your Dockerfile, builds may take a long time, and the Docker setup time alone already makes Docker jobs slower than local jobs (if you are running tests, for example). One speedup for complex setups is to pull a pre-built image from Docker Hub or ECR; however, depending on the size of your image, you will also have to cope with the time to pull it on every run.
If you must build and run containers on demand, you will have to cope with uncached builds, since Docker Layer Caching is not natively integrated with GitHub Actions. Some workarounds exist, but they are case-specific [4]. Some actions on the marketplace claim to provide Docker layer caching for GitHub Actions, but they are not reliable for larger projects with frequent builds.
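One workaround of the kind mentioned above is to let BuildKit store layers in the GitHub Actions cache through the `docker/build-push-action` action. This is a sketch under assumptions and was not part of the experiments in this post: the image tag `myapp:ci` is hypothetical, the action versions are simply recent major tags, and how much time it saves depends on the Dockerfile and the cache size.

```yaml
- uses: actions/checkout@v4
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3
- name: Build image with layer cache
  uses: docker/build-push-action@v5
  with:
    context: .
    # Keep the image in the local Docker engine so later steps can run it
    load: true
    tags: myapp:ci
    # Read and write layers through the GitHub Actions cache backend
    cache-from: type=gha
    cache-to: type=gha,mode=max
- name: Run tests in the container
  run: docker run --rm myapp:ci python3 -m pytest
```

Whether this beats pulling a pre-built image is worth measuring per project, since restoring a large layer cache also takes time.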
In this post, we discussed some ways to keep complex builds fast by using GitHub Actions features: caching Python libraries, caching compiled libraries, multi-container apps, and on-demand Docker builds and runs. GitHub Actions is a great tool, on par with other CI/CD tools on the market, except for Docker Layer Caching, which is not natively supported.