CI: Thinning iOS Build Artifacts

1. Introduction

1.1. Pipeline Time Improvement

As engineers, we always want to land our changes on master as quickly as possible. Apart from the time it takes to resolve code review comments from peers, there’s one constraint engineers have to face before getting their changes merged: the CI pipeline time (i.e. the time it takes for a CI pipeline to run against a given change).

For iOS development, a typical CI pre-merge pipeline usually involves building the project, then running unit tests and UI tests. Therefore, the engineering work to reduce pipeline time can be broken down into 2 major problems: build time improvement and test time improvement. While build time improvement is a classic problem tackled by many initiatives (mostly driven by the community), there is still a lot of room for test time improvement.

1.2. Separation of Build and Test Jobs in iOS

I once shared a tip to utilize test parallelism within a single CI job (see: Tackling UI tests execution time imbalance for Xcode parallel testing). Parallelism among CI jobs can be achieved by splitting tests into smaller sets and running them in multiple CI jobs. The pipeline we want has a build job followed by multiple test jobs that run in parallel.

However, the separation of build and test jobs in iOS is not that straightforward because of 2 prominent factors:

(F1) It’s not feasible to obtain the list of tests we have at compile time. We can only use heuristic approaches to extract such info.

(F2) Running unit/UI tests requires some build products from the build job. This is a hard constraint for platforms with statically-typed languages. What makes it harder for iOS development is that the iOS build artifacts are relatively big. Unfortunately, the size of artifacts we pass from one job to another is constrained by the size limit set by the CI infra.
The size of an iOS project grows in proportion not only to the number of lines of code but also to the number of targets we have.

  • As of Oct 2020, there are 340 targets in our project, including 323 pod targets (a.k.a. framework targets) and 17 executable targets (1 app target + 16 test targets). The artifacts of our project total around 3.6GB, which exceeds the size limit.

This post introduces a tip to overcome the artifacts size constraint in (F2).

2. A Closer Look at iOS Build Artifacts

Now, let’s take a look at a project that has both hosted test targets and non-hosted test targets. Assume we’re using CocoaPods to manage dependencies in the project. When the project is built, the build products folder has the following structure.

|-- App.app / -- App (*)
|           |
|           |-- Frameworks / -- DynamicFW_A.framework
|           |               |-- DynamicFW_B.framework / -- DynamicFW_B.bundle
|           |
|           |-- Plugins / -- HostedTestTarget.xctest
|           |
|           |-- StaticFW_C.bundle
|
|-- NonHostedTestTarget.xctest

The .app bundles and .xctest bundles are both executable bundles. They have the same folder structure. Inside such a bundle:

  • The Frameworks folder contains the .framework bundles of frameworks built as dynamic. Note that some system dynamic libraries such as libswiftCore.dylib, libswiftFoundation.dylib… also reside in this folder.
  • The Plugins folder contains the .xctest bundles of the app’s hosted test targets (i.e. test targets that use this app as their host app). Meanwhile, .xctest bundles of non-hosted test targets reside in the same folder as the .app bundle.
  • The resources and resource bundles (in the form of .bundle packages) of static frameworks are located at the root of the executable bundles.

3. Duplicated Contents

3.1. Dynamic Frameworks

From the structure above, the duplication is easy to spot: if a dynamic framework is used in both the app and a test target, that framework exists in 2 places:

  • Under the Frameworks folder of the .app bundle
  • Under the Frameworks folder of the .xctest bundle.

In general, if a framework is used in N targets, it appears N times in the executable bundles. If we check the checksums of the frameworks in those executable bundles, they are all identical.
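The checksum comparison above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `bundle_checksum` and `find_duplicates` are hypothetical, SHA-256 is an arbitrary choice of hash, and a bundle's checksum is taken over the relative paths and contents of its regular files (symlinks are not treated specially).

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def bundle_checksum(bundle: Path) -> str:
    """Hash the relative paths and contents of all files in a bundle, in sorted order."""
    digest = hashlib.sha256()
    for path in sorted(p for p in bundle.rglob("*") if p.is_file()):
        digest.update(path.relative_to(bundle).as_posix().encode() + b"\0")
        digest.update(path.read_bytes() + b"\0")
    return digest.hexdigest()


def find_duplicates(products_dir: Path) -> dict[str, list[Path]]:
    """Group .framework bundles by checksum and keep groups with more than one copy."""
    groups = defaultdict(list)
    for framework in sorted(products_dir.rglob("*.framework")):
        if framework.is_dir():
            groups[bundle_checksum(framework)].append(framework)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Calling `find_duplicates(Path("Debug-iphonesimulator"))` on a build products folder (the folder name here is just an example) would return one group per framework that is embedded in more than one executable bundle.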

To understand why this happens, we can take a look at how CocoaPods integrates frameworks into the project.

For dynamic frameworks added to a target, CocoaPods adds a build phase called [CP] Embed Pods Frameworks at the end of the target build phases.

This build phase executes a script "${PODS_ROOT}/Target Support Files/Pods-App/Pods-App-frameworks.sh" to copy all dynamic frameworks (managed by CocoaPods) belonging to the target to the Frameworks folder inside the executable bundle of that target. Those frameworks are copied from the framework build products, which are located in the same folder as the executable bundle.

Debug-iphonesimulator / -- App.app / -- Frameworks / -- Dynamic_A.framework
                       |
                       |-- Dynamic_A / -- Dynamic_A.framework 👈 👈 👈

One thing worth mentioning is that the checksum of the framework in the framework build products (e.g. Debug-iphonesimulator/Dynamic_A) is different from the one in the Frameworks folder of the app bundle. This is because CocoaPods strips some unnecessary content from frameworks while copying them to the app bundles. The stripped content includes the Headers, PrivateHeaders and Modules folders inside the .framework bundle, among others.

install_framework()
{
  ...
  rsync --delete -av "${RSYNC_PROTECT_TMP_FILES[@]}" --links --filter "- CVS/" --filter "- .svn/" --filter "- .git/" --filter "- .hg/" --filter "- Headers" --filter "- PrivateHeaders" --filter "- Modules" "${source}" "${destination}"
  ...
}

3.2. Resources/Resource Bundles of Static Frameworks

For pods integrated as static frameworks, their resources and resource bundles are copied to the executable bundles. These contents get duplicated the same way dynamic frameworks do.

App.app / -- App (*)
         |
         |-- StaticFW_C.bundle
         |
         |-- ResourcesOfStaticFW_D / -- an_image.png

4. Reducing Artifacts Based on Duplications

Based on the observations above, we can thin the artifacts by moving the bundle contents into a storage folder, where each bundle is identified by its checksum and stored only once.

To restore a bundle later, we need a mapping from the original location of each bundle to its place in the storage. This way, after thinning the artifacts, we can easily recover them to their original state with the contents unchanged.

storage/ -- <hash_a>-A.framework
        |-- <hash_b>-B.framework
        |-- <hash_c>-C.framework
        |-- ...
        |-- storage.json # Contains the mapping

With this storage mapping solution, there are 2 additional steps running on CI:

  • In the build job, run the optimize step at the end of the job.
    • In this step, we look for contents under the build products folder that match certain patterns such as **/*.framework, **/*.bundle and **/*.dylib. We move each item to the storage (if it does not already exist there), then add it to the mapping.
  • In the test job, run the recover step at the beginning of the job.
    • In this step, we copy each item in the storage back to its original places based on the mapping.
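The two steps above can be sketched roughly as follows. This is a minimal illustration, not the actual tool: the function names `optimize` and `recover`, the layout of `storage.json` (original relative path mapped to storage entry name), and the SHA-256 checksum are all assumptions. It also assumes the storage folder lives outside the build products folder, and it only moves the outermost match, so a .bundle nested inside a .framework travels with its framework.

```python
import hashlib
import json
import shutil
from pathlib import Path

PATTERNS = ("*.framework", "*.bundle", "*.dylib")
MAPPING_FILE = "storage.json"


def checksum(item: Path) -> str:
    """Hash a single file, or the relative paths and contents of a bundle's files."""
    digest = hashlib.sha256()
    files = [item] if item.is_file() else sorted(p for p in item.rglob("*") if p.is_file())
    for path in files:
        digest.update(path.relative_to(item.parent).as_posix().encode() + b"\0")
        digest.update(path.read_bytes() + b"\0")
    return digest.hexdigest()


def optimize(products_dir: Path, storage_dir: Path) -> dict[str, str]:
    """Move matching items into the storage, keep one copy per checksum, write the mapping."""
    storage_dir.mkdir(parents=True, exist_ok=True)
    # Shallowest paths first, so outer bundles are handled before anything nested in them.
    candidates = sorted(
        (p for pattern in PATTERNS for p in products_dir.rglob(pattern)),
        key=lambda p: len(p.parts),
    )
    mapping, handled = {}, set()
    for item in candidates:
        if any(parent in handled for parent in item.parents):
            continue  # nested inside an item that was already moved or removed
        entry = f"{checksum(item)}-{item.name}"
        target = storage_dir / entry
        if target.exists():
            # An identical copy is already stored, so this one is a duplicate.
            if item.is_dir():
                shutil.rmtree(item)
            else:
                item.unlink()
        else:
            shutil.move(str(item), str(target))
        mapping[item.relative_to(products_dir).as_posix()] = entry
        handled.add(item)
    (storage_dir / MAPPING_FILE).write_text(json.dumps(mapping, indent=2))
    return mapping


def recover(products_dir: Path, storage_dir: Path) -> None:
    """Copy every stored item back to each of its original locations."""
    mapping = json.loads((storage_dir / MAPPING_FILE).read_text())
    for original, entry in mapping.items():
        source, destination = storage_dir / entry, products_dir / original
        destination.parent.mkdir(parents=True, exist_ok=True)
        if source.is_dir():
            shutil.copytree(source, destination, symlinks=True)
        else:
            shutil.copy2(source, destination)
```

In this sketch, the build job would call `optimize(products_dir, storage_dir)` before uploading both folders as artifacts, and each test job would call `recover(products_dir, storage_dir)` after downloading them. Copying (rather than moving) in the recover step is what allows one stored item to be restored to several original locations.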

5. Discussion

First, removing duplications this way is nothing new, and it is not specific to iOS development. It can be applied to any project. It’s just that the problem becomes more noticeable with iOS projects because of the way Xcode structures the build products.

Second, the number of dynamic frameworks in the project plays an important role in the performance of this solution. Normally, inside a .framework bundle, e.g. A.framework, the framework binary A.framework/A takes up most of the space. However, in the case of static frameworks, their binaries are merged into the executable binaries during the linking step (done by the ld linker). That means we cannot reduce much for static frameworks except their resources and resource bundles.

  • About a year ago, we maintained a dual linking strategy: when building for testing, we turned all frameworks into dynamic frameworks (with a few exceptions); otherwise, we made them static. Making the majority of them dynamic was a workaround for a code coverage issue, which has since been resolved. Back then, this technique proved to be powerful: the optimize step reduced the artifacts from around 3GB to only 800MB. However, after shifting from dynamic frameworks to static frameworks, we have seen a decrease in the thinning performance. At the moment, the artifacts are thinned from 3.6GB to 2.1GB after the optimize step.
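To compare the two setups on the same scale, the figures above can be turned into relative reductions. The snippet below is plain arithmetic on the numbers quoted in this post, assuming 1GB = 1000MB and reading "around 3GB" as 3000MB; the function name is only for illustration.

```python
def reduction_percent(before_mb: float, after_mb: float) -> float:
    """Share of the artifacts size removed by the optimize step, in percent."""
    return (before_mb - after_mb) / before_mb * 100


# Figures quoted above, in MB (assuming 1GB = 1000MB).
mostly_dynamic = reduction_percent(3000, 800)
mostly_static = reduction_percent(3600, 2100)

print(f"mostly dynamic: {mostly_dynamic:.0f}%, mostly static: {mostly_static:.0f}%")
# prints "mostly dynamic: 73%, mostly static: 42%"
```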

In some cases, we observe 2 frameworks with the same name but having different checksums in the storage.

storage / -- <hash_1>-A.framework
         |-- <hash_2>-A.framework

This sometimes happens when you declare a pod in different forms in different targets. For example, one target uses the pod with subspec A/Child_1 while another target uses A/Child_2. It can also happen when you use a different dependency manager that strips framework bundles differently. In such cases, you can run another round of optimization on the storage.

Last, does this work if we don’t use CocoaPods as the dependency manager in the project? Actually, what other dependency managers do is similar in essence. This tip relies on the duplications (in the structure of executable bundles) that appear as we add more targets to the project. It should be general and independent of which dependency manager we’re using.