WebDriverAgent - The Heart of iOS E2E Testing

In the previous post, we explored the overview of E2E testing in mobile development.

Today, we’re going to dive deeper into how E2E testing with Appium works on iOS.

1. Introduction to WebDriverAgent (WDA)

1.1. Running Your First Test With Appium

Here’s a simple code snippet to start your tests. For simplicity, we’ll run tests on simulators.

from appium.webdriver.webdriver import WebDriver
from appium.options.ios import XCUITestOptions

# Create a WebDriver
options = XCUITestOptions()
wd = WebDriver('http://127.0.0.1:4723', options=options)

# Activate the app
wd.activate_app('org.wikimedia.wikipedia')

# Find elements and click them
wd.find_element(by='accessibility id', value='Settings').click()
wd.find_element(by='accessibility id', value='Close').click()

In this test, we create a WebDriver with the remote URL set to http://127.0.0.1:4723, the default address of a local Appium server (4723 is the default port). The Appium server can be started using the appium or appium server command.

With this WebDriver, we can perform various actions, such as activating (launching) the app, or finding a UI element and clicking it. These actions are reflected on the simulator.

1.2. Examining Appium Logs

Looking at the relevant Appium logs, we notice that most requests to Appium are forwarded to another server on port 8100. For example, for the find_element call with accessibility id “Settings”, Appium sent a request to http://127.0.0.1:8100/session/86A3C426-6434-4CF1-9264-B022ACCE2A9D/element, transformed the response, and returned it to the client.

The server on port 8100 is, in fact, a WebDriver server, a critical component for iOS E2E testing.
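To make the forwarding concrete, here is a minimal sketch of the kind of request that gets relayed to port 8100 for the find_element call above. The helper name build_find_element_request is hypothetical, and the JSON body assumes the W3C WebDriver "Find Element" shape (using/value); the exact body Appium sends may carry more detail. The request is only built here, not sent.

```python
import json
from urllib.request import Request

WDA_URL = "http://127.0.0.1:8100"  # WDA address seen in the Appium logs


def build_find_element_request(session_id: str, strategy: str, selector: str) -> Request:
    """Build (but do not send) a W3C-style 'Find Element' request for WDA."""
    body = json.dumps({"using": strategy, "value": selector}).encode("utf-8")
    return Request(
        url=f"{WDA_URL}/session/{session_id}/element",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_find_element_request(
    "86A3C426-6434-4CF1-9264-B022ACCE2A9D", "accessibility id", "Settings"
)
print(request.get_method(), request.full_url)
# Sending it requires a live WDA session with that ID, for example:
#   from urllib.request import urlopen
#   print(urlopen(request).read())
```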

1.3. WebDriverAgent (WDA)

When running tests with Appium, you will notice another app called WebDriverAgentRunner installed alongside your app under test (AUT). It is sometimes called WebDriverAgent, or WDA for short. This app is a WebDriver server implementation that allows you to launch and kill apps, tap and scroll views, or confirm that a view is present on screen.

When running UI tests from Xcode, you also see an additional app, AppUITests-Runner. This UITests-Runner app is very similar to WDA in the sense that both use the XCTest framework and Apple’s APIs to execute commands on a device. The main difference is that WDA starts a server that follows the WebDriver spec.

You can build WDA from https://github.com/appium/webdriveragent, or from the project under ~/.appium/node_modules/appium-xcuitest-driver/node_modules/appium-webdriveragent (if you already installed Appium with the xcuitest driver), or simply by running appium driver run xcuitest build-wda --name <SimulatorName>.

Upon launch, WDA exposes two ports: 8100 and 9100.

Port 8100 is used for processing WebDriver commands (queries and actions). You can quickly check the status at http://localhost:8100/status.

$ curl http://localhost:8100/status

{
  "value" : {
    "build" : {
      "version" : "8.12.0",
      "time" : "Jan  1 2025 10:35:42",
      "productBundleIdentifier" : "com.facebook.WebDriverAgentRunner"
    },
    "os" : {
      "testmanagerdVersion" : 65535,
      "name" : "iOS",
      "sdkVersion" : "18.0",
      "version" : "18.1"
    },
    "device" : "iphone",
    "ios" : {
      "simulatorVersion" : "18.1",
      "ip" : "192.168.1.3"
    },
    "message" : "WebDriverAgent is ready to accept commands",
    "state" : "success",
    "ready" : true
  },
  "sessionId" : null
}
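The same readiness check can be scripted, which is handy when something needs to wait for WDA before starting tests. The sketch below assumes the response shape shown above (a "ready" flag inside "value"); the function names and the timeout values are illustrative choices, not part of WDA.

```python
import json
import time
from urllib.request import urlopen


def is_wda_ready(status: dict) -> bool:
    """Return True when a /status payload reports that WDA accepts commands."""
    value = status.get("value") or {}
    return bool(value.get("ready", False))


def wait_for_wda(base_url: str = "http://localhost:8100", timeout: float = 30.0) -> bool:
    """Poll <base_url>/status until WDA is ready or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urlopen(f"{base_url}/status", timeout=2) as response:
                if is_wda_ready(json.load(response)):
                    return True
        except (OSError, ValueError):
            pass  # WDA is not up yet, or returned something that is not JSON
        time.sleep(0.5)
    return False


# Offline check against a trimmed copy of the payload shown above.
sample = {"value": {"ready": True, "state": "success"}, "sessionId": None}
print(is_wda_ready(sample))
```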

Meanwhile, port 9100 is used for streaming the screen via MJPEG. You can open http://localhost:9100 in a web browser to see the phone’s screen mirrored.

Ports 8100 and 9100 in this example are sometimes called the WDA port and the MJPEG port, respectively.

1.4. WDA on Simulators and Devices

Note that the server on ports 8100/9100 is running on the device. You can verify the WDA status by opening http://localhost:8100/status in Safari on the phone.

For simulators, since they share the same network as the host machine, you will see those ports occupied on the computer once WDA is launched.

When WDA is launched on a physical device, the device’s network is separate from the computer’s. As a result, port forwarding is required to map the device’s ports to the computer. This can be done using tools like iproxy, which comes with libusbmuxd (installable via Homebrew).

$ iproxy -u <DEVICE_UDID> 8100:8100 9100:9100

This command forwards local TCP ports 8100 and 9100 to the same ports on the device over a USB connection. After running it, you should see ports 8100 and 9100 occupied on the computer, and both http://localhost:8100 and http://localhost:9100 should be accessible.
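A quick way to confirm that the forwarded ports are actually listening is a plain TCP connection attempt. This is a minimal sketch using only the standard library; is_port_open is a hypothetical helper name, and a successful connection only shows that something is listening, not that it is WDA.

```python
import socket


def is_port_open(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


for port in (8100, 9100):
    print(port, "open" if is_port_open(port) else "closed")
```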

In fact, device farm solutions like Sonic or Appium Device Farm use a similar approach. For example, you can find the code running the iproxy command in the sonic-agent project here.

2. WDA in Device Farms

2.1. WDA and Remote Testing

As mentioned in the previous post, third-party device farm solutions like Sonic or Appium Device Farm not only help manage devices (e.g., determining whether one is available or occupied), but also allow you to operate the device through a web UI. The underlying implementation turns out to be simple. Since both tools function similarly, let’s explore how Appium Device Farm achieves this functionality.

Below is the web UI when occupying a physical device. The page mirrors what is displayed on the phone.

Tapping the Google Maps app and examining Appium logs, we notice that behind the scenes, the tool sends a request to WDA at this endpoint: /session/{session_id}/actions.

We can simulate the same request using curl and observe the same result, i.e., the Google Maps app being activated.

Similarly, we can mimic other gestures by inspecting the WDA endpoint of the corresponding actions and their payloads, then performing curl requests accordingly. The following demonstrates a drag gesture on the map. As you can see, the map on the web UI reflects the drag action as soon as the request is fired.

In summary, when you interact with a device via the web UI, your gestures are translated into a series of W3C actions, which are then fed to WDA. Native iOS gestures can be mapped to a chain of W3C pointer actions. For example, a long press gesture at coordinates (x, y) could be represented as follows:

pointer move to (x, y) → pointer down → pause 500ms → pointer up
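As a rough illustration, the chain above could be written as a W3C actions payload like the one built below. The function name long_press_actions and the input source id "finger1" are arbitrary choices for this sketch; the field names follow the W3C WebDriver Actions format.

```python
import json


def long_press_actions(x: int, y: int, hold_ms: int = 500) -> dict:
    """Build a W3C actions payload: move -> down -> pause -> up."""
    return {
        "actions": [
            {
                "type": "pointer",
                "id": "finger1",  # arbitrary name for this input source
                "parameters": {"pointerType": "touch"},
                "actions": [
                    {"type": "pointerMove", "duration": 0, "x": x, "y": y},
                    {"type": "pointerDown", "button": 0},
                    {"type": "pause", "duration": hold_ms},
                    {"type": "pointerUp", "button": 0},
                ],
            }
        ]
    }


print(json.dumps(long_press_actions(120, 300), indent=2))
```

Posting this JSON to the /session/{session_id}/actions endpoint of a running WDA session should perform the gesture. A drag would follow the same pattern, with a second pointerMove between pointerDown and pointerUp.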

It’s important to note that what you see on the web UI is just a live stream of what’s happening on the phone, not direct interaction with the device. This means the display may not immediately reflect your interactions. For example, in Appium Device Farm, when you slowly drag the map, it doesn’t move until a few seconds later, as shown in the GIF below. This delay occurs because the action chain is only submitted once the pointer is released. This is a design choice of the tool and could be adjusted to improve responsiveness.

2.2. Launching WDA

Via xcodebuild

By default, without any special capabilities, Appium starts WebDriverAgent (WDA) using xcodebuild and waits until the WDA server is up and running.

This approach is simple, but not ideal for large-scale usage for several reasons. First, the overhead of having xcodebuild build and test the WDA project can be long, sometimes up to 15 seconds, even when a build cache is available. Additionally, xcodebuild is not a lightweight process, typically consuming around 120MB of memory per instance, which adds up to roughly 1.2GB for 10 devices.

Another challenge is that we need to properly set up provisioning profiles and certificates on the host machine in order to build the WDA project for real devices. This introduces additional maintenance cost for device farm operations.

A better approach is to pre-install WDA on the device and launch it without xcodebuild. Appium provides capabilities to test with a pre-installed WDA (see: here), which relies on simctl/devicectl under the hood.

Via simctl/devicectl

WDA (or any other app) can be launched using simctl for simulators and devicectl for devices.

For simulators:

$ xcrun simctl launch --terminate-running-process <SIMULATOR> com.facebook.WebDriverAgentRunner.xctrunner

For devices:

$ xcrun devicectl device process launch --terminate-existing --device <DEVICE> com.facebook.WebDriverAgentRunner.xctrunner

Where com.facebook.WebDriverAgentRunner.xctrunner is the bundle ID for WDA. This ID might vary depending on your provisioning profile setup.

Note that with iOS 17+, you need to remove XCTest-related frameworks under Frameworks/*.framework to successfully launch WDA via devicectl (see: here).

A best practice when launching WDA is to stream WDA logs, which can be redirected to a dedicated file for debugging purposes. This can be done using the --console option in simctl and devicectl:

$ xcrun simctl launch --terminate-running-process --console <SIMULATOR> com.facebook.WebDriverAgentRunner.xctrunner
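If the launch is driven from a script, the console output can be sent straight to a per-device log file. The following is a minimal sketch for simulators on macOS; the function names and the log path are hypothetical, and the simctl flags are the ones used in the command above.

```python
import subprocess
from pathlib import Path

WDA_BUNDLE_ID = "com.facebook.WebDriverAgentRunner.xctrunner"


def simctl_launch_command(simulator: str, bundle_id: str = WDA_BUNDLE_ID) -> list[str]:
    """Return the argv for launching WDA on a simulator with console output."""
    return [
        "xcrun", "simctl", "launch",
        "--terminate-running-process", "--console",
        simulator, bundle_id,
    ]


def launch_wda_with_logs(simulator: str, log_path: Path) -> subprocess.Popen:
    """Start WDA and redirect its stdout/stderr to log_path (macOS only)."""
    with log_path.open("w") as log_file:
        # The child process keeps its own handle to the file after we close ours.
        return subprocess.Popen(
            simctl_launch_command(simulator),
            stdout=log_file,
            stderr=subprocess.STDOUT,
        )


print(" ".join(simctl_launch_command("<SIMULATOR>")))
# On a Mac with a booted simulator (hypothetical log path):
#   process = launch_wda_with_logs("<SIMULATOR>", Path("wda-simulator.log"))
```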

Via cross-platform tools

While the previous options are decent, they are specific to macOS. If you are using a Linux server/machine to host iOS devices, there are a few cross-platform alternatives to explore. Sonic has developed its own in-house wrapper, sonic-ios-bridge (sib), which simplifies the process of launching WDA with the sib run wda command. However, this tool is only compatible with iOS versions prior to 17 (as of Jan 2025). This limitation is due to breaking changes in the communication mechanism in iOS 17, shifting from TCP-based communication to QUIC + RemoteXPC (see: here).

“Starting at iOS 17.0, Apple refactored a lot in the way iOS devices communicate with the macOS. Up until iOS 16, The communication was TCP based (using the help of usbmuxd for USB devices) with TLS (for making sure only trusted peers are able to connect)”

So far, the most prominent cross-platform tool that supports iOS 17+ is probably pymobiledevice3.

2.3. Avoiding Port Collisions For Multiple Devices

The default ports used by the WDA server and the MJPEG server are 8100 and 9100, respectively. When managing multiple devices, it’s essential to assign unique ports to each device to prevent port collisions.

For real devices, this is straightforward: regardless of the ports being used on the devices, we only need to ensure that the forwarded ports on the computer are different.

$ iproxy -u <DEVICE-X> <WDA-PORT-X>:8100 <MJPEG-PORT-X>:9100
$ iproxy -u <DEVICE-Y> <WDA-PORT-Y>:8100 <MJPEG-PORT-Y>:9100
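One simple way to keep the local ports unique is to offset them from a base value per device. The sketch below is an assumption about how such bookkeeping could look, not how any particular device farm does it; ForwardedPorts and allocate_ports are hypothetical names, and the generated command uses the same iproxy syntax as above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForwardedPorts:
    udid: str
    wda_port: int
    mjpeg_port: int

    def iproxy_command(self) -> list[str]:
        """argv that forwards the local ports to the device's default WDA ports."""
        return [
            "iproxy", "-u", self.udid,
            f"{self.wda_port}:8100", f"{self.mjpeg_port}:9100",
        ]


def allocate_ports(
    udids: list[str], wda_base: int = 8100, mjpeg_base: int = 9100
) -> list[ForwardedPorts]:
    """Give each device its own local port pair by offsetting from the bases."""
    return [
        ForwardedPorts(udid, wda_base + index, mjpeg_base + index)
        for index, udid in enumerate(udids)
    ]


for ports in allocate_ports(["<DEVICE-X>", "<DEVICE-Y>"]):
    print(" ".join(ports.iproxy_command()))
```

A real setup would also want to check that each chosen port is free before using it.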

However, for simulators, we need to specify custom ports for the WDA server itself because simulators share the host’s network. Luckily, WDA allows us to override these ports using two environment variables: USE_PORT and MJPEG_SERVER_PORT (read in FBConfiguration.m: https://github.com/appium/WebDriverAgent/blob/a1b5af6/WebDriverAgentLib/Utilities/FBConfiguration.m#L143-L145). To pass them to WDA, set them with the SIMCTL_CHILD_ prefix when launching the server with simctl:

$ env SIMCTL_CHILD_USE_PORT=8101 SIMCTL_CHILD_MJPEG_SERVER_PORT=9101 \
    xcrun simctl launch --terminate-running-process <SIMULATOR> com.facebook.WebDriverAgentRunner.xctrunner

Similarly, you can launch WDA with custom ports on real devices with devicectl by using the DEVICECTL_CHILD_ prefix:

$ env DEVICECTL_CHILD_USE_PORT=8101 DEVICECTL_CHILD_MJPEG_SERVER_PORT=9101 \
    xcrun devicectl device process launch --terminate-existing --device <DEVICE> com.facebook.WebDriverAgentRunner.xctrunner
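When WDA is launched from a script rather than a shell, the same prefixed variables can be placed in the child process environment. This is a small sketch; child_env is a hypothetical helper, and the variable names and prefixes are the ones described above.

```python
import os


def child_env(wda_port: int, mjpeg_port: int, prefix: str = "SIMCTL_CHILD_") -> dict[str, str]:
    """Copy the current environment and add prefixed WDA port overrides."""
    env = dict(os.environ)
    env[f"{prefix}USE_PORT"] = str(wda_port)
    env[f"{prefix}MJPEG_SERVER_PORT"] = str(mjpeg_port)
    return env


simulator_env = child_env(8101, 9101)
device_env = child_env(8102, 9102, prefix="DEVICECTL_CHILD_")
print(simulator_env["SIMCTL_CHILD_USE_PORT"], device_env["DEVICECTL_CHILD_USE_PORT"])
# The result can be passed as env=... to subprocess.Popen together with the
# simctl or devicectl launch command shown above.
```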

2.4. Managing the WDA Life Cycle Yourself in Appium Tests

So far, we’ve discussed how WDA drives automation and extends what testing with Appium can do.

But why should we worry about WDA management when Appium already handles the heavy lifting, from building and launching WDA for testing, to port forwarding?

At my company, we initially let Appium manage WDA, using a prebuilt and pre-installed WDA on simulators and devices to save time. However, we encountered several issues when running tests concurrently on multiple simulators and devices. These issues weren’t just performance-related; they also involved stability. For instance, if the WDA connection was interrupted during a test, Appium had no strategy (such as automatically relaunching WDA) to recover from the disruption. As a result, we decided to take control and manage the WDA life cycle ourselves.

To do so, specify the 'appium:webDriverAgentUrl': <URL> capability in Appium tests (see: here). This tells Appium that WDA was launched beforehand, so it doesn’t need to handle that part.
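In Python, the capability set could look like the sketch below. The helper name and the example URL are illustrative, and the port assumes WDA was started or forwarded on 8101 as in the earlier examples.

```python
def capabilities_for_external_wda(udid: str, wda_url: str) -> dict:
    """Capabilities that point Appium at an already-running WDA."""
    return {
        "platformName": "iOS",
        "appium:automationName": "XCUITest",
        "appium:udid": udid,
        "appium:webDriverAgentUrl": wda_url,
    }


caps = capabilities_for_external_wda("<DEVICE_UDID>", "http://127.0.0.1:8101")
print(caps)
# With the Appium Python client, these can be loaded into the options object:
#   options = XCUITestOptions().load_capabilities(caps)
#   wd = WebDriver('http://127.0.0.1:4723', options=options)
```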

Managing the WDA life cycle ourselves has provided several advantages. First, it gives us more control when dealing with errors. For example, if WDA encounters issues (which are frequent on devices), we can simply relaunch WDA and retry the test. Another benefit arises when a WDA session has been running for a long period, potentially leading to intermittent issues. In this case, we prefer to keep WDA sessions short by restarting WDA after a few tests.
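The relaunch-and-retry idea can be sketched as a small wrapper. This is not our production code, only an outline under simple assumptions: the test and the relaunch step are passed in as callables, and a WDA failure surfaces as a ConnectionError (in practice the exception type depends on the client library).

```python
from typing import Callable


def run_with_wda_recovery(
    test: Callable[[], None],
    relaunch_wda: Callable[[], None],
    max_attempts: int = 2,
    recoverable: tuple = (ConnectionError,),
) -> int:
    """Run test; on a recoverable error, relaunch WDA and retry.

    Returns the number of attempts used; re-raises after the last attempt.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            test()
            return attempt
        except recoverable:
            if attempt == max_attempts:
                raise
            relaunch_wda()
    return max_attempts  # not reached; keeps the return type explicit


# Demo with stand-ins: the first run fails, the retry succeeds.
calls = {"tests": 0, "relaunches": 0}


def flaky_test() -> None:
    calls["tests"] += 1
    if calls["tests"] == 1:
        raise ConnectionError("WDA connection lost")  # simulated failure


def fake_relaunch() -> None:
    calls["relaunches"] += 1


print(run_with_wda_recovery(flaky_test, fake_relaunch), calls)
```

The same wrapper is a natural place to count completed tests and restart WDA after every few runs to keep sessions short.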

A further advantage of self-managing WDA is the ability to collect and isolate WDA logs for each test. This is crucial for troubleshooting and resolving issues effectively.

3. Summary

In this post, we have explored how WDA plays a critical role in E2E testing with Appium. Acting as the server that directly controls iOS devices remotely, WDA implements most of the WebDriver spec, making it the backbone of not only test automation but also remote testing solutions.

By understanding the role of WDA and how it functions within Appium, you can design your testing solution to be both more efficient and resilient, especially when dealing with device-related issues.

Stay tuned for more posts, where we’ll dive into various aspects of E2E testing, such as additional tools, strategies, and best practices.