1. What Is Appium?
Appium is an open-source, cross-platform UI automation testing framework covering mobile (Android / iOS) and desktop (Windows). Its core philosophy is to act as a “translator” — test scripts never directly manipulate the device; instead, Appium translates commands into each platform’s native API calls.
Three core advantages:
- No App modification required: Test production builds directly without injecting any SDK or re-packaging
- Cross-platform & cross-language: One API covers Android, iOS, and Windows simultaneously, with support for Python, Java, JavaScript, Ruby, C#, and more
- Cross-application type: Native apps, mobile browser pages (Web), and hybrid apps (Hybrid) are all covered
2. Core Component Inventory
A side-by-side comparison across the three platforms. Note that “component” does not equal “process” — some are standalone processes, while others are libraries loaded into other processes (see Section 7 for details):
| Layer | Android | iOS | Windows |
|---|---|---|---|
| Test Script | Written in any language | Same | Same |
| Client Library | In-process library, wraps calls into W3C WebDriver HTTP requests | Same | Same |
| Appium Server | Standalone Node.js process, listens on :4723, handles routing & session management | Same | Same |
| Driver Plugin | appium-uiautomator2-driver (loaded inside Server process) |
appium-xcuitest-driver (same) |
appium-windows-driver (same) |
| Cross-machine Channel | ADB (USB / TCP) | libimobiledevice | WinAppDriver (local HTTP or remote) |
| Device-side Agent | UIAutomator2 Server APK (:6790) | WebDriverAgent (:8100) | WinAppDriver.exe (standalone process) |
| System UI Framework | UIAutomator2 (Accessibility) | XCUITest | Windows UI Automation (UIA) |
| Test Target | Android App | iOS App | Win32 / WPF / WinForms / UWP applications |
Appium 1.x vs 2.x: In the 1.x era, all drivers were bundled inside the Server. Starting with 2.x, drivers became independent pluggable plugins installed on demand. However, regardless of the version, drivers always run inside the Appium Server process — they are never standalone processes.
Key differences on Windows:
- No need to install an agent APK/IPA on the device — Windows UIA is built into the OS; WinAppDriver calls it directly
- When the test machine and the device under test are the same PC, all components co-reside on one machine — no physical cables, no network traversal
- The test target is a desktop application window, not a mobile app
3. Architecture: The Three-Layer Model (All Platforms)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
Test Script (any language)
│
HTTP · WebDriver Protocol
│
┌──────────────────┐
│ Appium Server │ ← Protocol translation & capability aggregation
└───┬─────┬─────┬──┘
│ │ │
platformName: android iOS windows
│ │ │
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────────┐
│ UIA2 │ │ XCUITest │ │ Windows │
│ Driver │ │ Driver │ │ Driver │ ← All inside Server process
└────┬─────┘ └────┬─────┘ └──────┬───────┘
│ ADB │ libimobiledevice │ HTTP (local or remote)
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────────┐
│ UIA2 │ │ WDA │ │ WinAppDriver │
│ Server │ │ (:8100) │ │ .exe (:4724) │ ← Standalone processes
│ (:6790) │ │ │ │ │
└────┬─────┘ └────┬─────┘ └──────┬───────┘
│ │ │
▼ ▼ ▼
UIAutomator2 XCUITest Windows UIA
│ │ │
▼ ▼ ▼
Android App iOS App Windows Desktop App
Appium Server routes requests to the appropriate Driver based on the platformName set during Session creation. The Driver then sends commands down to the device-side agent through the platform-specific channel for execution.
4. End-to-End Request Flow
4.1 Android (Classic Cross-Machine Flow)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
① driver.find_element(By.ID, "btn_login").click()
② Client Library → POST /session/{id}/element/click (W3C WebDriver)
③ Appium Server (:4723) → Look up platformName → Route to UIA2 Driver
④ UIA2 Driver → Forward via ADB TCP tunnel to the device
⑤ UIA2 Server APK (:6790) → Call Accessibility API
⑥ UIAutomator2 framework → Traverse control tree to locate the button
⑦ Android system injects MotionEvent → App receives onClick callback
⑧ Result returns along the same path in reverse
4.2 Windows (Local Same-Machine Flow) ⭐
When the test machine and the app under test both run on the same Windows PC, the chain is significantly simplified:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
① driver.find_element(By.NAME, "Calculate").click()
② Client Library → POST /session/{id}/element/click (W3C WebDriver)
③ Appium Server (:4723) → Look up platformName → Route to Windows Driver
④ Windows Driver → Forward via HTTP to WinAppDriver (localhost)
⑤ WinAppDriver → Call Windows UI Automation COM API
⑥ UIA → Locate the target button within the window's control tree
⑦ System injects click event → Desktop app receives callback
⑧ Result returns along the same path in reverse
Same-machine characteristics:
- All components (script, Server, Driver, WinAppDriver, app under test) co-reside on a single Windows PC
- No USB cable, no network traversal — everything communicates over localhost
- WinAppDriver plays the role of “device-side agent”, but it is just a regular Windows process — no root/jailbreak/Accessibility permissions required
- Ultra-low latency, easy debugging: set breakpoint → step through → watch the desktop change instantly
4.3 iOS
The flow structure is identical to Android; replace the device-side chain with WebDriverAgent → XCUITest → iOS system.
5. How Does Appium Test Browsers? Is It Like Selenium?
Appium’s browser testing looks similar to desktop app testing on the surface, but the underlying mechanism is entirely different. There are two cases to distinguish.
5.1 Mobile Browsers (Appium’s Core Web Scenario)
When testing Chrome on Android or Safari on iOS, Appium’s architecture looks like this:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
Test Script
│ driver.find_element(By.CSS, ".login-btn").click()
│ ↑ Note: CSS/XPath selectors, not UIA/Accessibility
▼
Appium Server (:4723)
│ Based on session config, determines current context is WebView
│ Delegates WebDriver commands to ...
▼
Chromedriver (Android) or Safari Remote Debug (iOS)
│ ← A subprocess managed by Appium, NOT an Appium Driver
│ ← Protocol: Chrome DevTools Protocol or WebKit Remote Debug
▼
Browser Process (Chrome / Safari)
│ Locates the element within the rendered DOM
▼
Button on the page gets clicked
Key differences from desktop app testing:
| Desktop App (Windows Driver) | Mobile Browser (Appium) | |
|---|---|---|
| Locator Strategy | By.NAME, By.CLASS_NAME, AutomationId — UIA control tree |
By.CSS, By.XPATH, By.ID — DOM tree |
| Underlying Path | WinAppDriver → Windows UIA | Appium-managed Chromedriver → Chrome DevTools |
| Context Switching | Not needed; only one Native context exists | Required: driver.switch_to.context('WEBVIEW_xxx') |
| Element Inspector | inspect.exe / Accessibility Insights | Chrome DevTools |
5.2 Desktop Browsers
Appium cannot test desktop browsers as “browsers”. Appium’s Windows Driver can only see the browser as a Win32 window — it can see the menu bar, address bar, and tab bar as browser chrome controls (UIA controls), but it cannot see the DOM inside the page.
If your goal is to test web page content, use Selenium directly:
1
2
3
4
5
6
7
# Correct: Selenium for desktop browser testing
from selenium import webdriver
driver = webdriver.Chrome()
driver.find_element(By.CSS, ".btn").click() # Operates directly on the DOM
# Wrong: Appium for desktop browser testing
driver.find_element(By.NAME, "Address bar") # Can only find browser chrome controls
5.3 Summary: Appium vs Selenium for Browser Testing
| Dimension | Appium Testing Browser | Selenium Testing Browser |
|---|---|---|
| Element Location | DOM selectors (CSS / XPath) — same as Selenium | DOM selectors |
| Underlying Engine | Appium proxies Chromedriver — one extra hop | Chromedriver talks directly to browser |
| API Surface | find_element() |
find_element() — nearly identical |
| Context Management | Manual context switching NATIVE_APP ↔ WEBVIEW required | None (always within the page) |
| Use Case | Mobile hybrid apps requiring switching between native pages and WebViews | Pure web testing |
Bottom line: Testing mobile browsers with Appium feels similar to Selenium on the surface (both use CSS/XPath, both speak WebDriver protocol), but there is an extra Appium proxy layer underneath. Appium’s unique value lies in context switching between Native and WebView — something Selenium cannot do. As for desktop browsers, use Selenium directly; Appium simply cannot reach the page DOM.
6. Desktop App vs Browser Automation: Full Technical Comparison
Bringing together the previous two sections, here is a complete side-by-side comparison of the three automation scenarios:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Scenario A: Desktop Native App Scenario B: Mobile Browser Scenario C: Desktop Browser
Appium Client Appium Client Selenium Client
│ WebDriver │ WebDriver │ WebDriver
▼ ▼ ▼
Appium Server Appium Server ChromeDriver
│ │ │
Windows Driver Chromedriver (subprocess) Chrome DevTools
│ │ │
WinAppDriver Chrome DevTools ▼
│ │ Chrome Browser
Windows UIA ▼ │
│ Chrome Browser DOM
▼ │
Desktop App DOM
| Dimension | Desktop Native App | Mobile Browser | Desktop Browser |
|---|---|---|---|
| Tool | Appium + Windows Driver | Appium + Chromedriver | Selenium + ChromeDriver |
| Location System | UIA control tree | DOM | DOM |
| Context | Single (Native) | Switchable (Native ↔ WebView) | Single (Web) |
| Element Selectors | Name, ClassName, AutomationId | CSS, XPath, ID | CSS, XPath, ID |
| Hop Count | 3 hops | 3 hops (Appium → Chromedriver → Browser) | 1 hop (Selenium → Browser) |
7. Windows Process Topology
Distinguishing between “logical components” and “OS processes” is key to understanding Appium’s architecture.
7.1 Which Components Must Run as Standalone Processes?
When testing a desktop application on Windows, the actual running processes are:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
┌──────────────────────────────────────────────────────┐
│ Windows PC │
│ │
│ ┌─────────────┐ HTTP (:4723) ┌───────────────┐ │
│ │ python.exe │ ───────────────→ │ node.exe │ │
│ │ (test script)│ │ (Appium Server │ │
│ │ + Client Lib │ │ + Windows │ │
│ └─────────────┘ │ Driver) │ │
│ └───────┬───────┘ │
│ │ │
│ HTTP (:4724) │
│ │ │
│ ┌───────▼───────┐ │
│ │ WinAppDriver. │ │
│ │ exe │ │
│ └───────┬───────┘ │
│ │ UIA COM │
│ ┌───────▼───────┐ │
│ │ TargetApp.exe │ │
│ │ (app under │ │
│ │ test) │ │
│ └───────────────┘ │
└──────────────────────────────────────────────────────┘
4 processes in total:
| Process | Executable | When Started | Must Stay Running? |
|---|---|---|---|
| Test Script | python.exe / java.exe |
During test execution | No — exits when tests finish |
| Appium Server | node.exe (running appium command) |
Manually started before tests | Yes — throughout the test session |
| WinAppDriver | WinAppDriver.exe |
Manually started before tests | Yes — throughout the test session |
| App Under Test | TargetApp.exe |
Launched by Appium during tests or pre-launched manually | Exists for the session duration |
7.2 What Is NOT a Standalone Process?
| Logical Component | Actual Form | Where It Runs |
|---|---|---|
| Client Library | Python/Java package (e.g., Appium-Python-Client) — pure library code |
Inside the test script process |
| Windows Driver | Appium 2.x plugin (npm package), loaded via require() |
Inside the Appium Server process |
| UIA | COM component built into Windows OS | Kernel/system service, not a standalone process |
7.3 Complete Startup Sequence
For same-machine Windows testing, the operational order is:
1
2
3
4
5
6
7
8
9
10
# 1. Start WinAppDriver (Appium can also auto-launch it if configured)
"C:\Program Files\Windows Application Driver\WinAppDriver.exe"
# 2. Start Appium Server
appium
# 3. Run the test script
python test_calculator.py
# → Script auto-exits after execution
# → Appium Server and WinAppDriver stay running for subsequent test reuse
7.4 Process Count Comparison with Mobile
| Windows (Same-Machine) | Android Physical Device | iOS Physical Device | |
|---|---|---|---|
| PC-side processes (excl. script) | 2 (Appium Server, WinAppDriver) | 2 (Appium Server, ADB daemon) | 1 (Appium Server) |
| Device-side processes | 0 (app under test is also a PC process) | 2 (UIA2 Server APK, app under test) | 2 (WDA, app under test) |
| Total | 4 | 5 | 4 |
8. Why This Architecture?
The chain contains layered proxies — this is the core design of Appium’s architecture:
1
2
3
4
5
6
7
8
9
Test Script ──→ Client Library ──→ Appium Server ──→ Driver Plugin
│
┌────────────────────────────────┘
▼
Device-side Agent
(UIA2 Server / WDA / WinAppDriver)
│
▼
System UI Framework
8.1 Client ↔ Server: Cross-Language + Cross-Platform
Each platform’s underlying UI framework (UIAutomator2 / XCUITest / Windows UIA) uses its own proprietary protocol, and it is impractical for every language SDK to adapt to each one individually. Appium Server consolidates this complexity:
- Language decoupling: All language Clients only need to speak the universal “WebDriver lingua franca”; the Server alone handles the heavy lifting of translating to each underlying protocol
- Platform abstraction: Whether testing Android, iOS, or Windows, the script always calls
element.click()— the Server auto-routes based onplatformName - Capability aggregation: Session management, recording/playback, screenshots, logging, and other shared capabilities are uniformly implemented at the Server layer
8.2 Driver ↔ Device-side Agent: Crossing Security Boundaries
- Android / iOS: The system security model prohibits external processes from directly manipulating the UI. A privileged agent process must run inside the device to reach the control tree — that is the role of UIA2 Server APK or WDA
- Windows: WinAppDriver is also an independent local agent process that accesses the desktop control tree through the UIA COM interface. Windows’ security boundary is more relaxed than mobile (within the same user session), so the chain is shorter
8.3 The Long-Term Value of Decoupling
Mobile operating systems and official testing frameworks iterate frequently (e.g., Apple migrated from UIAutomation to XCUITest; Microsoft continuously updates UIA). Because the Server and Driver are decoupled through the plugin mechanism:
- When the underlying framework undergoes a major overhaul, only the corresponding Driver plugin needs to be updated or replaced
- Test scripts require zero refactoring — long-accumulated test assets are preserved
The Windows same-machine scenario perfectly demonstrates this flexibility: even though the chain is short (no USB/network layer), the plugin model remains the same — from Appium Server’s perspective, the Windows Driver is a peer plugin alongside all others, with no architectural special-casing.
9. Summary
Appium Server is a middle platform for protocol translation and capability aggregation. The chain may appear long, but it trades negligible transmission latency for: absolute programming language freedom, highly unified cross-platform code logic, and extreme extensibility for future upgrades.
| Scenario | Process Count | Typical Latency Sources |
|---|---|---|
| Android / iOS physical devices | 4–5 | USB transmission + network round-trips |
| Android / iOS emulators | 4–5 | Emulator virtualization overhead |
| Windows desktop app (same-machine) | 4 | Nearly negligible (all localhost) |
-
Previous
领域驱动设计(Domain-Driven Design)知识指南 -
Next
Appium in Practice: Writing Test Cases and Locating Elements