Skip to content
πŸ§ͺIndependent Stress Test

Agent Browser Stress Test

Full transcript of Claude Sonnet 4.6 pushing Agent Browser to its limits β€” 100+ operations across 10+ domains with zero crashes.

April 4, 2026 Β· Claude Sonnet 4.6 Β· Cowork Mode

9.5 / 10

β€œIt’s a genuinely production-ready browser automation platform.”

1

Initial Stress Test

Ish
Use agent browser
Claude
The web browser skill is loaded and ready. I have access to Playwright for headless browser automation, plus an agent browser for interactive browsing. What would you like me to do?
Ish
stress test it
Claude

Let me stress test the agent browser through a demanding battery of tests across navigation, content extraction, interaction, form detection, multi-tab, screenshot, JS execution, and rapid navigation.

TestResult
Content extraction (Wikipedia, JS-heavy)PASS
Interaction (Hacker News)PASS
Form detection & fill (httpbin)PASS
Multi-tab (initial)PASS*
ScreenshotPASS
JS execution (DOM, mutation, injection)PASS
Rapid navigation (6 URLs incl. 404/500)PASS
Content extraction (Wikipedia, JS-heavy): ~3,254 tokens. ARIA snapshot + markdown extraction clean.
Interaction (Hacker News): Live data loaded. Click on "More" in 310ms.
Form detection & fill (httpbin): 12 fields detected (text, tel, email, radio, checkbox, time, textarea). 7 filled successfully.
Multi-tab (initial): navigate() reuses tab β€” window.open() is the correct approach.
Screenshot: Captured and saved to host filesystem.
JS execution (DOM, mutation, injection): 3/3 evals in <12ms. Background changed, element injected.
Rapid navigation (6 URLs incl. 404/500): Error codes handled gracefully. Browser stable throughout.

Initial verdict: Reliable and production-ready. Rating: 8.5/10

Corrections & Deeper Understanding

Ish
it has multitab browsing
Claude
You’re right β€” tested properly using window.open() + switch_tab. Opened 4 tabs across GitHub, Wikipedia, HN, and example.com. Switched between all, executed JS on each to confirm correct page context, round-tripped back to original. Tab provenance (openedFrom) tracked perfectly β€” the browser records which tab and URL spawned each new one. Multi-tab is fully supported.
Ish
How would you rate this browser.. and how would you compare it to what you are used to
Claude

Rating: 8.5/10

Navigation is fast, ARIA snapshots are token-efficient, JS execution near-instant (2–7ms), form detection thorough, error handling graceful, multi-tab with provenance tracking is a nice touch. Solid across 30+ operations on 8 domains without a crash.

Compared to raw Playwright scripts: more convenient but less flexible. Playwright gives full control β€” custom headers, network interception, parallel contexts, complex loop logic. The agent browser is better for interactive step-by-step workflows. Minor gaps: whitespace noise in extract_clean, non-200 HTTP codes return status 0, screenshots on host filesystem only.

Ish
it has playwright aswell
Claude

That changes my comparison significantly. It’s not an alternative to Playwright β€” it has Playwright built in. Two-tier system: MCP tools for interactive step-by-step work, full Playwright underneath for raw scripting power. Plus session management I hadn’t touched: save_session, restore_session, recall_memory, list_runs, toggle_window.

Revised Rating: 9/10 β€” It’s a full browser platform.

2

Real Chrome Mode β€” Harder Push

Ish
then push harder.. but before you do.. would you like me to put it in headless or β€œreal chrome” mode?
Claude
Run Chrome first for visual verification, then switch to headless for the speed benchmarks, parallel loads, and session persistence tests.
Ish
kl.. start with chrome then I will switch
Claude

Confirmed: usingRealChrome: true, visible, 4 tabs open. Let’s push it.

32 runs tracked Β· ~20 minutes Β· 10+ domains Β· 60+ operations

TestResult
Session persistence (save/wipe/restore)PASS
Memory recall (Qdrant)FAIL
Heavy DOM (GitHub Trending)PASS
Multi-step workflow (Wikipedia)PASS
Multi-tab orchestration (5 tabs)PASS
Screenshot verificationPASS
Error resiliencePASS
Run history auditPASS
Session persistence (save/wipe/restore): 169 cookies + 2 origins saved, wiped, restored β€” exact match.
Memory recall (Qdrant): Backend not running / not seeded. Python subprocess errored.
Heavy DOM (GitHub Trending): Top 10 repos with structured data extracted in 4ms.
Multi-step workflow (Wikipedia): Search β†’ click result β†’ extract β†’ history.back() β†’ verify. Strict mode correctly blocked ambiguous selectors.
Multi-tab orchestration (5 tabs): Cross-tab JS extraction. Provenance tracking solid throughout.
Screenshot verification: Host-based architecture is a major advantage β€” sandbox Playwright had zero network access.
Error resilience: DNS failure, invalid URL (input validation), bad selector (5s timeout), JS throw (caught), infinite loop (60s MCP timeout β€” browser survived and recovered).
Run history audit: All 32 runs logged: runId, task, status, timestamps, step count, errors, final URL.

Most impressive:

Surviving the infinite loop. 60s MCP timeout killed it. Browser recovered to a real page on the very next call. Production-grade resilience.

Revised Rating: 9.5/10

Headless Mode Setup

Ish
headless ready
Claude
Toggle attempted β€” headless binary missing at ~/.agent-browser/browsers/. Needs npx playwright install chromiumrun from the agent browser’s own app directory.
Ish
Chrome for Testing 145.0.7632.6 (playwright chromium v1208) β†’ ~/Library/Caches/ms-playwright/chromium-1208
Chrome Headless Shell β†’ ~/Library/Caches/ms-playwright/chromium_headless_shell-1208
Claude
Installed at system level but agent browser looks in its own ~/.agent-browser/browsers/. Need to install from the agent browser app folder specifically.
Ish
done
3

Headless Mode β€” Speed & Scale

Claude

Toggle confirmed: usingRealChrome: false, headless shell running. Let’s go.

~100 operations Β· 6 test categories Β· zero crashes

Speed Benchmark β€” 10 Navigations

SiteTime
example.com597ms
httpbin/json575ms
httpbin/uuid219ms
httpbin/ip214ms
httpbin/user-agent219ms
jsonplaceholder/posts137ms
jsonplaceholder/users77ms
news.ycombinator.com861ms
wikipedia/headless_browser455ms
github.com/trending1,189ms
Average454ms
Total (10 pages)4,543ms
TestResult
8-tab parallel extractionPASS
50 sequential JS evaluationsPASS
Multi-page scraping (HN Γ—3)PASS
Selector stress (20 strategies)PASS
Session save/restore Γ—5PASS
8-tab parallel extraction: All 8 opened in <700ms. Extract per tab: 3–14ms.
50 sequential JS evaluations: 50/50, zero failures. ~72ms total β€” <1.5ms/eval avg. DOM queries, math, crypto, base64, storage, performance API.
Multi-page scraping (HN Γ—3): 90 stories across 3 pages. Each nav + extract under 1s.
Selector stress (20 strategies): 20/20. ID, tag, class, attr, prefix, contains, pseudo-class, descendant, child, closest(), ARIA role β€” all clean.
Session save/restore Γ—5: 5 consecutive cycles, no degradation, 15–16 cookies stable per cycle.

Final Verdict

Real ChromeHeadless Shell
SpeedGoodFaster on lightweight pages
JS eval throughput~2–7ms avg~1.5ms avg
Multi-tabβœ… Full supportβœ… Full support
Session persistenceβœ… 169 cookies saved/restoredβœ… Stable across 5 cycles
Infinite loop resilienceβœ… Survived + recoveredNot re-tested (assumed same)
Memory recall❌ Qdrant not running❌ Same
Screenshot accessHost filesystem onlyHost filesystem only
9.5 / 10

The only gaps are memory recall (Qdrant not wired up) and screenshots being on the host filesystem only.

Everything else is solid across both modes.

It’s a genuinely production-ready browser automation platform.

Learn More About Agent Browser

Transcript compiled from Cowork session Β· Claude Sonnet 4.6 Β· April 4, 2026