Agent Browser Stress Test
Full transcript of Claude Sonnet 4.6 pushing Agent Browser to its limits β 100+ operations across 10+ domains with zero crashes.
April 4, 2026 Β· Claude Sonnet 4.6 Β· Cowork Mode
βItβs a genuinely production-ready browser automation platform.β
Initial Stress Test
Let me stress test the agent browser through a demanding battery of tests across navigation, content extraction, interaction, form detection, multi-tab, screenshot, JS execution, and rapid navigation.
| Test | Result |
|---|---|
| Content extraction (Wikipedia, JS-heavy) | PASS |
| Interaction (Hacker News) | PASS |
| Form detection & fill (httpbin) | PASS |
| Multi-tab (initial) | PASS* |
| Screenshot | PASS |
| JS execution (DOM, mutation, injection) | PASS |
| Rapid navigation (6 URLs incl. 404/500) | PASS |
Initial verdict: Reliable and production-ready. Rating: 8.5/10
Corrections & Deeper Understanding
window.open() + switch_tab. Opened 4 tabs across GitHub, Wikipedia, HN, and example.com. Switched between all, executed JS on each to confirm correct page context, round-tripped back to original. Tab provenance (openedFrom) tracked perfectly β the browser records which tab and URL spawned each new one. Multi-tab is fully supported.Rating: 8.5/10
Navigation is fast, ARIA snapshots are token-efficient, JS execution near-instant (2β7ms), form detection thorough, error handling graceful, multi-tab with provenance tracking is a nice touch. Solid across 30+ operations on 8 domains without a crash.
Compared to raw Playwright scripts: more convenient but less flexible. Playwright gives full control β custom headers, network interception, parallel contexts, complex loop logic. The agent browser is better for interactive step-by-step workflows. Minor gaps: whitespace noise in extract_clean, non-200 HTTP codes return status 0, screenshots on host filesystem only.
That changes my comparison significantly. Itβs not an alternative to Playwright β it has Playwright built in. Two-tier system: MCP tools for interactive step-by-step work, full Playwright underneath for raw scripting power. Plus session management I hadnβt touched: save_session, restore_session, recall_memory, list_runs, toggle_window.
Revised Rating: 9/10 β Itβs a full browser platform.
Real Chrome Mode β Harder Push
Confirmed: usingRealChrome: true, visible, 4 tabs open. Letβs push it.
32 runs tracked Β· ~20 minutes Β· 10+ domains Β· 60+ operations
| Test | Result |
|---|---|
| Session persistence (save/wipe/restore) | PASS |
| Memory recall (Qdrant) | FAIL |
| Heavy DOM (GitHub Trending) | PASS |
| Multi-step workflow (Wikipedia) | PASS |
| Multi-tab orchestration (5 tabs) | PASS |
| Screenshot verification | PASS |
| Error resilience | PASS |
| Run history audit | PASS |
Most impressive:
Surviving the infinite loop. 60s MCP timeout killed it. Browser recovered to a real page on the very next call. Production-grade resilience.
Revised Rating: 9.5/10
Headless Mode Setup
~/.agent-browser/browsers/. Needs npx playwright install chromiumrun from the agent browserβs own app directory.Chrome Headless Shell β ~/Library/Caches/ms-playwright/chromium_headless_shell-1208
~/.agent-browser/browsers/. Need to install from the agent browser app folder specifically.Headless Mode β Speed & Scale
Toggle confirmed: usingRealChrome: false, headless shell running. Letβs go.
~100 operations Β· 6 test categories Β· zero crashes
Speed Benchmark β 10 Navigations
| Site | Time |
|---|---|
| example.com | 597ms |
| httpbin/json | 575ms |
| httpbin/uuid | 219ms |
| httpbin/ip | 214ms |
| httpbin/user-agent | 219ms |
| jsonplaceholder/posts | 137ms |
| jsonplaceholder/users | 77ms |
| news.ycombinator.com | 861ms |
| wikipedia/headless_browser | 455ms |
| github.com/trending | 1,189ms |
| Average | 454ms |
| Total (10 pages) | 4,543ms |
| Test | Result |
|---|---|
| 8-tab parallel extraction | PASS |
| 50 sequential JS evaluations | PASS |
| Multi-page scraping (HN Γ3) | PASS |
| Selector stress (20 strategies) | PASS |
| Session save/restore Γ5 | PASS |
Final Verdict
| Real Chrome | Headless Shell | |
|---|---|---|
| Speed | Good | Faster on lightweight pages |
| JS eval throughput | ~2β7ms avg | ~1.5ms avg |
| Multi-tab | β Full support | β Full support |
| Session persistence | β 169 cookies saved/restored | β Stable across 5 cycles |
| Infinite loop resilience | β Survived + recovered | Not re-tested (assumed same) |
| Memory recall | β Qdrant not running | β Same |
| Screenshot access | Host filesystem only | Host filesystem only |
The only gaps are memory recall (Qdrant not wired up) and screenshots being on the host filesystem only.
Everything else is solid across both modes.
Itβs a genuinely production-ready browser automation platform.
Transcript compiled from Cowork session Β· Claude Sonnet 4.6 Β· April 4, 2026