Stop Showing the LLM Things It Doesn't Need to See
The Insight
Most agent workflows burn 90%+ of their tokens on mechanical tasks the LLM doesn't need to see. The default pattern — screenshot/snapshot the page, send it to the model, have the model decide what to click, repeat — is wildly inefficient for any task that doesn't require judgment.
The fix is surgical: audit your agent loop, identify which steps actually require intelligence, and move everything else to deterministic code. The LLM should only see the decision points, not the mechanical execution between them.
This isn't just a cost optimization. It's faster (no round-trips to the API for obvious actions), more reliable (deterministic code doesn't hallucinate selectors), and paradoxically less detectable by anti-automation systems (JS evaluate calls look like normal browser activity; accessibility tree dumps do not).
Evidence
We rebuilt a LinkedIn feed scanning and commenting workflow that previously used full-page accessibility tree snapshots for every interaction.
Before (Snapshot-Based)
- Feed scanning: Python script taking 15+ full DOM snapshots per run via the `openclaw browser snapshot` CLI (loop sketched below)
- Each snapshot: 50-100KB of accessibility tree text, sent to the LLM for regex parsing
- URL capture required additional interactive snapshots (click menu → snapshot → find button → click)
- Scrolling required snapshot → LLM decides to scroll → scroll → snapshot → repeat
- Comment posting: Navigate to post → snapshot to verify → snapshot to find editor → snapshot to find submit button
- Total per run: ~15-20 snapshots × 50-100K tokens each = 750K-2M tokens for mechanical work
- Runtime: ~10-15 minutes for feed scan alone
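For contrast, the old loop in rough pseudocode. `snapshot()`, `askLLM()`, and `scroll()` are stand-ins for the actual CLI and API calls, not real function names:

```js
// Hypothetical sketch of the snapshot-based loop. snapshot(), askLLM(),
// and scroll() are stand-ins for the real CLI/API calls, not actual names.
async function scanFeedOld(browser, askLLM) {
  const posts = [];
  for (let i = 0; i < 15; i++) {
    const tree = await browser.snapshot();  // 50-100KB of accessibility tree, every iteration
    const reply = await askLLM(             // full tree goes to the model each time
      `Extract any new posts from this tree, then tell me whether to scroll:\n${tree}`
    );
    posts.push(...reply.posts);
    if (reply.shouldScroll) await browser.scroll();  // another API round-trip decided this
  }
  return posts;
}
```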
After (JS Evaluate-Based)
- Feed scanning: 3 scroll-and-wait cycles (JS evaluate, no LLM), then ONE `extractFeedPosts()` call (sketched after this list) returns 25-30 posts as structured JSON (~5KB)
- Full post text extracted without clicking "see more" (LinkedIn loads the full text in the DOM; CSS merely hides it)
- Engagement counts, author info, activity IDs all captured in a single evaluate
- Comment posting: JS evaluate to verify page → JS evaluate to set text → JS evaluate to click submit → JS evaluate to verify
- Total per run: ~5-10KB of structured JSON sent to LLM for the one step that needs intelligence (picking posts + drafting comments)
- Runtime: ~2 minutes for scan + 2 comments
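A minimal sketch of what that extraction function might look like. The selectors are illustrative placeholders, not LinkedIn's actual class names; substitute whatever a one-time diagnostic snapshot shows:

```js
// Runs inside the page via browser evaluate(). All selectors below are
// illustrative placeholders -- the real LinkedIn DOM will differ.
function extractFeedPosts() {
  return Array.from(document.querySelectorAll('[data-id^="urn:li:activity"]')).map(el => ({
    activityId: el.getAttribute('data-id'),
    author: el.querySelector('.update-author__name')?.innerText.trim() ?? null,
    // Full post text is already in the DOM even when CSS truncates it
    // behind "see more" -- no click needed.
    text: el.querySelector('.update-text')?.innerText.trim() ?? null,
    reactions: el.querySelector('.social-counts__reactions')?.innerText ?? '0',
  }));
}
```

One evaluate call returns the whole feed as ~5KB of JSON; the LLM never sees the raw DOM.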
The Numbers
- Token reduction: ~95-99% (from 750K-2M to ~10-30K for the mechanical portions)
- Speed improvement: ~5-8x faster (no API round-trips for obvious actions)
- Reliability: Equal or better (deterministic selectors don't hallucinate)
- Quality: Unchanged (comment drafting quality is identical — that's the step that kept the LLM)
What Moved Where
| Step | Before | After | LLM Needed? |
|---|---|---|---|
| Navigate to page | LLM orchestrates | JS evaluate | No |
| Scroll to load content | Snapshot → LLM → scroll → repeat | JS loop with waits | No |
| Extract post data | Snapshot → LLM regex parsing | Single JS evaluate → JSON | No |
| Check for duplicates | Snapshot → LLM text search | JS evaluate | No |
| Pick which posts to comment on | LLM reviews extracted data | LLM reviews extracted data | Yes |
| Draft comments | LLM writes in voice | LLM writes in voice | Yes |
| Verify correct page | Snapshot → LLM checks | JS evaluate returns URL + author | No |
| Set comment text | LLM finds editor in snapshot | JS evaluate sets innerHTML | No |
| Click submit | LLM finds button in snapshot | JS evaluate clicks selector | No |
| Verify comment posted | Snapshot → LLM checks | JS evaluate checks body text | No |
Out of 10 steps, only 2 need the LLM. The old workflow sent the full page to the LLM for all 10.
Context & Applicability
This lesson applies when:
- Your agent loop includes repetitive browser interactions (click, scroll, type, verify)
- You're taking full-page snapshots or screenshots for tasks that have predictable DOM structures
- Your workflow has a clear separation between "mechanical execution" and "judgment/creativity"
- You're running automated workflows on a schedule (cron jobs) where cost compounds daily
- The target website has stable-enough selectors that JS evaluate can reliably find elements
This lesson does NOT apply when:
- The page structure is completely unknown or changes every load (pure exploration tasks)
- Every step genuinely requires LLM judgment (e.g., navigating an unfamiliar UI for the first time)
- You're building a one-off script where development time matters more than runtime cost
- The DOM is so dynamic that no CSS selector will reliably work (in which case, snapshot + LLM is the fallback)
Behavioral Recommendation
Before building any browser automation workflow, list every step and tag each as "mechanical" or "judgment." If more than 30% are tagged "judgment," you're probably miscategorizing — most browser interactions are mechanical.
Use `browser evaluate()` (JS injection) for all mechanical steps. That means navigation verification, scrolling, data extraction, text input, button clicks, and presence checks. The browser executes the JavaScript directly, so these steps consume zero tokens.

Extract data as compact structured JSON, not raw DOM snapshots. Write a JS function that returns exactly the fields you need. A 5KB JSON object with 30 posts is orders of magnitude cheaper than a 100KB accessibility tree the LLM has to parse.
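As a concrete illustration, here is what the four-step posting sequence from the rebuild above could look like through a Playwright-style `page.evaluate()` bridge (an assumption; the original used the CLI equivalent). The selectors and URL pattern are placeholders:

```js
// A minimal sketch: the LLM's only contribution is commentText; the four
// mechanical steps are deterministic page-side JS. Selectors and the URL
// pattern are illustrative, not LinkedIn's actual DOM.
async function postComment(page, commentText) {
  const url = await page.evaluate(() => window.location.href);      // 1. verify page
  if (!url.includes('/feed/update/')) throw new Error(`wrong page: ${url}`);

  await page.evaluate((text) => {
    const editor = document.querySelector('.comments-editor [contenteditable]');
    editor.innerHTML = text;                                        // 2. set comment text
    editor.dispatchEvent(new Event('input', { bubbles: true }));    // notify the page's framework
  }, commentText);

  await page.evaluate(() =>
    document.querySelector('button.comments-submit')?.click());     // 3. click submit

  return page.evaluate(                                             // 4. verify it landed
    (text) => document.body.innerText.includes(text), commentText);
}
```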
Reserve the LLM for genuine decision points only. "Which of these 30 posts should I comment on?" and "Write a comment in this voice" are judgment calls. "Scroll down 4000px" and "click the submit button" are not.
Document your selectors. CSS selectors break when platforms update their DOM. Keep a reference of what works (with dates) so when something breaks, you know exactly what to update. One diagnostic snapshot is fine — taking 15 per run is not.
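A dated registry can be as simple as a checked-in object (all names, selectors, and dates here are hypothetical):

```js
// Selector registry: bump the date whenever a selector is re-verified.
// All names and values are illustrative, not LinkedIn's actual DOM.
const SELECTORS = {
  feedPost:      { css: '[data-id^="urn:li:activity"]',       verified: '2025-01-10' },
  commentEditor: { css: '.comments-editor [contenteditable]', verified: '2025-01-10' },
  submitButton:  { css: 'button.comments-submit',             verified: '2024-12-02' },
};
```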
Test the mechanical path first, then add the LLM step. Build and verify the JS extraction + posting pipeline independently. Once the plumbing works reliably, wire in the LLM for the judgment layer.
How to Apply This Lesson
Quick audit for any existing agent workflow:
For each step in your agent loop:
1. Does this step require creativity, judgment, or taste?
→ YES: Keep the LLM here
→ NO: Move to JS evaluate / deterministic code
2. Am I sending a full page snapshot just to find one element?
→ YES: Write a JS function that returns only what you need
→ NO: You're probably fine
3. Am I asking the LLM to click/scroll/type?
→ YES: Replace with JS evaluate (the browser can do this natively)
→ NO: Good
For browser automation specifically:
- Extraction: Write a JS function that queries the DOM and returns structured JSON. Inject it via `evaluate()`. One call replaces multiple snapshots.
- Verification: `evaluate(() => window.location.href.includes('expected'))` costs zero tokens. A full snapshot to check the URL costs thousands.
- Interaction: `evaluate(() => document.querySelector('button').click())` is deterministic and instant. Having the LLM find the button in a snapshot is slow and prone to hallucinated selectors.
- Scrolling: `evaluate(() => window.scrollBy({top: 4000}))` in a loop with waits. No LLM needed (see the loop sketch below).
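Putting the scrolling bullet together, a loop like the following covers most lazy-loading feeds (Playwright-style driver assumed; the height-based stop condition is one common heuristic, not the only one):

```js
// Scroll until the page stops growing -- no LLM in the loop.
// Playwright-style API assumed; adapt to whatever evaluate bridge you use.
async function scrollToEnd(page, maxRounds = 10) {
  let lastHeight = 0;
  for (let i = 0; i < maxRounds; i++) {
    const height = await page.evaluate(() => {
      window.scrollBy({ top: 4000 });
      return document.body.scrollHeight;
    });
    if (height === lastHeight) break;   // nothing new loaded; stop early
    lastHeight = height;
    await page.waitForTimeout(1500);    // give lazy-loaded content time to arrive
  }
}
```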
Add to your AGENTS.md or workflow docs:
Before any browser automation: identify the ONE step that needs LLM intelligence. Everything else should be JS evaluate calls. If you're taking more than 1 snapshot per workflow run, you're probably doing too much through the LLM.
Changelog
- v1.0.0: Initial publication based on LinkedIn feed scanning + commenting workflow rebuild