author    rtk0c <[email protected]>  2025-11-09 00:05:53 -0800
committer rtk0c <[email protected]>  2025-11-09 00:05:53 -0800
commit    6d214703e96de29dd9d329b6b7936a0e116d2adc
tree      fd0998564fbe2fc7dca1d5d5135d24795b7dc8e9
parent    d5b2c02af69a4886cf038a32c727264227e64878
poor-mans-selenium and adding "why, just why" tag [HEAD, master]

I love this
-rw-r--r-- content/blog/hand-rolled-ngrok-over-protonvpn.md 2
-rw-r--r-- content/blog/poor-mans-selenium.md 42
2 files changed, 37 insertions, 7 deletions
diff --git a/content/blog/hand-rolled-ngrok-over-protonvpn.md b/content/blog/hand-rolled-ngrok-over-protonvpn.md
index 49f9c83..43daf8d 100644
--- a/content/blog/hand-rolled-ngrok-over-protonvpn.md
+++ b/content/blog/hand-rolled-ngrok-over-protonvpn.md
@@ -2,7 +2,7 @@
title: "I paid for the whole vpn, so I'm damn well going to use the whole vpn"
summary: "Or: hand roll a ngrok with protonvpn port forwarding for shenanigans"
date: 2025-10-24T21:00:00-07:00
-tags: ["sysadmin"]
+tags: ["sysadmin", "why, just why"]
categories: ["Life of a sysadmin"]
---
diff --git a/content/blog/poor-mans-selenium.md b/content/blog/poor-mans-selenium.md
index f7c8dbb..d30839b 100644
--- a/content/blog/poor-mans-selenium.md
+++ b/content/blog/poor-mans-selenium.md
@@ -1,11 +1,41 @@
---
-title: "Poor Man's Selenium"
+title: "Poor man's selenium"
summary: "When chrome-driver just decides to not work on your development workstation today"
-draft: true
+date: "2025-11-09T00:00:00-08:00"
+tags: ["why, just why", "webdev"]
+categories: ["Web development"]
+ShowToc: false
---
-Idea: do what I did for certain website scraping, just write elaborate code inside the javascript console
+Sometimes you just need some light web scraping, nothing too complicated.
+Other times, chrome webdriver will just decide to not work, but that's pretty rare.
+Or you really need to extract some useful bits of information, but you don't have selenium installed. Maybe you're on a borrowed laptop that resets on every checkout.
-- `HTMLElement.click()`
-- `setTimeout` recursion, or hand-craft some async/await timeouts?
-- `HTMLElement.outerHTML` to stringify a DOM node
+In these cases, the devtools console will do. No need to break out selenium.
+
+The essence is that you can interact with the page in a few basic, but often good-enough, ways:
+
+- `document.querySelector`, obviously
+- `outerHTML` on an HTMLElement to stringify a DOM node.
+  - optionally clean it up with some string or DOM processing (clone the node first, so the live page stays untouched)
+  - collect the results in an array, and when done, copy it to the clipboard. Now you have scraped output!
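+- simulate a user click with `click()` on an HTMLElement
+  - This might not always work, so your mileage may vary: `click()` only dispatches a synthetic `click` event, so handlers that listen for `mousedown`/`pointerdown`, or that check `event.isTrusted`, won't react as expected.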
+- study their code, and emulate some of its behavior yourself to "interact" with the page
+  - For example, if a button fetches HTML and replaces a subtree, you can just do the fetch yourself.
+  - This is only really practical if the page uses jQuery or simple vanilla JS that you can actually read.
+- for navigation...
+  - Assign to `window.location.href` to navigate, but **be careful**: this wipes out your devtools console, including every variable you defined there. If you really need to do this, spending some time crafting a full script that you can paste into the console in one go will pay off.
+  - Set `target="_blank"` on `<a>` elements to batch open links in new tabs.
+- call `setTimeout` recursively to wait between steps
+  - This is clunky to write by hand. Consider quickly cobbling together an async/await helper like https://stackoverflow.com/a/33292942 instead.
+
+...
+
+...
+
+...
+
+OK, I'll confess: the real reason is that chrome webdriver broke on me.
+As they say: I did this not because it was easy, but because I thought it would be easy. Easier than selenium, that is, and honestly I'm not sure that's true.
+I guess it was kind of fun?
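This is a minimal sketch of the `outerHTML` bullet in the post above. It assumes that everything you want matches one CSS selector. `scrapeOuterHTML` and the `.result` selector are made-up names. `copy()` is the clipboard helper from Chrome's DevTools Console Utilities, so it only exists inside the console. The `root` parameter lets you pass `document` or any other node.

```javascript
// Sketch for the devtools console: stringify every element matching
// `selector` via outerHTML, cleaning up a *clone* so the live page is untouched.
// scrapeOuterHTML and the ".result" selector below are hypothetical names.
function scrapeOuterHTML(root, selector) {
  return Array.from(root.querySelectorAll(selector), (el) => {
    const node = el.cloneNode(true); // deep clone, detached from the page
    // DOM cleanup on the clone: drop any embedded <script> elements.
    node.querySelectorAll("script").forEach((s) => s.remove());
    // String cleanup: collapse the page's indentation into single spaces.
    return node.outerHTML.replace(/\s+/g, " ").trim();
  });
}

// In the console (copy() comes from Chrome's Console Utilities):
// const rows = scrapeOuterHTML(document, ".result");
// copy(JSON.stringify(rows, null, 2));
```

Collapsing whitespace is a blunt instrument, because it would also flatten `<pre>` blocks. Adjust the string cleanup to whatever the page needs.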
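The `click()` bullet usually turns into a loop: click, wait for the page to catch up, repeat. This sketch assumes a "Load more"-style button that disappears once everything has loaded. `clickUntilGone`, `sleep`, and the selector are hypothetical names. `sleep` is just the common promise-wrapped `setTimeout`.

```javascript
// Promise-based sleep, so waits can be written with await instead of
// nested setTimeout callbacks.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Keep clicking whatever getButton() returns (say, a "Load more" button)
// until it is gone or maxClicks is reached, pausing after each click so the
// page has time to fetch and render. clickUntilGone is a hypothetical helper.
async function clickUntilGone(getButton, { delayMs = 1000, maxClicks = 50 } = {}) {
  let clicks = 0;
  for (let btn = getButton(); btn && clicks < maxClicks; btn = getButton()) {
    btn.click(); // synthetic click: runs the element's click handlers
    clicks++;
    await sleep(delayMs);
  }
  return clicks;
}

// In the console, which allows top-level await (selector is made up):
// await clickUntilGone(() => document.querySelector("button.load-more"));
```

`maxClicks` is a safety valve for buttons that never go away, such as infinite feeds.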
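If you'd rather keep the recursive `setTimeout` approach from the last bullet, you can wrap it in a promise once and then use it with `await`. Here is a sketch of a polling helper that waits until some element shows up. `waitFor` and its option names are hypothetical, not a browser API.

```javascript
// Poll check() with recursive setTimeout until it returns something truthy,
// then resolve with that value; reject once timeoutMs has elapsed.
// waitFor and its option names are hypothetical, not a browser API.
function waitFor(check, { intervalMs = 200, timeoutMs = 10000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    (function poll() {
      let result;
      try {
        result = check();
      } catch (err) {
        return reject(err); // a throwing check() fails the wait
      }
      if (result) return resolve(result);
      if (Date.now() >= deadline) return reject(new Error("waitFor: timed out"));
      setTimeout(poll, intervalMs); // schedule the next attempt
    })();
  });
}

// In the console (selector is made up):
// const table = await waitFor(() => document.querySelector("table.results"));
```

Polling is crude compared to a `MutationObserver`, but it's about as simple as it gets to type into a console.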
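This is one variation on "just do the fetch yourself": fetch the HTML and parse it with `DOMParser` into a detached document that you can query like the live page. It also sidesteps the navigation problem, because the console never leaves the current page. `fetchDocument` and the `/results?page=N` URL pattern are hypothetical. The injectable `fetchImpl`/`parse` parameters exist only so the sketch can run outside a browser.

```javascript
// Fetch a page's HTML and parse it into a detached Document that can be
// queried like the live one, without navigating away from the console.
// fetchImpl/parse default to the browser's fetch and DOMParser; they are
// parameters only so the sketch can be exercised outside a browser.
async function fetchDocument(
  url,
  fetchImpl = fetch,
  parse = (html) => new DOMParser().parseFromString(html, "text/html"),
) {
  const res = await fetchImpl(url);
  if (!res.ok) throw new Error(`GET ${url} failed with status ${res.status}`);
  return parse(await res.text());
}

// In the console (URL pattern and selector are made up):
// const rows = [];
// for (let page = 1; page <= 5; page++) {
//   const doc = await fetchDocument(`/results?page=${page}`);
//   rows.push(...Array.from(doc.querySelectorAll(".result"), (el) => el.outerHTML));
// }
```

Same-origin `fetch` sends the page's cookies by default, so this generally works on pages you're logged into. Scripts in the parsed HTML don't run, though, so anything the page builds with JavaScript won't be there.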
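One reading of the `target="_blank"` tip is to retarget the links and then click them all from the console. `openAllInNewTabs` and the selector are hypothetical names. Whether every tab actually opens depends on the browser, because a popup blocker may let only the first one through.

```javascript
// Retarget each link to a new tab, then click it from the console.
// openAllInNewTabs is a hypothetical helper; returns how many were clicked.
function openAllInNewTabs(links) {
  let opened = 0;
  for (const a of links) {
    a.target = "_blank"; // make the click open a new tab instead of navigating
    a.click();
    opened++;
  }
  return opened;
}

// In the console (selector is made up):
// openAllInNewTabs(document.querySelectorAll("td.title > a"));
```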