| -rw-r--r-- | content/blog/hand-rolled-ngrok-over-protonvpn.md | 2 |
| -rw-r--r-- | content/blog/poor-mans-selenium.md | 42 |
2 files changed, 37 insertions, 7 deletions
diff --git a/content/blog/hand-rolled-ngrok-over-protonvpn.md b/content/blog/hand-rolled-ngrok-over-protonvpn.md
index 49f9c83..43daf8d 100644
--- a/content/blog/hand-rolled-ngrok-over-protonvpn.md
+++ b/content/blog/hand-rolled-ngrok-over-protonvpn.md
@@ -2,7 +2,7 @@
 title: "I paid for the whole vpn, so I'm damn well going to use the whole vpn"
 summary: "Or: hand roll a ngrok with protonvpn port forwarding for shenanigans"
 date: 2025-10-24T21:00:00-07:00
-tags: ["sysadmin"]
+tags: ["sysadmin", "why, just why"]
 categories: ["Life of a sysadmin"]
 ---
 
diff --git a/content/blog/poor-mans-selenium.md b/content/blog/poor-mans-selenium.md
index f7c8dbb..d30839b 100644
--- a/content/blog/poor-mans-selenium.md
+++ b/content/blog/poor-mans-selenium.md
@@ -1,11 +1,41 @@
 ---
-title: "Poor Man's Selenium"
+title: "Poor man's selenium"
 summary: "When chrome-driver just decides to not work on your development workstation today"
-draft: true
+date: "2025-11-09T00:00:00-08:00"
+tags: ["why, just why", "webdev"]
+categories: ["Web development"]
+ShowToc: false
 ---
 
-Idea: do what I did for certain website scraping, just write elaborate code inside the javascript console
+Sometimes if you just need some light web scraping, nothing too complicated.
+Other times, chrome webdriver will just decide to not work, but that's pretty rare.
+Or you really need to extract some useful bits of information, but you don't have selenium already installed handy. Maybe you're on a borrowed laptop that resets on every checkout.
 
-- `HTMLElement.click()`
-- `setTimeout` recursion, or hand-craft some async/await timeouts?
-- `HTMLElement.outerHTML` to stringify a DOM node
+In these cases, the devtools console will do. No need to take out selenium.
+
+The essence here is you can interact with the page in a few basic, but often good enough ways:
+
+- `document.querySelector`, obviously
+- `outerHTML` on HTMLElement to stringify a DOM node. Do some string manipulation to clean it up.
+  - optionally do some string or DOM processing (clone the node first)
+  - when done, copy the array to clipboard. Now you have scraped output!
+- simulate user clicking with `click()` on HTMLElement
+  - This might not always work, so your milage may vary.
+- study their code, and emulate some behaviors yourself to "interact" with the page
+  - For example, if button fetches HTML and replaces a subtree, you can just do the fetch yourself.
+  - Only really possible if it's using jQuery or simple vanilla JS.
+- for navigation...
+  - Replace `window.location.href` to navigate, but **be careful**: this wipes out your devtools console. If you really need to do this, spending some time to craft a full script that you can just paste into console will pay off.
+  - Modify `<a>` with `target="_blank"` to batch open links in new tabs.
+- call `setTimeout` recursively
+  - This isn't super easy to code. Consider quickly cobble together an async/await helper like https://stackoverflow.com/a/33292942
+
+...
+
+...
+
+...
+
+Ok I'll confess, the real reason is chrome webdriver broke on me.
+As they say: I did this not because it was easy, but because I thought it would be easy. Easier than selenium, which honestly I'm not sure if that's true.
+I guess it was kind of fun?
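
For readers of this diff, the console techniques the new post lists (`querySelector`, `outerHTML`, `click()`, and an async/await pause instead of recursive `setTimeout`) might combine roughly as below. This is a minimal sketch, not code from the post: the names `sleep` and `scrapePages`, the selector arguments, the fixed delay, and the page cap are all hypothetical, and it assumes the page exposes a "next" control that a synthetic click actually triggers.

```javascript
// Promise-based pause, so a loop can `await` instead of nesting setTimeout callbacks.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper: collect outerHTML for every element matching rowSelector,
// click the element matching nextSelector, wait, and repeat until it is gone.
// `doc` is passed in (normally `document`) so the function has no hidden globals.
async function scrapePages(doc, rowSelector, nextSelector, delayMs = 1000, maxPages = 50) {
  const rows = [];
  for (let i = 0; i < maxPages; i++) {
    for (const el of doc.querySelectorAll(rowSelector)) {
      rows.push(el.outerHTML); // stringify each matching DOM node
    }
    const next = doc.querySelector(nextSelector);
    if (!next) break; // no "next" control left: assume this was the last page
    next.click(); // synthetic click; some pages ignore these
    await sleep(delayMs); // crude fixed wait for the page to update
  }
  return rows;
}
```

In a devtools console this could be run as `const rows = await scrapePages(document, "table tr", "a.next")` followed by `copy(JSON.stringify(rows))`, where both selectors are placeholders and `copy()` is the console's clipboard utility. A fixed delay is the simplest thing that can work; polling for a DOM change would be more robust but takes longer to type.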
