How easy is it to use one of these to do actual automation (not testing)? Say I wanted to login to an SPA, navigate to a page and download a file?
Asking mainly because there’s a tool I have to use that doesn’t have an API, and if I could script something like that it would make my life a lot easier. I’ve just never tried that type of thing before.
It’s pretty easy unless one or more of the following is true:

(1) The site invalidates your session often and uses a strong CAPTCHA on login. (You can hook up a CAPTCHA-solving service cheaply if your usage frequency is low and it’s a type the solving services support.)

(2) The site employs advanced automation detection and denies access, in which case you may be screwed. I’ve seen sites defeat puppeteer-extra-plugin-stealth before.

(3) The site uses a div soup with no discernible structure and random CSS class names, and the information you want doesn’t have uniquely identifying features, e.g. you want to extract some unlabeled numbers from the page. In the worst case you might have to resort to computer vision.
> Say I wanted to login to an SPA, navigate to a page and download a file?
Most testing tools make this kind of thing pretty easy. Cypress sits in the page's JavaScript and has access to a Node back end, so when you're not clicking on stuff you can be firing off requests or doing basically anything.
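As a concrete illustration of the login → navigate → download flow, here’s a rough sketch using Playwright’s Python API. Every URL and selector below is made up; you’d substitute the real ones from the site. The import is done lazily inside the function so the sketch reads fine even without Playwright installed.

```python
def fetch_report():
    # Lazy import so this sketch can be read/tested without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()

        # Hypothetical login page and selectors -- inspect the real site.
        page.goto("https://example.com/login")
        page.fill("#username", "me")
        page.fill("#password", "secret")
        page.click("button[type=submit]")
        page.wait_for_url("**/dashboard")  # SPA finishes login client-side

        # Navigate and trigger the download; Playwright captures the file.
        page.goto("https://example.com/reports")
        with page.expect_download() as dl_info:
            page.click("text=Export CSV")
        dl_info.value.save_as("report.csv")

        browser.close()
```

The `expect_download()` context manager is the key piece: it waits for the browser-level download event rather than making you guess the file URL.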
You might find that you don’t need any fancy stuff, though. Look at the download request in Chrome DevTools and see if you can just send the same POST request in your language of choice.
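If the DevTools approach pans out, the whole thing can collapse into a couple of HTTP calls. Here’s a hedged sketch with the `requests` library; every endpoint and form field is hypothetical and would be replaced by what you actually see in the Network tab:

```python
import requests

# Hypothetical base URL -- replace with the real site.
BASE_URL = "https://example.com"

def login(session, username, password):
    # POST the same payload the SPA sends on login (copy it from DevTools).
    resp = session.post(f"{BASE_URL}/api/login",
                        json={"username": username, "password": password})
    resp.raise_for_status()
    return session  # session now carries the auth cookie

def download_csv(session, path="report.csv"):
    # Replay the download request DevTools showed you.
    resp = session.get(f"{BASE_URL}/api/export", stream=True)
    resp.raise_for_status()
    with open(path, "wb") as f:
        for chunk in resp.iter_content(8192):
            f.write(chunk)
    return path

if __name__ == "__main__":
    s = requests.Session()  # a Session persists cookies between requests
    login(s, "user", "secret")
    download_csv(s)
```

Using a `requests.Session` is what makes this work: the login response sets a cookie, and the session automatically sends it on the follow-up download request.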
Any of these would work fine for your idea and not be very difficult to do, assuming the site doesn’t have any anti-bot measures. You’ll need to set up a cron job that runs the automation script or opens a browser with your Greasemonkey script installed.
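For the scheduling piece, a crontab entry along these lines (the paths are hypothetical) would run the script every weekday at 07:00:

```
# m h dom mon dow  command
0 7 * * 1-5  /usr/bin/python3 /home/me/fetch_report.py >> /home/me/fetch_report.log 2>&1
```

Redirecting stdout and stderr to a log file is worth doing, since cron otherwise swallows output and failures go unnoticed.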
Sometimes you can even do this with pure curl: POST the login form, get back the necessary cookie or token, then request the download URL, if it’s easily predictable.
In my worst-case scenario doing something similar, I had to go through the login process via Selenium, then download the file (which was also a CSV, as it happens) in Python by grepping the page source with a regex (XPath and CSS selectors were useless). There are ways to share cookies between Firefox and Python, and you could probably save a step by driving Selenium from Python directly. Then make an HTTP request with the proper cookies, User-Agent, and Accept-* headers.
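The regex-grep and cookie-sharing steps above can be sketched like this. The regex and cookie fields are illustrative; `get_cookies()` on a Selenium driver and `cookies.set` on a `requests.Session` are the real APIs:

```python
import re

def grab_csv_link(page_source):
    # Regex fallback for when selectors are useless (div soup, random class names).
    # Finds the first href ending in .csv -- adjust the pattern to the real markup.
    m = re.search(r'href="([^"]+\.csv)"', page_source)
    return m.group(1) if m else None

def copy_cookies(driver, session):
    # Transfer the authenticated Selenium session into a requests.Session,
    # so the actual download can be a plain HTTP request.
    for c in driver.get_cookies():
        session.cookies.set(c["name"], c["value"], domain=c.get("domain"))
    return session
```

Once the cookies are copied over, the download is just `session.get(link)` with matching User-Agent and Accept-* headers, no browser involved.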