How easy is it to use one of these to do actual automation (not testing)? Say I wanted to login to an SPA, navigate to a page and download a file?
Asking mainly because there’s a tool I have to use that doesn’t have an API, and if I could script something like that it would make my life a lot easier. I’ve just never tried that type of thing before.
It’s pretty easy unless one or more of the following is true:

(1) The site invalidates your session often and uses a strong CAPTCHA on login. (You can hook up a CAPTCHA-solving service cheaply if your usage frequency is low and it’s a type the solving services support.)

(2) The site employs advanced automation detection and denies access, in which case you may be screwed. I’ve seen sites defeat puppeteer-extra-plugin-stealth before.

(3) The site uses a div soup with no discernible structure and random CSS class names, and the information you want doesn’t have uniquely identifying features, e.g. you want to extract some unlabeled numbers from the page. In the worst case you might have to resort to computer vision.
> Say I wanted to login to an SPA, navigate to a page and download a file?
Most testing tools make this kind of thing pretty easy. Cypress sits in the page's JavaScript and has access to a Node back end, so when you're not clicking on stuff you can be firing off requests or doing basically anything.
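As a concrete illustration of the login → navigate → download flow, here’s a rough sketch using Playwright’s Python API. Every URL and selector below is made up; you’d substitute the real ones from the site. The import is done lazily inside the function so the sketch reads fine even without Playwright installed.

```python
def fetch_report():
    # Lazy import so this sketch can be read/tested without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()

        # Hypothetical login page and selectors -- inspect the real site.
        page.goto("https://example.com/login")
        page.fill("#username", "me")
        page.fill("#password", "secret")
        page.click("button[type=submit]")
        page.wait_for_url("**/dashboard")  # SPA finishes login client-side

        # Navigate and trigger the download; Playwright captures the file.
        page.goto("https://example.com/reports")
        with page.expect_download() as dl_info:
            page.click("text=Export CSV")
        dl_info.value.save_as("report.csv")

        browser.close()
```

The `expect_download()` context manager is the key piece: it waits for the browser-level download event rather than making you guess the file URL.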
You might find that you don’t need any fancy stuff, though. Look at the download request in Chrome DevTools and see if you can just send the same POST request in your language of choice.
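If the DevTools approach pans out, the whole thing can collapse into a couple of HTTP calls. Here’s a hedged sketch with the `requests` library; every endpoint and form field is hypothetical and would be replaced by what you actually see in the Network tab:

```python
import requests

# Hypothetical base URL -- replace with the real site.
BASE_URL = "https://example.com"

def login(session, username, password):
    # POST the same payload the SPA sends on login (copy it from DevTools).
    resp = session.post(f"{BASE_URL}/api/login",
                        json={"username": username, "password": password})
    resp.raise_for_status()
    return session  # session now carries the auth cookie

def download_csv(session, path="report.csv"):
    # Replay the download request DevTools showed you.
    resp = session.get(f"{BASE_URL}/api/export", stream=True)
    resp.raise_for_status()
    with open(path, "wb") as f:
        for chunk in resp.iter_content(8192):
            f.write(chunk)
    return path

if __name__ == "__main__":
    s = requests.Session()  # a Session persists cookies between requests
    login(s, "user", "secret")
    download_csv(s)
```

Using a `requests.Session` is what makes this work: the login response sets a cookie, and the session automatically sends it on the follow-up download request.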
Any of these would work fine for your idea and not be very difficult to do, assuming the site doesn’t have any anti-bot measures. You’ll need to set up a cron job that runs the automation script or opens a browser with your Greasemonkey script installed.
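For the scheduling piece, a crontab entry along these lines (the paths are hypothetical) would run the script every weekday at 07:00:

```
# m h dom mon dow  command
0 7 * * 1-5  /usr/bin/python3 /home/me/fetch_report.py >> /home/me/fetch_report.log 2>&1
```

Redirecting stdout and stderr to a log file is worth doing, since cron otherwise swallows output and failures go unnoticed.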
Sometimes you can even do this with pure curl: POST the login form, get back the necessary cookie or token, then request the download URL, if it’s easily predictable.
In my worst-case scenario doing something similar, I had to go through the login process via Selenium, then download the file (which was also a CSV, as it happens) in Python by grepping the page source with a regex (XPath and CSS selectors were useless). There are ways to share cookies between Firefox and Python, and you could probably save a step by driving Selenium from Python directly. Then make an HTTP request with the proper cookies, User-Agent, and Accept-* headers.
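The regex-grep and cookie-sharing steps above can be sketched like this. The regex and cookie fields are illustrative; `get_cookies()` on a Selenium driver and `cookies.set` on a `requests.Session` are the real APIs:

```python
import re

def grab_csv_link(page_source):
    # Regex fallback for when selectors are useless (div soup, random class names).
    # Finds the first href ending in .csv -- adjust the pattern to the real markup.
    m = re.search(r'href="([^"]+\.csv)"', page_source)
    return m.group(1) if m else None

def copy_cookies(driver, session):
    # Transfer the authenticated Selenium session into a requests.Session,
    # so the actual download can be a plain HTTP request.
    for c in driver.get_cookies():
        session.cookies.set(c["name"], c["value"], domain=c.get("domain"))
    return session
```

Once the cookies are copied over, the download is just `session.get(link)` with matching User-Agent and Accept-* headers, no browser involved.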