These sites all suffer from the same defect: Amazon PA-API pricing is NOT consistent, in any region, with the carted values an end user will be shown. This is well known if you have worked with that API before, and you are essentially just dropping the author's 24-hour Amazon affiliate cookie so they earn off all your other purchases. Not to say that's bad, but the value add of a price comparison site like this is minimal to the end user, as you will very likely not get the shown price.
Unfortunately, this is what I sometimes experience, and I am not sure there's much that can be done. I am already trying to filter out outliers, but if a price looks "plausible", this filtering doesn't do much.
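To make the limitation concrete, here is a minimal sketch (not the site's actual code, which isn't shown) of a common outlier filter over recent price observations, using the median absolute deviation. It illustrates the point above: a wildly wrong price gets dropped, but a "plausible" wrong price close to the median sails straight through. All numbers are made up for illustration.

```python
def filter_outliers(prices, threshold=3.0):
    """Keep prices within `threshold` scaled MADs of the median.

    Illustrative only -- a stale-but-plausible price from, say, a
    third-party seller is indistinguishable from a correct one here.
    """
    s = sorted(prices)
    median = s[len(s) // 2]
    deviations = sorted(abs(p - median) for p in prices)
    mad = deviations[len(deviations) // 2] or 1e-9  # avoid division by zero
    return [p for p in prices if abs(p - median) / mad <= threshold]

prices = [19.99, 20.49, 19.79, 99.99, 21.25]
print(filter_outliers(prices))  # → [19.99, 20.49, 19.79, 21.25]
# The absurd 99.99 is dropped, but if 21.25 is a wrong (stale or
# third-party) price, no statistical filter can tell.
```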
Sometimes I get prices for items that are unavailable or completely off (perhaps from a 3rd-party seller?).
Is this really it? The prices just seem completely wrong from the links I clicked. I can’t imagine the PA-API is really that far off for every product, unless something has changed drastically from the last time I used the API.
If you are putting something out for free for anyone to see and link and copy, why is LLM training on it a problem? How’s that different from someone archiving it in their RSS reader or it being archived by any number of archive sites?
If you don’t want to give it away openly, publish it as a book or an essay in a paid publication.
The problem is that LLM “summaries” do not cite sources. Furthermore, they don’t distinguish between summarizing and quoting directly; that “summary” is often directly lifting text that someone wrote. LLMs don’t cite in either case. It’s a clear case of plagiarism, but tech companies are being allowed to get away with it.
Publishing in a paid publication is not a solution because tech companies are scraping those too. It’s absolutely criminal. As an individual, I would be in clear violation of the law if I took text someone else wrote (even if that text was in the public domain) and presented it as my own without attribution.
From an academic perspective, LLM summaries also undermine the purpose of having clear and direct attribution for ideas. Citing sources not only makes clear who said what; it also allows the reader to know who is responsible for faulty knowledge. I’ve already seen this in my line of work, where LLMs have significantly boosted incorrect data. The average reader doesn’t know this data is incorrect and in fact can’t verify any of the data because there is no attribution. This could have serious consequences in areas like medicine.
It's important to consider others' perspectives, even if inaccurate. When I suggested "why not write a blog" to a relative who is into niche bug photography and collecting, they said they didn't want to give away their writing, and especially their photos, to be trained on. They have valid points, honestly, and an accurate framing of what will happen: it will likely get ingested eventually. I think they overestimate their work's importance a tad, but they still seemed to have a pretty accurate gauge of the likely outcomes. Let me flip the question: why should they not be able to choose "not for training uses" even if they put it up publically?
> why should they not be able to choose "not for training uses" even if they put it up publically?
I'm having trouble even parsing that question; "publically" means that you put yourself out there, no? It sounds to me like the Barbra Streisand thing of building an ostentatious mansion and expecting no one to post photos of it.
I suppose you could try to publish things behind some sort of EULA, but that's expressly not public.
As I understand it, terms of use on a publicly accessible page aren't enforceable. That's why it's legal to e.g. scrape pages of news sites regardless of any terms of use. If it's curlable, it's fair game (but it's fair for the site to try to block my scraping).
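Enforceability aside, publishers do have one widely recognized way to signal "not for training": robots.txt rules targeting known AI crawler user-agents (OpenAI's documented crawler identifies itself as "GPTBot"). A small sketch of what honoring that signal looks like, using Python's standard `urllib.robotparser`; `example.com` and the rules are placeholders:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# parse() takes the robots.txt file's lines directly, so this
# also works offline; a real crawler would use set_url()/read().
rp.parse([
    "User-agent: GPTBot",   # OpenAI's documented crawler user-agent
    "Disallow: /",          # ask the training crawler to stay away
    "",
    "User-agent: *",
    "Allow: /",             # everyone else (readers, search) welcome
])

print(rp.can_fetch("GPTBot", "https://example.com/post/1"))       # False
print(rp.can_fetch("SomeBrowser", "https://example.com/post/1"))  # True
```

Whether this is a legal restriction or merely a polite request is exactly the open question in this thread: compliance is voluntary on the crawler's side.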
This is not an answer to your question, but one issue is that if you write about some niche sort of thing (as you do, on a self-hosted blog) that no one else is really writing about, the LLM will take it as a sole source on the topic and serve up its take almost word for word.
That's clearly plagiarism, but it's also interesting to me because there's really no way for the user querying their favorite AI chatbot to tell whether the answer has any truth to it.
I don't see how this is different from the classic citogenesis process; no AI needed. If a novel claim is of sufficient interest, then someone will end up actually doing proper research and debunking of it, probably having fun and getting some internet fame.
Agreed, it's definitely a problem, but I'm just saying that it's the basic problem of "people sometimes say bullshit that other people take at face value". It's not a technical problem. The most relevant approach to analyze this is probably https://en.wikipedia.org/wiki/Truth-default_theory
Are you suggesting that the AI chatbot have this built in? Because the chance that I, an amateur writing about a subject out of passion, have gotten something wrong approaches 1 in most circumstances, and the chance that the person receiving the now-recycled information will perform these checks every time they query an AI chatbot is 0.
These scrapers can bring a small website to its knees. Also, my "contribution" will be drowned in the mass, making me undiscoverable. Further, I can't help fearing a nightmare where someday I'm accused of using AI when I'm only plagiarizing myself.
Fear of AI scraping? I'm just amused at the idea of my words ending up manipulating chatbots to rewrite stuff that I've written, force-feeding it in some distorted form to people silly enough to listen.
We are a major-sized user cohort, and using social platforms is just not worth the energy is my feeling also. Granted, it's not a family tradeoff in my case; I just don't have free time to waste.
It's sad that even hitting these metrics will result in little actual growth. Bluesky is devoid of shareable content. Threads is... just go to Threads and use it, and I bet you come away feeling like it's unusable, like I did. The Fediverse, when I browse it, is like venturing into a ghost town. Every time I see a blog with a linked account, I check it out; they are always devoid of interactions. WordPress blogs have real comments (sometimes) with real interactions happening at a decent clip. That's the real state of things. "Numbers go up" predictions like this make no sense to me for one big fat reason: where are the interactions? (I want it to work, just FTR.)
Hasn't been my experience with Lemmy and some closer knit communities on Mastodon. My interests are niche, though.
IMO if you were used to the smaller communities of the pre-social media internet, fediverse stuff feels familiar. You aren't going to get 256k upvotes like you will on Reddit, but you can have some interesting conversations.
Point me to publicly posted Bluesky links or embeds, then. Not F2F shares. That's what I am talking about. X and FB have those in droves, and they are a real growth driver.
Not embeds or links, but there are tons of skeet screenshots on Reddit, daily. Ubiquitous. Not remotely as common as xeets, but the platform is way less common, so that checks out.
And the overwhelming majority of Xitter content that breaks out of platform are just screenshots as well and not direct links/embeds.
This is not really a guide to local coding models, which is kinda disappointing. I would have been interested in a review of all the cutting-edge open-weight models in various applications.