Hacker News | guessmyname's comments

I’m as pedantic as you are and use “LLM” even when talking about these systems, but you need to be flexible and accept that “AI” is already in everyone’s head when referring to GPT variants.

> Who is learning this for the first time only now?

A teenager, probably. Not everyone is 100 years old.



Yes. Everyone and their grandma wants to build the ultimate panacea of AI so of course you’ll see a myriad of AI-powered products and services on a daily basis until the tech industry as a whole is done with the topic.


Oh! I thought it was landmines too and was very confused + concerned when I saw dots near where I live.


MSRC (Microsoft Security Response Center) — https://msrc.microsoft.com/

They’ll close a report as “no action” if the issue isn’t related to Microsoft products. That said, in my experience they’ve been a reasonable intermediary for a few incidents I’ve reported involving government websites, especially where Microsoft software was part of the stack in some way.

For example, I’ve reported issues in multiple countries where national ID numbers are sequential. Private companies like insurers, pension funds, and banks use those IDs to look up records, but some of them didn’t verify that the JSON Web Token (JWT) used for the session actually belonged to the person whose national ID was being queried. In practice, that meant an attacker could enumerate IDs and access other citizens’ financial and personal data.
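To make the missing check concrete, here is a minimal sketch of the ownership test those services skipped. It is not code from any affected system: the `national_id` claim name, the `RECORDS` store, and the `get_record` helper are all hypothetical, and it assumes the token's signature has already been verified and its claims decoded into a dict.

```python
class Forbidden(Exception):
    """Raised when the caller asks for a record they do not own."""


# Hypothetical in-memory store standing in for the real records backend.
RECORDS = {
    "1000001": {"name": "Alice", "balance": 1200},
    "1000002": {"name": "Bob", "balance": 340},
}


def get_record(verified_claims, requested_id):
    """Return the record only if the session belongs to the requested ID.

    `verified_claims` is assumed to come from a JWT whose signature and
    expiry were already checked elsewhere. The "national_id" claim name
    is an assumption made for this sketch.
    """
    owner_id = verified_claims.get("national_id")
    # The check that was missing: compare the ID in the query against the
    # ID bound to the session, instead of trusting the query alone.
    if owner_id is None or owner_id != requested_id:
        raise Forbidden("token does not belong to the requested ID")
    return RECORDS[requested_id]
```

With this in place, a caller holding a valid token for `1000001` gets their own record, while a request for `1000002` raises `Forbidden`, so walking the sequential ID space no longer returns other people's data.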

Reporting something like that directly to a government agency can be intimidating, so I reported it to Microsoft instead, since these organizations often use Azure AD B2C for customer authentication. The vulnerability itself wasn’t in Microsoft’s products, but MSRC’s reactive engineers still took ownership of triage and helped route it to the right contacts in those agencies through their existing partnerships.



Why is Google indexing these harmful images in the first place?

Microsoft, Google, Facebook, and other large tech companies have had image recognition models capable of detecting this kind of content at scale for years, long before large language models became popular. There’s really no excuse for hosting or indexing these images as publicly accessible assets when they clearly have the technical ability to identify and exclude explicit content automatically.

Instead of putting the burden on victims to report these images one by one, companies should be proactively preventing this material from appearing in search results at all. If the technology exists, and it clearly does, then the default approach should be prevention, not reactive cleanup.


How is an image model supposed to detect if there was consent to share the picture?

If you're saying they shouldn't index any explicit images, you're talking about something very different from the article.


I think that “one by one” part allows different interpretations of what guessmyname possibly meant.

But I fail to make sense of it either way. Either the nuance of lack of consent is missing, or Google is blamed for not doing what they just did from the very first version.


They probably make money showing pork search results



That sounds haram.


Filthy pork addicts...


Oink oink


I built a CLI years ago for the same purpose.

From what I can tell, your program treats files as duplicates if they share the same normalized filename and the exact same size; it doesn’t compare contents or hashes.
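As I understand it, that approach looks roughly like the sketch below. This is my reading, not your actual code: the normalization rule (lowercase, drop trailing " (1)" or " - copy" suffixes) and the function names are assumptions made for illustration.

```python
import os
import re
from collections import defaultdict


def normalize_name(filename):
    """Lowercase the name and drop copy suffixes (an assumed rule)."""
    stem, ext = os.path.splitext(filename.lower())
    # Remove trailing " (1)", " (2)", " - copy" style suffixes.
    stem = re.sub(r"(\s*\(\d+\)|\s*-\s*copy)+$", "", stem)
    return stem.strip() + ext


def group_by_name_and_size(entries):
    """Group (path, size) pairs that share a normalized name and exact size.

    File contents are never read, so two different files with the same
    name and size end up in the same group.
    """
    groups = defaultdict(list)
    for path, size in entries:
        key = (normalize_name(os.path.basename(path)), size)
        groups[key].append(path)
    # Only groups with more than one member are reported as duplicates.
    return [paths for paths in groups.values() if len(paths) > 1]
```

The upside is speed, since only directory metadata is touched. The downside is that a renamed copy is missed and an unrelated file with a matching name and size is flagged.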

Mine samples bytes at specific positions, hashes those samples, and compares the hashes to produce a similarity score rather than a strict match. This works great for photos: two shots taken in the same second can differ slightly in pixels but still depict the same scene, so they’re considered duplicates. It also normalizes image orientation by rotating based on the brightest corner, so photos in different orientations are compared using the same features.
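A simplified sketch of the sample-and-hash idea, not the actual tool: the evenly spaced positions, the 16 samples, the 64-byte window, and SHA-256 are all assumptions picked for illustration, and the orientation step is left out. Because each window is hashed exactly, a window only counts as a match when its sampled bytes are identical, so the score here measures byte-level overlap at the sampled positions.

```python
import hashlib


def sample_hashes(data, samples=16, width=64):
    """Hash `samples` windows of `width` bytes spread evenly over `data`."""
    if not data:
        return []
    last_start = max(len(data) - width, 0)
    hashes = []
    for i in range(samples):
        # Spread window start offsets from 0 to last_start inclusive.
        start = (last_start * i) // max(samples - 1, 1)
        chunk = data[start:start + width]
        hashes.append(hashlib.sha256(chunk).hexdigest())
    return hashes


def similarity(a, b, samples=16, width=64):
    """Return the fraction of sampled windows whose hashes match (0.0 to 1.0)."""
    ha = sample_hashes(a, samples, width)
    hb = sample_hashes(b, samples, width)
    if not ha or not hb:
        return 0.0
    matches = sum(x == y for x, y in zip(ha, hb))
    return matches / samples
```

Two files would then be treated as duplicates when `similarity` exceeds some threshold. Only `samples * width` bytes per file are hashed, which is what keeps this cheap on large photo libraries.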


Yeah, I will for sure implement hashing down the line. The current file name/size comparison was good enough for what I need at the minute and for an initial release.

Given the time, it'd be cool to try single-threaded vs. parallel (rayon) runs on larger datasets and compare the performance.

Nice work on your tool, sounds like you've put a lot of consideration into it.


What’s the title for? Is it about “reading” or is it about “books”?

A lot of people who say they “read books” really mean they bought one or checked it out from the library, then only dipped into it here and there, maybe a few paragraphs at a time.

I haven’t read a proper book cover to cover in years, probably not since high school. But I do read a lot every single day, either for my job or because I genuinely want to grow professionally. I’ll also read a few chapters from books friends or coworkers recommend, especially the parts that seem most relevant. I just don’t really see why I need to finish the whole thing if I’m already getting what I came for.

My parents, meanwhile, will read the same books over and over again, cover to cover, every year.


Replace "books" with "sustained reading for entertainment" and it's more clear what's meant. Reading a summary or occasional chapter isn't the same thing, nor is reading technical literature.

Note that this isn't an oblique way to frame your preferences as bad. They're simply a different kind of activity, like how writing commit messages is a different activity than writing a novel. There are different activities even within this definition of "reading". I primarily consume new books. My spouse usually re-reads old ones. One of us is better equipped for literary analysis while the other is better equipped for relatable conversations with normal people, but neither is a more "correct" way to read.


I've bookshelves full of obscure nonfiction but only dip into specific chapters when curiosity demands, which is most days. But every day it's a different book. I can't remember when I last read an entire book, it just seems inefficient. Get the info, appreciate the learning, move on.

"Sustained reading for entertainment" sounds like an ordeal rather than delight.


Well yeah, you're using them as reference books. You wouldn't necessarily approach a textbook the same way, since the point there is to guide you through a series of lessons that gradually build on each other. Similarly for narrative works. Jumping into the middle of a nonlinear narrative entirely misses the intentional choices behind the structure, for example.

You can read how you want, of course. The consequence is sometimes simply that you close yourself off from other aspects of the medium. There aren't many aspects bigger than narrative structure, but that's your choice to make.

