If it doesn't understand meaning and context, it can't work.
It will over-report on video of baby's first bath, beach volleyball, and high school wrestling; it will under-report on many of the kinkier or more subtle forms of porn.
Humans get this stuff wrong all the time simply by having different cultural contexts; there's no way the 2022 state of the art is up to the challenge.
Fully agree - this seems like a project that's rather limited in scope. And that's ok. The authors even acknowledge it.
It's not an absolute judge of SFW-ness, nor does it claim to be. I can see it as a tool to skim a large set of videos for further human review.
Besides, beach volleyball very much is NSFW depending on the situation.
That doesn't always stop corps from productizing it and making it their official NSFW filter. Good enough is good enough, just so long as they can pay fewer salaries.
Exactly. So many products get sold based on handwaving and being able to check boxes. Anybody releasing open-source software needs to ask themselves, "What uses will this be put to?" Computers can be instruments of freedom or instruments of control.
I don't think people should be wary of "what it could be used for" outside privacy violations or military action. The distinction "used for freedom vs control" is too vague/subjective.
In this case, I think it's best just to hold companies accountable for discrimination that isn't based on some protected attribute. Companies lean too heavily on "right to refuse service (for any reason)" and "we can choose who to do business with".
The fact that we distinguish between "things you are allowed to discriminate on" and "things you aren't" makes for a very convoluted process. Same with at-will employment, tbh. We should just flip it from "you need to prove you were unjustly discriminated against" to "you need a good reason to fire someone".
Subjectivity seems fine to me when it's a decision each person has to make for themselves. If you don't want to think about what you might be helping others to do, you don't have to. But I will.
I mean, "not safe for work" doesn't mean "immoral". Beach volleyball isn't something you should be watching at work, at least not in the kind of open office plan where everybody sees your screen. There's a lot of talk about workplace "lad culture" being hostile to women these days -- possibly enough to mean liability; also you probably shouldn't be hostile to women in that specific way.
It's interesting. I've not tested the model on anything too risque, but again, with the well known Baywatch intro as a frame of reference, wide shots of the whole cast in their swimsuits are fine. But a close up of any single cast member in a swimsuit will invariably trigger the model. Male or female.
In the blog, I suggest this could be the result of an uncurated dataset used when training the CNN. Or perhaps the dataset was fine, and this is pushing the hard limit of what the ResNet50 CNN architecture can do (the off-the-shelf model I use for this is an extension of ResNet50).
Some of the anomalous results are amusing. One day, I uploaded a video of a female violinist in concert, and the model flagged every close up of her as NSFW! Just those closeups. Wide shots, and closeups of other musicians were absolutely fine.
Again some of that might be down to me (clunky code / very low NSFW threshold). And I suspect the model I used was itself a PoC (https://github.com/emiliantolo/pytorch_nsfw_model). But it does make you wonder how the bigger labs with critical products, like Palantir, handle doubts like this.
Why?
During the Olympics the beach volleyball plays out across our floors at work just like any other event.
In fact they played the Olympics on the big screens because it’s easier on the network.
>Beach volleyball isn't something you should be watching at work, at least not in the kind of open office plan where everybody sees your screen.
Watching the Olympics at lunchtime is commonplace and accepted in most workplaces.
Being a creep about it is not. It's almost always a pretty clear line between the two.
It only becomes sexualized by the behavior of the people watching - professional athletes competing for their country isn't something intrinsically inappropriate.
This is a great example of how categorizing the content isn't a great solution.
If someone is sexualizing something in the workplace, the problem is the person's behavior, not the thing they are sexualizing.
The situation we find ourselves in came from setting boundaries around content instead of behavior. Our society treats nudity itself as if it were the act of sexualization, assuming that people have no control over their own behavior.
I sometimes have men's cycling on while I work. This summer I'll probably have the women's Tour de France on. They are almost as 'scantily' clad as beach volleyball players, though their bellies are covered. Is that NSFW or hostile to women? Is it okay to have the women's 100 meter dash on, or is it as inappropriate as beach volleyball (they do wear almost as 'revealing' clothing)? Is men's sport okay? Maybe marathons and other sports with skinny, clothed men are SFW, and diving, with muscular men in tight trunks, is not?
The idea that watching scantily-clad women, such as at a beach volleyball tournament, is automatically hostile to women is just ingrained at this point. Maybe one day we'll return to normality.
While I agree in part, at the rate new content is produced and uploaded to the platforms around the globe, could you even fill enough seats with reviewers at an economically-doable rate to keep up?
We've seen plenty of author-acknowledged, research-grade models and datasets get used in production; this is part of how Black people get labeled as "monkey" every once in a while.
I want to believe it's a matter of adding human labor to the Pool of Knowledge in the AI engine(s); that is, "this is, this is not, go do the learning thing."
But when it comes to online video checking, it's another tool in the arsenal; it's the first lines of defense. First check: does the signature match a previously known and confirmed video marked as porn or otherwise unacceptable. Second check: does the more fuzzy AI thing consider it porn with a high probability. Third check, which will be things the AI mark as 'not sure' or incorrectly marks as 'porn', and things humans flag up manually, will be humans checking and judging things.
And of course it'll be flexible, because the definition of porn - or what is 'not acceptable' - will vary by country and culture, and these companies want to be active everywhere.
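That three-check flow can be sketched in a few lines. This is a minimal illustrative sketch, not anyone's production pipeline: the names (`known_hashes`, the classifier callable) and the probability cutoffs are all made up.

```python
import hashlib

def stage_one_hash(video_bytes: bytes, known_hashes: set) -> bool:
    """First check: exact signature match against previously confirmed content."""
    return hashlib.sha256(video_bytes).hexdigest() in known_hashes

def moderate(video_bytes, known_hashes, classifier, high=0.9, low=0.3):
    """Route a video: auto-block, auto-pass, or queue for human review."""
    if stage_one_hash(video_bytes, known_hashes):
        return "block"           # confirmed match, no model needed
    p = classifier(video_bytes)  # second check: fuzzy model probability
    if p >= high:
        return "block"           # confident enough to act automatically
    if p <= low:
        return "pass"
    return "human_review"        # third check: uncertain cases go to people
```

The interesting knobs are `high` and `low`: widening the gap between them sends more content to humans, narrowing it automates more at the cost of more mistakes.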
except Europe, unless they are given tax breaks and data, amirite
Why does everything have to be perfect? Humans get this wrong all the time as you say, and we still do it - right?
Seems like we already acknowledge good enough practices and are okay with them. Let's not demand perfection, in the same way we shouldn't assume perfection of ~~AI~~ ML.
It could potentially revolutionize the opposite application, those sites where someone manually finds and catalogs all the time points at which movies contain nudity so people can see their favorite celebs in the buff. False positives are probably worth the automation there.
yeah but this is true for profanity filters as well, and yet useful profanity filters still exist. Is it 100% accurate? No. But it's probably still useful.
- Babies (https://github.com/wingman-jr-addon/wingman_jr/issues/22)
- Beach volleyball (but this definitely has SFW and NSFW variants, based on a somewhat subjective line)
- Athletes in general. The model particularly thought some American football players were NSFW for a long time.
- Swimming
- Yoga - again, most SFW and some NSFW here but it still struggles
- Wrestling was a tough one for sure
- Pokemon
While indeed tough, I've seen definite progress. So it's not just a matter of tech, but also of considering the human element - the state of the art may not be up to the challenge of perfection, but it is definitely up to a point of true utility for some use cases. I'm happy about that.
As a note, it uses an EfficientNet Lite L0 backbone - I'm a bit limited in what type of scanning I can perform in a sufficiently speedy manner.
I also agree on the context for sure - one reason I haven't tried switching to an object detection method (and that I don't rely heavily on truly random crops) is that the focus of the image is highly important for the NSFW-ness in some cases. True, two images may contain the same content ... but one is far worse than the other. The nature of CNNs still has some of this location invariance baked in, but I don't want to exacerbate it.
One challenge I think the OP may run into here that may also not be immediately obvious is that accuracy on image stills does not translate that well to video. I have basic video support in my addon, and while I knew there would be some differences, I was surprised at how many discrepancies there really are. As two examples:
- Images in video are often blurrier. In true still images, there is a somewhat higher prior involved with amateur NSFW content and blurriness. This can be a source of false positives.
- The opposite of the note above about focus. Taking stills of moving images will have many transitory frames that seem inappropriate on their own because it seems as if they are focusing on something when in reality the camera is just panning - obvious to the human, less so to the model trained on stills.
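One cheap mitigation for those transitory frames is temporal smoothing: average per-frame scores over a short window so a single odd frame during a pan can't trip the flag on its own. The sketch below is illustrative - the window size and threshold are made up, not values from the addon.

```python
def smooth_scores(scores, window=5):
    """Rolling mean of per-frame classifier scores."""
    out = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def flag_video(scores, threshold=0.8, window=5):
    """Flag only if the smoothed score crosses the threshold,
    i.e. the model is confident across several consecutive frames."""
    return any(s >= threshold for s in smooth_scores(scores, window))
```

A single spiky frame gets diluted by its neighbors, while sustained NSFW content still crosses the threshold - at the cost of slower reaction to genuinely short clips.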
At any rate, given how well your list of edge cases coincided with failures I've grappled with, I'd be interested to see how well you think my addon stacks up for still images when set to stay in "normal" mode. I'd love to hear any feedback you have via GitHub so I can make it better.
The biggest difference between our two projects, is the effort you've put into training your own model, which I think is amazing.
One of the technical issues that you pointed out is that a model trained on still images shouldn't be expected to work on video. While I did not train a custom model for this project, I'm currently working on another DNN model for a completely different purpose, where I think feeding frame deltas into the model will improve the outcome.
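In outline, the frame-delta idea could look something like this - a hypothetical sketch assuming (H, W, 3) uint8 frames, not code from either project: stack the absolute difference between consecutive frames alongside the current frame, so a still-image style CNN gets a crude motion signal as extra input channels.

```python
import numpy as np

def with_frame_delta(prev: np.ndarray, curr: np.ndarray) -> np.ndarray:
    """Concatenate an (H, W, 3) frame with its delta -> (H, W, 6) input.

    Cast to int16 first so the subtraction doesn't wrap around
    (uint8 arithmetic would turn 0 - 10 into 246).
    """
    delta = np.abs(curr.astype(np.int16) - prev.astype(np.int16)).astype(np.uint8)
    return np.concatenate([curr, delta], axis=-1)
```

The first conv layer of the model would then need 6 input channels instead of 3, but everything downstream stays the same.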
As a hobbyist, I would reckon for porn and the like, analysing frozen frames is probably just enough. For violence however, I would agree with you and say that some effort to encode motion would be essential.
Focussing on NSFW content generally, I would guess, depending on the scale of your project, that you will forever run into 'edge cases' for NSFW images, even before you run into the soft wall of subjectivity.
I agree that the tech is improving all the time, and I think something like this can be made to be truly useful one day. Possibly soon. But it would need a large, active development team, a great deal more compute, and a LOT of data. In much the same way that no home/garage coder can hope to put together a model like GPT3 right now, I would think that a foolproof NSFW classifier would need more resources than you or I have access to at this moment.
But things change all the time.
Thinking about what you're doing, one thing I might suggest, if you have time to develop it, is to add some kind of 'recording' mechanism to your plugin, so that the users themselves can add to your dataset... But you have to wonder how many users will allow that! XD
I'm also wondering if a Firefox extension is the best place for your model? To that end, I would suggest putting the app on a server (which is what I originally wanted to do with my hack) which will give you the opportunity to crowdsource data collection. People might be more willing to volunteer data in that way (in a similar way to how people use https://builtwith.com/).
You're also very welcome to take the UX work I've done on this opensource project (because this hack was ultimately just a UX experiment), and plug your model in. If your model and trained weights are available, I'd like to try and create a branch myself, if I have time.
Also, as a hobbyist building knowledge, I hadn't heard of `EfficientNet Lite` before.
I'd been considering Darknet - https://pjreddie.com/darknet/ for embedded stuff until reading your post.
Thanks for the response dynamite-ready. There's a lot in here, but I'll try to comment on a couple items. Some of your suggestions I've actually thought about extensively, so perhaps you'll find the reasoning interesting?
Regarding the current state of tech: I agree the tech still has quite a ways to go. I think one of the most interesting aspects here is how e.g. NSFW.js can get extremely high accuracy - but not necessarily perform better in the real world. I think it speaks in part to the nature of how CNNs work, the nature of the data, and the difficulty of the problem. Still, having seen how incredibly good "AI" has gotten in the last decade, I have quite a bit of hope here.
Regarding putting it on a server: that is indeed a fair question, but my desire is to keep the scanning on the client side for the user. In fact, it was actually the confluence of Firefox's webRequest response filtering (which is why I didn't make a Chrome version) and Tensorflow.js that allowed me to move from dream to reality as I had been waiting prior to that time. I can't afford server infrastructure if the user base grows, and people don't want to route all their pictures to me. So I guess I see the current way it works as a bonus, not a flaw - but it DOES impact performance, certainly.
Regarding data collection with respect to server - yes, this is something I've contemplated (there's a GitHub issue if you're curious). There are, however, two things that I've long mulled over: privacy and dark psychological patterns. Let me explain a bit. On the privacy front - it is not likely legal for a user to share the image data directly due to copyright, so they need to share by URL. This can have many issues when considering e.g. authenticated services, but one big one also is that the URL may have relatively sensitive user-identifying information buried in its path. I can try to be careful here but this absolutely precludes sharing this type of URL data as an open dataset. On the psychological dark patterns front - while I'm fine with folks wanting to submit false positives, I think there's a very real chance some will want to go flag all the images they can find that are false negatives (e.g. porn). I don't think that type of submission is particularly good for their mental health or mine. So, in general, I think user image feedback is something that would be quite powerful but needs a lot of care in how it would be approached.
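For what it's worth, a first-pass scrub for reported URLs might drop query strings and fragments before anything is stored - though, as noted above, identifying tokens buried in the path itself would survive this, which is the harder problem. A hypothetical sketch, not the addon's code:

```python
from urllib.parse import urlsplit, urlunsplit

def scrub_url(url: str) -> str:
    """Keep only scheme, host, and path; drop query string and fragment,
    which commonly carry session tokens and tracking parameters."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```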
Regarding the UX - thanks! And you're welcome to try the model as well - I've tried to include enough detail and data to allow others to integrate as they wish: https://github.com/wingman-jr-addon/model/tree/master/sqrxr_... Also, let us know how things go if you try out Darknet.
For whatever reason, animated Pokemon were a particularly tough case - I remember Charizard in particular being a tough offender for false positives. I cannot find a reference to it now, but I distinctly remember that at one point Yahoo's open_nsfw also had issues with Pokemon (although these two networks do not have a common lineage). Why would these cause problems? Not sure - maybe something to do with gradients of near-skin-tones.
If this was for a product, the first thing I would have done, was drop half the tools in the stack for their inefficiency! For example, the GUI shell is using Node Webkit (which I prefer to Electron, but is essentially the same thing). That in itself is quite bad, but it's not the worst approach to building a desktop app, as Microsoft and Slack have already proven.
But the .exe's you mentioned are also quite large.
The code is very transparent though.
I doubt you'd have been the only one to note how clunky this all is!
It's in a subfolder for the GUI project. Do you have any advice on the non standard text in the license? This is a concern I've never had before. Tbh, I'm surprised anyone has read it!
> Please link to <https://raskie.com>, if you intend to redistribute this .exe.
Now that's probably okay so long as it's a nice request ("please"), and not a requirement; AIUI, GPLv3 doesn't (generally) allow you to add conditions (ex. a license file that says "GPLv3 but no commercial users" isn't allowed by the text of GPLv3). But IANAL and I'm sure this is more complex than I think it is. Which brings us to a reasonably useful point: I would suggest never putting anything in a license file other than the exact standard form of the license; adding anything else, at best, makes people have to read it and try to understand what you've done rather than being able to say "oh, it's GPLv3 and I've cached that that's 100% okay for me" and stop thinking about it.
Modifying a single byte in the license file means the license is no longer what it purports to be. Don’t do it.
The license file needs to be in the root of the repo.
The first thing many developers who have been bitten by this in the past do is run all the boilerplate files like licenses through diff. Some probably have it hooked to every git clone request.
Interesting subject and lots of praise to be had if the model can be made accurate, but seems it's not really there yet. I wish you luck in getting there!
NSFW is definitely a sliding (multi-dimensional) scale, and different cultures and workplaces place themselves differently on it. At a previous workplace (software department in a manufacturing company in Western Europe), a woman wearing only the lower half of her bikini, lying on a beach, was fine as a desktop background as long as you couldn't see her nipples. At a different company, in field service, exposed frontal breasts were still SFW. Meanwhile in many other jobs you would get in trouble for far less.
Many years ago, I worked at a printing company, and several of our accounts were local porn mags and an adult newspaper that was basically escort ads. The account reps loved those jobs.
I probably wouldn't do this in a shared office, but I'm pretty open about the fact that I often like to have a show in the background while I work. Keeps my mind from wandering too far if I have a momentary pause for a big download or a long compile or something. And a lot of people find moderate background noise helps them focus. I've been through every Star Trek multiple times and am plenty productive at work. But Baywatch isn't something I would want coworkers to see. Nipples or no nipples, it's trying to sexualize / objectify about as much as they could get away with in prime time TV. I can't go join a meeting about inclusive culture with my female coworkers if they just saw me staring at Pamela Anderson's bouncing tits in the middle of the workday.
I agree with pretty much everything you said. My statement was not a defense of watching Baywatch but rather a comment on watching everything being highly dependent on your workplace situation and culture.
In an office setting the background noise can be highly disruptive, for example.
Hijacking the top comment to include my epiphany that "work" is really relative. I used to work at a place that had Playboys in the bathroom.
We should replace NSFW with NSFC - Not Safe for Children.
"Images of naked people" and "sexualised images of naked people" are different things. A six year-old reading Playboy is totally different to a six year-old visiting a nudist beach.
Or just "contains nudity". I always found it strange that people seem more worried about kids seeing naked people than they are about kids being exposed to violence.
Here's a blog post where I explain my motivation for creating the program - https://raskie.com/post/practical-ai-autodetecting-nsfw. There are some hyperparameters to tune, but you're right, it's extremely flaky. Part of that might be down to my work (I personally wanted the app to be prudish enough to flag the Baywatch intro), but the off the shelf model I used for this, does produce some strange results. Kind of wanted to use this app as a convo piece to discuss AI with people who aren't entirely technical, or with technical people who wonder what AI can and can't do.
I would suppose that depends on the workplace and the area (and time) where you live. It is definitely "somewhat" sexualised content in my opinion (a close up of a woman's body, stripping down), even though there is no nipple to see. But sure, lots of normal advertisement sells more sex. So is the red line when a nipple is shown? That would classify biological and medical content as NSFW.
My point is, human morals are very different. Humans fight about what content is NSFW. A computer model can never be "accurate" in that sense.
Indeed, I worked in a photo lab back in the day when NSFW images were still on film and we had to think about these kinds of rules, but basically anything was permitted for medical purposes - I once scanned slides for a doctor doing a lecture on pelvic inflammatory disease. She did not say what they were when dropping them off and amusedly asked about it when she picked them up. But it was the very opposite of sexy and quite clearly for genuine medical purposes.
This reminds me of a question I've long had about this sort of thing you reminded me of - in the US at least, we seem to draw the line for showing off breasts at the nipple, but only female nipples. Setting aside any merit or logic to that, can AI actually determine female vs male nipples? I assume not through the nipple itself, but the breast it's attached to or the face of the person?
There's no such thing as a perfect model. The acknowledgement is welcome.
Per the Baywatch example - there's clearly a grey area here, and in my opinion the model is working as intended. False positives for NSFW are better than false negatives.
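That preference for false positives over false negatives can be made concrete by how the decision threshold is tuned: pick the highest threshold whose false-negative rate stays under a cap, and accept whatever false positives that brings. A toy sketch with made-up scores and helper names:

```python
def pick_threshold(scores_pos, scores_neg, max_fn_rate=0.05):
    """Scan candidate thresholds (the observed scores) and return the highest
    one whose false-negative rate - NSFW samples scored below the threshold -
    stays under max_fn_rate. False positives are left uncapped on purpose."""
    best = 0.0
    for t in sorted(set(scores_pos) | set(scores_neg)):
        fn_rate = sum(s < t for s in scores_pos) / len(scores_pos)
        if fn_rate <= max_fn_rate:
            best = t
    return best
```

With a cap of 0% false negatives, the threshold gets pushed down to the lowest-scoring true positive, which is exactly the "prudish by design" behavior described above.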
Clearly incorrect by any reasonable definition unless you work at a swimwear company or a few bits of the media.
I mean, you or I might think Alexandra Paul in a one-piece is an image approaching the beauty of classical sculpture, but quite a lot of classical sculpture is NSFW in most of the world, and the brow is a bit lower where Baywatch is concerned.
One (dare I say?) "reasonable" definition of NSFW is nudity, which is what I'm guilty of associating it with in most contexts. I know it's an acronym (Not Safe For Work), but I never took it that literally, as otherwise most content would be NSFW unless you work in media. I wouldn't consider Seinfeld NSFW, but the companies I work for wouldn't want to see me watching it at work time, even if it's not normally considered NSFW.
Nudity is not even the beginning of NSFW. If you're looking at nudity in most places you're way beyond NSFW and into gross misconduct.
Very many workplaces have a definition of NSFW that is a bit more restrictive than Facebook's content rules.
And rightly so. Go to work, do your work, go home. It's not adult daycare.
Watching video footage of people running around in swimwear is not really appropriate content unless you work in media or swimwear.
And the title sequence to Baywatch is hardly an advert for sporting endeavour; Baywatch is titillation. It would be entirely reasonable for almost anyone, male or female, to complain about that content.
It is far more sensible to define NSFW as "things I shouldn't be looking at just in case someone has an unjustifiable complaint to make about me that they are looking to bolster with additional offences".
I would consider the Baywatch intro NSFW. Perhaps you don't consider it so because it's Baywatch, but if it was a random video of closeups of people in bathing costumes I bet you would. One thing I find really interesting with AI/Machine Learning is how it can cause you to re-evaluate your own biases. I say the AI is right here and you are wrong.
I wonder if it would be possible to not only single out scenes in a movie, but also use IMDb data to help detect which actor/actress is in said scenes for categorization. Could be an extremely useful tool.
That's an interesting idea. There are off the shelf tools that can recognise famous people already (Amazon Rekognition, for example). The problem in connection with this hack, is in not wanting to call a third party web service.
Wouldn't it be easier to use facial recognition for that? In fact I recall years ago Google Play Movies already did that, where you could pause the movie and it would tell you the name of the faces on screen at that moment.
Or do you mean if you're looking for NSFW scenes of a particular actor?
Shameless plug: At PixLab, we offer a similar model, available as a REST API endpoint: https://pixlab.io/cmd?id=nsfw. The NSFW API endpoint lets you detect bloody & adult content. This can help the developer automate things such as filtering users' image uploads. A tutorial on using such an API is available on https://dev.to/unqlite_db/filter-image-uploads-according-to-....
The solution can also be deployed on-premises for real-time, local video analysis without leaving the deployment environment: https://pixlab.io/on-premises.
Seems like this might be the use case for OP's project. An overly sensitive filter that allows you to send only positive alerts to an external API for further validation.
Surprised there is no free tier here but I’m not experienced in this space.