Really appreciate that; this was exactly the goal. Detection always felt like a guessing game, so we wanted to flip the model and build something verifiable from the start. We're already starting conversations with folks close to the major LLM providers, so fingers crossed this helps move things in the right direction. Thanks again for the kind words; it means a lot!
They flag real students as cheaters, mislabel original writing as “likely AI,” and rely on statistical guesswork that just isn’t reliable. Even OpenAI shut down their own detection tool, citing low accuracy.
So I built EncypherAI: an open-source tool that embeds verifiable cryptographic metadata into AI-generated text at the moment of creation.
Think of it like a digital fingerprint: invisible, tamper-proof, and verifiable in milliseconds.
- No changes to how the text looks or reads
- Works with OpenAI, Anthropic, local models, or custom pipelines
- Lightweight Python package with CLI
It uses invisible Unicode variation selectors to embed the metadata without altering the visible text.
Metadata can include model ID, timestamp, purpose, and even user or session IDs, all verifiable offline using HMACs.
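The idea can be sketched in plain Python. This is an illustrative toy, not EncypherAI's actual API: the nibble-per-variation-selector encoding, the `SECRET` key, and the `embed`/`verify` function names are all assumptions made for the example.

```python
import hashlib
import hmac
import json

# Assumption: a shared secret distributed out of band to whoever verifies.
SECRET = b"shared-secret-key"

def embed(text: str, metadata: dict) -> str:
    """Hide metadata plus an HMAC tag as invisible variation selectors.

    Each byte of the payload is split into two 4-bit nibbles, and each
    nibble is mapped to one of the 16 variation selectors U+FE00..U+FE0F,
    which render as zero-width characters after the first visible char.
    """
    payload = json.dumps(metadata, sort_keys=True).encode()
    tag = hmac.new(SECRET, payload, hashlib.sha256).digest()
    blob = len(payload).to_bytes(2, "big") + payload + tag
    hidden = "".join(chr(0xFE00 + n) for b in blob for n in (b >> 4, b & 0xF))
    return text[0] + hidden + text[1:]

def verify(text: str):
    """Extract the hidden payload and check its HMAC; return metadata or None."""
    nibbles = [ord(c) - 0xFE00 for c in text if 0xFE00 <= ord(c) <= 0xFE0F]
    data = bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles) - 1, 2))
    if len(data) < 2:
        return None
    n = int.from_bytes(data[:2], "big")
    payload, tag = data[2:2 + n], data[2 + n:2 + n + 32]
    if hmac.compare_digest(hmac.new(SECRET, payload, hashlib.sha256).digest(), tag):
        return json.loads(payload)
    return None
```

Stripping the variation selectors recovers the original visible text unchanged, and any edit to the hidden payload makes the HMAC check fail, which is what makes offline verification possible without calling back to a server.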
We're hoping this becomes a baseline standard for AI content attribution, something platforms and LLM providers can adopt to prove when something was generated, instead of guessing. This is already sparking conversations with leading LLM providers building toward responsible AI infrastructure.
Interesting study, but I think this framing is a bit too pessimistic. AI isn't replacing critical thinking; it's shifting where we need to apply it. We'll be spending less time on monotonous tasks and more time architecting projects, validating AI outputs, and creatively solving problems AI can't handle. The real skill will be knowing how to leverage AI effectively by planning, prompting, debugging, and critically evaluating its results. It's a new kind of critical thinking, not a decline.
However, the study highlights a potential trap: passively offloading critical thinking tasks to AI could hinder development. For younger generations especially, proactively cultivating strong critical thinking skills is essential to truly harness AI's power. It's about augmenting their abilities, not outsourcing them.
We've had the opposite experience, especially with o3-mini using Deep Research for market research and topic deep-dive tasks. The sources it pulls have never returned a 404 for us, and they've typically been highly relevant to the search prompt. It's been a huge time-saver. We are just scratching the surface of how good these LLMs will become at research tasks.
Yes, several of the most popular models (and even less popular but newly open-sourced ones such as Gemma 3 27B) overuse em dashes. Even when prompted not to use dashes, they almost can't help themselves and occasionally include them anyway; it must be part of their learned stylometry. It's just not a common symbol, as most people generally use commas for the same purpose. I can't even remember learning about em dashes in my college English classes.
I submitted an application which I typeset using LaTeX, and some people thought it was AI-generated because of en and em dashes. I have been using these since forever.
Thanks for the detailed explanation of autoregression and its complexities. The distinction between architecture and loss function is crucial, and you're correct that fine-tuning effectively alters the behavior even within a sequential generation framework. Your "An/A" example provides compelling evidence of incentivized short-range planning, which is a significant point often overlooked in discussions about LLMs simply predicting the next word.
It’s interesting to consider how architectures fundamentally different from autoregression might address this limitation more directly. While autoregressive models are incentivized towards a limited form of planning, they remain inherently constrained by sequential processing. Text diffusion approaches, for example, operate on a different principle, generating text from noise through iterative refinement, which could potentially allow for broader contextual dependencies to be established concurrently rather than sequentially. Are there specific architectural or training challenges you've identified in moving beyond autoregression that are proving particularly difficult to overcome?
That's a really interesting point about committing to words one by one. It highlights how fundamentally different current LLM inference is from human thought, as you pointed out with the scene description analogy. You're right that it feels odd, like building something brick by brick without seeing the final blueprint. To add to this, most text-based LLMs do currently operate this way. However, there are emerging approaches challenging this model. For instance, Inception Labs recently released "Mercury," a text-diffusion coding model that takes a different approach by generating responses more holistically. It’s interesting to see how these alternative methods address the limitations of sequential generation and could potentially lead to faster inference and better contextual coherence. It'll be fascinating to see how techniques like this evolve!
But as I noted yesterday in a follow-up comment to my own above, the diffusion-based approaches to text response generation still generate tokens one at a time, just not in strict left-to-right order. So that part looks much the same: they commit to a token in some position, possibly preceded by gaps, and then calculate more tokens.