Yeah, it won't do your taxes for you, but it can sure help you do them yourself. It probably won't put you out of your job either, but it might help you accomplish more. Of course, one result of people accomplishing more in less time is that you need fewer people to do the same amount of work, so some jobs could be lost. But it's also possible that, instead, more simply gets accomplished overall.
People frame efficiency like it's pure gain, as if before we were wasting time by thinking for ourselves. I get that these tools can do certain things better, but I'm not sure that delegating to them is free of charge. We're paying something, losing something: probably learning and fulfillment. We become increasingly dependent on machines to do anything.
Something important happened when we turned the tables, and I don't feel it gets the credit it should. It used to be humans telling machines what to do. Now we're doing the opposite.
And it might even be right and not get you in legal trouble! Not that you'd know (until audit day) unless you went back and did them yourself as a check, though.
Except now, you can hire a competent professional accountant and discover on audit day that they got taken over by private equity, replaced 90% of the professionals doing the work with AI, and made a lot of money before the consequences became apparent.
Yes, but you're going to pay through the nose for the "wouldn't have to worry about legal trouble at all" bit (part of what you're paying for with professional services is a degree of protection from their fuckups).
So going back to an apples-to-apples comparison, i.e. assuming that "spend a lot of money to get it done for you" is not on the table, I'd trust a current SOTA LLM to do a typical person's taxes better than they themselves would.
I pay my accountant 500 USD to file my taxes. I don't consider that "through the nose" relative to my inflated tech salary.
If a person has a smaller income, their tax situation is probably very simple and can be handled by automated tools like TurboTax (as the sibling comment suggests).
I don't see a lot of value add from LLMs in this particular context. It's a situation where small mistakes can result in legal trouble or thousands of dollars of losses.
I'm on a financial forum where people often ask tax questions, generally _fairly_ simple questions. An obnoxious recent trend on many forums, including this one, is idiots feeding questions into a magic robot and posting what it says as a response. Now, ChatGPT may be very good at, er, something, I dunno, I am assured that it has _some_ use by the evangelists, but it is not good at tax, and if people follow many of the answers it gives then they are likely to get in trouble.
If a trillion-parameter model can't handle your taxes, that to me says more about the tax code than the AI code.
People who paste undisclosed AI slop in forums deserve their own place in hell, no argument there. But what are some good examples of simple tax questions where current models are dangerously wrong? If it's not a private forum, can you post any links to those questions?
So, a super-basic one I saw recently, in relation to Irish tax. In Ireland, ETFs are taxed differently to normal stocks (most ETFs available here are accumulating: they internally re-invest dividends; this is uncommon for US ETFs for tax reasons). Normal stocks have gains taxed under the capital gains regime (33% on gains when you sell). ETFs are different: they're taxed at 40% on gains when you sell, and they are subject to "deemed disposal"; every 8 years, you are taxed _as if you had sold and re-bought_. The ostensible reason for this is to offset the benefit from untaxed compounding of dividends.
Anyway, the magic robot "knew" all that. Where it slipped up was in actually _working_ with it. Someone asked for a comparison of taxation on a 20-year investment in individual stocks vs ETFs, assuming re-investment of dividends and the same overall growth rate. The machine happily generated a comparison showing individual stocks doing massively better... On closer inspection, it was comparing 20 years of growth for the individual stocks to 8 years of growth for the ETFs. (It also got the marginal income tax rate wrong.)
But the nonsense it spat out _looked_ authoritative at first glance, and it took a couple of replies before it was pointed out that it was completely wrong. The problem isn't that the machine doesn't know the rules; insofar as it "knows" anything, it knows the rules. But it certainly can't reliably apply them.
(I'd post a link, but they deleted it after it was pointed out that it was nonsense.)
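For what it's worth, here's a rough back-of-envelope sketch (in Python) of the comparison that was actually being asked for, using the rates quoted above and some simplifying assumptions of my own: a €10,000 lump sum, 7% annual growth with dividends reinvested, the deemed-disposal tax paid out of the holding, and no annual CGT exemption or credit for tax already paid at the final sale.

    # Lump-sum investment over 20 years: individual stocks (CGT at sale)
    # vs an accumulating ETF (exit tax, deemed disposal every 8 years).
    # Rates are the ones quoted above; everything else is an assumption.

    INITIAL = 10_000.0   # assumed lump sum, EUR
    GROWTH = 0.07        # assumed annual total return, dividends reinvested
    YEARS = 20
    CGT_RATE = 0.33      # capital gains tax on shares (as quoted above)
    EXIT_RATE = 0.40     # ETF exit tax (as quoted above)
    DEEMED_DISPOSAL = 8  # deemed-disposal interval in years (as quoted above)

    def stocks_after_tax(initial: float, growth: float, years: int) -> float:
        """Grow untaxed, then pay CGT on the whole gain at sale."""
        gross = initial * (1 + growth) ** years
        return gross - CGT_RATE * (gross - initial)

    def etf_after_tax(initial: float, growth: float, years: int) -> float:
        """Grow, but pay exit tax on the gain at every deemed disposal.

        Simplification: the tax is paid out of the holding, which is
        exactly why deemed disposal drags on compounding.
        """
        value = initial
        basis = initial
        for year in range(1, years + 1):
            value *= 1 + growth
            if year % DEEMED_DISPOSAL == 0 and year != years:
                value -= EXIT_RATE * (value - basis)
                basis = value  # basis resets after a deemed disposal
        # final (actual) disposal at the end of the horizon
        return value - EXIT_RATE * (value - basis)

    print(f"Stocks after {YEARS} years, net of CGT:   {stocks_after_tax(INITIAL, GROWTH, YEARS):,.0f}")
    print(f"ETF after {YEARS} years, net of exit tax: {etf_after_tax(INITIAL, GROWTH, YEARS):,.0f}")

With those assumptions the ETF does end up behind, but nowhere near "massively" behind; the huge gap in the robot's answer only appears if you let the stocks compound for 20 years and the ETF for 8.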
Interesting, thanks. That doesn't seem like an entirely simple question, but it does demonstrate that the model is still not great at recognizing when it is out of its league and should either hedge its answer, refuse altogether, or delegate to an appropriate external tool.
This failure seems similar to a case that someone brought up earlier ( https://news.ycombinator.com/item?id=43466531 ). While better than expected at computation, the transformer model ultimately overestimates its own ability, running afoul of Dunning-Kruger much like humans tend to.
Replying here due to rate-limiting:
One interesting thing is that when one model fails spectacularly like that, its competitors often do not. If you were to cut and paste the same prompt and feed it to o1-pro, Claude 3.7, and Gemini 2.5, it's possible that they would all get it wrong (after all, I doubt they saw a lot of Irish tax law during training). But if they do, they will very likely make different errors.
Unfortunately it doesn't sound like that experiment can be run now, but I've run similar tests often enough to conclude that wrong answers or faulty reasoning are more likely model-specific shortcomings than technology-specific ones.
That's why I get triggered when people speak authoritatively on here about what AI models "can't do" or "will never be able to do." These people have almost always, almost without exception, been proven dead wrong in the past, but that never seems to bother them.
It's the sort of mistake that's hard to imagine a human making, is the thing. Many humans might have trouble with compounding at all, but the 20-year/8-year confusion just wouldn't happen. And I think it is on the simple side of tax questions (in particular, all the _rules_ involved are simple, well-defined, and involve no ambiguity or opinion; you certainly can't say that of all tax rules). Tax gets _complicated_.
This reminds me of the early days of Google, when people who knew how to phrase a query got dramatically better results than those who basically just entered what they were looking for as if asking a human.
And indeed, phrasing your prompts is important here too, but I mean more that with a bit of an understanding of how it works and how it differs from a human, you can avoid getting sucked in by most of these gaps in its abilities while benefiting from what it's good at. I would ask it the question about the capital gains rules (and would verify the response, probably with a link I'd ask it to provide), but I definitely wouldn't expect it to correctly produce a comparison like that. (I might still ask, but would expect to have to check its work.)