> There's something incredibly peaceful about being in the hands of an expert you trust. [...] AI can absolutely shatter that feeling in an uncomfortable way [...] but I don't know if I can fully trust AI either.
This really is key. We know we can't trust the AI, but at the same time we're also more comfortable asking the AI for clarifications or confronting it. Not having a time-bound appointment or paying by the hour helps a lot. But even then, more information doesn't necessarily help!
I once brought my 11-year-old car, a Civic with 150k miles, to multiple garages. I figured I'd play the "second opinion" game to correlate what the garages recommended to decide on what needed to be done...
I got 3 completely unrelated recommendations, including one that I knew was invalid! I felt worse off than when I started!
The solution to uncertain information isn't more information, which the AI can certainly provide, it's better information, and AI cannot currently provide that.
Aurornis 1 day ago
I have multiple LLM subscriptions at any given time, plus an array of local models.
When I ask a question outside of my domain of expertise I like to ask all of the LLMs I have access to. I also create separate sessions and ask the same question multiple ways.
It’s revealing to see how many different and contradictory answers I get, most of which are presented confidently.
The last time I ran a medical question through Claude I couldn’t even get consistent answers between sessions.
It’s also scary how easily you can lead each LLM to the answer you have in mind. When I would start asking questions about different options that other LLMs had presented, each session would drift toward that explanation.
marcus_holmes 1 day ago
In my day job we tried creating a credit assessor tool using an LLM as the credit assessor.
It did great, generated a report on the assessed business that was incredibly detailed and plausible.
Then I started running tests and getting into the details, and found that if you ran the same report on the same data, it generated completely different, still very plausible, results. I could run the same source data through the assessment process 10 times and get 10 very different results. We had to can the project and go a different route.
LLMs are designed to produce plausible results, not factual results. We can fix this when using them for software dev by using linters and tests (though we've all had the experience where the LLM invents an API endpoint). I would not trust raw LLM output in any situation where that kind of testing and verification capability isn't present.
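The "same data, 10 runs, 10 results" test described above can be sketched in a few lines. This is only an illustration under assumptions: `agreement_rate`, `stable_assessor`, and `unstable_assessor` are hypothetical names, and the stubs stand in for whatever call actually invokes the model.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Hashable


def agreement_rate(assess: Callable[[str], Hashable], data: str, runs: int = 10) -> float:
    """Run the same assessment `runs` times on identical input and return the
    share of runs that match the most common answer (1.0 = fully repeatable)."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    results = Counter(assess(data) for _ in range(runs))
    _, top_count = results.most_common(1)[0]
    return top_count / runs


# Hypothetical stand-ins for a real model call.
def stable_assessor(data: str) -> str:
    return "approve"


_fake_outputs = cycle(["approve", "decline", "refer"])


def unstable_assessor(data: str) -> str:
    # Simulates a model that gives a different verdict on every run.
    return next(_fake_outputs)


if __name__ == "__main__":
    print(agreement_rate(stable_assessor, "same source data"))
    print(agreement_rate(unstable_assessor, "same source data"))
```

A high score here only shows that the output is repeatable, not that it is right; a low score is a cheap early warning that raw output should not be trusted.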
Suppafly 1 day ago
What's crazy is that there are a ton of businesses building processes around LLMs that haven't done this exercise and fully believe the LLM is giving them accurate data.
yubblegum 20 hours ago
> LLMs are designed to produce plausible results, not factual results.
They are true to their name: Language models. It is precisely the same problem in a language: a grammatically correct sentence is not necessarily true.
xbmcuser 1 day ago
Yup, I use LLMs to write scripts that process the data; I don't ask the LLMs to process the data themselves. Even when I wrote something for my day trading, I had an LLM write scripts that do all the processing and predict price movement from that. The more the data is pre-processed, the more all the LLMs come up with similar trades.
bondarchuk 20 hours ago
It's funny that if the LLMs had all given the same result each time (it sounds like) you would have considered it more valid, even though it might just be giving a single wrong answer more consistently.
adamddev1 1 day ago
Linters and tests help of course, but they cannot "fix" the problem since tests cannot prove the absence of bugs.
greenail 15 hours ago
you can set the "temperature" which is a lever on how stochastic the prediction is. If you are doing your own inference this is clear and easy. If you are consuming tokens this is outsourced.
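For anyone who hasn't seen what that lever does, here is a minimal sketch of temperature sampling over raw scores (logits). It is a toy under stated assumptions, not any provider's implementation: `sample_with_temperature` is a hypothetical name, and treating a temperature of 0 as "always pick the top score" is a common convention assumed here.

```python
import math
import random
from typing import Sequence


def sample_with_temperature(logits: Sequence[float], temperature: float, rng=random) -> int:
    """Pick an index from `logits` after dividing them by `temperature`.

    Low temperature sharpens the distribution toward the top score;
    high temperature flattens it, so picks become more random.
    """
    if temperature <= 0:
        # Assumed convention: zero temperature means greedy (argmax) decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [math.exp(value - peak) for value in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


if __name__ == "__main__":
    scores = [1.0, 3.0, 2.0]
    rng = random.Random(0)
    print([sample_with_temperature(scores, 0.1, rng) for _ in range(10)])
    print([sample_with_temperature(scores, 5.0, rng) for _ in range(10)])
```

Note that even with the temperature turned all the way down, a hosted service may still not return identical output every time, so it is worth measuring rather than assuming.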
dirkt 1 day ago
What happened to VERIFYING an answer? Does nobody do that anymore?
When I ask an LLM, I trace the sources, and see if they make sense.
More often than not the sources don't actually say anything about the topic in particular...
> It’s also scary how easily you can lead each LLM to the answer you have in mind.
Exactly. Which is why "treat an LLM like a human expert who can answer your question" doesn't work. It's more like a human bullshitter who makes up convincing looking answers, and tries to please you. If the answers have actually some grounding in the training material, that's useful as some kind of holistic google, but often it's not.
palata 1 day ago
> What happened to VERIFYING an answer? Does nobody do that anymore?
The problem with medical advice is that you may not be competent to verify the answer, right?
I agree that asking 5 LLMs to vote and trusting the answer is totally the wrong approach, of course. But LLMs (and traditional material) can help you get more informed. For instance, instead of going to your doctor with the LLM diagnosis and trying to convince the doctor that the LLM is right, you can try to build your own understanding of the problem and go ask the doctor to explain to you what you understood correctly and what you misunderstood.
If you have some understanding, it's harder for a specialist to bullshit you. But you need your own critical thinking and you need to put effort into actually learning something; blindly trusting and repeating what LLMs say doesn't help.
prmph 1 day ago
I've also noticed the opposite problem: Sometimes the LLM, when asked a detailed question (probably with some lead-in), pushes back in a way that betrays that it fell back on general tropes without really considering the nuances of your specific context.
This happens many times, and I usually have to lead the LLM through a chain of reasoning to prove to it that its objection, though generally sound, does not apply to my specific situation.
Someone not as well versed in the subject matter would think the LLM found a smoking gun (which they love to do), and be led on a wild goose chase.
mathieuh 1 day ago
As you say, often you check up on the LLM's "reasoning" and it doesn't follow at all, or you can easily get it to contradict itself with just as much certainty as it had about its previous convictions.
It is very scary to me that people are entrusting potentially life-altering decisions to these things.
otabdeveloper4 1 day ago
> When I ask an LLM, I trace the sources, and see if they make sense.
Professional tip: you can cut out the LLM middleman here and save a lot of time and money.
base698 22 hours ago
My step mom was having debilitating pain. A year of going to doctors and no one was able to find a cause. I scanned her discharge paperwork, which had her prescriptions on it, and gave it to Claude. It identified a prescription that had that exact side effect. They later confronted her primary care doctor, who concurred and took her off it.
A friend of mine's wife recently passed. They were chasing a suspected heart defect for over a year. She had been intermittently fainting. At about the year mark they decided to scope her digestive tract. They found bleeding ulcers from cancer that was all over her body. I input her fainting symptoms into Claude, and a gastrointestinal cause was the number two suspect after heart issues.
I have a few other cases it's helped with. I'm not sure it could do worse than my own experience with the medical system. This is doubly true in places that lack any sort of medical care.
nums 15 hours ago
My mom had cancer and she was on regular, suppressive chemotherapy. I put her info into an AI and it correctly noted that her chemotherapy had stopped being effective 2 months prior based on factual lab reports. She was unaware of this. I was able to be her health advocate much more effectively by respectfully asking her oncologist targeted questions. He was already on top of it and was addressing the issue. Our conversation was respectful and, due to my educating myself, went up another level. Ultimately, it was a positive interaction. I was satisfied that he was indeed expert at his craft, and he was satisfied that we were aware of the uncertainty of the new treatment with a risk-based understanding of the viability of success. This was a positive engagement with an expert. In parallel situations around non-health issues, I've found the ego of the expert seems to be the determinative factor in whether or not the interaction goes well.
palata 1 day ago
> It’s also scary how easily you can lead each LLM to the answer you have in mind.
Scary in this context of course, but I find that it is an interesting thought for coding: it suggests that maybe a developer who knows what they are doing will end up leading the LLM to coding something that makes more sense than a developer who doesn't know and just vibe-codes blindly.
Sounds pretty obvious, but I wanted to say it.
ncruces 1 day ago
And all it takes is not blindly accepting the first thing it spews if you suspect there's a better answer (and are in a position to evaluate that better answer).
LogicFailsMe 20 hours ago
As someone who uses Claude Code to summarize published research, you have to ground it in peer-reviewed results or it gets lost. But also, I am grounded with two degrees in the source material. So I am feeding it my views and asking if the published work agrees or disagrees with my opinions, and I get fantastic results that way, to the point of knowing current clinical trials and treatment regimens better than most of the oncologists, which led to a great conversation with the clinical trials team. This doesn't replace people, but it augments existing expertise amazingly well.
But also, I hear so many tales of running out of tokens. I ask Claude Code to build a tool to perform a task. I review the tool and then I let it rip if I'm happy with it. As I understand things, most just ask Claude Code to do the task. That seems a bit fraught.
Anyway, you have to impose constraints IMO and ask the right questions to get the answers you need or yes Claude Code (or any other LLM) will eventually just agree with you.
andai 18 hours ago
Yeah a lot of focus lately on making context windows enormous and putting everything in them. (It should know every detail of your life!) But in my experience LLMs are extremely "prime-able" and also tend to hyperfixate on details.
So when asking difficult questions I tend to remove as much context as possible, rather than adding it. I don't want it to reflect my own ideas or biases back to me, I want an actually fresh perspective.
ComputerGuru 18 hours ago
Yup. This works until it doesn’t (fairly soon thereafter), both from experimentation and understanding of theory. Here’s an illustrative example: https://old.reddit.com/r/Bard/comments/1l1qxk9/why_does_gemi...
parpfish 19 hours ago
LLMs are well suited to my (some would say annoyingly) curious nature.
when i get an answer, my first instinct is to ask a ton of follow-ups and "what about"s. i've learned to tamp this down with fellow humans, but with LLMs it's great because most of the time the response is "you're right, something doesn't add up... let me try again". i think we eventually converge on something reasonably true
unknown 20 hours ago
[deleted]
Esophagus4 1 day ago
Have you ever let the LLMs “discuss” with each other to see if that would give better answers?
You might end up with the answer from the most persuasive LLM, but you might also end up with better results.
Wonder if there is a paper out there on this.
scheme271 1 day ago
The problem is how do you know whether the answer is just the most persuasive or actually the most accurate one? It's hard to figure this out without domain knowledge.
mncharity 1 day ago
With direct discussion, the same tendency to harmonize towards groupthink applies.
Aside from the statelessness GP mentioned, one can insert anti-conciliatory intermediation. "I saw a random claim go by, but something about it seems not quite right. What am I missing? They said: [...]." Weaponizing the bias, and orchestrating the discourse from the harness.
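A rough sketch of what that could look like in a harness, with everything hypothetical: `cross_examine` and the two callables are made-up names, and each callable is assumed to open a fresh session with no memory of the other. The point is that the critic only ever sees the reframed claim, never the original question or who asked it.

```python
from typing import Callable, Tuple

# Framing taken from the quoted prompt above; it hides the fact that the
# claim came from another model and invites the critic to find fault.
SKEPTIC_TEMPLATE = (
    "I saw a random claim go by, but something about it seems not quite right. "
    "What am I missing? They said: {claim}"
)


def cross_examine(
    question: str,
    answerer: Callable[[str], str],
    critic: Callable[[str], str],
) -> Tuple[str, str]:
    """Ask one model the question, then hand its answer to a second model
    wrapped in a skeptical framing. Both callables are assumed stateless."""
    claim = answerer(question)
    critique = critic(SKEPTIC_TEMPLATE.format(claim=claim))
    return claim, critique


if __name__ == "__main__":
    # Stub models for illustration only.
    def fake_answerer(prompt: str) -> str:
        return "The noise is a worn wheel bearing."

    def fake_critic(prompt: str) -> str:
        return "A worn bearing usually changes pitch when turning; was that checked?"

    print(cross_examine("What causes a humming noise at 40 mph?", fake_answerer, fake_critic))
```

The harness decides what each session gets to see, which is where the orchestration happens; the models themselves never talk directly.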
cadamsdotcom 1 day ago
The problem with trying to write a paper is the results depend on RNG.
rockostrich 18 hours ago
There are 3 kinds of mechanics:
Scammers who do the lowest effort diagnostic and "fix" to get you to pay a smaller amount of money to fix the problem in the short term even though it'll re-present itself a week/month/year later.
Upsellers who will find other things "wrong" with your car and pressure you into paying to fix them because they sound a lot worse than they are.
Good mechanics that will explain what they did to diagnose the issue and recommend different options depending on what the issue is.
Funnily enough, I've found that doctors tend to also fit into these 3 archetypes.
palata 16 hours ago
Yes, and that's a problem. Doctors (or experts in general) hate it when people don't trust them, but the thing with experts is that people have to trust them. And in my life (and a few times just in the last few years), enough doctors have been wrong enough that I cannot just trust them anymore [1]. If it is important, I will ask them to explain to me, and sometimes I will just ask for a second opinion.
I have read about doctors complaining that "with AI, patients now come with their own diagnosis and don't trust us when we say it's bullshit, and it is a problem". I can feel for them, but if they give the feeling that they don't listen to the patients and the patients don't trust them, it's not only the patients' fault, I would say.
[1]: I have more than one example like this among my relatives: A doctor says "wow that's bad go to the ER", the ER says "nope it's all good, go home", first doctor learns about that and says "WTF you GO TO THE ER, call me and I will insult them on the phone", and finally resulting in a surgery where the doctors say "they were lucky we could operate right now, because in a matter of hours they could have died from this". How in the world can I trust them after one event like this? Happened to me (in some variation) 3 times. Not based on an LLM diagnosis in the first place: based on a doctor's diagnosis.
rockostrich 14 hours ago
Heh, I hear stories like that every day from my partner who is an ICU nurse. Not as dire, but there are constant inter-department arguments about moving patients because of resource constraints and the ICU could end up completely understaffed/resource constrained if the wrong NP or charge nurse is working. I'm amazed our healthcare system works at all to be honest.
nums 15 hours ago
Maybe a difference here is asking AI for conclusions. When I have it do a buyer's report for me, I ask it for "what questions should I be asking? What are typical things that go wrong with this type of vehicle?" I don't delegate conclusions to the AI but use it to educate myself. Then, I can gather further information to make MY decision .. to buy it or not.
rockostrich 13 hours ago
I don't think so. LLMs tend to over-index on providing results in general whether it's a conclusion or not. When you ask it "What are typical things that go wrong with this type of vehicle?" you're forcing it to make a conclusion about which results to include and it will almost certainly provide results even if those issues aren't as much of a concern compared to typical issues with other vehicles.
For example, I just prompted Kimi-K2.6 with:
> I'm considering buying a used base model 2010 Honda Civic with 80k miles that's been garage kept. What are typical things that go wrong with this type of vehicle?
It listed 10 issues including the engine block cracking (which wasn't even an issue with 2010 Civics). Started a new chat and asked about a 2010 Toyota Camry, another unbelievably reliable car, and it listed 9 similar issues. Started a new chat and asked about a 2011 Jeep Grand Cherokee, a notoriously unreliable vehicle, and it listed the same number of issues.
Sure it's data to make decisions on either way, but it really all comes down to how good your prompts are and whether or not you can think critically about the output, whether or not that output is a conclusion or just data collection.
jdblair 1 day ago
The best mechanic I ever had kept my ‘98 Subaru going past 200k miles. Once during a repair I asked him to do an inspection and tell me if there was anything else I should replace. He told me not to do that, and that any mechanic would always find something, but not necessarily the next thing to break.
He said it better using an expression I hadn’t heard before or since, something like “don’t go looking for goats when your herd is already with you.”
dumb1224 1 day ago
Exactly. Old parts of the system will be working if you leave them undisturbed. Mechanics have very good intuitions of this sort of thing.
I've read before that there's proper engineering / physics theory about this too: a car as a machine is like a linear/smooth physical system with multiple weaknesses. Over a long period of running, many places might weaken, but it still evolves into a slightly different smooth system, until you introduce a replacement which causes an impedance mismatch or something like that.
tass 23 hours ago
Maintenance-induced failures are what it’s called with small aircraft.
You’ll do something to prevent a failure (like, replace an old but functional alternator) but cause an oil leak or engine vibrations because you had to remove the propeller to complete the job.
john-tells-all 1 day ago
There's a big difference between a _puzzle_ and a _mystery_. In a puzzle, the goal state is known, and as more pieces - data - appears, the goal gets closer. You know how far you are from the goal.
A mystery is worse. With each additional piece of data, the goal gets farther away. Everything is more and more confusing.
(Popularized by Malcolm Gladwell)
mrlongroots 1 day ago
Maybe I am missing something but I just find this wrong.
Everything is a puzzle: there is one "Truth" or one diagnosis. You (a smart human) should be able to converge on it by cross-examining your LLMs. By themselves, they have no interest in revealing this, no stakes, which makes them tools only useful at the hands of a capable investigator.
Paracompact 1 day ago
> You (a smart human) should be able to converge on it by cross-examining your LLMs.
What makes you think this is fundamentally different from cross-examining ELIZA? There is no guarantee that the LLM will help you converge on anything. Indeed actually calling out an LLM on BS tends to eventually produce an "I don't know and can't help you further" answer (as it should).
scheme271 1 day ago
The problem is that the diagnosis might not be known for a while. There are a few conditions and diseases that require an autopsy for a guaranteed diagnosis and are therefore diagnosed based on symptoms in clinical settings.
010101010101 1 day ago
> The solution to uncertain information isn't more information, which the AI can certainly provide, it's better information, and AI cannot currently provide that.
I'd argue that AI _can_ currently provide that, but that it can't do it _reliably_, and that to non-experts it's impossible to differentiate, which makes it all the more dangerous.
margorczynski 1 day ago
Isn't that the case with human "experts"? If you had encounters with doctors, mechanics, etc. you'll know you can get a completely different diagnosis for the same problem which obviously means (in most cases) that the person you thought an expert is wrong.
What is needed are studies that will take a cold look at the actual results because AI seems to be required to be perfect or it is useless. It just needs to be as good as a human for most stuff, but in the long run it will be much better. At least that's what extrapolating current reality shows us.
wwweston 1 day ago
We have systems around humans that exist to manage expertise gaps, credibility signals, and accountability. This is part of what makes humans as good as they are, along with specialized training and some measure of meritocratic selection. We license and regulate and account and litigate to make a system that responds and improves.
Some of this might be applicable to LLMs, but some isn’t and much of it would be resisted. This is one reason we’re not likely to get “as good as a human” because at some level we’re not optimizing for the outcomes; we’re optimizing for speed, convenience, some participant’s economics, and underlying beliefs.
ed_elliott_asc 1 day ago
The soothing sound of ChatGPT telling us how right and clever we are…how could it possibly hallucinate, certainly not 5.5
nonethewiser 1 day ago
You’ve really honed in on the key issue. This is exactly how keen Hacker News commenters approach this.
Bratmon 1 day ago
To provide a competing point of anecdata: A Gemini diagnosis saved me $3,000 in unnecessary repairs on my Civic.
fluidcruft 1 day ago
YouTube has saved me at least that much in appliance repairs... and it doesn't even have an AI. It's amazing how valuable access to information can be.
unknown 1 day ago
[deleted]
ahepp 1 day ago
I would love to hear more about this
dyauspitr 1 day ago
Saved me $2000 on a koi pond pump and filtration system
dumb1224 1 day ago
I tried that AI diagnosis for my 15-year-old Ford C-Max too. However, with a diagnostic problem, the issue is that unless you've got the ground truth, there's simply no way to verify any tool / human with a metric that you can compare and use to decide on future tasks.
The AI might be very good at diagnosing all the minor issues, yet not lead to a successful repair, whereas human mechanics are extremely good on the 80% of major issues: their diagnosis may not be the ground truth, but it will lead to a successful repair (one that might not address the root cause but simply patch it). So it comes down to managing expectations / outcomes.
serial_dev 1 day ago
These tools can’t reliably fix a 4px misalignment on my icon, better ask them about a medical report… but honestly, I would do the same.
Gigachad 1 day ago
Tbh LLMs pulling data out of medical documents that are in its training set and searchable online is likely a much easier task than fixing some weird CSS alignment issue.
dd8601fn 19 hours ago
Also most of them can’t actually see what they’re doing. It’s hard for me to get things pixel perfect while blindfolded, too.
throwaway2037 21 hours ago
You nerd sniped me with the story about your used car. What happened in the end? I really want to know! There are some fun YouTube channels that basically do the same. Someone who is an expert auto mechanic takes a used car to various repair garages and asks them to recommend a course of action.
namelessone 21 hours ago
Sounds like a fun watch! What is the name of the channel?
ryukoposting 1 day ago
> I got 3 completely unrelated recommendations, including one that I knew was invalid! I felt worse off than when I started!
I almost had a very similar experience with my beater Lexus. It took 2 independent shops and 3 dealers to finally figure out what was causing the ABS to go off randomly at low speeds. Turns out there's some obscure Toyota-specific tool from the late '90s that picked up a proprietary diagnostic code, and the third dealer was the only one that still had that particular piece of equipment.
...and of course, the thing that's broken has been out of production for 20 years and remanufactured ones cost more than the car is worth. I ended up just unplugging the ABS control module.
Point being: once I knew what was wrong, all the seemingly contradictory information from the other 4 shops suddenly fit together. It's just such a weird thing to go wrong that no reasonable tech would ever have considered it.
darkwater 23 hours ago
> I got 3 completely unrelated recommendations, including one that I knew was invalid! I felt worse off than when I started!
I would frame it differently: you now know which shops are not to be trusted. So, next time you need one, you will make a better decision.
abirch 23 hours ago
There are few things better in this world than having a car shop you can trust. I found one and pray that management doesn't change.
jbs789 20 hours ago
Especially in the medical field where the placebo effect / mindset shapes outcomes.
clates 19 hours ago
> The solution to uncertain information isn't more information, which the AI can certainly provide, it's better information, and AI cannot currently provide that.
Aside from the LLM-ism (it isn't foo, it's bar), this is a thought-terminating cliché. You definitionally don't know if some information is better or not, given that you were uncertain about the information in the first place.
"I went to three mechanics and got three different answers" - your takeaway is just "Ah - I clearly need better informed mechanics."
Which is on its face absurd, because if you could clearly judge the ability of the mechanics you wouldn't need their evaluation. You'd just do the evaluation yourself.
weatherlite 1 day ago
> it's better information, and AI cannot currently provide that
It sometimes can; if it flat out never could, no one would use it. People use it, lots of them.
UltraSane 1 day ago
> There's something incredibly peaceful about being in the hands of an expert you trust
This is the primary business model of enterprise IT and is why companies pay so much for 4 hour disk replacement.
nonethewiser 1 day ago
You only got 3 opinions on your car? Why not 50? You could have found a more useful signal by getting more information.
I get it - getting an opinion from a mechanic is time consuming. Not true of AI though.
kgeist 1 day ago
A few years ago (before the AI craze), I was misdiagnosed with tuberculosis. I had a chronic cough, and an outsourced radiologist at a clinic found signs of tuberculosis. The findings were sent to the city's tuberculosis hospital, as required by the country's law. The doctors there took the radiologist's conclusion at face value and required me to stay at their hospital for at least 8 months under a strict, prison-like regime. There was no option to say no, because I was considered some kind of biohazard, and by law I had to comply.
Before I was admitted, I quickly found another radiologist, who diagnosed pneumonia instead. I sent his report to the chief doctor at the tuberculosis hospital, and after some deliberation they concluded that the original reading was wrong. Turns out the doctors there can't read scans at all and just believe whatever a radiologist says...
The funny thing is, they had already officially put me on the tuberculosis register and didn't want to admit they had made a mistake. So instead, they simply gave me another paper saying that I had been cured of tuberculosis by them... in 7 days. I'm probably the only person in the country to defeat tuberculosis in a week :)
So if you don't trust the radiologist/doctor, maybe find another doctor if you can afford it? You can compare their conclusions and see if they match. Two unrelated doctors or radiologists saying the same thing is probably about as close to the truth as you're going to get. I'm not sure though whether I should trust AI or humans more. AI can hallucinate, but I've been misdiagnosed by humans so many times too...
azan_ 1 day ago
How is it possible? You can't diagnose tuberculosis just based on imaging, and a tuberculosis hospital has to know that!
kgeist 1 day ago
Yeah, I know! It was strange. They gave me a test, and it came back negative, but they insisted it was negative because I had "latent tuberculosis," which supposedly wasn't detectable by the test yet but was about to become active.
I forgot to mention that, besides getting a second opinion from another radiologist, I also took a more modern test at another private clinic. That test has better detection rates than the one the state clinic used, and it came back negative too.
I have suspicions they had some kind of government quota to keep the hospital staffed with patients in order to receive funding. Or they were just completely incompetent. I pushed back by bringing them another radiologist's report and the results of a better test that I paid for myself, so I guess they decided to back down.
spwa4 22 hours ago
You'll find doctors always believe and treat the worst diagnosis any professional has put on a case. That's a legal thing, not a skill issue.
Think about the consequences of mistakes in both directions ...
unknown 1 day ago
[deleted]
selfmodruntime 11 hours ago
Well you can't diagnose pyelonephritis without a urine culture either, which my GP kindly noted after I had already taken a full 14-day course of antibiotics. The ER I was at before tried to, anyway.
shiandow 1 day ago
Not only that, what is the point of confining someone to prevent the spread of a disease that about a quarter of the world is already infected with?
I suppose there could be reasons, but I don't know them.
kgeist 18 hours ago
Some countries and jurisdictions still have laws that allow for the involuntary confinement of tuberculosis patients, I guess dating back to the times when tuberculosis was rampant in those countries? And most professionals seem to be okay with the policy:
https://theunion.org/news/is-involuntary-incarceration-of-tb...
>17% said that, as a matter of principle, the involuntary incarceration of TB patients was inappropriate on any grounds.
>Regionally, members from Europe Region had the highest percentage of respondents objecting to the policy as a matter of principle (26.2%) while the North America Region had the lowest (3%).
The emergence of multi-drug resistant tuberculosis in the 1990s is probably one of the reasons:
>Respondents most strongly supported the policy of incarceration for patients known to have multidrug-resistant TB (49.7%)
kennywinker 16 hours ago
Because it’s a nasty disease, and they’d like to prevent its spread. A quarter of the world may have TB, but there are only like 10,000 cases in the US every year
comboy 1 day ago
Incentives.
ryan_n 19 hours ago
Yea I find a lot of stories on the web about doctors misdiagnosing things to have oddities like this that don't seem to make sense. It often seems like the author is leaving something out. Not saying OP is lying, but TB is a very, very weird conclusion to come to from just one radiology report...
kgeist 19 hours ago
See my answer in this same subthread. I was perplexed myself as to why I was diagnosed based on just one radiology report. But the moral of my story is that you can always try to obtain a second opinion from another doctor. I'm not saying doctors shouldn't be trusted in general.
igortg 1 day ago
I had a similar experience. My son had pneumonia and was still feeling pain after 10 days of antibiotics. I took an X-ray to three different doctors, and only one got the right diagnosis (pleural effusion). We really should have a central place with top-notch professionals looking at these, instead of having each doctor figure it out by themselves.
mncharity 1 day ago
I once worked on a medical hackathon concept for computer-assisted population screening for cervical cancer in a developing nation. Community health workers take photos. The AI would look at the images, and make a call of "clearly negative" vs "clearly positive" vs "needs (scarce) expert review". But taking good photos is hard, so it's also "photos insufficient" and "worker needs additional mentorship on taking photos". Only by computers reducing all three costs - expert workload, exam success, and quality-control/training - might successful deployment be financially and logistically plausible for that nation.
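The routing idea can be sketched as a simple threshold rule. This is a toy illustration, not the hackathon code: `triage`, the score inputs, and every threshold value are assumptions picked only to show the shape of the decision.

```python
from enum import Enum


class Triage(Enum):
    PHOTOS_INSUFFICIENT = "photos insufficient"
    CLEARLY_NEGATIVE = "clearly negative"
    CLEARLY_POSITIVE = "clearly positive"
    NEEDS_EXPERT_REVIEW = "needs expert review"


def triage(
    p_positive: float,
    photo_quality: float,
    *,
    min_quality: float = 0.6,  # hypothetical threshold
    low: float = 0.05,         # hypothetical threshold
    high: float = 0.95,        # hypothetical threshold
) -> Triage:
    """Route one exam based on a model score and a photo-quality score,
    both assumed to lie in [0, 1]. Only confident calls skip the expert."""
    if photo_quality < min_quality:
        # Bad photos are caught first so they never reach the classifier call.
        return Triage.PHOTOS_INSUFFICIENT
    if p_positive <= low:
        return Triage.CLEARLY_NEGATIVE
    if p_positive >= high:
        return Triage.CLEARLY_POSITIVE
    return Triage.NEEDS_EXPERT_REVIEW


if __name__ == "__main__":
    for score, quality in [(0.01, 0.9), (0.99, 0.9), (0.50, 0.9), (0.99, 0.2)]:
        print(score, quality, triage(score, quality).value)
```

Widening the gap between `low` and `high` sends more cases to the scarce experts; narrowing it trades expert workload for more automated calls, which is exactly the cost balance described above.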
beacon294 1 day ago
What country / municipality are you in? This is not my understanding of Tuberculosis...
engeljohnb 21 hours ago
A second opinion is a smart move if one has doubts about their diagnosis. Doctors make mistakes, and even though I've worked with countless great doctors, I've never worked a job where there wasn't at least one who was undiscerning, or downright lazy and negligent. It's hard to tell people to trust their doctor when I know there are plenty of doctors out there like this.
But AI as of right now is worse than any bad doctor I've ever worked with.
CodingJeebus 18 hours ago
The healthcare affordability crisis is only going to exacerbate the trend of using AI as a replacement for a real doctor. I went to urgent care a few months ago to get tested for COVID and two other flu strains and it came out to almost $500.
Anecdotally, several people in my life who embrace less traditional (and sometimes more conspiratorial) views on modern healthcare tend to be the ones that can't afford it. A confident-sounding chatbot to answer questions day and night about what's going on with your body is very seductive in a world where access to real healthcare is getting further and further out of reach.
doublepg23 16 hours ago
> I went to urgent care a few months ago to get tested for COVID and two other flu strains and it came out to almost $500.
They have at-home COVID+Flu tests at my local CVS for $35, why go to an urgent care?
engeljohnb 18 hours ago
That's the balance I'm finding it very hard to strike when talking to my family about doctors.
Everyone is either an "all doctors are scams" QAnon type, or they blindly trust everything their doctor says, no matter how fishy, in fear of coming off as one of the former group.
And, to use a phrase we all hate by now, you're absolutely right. When most people have to go into debt to even see a doctor, what can people possibly conclude from that besides "all doctors are out to scam you?"
laybak 16 hours ago
> AI can hallucinate, but I've been misdiagnosed by humans so many times too...
I've heard this experience from quite a few folks before, but this is my first time hearing about a mandatory 8-month quarantine as a consequence... damn
rpastuszak 1 day ago
Asking for a friend, who is in a somewhat similar predicament — it wasn’t Portugal, was it?
QuantumNoodle 17 hours ago
Your TB stories made me recall my (fond) TB stories. I came from a country that requires tuberculosis vaccines as a school-entry requirement. I have the vaccine and antibodies. Then I moved to a country that didn't have this requirement, but I had to be tested to make sure I didn't have TB for things like camps, college, etc. The test is something like a vaccine that injects only dead TB cells; if the injection site welts up from antibodies, then you have to get another whole panel of tests (like an X-ray of the lungs). Thankfully I've never had it, but TB is apparently no joke. Though blasting my chest with radiation is no healthier :p
Anyway, thanks for sharing!
themantalope 1 day ago
Radiologist. I don’t read MR shoulder exams in my day to day practice, but from the few pictures shown, I can’t conclusively disagree with the original report.
These models are generally terrible at reading medical images. The amount of public training data on the internet compared to the number of scans a radiologist reads in training is minuscule. There’s obviously a ton of medical images in general, but very few are publicly available for download on the internet, and even fewer come with a report.
There are vision language models coming out of research labs that are excellent in describing and localizing findings. Still at the level of a 1st or 2nd year radiology resident, but as we all say - this is the worst the models will ever be.
deaux 1 day ago
Absolutely. It's very unfortunate that this post used the worst example possible of using LLMs for medical purposes.
General-purpose LLMs are _fantastic_ at medical diagnosis that do not involve imaging. I am completely convinced that given enough information and time, frontier models already outperform >90% of doctors on initial diagnosis of internal issues and suggesting medical tests to further reject or confirm the most likely theories. To the point where I'm eagerly waiting for the first hospital in the world that's willing to be open and honest about using them for that first step, and then proceeding from there. I'll be on a flight there as soon as one arrives.
At the same time, they're worse than useless at anything involving medical imaging. Asking them to interpret them is worse than trying to interpret them yourself as a layman. And you surely wouldn't interpret them yourself.
throwaway2037 21 hours ago
> General-purpose LLMs are _fantastic_ at medical diagnosis that do not involve imaging.
Can you share the reasons that you believe this?
> At the same time, they're worse than useless at anything involving medical imaging.
What is special about medical imaging that makes AI/LLMs specifically bad?
deaux 4 hours ago
> Can you share the reasons that you believe this?
Firstly, please keep in mind I'm talking about the entire doctor population of the world here. Not sure which particular bubble of this earth you have experience with, but note how half the world's population lives in India/China/Indonesia/Pakistan/Nigeria/Brazil/Bangladesh/Russia. Now I do believe that it holds the same for e.g. Europe and non-China East-Asia, but still.
How many patients has the world-wide average doctor seen? How long have they been a doctor?
How many have they seen with the particular condition the patient has?
How much time do they spend listening to and reasoning about a patient? The median in the world is likely under 3 minutes.
How many real-world incentives do human doctors have to deal with?
Given infinite time and resources, and zero external incentives, maybe the median human doctor would outperform the LLM at this task. But this is completely detached from the real world.
> What is special about medical imaging that makes AI/LLMs specifically bad?
LLMs: Besides lack of training data as mentioned elsewhere, they're simply not trained for high-fidelity image processing in general. It's not limited to medical imaging. It's a bit like the "How many Rs in strawberry" thing, but worse.
As for "AI" in general, medical image analysis is a very active field. These tend to be purpose-built though, not general-purpose. It seems likely at some point they'll become mainstream, but there's still a way to go.
riahi 20 hours ago
You can see it in just this PDF report.
It's multiple things. It never shows the subscapularis in the way that people actually look at the tendon. It hyperfixates on the axial when I find the sagittal much more useful for subscapularis.
Figure 7. There's an arrow pointing "to the acromial undersurface". The arrow is not pointed to that location.
Figure 5. "thin bursal fluid". This is within physiologic variation, but it is calling it bursitis.
It keeps bringing up irrelevant normal things like the shape of the coracoacromial arch, I assume because lots of websites have information about that as a patient-focused possible cause for rotator cuff impingement.
I am reminded of the recent Stanford MIRAGE study which found that LLMs will happily hallucinate answers about medical images if the medical images are omitted.
https://arxiv.org/html/2603.21687v2
yfontana 23 hours ago
Yeah, medical computer vision is a (fascinating) field with a lot of ongoing research. SOTA models are highly specialized, and are only getting good enough to be used by actual doctors and patients. Using a general purpose LLM to do this is similar to giving a credit card to Openclaw and telling it to make you rich through the stock market & cryptos.
Maro 19 hours ago
I don't have insider information, but: if one of the AI companies really wants their models to become really good at this and publicly available datasets are scarce, they can probably just buy anonymized X-ray/MRI scans paired with the human doctor's diagnosis, and train on them. I don't know what the legal story is around this, but AI companies have near infinite money, so I'm sure they can buy their way around regulations (e.g. by buying the scans from a less regulated country).
billynomates 1 day ago
Anecdotally, I've had Claude (Sonnet and Opus latest) consistently misread numbers from screenshots of my macro tracking app. Makes me skeptical of claims about its usefulness for anything requiring accurate image interpretation, let alone MRI analysis.
odiroot 23 hours ago
I can see how your thesis is valid.
Like OP, I also had a shoulder MRI, and asked two AIs for an opinion (awaiting a follow-up appointment to discuss the results).
They both insinuated a much more serious problem than it was (as judged by an orthopaedic doctor).
throwaway2037 21 hours ago
No trolling here: Do you feel threatened by the advance of AI/LLMs with respect to your field? I would. I am a computer programmer, and it absolutely feels threatening.
zoul 18 hours ago
As a programmer, I don’t feel threatened by the technology itself, but I do feel threatened by the second-degree effects such as what the technology does to our field, especially in the wrong hands.
make3 1 day ago
[deleted]
piterrro 1 day ago
It's funny to see the community here expect the human body to be treated like a deterministic function: for input X expect output Y - and that transfers to diagnosis - people expect to receive the same diagnosis from different specialists for the same issue.
Given the human body's complexity, the diagnosis is a compound output of experience, knowledge gained throughout a career, and diagnostic methods/equipment. The title (like Dr) is a certification imposed by the state so it's "safe" to let people practice since they passed "the bar" - but that doesn't imply everyone will treat the same way.
Some specialists update their knowledge monthly, some yearly and some don't do it at all; there are so many variables in play here (geo, politics, even weather haha).
Having said that, choosing the specialist is really important. By getting opinions about their practice and their speciality, you can only maximize your chance of getting the right diagnosis, but don't expect to get it right just because somebody is called a Dr.
charles_f 1 day ago
> It's funny to see the community here expect the human body to be treated like a deterministic function
In a community largely made of people whose job it is to produce such functions, I'd say it's to be expected
KingMob 1 day ago
It's funny (and a little depressing), because HN routinely assumes that their world view, and thus, their domain expertise, transfers.
There's no shortage of tech people convinced they deeply understand law, medicine, philosophy, etc. despite never having read much on the topics.
johnwalkr 18 hours ago
Most of my "favorited" comments on here are by software people with confident yet incorrect statements (usually by way of vastly underestimating complexity) about one of my domains of expertise.
I can't find it, but one of the greatest Show HN posts was a blog post by someone who was annoyed by his inconsistent shower temperature control. From memory, he spent a full weekend adjusting it, taking measurements, making graphs, and proposed "next steps" about prototyping better temperature control with microcontrollers and servos, and pontificated about developing a product, of course controlled by software. He skipped the part where a bit of research leads you to the already common "thermostatic mixing valve".
bpicolo 21 hours ago
The internet at large is full of armchair experts, it's not just a tech thing.
nozzlegear 17 hours ago
People in tech have particularly big armchairs though.
b800h 1 day ago
I'm not sure what your point is. Are you saying that medicine is inherently fallible and therefore AI is more likely to make a good diagnosis - particularly a cluster of specialist AIs?
mrlongroots 1 day ago
Yeah I think the OP is muddling the point by conflating "physician's version of the diagnosis" with "The Diagnosis".
There is absolutely one "The Diagnosis". The human body is a machine, albeit a very complex one, and all measurement sources have noise. But they are all measuring one reality, and if there is a problem, there should be one explanation that all measurements align with. They can be noisy but can never be conflicting (instrument error notwithstanding).
Physicians' ability to arrive at "The Diagnosis" would vary, but it does not mean one does not exist. I am not sure if characterizing the human body as deterministic or not is relevant here.
piterrro 1 day ago
I think „the diagnosis” is an oversimplification, and lots of professionals would disagree that there’s always a single one. As a patient, your goal is to eliminate the symptoms of whatever is going on in your system. Oftentimes there could be many reasons for it, and curing only one can already help you. The diagnosis is a tool to help choose the right treatment method.
Thus, chasing the „right” diagnosis (whatever that is?) is pointless, as only the outcome (reducing symptoms, stopping the damage) can tell you if the diagnosis was right, but not whether it was the only right one.
sxg 1 day ago
I'm a radiologist but can't really weigh in without seeing the full 3D MRI dataset. Regarding this point:
> They performed shockwave therapy on my shoulder even though a recent clinical practice guideline says clinicians should not use or recommend shockwave therapy for rotator-cuff tendinopathy without calcification; I was told during ultrasound that there was no calcification.
Ultrasound isn't a great way to assess for calcification. It'll find large calcification but easily miss small ones. Plain radiograph would be more helpful, but the MRI may have revealed it as well. Either way, shockwave therapy isn't harmful in the absence of calcification--it's just not helpful.
Edit: when a radiology report says something isn't present, there's always an implicit caveat that the finding isn't present within the context of the modality and images obtained. So an ultrasound report can state there are no calcifications while a plain radiograph can report the presence of calcifications without being inconsistent. Obviously very confusing to patients and people unfamiliar with medical jargon, but clarifying this in reports would make them sound even more qualified, "hedgey", and annoying to read than they already are.
ambicapter 1 day ago
> So an ultrasound report can state there are no calcifications while a plain radiograph can report the presence of calcifications without being inconsistent. Obviously very confusing to patients and people unfamiliar with medical jargon
This is being overly nice, I think. Anyone who doesn't understand this is an idiot imo. You would have to assume that every type of diagnosis instrument has infinite clarity and is always correct to be confused in this case.
Reminds me of the Babbage quote where somebody asked him, if I put the wrong question into this computing device, will it still give me the right answer? His response, paraphrased "I can not fathom the logic of the minds which would come up with such a question".
MattyMc 1 day ago
> Anyone who doesn't understand this is an idiot imo
I don’t think that’s true. Avoiding this mistake requires knowing that an ultrasound may not detect calcification. For a patient reading their own report, I don’t think that’s intuitive. I would expect most people to read “no calcifications” and assume that their joint has no calcifications.
Fr0styMatt88 1 day ago
Exactly. I was about to reply to the comment with “perfect example of not knowing what you don’t know” in terms of self-diagnosis.
My internal model is/was “if the scan wasn’t set up / can’t detect the thing, why would the statement be present at all?”.
That implicit assumption is really subtle.
nkrisc 1 day ago
Most people should have learned at a young age that absence of evidence is not evidence of absence. My 8 year old understands this. After all, you can rarely ever prove something does not exist, only that it is unlikely to exist.
If a report states that X was not found, it does not mean X did not exist, it means it was not found.
What may be lost on the layperson is the nuance and understanding of how thorough or not a particular scan is and how much weight to give the findings and thus the odds that the report is correct.
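To put rough numbers on "how much weight to give the findings", here is a minimal Bayes' rule sketch. The prior, sensitivity, and specificity values are made-up placeholders chosen only to show the shape of the effect; they are not figures for ultrasound, radiographs, or any real modality, and the function name is hypothetical.

```python
def p_present_given_negative(prior: float, sensitivity: float, specificity: float) -> float:
    """Probability the finding is really there even though the report says "not found".

    prior:        P(finding present) before the scan
    sensitivity:  P(scan reports it | present)
    specificity:  P(scan reports nothing | absent)
    """
    missed = (1.0 - sensitivity) * prior          # present, but the scan missed it
    true_negative = specificity * (1.0 - prior)   # absent, and the scan agrees
    return missed / (missed + true_negative)


# Hypothetical comparison: same prior, one low-sensitivity and one
# high-sensitivity modality, both reporting "not found".
low = p_present_given_negative(prior=0.2, sensitivity=0.5, specificity=0.95)
high = p_present_given_negative(prior=0.2, sensitivity=0.999, specificity=0.95)
print(f"low-sensitivity scan:  {low:.3f}")
print(f"high-sensitivity scan: {high:.5f}")
```

Under these assumed numbers, the same "not found" wording leaves a very different residual probability depending on the modality, which is the nuance a report's unconditional phrasing hides.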
eqmvii 1 day ago
It’s 2026 and my computer will happily give me the right answer even when i make typos. I love it.
tomlockwood 1 day ago
It's a fatal flaw to think counter-intuitive == wrong.
ambicapter 19 hours ago
Not really, it just requires to assume an ultrasound has infinite, perfect resolution when you are faced with a different imaging tech which reports things that didn't appear in the first one. That's just stupid.
Georgelemental 1 day ago
> You would have to assume that every type of diagnosis instrument has infinite clarity and is always correct to be confused in this case.
There's a difference between 99.9% clarity and 50% clarity. Even if neither exactly equals 100%, it's understandable that a layperson would expect different language between them
BrokenCogs 1 day ago
This comment sounds like it's written by someone who doesn't interact with real people very often
DrewADesign 1 day ago
I’ll bet they’ve got a debilitating case of engineer’s disease, too.
Paracompact 1 day ago
"On two occasions I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."
IanCal 1 day ago
Off topic but I have always felt this seemed like his misunderstanding rather than theirs. It’s an odd question, but it’s a very sensible point to make if Babbage has just told you this will solve the problem of mistakes in calculations - humans being involved at the start means human error still plagues the output.
areoform 1 day ago
To quote the LLM-ism, they were making a sharp point. It doesn't matter how precise the calculations are if you're calculating the wrong thing.
I suspect their sarcasm might have escaped Babbage who seems to have been on what we now call "the spectrum."
Fr0styMatt88 1 day ago
Actually, I would be really pleased if a member of Parliament asked that. That shows a level of deeper consideration.
Isn’t there a saying about there being no stupid questions, only stupid answers or something?
unknown 1 day ago
[deleted]
akoboldfrying 1 day ago
> Anyone who doesn't understand this is an idiot imo
I disagree. A priori it's not obvious to a layperson whether or not a statement that uses unconditional phrasing is intended to be authoritative or conditional on something unspecified, like the resolution of the measuring device. This goes for any sufficiently technical field.
If you got the brakes checked on your car, and the mechanic did <something> and told you there are no issues with them, and you then took your car to a different mechanic who did <something else> and told you there is a problem, you would not be an idiot for thinking that these conclusions contradict one another.
dd8601fn 18 hours ago
It’s funny that the answer to this has increasingly become “yes” over the last few decades.
DrewADesign 1 day ago
I don’t think people are idiots if they don’t understand how a normally intelligent person might not intuit that. I do think they have a seriously underdeveloped theory of mind.
BurningFrog 1 day ago
> Anyone who doesn't understand this is an idiot imo
Even if this is true, so what?
Idiots get sick at least as often as others, and the medical system needs to work as well as it can for that population too.
crypttales 1 day ago
[deleted]
rylando 1 day ago
As a rad tech, YOU TELL ‘EM DOC! I do like some uses of AI I’ve seen that help patients advocate for themselves or understand basic things like blood panel numbers, but it’s really bad at glazing people and leading them down medical rabbit holes kind of like the OP.
You would think that the AI would point out that calcium is best demonstrated on Radiographs/CT imaging vs Ultrasound or something to that effect.
garciasn 1 day ago
Semi-related: my father has complications from a motorcycle accident ~25y ago that crushed arteries in his leg coupled with diabetes (insulin / kept sugar at ~100 and his A1C was kept under 6.7 for ~15y). 6w ago had to have his toes removed due to dry gangrene; they eventually (2.5w ago) had to remove his leg below the knee because of the severe blood flow issues below the knee.
Between the toes and the below the knee amputation, there were no less than 15 different doctors and PAs / related personnel who COULD NOT COME TO A CONSENSUS. They would just tell my mother and I (PoA) the details; they refused to come up with a singular plan of action moving forward, leaving it up to us to make 'an informed decision,' something that's IMPOSSIBLE when you have to take up to 15 different opinions into consideration.
What exactly are we supposed to do as patients/family members when medical personnel cannot give reasonable paths forward and instead just throw a bunch of shit over the fence at you and tell you, "you decide what to do from here," regardless of how many VERY DIRECT conversations I had w/the 'care team' on doing better to provide a limited array of options and reasons/likelihood of 'positive outcomes'.
I'm used to dealing with a wide variety of stakeholders/SMEs in decision-making; it's my job to apply my extensive industry experience to present our clients with their options, ranked and reasoned. Doctors, in my experience and most recently with my father, clearly do NOT do that (I assume due to liability; but, no real idea, honestly). So; when dealing with LIFE CHANGING circumstances, what are we supposed to do except rely on what might be able to offer more analysis and option narrowing w/AI?
I certainly don't want to make the job of medical staff more difficult by putting out crazy theories I found on the interwebbernets through my own research, etc; but, when we're having to deal with uncertainty and insanity, what else can we do?
Fr0styMatt88 1 day ago
You see this in coding agents too. The only times so far I’ve really seen Opus tie itself into a knot is where I’ve asked it to fix something that I thought was broken but actually wasn’t in the way I had described. It will bias towards your description (I’m guessing because that’s the most recent context it has?).
mring33621 1 day ago
I'm sorry, but AIs only "know" about stuff that they have been trained on.
If we would allow AIs to be trained on the petabytes of medical data hidden in hospital systems, they would most likely be much better at diagnosing illnesses and conditions than the average doctor.
(Justifiable) Privacy around medical records so far prevents this.
You think you're cheering for humans, but in fact you are gatekeeping healthcare.
Eufrat 1 day ago
I feel like the promise of these models is to help people make more informed decisions. Improving the knowledge economy and general understanding.
The problem is these are just statistical models at the end of the day, so you need to know something to be able to identify the errors. You can’t let them really be autonomous and you also can’t really have people turn into glorified approvers. If the machine is correct 89% of the time, you cannot make people responsible for that 11%. It’ll just cause automation fatigue.
tl;dr: the actual use cases of these LLMs (or generative AI in general) are rather limited, so it is offensive how much hay has been given to them eating the entire capitalist system. They are not fit for purpose.
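A back-of-the-envelope sketch of why an 89%-correct machine is hard on a human approver. It takes the 89% figure from the comment at face value and assumes, purely for illustration, that errors are independent across items; the function names are hypothetical.

```python
def expected_errors(accuracy: float, n_items: int) -> float:
    # Average number of wrong outputs hiding in a batch of n_items.
    return (1.0 - accuracy) * n_items


def p_at_least_one_error(accuracy: float, n_items: int) -> float:
    # Chance that a batch contains at least one wrong output,
    # assuming each item is wrong independently of the others.
    return 1.0 - accuracy ** n_items


for n in (1, 10, 50):
    print(n, round(expected_errors(0.89, n), 2), round(p_at_least_one_error(0.89, n), 3))
```

Under these assumptions a reviewer approving batches of ten has to catch an error in most batches, while most individual items still look fine, which is the setup for the fatigue described above.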
2ap 1 day ago
Agreed. Not a radiologist, but I do a fair bit of MRI research. Experts vs lay people probably have different success with getting the right diagnosis out of a frontier model. Subtle changes in prompts can cause different diagnoses [1]
[1] https://www.nature.com/articles/s41591-026-04501-8
haldujai 1 day ago
Radiologist who does read shoulder MRI would like to add that over half the annotations are wrong, glaring mistakes in anatomy and cardinal direction which begs the question of how is it making these findings without knowing what it’s looking at (here’s a hint, it’s hallucinated based on reports it sees).
red75prime 1 day ago
What is "it"? Claude Opus 4.x? ChatGPT-5.x? GLM? DeepSeek? RadFM? Med-PaLM?
foobarian 1 day ago
Huh, I'm reading and looking up these words you guys are saying and it is starting to look exactly like the symptoms I have been having with my own right shoulder! I feel like a giant gaping rabbit hole just opened up next to my desk.
sxg 1 day ago
We're discussing calcific tendinitis (https://radiopaedia.org/articles/calcific-tendinitis?lang=us). If you think you have it, you can see a doctor and consider shoulder radiographs to start.
odiroot 23 hours ago
Can vouch for it. Ultrasound hasn't found calcification in my shoulder but MRI did. Exactly as you said, because it was very small.
tiahura 1 day ago
Why isn’t diagnostic ultrasound used in orthopedics? They inspect fetus hearts and other organs everyday, why not shoulders? Seems much cheaper and faster.
sxg 1 day ago
They do. Ultrasound in orthopedics is a relatively newer field, and there aren't quite as many sonography techs and radiologists experienced in reading these studies, which is likely why you don't see it offered more widely.
Edit: I should mention that ultrasound is basically unusable for evaluating bones. Sound waves can't penetrate bone, and so you end up just seeing a huge black void. That's a huge orthopedics use case that ultrasound just can't benefit. However, ultrasound is fantastic for evaluating muscles, ligaments, tendons, and other superficial soft tissues.
scrollop 1 day ago
We order ultrasounds all the time for shoulders (for, like, soft tissue issues; for trauma, you'd start with an xray). For other joints, such as the knee, MRIs are a better choice (unless there has been substantial trauma, in which case xray initially or further), though more expensive, unless you're excluding a Baker's cyst, in which case an ultrasound is fine.
Since MRIs are more expensive, private doctors might order them instead of ultrasounds.
(I'm a doctor)
trentor 1 day ago
Ultrasound was overlooked by US medicine as a first line imaging tool for a long time because it takes real skill and experience to do it right. But it's making a comeback. We've had Chinese, Indian, Australian, and American doctors visit us for one to two month stints to build up their skills.
Given the skill involved, it's probably a liability concern they don't want the exposure over there.
prdonahue 1 day ago
They're used quite a bit for nerve entrapment—both in diagnosing and treating.
bflesch 1 day ago
It's a manual, non-standardized process without a standardized output. Image quality depends both on user skills (how deeply they press the sensor on the skin) and the machine they have. Unlike CT/MRI the examination results cannot be easily shared and compared between patients for studies.
RA_Fisher 1 day ago
So Opus might be correct?
engeljohnb 1 day ago
> I'm a radiologist
Any comment that doesn't start with this or a similar qualification should be taken with a grain of salt (yes, including this one).
Medical imaging is one of those things everyone thinks is simple because they don't know what they don't know. I'm a cardiac sonographer, and I have to assume radiologists hear at least as many eye-rolling takes on AI coming for their job as I do.
lostlogin 1 day ago
Ahh, AI is coming for your job.
Full sarcasm, is there one that’s more immune?
backtoyoujim 1 day ago
Does radiology really make +$700,000.00 a year ?
Someone on reddit claiming to be a radiologist claimed that.
I wonder where the savings will go when those jobs are gone.
Eji1700 1 day ago
> Does radiology really make +$700,000.00 a year ?
The radiologist I know does not, but they are paid very well (and these numbers are always dumb when you're not sure if they're living in Manhattan vs literally anywhere in Kentucky)
Like most medicine, a large % of the job could be done by any decently talented person willing to follow instructions and shadow for a few months.
Like most medicine, the remaining % is what you're paying for, because it is literally life and death and you can't do things like "pull the logs" or "lets turn it off and take it apart" or "huh i need to put this down and come back later". Even in radiology, because "well lets just do it again to be sure" is often not a viable option.
While there is a problem in how we have inflated the cost of education for medical fields, the insane health insurance issues (US obviously, but it does have some effect globally when the expert radiologist you hire from the US to help with research costs that much), and probably some better ways to approach splitting the work for the entire field, like most professions dealing in life or death, medicine likely will always be paid well.
sarchertech 1 day ago
Physicians' salaries account for about 8% of healthcare costs in the US.
recursive 1 day ago
The savings go straight into patients' worse outcomes.
blanched 1 day ago
You know the radiologist you're responding to is a real person? Your last line seems needlessly callous.
the_real_cher 1 day ago
To the consumer! Haha just kidding. We all know where they'll go.
throwforfeds 1 day ago
I've seen a lot of friends and family members almost immediately get offered surgery for shoulder pain. It's just often the default for people that do surgeries for a living.
I also had a pretty painful shoulder issue at one point, where the pain just wasn't subsiding for months. I tried massages and acupuncture as I didn't want to do surgery, but it wasn't helping at all. The thing that fixed it for me was just really focusing on doing pull-ups. I couldn't do them at all when I started, so I began with dead hangs and scapular pull-ups, eventually progressing to regular pull-ups, and then training with a "grease-the-groove" method once I could get a few per set. I stopped the training schedule once I was getting in around 17 pull-ups per set, and now just do 6 sets of about 7-8 pullups 3x per week spaced throughout the day. I'll also do some shoulder mobility drills [1].
Whenever I get lazy about keeping up with them inevitably discomfort will start arising again, but it goes away once I get back to strengthening.
[1] https://www.youtube.com/watch?v=vP8YmmRMz6I
avgDev 16 hours ago
I have a story about this very issue.
I have hip impingement. I played sports and got a labral tear in one of my hips. My hip would get sore and painful after a lot of activity. I saw a top surgeon in the US. Right after we met, he looked at the MRI (yes, there was a small labral tear there) and said he could have me on an operating table in 2 weeks.
I was shocked, because the recovery absolutely sucks. So, I got 2nd and 3rd opinion.
3rd opinion was a doc with 20+ years of experience. Asked me if I plan on going pro in any sports, I said no, he said the surgery is not worth it. I did some PT and barely have issues with that hip.
Then, Obama admin created a website to see what gifts($) physicians accept. The 1st surgeon had accepted six figures+ from stryker. The older doc? 0.
There is no money in PT for a surgeon. I would tread lightly with popular and young surgeons.
dguest 23 hours ago
Personally I've always appreciated talking to nurses I know.
The respectable ones know they aren't doctors, but they've seen a lot more recoveries and cases where minimal intervention was required. As some people have said some surgeons like to cut people up.
ktosobcy 1 day ago
I had issues with my shoulder for years. Tried PT as well as pull/push-ups but doing that made the pain worse (if I wasn't doing any exercises involving the shoulder it was "fine")…
dripdry45 1 day ago
same here. I started doing yoga and rock climbing, and it stretched everything out, and strengthened all the muscles around it. I rarely have an issue now.
alistairSH 1 day ago
On the flip side, when I had rotator cuff issues, the surgeon recommended months of physiotherapy before resorting to the knife. And it worked. And by weight training regularly with a focus on correct shoulder movement, the pain stays away.
It really seems like if you, as a patient, go looking for a quick fix, that’s what you’ll be offered. And if you educate yourself a bit and then go for the best fix for you, you usually get that.
preg_match 1 day ago
Physical therapy is very often under-recommended in the US under the belief that insurance won’t cover it. They might. And, for anyone reading, you don’t even need a referral for the first 30 days in some states. Physical therapy is for more than just hip replacements and car accident trauma. Like regular therapy, a lot of “normal” people can benefit from it. It’s also not just stretching.
huhtenberg 1 day ago
What did you have exactly?
With calcifications, physio without the shockwave component definitely doesn't allow going back to the normal gym routine. It's just not enough.
blitzar 1 day ago
> the surgeon recommended months of physiotherapy before resorting to the knife
In my limited experience, "If all you have is a hammer, everything looks like a nail", rings particularly true with medical professionals.
linsomniac 1 day ago
~2 years ago I used ChatGPT "deep research" to investigate a chronic sinus infection I'd been fighting for ~3 years. After seeing 3 GPs and 3 visits with an ENT, I fed all the observations I had into the AI. In particular, I couldn't get the ENT to explain why he visually saw, via a scope, evidence of allergic reaction in my sinuses, but then later concluded, after an allergy test, that it couldn't be treated via allergy medication. I asked this question a few times and he just never answered.
ChatGPT surfaced an NIH study that concluded that 20% of people have allergic reactions that are isolated to one body location, which shoulder "skin prick" testing may not reveal. I asked him about that and he said "that's not how allergies work". Full stop. He was unwilling to even look at the study.
He prescribed a CPAP and regular nebulizer treatments. Side story: the CPAP place sent me an SMS message that I couldn't tell apart from a phishing attempt, and when I reached out to ask who they were they never replied.
So I decided: Let me just try taking a second-gen allergy tablet every day and see what happens.
My sinus infections have gone away. Previously I was getting a major sinus infection at least quarterly. Maybe he's right that allergies don't work that way, but allergy tablets have absolutely solved my problem. Which I'm thankful for because I tried a CPAP for a solid month a few years ago and I just could not get used to it, and was sleeping like crap.
braiamp 1 day ago
Ok, there's a lot to unpack here and you really had the deck stacked against you. First, let's go from the top: once a test says X, disproving X is really hard. And that's not unique to the medical profession; it's inherent to all humans. We suck at revisiting or revising our decisions, much less at looking at the possibility of even reversing them.
Which moves us to the next two issues: liability and time. Any time you ask someone to revise a decision, and especially with the stakes that the medical profession has, nobody has the time or the inclination to open themselves up to a mess.
Now, if you really want to be successful, you have to suggest the tests that the study uses before they even have a case with you, and especially before the diagnostic loop closes, since that has the best chance of looking at the right thing. Just be straight that you walked in with a theory. Doctors notice when they're being steered way faster than they notice when you're actually right. That's how you work with systems that have an overworked mass trying their best.
linsomniac 1 day ago
>before they even have a case with you
My problem is that I needed information from 2 ENT visits to feed into ChatGPT to get that study. On the first visit he scoped my sinuses and immediately said "I can see evidence of allergic reaction, see those white bumps?". On the second visit I got an allergy stick test and it came out negative.
Those helped lead to that NIH study. It would have been very hard to have walked in with that study in hand.
throwaway2037 20 hours ago
> Let me just try taking a second-gen allergy tablet every day and see what happens.
Stupid question: Why did you wait three years before trying this tactic?
linsomniac 18 hours ago
Not stupid. Because it wasn't on my radar that it was allergy related until the ENT mentioned allergies.
nostrebored 1 day ago
Daily allergy tablets are associated with huge increases in early onset Alzheimer’s. Glad you found something that works, but might be good to get some of the allergen injections :)
linsomniac 1 day ago
That seems to be only for first generation, drowsy-making, tablets. Second gen formulas don't cross over the blood/brain barrier.
https://www.myalzteam.com/resources/zyrtec-and-alzheimers-me...
There IS one year-old finding that suddenly stopping Zyrtec after daily 3-month use may lead to nasty itching, and if that happens you can re-start and then taper off. https://www.fda.gov/drugs/drug-safety-communications/fda-req...
cenamus 1 day ago
Where are you getting that from?
All I can find is about 1st gen antihistamines (i.e. Benadryl, which I doubt many people take daily, because of the drowsiness).
Even for those, evidence seems to be mixed at best. "Huge increases" seems like hyperbole.
fuomag9 1 day ago
Only first gen, 2nd gen does not have this issue anymore or it’s greatly reduced
meindnoch 1 day ago
Misinformation.
Only first-generation antihistamines with anticholinergic effects are associated with cognitive decline in elderly patients.
tnchr 1 day ago
I believe it depends on which ones, the older gen or certain classes of antihistamines
darkwater 1 day ago
Wait, what?? Now I'm getting into panic mode because I regularly take antihistamine tablets/pills (the newer ones, based on ebastine, because they don't make me feel sleepy)
rasmus1610 1 day ago
As a radiologist I have found Claude and ChatGPT to be absolutely terrible at MRI and I would not trust them one bit. They have their merits if you need to research stuff that is more text based, but radiological images are just something that they cannot interpret well enough (yet).
lostlogin 1 day ago
AI makes up for its poor reporting by enhancing the images.
Current Siemens MR software ‘Deep Resolve’ makes up the signal (adding about 50%), then makes up every second pixel, and then, for 3D sequences, makes up every second slice. It’s knocking about 59% of the time off each sequence. And it’s really really good.
I’m an MR tech.
rasmus1610 1 day ago
but those are two different things. Of course something like Deep Resolve is great, as are modern model based reconstruction algorithms for CTs, but here we are talking about LLMs and their ability to interpret medical images, which has nothing to do with what you said.
microgpt 1 day ago
Sorry? You use AI to hallucinate medical images and that's good?
throwawayffffas 1 day ago
Sure, but Claude and ChatGPT are not Siemens 'Deep Resolve'.
pickleRick243 1 day ago
It's like people who expect ChatGPT to be really good at chess because chess engines with super-human performance have been around for decades, so obviously the latest frontier LLM that took billions to train should find the task trivial.
Actually, I'm curious what ChatGPT 5.5's Elo is. I wouldn't be too surprised if it's 2000+ just from its basic understanding of chess principles from all the content it has digested.
simonreiff 1 day ago
ChatGPT is completely unplayable at chess on its own. It's unable to keep track of the state of the chess position and therefore will make an illegal move within about 10-12 moves. I would put GPT-5.5's rating at 400, since it can't even make legal moves reliably.
I've tried to play chess with GPT-5.5, even played it again tonight, allowing it to use `python-chess` to keep track of the state of the position and to get a list of legal moves at each turn, so that it was fair. I also gave it blindfold odds, again to make it a fair fight, but it was not even close. GPT still isn't better than maybe 1000 Elo, maybe 1200 tops. Even with what amounts to being able to see the position and also being unable to make an illegal move, GPT-5.5 hangs material left and right, doesn't make a plan, and got smoked even when I gave it blindfold odds, to the point it's boring for me to play even under those conditions. I'm not sure it's better than whatever the GPT model was that was out about 8 months ago. I also thought it might be somewhat better than a beginner due to reading chess books, but no, it's complete garbage at playing chess, not even average-level skill.
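For anyone curious what that kind of harness can look like, here is a minimal sketch, assuming the `python-chess` package is installed (`pip install chess`). The helper names `legal_moves_san` and `try_move` are made up for illustration, and this is not necessarily the setup used in the games above, just one way to hand a model the legal moves each turn and refuse anything else.

```python
import chess


def legal_moves_san(board: chess.Board) -> list[str]:
    """Return every legal move in the current position in SAN (e.g. 'Nf3')."""
    return [board.san(move) for move in board.legal_moves]


def try_move(board: chess.Board, san: str) -> bool:
    """Play `san` if it is legal; leave the board untouched otherwise."""
    try:
        # push_san parses first and raises ValueError (or a subclass)
        # on illegal, ambiguous, or unparseable input.
        board.push_san(san)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    board = chess.Board()  # standard starting position
    print(legal_moves_san(board))   # the list you would show the model
    print(try_move(board, "e4"))    # legal for White, so it is played
    print(try_move(board, "Ke2"))   # Black to move, not legal, so rejected
    print(board.fen())              # position state kept by the library
```

With this in place the model never has to remember the position itself; whether it then picks good moves from the list is a separate question.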
nicksergeant 1 day ago
Interestingly LLMs are extremely bad at chess position _images_. I have to imagine if you give it positions in text it'd be pretty great but when I was learning chess and pasting images of positions in for analysis I couldn't believe how wrong it was. I actually thought it was looking at the board in reverse but even when pointing out problems it seemed completely incapable of understanding what it was missing (of course... it doesn't really "understand" anything).
LLMs truly are marvels with text but anything spatial seems to really mess it up, somehow.
i4i 1 day ago
"A 2026 Finnish study published in JAMA Internal Medicine that used magnetic resonance imaging (MRI) scans to look at patients’ shoulders found that 99% of Finnish adults over 40 have at least one rotator cuff abnormality."
https://brainlenses.substack.com/p/abnormality
Incidental Rotator Cuff Abnormalities on Magnetic Resonance Imaging
https://jamanetwork.com/journals/jamainternalmedicine/fullar...
Volsk1 20 hours ago
I thought this was an interesting experiment and I repeated it with my own DICOM. Results are terrible.
Claude has the complete opposite diagnosis on my ACL, menisci and cartilage.
Claude:
Primary finding: Complete ACL tear with the classic pivot-shift bone bruise signature (posterior lateral femoral condyle + anterior lateral tibial plateau edema) and large hemarthrosis. PCL, MCL, LCL, menisci, and cartilage all intact.
Radiologist:
English translation of findings & conclusion:
Mild joint effusion. No Baker's cyst. Post-ACL reconstruction with minor cyst formation in both the femoral and tibial bone tunnels. The ACL graft shows heterogeneous signal but no complete or recurrent rupture. PCL and collateral ligaments intact. The lateral meniscus appears abnormal, likely from prior partial meniscectomy, with significant cartilage loss (partly Grade 4) at the posterior lateral compartment, osteophyte formation, and reactive bone marrow edema. The medial meniscus shows diffuse signal change from prior repair but no recurrent tear (specifically no recurrent bucket-handle tear). Mild chondropathy with focal cartilage loss on the lateral side of the medial femoral condyle. Cyclops lesion present. No definite loose bodies.
nostrebored 1 day ago
I don’t understand the negative reactions. Medical care as it exists requires the doctor and patient to have their brains switched on. I’ve almost never had a problem where a doctor provides me with a diagnosis and I go about my day. Most of the times that I have, I’ve been confident about the problem and known what I needed. The doctor was a barrier to accessing care.
Dr. GPT is a good brainstorming tool. It helps synthesize information in a way that primary texts don’t. But it does force you to say “that doesn’t make sense”.
I do think that people saying “doctors don’t know the state of the art” have a weaker case. If you think about it in terms of token density during pretraining and how post training datasets are constructed, I think it would take us a very long time to adapt to any fundamental shifts. If we have forgotten how to cure scurvy, how many journal articles would it take before we adapt to a discovery?
StefanBatory 1 day ago
> I do think that people saying “doctors don’t know the state of the art” have a weaker case.
This is kinda the case though. In Poland I have met only one psychiatrist who knew about DSM-5, and that was this year. DSM-5 has been a thing since 2013.
Doctors are people just like us; not every single one of them is good.
nostrebored 13 hours ago
Oh I agree with you, it's just that I don't think LLMs are either. If you think of LLM knowledge, especially in scientific/engineering fields, as a lossy representation of the density of ideas, then you'd expect to see some weird behavior. I'm sure there is some sort of a temporal discounting and people thinking about this, but a naive NLL or Reverse-KL on medical literature would engrain some weird, wrong ideas.
bonesss 1 day ago
Many DSM-5 diagnoses come into effect with the ICD-11; ICD-10 doesn't have a good deal of them, and that rollout is still fresh & ongoing.
It is kinda spooky, though, to have freshly minted doctors from a few years back whose school-knowledge will forever be "outdated and archaic" based on standards published before they were in school.
Some good advice I got: treat this as a generation shift, find younger and newer doctors who are familiar with the "modern" standards.
roryirvine 1 day ago
Why would you expect a Polish psychiatrist to understand the differences between different versions of a diagnostic manual used only in the US?
jeswin 1 day ago
I would not trust AI on images. But I once had ChatGPT tell me that an MRI report was very likely to be incorrect based on the text, and offered a different diagnosis. Since it was semi insisting, I visited another doctor who made me do a retest. Long story short, ChatGPT was correct.
Again, this is just one single person's experience. So not worth much.
nostrebored 1 day ago
I think that much of the visual gap is because what to attend to in images is less structured. Anecdotally, small Qwen finetunes (i.e. less than 10B) take task accuracy from sub-30% on FMs to 90%. We have sold some of these for outcome-based back office tasks.
I think we’ll see a lot of specialized VLMs that provide real value.
ferfumarma 21 hours ago
This sounds fascinating. Can you provide any detail regarding the nature of the diagnosis or problem it identified?
energy123 1 day ago
Anecdote but I gave Gemini Pro an image of an individual with Herpes Zoster which the doctor said was something else. Gemini gave the correct diagnosis which allowed for correct treatment and cure.
I don't understand why doctors don't prompt LLMs before saying wrong things. Is it ego?
I can understand for radiology because you need a specialized convolutional network, but for more knowledge based things...
alwa 1 day ago
“A man with a watch knows the time; a man with two watches is never sure.”
I imagine reasons for what you’re asking might include:
* Prompting an LLM is work, and they’re already overworked just doctoring—every conversation with a computer is a conversation you’re not having with a patient;
* They’re probably right more often than they’re wrong;
* “When you hear hooves, think horses, not zebras”: the 15th case today of strep throat is probably strep throat, regardless of today’s 15th falsely-confident LLM weighing-up;
* They tend to have spent many many years honing a clinical intuition that makes an examination, to some degree, hard to articulate fully to the LLM;
* Liability/overdiagnosis: All this stuff is probabilistic. Inevitably, there’s going to be a time when the LLM throws out something I thought unlikely that turns out to be right, and there will be other times when it’s wrong but now I have to document why. How many false leads do I need to chase per one true differential? Does this really compare favorably to seeking a second opinion from another human doctor?
* Not everything needs to make it into the record. Once it’s in the LLM, it’s discoverable and litigable and hackable and permanent;
* Medicine is practiced in very different ways in different contexts—even in this thread, one radiologist routinely orders ultrasounds for soft tissue shoulder problems, and the other medical-world person replying has never heard of such a thing—presumably both within US health care contexts. Some doctors hand out antibiotics like candy, others are more cautious with respect to resistance. What’s right can depend on the time, the place, the clinical setting—more than just the immediate patient-level facts at hand, in ways that become awkward or unwise to express explicitly.
And of course… who’s to say they don’t do LLM-assisted research, in cases where they think it might be helpful?
fc417fc802 1 day ago
> I don't understand why doctors don't prompt LLMs before saying wrong things. Is it ego?
Either that or laziness I'd imagine. This isn't limited to LLMs. Expert digital assistant systems that you query have existed for a long time. A good physician will double check anything even slightly unexpected against one.
senectus1 1 day ago
mate the other day chatGPT (enterprise) told me that the kernel 7.0.2 was older than 6.69
you can't trust these toys at all. that doesn't make them useless, just untrustworthy.
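For contrast, ordering version strings deterministically takes a few lines. This is a minimal sketch, assuming plain dotted numeric versions like the two above (no `-rc` suffixes or similar); `parse_version` and `is_older` are illustrative names, not part of any kernel tooling.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '7.0.2' into (7, 0, 2) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def is_older(a: str, b: str) -> bool:
    """True if version `a` sorts strictly before version `b`."""
    return parse_version(a) < parse_version(b)


if __name__ == "__main__":
    print(is_older("7.0.2", "6.69"))  # False
    print(is_older("6.69", "7.0.2"))  # True
```

Tuples compare element by element, so the major version (7 vs 6) settles it before the 69 is ever looked at.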
HPsquared 1 day ago
6.69 hasn't been released yet, to be fair.
ricardobayes 1 day ago
That might be doctors' new nightmare: people who second guess everything with AI. Previously it was "google your symptoms".
mettamage 1 day ago
Well I live in the nightmare that is the Dutch healthcare system [1]. There are many things that they will fix but they didn’t fix my sleep. A friend fixed my sleep. He is a doctor and prescribed me the right thing. The thing is, he shouldn’t have had to intervene. Without him I could have ended up poor and destitute as my sleep was wrecking me.
And yea, I already did all the standard things. CBT for insomnia helped somewhat. My insurance didn’t fully cover it either, unless I was willing to wait for 8 to 12 months.
And I recently met someone with slow moving metastatic cancer. Thanks to LLMs they will most likely live another 3 to 5 years extra since the Dutch conventional mainline treatment hasn’t been taken yet. But it is German doctors that helped them and Belgian doctors that pointed out in a second opinion that a lot more can be done.
LLMs have a part to play. The false positives are awful, but I have seen an average of 5 out of 10 care when things become too complicated.
Except for trauma treatment. The Dutch healthcare system is amazing once they diagnose classic PTSD.
So it’s definitely not all bad but the trust I had when I was younger has been eroded quite a bit and LLMs can meaningfully step in, in my case at least.
[1] I know there are worse systems. But from what I have heard there are clearly better systems nowadays. It has slipped a lot
simianwords 1 day ago
Hey what did you do to fix your sleep? Help us all and maybe an llm will index your diagnosis (hi ChatGPT)
js2 1 day ago
The NYT did this profile a while back: "Ben Riley was already writing about the risks of chatbots when his dad started trusting A.I. over his doctor."
The dad was a retired neuroscientist who delayed cancer treatment against medical advice because he was certain he had been misdiagnosed based on his own research that he did with the help of A.I.
https://www.nytimes.com/2026/04/13/well/ai-chatbots-cancer.h...
There's a comment on the article from Ben Riley:
> I am very grateful to Teddy Rosenbluth for sharing my father's story with the world, her kindness and curiousity proved to be restorative in ways I didn't anticipate.
> The two words that everyone used to describe my dad: "intelligent" and "kind," and he was indeed both of those things. The sad irony here is that it was his human intelligence, combined with these strange new tools that purport to be a form of 'artificial' intelligence, that led to his ill-advised decision to forego the treatment he needed for his CLL. A doctor has already commented on this story with the observation that AI "confidently asserts erroneous conclusions," and we simply have no idea how often this is happening or the magnitude of the harm that results.
> Not a day goes by that I don't feel the pang of my father's absence. He might still be here if not for AI. I try not to think about that, but sometimes I can't help myself.
rvnx 1 day ago
The context is very important: decades of a poorly-diagnosed chronic illness had left him deeply distrustful of the medical system.
This is the real root issue.
At 75 years old, he was stubborn. Is that reasonable ? Yes, perfectly. Could he have been right since the beginning ? Certainly. Did he deny evidence ? Yes.
Zero doubt that he was intelligent, everything points toward that direction, but that doesn't make a person less stubborn, because accepting the evidence, is also accepting that you were wrong if you initially postured yourself as adversarial instead of cooperative.
He would have read Wikipedia, scientific papers, etc, even without AI.
He did not want to be convinced. It works both ways:
https://www.foxnews.com/health/woman-says-chatgpt-saved-her-...
or
https://www.today.com/health/mom-chatgpt-diagnosis-pain-rcna...
Nonetheless, someone very smart, just didn't want to move from his position.
bensonperry 1 day ago
i mean, other smart people have famously delayed cancer treatment without needing poor guidance from LLMs! that's not at all new or unique to LLM chatbots
ieie3366 1 day ago
GPT-4o, which is what that article is most likely about, was an older low param count slop model which was known for abusing emojis and sycophancy. It does not really have any relevance to the latest Claude frontier models.
Your comment is akin to saying "Karen from facebook who is a human pushed essential oils and ivermectin as a cure to cancer. Now doctor Y is suggesting chemo. Both are humans, humans cannot be trusted!"
nosioptar 1 day ago
I asked a clanker about symptoms I was having. (I'm not an idiot, I was already on my way to hospital, clanker was just to take my mind off symptoms during the drive.)
The clanker said I'd be fine, I just needed some rest and OTC meds.
The medical staff immediately turfed me to surgery because the same set of symptoms I told the clanker were enough to concern them that I needed emergency surgery.
Had I listened to the clanker, I'd be dead because I did need emergency surgery. (Hell, I almost kicked the bucket because I waited for someone to wake up to give me a lift because my insurance probably doesn't cover an ambulance ride.)
throw310822 1 day ago
Very curious what made you run to the emergency first thing in the morning that an LLM understood as "just normal, take some OTC meds and wait".
w10-1 1 day ago
It's not just the second-guessing. It's the getting in the ballpark but striking out: explaining in detail why they are not correct. A little bit of patient knowledge requires a tremendous amount of doctor time to explain away the ignorance.
It's a 180 for me: While I believe doctors should explain diagnosis or treatment decisions when asked, I don't believe they should be taxed with explaining away alternatives. In my anecdotal 2nd- and 3rd-hand experience, doing that is taking at least a third of their time (on roughly 5% of the patients who think demanding answers will make things better) -- with zero improvement to diagnostic accuracy or treatment effectiveness. Doctors already consult with other doctors, and it makes no sense for them to have to consult with ignorant patients or treat their AI psychosis on top of their disease. It doesn't increase patient autonomy any more than adding a steering wheel for child car seats would help toddlers learn to drive.
mindslight 1 day ago
Explaining diagnosis and treatment recommendations decisions inherently involves explaining away the alternatives. In this world where patients are ultimately responsible for our own care, explaining your rationale is a straightforward part of the job - otherwise there is nothing for patients to base their decisions on apart from how the options make them feel. If visits haven't been allotted enough time to get the job done, then that is something you need to take up with health plan bureaucrats rather than taking it out on patients.
bilsbie 1 day ago
It’s funny, every profession deals with customers making their own guesses at diagnosis.
I told my mechanic the flim flam is broken but he said it was the rim ram. He fixed it and we all went on with our lives.
But doctors insist on this God-like status so it’s a “nightmare” when patients try to help themselves.
__MatrixMan__ 1 day ago
I dunno man, it's one thing to have your car still be broken because you were wrong, it's a different thing to poison yourself on the basis of having done your own research. The mechanic can laugh at you, it hits a doctor differently.
unknown 1 day ago
[deleted]
nicman23 1 day ago
you are literally taking sleeping pills ..
weatherlite 1 day ago
Nightmare because they're always right and the A.I second guessing is always wrong, or because they just don't like to be second guessed?
b800h 1 day ago
Well it was a nightmare for my mother's do-nothing GP surgery in the UK. She had several conditions which were being handled completely separately without central coordination, and her health was in serious decline. We went in with a list of 20 AI-generated questions based on her conditions and treatment (which I was able to screen as I have a bio postgrad, but not medical training), including those related to NICE guidelines and procedure, and, frankly the GP bricked it and ordered a load of new interventions. My mother started to get proper treatment.
I wouldn't trust AI to make a diagnosis, but I would absolutely trust it to notice where procedure hasn't been correctly followed, where a treatment is contraindicated because someone has missed a line on a health record, or where there's a clear potential alternate diagnosis which has been missed for spurious reasons. Also, unfortunately, where doctors aren't doing a decent job - often because they're overworked or underfunded.
tuvix 1 day ago
There’s more than two options here. It was already difficult to deal with self diagnosis for doctors, now we have a machine that outputs recommendations, and does it with confidence whether it’s correct or not.
The same issues that were present with search-engine self diagnosis are still present with LLMs. If you provide Google with an incomplete list of symptoms and can’t interpret the information you find correctly, you will likely get an incorrect diagnosis. The same is true for LLM output.
vimda 1 day ago
Nightmare because users approach LLMs with the false confidence that they're always right, and present LLM outputs as fact to Doctors who have to waste time explaining that it's wrong most of the time. It hurts more than it helps.
mixologic 1 day ago
It's a nightmare because it erodes trust. Doctors are not "always right", which is why "always get a second opinion" is codified in culture.
But AI's problem is that it's completely full of shit, sometimes, and the people most qualified to evaluate whether it's full of shit are the doctors, not the patients, but just like in OP's original article, patients are left feeling like their second opinion from AI might be more trustworthy than their doctor's opinion.
drw85 1 day ago
Nightmare because the AI is just generating a random text that fits the question.
gruntled-worker 1 day ago
This is obviously going to happen. But sub-par and sloppy doctors are a thing too. Medicine has been using semi-intelligent systems for years that were nevertheless found to improve outcomes.
We need studies that quantify error rates from each source type, then we need to account for the fact that the artificial type will keep improving.
ilovecake1984 1 day ago
Indeed. I don’t even get what OP thinks they are getting out of this other than doubt.
raincole 1 day ago
People should've googled their symptoms and especially the prescriptions they got. It has always been a good practice. If[0] AI proves to be the new google then people should ask AI too.
[0]: IF.
sarchertech 1 day ago
Do you know how many life threatening illnesses I’ve diagnosed myself with by googling symptoms?
consp 1 day ago
It can be helpful for understanding the choices made, by asking questions, and thus for reassurance, but it requires something most people lack: understanding that you are likely wrong, since you are just collecting information without understanding it.
Pretty much like most managers these days, so I understand the frustration of the GPs.
SeriousM 1 day ago
And say it's true because the AI said so.
gib444 1 day ago
It's so much worse than some Google results: people see LLMs as a trusted friend who never talks back and never questions you, who is excellent at convincingly communicating their bs, reeling you in with "tell me more so I can really lock this down", continuing to fool you
A con artist, a fraud
rvnx 1 day ago
No, this flow is actually very good.
Like in any domain, when you have questions or need a solution, you do research first, then you ask a specialist.
If you explain the symptoms and context well, you can get proper advice and then decide on the next path:
Case A) It looks benign and the advice / information that you collected seems reasonable, then you go your way.
Case B) You need the second opinion of a specialist because the subject is too complex, or there are medications that you need approval for.
Once you have challenged LLMs, and read about the topics over and over then you genuinely become really good at understanding it (especially if you triangulate over LLMs and ask them to challenge, you start to have genuine questions). No matter if the answer is right or wrong, you have elements. Maybe you missed the point, but you come prepared.
At home you have the time to assess the options, the pros and cons of each approach, the possible questions to ask, and then challenge the doctor.
Shared decision-making is an actual evidence-based model of care, and patients who arrive understanding their condition and carrying specific questions tend to get better attention and better outcomes.
Some doctors get annoyed, because they have big egos and choose to be patronizing, but it is exactly their job to answer such questions.
With LLMs, it's quite good, you get nuanced and rather useful answers.
Before LLMs, no matter the topic you searched for, the answer was the same: "you have cancer / an [obviously deadly] rare disease"
The other problem, in many places:
• The doctors are not affordable
• They are too busy for you (< 15 minutes)
• You may need to wait months to get an appointment
• They are not good (country-side is an example, and sometimes even country-level)
+ you can have all of these factors together.
So, you have something deeply bothering you, your only appointment is in 4 months. It would be insane not to take the time to explore different solutions and not to come informed about the topic.
If you express your prompt properly and do not rely on imagery, you can absolutely get top-tier advice.
neonstatic 1 day ago
Agreed. This gets worse in cultures in which doctors have no habit of educating the patient, or haven't been trained that it is part of the job. Whenever I am back in my birth country, I specifically avoid doctors that are older than mid 30s, because they all have the same terrible bedside manner. They might be good at diagnosing and treating, but they never, ever explain anything, even when asked. Some even have "helpful pamphlets" to hand to the patient - anything to avoid explaining. It seems that in their view their job is not helping the patient, but completing a task - running a scan, performing a procedure, administering medicine etc. The human who is the subject of the task is invisible.
deaux 1 day ago
Frustrating post. This gives rightful ammunition to the calls of "LLMs need to be avoided for anything medical". Even though the issue is that they're asking it to interpret images. They need to be avoided for that, but that doesn't say much about their medical accuracy outside of image interpretation.
It would already be a huge benefit to 90% of people worldwide if the very first part of most hospital visits would be outsourced to frontier-level LLMs. Yet this kind of misuse just gives the medical industry a stick to beat that idea into the ground.
Oh well, I'm sure there will be at least a few countries that will indeed embrace frontier models for initial diagnostic medical purposes. Maybe medical tourism destinations. But it's unfortunate for those who can't afford the trip.
rafterydj 1 day ago
I feel like I'm going nuts.
There are other commenters saying this is a good practice they've also done for other injuries. You are saying you are an actual radiologist and immediately clock the problems with its advice.
I have seen this pattern over and over again. Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading. It is only when you do not know the thing the AI is being asked to do that you are likely to find the output helpful.
This is itself alarming to me, but no one else seems to find this to be quite damning for the AI services being offered, preferring instead to be wowed by the convenience and speed at which they can be delivered unreviewed and unproven information.
DAdang1 天前
(We detached this subthread from https://news.ycombinator.com/item?id=48709121.)
APappplication1 天前
This is the root of AI psychosis. There’s a lot to unpack here, and I won’t go too deep because you can’t really have a discussion with affected folks because their fundamental basis is not evidence, it’s belief.
It is weirdly religious in a way, because if you were to present contrary evidence (e.g. experts in a field weighing in about how plausible sounding responses are bunk), you would only be told you don’t believe enough in the long term potential and capabilities.
Don’t get me wrong, I think we all agree capabilities will eventually improve (and farther-future capabilities could reasonably surpass experts), but it really is unclear whether the current transformer architectures, with their probabilistic/hallucinatory outputs, will plateau before they surpass current experts’ abilities in all promised fields.
CHcheschire1 天前
I was a very early adopter of AI in my circles and I shared it with many people. Strangely, I seem to be the most skeptical about AI in my circles as well, but because I was the gateway for many folks, they want to come back and share their experiences with me.
And it's so much like listening to someone in a church congregation sharing their experiences with god. Clear and obvious gaps are hand-waved away exactly how you're describing.
OPoperatingthetan1 天前
> This is the root of AI psychosis. There’s a lot to unpack here, and I won’t go too deep because you can’t really have a discussion with affected folks because their fundamental basis is not evidence, it’s belief.
Treating it as if it is an intelligence is the problem.
The problem is that AI psychosis is fundamentally the belief that an LLM is "thinking" at all. Outputs are just believable word vomit which resembles factual information.
HEhectdev16 小时前
I _believe_ the term "AI Psychosis" is a "thought-terminating cliché" that readily puts you in a position to disarm any criticism of your point of view, which, whether you're aware of it or not, is a belief in and of itself. I'm more willing to bet you can't have discussions because you're trying to have debates.
But on your actual point, I don't think AI needs to "surpass current experts' abilities in all promised fields" as a marker of its ability. The immediate gains have already shown some remarkable promise, and more LLMs should have had safeguards around mental health up front. If I were to put it on a scale, I would say it is net positive long term with a strong negative spike up front which was somewhat preventable. But who knows, maybe I just have "AI Psychosis" and you can easily dismiss me.
LAlazide1 天前
I don’t think they will improve, there is too much incentive to poison the datasets going forward.
A lot of the models up to this point have benefited - like Google did - from an essentially ‘pre-SEO’ internet.
Now the same tools are being used to generate nigh-infinite good-sounding bullshit, which poisons the dataset in all sorts of hard-to-detect ways.
To add insult to injury, the human experts are also not as naive, and have many incentives to poison their own input in subtle ways too.
SUsublinear1 天前
Human expertise is also improving all the time and not limited to just connecting dots. When AI seems to surpass a particular human, it's just because the human lacks broader knowledge and fails to investigate further.
An expert already knows they don't know everything. That was never the point. Critical thinking cannot be delegated to AI any more than it can be delegated to a book. There is nothing new going on here.
UNunknown1 天前
[deleted]
PEperching_aix1 天前
> There’s a lot to unpack here, and I won’t go too deep because you can’t really have a discussion with affected folks
Do you think it is any more possible to have a proper discussion with someone who preemptively paints the other person as mentally ill? Or someone who preemptively victimizes themselves?
Cause I don't think these are the hallmarks of an honest discussion. See also the entire past decade of political discourse.
Like, consider this:
> It is weirdly religious in a way, because if you were to present contrary evidence (e.g. experts in a field weighing in about how plausible sounding responses are bunk), you would only be told you don’t believe enough in the long term potential and capabilities.
A trivial counter to this is that you can just be an expert at something (e.g. your own work), use the damn thing yourself (professionally), and evaluate the outcomes for yourself. Then maybe remark "LLM good".
Now you come and remark "LLM bad", and point at random "evidence", either of outright other workloads, or even the one at hand: you're asking someone to reject the reality they've already experienced, entirely based on the assumption that they're "merely religious" or "in psychosis". You tell me if that's any more epistemically rigorous and sensible than their story.
TOTomasBM1 天前
Why is it psychosis and not lower standards?
While I can understand being skeptical of non-experts' claims that such answers are enough, I don't understand why you call it "psychosis" and not simply naivety or lack of expertise.
At the same time, the new so-called "models" haven't been pure transformer-based LLMs, but entire systems with tools (with access to the Internet), data storage, and the options to trigger additional instances for different tasks.
QNqnleigh1 天前
Totally agree. I'm a scientist, and like most scientists I have some specialized skills that most of my colleagues don't. AI has empowered them to learn and build things that they might have otherwise needed me for. But there have been quite a few cases where it led them very far down a wrong path. This has started happening way more often in the last few months.*
We've known since the beginning that AIs confidently say incorrect things. But now that they can speak confidently about very complex topics, and mostly say correct things, we are letting our guard down and lots of subtle falsehoods are slipping through.
*In one case, I was able to put things back on track because the AI suggested my colleague talk to me; somehow it figured out we were co-workers.
ASaspenmartin1 天前
Right, but hallucination rates have been consistently decreasing with every model iteration. It's about error rates. As a fellow scientist, I will also mess something up. Humans have an error rate. Once that error rate is low enough, it doesn't matter that it's > 0, it matters that it's low enough to be trustworthy and useful. Coding agents of 2024-25 had error rates too large; you couldn't meaningfully vibe code anything and needed a ton of oversight. It's still true but FAR less so, and this is after like a year of iteration.
BIbitlad1 天前
>very far down the wrong path.
Absolutely agree. Have seen this first hand
SXsxg1 天前
I see your argument, but it's not exactly news that an expert found a flaw in a popular tool. You could say the same about Wikipedia--experts have tons of issues with it, but Wikipedia still provides value to non-experts. The most likely alternative to Wikipedia for non-experts is simply not trying to learn anything new.
Similarly with LLMs, you can't just write them off entirely because they sometimes provide misleading or incorrect advice. The positive utility maximizing view is to learn when you need to call in an expert. I recently moved in to a new house and have used Claude extensively to figure out basic things (e.g., adjusting the garage door height, how to mount a TV). However, when the HVAC suddenly stopped working, I gave Claude a shot for an hour and tried some non-destructive fixes, but then realized I had to call in an HVAC expert.
OHohyes1 天前
The free alternative to Wikipedia is the library, not “don’t learn anything new ever”.
I find Claude is surprisingly similar to a confident but incorrect coworker, with the benefit that Claude will reevaluate when I correct it.
FRfrereubu1 天前
Slightly OT Nitpick: in regard to experts and Wikipedia, when doing a neuroscience-adjacent MSc, experts in the field actually directed me to Wikipedia as an excellent source for high-level neuroanatomy, including recent research, so I'm not sure your blanket description about experts and Wikipedia is correct.
APApplejinx1 天前
You 100% can write them off entirely and go about your business as you previously had done. Ignoring the errors, it is very debatable whether there are even productivity gains beyond: human programmer or whatever is excited and cranked up to unsustainable degrees of activity and thinking to 'keep up' with what he thinks is an AI doing the work.
I'm seeing this fairly often and when it isn't garbage it's a capable person who has gotten inspired by their 'collaboration' in which the busywork is being done by a machine, but they're doing so much directing and correcting that it's not unlike what would happen if they got heavy into meth and went on a tear.
You absolutely can write them off entirely and decide for yourself what level of human-killing speed-freakism you are comfortable pursuing in your productivity. There's a long history of humans managing astonishing levels of productivity through self-destructive means. This is not even cheaper, once the 'first one's free' wears off: it's just a novel method of getting humans to burn themselves harder in the belief that they have a magic feather.
The ones who're really throwing themselves into the situation are the ones who'll burn out, but who aren't setting themselves up for atrophy and learned helplessness. Anyone who believes the technology lets them be a lazy manager just getting paid, is in for an unpleasant discovery.
SBsbarre1 天前
> Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading
Yes, this is exactly so. AI is able to confidently sound plausible enough to convince laypersons or anyone who isn't very familiar with the subject matter, which is a big part of the mass-appeal "magic" of ChatGPT and other similar tools. It's like having a know-it-all friend (who also makes shit up to bridge their own knowledge gaps).
In many non-advanced non-specialized situations, AI is right enough to be at best useful or at worst not harmful (usually landing in the middle somewhere).
But speaking for myself, in areas where I consider myself quite proficient, I can very easily spot the subtle inconsistencies and naive conclusions that AI responses provide, and I have to guide/steer/correct it a lot to get good results when the subject matter is complex enough.
DAdavid-gpu1 天前
Last week I went to a highly-specialized tertiary clinic about further treatment for a rare medical condition that I was diagnosed and treated for as a child. The two very specialized doctors I met there confirmed a diagnostic mistake that a specialist had made ten years ago. The only reason I pursued a second opinion, ten years later, was because Google Gemini had explained to me that the specialist ten years ago had performed the wrong type of test for my condition.
Do these LLMs make mistakes? They sure do, I see it all the time. But they can also help people make breakthroughs.
And this isn't the only time that Gemini has helped me diagnose long-term health issues, either.
I am not advocating to trust anything they say blindly, but they can be a great place to form new hypotheses and learn the right terms to look for when you are unfamiliar with a subject.
WAwasabi9910111 天前
Can you elaborate on how you use Gemini to diagnose long term health issues? Considering doing the same for myself, but I have no idea what is too much vs too little information, and generally the type of prompt engineering to do.
MEmeowface1 天前
I may be missing something, but I think it's unclear that the parent poster here is necessarily actually contradicting anything the AI said. It may depend on the exact information the OP wrote to Claude and GPT. The full transcripts would be needed. (Though there is definitely a separate point that a doctor would generally better know all the right questions to ask, while current LLMs may be making certain assumptions.)
The LLM may have, from its "perspective", implicitly thought the OP was telling it that he had strong reason to believe there was no calcification and was not considering the bigger picture of possibly receiving an incomplete/poor assessment from the medical staff. In fact, the issue here may be the LLM overly trusting doctors vs. trusting its own expertise.
NLnlawalker1 天前
> no one else seems to find this to be quite damning for the AI services being offered, preferring instead to be wowed by the convenience and speed at which they can be delivered unreviewed and unproven information
"Be wowed by the convenience and speed", or merely "take advantage of the mere availability"? What most people find to be damning about expert advice is that they simply can't get it anywhere, at any cost that they can afford.
WHwhatever11 天前
So if you want to do a surgery but you don’t see any surgeons around you ask a grocery butcher to have his way?
HIhighfrequency1 天前
Seems natural enough. There will always be complexity and nuance that is missed by an AI model or person - the world is just super detailed. The more expertise you have the more you will be aware of that nuance. That doesn't mean the model or person is not useful as a starting point.
SCscosman1 天前
I dunno. I know a lot of software engineering experts. AI isn't always right, but neither are the people, and it's getting better and better.
Software is one domain where it excels because of structured training data and simulation environments, so I'm well aware it's better here than other areas.
Still, there's somewhere balanced between saying every time it's "insufficient or incomplete or outright misleading" and "just trust AI". AI's a useful source of information/reasoning/research, but know you need to validate its answers for important decisions.
AUAurornis1 天前
> I have seen this pattern over and over again. Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading. It is only when you do not know what the AI is being asked to do is it likely you will find the output helpful.
I always recommend people try asking LLMs a lot of questions on something they know first. Programmers should start by asking LLMs to work on a codebase they’re familiar with first.
You’re overstating the problem, though. Even for an expert the LLM will get a lot of things right and can be helpful under a watchful eye.
The real problem is knowing how to identify when it’s on the right track and when you need to correct it, because both cases are presented with the same tone and confidence.
An expert can better identify when the LLM output doesn’t sound plausible. Someone unfamiliar with the topic will think everything it says looks correct.
MAmattgreenrocks1 天前
You're not. This site was also bullish on using LLMs as therapists, which defeats the very point of them, and reflects a lack of knowledge on what exactly therapists do for people.
More on topic: if the article's author arrived at a definitively negative result would this have shown up on HN?
KRkryogen1c1 天前
On the flip side of this problem, novel best practices lag the medical standard of care, other human failures like corruption and competing priorities notwithstanding.
For example, we had to advocate for certain practices during the birth of our first child that became routine during our second several years later.
So, neither side is guaranteed correct, doctor or citizen researcher (which did not include LLMs in my case, for the record). The truest answer is also the most useless one, applicable to all fields: it depends.
The real question is: if you embrace being a layman, whom do you trust more: LLMs/the internet or experts, like doctors? I think the answer is pretty clearly experts.
RArapatel01 天前
You shouldn’t expect frontier models to work on medical imaging. There is much more that goes into building a medical imaging product. First and foremost is data. Medical imaging datasets are not prevalent on the public internet at the scale necessary to have good performance on medical imaging tasks, especially MRI. Also the labels are super noisy.
This is completely different than asking for general medical reasoning which is more derived from papers, public standards and textbooks.
Text exists at the right scale but images don’t.
JSjstummbillig1 天前
No, it is not true that anytime someone is an actual expert at anything, AI output appears insufficient. That is why experts in various fields use AI.
Then to say "Aha, but all of that is AI psychosis" makes obviously no sense: Why would we trust experts when they offer critique but not when they say "this is helpful"?
Overall: People are not insane. AI makes mistakes and, often, fails completely. AI also helps them do things better, quicker, increasingly so. The jaggedness of AI is confusing and real.
TOtorben-friis1 天前
How many times have you seen an expert go "yeah these results are good consistently enough for a non expert to trust them without expert assistance"?
There is a huge difference between having a chance of a good result, which can be useful for experts able to filter out the bullshit, and consistent success. I would generate code as a helper, I would never allow a guy from marketing to merge unreviewed AI code.
LAlazide1 天前
I’ve never seen an expert use AI in their field beyond the initial ‘oh interesting’ stage.
BAbaxtr1 天前
This is a serious issue for young people I think.
I have seen outputs that look good but the actual content is bad. If you’re inexperienced in a field you can’t see it because AI makes anything look right.
I have gotten very good results with AI but you can’t take the first answer at face value. You need to be suspicious and challenging until you tweak out the right answer over time.
JEje421 天前
The question is how far is AI off compared to the professional that we have access to.
The world's best experts are not accessible to most of us. :(
UNunknown1 天前
[deleted]
XIxivzgrev1 天前
Well that's part of the problem. AI is not accountable - if you take its advice and hurt yourself, who is responsible?
A real doctor is accountable.
They might both "know" a lot of things but implicitly the party who is accountable is going to be more trustworthy.
And I don't see that going away until AI companies must be licensed for application x and can lose their license / be sued if engaging in malpractice.
SEserf1 天前
>I have seen this pattern over and over again. Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading
media is awash at the moment with experts chiming in to support AI, saying their fields are being revolutionized, etc.
it seems unsurprising to me that the layman's opinion would follow the loudest media trumpets.
VOvoidUpdate21 小时前
How do LLMs get information from images? Do they have to run essentially the opposite of an image generation model, taking an image and converting it into a description? I'm just concerned that the description wouldn't be able to encapsulate the information needed to differentiate exactly what is wrong with a shoulder. The image -> text model would need to know what it should actually report back to the LLM about the image, so that it doesn't just say "this is an MRI of a shoulder" or similar. It would be like a layperson describing a bridge, and asking an engineer if the bridge is safe based on that description
MRMr-Frog20 小时前
older vision LLMs chopped up images into patches which were projected into the same embedding token space as words. Newer ones use an encoder to more efficiently project an image into token space. Then it runs through the same attention layers as the text component.
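For anyone who wants to see the patch idea concretely, here is a minimal sketch of the older approach described above, under loud assumptions: the image is a toy 8x8 grayscale grid, the projection weights are random rather than learned, and `patchify`, `project`, `PATCH` and `EMBED_DIM` are names made up for this illustration. Real models learn the projection and also add position information, which is left out here.

```python
import random


def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patches."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[top + r][left + c]
                            for r in range(patch) for c in range(patch)])
    return patches


def project(vectors, weights):
    """Linear projection: each patch vector becomes one embedding vector.

    weights holds one row of coefficients per output dimension.
    """
    return [[sum(x * w for x, w in zip(vec, row)) for row in weights]
            for vec in vectors]


rng = random.Random(0)
PATCH = 4        # hypothetical patch size
EMBED_DIM = 8    # hypothetical embedding width shared with text tokens

# Toy 8x8 "image" with pixel values in [0, 1].
image = [[(r * 8 + c) / 63 for c in range(8)] for r in range(8)]

# Random stand-in for the learned projection matrix.
weights = [[rng.gauss(0, 0.1) for _ in range(PATCH * PATCH)]
           for _ in range(EMBED_DIM)]

image_tokens = project(patchify(image, PATCH), weights)

# Random stand-ins for three word embeddings of the same width.
text_tokens = [[rng.gauss(0, 0.1) for _ in range(EMBED_DIM)] for _ in range(3)]

# Image patches and words end up in one sequence for the attention layers.
sequence = image_tokens + text_tokens
print(len(image_tokens), len(sequence), len(sequence[0]))  # 4 7 8
```

The point of the sketch is only that there is no text description in between: the pixels themselves become vectors of the same shape as word embeddings.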
WEweird-eye-issue21 小时前
No, it does not work like that. It can actually process the image itself; there is no intermediate image-to-text step.
VOvoidUpdate21 小时前
How does a Large Language Model process images then?
MImisja1111 天前
As someone who has had shoulder issues for the last 25 years or so, including partial tendon tears, I can tell you that even if your tendon had been damaged, the treatment would have been strange.
With moderately damaged tendons, you want to:
1. stop any inflammation, by taking NSAIDs for a few days
2. detect and correct any behavioral patterns that could have caused the presumed overwear of the tendon
3. start physiotherapy to strengthen those muscles that can take over the load from the damaged tendon
These are not quick fixes, because quick fixes don't exist here. Stuff like shockwave treatment, massages etc will only lessen the problems for a few hours at most, after which they will come back.
EQeqvinox1 天前
> My hope is that in a couple of model generations, we'll trust AI to review MRIs the way we trust it to proofread our emails.
https://www.nature.com/articles/d41586-026-01947-1
I've started asking my doctors whether they use AI, and if they say yes, I look for another one.
RMrmbyrro1 天前
That study seems to be ignoring confounding factors and rushing to a questionable conclusion.
A very plausible explanation for the adenoma detection rate to have gone down is simply that its prevalence went down among the population in the second three-month period.
This was not a randomized trial. Concluding that "AI usage degrades physicians' skills" is questionable at the very least.
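To make the confounding argument concrete, here is a toy calculation. The numbers are made up for illustration and are not taken from the study, and the model is deliberately simplified (it ignores false positives): the observed detection rate is treated as prevalence times the physician's sensitivity.

```python
def detection_rate(prevalence, sensitivity):
    """Expected fraction of procedures in which an adenoma is detected.

    prevalence:  fraction of patients who actually have an adenoma
    sensitivity: probability the physician finds it when it is present
    """
    return prevalence * sensitivity


# Hypothetical skill level, held constant across both periods.
SENSITIVITY = 0.80

# Hypothetical prevalence in two consecutive three-month periods.
period_1 = detection_rate(prevalence=0.35, sensitivity=SENSITIVITY)
period_2 = detection_rate(prevalence=0.28, sensitivity=SENSITIVITY)

print(f"{period_1:.3f} {period_2:.3f}")  # 0.280 0.224
```

In this sketch the detection rate drops even though skill never changed, which is why a before/after comparison without randomization cannot separate the two explanations.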
EQeqvinox1 天前
There's a whole bunch of other studies on this topic, as well as metastudies, and from what I can tell the problem is real.
https://www.sciencedirect.com/science/article/pii/S245195882... (+ cf. its references)
THthrowatdem123111 天前
I don’t even trust AI to proofread my emails.
DAdazhbog1 天前
You should always be getting a second or third opinion from real doctors for matters like surgeries, radiology, etc.
One doctor diagnosis + LLM is gonna throw you off. You need more datapoints.
CHChrisMarshallNY1 天前
In the US, this is standard advice. I note that the OP is in Germany. Maybe they do things differently, there.
AUAurornis1 天前
The OP describes getting injected with a homeopathic botanical formulation and receiving another type of therapy that wasn’t indicated for his condition.
I wonder if this person was going to a traditional doctor or if they were visiting some type of specialty clinic as a second opinion. For most conditions you can find specialty clinics that will prescribe and administer (and bill for) a lot of non-indicated treatments, but some patients like being in the care of doctors who take action and do things after being recommended more conservative treatments by primary doctors.
TStsss1 天前
In Germany we get zero-th opinion because you can't even get an appointment within the next 8 months.
In my day job we tried creating a credit assessor tool using LLM as the credit assessor. It did great, generated a report on the assessed business that was incredibly detailed and plausible. Then I started running tests and getting into the details, and found that if you ran the same report on the same data, it generated completely different, still very plausible, results. I could run the same source data through the assessment process 10 times and get 10 very different results. We had to can the project and go a different route. LLMs are designed to produce plausible results, not factual results. We can fix this when using them for software dev by using linters and tests (though we've all had the experience where the LLM invents an API endpoint). I would not trust raw LLM output in any situation where that kind of testing and verification capability isn't present.
What's crazy is that there are a ton of businesses building processes around LLMs that haven't done this exercise and fully believe the LLM is giving them accurate data.
> LLMs are designed to produce plausible results, not factual results.
They are true to their name: Language models. It is precisely the same problem in a language: a grammatically correct sentence is not necessarily true.
Yup. I use LLMs to write scripts that process the data for me; I don't ask the LLMs to process the data themselves. Even when I wrote something for my day trading, I had an LLM write scripts that do all the processing and predict price movement from that. The more the data is pre-processed, the more all the LLMs come up with similar trades.
It's funny that if the LLMs had all given the same result each time (it sounds like) you would have considered it more valid, even though it might just be giving a single wrong answer more consistently.
Linters and tests help of course, but they cannot "fix" the problem since tests cannot prove the absence of bugs.
You can set the "temperature", which is a lever on how stochastic the prediction is. If you are doing your own inference this is clear and easy. If you are consuming tokens this is outsourced.
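A minimal sketch of what that lever does, assuming a toy three-token vocabulary with made-up logits; `softmax_with_temperature` and `sample` are illustrative names, not any particular library's API. Dividing the logits by the temperature before the softmax sharpens the distribution when the temperature is low and flattens it when it is high.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def sample(logits, temperature, rng):
    """Pick a token index; temperature 0 is treated as greedy (argmax)."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
rng = random.Random(42)

for temperature in (0, 0.2, 1.0, 2.0):
    picks = [sample(logits, temperature, rng) for _ in range(1000)]
    # Share of runs that chose the top-scoring token.
    print(temperature, picks.count(0) / 1000)
```

Lowering the temperature makes repeated runs agree with each other more often; it does not make the agreed-upon answer any more correct.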
What happened to VERIFYING an answer? Does nobody do that anymore? When I ask an LLM, I trace the sources, and see if they make sense. More often than not the sources don't actually say anything about the topic in particular...
> It’s also scary how easily you can lead each LLM to the answer you have in mind.
Exactly. Which is why "treat an LLM like a human expert who can answer your question" doesn't work. It's more like a human bullshitter who makes up convincing looking answers, and tries to please you. If the answers have actually some grounding in the training material, that's useful as some kind of holistic google, but often it's not.
> What happened to VERIFYING an answer? Does nobody do that anymore?
The problem with medical advice is that you may not be competent to verify the answer, right? I agree that asking 5 LLMs to vote and trusting the answer is totally the wrong approach, of course. But LLMs (and traditional material) can help you get more informed. For instance, instead of going to your doctor with the LLM diagnosis and trying to convince the doctor that the LLM is right, you can try to build your own understanding of the problem and go ask the doctor to explain to you what you understood correctly and what you misunderstood. If you have some understanding, it's harder for a specialist to bullshit you. But you need your own critical thinking and you need to put effort into actually learning something; blindly trusting and repeating what LLMs say doesn't help.
I've also noticed the opposite problem: Sometimes the LLM, when asked a detailed question (probably with some lead-in), pushes back in a way that betrays that it fell back on general tropes without really considering the nuances of your specific context. This happens many times, and I usually have to lead the LLM through a chain of reasoning to prove to it that its objection, though generally sound, does not apply to my specific situation. Someone not as well versed in the subject matter would think the LLM found a smoking gun (which they love to do), and be led on a wild goose chase.
As you say, often you check up on the LLM's "reasoning" and it doesn't follow at all, or you can easily get it to contradict itself with just as much certainty as it had about its previous convictions. It is very scary to me that people are entrusting potentially life-altering decisions to these things.
> When I ask an LLM, I trace the sources, and see if they make sense.
Professional tip: you can cut out the LLM middleman here and save a lot of time and money.
My step mom was having debilitating pain. A year of going to doctors and no one was able to find a cause. I scanned her discharge paperwork, which had her prescriptions on it, and gave it to Claude. It identified a prescription that had that exact side effect. They later confronted her primary care doctor, who concurred and took her off it.
A friend of mine's wife recently passed. They were chasing a suspected heart defect for over a year. She had been intermittently fainting. At about the year mark they decided to scope her digestive tract. They found bleeding ulcers from cancer that was all over her body. I input her fainting symptoms into Claude and gastro impact was the number two suspect after heart issues.
I have a few other cases it's helped with. I'm not sure it could do worse than my own experience with the medical system. This is doubly true in places that lack any sort of medical care.
My mom had cancer and she was on regular, suppressive chemotherapy. I put her info into an AI and it correctly noted that her chemotherapy had stopped being effective 2 months prior based on factual lab reports. She was unaware of this. I was able to be her health advocate much more effectively by respectfully asking her oncologist targeted questions. He was already on top of it and was addressing the issue. Our conversation was respectful and, due to my educating myself, went up another level. Ultimately, it was a positive interaction. I was satisfied that he was indeed expert at his craft, and he was satisfied that we were aware of the uncertainty of the new treatment with a risk-based understanding of the viability of success. This was a positive engagement with an expert. In parallel situations around non-health issues, I've found the ego of the expert seems to be the determinative factor in whether or not the interaction goes well.
> It’s also scary how easily you can lead each LLM to the answer you have in mind.
Scary in this context of course, but I find that it is an interesting thought for coding: it suggests that maybe, a developer who knows what they are doing will end up leading the LLM to coding something that makes more sense than a developer who doesn't know and just vibe-codes blindly. Sounds pretty obvious, but I wanted to say it.
And all it takes is not blindly accepting the first thing it spews if you suspect there's a better answer (and are in a position to evaluate that better answer).
As someone who uses Claude Code to summarize published research, you have to ground it in peer-reviewed results or it gets lost. But also, I am grounded with two degrees in the source material. So I am feeding it my views and asking if the published work agrees or disagrees with my opinions, and I get fantastic results that way, to the point of knowing current clinical trials and treatment regimens better than most of the oncologists, which led to a great conversation with the clinical trials team. This doesn't replace people, but it augments existing expertise amazingly well.
But also, I hear so many tales of running out of tokens. I ask Claude Code to build a tool to perform a task. I review the tool and then I let it rip if I'm happy with it. As I understand things, most just ask Claude Code to do the task. That seems a bit fraught. Anyway, you have to impose constraints IMO and ask the right questions to get the answers you need or yes Claude Code (or any other LLM) will eventually just agree with you.
Yeah a lot of focus lately on making context windows enormous and putting everything in them. (It should know every detail of your life!) But in my experience LLMs are extremely "prime-able" and also tend to hyperfixate on details. So when asking difficult questions I tend to remove as much context as possible, rather than adding it. I don't want it to reflect my own ideas or biases back to me, I want an actually fresh perspective.
Yup. This works until it doesn’t (fairly soon thereafter), both from experimentation and understanding of theory. Here’s an illustrative example: https://old.reddit.com/r/Bard/comments/1l1qxk9/why_does_gemi...
LLMs are well suited to my (some would say annoyingly) curious nature. when i get an answer, my first instinct is to ask a ton of follow-ups and "what about"s. i've learned to tamp this down with fellow humans, but with LLMs it's great because most of the time the response is "you're right, something doesn't add up... let me try again". i think we eventually converge on something reasonably true
Have you ever let the LLMs “discuss” with each other to see if that would give better answers? You might end up with the answer from the most persuasive LLM, but you might also end up with better results. Wonder if there is a paper out there on this.
The problem is how do you know whether the answer is just the most persuasive or actually the most accurate one? It's hard to figure this out without domain knowledge.
With direct discussion, the same tendency to harmonize towards groupthink applies. Aside from the statelessness GP mentioned, one can insert anti-conciliatory intermediation. "I saw a random claim go by, but something about it seems not quite right. What am I missing? They said: [...]." Weaponizing the bias, and orchestrating the discourse from the harness.
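A minimal sketch of that kind of intermediation, assuming a hypothetical `ask(model, prompt)` callable that stands in for whatever client the harness uses (it is not a real library API). Each model first answers in its own stateless call; the harness then relays every answer to the other models as an anonymous, slightly suspicious claim, so the tendency to agree with the user pushes toward critique rather than consensus.

```python
from typing import Callable, Dict, List

# Hypothetical signature: ask(model_name, prompt) -> reply text.
Ask = Callable[[str, str], str]

# The skeptical wrapper quoted in the comment above.
SKEPTICAL_FRAME = (
    "I saw a random claim go by, but something about it seems not quite right. "
    "What am I missing? They said: {claim}"
)


def frame_as_doubtful_claim(claim: str) -> str:
    """Present another model's answer as an anonymous claim the user doubts."""
    return SKEPTICAL_FRAME.format(claim=claim.strip())


def cross_examine(question: str, models: List[str], ask: Ask) -> Dict[str, Dict[str, str]]:
    """Return critiques[author][reviewer] for every pair of distinct models."""
    # Round 1: every model answers the bare question in a fresh call.
    answers = {model: ask(model, question) for model in models}

    # Round 2: every other model critiques each answer, never seeing who wrote it.
    critiques: Dict[str, Dict[str, str]] = {}
    for author, claim in answers.items():
        prompt = frame_as_doubtful_claim(claim)
        critiques[author] = {
            reviewer: ask(reviewer, prompt)
            for reviewer in models
            if reviewer != author
        }
    return critiques
```

This only structures the discourse. It does not tell you which critique is correct, so the domain-knowledge problem raised earlier in the thread still applies.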
The problem with trying to write a paper is the results depend on RNG.
There are 3 kinds of mechanics: Scammers who do the lowest effort diagnostic and "fix" to get you to pay a smaller amount of money to fix the problem in the short term even though it'll re-present itself a week/month/year later. Upsellers who will find other things "wrong" with your car and pressure you into paying to fix them because they sound a lot worse than they are. Good mechanics that will explain what they did to diagnose the issue and recommend different options depending on what the issue is. Funnily enough, I've found that doctors tend to also fit into these 3 archetypes.
Yes, and that's a problem. Doctors (or experts in general) hate it when people don't trust them, but the thing with experts is that people have to trust them. And in my life (and a few times just in the last few years), enough doctors have been wrong enough that I cannot just trust them anymore [1]. If it is important, I will ask them to explain it to me, and sometimes I will just ask for a second opinion. I have read about doctors complaining that "with AI, patients now come with their own diagnosis and don't trust us when we say it's bullshit, and it is a problem". I can feel for them, but if they give the impression that they don't listen to the patients and the patients don't trust them, it's not only the patients' fault, I would say. [1]: I have more than one example like this among my relatives: A doctor says "wow that's bad go to the ER", the ER says "nope it's all good, go home", the first doctor learns about that and says "WTF you GO TO THE ER, call me and I will insult them on the phone", and it finally results in a surgery where the doctors say "they were lucky we could operate right now, because in a matter of hours they could have died from this". How in the world can I trust them after one event like this? Happened to me (in some variation) 3 times. Not based on an LLM diagnosis in the first place: based on a doctor's diagnosis.
Heh, I hear stories like that every day from my partner, who is an ICU nurse. Not as dire, but there are constant inter-department arguments about moving patients because of resource constraints, and the ICU could end up completely understaffed/resource constrained if the wrong NP or charge nurse is working. I'm amazed our healthcare system works at all, to be honest.
Maybe a difference here is asking AI for conclusions. When I have it do a buyer's report for me, I ask it for "what questions should I be asking? What are typical things that go wrong with this type of vehicle?" I don't delegate conclusions to the AI but use it to educate myself. Then, I can gather further information to make MY decision .. to buy it or not.
I don't think so. LLMs tend to over-index on providing results in general whether it's a conclusion or not. When you ask it "What are typical things that go wrong with this type of vehicle?" you're forcing it to make a conclusion about which results to include and it will almost certainly provide results even if those issues aren't as much of a concern compared to typical issues with other vehicles. For example, I just prompted Kimi-K2.6 with: > I'm considering buying a used base model 2010 Honda Civic with 80k miles that's been garage kept. What are typical things that go wrong with this type of vehicle? It listed 10 issues including the engine block cracking (which wasn't even an issue with 2010 Civics). Started a new chat and asked about a 2010 Toyota Camry, another unbelievably reliable car, and it listed 9 similar issues. Started a new chat and asked about a 2011 Jeep Grand Cherokee, a notoriously unreliable vehicle, and it listed the same number of issues. Sure it's data to make decisions on either way, but it really all comes down to how good your prompts are and whether or not you can think critically about the output, whether or not that output is a conclusion or just data collection.
The best mechanic I ever had kept my ‘98 Subaru going past 200k miles. Once during a repair I asked him to do an inspection and tell me if there was anything else I should replace. He told me not to do that, and that any mechanic would always find something, but not necessarily the next thing to break. He said it better using an expression I hadn’t heard before or since, something like “don’t go looking for goats when your herd is already with you.”
Exactly. Old parts of the system will keep working if you leave them undisturbed. Mechanics have very good intuitions about this sort of thing. I read once that there's proper engineering/physics theory about this too: a car as a machine is like a linear/smooth physical system with multiple weaknesses. Over a long period of running, many places might weaken, but it still evolves into a slightly different smooth system, until you introduce a replacement part, which causes an impedance mismatch or something like that.
Maintenance-induced failures are what it’s called with small aircraft. You’ll do something to prevent a failure (like, replace an old but functional alternator) but cause an oil leak or engine vibrations because you had to remove the propeller to complete the job.
There's a big difference between a _puzzle_ and a _mystery_. In a puzzle, the goal state is known, and as more pieces - data - appear, the goal gets closer. You know how far you are from the goal. A mystery is worse. With each additional piece of data, the goal gets farther away. Everything is more and more confusing. (Popularized by Malcolm Gladwell)
Maybe I am missing something but I just find this wrong. Everything is a puzzle: there is one "Truth" or one diagnosis. You (a smart human) should be able to converge on it by cross-examining your LLMs. By themselves, they have no interest in revealing this, no stakes, which makes them tools only useful at the hands of a capable investigator.
> You (a smart human) should be able to converge on it by cross-examining your LLMs. What makes you think this is fundamentally different from cross-examining ELIZA? There is no guarantee that the LLM will help you converge on anything. Indeed actually calling out an LLM on BS tends to eventually produce an "I don't know and can't help you further" answer (as it should).
The problem is that the diagnosis might not be known for a while. There are a few conditions and diseases that require an autopsy for a guaranteed diagnosis and are therefore diagnosed based on symptoms in clinical settings.
> The solution to uncertain information isn't more information, which the AI can certainly provide, it's better information, and AI cannot currently provide that. I'd argue that AI _can_ currently provide that, but that it can't do it _reliably_, and that to non-experts it's impossible to differentiate, which makes it all the more dangerous.
Isn't that the case with human "experts"? If you've had encounters with doctors, mechanics, etc., you'll know you can get a completely different diagnosis for the same problem, which obviously means (in most cases) that the person you thought was an expert is wrong. What is needed are studies that take a cold look at the actual results, because AI seems to be required to be perfect or it is useless. It just needs to be as good as a human for most stuff, but in the long run it will be much better. At least that's what extrapolating current reality shows us.
We have systems around humans that exist to manage expertise gaps, credibility signals, and accountability. This is part of what makes humans as good as they are, along with specialized training and some measure of meritocratic selection. We license and regulate and account and litigate to make a system that responds and improves. Some of this might be applicable to LLMs, but some isn’t and much of it would be resisted. This is one reason we’re not likely to get “as good as a human” because at some level we’re not optimizing for the outcomes; we’re optimizing for speed, convenience, some participant’s economics, and underlying beliefs.
The soothing sound of ChatGPT telling us how right and clever we are…how could it possibly hallucinate, certainly not 5.5
You’ve really honed in on the key issue. This is exactly how keen Hacker News commenters approach this.
To provide a competing point of anecdata: A Gemini diagnosis saved me $3,000 in unnecessary repairs on my Civic.
YouTube has saved me at least that much in appliance repairs... and it doesn't even have an AI. It's amazing how valuable access to information can be.
I would love to hear more about this
Saved me $2000 on a koi pond pump and filtration system
I tried AI diagnosis for my 15-year-old Ford C-Max too. However, the issue with a diagnostic problem is that unless you've got the ground truth, there's simply no way to verify any tool or human against a metric that you can compare and use to decide on future tasks. The AI might be very good at diagnosing all the minor issues without that leading to a successful repair, whereas human mechanics are extremely good at giving, for 80% of major issues, a diagnosis that's not the ground truth but will lead to a successful repair (one that might not address the root cause but simply patches it). So it comes down to managing expectations and outcomes.
These tools can’t reliably fix a 4px misalignment on my icon, better ask them about a medical report… but honestly, I would do the same.
Tbh LLMs pulling data out of medical documents that are in their training set and searchable online is likely a much easier task than fixing some weird CSS alignment issue.
Also most of them can’t actually see what they’re doing. It’s hard for me to get things pixel perfect while blindfolded, too.
You nerd sniped me with the story about your used car. What happened in the end? I really want to know! There are some fun YouTube channels that basically do the same. Someone who is an expert auto mechanic takes a used car to various repair garages and asks them to recommend a course of action.
Sounds like a fun watch! What is the name of the channel?
> I got 3 completely unrelated recommendations, including one that I knew was invalid! I felt worse off than when I started! I almost had a very similar experience with my beater Lexus. It took 2 independent shops and 3 dealers to finally figure out what was causing the ABS to go off randomly at low speeds. Turns out there's some obscure Toyota-specific tool from the late '90s that picked up a proprietary diagnostic code, and the third dealer was the only one that still had that particular piece of equipment. ...and of course, the thing that's broken has been out of production for 20 years and remanufactured ones cost more than the car is worth. I ended up just unplugging the ABS control module. Point being: once I knew what was wrong, all the seemingly contradictory information from the other 4 shops suddenly fit together. It's just such a weird thing to go wrong that no reasonable tech would ever have considered it.
> I got 3 completely unrelated recommendations, including one that I knew was invalid! I felt worse off than when I started! I would frame it differently: you now know which shops are not to be trusted. So, next time you need one, you will make a better decision.
There are few things better in this world than having a car shop you can trust. I found one and pray that management doesn't change.
Especially in the medical field where the placebo effect / mindset shapes outcomes.
> The solution to uncertain information isn't more information, which the AI can certainly provide, it's better information, and AI cannot currently provide that. Aside from the LLM-ism (it isn't foo, it's bar) - this is a thought-terminating cliché. You definitionally don't know if some information is better or not, given that you were uncertain about the information in the first place. "I went to three mechanics and got three different answers" - your takeaway is just "Ah - I clearly need better informed mechanics." Which is on its face absurd, because if you could clearly judge the ability of the mechanics you wouldn't need their evaluation. You'd just do the evaluation yourself.
> it's better information, and AI cannot currently provide that It sometimes can. If it flat-out never could, no one would use it. People use it, lots of them.
> There's something incredibly peaceful about being in the hands of an expert you trust This is the primary business model of enterprise IT and is why companies pay so much for 4 hour disk replacement.
You only got 3 opinions on your car? Why not 50? You could have found a more useful signal by getting more information. I get it - getting an opinion from a mechanic is time consuming. Not true of AI though.
A few years ago (before the AI craze), I was misdiagnosed with tuberculosis. I had a chronic cough, and an outsourced radiologist at a clinic found signs of tuberculosis. The findings were sent to the city's tuberculosis hospital, as required by the country's law. The doctors there took the radiologist's conclusion at face value and required me to stay at their hospital for at least 8 months under a strict, prison-like regime. There was no option to say no, because I was considered some kind of biohazard, and by law I had to comply. Before I was admitted, I quickly found another radiologist, who diagnosed pneumonia instead. I sent his report to the chief doctor at the tuberculosis hospital, and after some deliberation they concluded that the original reading was wrong. Turns out the doctors there can't read scans at all and just believe whatever a radiologist says... The funny thing is, they had already officially put me on the tuberculosis register and didn't want to admit they had made a mistake. So instead, they simply gave me another paper saying that I had been cured of tuberculosis by them... in 7 days. I'm probably the only person in the country to defeat tuberculosis in a week :) So if you don't trust the radiologist/doctor, maybe find another doctor if you can afford it? You can compare their conclusions and see if they match. Two unrelated doctors or radiologists saying the same thing is probably about as close to the truth as you're going to get. I'm not sure though whether I should trust AI or humans more. AI can hallucinate, but I've been misdiagnosed by humans so many times too...
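The "two unrelated doctors" point can be made concrete with Bayes' rule. A minimal sketch, assuming the readers are independent given the true condition and that each has the same sensitivity and specificity; the numbers in the usage note are invented for illustration and are not real radiology statistics.

```python
def posterior_after_agreement(
    prior: float,
    sensitivity: float,
    specificity: float,
    n_positive_reads: int,
) -> float:
    """P(disease | n independent readers all report a positive finding).

    Assumes the readers err independently given the true condition.
    """
    # Probability that all n readers say "positive" in each world.
    p_reads_if_disease = sensitivity ** n_positive_reads
    p_reads_if_healthy = (1.0 - specificity) ** n_positive_reads

    numerator = prior * p_reads_if_disease
    evidence = numerator + (1.0 - prior) * p_reads_if_healthy
    return numerator / evidence
```

With a hypothetical 1% prior and readers that are each 90% sensitive and 90% specific, one positive read gives a posterior of about 8%, while two independent positive reads give 45%. The independence assumption carries all the weight: hospital doctors who simply accept the first radiologist's conclusion add no second read at all.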
How is it possible? You can't diagnose tuberculosis just based on imaging and tuberculosis hospital has to know that!
Yeah, I know! It was strange. They gave me a test, and it came back negative, but they insisted it was negative because I had "latent tuberculosis," which supposedly wasn't detectable by the test yet but was about to become active. I forgot to mention that, besides getting a second opinion from another radiologist, I also took a more modern test at another private clinic. That test has better detection rates than the one the state clinic used, and it came back negative too. I have suspicions they had some kind of government quota to keep the hospital staffed with patients in order to receive funding. Or they were just completely incompetent. I pushed back by bringing them another radiologist's report and the results of a better test that I paid for myself, so I guess they decided to back down.
You'll find doctors always believe and treat the worst diagnosis any professional has put on a case. That's a legal thing, not a skill issue. Think about the consequences of mistakes in both directions ...
Well, you can't diagnose pyelonephritis without a urine culture either, which my GP kindly noted after I had already taken a full 14-day course of antibiotics. The ER I was at before tried to, anyway.
Not only that, what is the point confining someone to prevent the spread of a disease about a quarter of the world is already infected with? I suppose there could be reasons, but I don't know them.
Some countries and jurisdictions still have laws that allow for the involuntary confinement of tuberculosis patients, I guess dating back to the times when tuberculosis was rampant in those countries? And most professionals seem to be okay with the policy: https://theunion.org/news/is-involuntary-incarceration-of-tb... >17% said that, as a matter of principle, the involuntary incarceration of TB patients was inappropriate on any grounds. >Regionally, members from Europe Region had the highest percentage of respondents objecting to the policy as a matter of principle (26.2%) while the North America Region had the lowest (3%). The emergence of multi-drug resistant tuberculosis in the 1990s is probably one of the reasons: >Respondents most strongly supported the policy of incarceration for patients known to have multidrug-resistant TB (49.7%)
Because it’s a nasty disease, and they’d like to prevent its spread. A quarter of the world may have TB, but there are only like 10,000 cases in the US every year
Incentives.
Yea I find a lot of stories on the web about doctors misdiagnosing things to have oddities like this that don't seem to make sense. It often seems like the author is leaving something out. Not saying OP is lying, but tb is a very, very weird conclusion to come to from just one radiology report...
See my answer in this same subthread. I was perplexed myself as to why I was diagnosed based on just one radiology report. But the moral of my story is that you can always try to obtain a second opinion from another doctor. I'm not saying doctors shouldn't be trusted in general.
I had a similar experience. My son had pneumonia and was still feeling pain after 10 days of antibiotics. I took an X-ray to three different doctors, and only one got the right diagnosis (pleural effusion). We really should have a central place with top-notch professionals looking at these, instead of leaving each doctor to figure it out by themselves.
I once worked on a medical hackathon concept for computer-assisted population screening for cervical cancer in a developing nation. Community health workers take photos. The AI would look at the images and make a call of "clearly negative" vs "clearly positive" vs "needs (scarce) expert review". But taking good photos is hard, so there's also "photos insufficient" and "worker needs additional mentorship on taking photos". Only if computers reduce all three costs - expert workload, exam success, and quality-control/training - might successful deployment be financially and logistically plausible for that nation.
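The routing described above could be sketched roughly as follows. This is an illustration only, not the hackathon's actual design: the classifier score `p_positive`, the `photo_quality` score, and every threshold are hypothetical placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Route(Enum):
    CLEARLY_NEGATIVE = "clearly negative"
    CLEARLY_POSITIVE = "clearly positive"
    EXPERT_REVIEW = "needs expert review"
    RETAKE_PHOTOS = "photos insufficient"


@dataclass(frozen=True)
class Thresholds:
    # All values are made-up placeholders, not clinically validated.
    min_quality: float = 0.6
    negative_below: float = 0.05
    positive_above: float = 0.95


def triage(p_positive: float, photo_quality: float, t: Thresholds = Thresholds()) -> Route:
    """Route one exam: auto-decide only when the photos are good and the model is confident."""
    if photo_quality < t.min_quality:
        return Route.RETAKE_PHOTOS
    if p_positive <= t.negative_below:
        return Route.CLEARLY_NEGATIVE
    if p_positive >= t.positive_above:
        return Route.CLEARLY_POSITIVE
    # Everything in between consumes scarce expert time.
    return Route.EXPERT_REVIEW


def needs_mentorship(routes: Sequence[Route], max_retake_rate: float = 0.3) -> bool:
    """Flag a health worker whose exams are too often rejected for photo quality."""
    if not routes:
        return False
    retakes = sum(1 for r in routes if r is Route.RETAKE_PHOTOS)
    return retakes / len(routes) > max_retake_rate
```

The width of the band between `negative_below` and `positive_above` is what trades expert workload against the risk of an automated miss.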
What country / municipality are you in? This is not my understanding of Tuberculosis...
A second opinion is a smart move if one has doubts about their diagnosis. Doctors make mistakes, and even though I've worked with countless great doctors, I've never worked a job where there wasn't at least one who was undiscerning, or downright lazy and negligent. It's hard to tell people to trust their doctor when I know there are plenty of doctors out there like this. But AI as of right now is worse than any bad doctor I've ever worked with.
The healthcare affordability crisis is only going to exacerbate the trend of using AI as a replacement for a real doctor. I went to urgent care a few months ago to get tested for COVID and two other flu strains and it came out to almost $500. Anecdotally, several people in my life who embrace less traditional (and sometimes more conspiratorial) views on modern healthcare tend to be the ones that can't afford it. A confident-sounding chatbot to answer questions day and night about what's going on with your body is very seductive in a world where access to real healthcare is getting further and further out of reach.
> I went to urgent care a few months ago to get tested for COVID and two other flu strains and it came out to almost $500. They have at-home COVID+Flu tests at my local CVS for $35, why go to an urgent care?
That's the balance I'm finding it very hard to strike when talking to my family about doctors. Everyone is either an "all doctors are scams" QAnon type, or they blindly trust everything their doctor says, no matter how fishy, in fear of coming off as one of the former group. And, to use a phrase we all hate by now, you're absolutely right. When most people have to go into debt to even see a doctor, what can people possibly conclude from that besides "all doctors are out to scam you?"
> AI can hallucinate, but I've been misdiagnosed by humans so many times too... I've heard this experience from quite a few folks before, but this is my first time hearing about a mandatory 8 months quarantine as a consequence... damn
Asking for a friend, who is in a somewhat similar predicament — it wasn’t Portugal, was it?
Your TB stories made me recall my (fond) TB stories. I came from a country that requires tuberculosis vaccines as a school-entry requirement. I have the vaccine and antibodies. Then moved to a country that didn't have this requirement but I had to be tested to make sure I didn't have TB for things like camps, college, etc. The test is something like a vaccine that injects only dead TB cells, if the injection site welts up from anti-bodies then you had to get another whole panel of tests (like X-ray of lungs). Thankfully I've never had it but TB is apparently no joke. Though blasting my chest with radiation is no healthier :p Anyway, thanks for sharing!
Radiologist. I don’t read MR shoulder exams in my day to day practice, but from the few pictures shown , I can’t conclusively disagree with the original report. These models are generally terrible at reading medical images. The amount of public training data on the internet compared to the number of scans a radiologist reads in training is minuscule. There’s obviously a ton of medical images in general but very few, and even fewer along with a report are available on the internet publicly for download. There are vision language models coming out of research labs that are excellent in describing and localizing findings. Still at the level of a 1st or 2nd year radiology resident, but as we all say - this is the worst the models will ever be.
Absolutely. It's very unfortunate that this post used the worst example possible of using LLMs for medical purposes. General-purpose LLMs are _fantastic_ at medical diagnosis that does not involve imaging. I am completely convinced that given enough information and time, frontier models already outperform >90% of doctors on initial diagnosis of internal issues and on suggesting medical tests to further reject or confirm the most likely theories. To the point where I'm eagerly waiting for the first hospital in the world that's willing to be open and honest about using them for that first step, and then proceeding from there. I'll be on a flight there as soon as one arrives. At the same time, they're worse than useless at anything involving medical imaging. Asking them to interpret images is worse than trying to interpret them yourself as a layman. And you surely wouldn't interpret them yourself.
> General-purpose LLMs are _fantastic_ at medical diagnosis that does not involve imaging. Can you share the reasons that you believe this? > At the same time, they're worse than useless at anything involving medical imaging. What is special about medical imaging that makes AI/LLMs specifically bad?
> Can you share the reasons that you believe this? Firstly, please keep in mind I'm talking about the entire doctor population of the world here. Not sure which particular bubble of this earth you have experience with, but note how half the world's population lives in India/China/Indonesia/Pakistan/Nigeria/Brazil/Bangladesh/Russia. Now I do believe that it holds the same for e.g. Europe and non-China East-Asia, but still. How many patients has the world-wide average doctor seen? How long have they been a doctor? How many have they seen with the particular condition the patient has? How much time do they spend listening to and reasoning about a patient? The median in the world is likely under 3 minutes. How many real-world incentives do human doctors have to deal with? Given infinite time and resources, and zero external incentives, maybe the median human doctor would outperform the LLM at this task. But this is completely detached from the real world. > What is special about medical imaging that makes AI/LLMs specifically bad? LLMs: Besides lack of training data as mentioned elsewhere, they're simply not trained for high-fidelity image processing in general. It's not limited to medical imaging. It's a bit like the "How many Rs in strawberry" thing, but worse. As for "AI" in general, medical image analysis is a very active field. These tend to be purpose-built though, not general-purpose. It seems likely at some point they'll become mainstream, but there's still a way to go.
You can see it in just this PDF report. It's multiple things. It never shows the subscapularis in the way that people actually look at the tendon. It hyperfixates on the axial view when I find the sagittal much more useful for the subscapularis. Figure 7: there's an arrow labeled as pointing "to the acromial undersurface". The arrow is not pointed at that location. Figure 5: "thin bursal fluid". This is within physiologic variation, but it is calling it bursitis. It keeps bringing up irrelevant normal things like the shape of the coracoacromial arch, I assume because lots of websites have information about that as a patient-focused possible cause for rotator cuff impingement. I am reminded of the recent Stanford MIRAGE study which found that LLMs will happily hallucinate answers about medical images if the medical images are omitted. https://arxiv.org/html/2603.21687v2
Yeah, medical computer vision is a (fascinating) field with a lot of ongoing research. SOTA models are highly specialized, and are only getting good enough to be used by actual doctors and patients. Using a general purpose LLM to do this is similar to giving a credit card to Openclaw and telling it to make you rich through the stock market & cryptos.
I don't have insider information, but: if one of the AI companies really wants their models to become really good at this and publicly available datasets are scarce, they can probably just buy anonymized X-ray/MRI scans paired with the human doctor's diagnosis, and train on them. I don't know what the legal story is around this, but AI companies have near infinite money, so I'm sure they can buy their way around regulations (eg. by buying them from a less regulated country).
Anecdotally, I've had Claude (Sonnet and Opus latest) consistently misread numbers from screenshots of my macro tracking app. Makes me skeptical of claims about its usefulness for anything requiring accurate image interpretation, let alone MRI analysis.
I can see how your thesis is valid. Like OP, I also had a shoulder MRI, and asked two AIs for an opinion (while awaiting a follow-up appointment to discuss the results). They both suggested a much more serious problem than it turned out to be (as judged by an orthopaedic doctor).
No trolling here: Do you feel threatened by the advance of AI/LLMs with respect to your field? I would. I am a computer programmer, and it absolutely feels threatening.
As a programmer, I don’t feel threatened by the technology itself, but I do feel threatened by the second-degree effects such as what the technology does to our field, especially in the wrong hands.
It's funny to see that the community here expects the human body to be treated like a deterministic function: for input X expect output Y - and that this transfers to diagnosis - people expect to receive the same diagnosis from different specialists for the same issue. Given the human body's complexity, the diagnosis is a compound output of experience, knowledge gained throughout a career, and diagnostic methods/equipment. The title (like Dr) is a certification imposed by the state so it's "safe" to let people practice since they passed "the bar" - but that doesn't imply everyone will treat the same way. Some specialists update their knowledge monthly, some yearly and some don't do it at all; there are so many variables in play here (geo, politics, even weather haha). Having said that, choosing the specialist is really important. By getting opinions about their practice and their speciality you can only maximize your chance of getting the right diagnosis, but don't expect to get it right just because somebody is called a Dr.
> It's funny to see that the community here expects the human body to be treated like a deterministic function In a community largely made of people whose job it is to produce such functions, I'd say it's to be expected
It's funny (and a little depressing), because HN routinely assumes that their world view, and thus, their domain expertise, transfers. There's no shortage of tech people convinced they deeply understand law, medicine, philosophy, etc. despite never having read much on the topics.
Most of my "favorited" comments on here are by software people with confident yet incorrect statements (usually by way of vastly underestimating complexity) about one of my domains of expertise. I can't find it, but one of the greatest Show HN posts was a blog post by someone who was annoyed by his inconsistent shower temperature control. From memory, he spent a full weekend adjusting it, taking measurements, making graphs, and proposed "next steps" about prototyping better temperature control with microcontrollers and servos, and pontificated about developing a product, of course controlled by software. He skipped the part where a bit of research leads you to the already common "thermostatic mixing valve".
The internet at large is full of armchair experts, it's not just a tech thing.
People in tech have particularly big armchairs though.
I'm not sure what your point is. Are you saying that medicine is inherently fallible and therefore AI is more likely to make a good diagnosis - particularly a cluster of specialist AIs?
Yeah I think the OP is muddling the point by conflating "physician's version of the diagnosis" with "The Diagnosis". There is absolutely one "The Diagnosis". Human body is a machine, albeit a very complex one, and all measurement sources have noise. But they are all measuring one reality, and if there is a problem, there should be one explanation that all measurements align with. They can be noisy but can never be conflicting (instrument error notwithstanding). Physicians' ability to arrive at "The Diagnosis" would vary, but it does not mean one does not exist. I am not sure if characterizing human body as derministic or not is relevant here.
I think "the diagnosis" is an oversimplification, and lots of professionals would disagree that there's always a single one. As a patient, your goal is to eliminate the symptoms of whatever is going on in your system. Oftentimes there can be many causes, and treating only one of them may already help. The diagnosis is a tool for choosing the right treatment. Thus, chasing the "right" diagnosis (whatever that is?) is pointless, as only the outcome (reducing symptoms, stopping the damage) can tell you whether the diagnosis was right, and even then not that it was the only right one.
I'm a radiologist but can't really weigh in without seeing the full 3D MRI dataset. Regarding this point:
> They performed shockwave therapy on my shoulder even though a recent clinical practice guideline says clinicians should not use or recommend shockwave therapy for rotator-cuff tendinopathy without calcification; I was told during ultrasound that there was no calcification.
Ultrasound isn't a great way to assess for calcification. It'll find large calcifications but easily miss small ones. A plain radiograph would be more helpful, but the MRI may have revealed it as well. Either way, shockwave therapy isn't harmful in the absence of calcification; it's just not helpful.
Edit: when a radiology report says something isn't present, there's always an implicit caveat that the finding isn't present within the context of the modality and images obtained. So an ultrasound report can state there are no calcifications while a plain radiograph can report the presence of calcifications without being inconsistent. Obviously very confusing to patients and people unfamiliar with medical jargon, but clarifying this in reports would make them sound even more qualified, "hedgey", and annoying to read than they already are.
> So an ultrasound report can state there are no calcifications while a plain radiograph can report the presence of calcifications without being inconsistent. Obviously very confusing to patients and people unfamiliar with medical jargon
This is being overly nice, I think. Anyone who doesn't understand this is an idiot imo. You would have to assume that every type of diagnosis instrument has infinite clarity and is always correct to be confused in this case. Reminds me of the Babbage quote where somebody asked him, if I put the wrong question into this computing device, will it still give me the right answer? His response, paraphrased: "I can not fathom the logic of the minds which would come up with such a question".
> Anyone who doesn't understand this is an idiot imo
I don’t think that’s true. Avoiding this mistake requires knowing that an ultrasound may not detect calcification. For a patient reading their own report, I don’t think that’s intuitive. I would expect most people to read “no calcifications” and assume that their joint has no calcifications.
Exactly. I was about to reply to the comment with “perfect example of not knowing what you don’t know” in terms of self-diagnosis. My internal model is/was “if the scan wasn’t set up / can’t detect the thing, why would the statement be present at all?”. That implicit assumption is really subtle.
Most people should have learned at a young age that absence of evidence is not evidence of absence. My 8 year old understands this. After all, you can rarely ever prove something does not exist, only that it is unlikely to exist. If a report states that X was not found, it does not mean X did not exist, it means it was not found. What may be lost on the layperson is the nuance and understanding of how thorough or not a particular scan is and how much weight to give the findings and thus the odds that the report is correct.
It’s 2026 and my computer will happily give me the right answer even when I make typos. I love it.
It's a fatal flaw to think counter-intuitive == wrong.
Not really, it just requires assuming an ultrasound has infinite, perfect resolution when you are faced with a different imaging tech that reports things that didn't appear in the first one. That's just stupid.
> You would have to assume that every type of diagnosis instrument has infinite clarity and is always correct to be confused in this case.
There's a difference between 99.9% clarity and 50% clarity. Even if neither exactly equals 100%, it's understandable that a layperson would expect different language between them.
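To put rough numbers on the "99.9% vs 50% clarity" point, here is a minimal sketch using Bayes' rule of how much a "no calcifications" report should actually reassure you. The 20% prior and the perfect specificity are made-up illustrative assumptions, not clinical figures, and `p_present_given_negative` is a hypothetical helper name.

```python
def p_present_given_negative(prior: float, sensitivity: float,
                             specificity: float = 1.0) -> float:
    """Probability the finding is present despite a negative report (Bayes' rule)."""
    missed = (1.0 - sensitivity) * prior          # present, but the scan missed it
    true_negative = specificity * (1.0 - prior)   # absent, and the scan said so
    return missed / (missed + true_negative)


# Assumed, illustrative numbers only: 20% prior chance of calcification.
for sensitivity in (0.999, 0.5):
    p = p_present_given_negative(prior=0.2, sensitivity=sensitivity)
    print(f"sensitivity={sensitivity}: P(present | negative report) = {p:.4f}")
```

Under these assumptions, the same sentence in a report leaves well under a 0.1% chance of a missed finding for the high-sensitivity modality and about an 11% chance for the 50% one, which is the gap a layperson has no way to read off the wording.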
This comment sounds like it's written by someone who doesn't interact with real people very often
I’ll bet they’ve got a debilitating case of engineer’s disease, too.
"On two occasions I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."
Off topic but I have always felt this seemed like his misunderstanding rather than theirs. It’s an odd question, but it’s a very sensible point to make if Babbage has just told you this will solve the problem of mistakes in calculations - humans being involved at the start means human error still plagues the output.
To quote the LLM-ism, they were making a sharp point. It doesn't matter how precise the calculations are if you're calculating the wrong thing. I suspect their sarcasm might have escaped Babbage who seems to have been on what we now call "the spectrum."
Actually, I would be really pleased if a member of Parliament asked that. That shows a level of deeper consideration. Isn’t there a saying about there being no stupid questions, only stupid answers or something?
> Anyone who doesn't understand this is an idiot imo
I disagree. A priori it's not obvious to a layperson whether a statement that uses unconditional phrasing is intended to be authoritative or conditional on something unspecified, like the resolution of the measuring device. This goes for any sufficiently technical field. If you got the brakes checked on your car, and the mechanic did <something> and told you there are no issues with them, and you then took your car to a different mechanic who did <something else> and told you there is a problem, you would not be an idiot for thinking that these conclusions contradict one another.
It’s funny that the answer to this has increasingly become “yes” over the last few decades.
I don’t think people are idiots if they don’t understand how a normally intelligent person might not intuit that. I do think they have a seriously underdeveloped theory of mind.
> Anyone who doesn't understand this is an idiot imo
Even if this is true, so what? Idiots get sick at least as often as others, and the medical system needs to work as well as it can for that population too.
As a rad tech, YOU TELL ‘EM DOC! I do like some uses of AI I’ve seen that help patients advocate for themselves or understand basic things like blood panel numbers, but it’s really bad at glazing people and leading them down medical rabbit holes kind of like the OP. You would think that the AI would point out that calcium is best demonstrated on Radiographs/CT imaging vs Ultrasound or something to that effect.
Semi-related: my father has complications from a motorcycle accident ~25y ago that crushed arteries in his leg coupled with diabetes (insulin / kept sugar at ~100 and his A1C was kept under 6.7 for ~15y). 6w ago had to have his toes removed due to dry gangrene; they eventually (2.5w ago) had to remove his leg below the knee because of the severe blood flow issues below the knee. Between the toes and the below the knee amputation, there were no less than 15 different doctors and PAs / related personnel who COULD NOT COME TO A CONSENSUS. They would just tell my mother and I (PoA) the details; they refused to come up with a singular plan of action moving forward, leaving it up to us to make 'an informed decision,' something that's IMPOSSIBLE when you have to take up to 15 different opinions into consideration. What exactly are we supposed to do as patients/family members when medical personnel cannot give reasonable paths forward and instead just throw a bunch of shit over the fence at you and tell you, "you decide what to do from here," regardless of how many VERY DIRECT conversations I had w/the 'care team' on doing better to provide a limited array of options and reasons/likelihood of 'positive outcomes'. I'm used to dealing with a wide variety of stakeholders/SMEs in decision-making; it's my job to apply my extensive industry experience to present our clients with their options, ranked and reasoned. Doctors, in my experience and most recently with my father, clearly do NOT do that (I assume due to liability; but, no real idea, honestly). So; when dealing with LIFE CHANGING circumstances, what are we supposed to do except rely on what might be able to offer more analysis and option narrowing w/AI? I certainly don't want to make the job of medical staff more difficult by putting out crazy theories I found on the interwebbernets through my own research, etc; but, when we're having to deal with uncertainty and insanity, what else can we do?
You see this in coding agents too. The only times so far I’ve really seen Opus tie itself into a knot are where I’ve asked it to fix something that I thought was broken but actually wasn’t, at least not in the way I had described. It will bias towards your description (I’m guessing because that’s the most recent context it has?).
i'm sorry, but AIs only "know" about stuff that they have been trained on. If we would allow AIs to be trained on the petabytes of medical data hidden in hospital systems, they would most likely be much better at diagnosing illnesses and conditions than the average doctor. (Justifiable) Privacy around medical records so far prevents this. You think you're cheering for humans, but in fact you are gatekeeping healthcare.
I feel like the promise of these models is to help people make more informed decisions. Improving the knowledge economy and general understanding. The problem is these are just statistical models at the end of the day, so you need to know something to be able to identify the errors. You can’t let them really be autonomous and you also can’t really have people turn into glorified approvers. If the machine is correct 89% of the time, you cannot make people responsible for that 11%. It’ll just cause automation fatigue. tl;dr: the actual use cases of these LLM (or generative AI in general) is rather limited, so it is offensive how much hay has been given to them eating the entire capitalist system. They are not fit for purpose.
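A back-of-the-envelope sketch of why an "89% correct" machine wears approvers down. It assumes each item is right or wrong independently with the same accuracy, which is a simplification (real errors tend to cluster), and the function names are hypothetical.

```python
def p_all_correct(accuracy: float, n_items: int) -> float:
    """Chance that every one of n independent items is correct."""
    return accuracy ** n_items


def expected_errors(accuracy: float, n_items: int) -> float:
    """Average number of wrong items a reviewer has to catch in a batch."""
    return (1.0 - accuracy) * n_items


# 0.89 is the figure from the comment above; the batch sizes are arbitrary.
for n in (1, 10, 50):
    print(f"n={n}: P(all correct)={p_all_correct(0.89, n):.3f}, "
          f"expected errors={expected_errors(0.89, n):.1f}")
```

With these assumptions, a batch of 10 items is entirely correct less than a third of the time, so the approver is expected to find something wrong in most batches while still seeing mostly correct output.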
Agreed. Not a radiologist, but I do a fair bit of MRI research. Experts and laypeople probably have different success getting the right diagnosis out of a frontier model. Subtle changes in prompts can cause different diagnoses [1].
[1] https://www.nature.com/articles/s41591-026-04501-8
Radiologist who does read shoulder MRI would like to add that over half the annotations are wrong, glaring mistakes in anatomy and cardinal direction which begs the question of how is it making these findings without knowing what it’s looking at (here’s a hint, it’s hallucinated based on reports it sees).
What is "it"? Claude Opus 4.x? ChatGPT-5.x? GLM? DeepSeek? RadFM? Med-PaLM?
Huh, I'm reading and looking up these words you guys are saying and it is starting to look exactly like the symptoms I have been having with my own right shoulder! I feel like a giant gaping rabbit hole just opened up next to my desk.
We're discussing calcific tendinitis (https://radiopaedia.org/articles/calcific-tendinitis?lang=us). If you think you have it, you can see a doctor and consider shoulder radiographs to start.
Can vouch for it. Ultrasound hasn't found calcification in my shoulder but MRI did. Exactly as you said, because it was very small.
Why isn’t diagnostic ultrasound used in orthopedics? They inspect fetus hearts and other organs everyday, why not shoulders? Seems much cheaper and faster.
They do. Ultrasound in orthopedics is a relatively newer field, and there aren't quite as many sonography techs and radiologists experienced in reading these studies, which is likely why you don't see it offered more widely. Edit: I should mention that ultrasound is basically unusable for evaluating bones. Sound waves can't penetrate bone, and so you end up just seeing a huge black void. That's a huge orthopedics use case that ultrasound just can't benefit. However, ultrasound is fantastic for evaluating muscles, ligaments, tendons, and other superficial soft tissues.
We order ultrasounds all the time for shoulders (for soft tissue issues; for trauma, you'd start with an xray). For other joints, such as the knee, MRIs are a better choice (unless there has been substantial trauma, in which case xray initially or further), though more expensive, unless you're excluding a Baker's cyst, in which case an ultrasound is fine. Since MRIs are more expensive, private doctors might order them instead of ultrasounds. (I'm a doctor)
Ultrasound was overlooked by US medicine as a first line imaging tool for a long time because it takes real skill and experience to do it right. But it's making a comeback. We've had Chinese, Indian, Australian, and American doctors visit us for one to two month stints to build up their skills. Given the skill involved, it's probably a liability concern they don't want the exposure over there.
They're used quite a bit for nerve entrapment—both in diagnosing and treating.
It's a manual, non-standardized process without a standardized output. Image quality depends both on user skills (how deeply they press the sensor on the skin) and the machine they have. Unlike CT/MRI the examination results cannot be easily shared and compared between patients for studies.
So Opus might be correct?
> I'm a radiologist
Any comment that doesn't start with this or a similar qualification should be taken with a grain of salt (yes, including this one). Medical imaging is one of those things everyone thinks is simple because they don't know what they don't know. I'm a cardiac sonographer, and I have to assume radiologists hear at least as many eye-rolling takes on AI coming for their job as I do.
Ahh, AI is coming for your job. Full sarcasm: is there one that’s more immune?
Does radiology really make +$700,000.00 a year ? Someone on reddit claiming to be a radiologist claimed that. I wonder where the savings will go when those jobs are gone.
> Does radiology really make +$700,000.00 a year ?
The radiologist I know does not, but they are paid very well (and these numbers are always dumb when you're not sure if they're living in Manhattan vs literally anywhere in Kentucky).
Like most medicine, a large % of the job could be done by any decently talented person willing to follow instructions and shadow for a few months. Like most medicine, the remaining % is what you're paying for, because it is literally life and death and you can't do things like "pull the logs" or "let's turn it off and take it apart" or "huh, I need to put this down and come back later". Even in radiology, because "well, let's just do it again to be sure" is often not a viable option.
While there are problems with how we have inflated the cost of education for medical fields and with the insane health insurance issues (US obviously, but it does have some effect globally when the expert radiologist you hire from the US to help with research costs that much), and there are probably better ways to split the work across the entire field, medicine, like most professions dealing in life or death, will likely always be paid well.
Physicians salaries account for about 8% of healthcare costs in the US.
The savings go straight into patients' worse outcomes.
You know the radiologist you're responding to is a real person? Your last line seems needlessly callous.
To the consumer! Haha just kidding. We all know where they'll go.
I've seen a lot of friends and family members almost immediately get offered surgery for shoulder pain. It's just often the default for people that do surgeries for a living. I also had a pretty painful shoulder issue at one point, where the pain just wasn't subsiding for months. I tried massages and acupuncture as I didn't want to do surgery, but it wasn't helping at all. The thing that fixed it for me was just really focusing on doing pull-ups. I couldn't do them at all when I started, so I began with dead hangs and scapular pull-ups, eventually progressing to regular pull-ups, and then training with a "grease-the-groove" method once I could get a few per set. I stopped the training schedule once I was getting in around 17 pull-ups per set, and now just do 6 sets of about 7-8 pullups 3x per week spaced throughout the day. I'll also do some shoulder mobility drills [1]. Whenever I get lazy about keeping up with them inevitably discomfort will start arising again, but it goes away once I get back to strengthening. [1] https://www.youtube.com/watch?v=vP8YmmRMz6I
I have a story about this very issue. I have hip impingement; I played sports and got a labral tear in one of my hips. My hip would get sore and painful after a lot of activity. I saw a top surgeon in the US. Right after we met, he looked at the MRI (yes, there was a small labral tear there) and said he could have me on an operating table in 2 weeks. I was shocked, because the recovery absolutely sucks. So I got a 2nd and 3rd opinion. The 3rd opinion was a doc with 20+ years of experience. He asked me if I planned on going pro in any sports; I said no, and he said the surgery was not worth it. I did some PT and barely have issues with that hip. Then the Obama admin created a website to see what gifts ($) physicians accept. The 1st surgeon had accepted six figures+ from Stryker. The older doc? 0. There is no money in PT for a surgeon. I would tread lightly with popular and young surgeons.
Personally I've always appreciated talking to nurses I know. The respectable ones know they aren't doctors, but they've seen a lot more recoveries and cases where minimal intervention was required. As some people have said some surgeons like to cut people up.
I had issues with my shoulder for years. Tried PT as well as pull/push-ups but doing that made the pain worse (if I wasn't doing any exercises involving the shoulder it was "fine")…
same here. I started doing yoga and rock climbing, and it stretched everything out, and strengthened all the muscles around it. I rarely have an issue now.
On the flip side, when I had rotator cuff issues, the surgeon recommended months of physiotherapy before resorting to the knife. And it worked. And by weight training regularly with a focus on correct shoulder movement, the pain stays away. It really seems like if you, as a patient, go looking for a quick fix, that’s what you’ll be offered. And if you educate yourself a bit and then go for the best fix for you, you usually get that.
Physical therapy is very often under recommended in the US under the belief that insurance won’t cover it. They might. And, for anyone reading, you don’t even need a referral for the first 30 days in some states. Physical therapy is for more than just hip replacements and car accident trauma. Like regular therapy, a lot of “normal” people can benefit from it. It’s also not just stretching.
What did you have exactly? With calcifications, physio without the shockwave component definitely doesn't allow going back to the normal gym routine. It's just not enough.
> the surgeon recommended months of physiotherapy before resorting to the knife
In my limited experience, "If all you have is a hammer, everything looks like a nail" rings particularly true with medical professionals.
~2 years ago I used ChatGPT "deep research" to investigate a chronic sinus infection I'd been fighting for ~3 years. After seeing 3 GPs and 3 visits with an ENT, I fed all the observations I had into the AI. In particular, I couldn't get the ENT to explain why he visually saw, via a scope, evidence of allergic reaction in my sinuses, but then later concluded, after an allergy test, that it couldn't be treated via allergy medication. I asked this question a few times and he just never answered. ChatGPT surfaced a NIH study that concluded that 20% of people have allergic reactions that are isolated to a body location, and that shoulder "skin prick" testing may not reveal. I asked him about that and he said "that's not how allergies work". Full stop. He was unwilling to even look at the study. He prescribed a CPAP and regular nebulizer treatments. Side story: the CPAP place sent me a SMS message that I couldn't recognize was not a phishing attempt, and when I reached out to inquire who they were they never replied. So I decided: Let me just try taking a second-gen allergy tablet every day and see what happens. My sinus infections have gone away. Previously I was getting a major sinus infection at least quarterly. Maybe he's right that allergies don't work that way, but allergy tablets have absolutely solved my problem. Which I'm thankful for because I tried a CPAP for a solid month a few years ago and I just could not get used to it, and was sleeping like crap.
Ok, there's a lot to unpack here, and you really had the deck stacked against you. First, from the top: once a test says X, disproving X is really hard. That's not unique to the medical profession; it's inherent to all humans. We suck at revisiting or revising our decisions, much less considering the possibility of reversing them. Which brings us to the next two issues: liability and time. Whenever you ask someone to revise a decision, especially with the stakes the medical profession has, nobody has the time or the inclination to open themselves up to a mess. Now, if you really want to be successful, you have to suggest the tests the study describes before they even have a case with you, and especially before the diagnostic loop closes, since that has the best chance of getting them to look at the right thing. Just be straight that you walked in with a theory. Doctors notice when they're being steered way faster than they notice when you're actually right. That's how you work with systems staffed by an overworked mass of people trying their best.
> before they even have a case with you
My problem is that I needed information from 2 ENT visits to feed into ChatGPT to get that study. On the first visit he scoped my sinuses and immediately said "I can see evidence of allergic reaction, see those white bumps?". On the second visit I got an allergy stick test and it came out negative. Those helped lead to that NIH study. It would have been very hard to have walked in with that study in hand.
> Let me just try taking a second-gen allergy tablet every day and see what happens.
Stupid question: Why did you wait three years before trying this tactic?
Not stupid. Because it wasn't on my radar that it was allergy related until the ENT mentioned allergies.
Daily allergy tablets are associated with huge increases in early onset Alzheimer’s. Glad you found something that works, but might be good to get some of the allergen injections :)
That seems to be only for first generation, drowsy-making, tablets. Second gen formulas don't cross over the blood/brain barrier. https://www.myalzteam.com/resources/zyrtec-and-alzheimers-me... There IS one year-old finding that suddenly stopping Zyrtec after daily 3-month use may lead to nasty itching, and if that happens you can re-start and then taper off. https://www.fda.gov/drugs/drug-safety-communications/fda-req...
Where are you getting that from? All I can find is about 1st gen antihistamines (e.g. Benadryl, which I doubt many people take daily, because of the drowsiness). Even for those, evidence seems to be mixed at best. "Huge increases" seems like hyperbole.
Only first gen, 2nd gen does not have this issue anymore or it’s greatly reduced
Misinformation. Only first-generation antihistamines with anticholinergic effects are associated with cognitive decline in elderly patients.
I believe it depends on which ones, the older gen or certain classes of antihistamines
Wait, what?? Now I'm going into panic mode because I regularly take antihistamine tablets/pills (the newer ones, based on ebastine, because they don't make me feel sleepy).
As a radiologist I have found Claude and ChatGPT to be absolutely terrible at MRI, and I would not trust them one bit. They have their merits if you need to research stuff that is more text based, but radiological images are just something that they cannot interpret well enough (yet).
AI makes up for its poor reporting by enhancing the images. Current Siemens MR software ‘Deep Resolve’ makes up the signal (adding about 50%), then makes up every second pixel, and then, for 3D sequences, makes up every second slice. It’s knocking about 59% of the time off each sequence. And it’s really really good. I’m an MR tech.
but those are two different things. Of course something like Deep Resolve is great, as are modern model based reconstruction algorithms for CTs, but here we are talking about LLMs and their ability to interpret medical images, which has nothing to do with what you said.
Sorry? You use AI to hallucinate medical images and that's good?
Sure but claude and ChatGPT are not Siemens 'Deep resolve'.
It's like people who expect ChatGPT to be really good at chess because chess engines with super-human performance have been around for decades, so obviously the latest frontier LLM that took billions to train should find the task trivial. Actually, I'm curious what ChatGPT 5.5's Elo is. I wouldn't be too surprised if it's 2000+ just from its basic understanding of chess principles from all the content it has digested.
ChatGPT is completely unplayable at chess on its own. It's unable to keep track of the state of the chess position and therefore will make an illegal move within about 10-12 moves. I would put GPT-5.5's rating at 400, since it can't even make legal moves reliably. I've tried to play chess with GPT-5.5, and even played it again tonight, allowing it to use `python-chess` to keep track of the state of the position and to get a list of legal moves at each turn, so that it was fair. I also gave it blindfold odds, again to make it a fair fight, but it was not even close. GPT still isn't better than maybe 1000 Elo, maybe 1200 tops. Even with what amounts to being able to see the position and being unable to make an illegal move, GPT-5.5 hangs material left and right, doesn't make a plan, and got smoked even when I gave it blindfold odds, to the point that it's boring for me to play even under those conditions. I'm not sure it's better than whatever the GPT model was that was out about 8 months ago. I also thought it might be somewhat better than a beginner due to reading chess books, but no, it's complete garbage at playing chess, not even average-level skill.
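For a sense of how far apart the ratings floated in this sub-thread are (400, 1000-1200, 2000+), here is a small sketch of the standard Elo expected-score formula. The function name is hypothetical, and the ratings plugged in are just the guesses quoted above, not measurements.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (win = 1, draw = 0.5, loss = 0) for player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Ratings below are the rough guesses from the comments, used only for illustration.
for weaker, stronger in ((1200, 2000), (1000, 2000), (400, 1200)):
    score = elo_expected_score(weaker, stronger)
    print(f"{weaker} vs {stronger}: expected score for the weaker side = {score:.4f}")
```

An 800-point gap corresponds to an expected score of 1/101 for the weaker side, so "maybe 1200 tops" and "2000+" describe very different players.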
Interestingly LLMs are extremely bad at chess position _images_. I have to imagine if you give it positions in text it'd be pretty great but when I was learning chess and pasting images of positions in for analysis I couldn't believe how wrong it was. I actually thought it was looking at the board in reverse but even when pointing out problems it seemed completely incapable of understanding what it was missing (of course... it doesn't really "understand" anything). LLMs truly are marvels with text but anything spatial seems to really mess it up, somehow.
"A 2026 Finnish study published in JAMA Internal Medicine that used magnetic resonance imaging (MRI) scans to look at patients’ shoulders found that 99% of Finnish adults over 40 have at least one rotator cuff abnormality." https://brainlenses.substack.com/p/abnormality Incidental Rotator Cuff Abnormalities on Magnetic Resonance Imaging https://jamanetwork.com/journals/jamainternalmedicine/fullar...
I thought this was an interesting experiment and I repeated it with my own DICOM. Results are terrible. Claude has the complete opposite diagnosis on my ACL, menisci and cartilage.
Claude: Primary finding: Complete ACL tear with the classic pivot-shift bone bruise signature (posterior lateral femoral condyle + anterior lateral tibial plateau edema) and large hemarthrosis. PCL, MCL, LCL, menisci, and cartilage all intact.
Radiologist (English translation of findings & conclusion): Mild joint effusion. No Baker's cyst. Post-ACL reconstruction with minor cyst formation in both the femoral and tibial bone tunnels. The ACL graft shows heterogeneous signal but no complete or recurrent rupture. PCL and collateral ligaments intact. The lateral meniscus appears abnormal, likely from prior partial meniscectomy, with significant cartilage loss (partly Grade 4) at the posterior lateral compartment, osteophyte formation, and reactive bone marrow edema. The medial meniscus shows diffuse signal change from prior repair but no recurrent tear (specifically no recurrent bucket-handle tear). Mild chondropathy with focal cartilage loss on the lateral side of the medial femoral condyle. Cyclops lesion present. No definite loose bodies.
I don’t understand the negative reactions. Medical care as it exists requires the doctor and patient to have their brains switched on. I’ve almost never had a problem where a doctor provides me with a diagnosis and I go about my day. Most of the times that I have, I’ve been confident about the problem and known what I needed. The doctor was a barrier to accessing care. Dr. GPT is a good brainstorming tool. It helps synthesize information in a way that primary texts don’t. But it does force you to say “that doesn’t make sense”. I do think that people saying “doctors don’t know the state of the art” have a weaker case. If you think about it in terms of token density during pretraining and how post training datasets are constructed, I think it would take us a very long time to adapt to any fundamental shifts. If we have forgotten how to cure scurvy, how many journal articles would it take before we adapt to a discovery?
> I do think that people saying “doctors don’t know the state of the art” have a weaker case.
This is kinda the case though. In Poland I have met only one psychiatrist who knew about DSM-5, and that was this year. DSM-5 has been around since 2013. Doctors are people just like us; not every single one of them is good.
Oh I agree with you, it's just that I don't think LLMs are either. If you think of LLM knowledge, especially in scientific/engineering fields, as a lossy representation of the density of ideas, then you'd expect to see some weird behavior. I'm sure there is some sort of a temporal discounting and people thinking about this, but a naive NLL or Reverse-KL on medical literature would engrain some weird, wrong ideas.
Many DSM-5 diagnoses come into effect with the ICD-11; ICD-10 doesn't have a good deal of them, and that rollout is still fresh & ongoing. It is kinda spooky, though, to have freshly minted doctors from a few years back whose school-knowledge will forever be "outdated and archaic" based on standards published before they were in school. Some good advice I got: treat this as a generation shift, and find younger and newer doctors who are familiar with the "modern" standards.
Why would you expect a Polish psychiatrist to understand the differences between different versions of a diagnostic manual used only in the US?
I would not trust AI on images. But I once had ChatGPT tell me that an MRI report was very likely to be incorrect based on the text, and offered a different diagnosis. Since it was semi insisting, I visited another doctor who made me do a retest. Long story short, ChatGPT was correct. Again, this is just one single person's experience. So not worth much.
I think that much of the visual gap is because what to attend to in images is less structured. Anecdotally small qwen finetunes (ie less than 10B) take task accuracy from sub 30% on FMs to 90%. We have sold some of these for outcome based back office tasks. I think we’ll see a lot of specialized VLMs that provide real value.
This sounds fascinating. Can you provide any detail regarding the nature of the diagnosis or problem it identified?
Anecdote but I gave Gemini Pro an image of an individual with Herpes Zoster which the doctor said was something else. Gemini gave the correct diagnosis which allowed for correct treatment and cure. I don't understand why doctors don't prompt LLMs before saying wrong things. Is it ego? I can understand for radiology because you need a specialized convolutional network, but for more knowledge based things...
“A man with a watch knows the time; a man with two watches is never sure.” I imagine reasons for what you’re asking might include: * Prompting an LLM is work, and they’re already overworked just doctoring—every conversation with a computer is a conversation you’re not having with a patient; * They’re probably right more often than they’re wrong; * “When you hear hooves, think horses, not zebras”: the 15th case today of strep throat is probably strep throat, regardless of today’s 15th falsely-confident LLM weighing-up; * They tend to have spent many many years honing a clinical intuition that makes an examination, to some degree, hard to articulate fully to the LLM; * Liability/overdiagnosis: All this stuff is probabilistic. Inevitably, there’s going to be a time when the LLM throws out something I thought unlikely that turns out to be right, and there will be other times when it’s wrong but now I have to document why. How many false leads do I need to chase per one true differential? Does this really compare favorably to seeking a second opinion from another human doctor? * Not everything needs to make it into the record. Once it’s in the LLM, it’s discoverable and litigable and hackable and permanent; * Medicine is practiced in very different ways in different contexts—even in this thread, one radiologist routinely orders ultrasounds for soft tissue shoulder problems, and the other medical-world person replying has never heard of such a thing—presumably both within US health care contexts. Some doctors hand out antibiotics like candy, others are more cautious with respect to resistance. What’s right can depend on the time, the place, the clinical setting—more than just the immediate patient-level facts at hand, in ways that become awkward or unwise to express explicitly. And of course… who’s to say they don’t do LLM-assisted research, in cases where they think it might be helpful?
> I don't understand why doctors don't prompt LLMs before saying wrong things. Is it ego? Either that or laziness I'd imagine. This isn't limited to LLMs. Expert digital assistant systems that you query have existed for a long time. A good physician will double check anything even slightly unexpected against one.
mate the other day chatGPT (enterprise) told me that the kernel 7.0.2 was older than 6.69. you can't trust these toys at all. that doesn't make them useless, just untrustworthy.
6.69 hasn't been released yet, to be fair.
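For anyone who wants to check this kind of ordering without asking a chatbot, a minimal sketch: compare dotted versions as tuples of integers, not as strings or decimals. `parse_version` and `is_older` are hypothetical helper names, and the sketch assumes purely numeric versions (no `-rc1` style suffixes).

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Split a dotted version string into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def is_older(a: str, b: str) -> bool:
    """Return True when version a precedes version b."""
    # Tuples compare component by component, so 69 > 9 as a whole number.
    return parse_version(a) < parse_version(b)


print(is_older("7.0.2", "6.69"))  # False: major version 7 comes after 6
print(is_older("6.9", "6.69"))    # True: minor 9 comes before minor 69
```

The second case is the one that trips up naive comparisons: read as decimals or as strings, "6.9" looks larger than "6.69".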
That might be doctors' new nightmare: people who second guess everything with AI. Previously it was "google your symptoms".
Well I live in the nightmare that is the Dutch healthcare system [1]. There are many things that they will fix but they didn’t fix my sleep. A friend fixed my sleep. He is a doctor and prescribed me the right thing. The thing is, he shouldn’t have had to intervene. Without him I could have ended up poor and destitute as my sleep was wrecking me. And yea, I already did all the standard things. CBT for insomnia helped somewhat. My insurance didn’t fully cover it either, unless I was willing to wait for 8 to 12 months. And I recently met someone with slow moving metastatic cancer. Thanks to LLMs they will most likely live another 3 to 5 years extra since the Dutch conventional mainline treatment hasn’t been taken yet. But it is German doctors that helped them and Belgian doctors that pointed out in a second opinion that a lot more can be done. LLMs have a part to play. The false positives are awful, but I have seen an average of 5 out of 10 care when things become too complicated. Except for trauma treatment. The Dutch healthcare system is amazing once they diagnose classic PTSD. So it’s definitely not all bad but the trust I had when I was younger has been eroded quite a bit and LLMs can meaningfully step in, in my case at least. [1] I know there are worse systems. But from what I have heard there are clearly better systems nowadays. It has slipped a lot
Hey what did you do to fix your sleep? Help us all and maybe an llm will index your diagnosis (hi ChatGPT)
The NYT did this profile a while back: "Ben Riley was already writing about the risks of chatbots when his dad started trusting A.I. over his doctor." The dad was a retired neuroscientist who delayed cancer treatment against medical advice because he was certain he had been misdiagnosed based on his own research that he did with the help of A.I. https://www.nytimes.com/2026/04/13/well/ai-chatbots-cancer.h... There's a comment on the article from Ben Riley: > I am very grateful to Teddy Rosenbluth for sharing my father's story with the world, her kindness and curiousity proved to be restorative in ways I didn't anticipate. > The two words that everyone used to describe my dad: "intelligent" and "kind," and he was indeed both of those things. The sad irony here is that it was his human intelligence, combined with these strange new tools that purport to be a form of 'artificial' intelligence, that led to his ill-advised decision to forego the treatment he needed for his CLL. A doctor has already commented on this story with the observation that AI "confidently asserts erroneous conclusions," and we simply have no idea how often this is happening or the magnitude of the harm that results. > Not a day goes by that I don't feel the pang of my father's absence. He might still be here if not for AI. I try not to think about that, but sometimes I can't help myself.
The context is very important: decades of a poorly-diagnosed chronic illness had left him deeply distrustful of the medical system. This is the real root issue. At 75 years old, he was stubborn. Is that reasonable? Yes, perfectly. Could he have been right since the beginning? Certainly. Did he deny evidence? Yes. Zero doubt that he was intelligent, everything points in that direction, but that doesn't make a person less stubborn, because accepting the evidence is also accepting that you were wrong if you initially positioned yourself as adversarial instead of cooperative. He would have read Wikipedia, scientific papers, etc, even without AI. He did not want to be convinced. It works both ways: https://www.foxnews.com/health/woman-says-chatgpt-saved-her-... or https://www.today.com/health/mom-chatgpt-diagnosis-pain-rcna... Nonetheless, someone very smart just didn't want to move from his position.
i mean, other smart people have famously delayed cancer treatment without needing poor guidance from LLMs! that's not at all new or unique to LLM chatbots
GPT-4o, which is what that article is most likely about, was an older low param count slop model which was known for abusing emojis and sycophancy. It does not really have any relevance to latest claude frontier models. Your comment is akin to saying "Karen from facebook who is a human pushed essential oils and ivermectin as a cure to cancer. Now doctor Y is suggesting chemo. Both are humans, humans cannot be trusted!"
I asked a clanker about symptoms I was having. (I'm not an idiot, I was already on my way to hospital, clanker was just to take my mind off symptoms during the drive.) The clanker said I'd be fine, I just needed some rest and OTC meds. The medical staff immediately turfed me to surgery because the same set of symptoms I told the clanker were enough to concern them that I needed emergency surgery. Had I listened to the clanker, I'd be dead because I did need emergency surgery. (Hell, I almost kicked the bucket because I waited for someone to wake up to give me a lift because my insurance probably doesn't cover an ambulance ride.)
Very curious what made you run to the emergency first thing in the morning that an LLM understood as "just normal, take some OTC meds and wait".
It's not just the second-guessing. It's the getting in the ballpark but striking out: explaining in detail why they are not correct. A little bit of patient knowledge requires a tremendous amount of doctor time to explain away the ignorance. It's a 180 for me: While I believe doctors should explain diagnosis or treatment decisions when asked, I don't believe they should be taxed with explaining away alternatives. In my anecdotal 2nd- and 3rd-hand experience, doing that is taking at least a third of their time (on roughly 5% of the patients who think demanding answers will make things better) -- with zero improvement to diagnostic accuracy or treatment effectiveness. Doctors already consult with other doctors, and it makes no sense for them to have to consult with ignorant patients or treat their AI psychosis on top of their disease. It doesn't increase patient autonomy any more than adding a steering wheel for child car seats would help toddlers learn to drive.
Explaining diagnosis and treatment recommendations decisions inherently involves explaining away the alternatives. In this world where patients are ultimately responsible for our own care, explaining your rationale is a straightforward part of the job - otherwise there is nothing for patients to base their decisions on apart from how the options make them feel. If visits haven't been allotted enough time to get the job done, then that is something you need to take up with health plan bureaucrats rather than taking it out on patients.
It’s funny, every profession deals with customers making their own guesses at diagnosis. I told my mechanic the flim flam is broken but he said it was the rim ram. He fixed it and we all went on with our lives. But doctors insist on this God-like status so it’s a “nightmare” when patients try to help themselves.
I dunno man, it's one thing to have your car still be broken because you were wrong, it's a different thing to poison yourself on the basis of having done your own research. The mechanic can laugh at you, it hits a doctor differently.
[deleted]
you are literally taking sleeping pills ..
Nightmare because they're always right and the A.I second guessing is always wrong, or because they just don't like to be second guessed?
Well it was a nightmare for my mother's do-nothing GP surgery in the UK. She had several conditions which were being handled completely separately without central coordination, and her health was in serious decline. We went in with a list of 20 AI-generated questions based on her conditions and treatment (which I was able to screen as I have a bio postgrad, but not medical training), including those related to NICE guidelines and procedure, and, frankly the GP bricked it and ordered a load of new interventions. My mother started to get proper treatment. I wouldn't trust AI to make a diagnosis, but I would absolutely trust it to notice where procedure hasn't been correctly followed, where a treatment is counter-indicated because someone has missed a line on a health record, or where there's a clear potential alternate diagnosis which has been missed for spurious reasons. Also, unfortunately, where doctors aren't doing a decent job - often because they're overworked or underfunded.
There’s more than two options here. It was already difficult to deal with self diagnosis for doctors, now we have a machine that outputs recommendations, and does it with confidence whether it’s correct or not. The same issues that were present with search-engine self diagnosis are still present with LLMs. If you provide Google with an incomplete list of symptoms and can’t interpret the information you find correctly, you will likely get an incorrect diagnosis. The same is true for LLM output.
Nightmare because users approach LLMs with the false confidence that they're always right, and present LLM outputs as fact to Doctors who have to waste time explaining that it's wrong most of the time. It hurts more than it helps.
It's a nightmare because it erodes trust. Doctors are not "always right" which is why "always get a second opinion" is codified in culture. But AI's problem is that it's completely full of shit, sometimes, and the people most qualified to evaluate whether it's full of shit are the doctors, not the patients, but just like OP's original article, patients are left feeling like their second opinion from AI might be more trustworthy than their doctor's opinion.
Nightmare because the AI is just generating a random text that fits the question.
This is obviously going to happen. But sub-par and sloppy doctors are a thing too. Medicine has been using semi-intelligent systems for years that were nevertheless found to improve outcomes. We need studies that quantify error rates from each source type, then we need to account for the fact that the artificial type will keep improving.
Indeed. I don’t even get what OP thinks they are getting out of this other than doubt.
People should've googled their symptoms and especially the prescriptions they got. It has always been a good practice. If[0] AI proves to be the new google then people should ask AI too. [0]: IF.
Do you know how many life threatening illnesses I’ve diagnosed myself with by googling symptoms?
It can be helpful for understanding the choices made, by asking questions, and thus for reassurance, but it requires something most people lack: understanding that you are likely wrong, since you are just collecting information without understanding it. Pretty much like most managers these days, so I understand the frustration of the GPs.
And say it's true because the AI said so.
It's so much worse than some Google results: people see LLMs as a trusted friend who never talks back and never questions you, who is excellent at convincingly communicating their bs, reeling you in with "tell me more so I can really lock this down", continuing to fool you. A con artist, a fraud.
No, this flow is actually very good. Like any domain, when you have questions or need a solution, you do research first, then you ask a specialist. If you explain the symptoms and context well you can get proper advice and then decide on the next step: Case A) It looks benign and the advice / information that you collected seems reasonable, then you go your way. Case B) You need a second opinion from a specialist because the subject is too complex, or there are medications that you need approval for. Once you have challenged LLMs, and read about the topics over and over, then you genuinely become really good at understanding it (especially if you triangulate over LLMs and ask them to challenge, you start to have genuine questions). No matter if the answer is right or wrong, you have elements. Maybe you missed the point, but you come prepared. At home you have the time to assess the options, the pros and cons of each approach, the possible questions to ask and then challenge the doctor. Shared decision-making is an actual evidence-based model of care, and patients who arrive understanding their condition and carrying specific questions tend to get better attention and better outcomes. Some doctors get annoyed, because they have big egos and choose to be patronizing, but it is exactly their job to answer such questions. With LLMs, it's quite good, you get nuanced and rather useful answers. Before LLMs, no matter the topic you searched for, the answer was the same: "you have cancer / an [obviously deadly] rare disease" The other problem, in many places: • The doctors are not affordable • They are too busy for you (< 15 minutes) • You may need to wait months to get an appointment • They are not good (country-side is an example, and sometimes even country-level) + you can have all of these factors together. So, you have something deeply bothering you, your only appointment is in 4 months. It would be insane not to take the time to explore different solutions and not to come informed about the topic. If you express your prompt properly and do not rely on imagery, you can absolutely get top-tier advice.
Agreed. This gets worse in cultures in which doctors have no habit of educating the patient, or haven't been trained that it is part of the job. Whenever I am back in my birth country, I specifically avoid doctors who are older than their mid 30s, because they all have the same terrible bedside manner. They might be good at diagnosing and treating, but they never, ever explain anything, even when asked. Some even have "helpful pamphlets" to hand to the patient - anything to avoid explaining. It seems that in their view their job is not helping the patient, but completing a task - running a scan, performing a procedure, administering medicine etc. The human who is the subject of the task is invisible.
Frustrating post. This gives rightful ammunition to the calls of "LLMs need to be avoided for anything medical". Even though the issue is that they're asking it to interpret images. They need to be avoided for that, but that doesn't say much about their medical accuracy outside of image interpretation. It would already be a huge benefit to 90% of people worldwide if the very first part of most hospital visits would be outsourced to frontier-level LLMs. Yet this kind of misuse just gives the medical industry a stick to beat that idea into the ground. Oh well, I'm sure there will be at least a few countries that will indeed embrace frontier models for initial diagnostic medical purposes. Maybe medical tourism destinations. But it's unfortunate for those who can't afford the trip.
I feel like I'm going nuts. There are other commenters saying this is a good practice they've also done for other injuries. You are saying you are an actual radiologist and immediately clock the problems with its advice. I have seen this pattern over and over again. Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading. It is only when you do not know what the AI is being asked to do that you are likely to find the output helpful. This is itself alarming to me, but no one else seems to find this to be quite damning for the AI services being offered, preferring instead to be wowed by the convenience and speed at which they can be delivered unreviewed and unproven information.
(We detached this subthread from https://news.ycombinator.com/item?id=48709121.)
This is the root of AI psychosis. There’s a lot to unpack here, and I won’t go too deep because you can’t really have a discussion with affected folks because their fundamental basis is not evidence, it’s belief. It is weirdly religious in a way, because if you were to present contrary evidence (e.g. experts in a field weighing in about how plausible sounding responses are bunk), you would only be told you don’t believe enough in the long term potential and capabilities. Don’t get me wrong, I think we all agree capabilities will eventually improve (and farther-future capabilities could reasonably surpass experts), but it really is unclear if the current transformer architectures with their probabilistic/hallucinatory outputs will plateau before they surpass current experts' abilities in all promised fields.
I was a very early adopter in my circles with AI and I shared it with many people. Strangely, I seem to be the most skeptical about AI in my circles as well, but because I was the gateway for many folks, they want to come back and share their experiences with me. And it's so much like listening to someone in a church congregation sharing their experiences with god. Clear and obvious gaps are hand-waved away exactly how you're describing.
>This is the root of AI psychosis. There’s a lot of unpack here, and I won’t go too deep because you can’t really have a discussion with affected folks because their fundamental basis is not evidence, it’s belief. Treating it as if it is an intelligence is the problem. The problem is that AI psychosis is fundamentally the belief that an LLM is "thinking" at all. Outputs are just believable word vomit which resembles factual information.
I _believe_ the term "AI Psychosis" is a "thought-terminating cliché" that readily puts you in a position to disarm any criticism of your point of view, which, whether you're aware of it or not, is a belief in and of itself. I'm more willing to bet you can't have discussions because you're trying to have debates. But on your actual point, I don't think AI needs to "surpass current experts abilities in all promised fields" as a marker of its ability. The immediate gains have already shown some remarkable promise, and more LLMs should have had safeguards around mental health up front. If I were to put it on a scale, I would say it is net positive long term with a strong negative spike up front which was somewhat preventable. But who knows, maybe I just have "AI Psychosis" and you can easily dismiss me.
I don’t think they will improve, there is too much incentive to poison the datasets going forward. A lot of the models up to this point have benefitted - like Google did - from an essentially ‘pre SEO’ internet. Now the same tools are being used to generate nigh infinite good sounding bullshit, which poisons the dataset in all sorts of hard to detect ways. To add insult to injury, the human experts are also not as naive, and have many incentives to poison their own input in subtle ways too.
Human expertise is also improving all the time and not limited to just connecting dots. When AI seems to surpass a particular human, it's just because the human lacks broader knowledge and fails to investigate further. An expert already knows they don't know everything. That was never the point. Critical thinking cannot be delegated to AI any more than it can be delegated to a book. There is nothing new going on here.
[deleted]
> There’s a lot of unpack here, and I won’t go too deep because you can’t really have a discussion with affected folks Do you think it is any more possible to have a proper discussion with someone who preemptively paints the other person as mentally ill? Or someone who preemptively victimizes themselves? Cause I don't think these are the hallmarks of an honest discussion. See also the entire past decade of political discourse. Like, consider this: > It is weirdly religious in a way, because if you were to present contrary evidence (e.g. experts in a field weighing in about how plausible sounding responses are bunk), you would only be told you don’t believe enough in the long term potential and capabilities. A trivial counter to this is that you can just be an expert at something (e.g. your own work), use the damn thing yourself (professionally), and evaluate the outcomes for yourself. Then maybe remark "LLM good". Now you come and remark "LLM bad", and point at random "evidence", either of outright other workloads, or even the one at hand: you're asking someone to reject the reality they've already experienced, entirely based on the assumption that they're "merely religious" or "in psychosis". You tell me if that's any more epistemically rigorous and sensible than their story.
Why is it psychosis and not lower standards? While I can understand being skeptical of non-experts' claims that such answers are enough, I don't understand why you call it "psychosis" and not simply naivety or lack of expertise. At the same time, the new so-called "models" haven't been pure transformer-based LLMs, but entire systems with tools (with access to the Internet), data storage, and the options to trigger additional instances for different tasks.
Totally agree. I'm a scientist, and like most scientists I have some specialized skills that most of my colleagues don't. AI has empowered them to learn and build things that they might have otherwise needed me for. But there have been quite a few cases where it led them very far down a wrong path. This has started happening way more often in the last few months.* We've known since the beginning that AIs confidently say incorrect things. But now that they can speak confidently about very complex topics, and mostly say correct things, we are letting our guard down and lots of subtle falsehoods are slipping through. *In one case, I was able to put things back on track because the AI suggested my colleague talk to me; somehow it figured out we were co-workers.
Right but hallucination rates have been consistently decreasing every model iteration. It's about error rates. As a fellow scientist, I will also mess something up. Humans have an error rate. Once that error rate is low enough, it doesn't matter that it's > 0, it matters that it's low enough to be trustworthy and useful. Coding agents of 2024-25 had error rates too large; you couldn't meaningfully vibe code anything and needed a ton of oversight. It's still true but FAR less so, and this is after like a year of iteration.
>very far down the wrong path. Absolutely agree. Have seen this first hand
I see your argument, but it's not exactly news that an expert found a flaw in a popular tool. You could say the same about Wikipedia--experts have tons of issues with it, but Wikipedia still provides value to non-experts. The most likely alternative to Wikipedia for non-experts is simply not trying to learn anything new. Similarly with LLMs, you can't just write them off entirely because they sometimes provide misleading or incorrect advice. The positive utility maximizing view is to learn when you need to call in an expert. I recently moved in to a new house and have used Claude extensively to figure out basic things (e.g., adjusting the garage door height, how to mount a TV). However, when the HVAC suddenly stopped working, I gave Claude a shot for an hour and tried some non-destructive fixes, but then realized I had to call in an HVAC expert.
The free alternative to Wikipedia is the library, not “don’t learn anything new ever”. I find Claude is surprisingly similar to a confident but incorrect coworker, with the benefit that Claude will reevaluate when I correct it.
Slightly OT Nitpick: in regard to experts and Wikipedia, when doing a neuroscience-adjacent MSc, experts in the field actually directed me to Wikipedia as an excellent source for high-level neuroanatomy, including recent research, so I'm not sure your blanket description about experts and Wikipedia is correct.
You 100% can write them off entirely and go about your business as you previously had done. Ignoring the errors, it is very debatable whether there are even productivity gains beyond: human programmer or whatever is excited and cranked up to unsustainable degrees of activity and thinking to 'keep up' with what he thinks is an AI doing the work. I'm seeing this fairly often and when it isn't garbage it's a capable person who has gotten inspired by their 'collaboration' in which the busywork is being done by a machine, but they're doing so much directing and correcting that it's not unlike what would happen if they got heavy into meth and went on a tear. You absolutely can write them off entirely and decide for yourself what your comfort level of human-killing speed-freakism you want to pursue in your productivity. There's a long history of humans managing astonishing levels of productivity through self-destructive means. This is not even cheaper, once the 'first one's free' wears off: it's just a novel method of getting humans to burn themselves harder in the belief that they have a magic feather. The ones who're really throwing themselves into the situation are the ones who'll burn out, but who aren't setting themselves up for atrophy and learned helplessness. Anyone who believes the technology lets them be a lazy manager just getting paid, is in for an unpleasant discovery.
> Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading Yes, this is exactly so. AI is able to confidently sound plausible enough to convince laypersons or anyone who isn't very familiar with the subject matter, which is a big part of the mass-appeal "magic" of ChatGPT and other similar tools. It's like having a know-it-all friend (who also makes shit up to bridge their own knowledge gaps). In many non-advanced non-specialized situations, AI is right enough to be at best useful or at worst not harmful (usually landing in the middle somewhere). But speaking for myself, in areas where I consider myself quite proficient, I can very easily spot the subtle inconsistencies and naive conclusions that AI responses provide, and I have to guide/steer/correct it a lot to get good results when the subject matter is complex enough.
Last week I went to a highly-specialized tertiary clinic about further treatment for a rare medical condition that I was diagnosed and treated for as a child. The two very specialized doctors I met there confirmed a diagnostic mistake that a specialist had made ten years ago. The only reason I pursued a second opinion, ten years later, was because Google Gemini had explained to me that the specialist ten years ago had performed the wrong type of test for my condition. Do these LLMs make mistakes? They sure do, I see it all the time. But they can also help people make breakthroughs. And this isn't the only time that Gemini has helped me diagnose long-term health issues, either. I am not advocating to trust anything they say blindly, but they can be a great place to form new hypotheses and learn the right terms to look for when you are unfamiliar with a subject.
Can you elaborate on how you use Gemini to diagnose long term health issues? Considering doing the same for myself, but I have no idea what is too much vs too little information, and generally the type of prompt engineering to do.
I may be missing something, but I think it's unclear that the parent poster here is necessarily actually contradicting anything the AI said. It may depend on the exact information the OP wrote to Claude and GPT. The full transcripts would be needed. (Though there is definitely a separate point that a doctor would generally better know all the right questions to ask, while current LLMs may be making certain assumptions.) The LLM may have, from its "perspective", implicitly thought the OP was telling it that he had strong reason to believe there was no calcification and was not considering the bigger picture of possibly receiving an incomplete/poor assessment from the medical staff. In fact, the issue here may be the LLM overly trusting doctors vs. trusting its own expertise.
> no one else seems to find this to be quite damning for the AI services being offered, preferring instanced to be wowed by the convenience and speed at which they can be delivered unreviewed and unproven information "Be wowed by the convenience and speed", or merely "take advantage of the mere availability"? What most people find to be damning about expert advice is that they simply can't get it anywhere, at any cost that they can afford.
So if you want to do a surgery but you don’t see any surgeons around you ask a grocery butcher to have his way?
Seems natural enough. There will always be complexity and nuance that is missed by an AI model or person - the world is just super detailed. The more expertise you have the more you will be aware of that nuance. That doesn't mean the model or person is not useful as a starting point.
I dunno. I know a lot of software engineering experts. AI isn't always right, but neither are the people, and it's getting better and better. Software is one domain where it excels because of structured training data and simulation environments, so I'm well aware it's better here than other areas. Still there's somewhere balanced between saying every time it's "insufficient or incomplete or outright misleading" and "just trust AI". AI's a useful source of information/reasoning/research, but know you need to validate its answers for important decisions.
> I have seen this pattern over and over again. Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading. It is only when you do not know what the AI is being asked to do is it likely you will find the output helpful.
I always recommend people try asking LLMs a lot of questions on something they know first. Programmers should start by asking LLMs to work on a codebase they’re familiar with first. You’re overstating the problem, though. Even for an expert the LLM will get a lot of things right and can be helpful under a watchful eye. The real problem is knowing how to identify when it’s on the right track and when you need to correct it, because both cases are presented with the same tone and confidence. An expert can better identify when the LLM output doesn’t sound plausible. Someone unfamiliar with the topic will think everything it says looks correct.
You're not. This site was also bullish on using LLMs as therapists, which defeats the very point of them, and reflects a lack of knowledge on what exactly therapists do for people. More on topic: if the article's author arrived at a definitively negative result would this have shown up on HN?
On the flip side of this problem, novel best practices lag the medical standard of care, other human failures like corruption and competing priorities notwithstanding. For example, we had to advocate for certain practices during the birth of our first child that became routine during our second several years later. So, neither side is guaranteed correct, doctor or citizen researcher (which did not include LLMs in my case, for the record). The truest answer is also the most useless one, applicable to all fields: it depends. The real question is: if you embrace being a layman, whom do you trust more: LLMs/the internet or experts, like doctors? I think the answer is pretty clearly experts.
You shouldn’t expect frontier models to work on medical imaging. There is much more that goes into building a medical imaging product. First and foremost is data. Medical imaging datasets are not prevalent on the public internet at the scale necessary for good performance on medical imaging tasks, especially MRI. Also, the labels are super noisy. This is completely different from asking for general medical reasoning, which is derived more from papers, public standards and textbooks. Text exists at the right scale but images don’t.
No, not anytime someone is an actual expert at anything, AI output appears insufficient. That is why experts in various fields use AI. Then to say "Aha, but all of that is AI psychosis" makes obviously no sense: Why would we trust experts when they offer critique but not when they say "this is helpful"? Overall: People are not insane. AI makes mistakes and, often, fails completely. AI also helps them do things better, quicker, increasingly so. The jaggedness of AI is confusing and real.
How many times have you seen an expert go "yeah, these results are good consistently enough for a non-expert to trust them without expert assistance"? There is a huge difference between having a chance of a good result, which can be useful for experts able to filter out the bullshit, and consistent success. I would generate code as a helper, but I would never allow a guy from marketing to merge unreviewed AI code.
I’ve never seen an expert use AI in their field beyond the initial ‘oh interesting’ stage.
This is a serious issue for young people I think. I have seen outputs that look good but the actual content is bad. If you’re inexperienced in a field you can’t see it because AI makes anything look right. I have gotten very good results with AI but you can’t take the first answer at face value. You need to be suspicious and challenging until you tweak out the right answer over time.
The question is how far off AI is compared to the professionals that we have access to. The world's best experts are not accessible to most of us. :(
[deleted]
Well that's part of the problem. AI is not accountable - if you take its advice and hurt yourself, who is responsible? A real doctor is accountable. They might both "know" a lot of things but implicitly the party who is accountable is going to be more trustworthy. And I don't see that going away until AI companies must be licensed for application x and can lose their license / be sued if engaging in malpractice.
> I have seen this pattern over and over again. Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading
Media is awash at the moment with experts chiming in to support AI, saying their fields are being revolutionized, etc. It seems unsurprising to me that lay opinion would follow the loudest media trumpets.
How do LLMs get information from images? Do they have to run essentially the opposite of an image generation model, taking an image and converting it into a description? I'm just concerned that the description wouldn't be able to encapsulate the information needed to differentiate exactly what is wrong with a shoulder. The image -> text model would need to know what it should actually report back to the LLM about the image, so that it doesn't just say "this is an MRI of a shoulder" or similar. It would be like a layperson describing a bridge, and asking an engineer if the bridge is safe based on that description
Older vision LLMs chopped up images into patches, which were projected into the same embedding token space as words. Newer ones use an encoder to project an image into token space more efficiently. The resulting image tokens then run through the same attention layers as the text tokens.
No, it does not work like that. It can actually process the image itself; there is no intermediate image-to-text step.
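For anyone who wants to see the patch idea concretely, here is a toy sketch in plain Python. It is not how any particular model is implemented: the sizes (`PATCH`, `EMBED_DIM`), the random `weights`, and the stand-in `text_tokens` are all made up for illustration, and real models use learned weights plus position information. It only shows the shape of the idea: cut the image into patches, flatten each one, project it linearly into the same vector size as word embeddings, and put both kinds of tokens in one sequence.

```python
import random

def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patch vectors."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            # Flatten one patch row by row into a single vector.
            patches.append([image[top + r][left + c]
                            for r in range(patch) for c in range(patch)])
    return patches

def project(patches, weights):
    """Linear projection: each patch vector (length P) becomes an embedding (length D)."""
    return [[sum(p * w for p, w in zip(vec, col)) for col in weights]
            for vec in patches]

rng = random.Random(0)
PATCH, EMBED_DIM = 2, 4  # toy sizes, chosen only for this example

# A 4x4 "image" whose pixel values are 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]

# Hypothetical projection matrix: EMBED_DIM columns, each of length PATCH*PATCH.
# In a real model these weights are learned, not random.
weights = [[rng.uniform(-1, 1) for _ in range(PATCH * PATCH)]
           for _ in range(EMBED_DIM)]

image_tokens = project(patchify(image, PATCH), weights)

# Stand-in for three word embeddings of the same dimension.
text_tokens = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(3)]

# Image and text tokens share one sequence that attention layers would consume.
sequence = image_tokens + text_tokens
print(len(image_tokens), len(sequence))  # prints "4 7"
```

The point relevant to this subthread is that nothing in this path turns the image into a text description first; the pixel values themselves become vectors that sit next to the word vectors.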
How does a Large Language Model process images then?
As someone who has had shoulder issues for the last 25 years or so, including partial tendon tears, I can tell you that even if your tendon had been damaged, the treatment would have been strange. With moderately damaged tendons, you want to:
1. stop any inflammation, by taking NSAIDs for a few days
2. detect and correct any behavioral patterns that could have caused the presumed overwear of the tendon
3. start physiotherapy to strengthen those muscles that can take over the load from the damaged tendon
These are not quick fixes, because quick fixes don't exist here. Stuff like shockwave treatment, massages, etc. will only lessen the problems for a few hours at most, after which they will come back.
> My hope is that in a couple of model generations, we'll trust AI to review MRIs the way we trust it to proofread our emails.
https://www.nature.com/articles/d41586-026-01947-1
I've started asking my doctors whether they use AI, and if they say yes, I look for another one.
That study seems to be confounding factors and rushing to a questionable conclusion. A very plausible explanation for the adenoma detection rate to have gone down is simply that its prevalence went down among the population in the second three-month period. This was not a randomized trial. Concluding that "AI usage degrades physicians' skills" is questionable at the very least.
There's a whole bunch of other studies on this topic, as well as metastudies, and from what I can tell the problem is real. https://www.sciencedirect.com/science/article/pii/S245195882... (+ cf. its references)
I don’t even trust AI to proofread my emails.
You should always be getting a second or third opinion from real doctors for matters like surgeries, radiology, etc. One doctor diagnosis + LLM is gonna throw you off. You need more datapoints.
In the US, this is standard advice. I note that the OP is in Germany. Maybe they do things differently there.
The OP describes getting injected with a homeopathic botanical formulation and receiving another type of therapy that wasn’t indicated for his condition. I wonder if this person was going to a traditional doctor or if they were visiting some type of specialty clinic as a second opinion. For most conditions you can find specialty clinics that will prescribe and administer (and bill for) a lot of non-indicated treatments, but some patients like being in the care of doctors who take action and do things after being recommended more conservative treatments by primary doctors.
In Germany we get a zeroth opinion, because you can't even get an appointment within the next 8 months.
[deleted]