A centuries-old therapy meets a brand-new problem
A young doctor in a rural Chinese clinic wants to try Jin San Zhen for a patient's stroke recovery.
Jin San Zhen is a classical Chinese acupuncture system (its name means "Jin's Three Needles") developed in the Lingnan region. It uses small sets of three points for specific conditions.
She opens a chatbot. It gives her a confident answer. But she cannot tell if the answer came from real research, a textbook, or thin air.
Traditional Chinese Medicine is used by hundreds of millions of people. Jin San Zhen in particular is popular for nerve problems, chronic pain, and some internal conditions.
The catch is that its rules are scattered across journal articles, old textbooks, and teacher-to-student notes. No one has stitched them into one organized reference.
Meanwhile, general AI chatbots "hallucinate," meaning they confidently generate wrong information when they do not know something. In medicine, that is dangerous.
The old way versus the new approach
The old way was simple and slow. Clinicians flipped through books, searched Chinese medical databases, or asked senior teachers. That took hours.
Letting a generic chatbot answer was fast but risky. It might invent a point combination or misname a condition.
Here is the twist. The research team built a bridge between the two.
Think of it like a librarian with a map
Picture a huge library with no card catalog. You ask the librarian a question.
A plain chatbot is a librarian who makes up plausible answers when she cannot find the book.
A knowledge graph is a detailed map of every shelf, every chapter, and how every idea connects. When the librarian uses that map first, then explains in plain words, she stops guessing.
That is the setup here. A knowledge graph (a structured network of facts, for example "disease X is treated with point combo Y") feeds the AI before it writes its reply.
How they built it
The team pulled 191 high-quality clinical studies from the last decade plus four authoritative Jin San Zhen textbooks.
They defined categories like diseases, acupoints, point combinations, and treatment plans. Then a language model called Qwen3-MAX read the papers and pulled out the facts. Humans double-checked the work.
The result was stored in Neo4j, a database designed for maps of relationships. Final tally: 921 nodes (entities such as diseases, acupoints, and point combinations) and 3,745 connections between them.
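To make the node-and-connection idea concrete, here is a minimal sketch of how such a graph can be represented and queried in plain Python. The entity names, relation labels, and the example fact below are illustrative assumptions, not the study's actual schema or data (the real system uses Neo4j, which queries graphs like this at scale).

```python
# A toy knowledge graph: nodes are entities, edges are facts linking them.
# All names here are made up for illustration.
graph = {
    "nodes": [
        {"id": "stroke_hemiplegia", "type": "Disease"},
        {"id": "example_three_needles", "type": "PointCombination"},
        {"id": "example_acupoint", "type": "Acupoint"},
    ],
    "edges": [
        {"from": "stroke_hemiplegia", "rel": "TREATED_WITH",
         "to": "example_three_needles"},
        {"from": "example_three_needles", "rel": "INCLUDES_POINT",
         "to": "example_acupoint"},
    ],
}

def treatments_for(disease_id):
    """Follow TREATED_WITH edges from a disease to its point combinations."""
    return [e["to"] for e in graph["edges"]
            if e["from"] == disease_id and e["rel"] == "TREATED_WITH"]

print(treatments_for("stroke_hemiplegia"))  # → ['example_three_needles']
```

The point of the structure is exactly what the librarian analogy suggests: every answer is a path through explicit, checkable links rather than a guess.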
What they tested
The researchers built 60 evaluation questions covering common clinical scenarios and major disease categories.
Three systems tried to answer: a plain chatbot with no graph, a graph-plus-AI system using strict templates, and a smarter "hybrid" version that could flex when needed.
Two expert acupuncturists graded each answer blindly on correctness, professionalism, and completeness. Scale of 1 to 5.
The results were clear
The hybrid graph-plus-AI system scored near-perfect marks. Correctness averaged 5.00 out of 5. Professionalism 5.00. Completeness 4.40.
The plain chatbot, by comparison, averaged 4.05 for correctness and 3.65 for professionalism.
The biggest win may not be the scores. Across all 60 questions, neither graph-powered system produced a single fabricated claim. Every answer could be traced back to the original paper or textbook it came from.
A chatbot that cites its sources is a very different tool from one that does not.
Where this fits in the bigger picture
AI in medicine is moving fast, but trust is the bottleneck. Doctors will not use a tool that might quietly invent a drug dose or a nerve location.
This study joins a broader wave of work called "retrieval-augmented generation," or RAG. The idea is simple. Do not let the AI answer from memory. Make it look up the answer first, then explain.
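The "look it up first, then explain" pattern can be sketched in a few lines. This is a hedged toy version of RAG, not the study's implementation: the fact store, keys, and prompt wording are all illustrative assumptions.

```python
# Toy retrieval-augmented generation: retrieve matching facts, then build a
# prompt that forbids the model from answering beyond them.
# The "facts" below are placeholders, not real clinical guidance.
FACTS = {
    "insomnia": "Example fact: condition A is treated with point combination B.",
    "stroke recovery": "Example fact: condition C is treated with combination D.",
}

def build_prompt(question):
    """Look up facts whose key appears in the question; answer only from them."""
    retrieved = [fact for key, fact in FACTS.items() if key in question.lower()]
    if not retrieved:
        # Nothing retrieved: instruct the model to admit ignorance, not guess.
        return "Say you do not know. Do not guess."
    context = "\n".join(retrieved)
    return f"Answer using ONLY these facts:\n{context}\n\nQuestion: {question}"

print(build_prompt("What helps with insomnia?"))
```

The design choice that kills hallucination is the empty-retrieval branch: when the map has no path, the system says so instead of improvising.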
For traditional medicine specifically, that matters even more. TCM knowledge often lives in sources that Western-trained chatbots barely see. A curated graph puts that knowledge back on equal footing.
If you use Jin San Zhen acupuncture or are curious about it, this system is not yet a consumer app. It was built as a reference for clinicians and students in China.
But the pattern is spreading. Expect "grounded" AI assistants, ones that cite sources and show their reasoning, to show up in patient-facing tools over the next few years.
For now, if an AI gives you a medical answer with no source, treat it like a rumor. Ask where the information came from.
Honest limitations
Sixty questions graded by two experts is a solid start but a small test. Real clinics see thousands of variations.
The graph only covers Jin San Zhen, not all of TCM. And the quality of the answers depends entirely on the quality of the underlying papers. Weak studies in, weak answers out.
The study also did not measure whether better answers lead to better patient outcomes. That is the question that really counts.
The team plans prospective studies in real clinics. The goal is to measure whether doctors using the tool make better decisions and whether patients do better.
If those results hold up, the same recipe could be used for other traditional medicine systems, from Ayurveda to Kampo. A careful map plus a careful AI could finally make old knowledge searchable without distorting it.