Knowledge graph system for Jin San Zhen acupuncture therapy evaluation

Can AI Safely Guide a Centuries-Old Acupuncture Method?

Frontiers in Medicine · Published April 13, 2026

Study authors: Junjie Chen, Minting Luo, Jianchao Chen, Guangming Luo, Genxin Li, Chubin Lei, Dongjing Chen, Jin Yu…

Editorial oversight: Dr. Amelia Tan, PhD · Internal Medicine & Chronic Disease

AI-generated summary of the cited source, checked by automated accuracy review.

Key Takeaway

Note that system performance metrics do not equate to clinical efficacy of Jin San Zhen acupuncture therapy.

This study involved the construction and evaluation of a knowledge graph and large model question-answering system specifically for Jin San Zhen acupuncture therapy. The system was developed using data derived from 191 clinical studies and 4 authoritative monographs. Evaluation included intrinsic assessment of acupoints and testing of a question-answering system using a hybrid model.

The intrinsic evaluation of the knowledge graph yielded an F1 score of 0.952 for main acupoints and 0.859 for auxiliary acupoints. For the question-answering system, the KG+LLM hybrid model achieved a mean correctness score of 5.00, a professionalism score of 5.00, and a completeness score of 4.40 on a 1-to-5 scale. These scores were significantly higher than those of the two comparison models (a template-based KG+LLM system and an LLM-only system), with p < 0.01 for correctness and professionalism.
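For readers unfamiliar with F1, here is a minimal sketch of how precision, recall, and F1 can be computed for one treatment plan by comparing the points stored in a graph with the points chosen by human annotators. The function name `precision_recall_f1` and the labels `P1` to `P4` are placeholders, and this summary does not say how the study combined scores across plans, so the sketch covers a single plan only.

```python
def precision_recall_f1(predicted, gold):
    """Compare two sets of acupoint labels for one treatment plan."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of the graph's points that the annotators also listed.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of the annotators' points that the graph captured.
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Placeholder labels: the graph lists three points, the annotators list four.
p, r, f1 = precision_recall_f1({"P1", "P2", "P3"}, {"P1", "P2", "P3", "P4"})
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

In this toy case every point in the graph is correct, but one point is missing, so precision is perfect while recall and F1 fall below 1.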

Safety and tolerability data were not reported. The study design was a tool development and evaluation study, not a clinical trial. Consequently, the results demonstrate the technical performance of the system rather than the clinical efficacy of Jin San Zhen acupuncture therapy for any specific condition.

Key limitations include the lack of prospective studies in real-world clinical settings to evaluate the actual impact on decision quality and patient outcomes. The system has potential utility as an auxiliary tool for primary care and general practitioners to rapidly access information on Jin San Zhen, perform evidence integration, and support teaching.

A centuries-old therapy meets a brand-new problem

A young doctor in a rural Chinese clinic wants to try Jin San Zhen for a patient's stroke recovery.

Jin San Zhen is a classical Chinese acupuncture system (its name means "Jin's Three Needles") developed in the Lingnan region. It uses small sets of three points for specific conditions.

She opens a chatbot. It gives her a confident answer. But she cannot tell if the answer came from real research, a textbook, or thin air.

Traditional Chinese Medicine is used by hundreds of millions of people. Jin San Zhen in particular is popular for nerve problems, chronic pain, and some internal conditions.

The catch is that its rules are scattered across journal articles, old textbooks, and teacher-to-student notes. No one has stitched them into one organized reference.

Meanwhile, general AI chatbots "hallucinate," meaning they confidently generate wrong information when they do not know something. In medicine, that is dangerous.

The old way versus the new approach

The old way was simple and slow. Clinicians flipped through books, searched Chinese medical databases, or asked senior teachers. That took hours.

Letting a generic chatbot answer was fast but risky. It might invent a point combination or misname a condition.

Here is the twist. The research team built a bridge between the two.

Think of it like a librarian with a map

Picture a huge library with no card catalog. You ask the librarian a question.

A plain chatbot is a librarian who makes up plausible answers when she cannot find the book.

A knowledge graph is a detailed map of every shelf, every chapter, and how every idea connects. When the librarian uses that map first, then explains in plain words, she stops guessing.

That is the setup here. A knowledge graph (a structured network of facts, "disease X is treated with point combo Y") feeds the AI before it writes its reply.
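The look-it-up-first idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation: the graph contents ("disease X", "point combo Y"), the source label, and the function names `retrieve_facts` and `build_grounded_prompt` are all hypothetical, and the final call to a language model is left out.

```python
# Toy knowledge graph: (subject, relation) -> list of (object, source) pairs.
# All entries are placeholders, not real clinical content.
TOY_GRAPH = {
    ("disease X", "treated_with"): [("point combo Y", "hypothetical source 1")],
}


def retrieve_facts(subject, relation, graph=TOY_GRAPH):
    """Return every (object, source) pair stored for this subject and relation."""
    return graph.get((subject, relation), [])


def build_grounded_prompt(question, subject, relation, graph=TOY_GRAPH):
    """Build a prompt that contains only facts found in the graph.

    Returns None when the graph has nothing to offer, so the caller can
    decline to answer instead of letting the model guess.
    """
    facts = retrieve_facts(subject, relation, graph)
    if not facts:
        return None
    lines = [f"- {subject} {relation} {obj} [source: {src}]" for obj, src in facts]
    return (
        "Answer using ONLY the facts below and cite their sources.\n"
        + "\n".join(lines)
        + f"\nQuestion: {question}"
    )


prompt = build_grounded_prompt("How is disease X treated?", "disease X", "treated_with")
missing = build_grounded_prompt("How is disease Z treated?", "disease Z", "treated_with")
print(prompt)
print(missing)  # None: nothing was retrieved, so no prompt is built
```

The design choice worth noticing is the `None` branch. When the map has no entry, the sketch stops rather than handing the model an open question to improvise on.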

How they built it

The team pulled 191 high-quality clinical studies from the last decade plus four authoritative Jin San Zhen textbooks.

They defined categories like diseases, acupoints, point combinations, and treatment plans. Then a language model called Qwen3-MAX read the papers and pulled out the facts. Humans double-checked the work.

The result was stored in Neo4j, a database designed for maps of relationships. Final tally: 921 nodes (entities such as diseases, acupoints, and treatment plans) and 3,745 connections between them.
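To make "nodes" and "connections" concrete, here is a small in-memory sketch of how extracted records could be merged into a graph while keeping track of where each fact came from. It is an illustration only. The entity types echo the categories named above, but the relation names (`HAS_PLAN`, `USES_COMBINATION`, `CONTAINS`), the records, and the function `build_graph` are assumptions, and plain Python dictionaries stand in for Neo4j.

```python
from collections import Counter

# Hypothetical extracted records:
# (head, head_type, relation, tail, tail_type, source).
# The names are placeholders; the real graph's contents are not reproduced here.
RECORDS = [
    ("disease X", "Disease", "HAS_PLAN", "plan 1", "TreatmentPlan", "study A"),
    ("plan 1", "TreatmentPlan", "USES_COMBINATION", "point combo Y",
     "AcupointCombination", "study A"),
    ("point combo Y", "AcupointCombination", "CONTAINS", "point P1",
     "Acupoint", "monograph B"),
    ("point combo Y", "AcupointCombination", "CONTAINS", "point P2",
     "Acupoint", "monograph B"),
    # Same fact reported by a second source: one relation, two sources.
    ("disease X", "Disease", "HAS_PLAN", "plan 1", "TreatmentPlan", "study C"),
]


def build_graph(records):
    """Merge records into unique nodes and relations, keeping every source."""
    nodes = {}      # node name -> entity type
    relations = {}  # (head, relation, tail) -> set of sources
    for head, head_type, rel, tail, tail_type, source in records:
        nodes[head] = head_type
        nodes[tail] = tail_type
        relations.setdefault((head, rel, tail), set()).add(source)
    return nodes, relations


nodes, relations = build_graph(RECORDS)
print(len(nodes), "nodes;", len(relations), "relations")
print(Counter(nodes.values()))
print(sorted(relations[("disease X", "HAS_PLAN", "plan 1")]))
```

Storing the sources on each relation is what would let an answer point back to the paper or textbook behind it.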

What they tested

The researchers built 60 evaluation questions covering common clinical scenarios and major disease categories.

Three systems tried to answer: a plain chatbot with no graph, a graph-plus-AI system using strict templates, and a smarter "hybrid" version that could flex when needed.

Two expert acupuncturists graded each answer blindly on correctness, professionalism, and completeness. Scale of 1 to 5.
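The study's abstract says the systems were compared with paired t-tests, which fit here because every system answered the same questions. The sketch below shows the arithmetic on invented ratings for five questions. Averaging the two experts' ratings per question is an assumption made for illustration, and the names `per_question_scores` and `paired_t` are hypothetical.

```python
import math
from statistics import mean, stdev

# Hypothetical correctness ratings for five questions (the study used 60).
# Each tuple is (expert 1, expert 2) on the 1-5 scale.
hybrid_ratings = [(5, 5), (5, 5), (5, 5), (5, 5), (5, 5)]
plain_ratings = [(4, 5), (3, 4), (5, 5), (4, 4), (3, 4)]


def per_question_scores(ratings):
    """Average the two experts' ratings for each question (an assumption)."""
    return [mean(pair) for pair in ratings]


def paired_t(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for two matched score lists.

    Raises ZeroDivisionError if every per-question difference is identical,
    because the standard deviation of the differences is then zero.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


hybrid = per_question_scores(hybrid_ratings)
plain = per_question_scores(plain_ratings)
t_stat, dof = paired_t(hybrid, plain)
print(f"mean hybrid = {mean(hybrid):.2f}, mean plain = {mean(plain):.2f}")
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

The test works on the per-question differences, so a system that is consistently a little better can still come out clearly ahead even when the two averages look close.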

The results were clear

The hybrid graph-plus-AI system scored near-perfect. Correctness averaged 5.00 out of 5. Professionalism 5.00. Completeness 4.40.

The plain chatbot, by comparison, averaged 4.05 for correctness and 3.65 for professionalism.

The biggest win may not be the scores. Across all 60 questions from both graph-powered systems, experts found zero fabricated claims. Every answer could be traced back to the original paper or textbook it came from.

A chatbot that cites its sources is a very different tool from one that does not.

Where this fits in the bigger picture

AI in medicine is moving fast, but trust is the bottleneck. Doctors will not use a tool that might quietly invent a drug dose or a nerve location.

This study joins a broader wave of work called "retrieval-augmented generation," or RAG. The idea is simple. Do not let the AI answer from memory. Make it look up the answer first, then explain.

For traditional medicine specifically, that matters even more. TCM knowledge often lives in sources that Western-trained chatbots barely see. A curated graph puts that knowledge back on equal footing.

If you use Jin San Zhen acupuncture or are curious about it, this system is not yet a consumer app. It was built as a reference for clinicians and students in China.

But the pattern is spreading. Expect "grounded" AI assistants, ones that cite sources and show their reasoning, to show up in patient-facing tools over the next few years.

For now, if an AI gives you a medical answer with no source, treat it like a rumor. Ask where the information came from.

Honest limitations

Sixty questions graded by two experts is a solid start but a small test. Real clinics see thousands of variations.

The graph only covers Jin San Zhen, not all TCM. And the quality of the answers depends entirely on the quality of the underlying papers. Weak studies in, weak answers out.

The study also did not measure whether better answers lead to better patient outcomes. That is the question that really counts.

The team plans prospective studies in real clinics. The goal is to measure whether doctors using the tool make better decisions and whether patients do better.

If those results hold up, the same recipe could be used for other traditional medicine systems, from Ayurveda to Kampo. A careful map plus a careful AI could finally make old knowledge searchable without distorting it.

Study Details

Study type: Cohort

Evidence: Level 3

Published: Apr 2026

Original Abstract

Jin San Zhen acupuncture therapy is a classical Traditional Chinese Medicine (TCM) school originating from the Lingnan region of China. It is widely used in China for central nervous system diseases, internal medical conditions, and various pain disorders, benefiting a large number of patients. However, the related clinical evidence and expert experience are scattered across journal articles and monographs, without systematic curation or structured presentation, making it difficult for frontline clinicians and trainees to access in a timely manner. Although general-purpose large language models (LLMs) can generate answers, they are prone to “hallucinations” and lack traceable evidence-based support. Based on Chinese clinical research literature and authoritative monographs from the past decade, this study aimed to construct a Knowledge Graph (KG) for Jin San Zhen and to develop an intelligent question–answering (QA) system that combines the KG with LLMs to answer clinical and educational questions related to Jin San Zhen. We searched Chinese databases such as China National Knowledge Infrastructure (CNKI), Wanfang, and CQVIP for clinical studies published between 2016 and 2025 in which Jin San Zhen was the main intervention, and incorporated information on point combinations and clinical practice from four authoritative monographs. Following a PRISMA-style selection process (905 initial records → 416 after deduplication → 191 included studies) we designed an ontology comprising seven entity types (diseases, acupoints, acupoint combinations, treatment plans, etc.) and nine relation types. We used the Qwen3-MAX LLM for information extraction, supplemented by manual verification, and ultimately constructed the KG in Neo4j. 
We evaluated the KG intrinsically using Precision, Recall, and F1 metrics against a human-annotated gold standard derived from stratified sampling (n = 149 treatment plans from N = 298, 95% confidence level, 5% margin of error), with inter-annotator agreement assessed on 81 overlapping annotations. We then designed a retrieval-augmented generation (RAG) workflow, in which user queries are parsed by an LLM into a limited set of query types, Cypher templates are used to retrieve the graph, structured records are returned, and the LLM generates natural language answers that can be traced back to the original literature. We described the scale and characteristics of the KG using node and relation statistics, and developed 60 evaluation questions covering common query types and major disease categories. Two TCM acupuncture experts were invited to rate, under a double-blind design, the answers produced by three systems—a “KG+LLM template model,” a “KG+LLM hybrid model” incorporating fuzzy entity matching and enhanced retrieval strategies, and an “LLM-only model”—on three dimensions (correctness, professionalism, and completeness) using a 1–5 scale. Paired t-tests were used to compare differences across all pairwise model combinations. The final KG contained 921 nodes and 3,745 relations, including more than 80 diseases, over 360 standardized acupoints, 55 core acupoint combinations, and 298 treatment plans, systematically representing the “disease–plan–acupoint” relationships and efficacy characteristics of Jin San Zhen. Intrinsic evaluation showed that the KG achieved post-refinement F1 scores of 0.952 for main acupoints (P = 0.959, R = 0.949) and 0.859 for auxiliary acupoints (P = 0.984, R = 0.858), with inter-annotator F1 of 0.991 and 0.999, respectively.
Across 60 evaluation questions, the KG+LLM hybrid model achieved the highest mean scores on all three dimensions (correctness: 5.00; professionalism: 5.00; completeness: 4.40), significantly outperforming both the KG+LLM template model (4.75, 4.77, 4.03) and the LLM-only model (4.05, 3.65, 4.12; all pairwise comparisons p < 0.01). Notably, the hybrid model resolved the completeness limitation observed in the template-based approach, while both KG-enhanced systems produced answers fully traceable to source literature across all 60 questions, with no fabricated claims detected by expert reviewers. Expert feedback indicated that the hybrid model's layered presentation—distinguishing high-confidence graph-derived content from supplementary general knowledge—provides particularly strong clinical reference value. Compared with a general-purpose LLM, the Jin San Zhen knowledge-graph–based QA system—particularly with the tiered confidence generation strategy—markedly improves the accuracy, professionalism, and completeness of answers while providing traceable evidence with explicit confidence labeling. The system thus has the potential to serve as an auxiliary tool for primary care and general practitioners to rapidly access information on Jin San Zhen, perform evidence integration, and support teaching. Future prospective studies in real-world clinical settings are needed to evaluate its actual impact on decision quality and patient outcomes.
