
Knowledge graph system for Jin San Zhen acupuncture therapy evaluation

Key Takeaway
Note that system performance metrics do not equate to clinical efficacy of Jin San Zhen acupuncture therapy.

This study constructed and evaluated a knowledge graph (KG) and a large language model (LLM) question-answering system for Jin San Zhen acupuncture therapy. The system was built from data extracted from 191 clinical studies and 4 authoritative monographs. Evaluation comprised an intrinsic assessment of how accurately the knowledge graph captured main and auxiliary acupoints, and expert rating of answers from the question-answering system, including a KG+LLM hybrid model.

The intrinsic evaluation of the knowledge graph yielded an F1 score of 0.952 for main acupoints and 0.859 for auxiliary acupoints. For the question-answering system, the KG+LLM hybrid model achieved a mean correctness score of 5.00, a professionalism score of 5.00, and a completeness score of 4.40 on a 1–5 scale. These scores were significantly higher than those of the KG+LLM template model and the LLM-only model, with p < 0.01 for correctness and professionalism.
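To make the intrinsic metrics concrete, the following is a minimal sketch of how precision, recall, and F1 can be computed for acupoint sets and averaged over treatment plans. The source does not state how the study aggregated its scores, so the per-plan macro-averaging, the function names, and the placeholder acupoint labels here are all assumptions for illustration.

```python
def prf(predicted, gold):
    """Set-based precision, recall, and F1 for one treatment plan."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def macro_average(pairs):
    """Average P, R, and F1 over (predicted, gold) pairs, one pair per plan."""
    scores = [prf(predicted, gold) for predicted, gold in pairs]
    n = len(scores)
    return tuple(sum(score[i] for score in scores) / n for i in range(3))


# Placeholder data: (acupoints extracted into the KG, gold-standard acupoints).
plans = [
    ({"point_a", "point_b", "point_c"}, {"point_a", "point_b", "point_c"}),
    ({"point_a", "point_d"}, {"point_a", "point_b"}),
]

precision, recall, f1 = macro_average(plans)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In this toy input the first plan is extracted perfectly and the second gets one of two points right, so all three averages come out at 0.75.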

Safety and tolerability data were not reported. The study design was a tool development and evaluation study, not a clinical trial. Consequently, the results demonstrate the technical performance of the system rather than the clinical efficacy of Jin San Zhen acupuncture therapy for any specific condition.

Key limitations include the lack of prospective studies in real-world clinical settings to evaluate the actual impact on decision quality and patient outcomes. The system has potential utility as an auxiliary tool for primary care and general practitioners to rapidly access information on Jin San Zhen, perform evidence integration, and support teaching.

Study Details

Study type: Cohort
Evidence: Level 3
Published: Apr 2026
Original Abstract
Jin San Zhen acupuncture therapy is a classical Traditional Chinese Medicine (TCM) school originating from the Lingnan region of China. It is widely used in China for central nervous system diseases, internal medical conditions, and various pain disorders, benefiting a large number of patients. However, the related clinical evidence and expert experience are scattered across journal articles and monographs, without systematic curation or structured presentation, making it difficult for frontline clinicians and trainees to access in a timely manner. Although general-purpose large language models (LLMs) can generate answers, they are prone to “hallucinations” and lack traceable evidence-based support. Based on Chinese clinical research literature and authoritative monographs from the past decade, this study aimed to construct a Knowledge Graph (KG) for Jin San Zhen and to develop an intelligent question–answering (QA) system that combines the KG with LLMs to answer clinical and educational questions related to Jin San Zhen. We searched Chinese databases such as China National Knowledge Infrastructure (CNKI), Wanfang, and CQVIP for clinical studies published between 2016 and 2025 in which Jin San Zhen was the main intervention, and incorporated information on point combinations and clinical practice from four authoritative monographs. Following a PRISMA-style selection process (905 initial records → 416 after deduplication → 191 included studies) we designed an ontology comprising seven entity types (diseases, acupoints, acupoint combinations, treatment plans, etc.) and nine relation types. We used the Qwen3-MAX LLM for information extraction, supplemented by manual verification, and ultimately constructed the KG in Neo4j. 
We evaluated the KG intrinsically using Precision, Recall, and F1 metrics against a human-annotated gold standard derived from stratified sampling (n = 149 treatment plans from N = 298, 95% confidence level, 5% margin of error), with inter-annotator agreement assessed on 81 overlapping annotations. We then designed a retrieval-augmented generation (RAG) workflow, in which user queries are parsed by an LLM into a limited set of query types, Cypher templates are used to retrieve the graph, structured records are returned, and the LLM generates natural language answers that can be traced back to the original literature. We described the scale and characteristics of the KG using node and relation statistics, and developed 60 evaluation questions covering common query types and major disease categories. Two TCM acupuncture experts were invited to rate, under a double-blind design, the answers produced by three systems (a “KG+LLM template model,” a “KG+LLM hybrid model” incorporating fuzzy entity matching and enhanced retrieval strategies, and an “LLM-only model”) on three dimensions (correctness, professionalism, and completeness) using a 1–5 scale. Paired t-tests were used to compare differences across all pairwise model combinations. The final KG contained 921 nodes and 3,745 relations, including more than 80 diseases, over 360 standardized acupoints, 55 core acupoint combinations, and 298 treatment plans, systematically representing the “disease–plan–acupoint” relationships and efficacy characteristics of Jin San Zhen. Intrinsic evaluation showed that the KG achieved post-refinement F1 scores of 0.952 for main acupoints (P = 0.959, R = 0.949) and 0.859 for auxiliary acupoints (P = 0.984, R = 0.858), with inter-annotator F1 of 0.991 and 0.999, respectively.
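The template-based retrieval step of the RAG workflow can be pictured with a short sketch: a parsed query type selects a parameterized Cypher template, and the extracted entities become bound parameters. The abstract does not publish the actual query types, node labels, or relation names, so everything named below (`plans_for_disease`, `Disease`, `HAS_PLAN`, `build_query`, and so on) is a hypothetical stand-in rather than the study's schema.

```python
# Hypothetical query types mapped to parameterized Cypher templates.
# Labels, properties, and relationship names are illustrative assumptions.
CYPHER_TEMPLATES = {
    "plans_for_disease": (
        "MATCH (d:Disease {name: $disease})-[:HAS_PLAN]->(p:TreatmentPlan) "
        "RETURN p.name AS plan, p.source AS source"
    ),
    "points_for_plan": (
        "MATCH (p:TreatmentPlan {name: $plan})-[:USES_MAIN_POINT]->(a:Acupoint) "
        "RETURN a.name AS acupoint"
    ),
}

# Parameters each template needs before it can be run.
REQUIRED_PARAMS = {
    "plans_for_disease": ("disease",),
    "points_for_plan": ("plan",),
}


def build_query(query_type, params):
    """Return (cypher_text, bound_parameters) for a parsed user query."""
    if query_type not in CYPHER_TEMPLATES:
        raise ValueError(f"unsupported query type: {query_type}")
    required = REQUIRED_PARAMS[query_type]
    missing = [name for name in required if name not in params]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    # Pass only the parameters the template declares; values stay out of
    # the query text, so user input is never concatenated into Cypher.
    return CYPHER_TEMPLATES[query_type], {name: params[name] for name in required}


cypher, bound = build_query("plans_for_disease", {"disease": "example disease"})
print(cypher)
print(bound)
```

Restricting retrieval to a fixed set of templates keeps every answer tied to a known graph pattern, which is one plausible way to obtain the traceability the abstract describes. With the official Neo4j Python driver, the returned pair could be passed to `session.run(cypher, bound)`.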
Across 60 evaluation questions, the KG+LLM hybrid model achieved the highest mean scores on all three dimensions (correctness: 5.00; professionalism: 5.00; completeness: 4.40), significantly outperforming both the KG+LLM template model (4.75, 4.77, 4.03) and the LLM-only model (4.05, 3.65, 4.12; all pairwise comparisons p < 0.01). Notably, the hybrid model resolved the completeness limitation observed in the template-based approach, while both KG-enhanced systems produced answers fully traceable to source literature across all 60 questions, with no fabricated claims detected by expert reviewers. Expert feedback indicated that the hybrid model's layered presentation, which distinguishes high-confidence graph-derived content from supplementary general knowledge, provides particularly strong clinical reference value. Compared with a general-purpose LLM, the Jin San Zhen knowledge-graph–based QA system, particularly with the tiered confidence generation strategy, markedly improves the accuracy, professionalism, and completeness of answers while providing traceable evidence with explicit confidence labeling. The system thus has the potential to serve as an auxiliary tool for primary care and general practitioners to rapidly access information on Jin San Zhen, perform evidence integration, and support teaching. Future prospective studies in real-world clinical settings are needed to evaluate its actual impact on decision quality and patient outcomes.
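The pairwise model comparisons rest on paired t-tests, which compare two systems on the same set of questions by testing whether the mean per-question score difference is zero. Below is a minimal standard-library sketch of the test statistic. The ratings are made-up placeholders, not the study's data, and the function name `paired_t` is an assumption.

```python
import math
import statistics


def paired_t(scores_a, scores_b):
    """Return (t statistic, degrees of freedom) for paired samples."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t_stat, len(diffs) - 1


# Placeholder 1-5 ratings for the same six questions under two systems.
hybrid_scores = [5, 5, 5, 5, 5, 5]
llm_only_scores = [4, 5, 3, 4, 4, 5]

t_stat, dof = paired_t(hybrid_scores, llm_only_scores)
print(f"t = {t_stat:.3f}, df = {dof}")
```

The p-value then comes from a t distribution with `dof` degrees of freedom; if SciPy is available, `scipy.stats.ttest_rel` computes the statistic and p-value in one call.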