Loon Lens Outperforms Catchii for Title and Abstract Screening in Parkinson Disease Research
Agentic AI Tools Show Promise for Screening Parkinson Disease Research

AI-generated summary of the cited source, checked by automated accuracy review.

Key Takeaway
Loon Lens outperformed Catchii at title and abstract screening when human screening served as the reference standard.

This study evaluates the performance of two agentic AI tools, Loon Lens and Catchii, for title and abstract screening in a systematic review on ambient air pollutants and Parkinson disease. It compares both tools against human screening as the reference standard, and against one another, to assess how closely AI screening matches human decisions during the initial stages of a literature review. Loon Lens was also compared with human screeners at the full-text stage.

At title and abstract screening, Loon Lens outperformed Catchii when humans served as the reference standard. At full-text screening, most disagreements involved articles that Loon Lens included but human screeners excluded. At both screening levels, higher Loon Lens confidence scores were associated with lower odds of disagreement between Loon Lens and human screeners.

The authors point to the rapidly evolving nature of AI technology and the differing performance of the available tools. They caution that concordance and negative predictive value (NPV) can be inflated by the imbalance between included and excluded citations, especially at title and abstract screening. They recommend that researchers pilot test their chosen tool at the start of each review and rely on sensitivity, kappa, and F1 score as the primary performance statistics.
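The point about imbalance can be made concrete with a small sketch. The counts below are hypothetical (they are not taken from the study), and `screening_metrics` is an illustrative helper, not the authors' code. It computes the statistics named in the abstract from a 2x2 table in which human decisions are the reference standard, then compares a hypothetical tool with a guesser that excludes every citation.

```python
def screening_metrics(tp, fp, fn, tn):
    """Screening statistics from a 2x2 table; human decisions are the reference.

    tp: included by both tool and humans    fp: included by tool only
    fn: included by humans only             tn: excluded by both
    """
    total = tp + fp + fn + tn

    def ratio(num, den):
        # Return None when a statistic is undefined (zero denominator).
        return num / den if den else None

    concordance = ratio(tp + tn, total)
    # Agreement expected by chance, used for Cohen's kappa.
    expected = ((tp + fp) / total) * ((tp + fn) / total) \
             + ((tn + fn) / total) * ((tn + fp) / total)
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "concordance": concordance,
        "kappa": ratio(concordance - expected, 1 - expected),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical example: 1,000 citations, 50 of which humans included.
tool = screening_metrics(tp=40, fp=60, fn=10, tn=890)
# A guesser that excludes everything, one simple form of naive guessing.
guess = screening_metrics(tp=0, fp=0, fn=50, tn=950)

for name in ("concordance", "npv", "sensitivity", "kappa", "f1"):
    print(name, tool[name], guess[name])
```

With these made-up counts, the exclude-everything guesser reaches a concordance of 0.95, higher than the tool's 0.93, and an NPV of 0.95, yet its sensitivity, kappa, and F1 are all 0. That gap is why the imbalance-sensitive statistics can look reassuring when a screener finds nothing.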

Researchers evaluated how well different artificial intelligence (AI) tools could screen titles and abstracts to find relevant information about Parkinson disease. They compared two specific tools, Loon Lens and Catchii, against human screening as the gold standard. The study aimed to see if these technologies could help scientists manage large amounts of medical data more efficiently.

The results showed that Loon Lens performed better than Catchii when compared to human reviewers. When Loon Lens screened full-text articles, most disagreements with humans happened on papers the AI included but humans left out. The study also found that when Loon Lens gave a higher confidence score, it was more likely to agree with the human reviewers.

Because AI technology is changing very quickly, these results may not hold for every tool or situation. Different tools perform differently, so researchers should pilot test any AI tool at the start of each review. These findings suggest that AI can assist with screening, but its decisions still need to be checked against human judgment.

What this means for you:
Loon Lens showed better performance than Catchii in screening Parkinson disease research during this study.

Common questions

How do these AI tools compare to each other?

The study found that Loon Lens outperformed Catchii when human screening was used as the reference standard. This means Loon Lens was more accurate at identifying relevant titles and abstracts for Parkinson disease research than the Catchii tool during the evaluation.

Can AI tools completely replace human researchers?

The study does not support that conclusion. Because AI technology is evolving rapidly and different tools perform differently, researchers are advised to pilot test their chosen tool at the start of each review.

How reliable are the confidence scores given by the AI?

The study found that higher confidence scores from Loon Lens were associated with lower odds of disagreement between Loon Lens and human screeners. This suggests that when the tool is more confident, its decision is more likely to match the human one.
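To show what "lower odds of disagreement" means, here is a minimal sketch using hypothetical counts (not data from the study). The authors fitted a regression; this descriptive version only computes the odds of disagreement at each of the four confidence levels and expresses them relative to the Low level. The `counts` table and function name are illustrative assumptions.

```python
# Hypothetical (agreements, disagreements) with human screeners per confidence level.
counts = {
    "Low": (30, 20),
    "Medium": (80, 20),
    "High": (180, 20),
    "Very High": (380, 20),
}


def odds_of_disagreement(agree, disagree):
    # Odds = disagreements per agreement (not the same as a proportion).
    return disagree / agree


# Use the Low confidence level as the reference category.
reference = odds_of_disagreement(*counts["Low"])
odds_ratios = {
    level: odds_of_disagreement(agree, disagree) / reference
    for level, (agree, disagree) in counts.items()
}

for level, value in odds_ratios.items():
    print(f"{level}: odds ratio vs Low = {value:.3f}")
```

An odds ratio below 1 means disagreement is less likely at that level than at Low confidence. In this invented example the ratio falls steadily as confidence rises, which is the pattern the study reports in direction, though the real magnitudes are not given in the abstract.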

Study Details

Study type: Systematic review
Evidence: Level 1
Published: Jul 2026
Original Abstract
Introduction. Systematic reviews are essential for informing health policy and practice. Artificial intelligence (AI) automates the article screening process and produces time savings, although the performance of AI screening compared to traditional human screening remains uncertain. We undertook this study to compare the performance of two agentic AI tools, namely Loon Lens™ and Catchii, to one another and to humans at the title and abstract screening level. We also compared Loon Lens to humans at the full-text screening level.

Methods. We developed a de novo research question on the association between any of three ambient air pollutants (carbon monoxide, ozone, nitrogen dioxide) and the onset or worsening of Parkinson disease. A health sciences librarian developed the literature search strategy and we proceeded with human screening guided by PRISMA. We uploaded the retrieved citations and the eligibility criteria to both AI tools and compared screening results using sensitivity, specificity, positive predictive value, negative predictive value, concordance, kappa, and F1 score. We compared the calculated performance statistics to those obtained by naive guessing and regressed concordance (agree or disagree with the human reference standard) onto confidence scores provided by Loon Lens, which assigned a confidence level (Very High, High, Medium, or Low) to each of its screening decisions. Human screening was the reference standard against both AI tools and Catchii was the reference standard against Loon Lens.

Results. At title and abstract screening, Loon Lens outperformed Catchii when humans were the reference standard. At full-text screening, most disagreements centered around articles Loon Lens included and humans excluded. At both screening levels, higher confidence scores were associated with lower odds of disagreement between Loon Lens and human screeners.

Discussion. Given the panoply of available AI screening tools and their differential performance, plus the rapidly evolving nature of AI technology, researchers should pilot test their chosen tool at the start of each review. Sensitivity, kappa, and F1 are the optimal performance statistics to employ, especially at title and abstract screening, where the imbalance between proportions of included and excluded citations can inflate concordance and negative predictive value.