Use of Artificial Intelligence in Systematic Literature Review

“AI is the new electricity. It has the potential to transform every industry and to create huge economic value. Just as electricity transformed almost everything 100 years ago, today I actually have a hard time thinking of an industry that I don’t think AI will transform in the next several years” – Andrew Ng

The development of artificial intelligence (AI) and machine learning (ML) in recent years has brought about a revolution in the scientific community through their ability to mimic human intellect and behavior. AI refers to computer algorithms that reason and act in a human-like way based on the data fed into them, while ML enables machines to learn from past data or experience without being explicitly programmed. We foresee tremendous potential for AI in the pharmaceutical industry, particularly in health economics and outcomes research (HEOR), where it could enable faster decision-making.

A systematic literature review (SLR) follows a transparent and reproducible process to identify and summarize the evidence. An SLR is performed to demonstrate the current state of research on a particular topic, while identifying the gaps and areas for further research. It comprises several time-consuming steps, such as searching and screening for the relevant literature, data extraction, analysis, and proper dissemination of the findings. A reviewer may have to select the relevant papers from hundreds or even thousands of records, and extracting the key information from them is an arduous and error-prone process. Because these steps are repetitive in nature, AI could help simplify them.

Attempts to use AI/ML for SLRs date back almost two decades, beginning with the idea of using ML classification algorithms to automatically identify appropriate papers from MEDLINE.[7] Since then, a significant amount of research has gone into developing and implementing new techniques.

Presently, several semi-automated AI tools are available, each developed to assist with a particular step of the SLR process. DistillerSR, Rayyan, PICO Portal, Nested Knowledge, and ASReview are a few that are designed to help with abstract screening. They either rank the articles from most to least relevant based on prediction probabilities or speed up decision-making by highlighting keywords within the abstract. These tools have varying levels of accuracy, recall, and workload reduction. In published studies, their use reduced the screening burden by up to 41%, and approximately 77% of the screening decisions were accurate.[1-3, 5]
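The two screening aids described above, ranking by predicted relevance and keyword highlighting, can be illustrated with a minimal sketch. It does not reproduce how any of the named tools work: the keyword-overlap score is only a toy stand-in for a trained model's prediction probability, and the function names, records, and keywords are all hypothetical.

```python
import re

def relevance_score(abstract, keywords):
    """Toy stand-in for a model's prediction probability:
    the fraction of keywords that appear in the abstract."""
    words = set(re.findall(r"[a-z]+", abstract.lower()))
    hits = sum(1 for k in keywords if k in words)
    return hits / len(keywords)

def prioritize(records, keywords):
    """Return the records ordered from most to least relevant."""
    return sorted(
        records,
        key=lambda r: relevance_score(r["abstract"], keywords),
        reverse=True,
    )

def highlight(abstract, keywords):
    """Mark keywords in the abstract (upper case, wrapped in asterisks)."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, keywords)) + r")\b", re.IGNORECASE
    )
    return pattern.sub(lambda m: "*" + m.group(0).upper() + "*", abstract)

# Hypothetical records and inclusion keywords, for illustration only.
records = [
    {"id": "A", "abstract": "A cohort study of dietary habits in adolescents."},
    {"id": "B", "abstract": "Randomized trial of drug X: cost and utility outcomes."},
    {"id": "C", "abstract": "Cost analysis of hospital staffing."},
]
keywords = ["randomized", "trial", "cost", "utility"]

ranked = prioritize(records, keywords)
print([r["id"] for r in ranked])
print(highlight(records[2]["abstract"], keywords))
```

In a real tool the score would come from a classifier that is retrained as the reviewer labels more records, so the ranking improves during screening.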

Despite the reported benefits of these tools, researchers still lack trust in SLR automation technologies. They face a trade-off between the benefits (i.e., workload and time savings) and the risks (i.e., the potential to miss relevant records). On the technical side, some tools require installing and using Python packages or other programming skills that researchers may not have.

Data extraction is the backbone of an SLR; it is essential for generating qualitative findings and quantitative estimates. Notably, it is the most time-consuming step and would benefit greatly from automation. However, the main challenge with current AI-based SLR tools is their poor performance in data extraction tasks. Even in AI tools that claim to offer fully automated data extraction, the reviewer still needs to enter data into the platform manually. Extracting clinical data means pulling very specific pieces of information out of enormous amounts of text. One of the difficulties in automating data extraction is that different authors use different terminology and formatting and represent the same data in different ways, which makes it harder for AI tools trained on a specific type of data to extract data from those papers.[9]

In recent years, much of the attention has been on large language models (LLMs), including but not limited to ChatGPT. These models are trained on large datasets of books, documents, web pages, etc., which allows them to predict what to say next, summarize text, and generate text in response to a user's prompt. Limitations identified with the use of ChatGPT include drawing on background information when summarizing, producing incorrect citations, and not linking to MeSH terms. However, these may be refined and improved in future as coding and computing power grow. A recent study showed that ChatGPT performed very well in the screening tasks of an SLR compared with general physicians (GPs). While ChatGPT completed the entire screening process within an hour, GPs took 7-10 days on average. ChatGPT also achieved 95% sensitivity and a negative predictive value of 99%, with workload savings of 40% to 83%.[10]
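For readers less familiar with these screening metrics, the sketch below shows how they can be computed from a confusion matrix of screening decisions. The counts are invented for illustration and are not taken from the cited study. The workload-saving definition used here (the share of records the tool excludes, so that a human need not read them) is an assumption; studies define this quantity in different ways.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute common screening metrics from confusion-matrix counts.

    tp: relevant records the tool included
    fp: irrelevant records the tool included
    tn: irrelevant records the tool excluded
    fn: relevant records the tool excluded (missed)
    """
    total = tp + fp + tn + fn
    return {
        # Share of truly relevant records that the tool found.
        "sensitivity": tp / (tp + fn),
        # Share of excluded records that were truly irrelevant.
        "npv": tn / (tn + fn),
        # Share of all decisions that were correct.
        "accuracy": (tp + tn) / total,
        # Assumed definition: share of records the tool excluded.
        "workload_saving": (tn + fn) / total,
    }

# Hypothetical counts for 200 screened abstracts.
metrics = screening_metrics(tp=18, fp=30, tn=150, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and negative predictive value matter most in screening because a missed relevant record (a false negative) can bias the review, whereas a false positive only costs extra reading time.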

Retrieval Augmented Generation (RAG) is a technique in which contextual information is retrieved from a data source and passed to the LLM along with the user’s prompt. Providing an LLM with information similar or relevant to what the user has requested, combined with proper prompt engineering, helps it generate a much better response and improves tasks such as document summarization.[11] Another feature of LLMs is that they can be adapted to different domains, so an SLR-focused LLM would likely be better at SLR tasks than a model trained on general data. Notably, this is a very fast-growing field, and many more advancements may occur by the time the reader reads this article. Overall, there is a positive outlook on the use of ChatGPT for SLR.
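The retrieval step of RAG can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a production pipeline: real systems typically compare learned embeddings, whereas this sketch compares plain word-count vectors, and it stops at building the prompt rather than calling any LLM. The passages, question, and function names are hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Word-count vector; a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question."""
    q = vectorize(question)
    return sorted(passages, key=lambda p: cosine(q, vectorize(p)), reverse=True)[:k]

def build_prompt(question, passages, k=2):
    """Combine the retrieved context with the user's question."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Hypothetical passages from a study report.
passages = [
    "The trial enrolled 240 patients with type 2 diabetes.",
    "Funding was provided by a university grant.",
    "The primary outcome was change in HbA1c at 24 weeks.",
]
question = "How many patients were enrolled in the trial?"
print(build_prompt(question, passages))
```

Only the most relevant passages reach the model, which keeps the prompt short and grounds the answer in the source document rather than in the model's general training data.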

Currently, none of the available documentation on conducting SLRs for Health Technology Assessment (HTA) clearly mentions or endorses the use of AI/ML. While both the National Institute for Health and Care Excellence (NICE) and the National Centre for Pharmacoeconomics (NCPE) expect two reviewers to be involved in the SLR process, they do not explicitly specify whether AI/ML can be considered suitable for one of these roles. The Scottish Medicines Consortium (SMC), on the other hand, directs readers to NICE methodologies.

Interestingly, Cochrane, often cited and relied upon by HTA bodies for guidance on best practices in SLRs, is actively engaged in initiatives aimed at comprehending how AI/ML can be effectively harnessed within SLRs to enhance efficiency and the quality of outcomes.[6]

To sum up, the use of AI in SLRs has the potential to reduce the workload on human researchers and improve efficiency, leading to faster and more consistent results. Nonetheless, ample work remains before it can replace human researchers entirely, particularly in the data extraction step. Given the increasing popularity of AI tools in SLRs, we anticipate that their use by the HEOR community will increase tremendously if HTA bodies define clear guidelines on their use in the HTA process.

Authors – Shubhodeep Mitra, Raju Gautam

References:

1. Hamel, C., Kelly, S.E., Thavorn, K. et al. An evaluation of DistillerSR’s machine learning-based prioritization tool for title/abstract screening – impact on reviewer-relevant outcomes. BMC Med Res Methodol 20, 256 (2020). https://doi.org/10.1186/s12874-020-01129-1
2. Cichewicz A, Burnett H, Huelin R, Kadambi A. SA3 Utility of artificial intelligence in systematic literature reviews for health technology assessment submissions. Value in Health, Volume 25, Issue 7, Supplement, S604, July 2022. https://doi.org/10.1016/j.jval.2022.04.1669
3. M. J. Oude Wolcherink, X. G. L. V. Pouwels, S. H. B. van Dijk, C. J. M. Doggen & H. Koffijberg (2023) Can artificial intelligence separate the wheat from the chaff in systematic reviews of health economic articles?, Expert Review of Pharmacoeconomics & Outcomes Research. https://doi.org/10.1080/14737167.2023.2234639
4. de la Torre-López, J., Ramírez, A. & Romero, J.R. Artificial intelligence to automate the systematic review of scientific literature. Computing 105, 2171–2194 (2023). https://doi.org/10.1007/s00607-023-01181-x
5. van de Schoot, R., de Bruin, J., Schram, R. et al. An open-source machine learning framework for efficient and transparent systematic reviews. Nat Mach Intell 3, 125–133 (2021). https://doi.org/10.1038/s42256-020-00287-7
6. Ferizovic N, Rtveladze K. Recommendations on the Use of Artificial Intelligence and Machine Learning in Systematic Literature Reviews Submitted as Part of the Evidence Package in Health Technology Assessment. Value in Health, Volume 25, Issue 12S (December 2022)
7. Yindalon Aphinyanaphongs, Ioannis Tsamardinos, Alexander Statnikov, Douglas Hardin, Constantin F. Aliferis, Text Categorization Models for High-Quality Article Retrieval in Internal Medicine, Journal of the American Medical Informatics Association, Volume 12, Issue 2, March 2005, Pages 207–216, https://doi.org/10.1197/jamia.M1641
8. O’Mara-Eves, A., Thomas, J., McNaught, J. et al. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst Rev 4, 5 (2015). https://doi.org/10.1186/2046-4053-4-5
9. Rito Bergemann, Addressing the Challenges of Artificial Intelligence used for Data Extraction in Systematic Literature Reviews, Parexel 2023
10. Issaiy, M., Ghanaati, H., Kolahi, S. et al. Methodological insights into ChatGPT’s screening performance in systematic reviews. BMC Med Res Methodol 24, 78 (2024). https://doi.org/10.1186/s12874-024-02203-8
11. OpenAI Help Center. Retrieval Augmented Generation (RAG) and semantic search for GPTs. https://help.openai.com/en/articles/8868588-retrieval-augmented-generation-rag-and-semantic-search-for-gpts
