The Application of Artificial Intelligence to Article Screening in Systematic Reviews
Video Transcription
I was going to wait about five minutes, but it's already 12:05, so I guess I can get started; we don't have to wait until 12 o'clock since we have the recording anyway. Hi everyone, welcome to this webinar. My name is James, and I am the Community Director of the Public Health and IP Interest Group and also a PhD student in Nutrition and Health Sciences. Today I'm going to talk about the application of artificial intelligence to article screening in systematic reviews. Before that, let me make this smaller.

As background: in diabetes research we have an ever-increasing number of publications. Looking at PubMed alone, there were about 60,000 diabetes articles in 2023, compared with about 10,000 in 1999, so we have seen a huge increase. Moreover, the rapid adoption of ChatGPT has made scientific writing and programming more efficient, which means we may eventually see an even greater number of publications. That makes knowledge retrieval and summarization all the more critical for keeping up to date with the most recent research. Compared with a narrative review, where we summarize the research based on personal understanding, systematic review and meta-analysis has emerged as a critical method for retrieving and summarizing knowledge in an unbiased, comprehensive, and rigorous way. As shown at the top, a systematic review typically presents a forest plot that summarizes all the evidence: the individual effects and then the pooled effect. Some individual studies may not show significant findings for some outcomes, but the pooled estimate can be significant, with a narrower confidence interval. That is the power of a systematic review.

Despite the importance of SRMA, constructing a systematic review is labor intensive, taking hundreds to thousands of hours, and it is error prone: you have to balance coverage and precision in the database search using complex combinations of search terms; merge and deduplicate all of the records; screen through thousands of titles and abstracts; review the full text of each article, reading from the abstract all the way to the conclusion to avoid missing details; and finally extract the data manually, which means going over the full text yet again. All of these steps are time consuming, and with the rapid growth of the literature the challenge is compounded. Completing a systematic review typically takes about half a year, counting from drafting the proposal through the screening, data extraction, and writing. Because it is so hard to review that many articles, especially for a big topic, many reviews cannot be updated frequently; they may be updated only every five years. And even when we would like to replicate a published review, we usually cannot, because we don't have the time, and replicating a systematic review has limited value if it cannot be done quickly.
Researchers have explored various approaches to these challenges using traditional machine learning. The most popular technique is active learning, a semi-automated approach in which the algorithm actively and continuously asks the researcher to label records for inclusion or exclusion, then reorders the remaining records so that the most likely relevant articles are brought to the user first. That is the basic mechanism of this family of algorithms. There are many popular active learning tools, such as Abstrackr, ASReview, Colandr, FASTREAD, and RobotAnalyst; I think ASReview is the most popular one now, and Abstrackr is no longer available (its website is not functioning). These tools all share a strength: the reordering may save time, because if the tool keeps surfacing the most relevant articles, the remaining ones become less and less likely to matter, and after screening enough of them you may conclude that the rest are irrelevant. That is the assumption behind all of these algorithms. In practice there is an arbitrary cutoff: for example, if you have seen a hundred consecutive irrelevant articles, you may consider the rest irrelevant too. A small sketch of this loop follows below.

That points to the main limitation of active learning tools: they lack an accurate and valid stopping criterion for determining whether any relevant articles remain. In the end you have to set an arbitrary cutoff, which makes people uncomfortable; but if you don't set a cutoff, you essentially have to review all of the articles, which defeats the purpose of the tool. Right now I don't see a clear answer, and every cutoff is arbitrary. The second limitation is a lack of versatility: these tools are trained on materials from a specific research field, so a model trained on one subject may not work on another. For example, a model trained on cancer literature may not work for screening type 2 diabetes articles. Also, the benchmark testing of these tools often lacks detail, making it hard to replicate the results, truly understand what they mean, and assess real-world performance. Third, real performance is often worse than the benchmark demonstrations. I have used ASReview myself multiple times: the simulations look great, showing almost wonderful results, but when I ran the same datasets myself, the results did not look nearly as good. So there is a risk of overfitting in these active learning tools.
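To make that reordering mechanism and the stopping-rule problem concrete, here is a minimal sketch of an active-learning screening loop in Python. This is my own illustration of the general technique, not the internals of ASReview or any tool named above; the classifier choice and the label_record callback (standing in for the human reviewer) are assumptions.

# Minimal sketch of an active-learning screening loop (illustration only,
# not the internals of ASReview or any specific tool).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def active_screening(records, label_record, stop_after=100):
    # records: list of "title + abstract" strings
    # label_record(i) -> 1 (include) or 0 (exclude), supplied by the human reviewer
    X = TfidfVectorizer(stop_words="english").fit_transform(records)
    labeled, labels = [], []
    unlabeled = set(range(len(records)))
    consecutive_irrelevant = 0
    while unlabeled and consecutive_irrelevant < stop_after:
        if len(set(labels)) >= 2:
            # Retrain and surface the record the model currently ranks most relevant.
            clf = LogisticRegression(max_iter=1000).fit(X[labeled], labels)
            i = max(unlabeled, key=lambda j: clf.predict_proba(X[j])[0, 1])
        else:
            # Not enough class diversity to train yet: take any record.
            i = next(iter(unlabeled))
        y = label_record(i)
        labeled.append(i)
        labels.append(y)
        unlabeled.remove(i)
        # The arbitrary cutoff discussed above: stop after N irrelevant labels in a row.
        consecutive_irrelevant = 0 if y == 1 else consecutive_irrelevant + 1
    return labels.count(1), len(unlabeled)  # includes found, records never screened

When the loop exits, everything still unscreened is assumed irrelevant, which is exactly the leap of faith that makes the cutoff uncomfortable.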
So I would like to talk about another path that I think is worth investigating and exploring: large language models. A large language model is notable for its ability to achieve general-purpose language generation and understanding. Most of you probably know ChatGPT. This is the first kind of artificial intelligence with real versatility: it understands not just one subject but many, which addresses the earlier limitation that a model trained on one research field may not work on others. It can also provide a rationale for its decisions if you ask for one. Its real performance can be assessed easily, because you get the output directly and can go through the individual decisions one by one; the interface makes that straightforward. And I think the most important advantage is that you can ask it to read all the titles and abstracts until no article is left. That addresses the biggest limitation of the active learning tools, the arbitrary stopping rule: here you don't need one, because the language model can read everything, which lets you make informed decisions knowing nothing was left out.

Before the demo, I would like to share some tools that I personally found helpful for my own published systematic reviews. The first is the Deduplicator in SR-Accelerator, a free and powerful article deduplication tool. I tested it against Covidence, and the Deduplicator was more sensitive and precise in catching duplicates. The second is Rayyan, a free and well-designed article organizer: you can upload records from PubMed, Embase, and other databases; it accepts diverse file types, helps you organize them, and lets you download everything as a CSV, which is very handy. These tools complement the one I'm going to talk about next.

Out of curiosity, and to try to address the limitations described earlier, I developed Review Copilot on ChatGPT. It is powered by GPT-4 and has several main functions. The most important is title and abstract screening: you copy and paste titles and abstracts from a text file or a CSV, and it screens the articles automatically, providing decisions with reasons. Second, it accepts full-text uploads and can screen PDFs: if you want an additional check, or want an AI opinion before even reading the full text, you can download the article's PDF, upload it, and get an initial decision. It can also perform data extraction after the full-text screening stage. There are two additional functions I think are helpful. One is search term generation: to search PubMed or Embase you typically need to consider many combinations of search terms, which is hard, and this function gives you initial feedback and additional suggestions if you already have some terms but want to increase coverage or make them more concise. The other helps with PROSPERO protocol preparation, since that process is not very straightforward, and the tool can help you finish it quickly. Overall, the tool has a diverse set of functions and can greatly accelerate the systematic review process.

A few tips before the demo. We developed the beta version on ChatGPT, shown here, and I will give a live demo in a moment. In this beta version, upload at most five titles and abstracts at a time; if you upload more, the system gets overwhelmed and performance tends to decrease. So: five per upload. For full text, upload one PDF at most, and not the supplementary material, because the supplement is too long.
If you upload multiple PDFs, it becomes chaotic and performance decreases. Third, copy and paste your PICOS (target population, intervention, comparison, outcome, and study type) at the beginning of every chat, to keep reminding the AI of your target PICOS; if you don't, the AI may forget it as the conversation goes on. Also, make your target PICOS concise and easy to understand. It can come from your PROSPERO protocol, but a protocol is meant to be comprehensive, and title and abstract screening doesn't have to be: at the screening stage we look for a very specific PICOS rather than all the details, and keeping it concise helps the AI too. Think of the tool as an undergraduate research assistant: it can help you with a lot, but you need to make the task straightforward and easy to understand.

I will share two live demos of how I used Review Copilot to replicate published systematic reviews and meta-analyses. The first is a GLP-1 SRMA published in The Lancet Diabetes & Endocrinology in 2021, and the second is a prediabetes review published in the American Journal of Preventive Medicine in 2022. Let me start the live demo. Here is the web page for the beta version of the tool; if you have a ChatGPT Plus account, it is currently free to use. For example, here is the original systematic review: it has a great research team, was published in a great journal, and they screened a huge number of articles, so it is really great work, but very complicated and time consuming. From it I extracted five of the included articles, so we can see whether the AI can replicate their decisions. Here is the script I wrote; I can make it available after the presentation, and an illustrative version appears below. Basically, we interact with it as we would with a person: "Can you read the following titles and abstracts to see whether each should be included or excluded based on the target PICOS?" Below that is the target PICOS, which I took from the review's PROSPERO protocol. You can do the same if you want to use Review Copilot for your own systematic review or to replicate someone else's work; so far it is straightforward and simple. Just enter the key information, then copy and paste the titles and abstracts; whether you copy from a text file or a CSV doesn't matter, it will capture them either way. Here is the first title and abstract, then the second article, the third, the fourth, and the fifth, so we have five titles and abstracts. All I did was copy and paste them in; we remove this one element because it is just a useless image. I load the target PICOS and the titles and abstracts into the chat and press enter, and the results come out, with details pulled from the article abstracts; we can check that these are all valid extractions. It is a little slow during the day; at night it is typically faster. As you can see, we asked the AI to screen five articles, and it is done. I knew the answers beforehand: they should all be "include", so this is pretty great. All five were correctly identified as "include" during the screening.
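For reference, the screening message I paste at the start of each chat looks roughly like this. The wording and the PICOS entries below are illustrative (loosely matching the GLP-1 example), not the verbatim script:

Can you read the following titles and abstracts and decide whether each
should be included or excluded based on the target PICOS? Please give a
decision and a brief rationale for each article.

Target PICOS:
- Population: adults with type 2 diabetes
- Intervention: GLP-1 receptor agonists
- Comparison: placebo or an active comparator
- Outcome: cardiovascular and kidney outcomes
- Study type: randomized controlled trials

Title 1: [paste title]
Abstract 1: [paste abstract]
... (up to five per message)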
We can also ask the AI to check some excluded articles. I downloaded some records and found articles that should be excluded. This one is another systematic review, which typically should be excluded; I can paste it in (I don't even need to upload the file). Here is another article, and another, which is likely a review article as well; copy and paste them here. That gives us two, and we can keep going to get a third; this one is about mechanisms. This happens a lot in systematic reviews: you get many articles that are not relevant to your research question, but that is just the nature of using comprehensive search terms, and you will see all of this during screening. Since this extraneous material needs to be excluded anyway, I think a nice workflow is to upload the file to Rayyan and organize the records into a CSV first. A CSV typically works better, because once all the noise is excluded and the records are organized like the previous ones, performance improves. A small helper for this batching step appears at the end of this demo segment.

We can then start another chat to check whether it can screen the full text. The full-text prompt is similar: "Please read the following article to see whether it should be included or excluded." We just drag the PDF in, and it takes a little while. I did not provide the title, but it reads the content and identifies the title of this PDF: "Liraglutide and Cardiovascular Outcomes in Type 2 Diabetes." It gets that right, gets all the other details right, and says the article matches the inclusion criteria. This is very nice for full-text screening, because it helps you grasp the key messages of the article: cardiovascular death, nonfatal myocardial infarction, stroke, and of course mortality and hospital admission. That information is essentially the rationale for how Review Copilot makes its decisions, and it gives you a very helpful first impression. You can then check the full-text article to confirm, or, if you disagree with a decision, check it yourself and override it. Either way, it is a powerful help for reading full texts, because a full text is long: you probably need to read about ten pages of PDF, and you need to repeat that for dozens of articles. The tool makes this process much easier.
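As an aside, here is the small helper mentioned above for the Rayyan-to-CSV step. A minimal sketch, assuming pandas and a CSV export with "title" and "abstract" columns; the file name and column names are placeholders, so adjust them to whatever your export actually uses.

# Print records from a CSV export in batches of five, formatted for
# pasting into the screening chat. File and column names are assumptions.
import pandas as pd

df = pd.read_csv("records.csv")
for start in range(0, len(df), 5):
    print(f"--- batch {start // 5 + 1} ---")
    for n, row in enumerate(df.iloc[start:start + 5].itertuples(), start=1):
        print(f"Title {n}: {row.title}")
        print(f"Abstract {n}: {row.abstract}\n")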
Next we start another chat and assess the second review: interventions for reversing prediabetes, a systematic review and meta-analysis from the American Journal of Preventive Medicine. I have the document open, and the format is similar. I have the PICOS ready, taken from their PROSPERO record: the population is anyone with prediabetes, the intervention is any non-surgical intervention, the comparison is another intervention, the outcome is any relevant outcome, and the study type is randomized controlled trials. We copy that in. Here is the file I obtained from the collaborators, containing the original decisions. On the left (I can make it bigger) are the human decisions; I know the true answers from checking with the collaborators. If I hide those, everything remaining is AI-extracted: the population extracted by the AI, the intervention, the comparison, the outcome, the study type, the AI's decision, and the AI's rationale. For example, the rationale may state that the study population includes prediabetes, the intervention is non-surgical, there is a comparison group, the outcome measures include hemoglobin A1c and fasting plasma glucose, which are relevant to the target PICOS, and the study is an RCT, matching the study type. With the rationale in hand, you can see how the decision was made and convince yourself that the information is correct.

So we provide only the first five records to the AI, without the decisions, and ask it to re-identify the included articles. My vision for the future is that we may just need to clean the records into CSV form, paste them into the AI, have it make the initial decisions, and then have researchers check the work and screen the rationales to make sure they hold up; that increases efficiency while ensuring validity. Here we can see its calls. For the first, which we marked include, it sees a randomized controlled trial and says include. For this other one it says exclude, because the target population is described as obese without a definition of prediabetes, and you need to check cases like this, because its decisions are not always perfect. Its rationale reads "obesity without a definite diagnosis of prediabetes." So think about this during screening: if you want to include obese individuals without a formal diagnosis, rather than only prediabetes with clear definitions, you need to make that explicit. Because it provides the rationale, you can override it if you disagree; I would call this one an include, because obese individuals, even without a diagnosis, are very likely to have prediabetes. For the third, we check the rationale again: the intervention type, the placebo comparison, and the population being prediabetes all check out, so we move on. The fourth, on impaired glucose tolerance and metformin, is an include. The fifth, on prediabetes, pioglitazone versus placebo, and diabetes incidence in a follow-up study, it marks exclude. Here we have to decide whether follow-up studies are acceptable; if so, we need to state that in the target PICOS, for example "randomized controlled trials, including follow-up studies." We have to do that if we want a perfect answer.

So there is a bit of a process in writing a complete PICOS and running tests. I imagine that if we want to use this tool to increase screening efficiency, we would randomly pick about 20 or 30 articles, write the PICOS, and test whether the AI behaves as expected. I have also developed an automated approach that repeats this process on the backend using code rather than the online platform (a sketch follows below); it runs for about 20 hours continuously, and at the end you get all of this clean information with decisions. But the testing step matters: you need to check whether you missed any details or assumptions, for example whether any RCT qualifies or only RCTs including follow-up. Get that tested, and then deploy it for the article screening. Here, for example, I screened about 3,000 records.
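The automated backend version is not public yet, but the general idea can be sketched as follows. This is a minimal sketch, assuming the OpenAI Python client with an API key in the environment (the API is billed separately from ChatGPT Plus); the prompt wording, file names, and column names are my own placeholders, not the actual Review Copilot code.

# Minimal sketch of backend batch screening: loop over every record and ask
# GPT-4 for an include/exclude decision plus a rationale.
import csv
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PICOS = """Population: anyone with prediabetes
Intervention: any non-surgical intervention
Comparison: another intervention or placebo
Outcome: any relevant outcome (e.g., HbA1c, fasting plasma glucose)
Study type: randomized controlled trials, including follow-up studies"""

def screen(title, abstract):
    prompt = (
        "Decide INCLUDE or EXCLUDE for this record based on the target PICOS, "
        "and give a one-sentence rationale.\n\n"
        f"Target PICOS:\n{PICOS}\n\nTitle: {title}\nAbstract: {abstract}"
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep decisions as reproducible as possible
    )
    return resp.choices[0].message.content.strip()

with open("records.csv", newline="") as f, open("decisions.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["title", "decision_and_rationale"])
    for row in csv.DictReader(f):
        writer.writerow([row["title"], screen(row["title"], row["abstract"])])

Running a loop like this over roughly 3,000 records is what takes on the order of 20 hours, most of it spent waiting on the model and on rate limits.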
Screening those 3,000 articles takes the AI only about 20 hours, so it is pretty quick, but we need to run these tests before deploying, basically calibrating the machine before it leaves the shop. That's it for this demo. The product is still under development, and I welcome any feedback. We also have a formal version of this tool, available for free, and we welcome researchers to use it, test its performance, and see whether an AI-assisted systematic review is possible. Here is my contact information, and I welcome any comments and questions. Thank you.

Let me check for questions. Can attendees ask questions? I'm not sure whether they can; do I have to allow them to ask questions? Okay, great job, thank you. Okay, thanks. To clarify which version of ChatGPT I use: I use GPT-4, and this software was developed with GPT-4; it is an app on ChatGPT, and only GPT-4 supports it. So it is not free: you need to pay about $20 per month, and there is a usage cap, which I think is around 40 messages every few hours. I do not charge for this; OpenAI charges for it. The tool is pretty helpful, but given the message limitation, it is not likely to let you complete a full review quickly. The beta version lets you test around, but copying and pasting every time is very time consuming, so if you really want to deploy it for a review, the formal tool would be better; the automated version I have is just not yet available as a website. Yes, okay, can't wait to go and play with it. Yes, you're welcome, and I will copy and paste the link into the chat. As for citing this work: I am in the process of publishing a methods article about it, in which we take a more formal approach, trying to replicate published systematic reviews with all the details documented, so in the future you can cite that article; it is still in progress.

Okay, let's see, any other questions? If not, I will end the webinar. I think it's a really nice tool, and also really fun to see how the AI can replicate human work, although there are some things we need to calibrate. In the end, if we have a tool that can work 24/7 for us and we only need to calibrate it at the beginning, that would be really nice, because we could save probably hundreds of hours by having the tool do the screening, letting us focus more on data interpretation and manuscript writing than on screening articles. I mean, screening articles is boring, so I hope this tool is fun and this webinar is helpful to those who watch it. Okay, very nice. That's it; thank you for attending this webinar, and have a good day. I'll stop the screen share, and I will send the slides to everyone after the webinar.
Video Summary
In this webinar, James discusses the implications of artificial intelligence for article screening in systematic reviews. He highlights the challenges of the labor-intensive process and the time-consuming nature of systematic review construction. James explores the use of traditional machine learning techniques like active learning and popular tools such as ASReview. He also introduces the concept of large language models like ChatGPT, which offer more versatility and efficiency in the systematic review process. James demonstrates a tool he developed called Review Copilot, powered by GPT-4, which automates tasks like screening articles based on predefined criteria. He emphasizes the importance of calibrating the tool and testing its performance before full deployment. Attendees are intrigued by the potential time-saving benefits of using AI in systematic reviews and express interest in exploring and using the tool. James concludes the webinar, highlighting the benefits of AI in streamlining the systematic review process and enhancing research efficiency.
Keywords
artificial intelligence
article screening
systematic reviews
machine learning techniques
ASReview
large language models
Review Copilot
time-saving benefits