TABLE OF CONTENTS
- Introduction
- How to start
- Extract data (recommended)
- Import other PDFs (optional)
- Validate extracted data using the Quality Assurance stage (optional)
Introduction
This training project is not topic-specific, i.e. it does not address a specific research question. Instead, the studies have been selected to be representative of research in humans in terms of interventions and study design.
The project aims to provide a broad overview of the usability of the AI models implemented in Laser AI.
While the research questions and the purpose for which specific information is collected vary widely from one project to another, there is still a great deal of information that the AI models can significantly help to extract.
How to start
Invite other reviewers (optional)
Invite team members, especially if you want to test the QA stage (validation of the data extracted by the first reviewer).
Learn more about team management.
Distribute tasks for first extraction (mandatory task)
Distribute studies for data extraction. On the Data extraction stage dashboard you will find studies waiting to be distributed. You can choose between three distribution methods. Note that the open pool method is the most flexible: it allows, for example, studies to be returned to the pool so that other reviewers can extract data from the same study (only the last extraction is saved in this case).
Learn more about distribution
Please note that the models need some time to extract data. You will see a notification indicating that the model is processing. Once at least one PDF has been processed, click the ‘eye’ icon in the left-hand navigation bar. You should also receive an email notifying you that tasks have been assigned to you.
Click here to learn more about your Task board and data extraction tasks.
Extract data (recommended)
We encourage you to test the following features to better understand how the model supports data extraction and to familiarize yourself with the extraction process in Laser AI. Please note that the database format differs slightly from what you may be used to in Excel.
- Extraction of data into three subfields (main extracted value, author-reported value, and comment)
- Extraction of data using the highlighting mechanism
- Extraction in vocabulary fields
- AI-supported extraction of fields from text
- Extraction within subsections
- AI-supported extraction from tables
Task 1: Extraction of data into three subfields and extraction of data using the highlighting mechanism
Each data extraction field includes three subfields: main extracted value, author-reported value, and comment. Learn more about: Components of a Single Data Extraction Field
Try to extract all three subfields for at least one data extraction field. For now, select a field that is not AI-supported—there will be no yellow suggestions. Use the highlighting mechanism to extract the data: select the relevant part of the text from the PDF and click the + button in the data extraction form.
To learn more, read about how to extract data using the highlighting mechanism.
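To make the three subfields easier to picture, here is a minimal, purely illustrative Python sketch. The class and field names are assumptions chosen for this example and do not reflect Laser AI's internal data model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExtractionField:
    """Hypothetical model of a single data extraction field (illustration only)."""
    name: str
    main_value: Optional[str] = None       # main extracted value (standardized form)
    author_reported: Optional[str] = None  # value exactly as the authors reported it
    comment: Optional[str] = None          # reviewer's free-text note

# Purely invented example values, shown only to make the three subfields concrete
age = ExtractionField(
    name="Baseline age",
    main_value="62.4 years (mean)",
    author_reported="mean age 62.4 (SD 8.1) years",
    comment="Taken from the baseline characteristics table.",
)
print(age)
```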
Task 2: Extraction in vocabulary fields
In addition to simple text fields, you will find vocabulary fields, where you select the code from a provided list that best represents the extracted data (in this data extraction form, Study type is an example of such a field). Please extract at least one vocabulary field and try the "create new term" option: if the relevant term is not available in the list, you can add a new one.
If you need help, please read this article.
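Conceptually, a vocabulary field restricts the extracted value to a controlled list of codes, with "create new term" as the escape hatch when the list is incomplete. The snippet below is a hypothetical sketch of that idea only; the vocabulary contents and function are assumptions, not Laser AI's actual behavior.

```python
# Hypothetical sketch of a vocabulary field with a "create new term" option (illustration only).
study_type_vocabulary = {"Randomized controlled trial", "Cohort study", "Case-control study"}

def code_value(value: str, vocabulary: set[str]) -> str:
    """Return the value if it is a known term; otherwise add it as a new term."""
    if value not in vocabulary:
        vocabulary.add(value)  # analogous to using "create new term"
    return value

print(code_value("Cross-sectional study", study_type_vocabulary))
print(sorted(study_type_vocabulary))
```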
Task 3: AI-supported extraction of fields from text
When model suggestions are enabled for a field, the extracted value will be highlighted in yellow. Your task is to accept or reject these suggestions.
Select at least one tab to test how this process works—for example, open the Population tab and review the model’s suggestions in the Inclusion/Exclusion Criteria section.
Review the suggestions one by one for the Inclusion Criteria. You can also learn how to use the Batch Accept button (dots button) to speed up the process.
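To make the accept/reject workflow concrete, here is a small hypothetical model of reviewing suggestions one by one versus all at once. It only illustrates the idea of pending (yellow) suggestions and a batch action; the names and statuses are assumptions, not how Laser AI stores suggestions.

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    """Hypothetical AI suggestion awaiting reviewer action (illustration only)."""
    field: str
    value: str
    status: str = "pending"  # "pending" corresponds to the yellow-highlighted values

def accept(s: Suggestion) -> None:
    s.status = "accepted"

def batch_accept(suggestions: list[Suggestion]) -> None:
    """Analogous to the Batch Accept button: accept every pending suggestion at once."""
    for s in suggestions:
        if s.status == "pending":
            accept(s)

inclusion_criteria = [
    Suggestion("Inclusion criteria", "Adults aged 18 years or older"),
    Suggestion("Inclusion criteria", "Confirmed diagnosis of type 2 diabetes"),
]
batch_accept(inclusion_criteria)
print([s.status for s in inclusion_criteria])  # ['accepted', 'accepted']
```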
Task 4: Extraction within subsections
Subsections are groups of fields that can be multiplied within a section. They are used in three cases (see the sketch below):
Basic subsection - to extract dependent values and multiply fields, e.g. in the case of multiple countries, In/Out criteria, or combination treatments in one study arm.
For group-level data - to extract values for a single group/study arm (outcomes, baseline characteristics).
For comparison - to extract comparison values between study arms (outcomes, baseline characteristics).
More detailed information about subsections
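The sketch below illustrates how the three subsection types differ in shape. All names and values are invented for illustration and are not Laser AI's data model.

```python
# Hypothetical sketch of the three subsection types (illustration only).

# Basic subsection: the same field repeated once per value, e.g. one entry per country.
countries = [{"Country": "Germany"}, {"Country": "Poland"}]

# Group-level subsection: one set of values per study arm.
baseline_age_by_arm = [
    {"Arm": "Intervention", "Mean age": 62.4},
    {"Arm": "Control", "Mean age": 61.9},
]

# Comparison subsection: a value describing the contrast between arms.
baseline_age_comparison = [{"Comparison": "Intervention vs Control", "p value": 0.71}]

for block in (countries, baseline_age_by_arm, baseline_age_comparison):
    print(block)
```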
Try to extract data within a basic subsection. For example, you will find the subsection Country under the Study Details section in the Study Characteristics tab. The 'Country' field is a perfect example of a subsection field, as a study might be conducted in various locations.
To extract group-level data, you must first extract all study groups/arms. For example, if you want to extract data for each study arm separately, extract all arms in the ‘Intervention’ tab.
Please extract the baseline age for each study group and the p value (between study groups). In the Population tab, you will find the Baseline Characteristics section, which includes a subsection dedicated to extracting values for each study arm.
For this task, please extract the data manually. In the next task, you will test how to automate this process.
If age is not reported in the publication, you may extract other baseline variables or outcome data instead, as the subsections are structured in a similar way.
Task 5: AI-supported extraction from tables
Repeat Task 4 (extracting age data), but this time use the semi-automated option for tables.
Click on the table, then click the blue Extract button to enter the automatic table extraction module. Next, match the content of the relevant columns to the corresponding fields in your data extraction form.
You can extract all baseline variables from the table at once by selecting all rows.
Please note that extracted values will be highlighted in yellow, indicating that they require your review. You should accept them either one by one or by using the Batch Accept button.
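Conceptually, the table extraction step maps table columns to fields in your extraction form. The sketch below illustrates that mapping idea only; the table contents, column names, and mapping are invented for this example and do not describe Laser AI's module.

```python
# Hypothetical sketch of mapping table columns to extraction-form fields (illustration only).
baseline_table = [
    {"Characteristic": "Age, mean (SD)", "Intervention": "62.4 (8.1)", "Control": "61.9 (7.8)"},
    {"Characteristic": "Female, n (%)", "Intervention": "45 (48%)", "Control": "47 (50%)"},
]

# The reviewer matches each relevant column to a form field; here the match is a simple dict.
column_to_field = {"Intervention": "Intervention arm value", "Control": "Control arm value"}

for row in baseline_table:
    extracted = {column_to_field[col]: row[col] for col in column_to_field}
    extracted["Variable"] = row["Characteristic"]
    print(extracted)  # each extracted value would start out as a yellow suggestion to review
```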
Import other PDFs (optional)
In this project you will find 11 studies that are waiting to be distributed. Please feel free to add more.
Learn more about how to upload more PDFs.
Validate extracted data using the Quality Assurance stage (optional)
In Laser AI, you can validate your extracted data through an optional 4-step process.
- At the researcher level, once you finish your data extraction and click the 'Finalize' button, the tool checks the extracted data for missing elements, such as empty fields, missing connections, or uncoded values.
- At the researcher level, you can set up the quality assurance stage, allowing a second researcher to validate the extracted data. This process is similar to the initial data extraction.
- At the manager level, the data cleaning module is dedicated to validating vocabulary fields.
- At the manager level, you can review and modify the extracted data in the reference list.
To test the Quality Assurance (QA) stage, you must first extract data from at least one study. Once that is done, please distribute studies for QA. This process is similar to the distribution used for the first extraction.
How to distribute studies for QA
The Quality Assurance stage is very similar to the data extraction process. All data extracted by the first reviewer will be highlighted in yellow for your review.
Learn more about how to verify data during the QA stage.