How to create a data extraction form

TABLE OF CONTENTS

Introduction
General overview of the list of tabs, sections and fields
Sections management - general
Sections management - creating empty section
Section management - importing from templates
Fields management - General information
Fields management - Adding new text field
Fields management - Enable model suggestions in the text field
Fields management - Adding vocabulary field
Fields management - Enable model suggestions in the vocabulary field
Relations between models
Subsection management

Introduction

The standard Extraction form overview consists of three general elements:

Hierarchical list of tabs, sections and fields - the list of concepts to be extracted. A detailed explanation of the hierarchical structure of the form fields is available below.
Sections and fields settings - the central part of the data extraction form. Here you manage the content of each section or field, and add descriptions, models and connections. A detailed description of the section and field settings is available below.
Preview - presents the final look of the designed form. A yellow field is supported by an AI model (the name of the model is shown in the field).

Data extraction form overview with overview of the form elements

General overview of the list of tabs, sections and fields

The extraction form consists of three main elements:

Tabs
Sections
Fields

This hierarchical structure is similar to that used in an Excel spreadsheet (see graphic). Tabs in Laser AI mirror the tabs in Excel, sections are groups of columns describing a particular topic, while fields mirror a single column.

Comparison between Excel spreadsheet and Laser AI Data Extraction Form
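The tab, section and field hierarchy can be pictured as a small nested data structure. The sketch below is only an illustration of the spreadsheet analogy; the class names and the column_headers helper are hypothetical and are not part of Laser AI.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    # A single concept to extract, comparable to one spreadsheet column.
    name: str
    kind: str = "text"  # "text" or "vocabulary"


@dataclass
class Section:
    # A group of related fields, comparable to a group of columns.
    name: str
    fields: List[Field] = field(default_factory=list)


@dataclass
class Tab:
    # The top level of the hierarchy, comparable to a spreadsheet tab.
    name: str
    sections: List[Section] = field(default_factory=list)


def column_headers(tabs: List[Tab]) -> List[str]:
    """Flatten the hierarchy into spreadsheet-style column headers."""
    return [
        f"{tab.name} / {section.name} / {fld.name}"
        for tab in tabs
        for section in tab.sections
        for fld in section.fields
    ]


form = [
    Tab("Population", [
        Section("Baseline variable", [Field("Baseline variable name", "vocabulary")]),
    ]),
    Tab("Intervention", [
        Section("Study arms", [Field("Duration of treatment")]),
    ]),
]
headers = column_headers(form)
```

Each header combines the tab, the section and the field, which mirrors how a column in Excel sits inside a group of columns on a given tab.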

Tabs

The first level of hierarchy in the Extraction form. Tabs are dedicated to the main concepts of the Data Extraction (often compatible with PICO elements). Dividing the form into tabs makes it comfortable to extract individual topics (the screenshot below shows how tabs look in the Data Extraction form).

Examples of tabs:

Baseline characteristics
Population
Outcomes
Risk of Bias

To add a new tab, click the ‘Add tab’ button at the top of the section, then enter the tab name. After selecting the dots icon next to the tab name, you can change the order of the tabs, rename or delete them.

Data extraction form with highlighted tabs and tabs management

Sections

Groups of specific fields that are similar in terms of subject and belong to the same general domain (tabs in Laser AI).

Examples of sections:

Study details (belongs to the General characteristic tab)
Baseline variable (belongs to the Population tab)
Adverse events (belongs to the Outcomes tab)
Risk of bias arising from the randomization (belongs to the Risk of bias tab)

Sections should also be used when one concept has a lot of different variables to be extracted. Placing this type of concept in a separate section allows multiple values to be extracted by the model and to be generated correctly in the Extraction Summary.

Examples:

Inclusion / exclusion criteria (because there is more than one criterion per study)
Countries (studies are often conducted in more than one country)
Type of analysis (a single study can contain different analysis types, e.g. intention-to-treat, per protocol etc.)

Detailed description of the Section management is presented in the next paragraphs.

Data extraction form with highlighted sections

Subsections

Group of fields that can be multiplied within the object.

Subsections in Laser AI can be divided into 3 main types:

Basic subsection - used when one concept has a lot of different variables to be extracted. Placing this type of concept in a subsection allows multiple values to be extracted by the model and to be generated correctly in the Extraction Summary.

Examples:

Countries (studies are often conducted in more than one country)
Type of analysis (a single study can contain different analysis types, e.g. intention-to-treat, per protocol etc.)
Fields dedicated to extracting interventions when patients in a single study arm received more than one intervention.

For group level data - used to extract all data that are reported per single group, e.g. study arm. Laser AI contains a set of example fields for this type of subsection that can be modified, e.g. % and number of patients with event (per arm) or central tendency value (per arm).
For comparison - used to extract comparison values between groups, e.g. p-value, OR, RR. Laser AI contains a set of example fields for this type of subsection that can be modified, e.g. effect estimate value, variability value or p-value.

Data extraction form with highlighted subsections

Fields

Individual concepts belonging to the section, the most granular element of the data extraction form.

Examples of fields:

Study sponsor (field belonging to the Study details section and Study characteristic tab)
Duration of treatment (field belonging to the Study arms section and Intervention tab)
Baseline variable name (field belonging to the Baseline variable section and Population tab)

Data extraction form with highlighted fields

Sections management - general

After clicking on a single section in the left panel, you will see a section settings panel in the central part, consisting of:

Name - a box for entering or editing the name of a single section
List of tabs to which the section can be assigned
Panel for changing the order of sections within a single tab
Background color setting
Section key field - information about which field from the particular section is set up as the key field (more information about key fields and their settings is available in the section dedicated to fields management)
Advanced settings - an additional panel for adding connections

To add a new section, please click on the ‘Add section’ button in the upper part of the panel. You have two options:

Import from templates
Create empty section

To delete the section, select the ‘Remove’ button in the upper part of the panel.

Data Extraction Form Detailed Description of a Section Management Panel

Sections management - creating empty section

After you choose the ‘Create empty section’ option, you will see a new section added to the form, unassigned to any of the existing tabs.

First, you have to change the name of the section
Assign the section to the relevant tab and change background color if necessary
Don’t forget to create connections with the other relevant elements of the form

Steps to add a new section to the data extraction form

Section management - importing from templates

You can also import whole sections from the existing Data extraction forms.

After clicking ‘Add section’, choose the ‘Import from templates’ option
Select the template that you want to use to import sections
Select some or all sections that should be imported
Define the tab to which the section should be assigned. If the tab doesn’t exist, choose ‘Extraction (default)’ - you can create the tab later and then assign sections to it.

Steps to add a new section to the data extraction form by importing existing sections from other templates.

Section management - Connections

General information about connections

Once you have specified data extraction fields, you should establish connections between each field, similar to how it is done when conducting data extraction in Excel.

When extracting data in Excel, you connect your data by selecting the appropriate row and column. In some cases, you may have to copy already extracted data into another row.

In Laser AI, you decide which field should be connected with others. Here, you can extract particular data only once and save time. Let's look at some examples:

The researcher's task is to extract data for a selected outcome across different subgroups, types of analysis, and cohort arms. If the extraction is done in an Excel file, the researcher needs to add additional rows to extract data for the same outcome but for different cohorts and different subgroups or types of analysis.

Excel file with the example of Data extraction form

In Laser AI, the researcher extracts cohort names and subgroups or types of analysis only once. After finishing the extraction of values for this outcome, the researcher has to make connections with previously extracted data: subgroup names and types of analysis.

You will find three types of connections in Laser AI:

1. Auto-connect

How it works: All selected sections will be automatically connected with each other by Laser AI.

Extractor perspective: Since sections are automatically connected, the extractor does not have any tasks here, but they have the opportunity to remove connections between extracted values. See an example on the screen.

When to use: Recommended for fields related to study details, to connect with selected sections

Output: If you select this type of connection, all extracted values in those fields will be presented in one row in the final data extraction file (Excel/CSV file).

Example of autoconnections panel in the Data extraction

How to create auto-connection in data extraction form:

To create a connection between all the sections describing the study:

Go to the appropriate section (study details) in which you want to create an auto-connection.
Go to the 'Advanced settings' tab and select section type-other
Choose the sections that need to be connected from the list, select 'Many (auto-connect)'

You do not need to conduct this action manually. If you select the “Study Characteristics” section, the tool will automatically link all sections.

2. Connections - Only one

How it works: All selected fields can be manually connected during extraction with each other.

Extractor perspective: The extractor’s task is to connect extracted values with values from other sections specified in the data extraction form, using the connection panel below the sections. The extractor can select only one extracted value for each type of field. For example, if outcome values are reported for several follow-ups, the extractor can connect each extracted value to only one follow-up (see an example on the screen).

When to use: Recommended for outcome data with cohort arms

Output: If you select this type of connection, all extracted values in those fields will be presented in one row in the final data extraction file

Example of 'only one' connection panel in the Data extraction

How to create connection-only one in data extraction form:

To establish an ‘Only one’ connection between all fields:

Select the appropriate section (e.g. Outcome name) that you want to connect
Go to the 'Advanced settings' tab, choose the sections that need to be connected and from the list select 'Only one'

Data extraction form overview with highlighted steps to create 'Only one' connection type between sections

3. Connections - Many

How it works: All selected sections can be manually connected during extraction with each other.

Extractor perspective: The extractor's task will be to connect extracted values with values from other sections specified in the data extraction form using the connection panel below the extracted section. The extractor could select one or more (if extracted) values. For example, interventions in the study could be administered in different settings: hospital, ambulatory, home. The first intervention could be administered only in one setting, and the second intervention could be administered in two settings. The extractor can connect interventions with one or more settings. See an example on the screen

When to use: Auto-connect is the recommended option. 'Many' connections should be used only when you feel that a manual connection is more appropriate.

Output: If you select this type of connection, all extracted values in those fields will be presented in one row in the final data extraction file

Example of 'Many' connection panel in the Data extraction

How to create connection-many in data extraction form:

To establish ‘Many’ connection between fields:

Select the appropriate section (e.g. Outcome name) that you want to connect
Go to the 'Advanced settings' tab
Choose the sections that need to be connected and from the list select 'Many'

Data extraction form overview with highlighted steps to create 'Many' connection type between sections
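The difference between the three connection types, as described above, comes down to how many values the extractor may link per connected section. The following is a minimal sketch of that rule, not Laser AI's implementation; the ConnectionType enum, the validate_connections function and the example section names are hypothetical.

```python
from enum import Enum
from typing import Dict, List


class ConnectionType(Enum):
    AUTO = "Many (auto-connect)"  # linked automatically, extractor may remove links
    ONLY_ONE = "Only one"         # extractor links at most one value per section
    MANY = "Many"                 # extractor links one or more values per section


def validate_connections(
    rules: Dict[str, ConnectionType],
    selections: Dict[str, List[str]],
) -> List[str]:
    """Return the connected sections whose selection breaks the 'Only one' rule."""
    problems = []
    for section, values in selections.items():
        if rules.get(section) is ConnectionType.ONLY_ONE and len(values) > 1:
            problems.append(section)
    return problems


# Hypothetical form configuration and one extractor's selections.
rules = {"Follow-up": ConnectionType.ONLY_ONE, "Setting": ConnectionType.MANY}
selections = {
    "Follow-up": ["12 weeks", "24 weeks"],  # two values: not allowed for 'Only one'
    "Setting": ["hospital", "home"],        # two values: allowed for 'Many'
}
violations = validate_connections(rules, selections)
```

In this sketch only the 'Follow-up' selection is flagged, because an 'Only one' connection accepts a single value while a 'Many' connection accepts several.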

Fields management - General information

A field is the most detailed and granular element of the Laser AI Data extraction form. In a field you can extract a single concept, such as dosage or sample size. There are two types of fields:

Text field - a field where Researchers have to extract information straight from the text.
Vocabulary field - a field where data is extracted from the structured list (vocabulary). Vocabularies are created in a separate tab at the organizational level, and can then be added to the extraction form. However, if there is a new concept that was not included in the initial vocabulary, researchers can add additional terms during the extraction process.

Data extraction form with detailed description of fields

After clicking on the field in the left panel, in the central part you will see a settings panel consisting of:

Field name
Panel for changing the order of fields across single section
Tick-boxes to indicate whether field should be required field and/or key field:

Required fields are those that have to be extracted during the extraction process. If you don't extract anything in this field, you won't be able to submit your extraction.
Key fields are used to distinguish between multiple objects in the section. Key fields will be visible as a subsection heading, e.g. while creating study arms.

Model - a summary of what kind of model is used in this field, e.g. LLM, test model etc. (a panel to enable model suggestions is available in the Advanced Settings)
Field description - a box to leave instructions for the researchers that may be useful when extracting data from a particular field.
Vocabulary - section visible only for vocabulary fields. It shows which vocabulary has been used, and allows you to preview the uploaded vocabulary and to add additional terms.

Advanced settings - an additional section that contains a panel for:

Input validation (only in text fields) - allows you to check if the extracted value is consistent with your expectations.
Managing models (both for vocabulary and text fields) - here, you can choose which model should be used to support the field.

To delete the field, select the ‘Remove’ button in the upper part of the panel.

Fields management - Adding new text field

To add a new Text field, in the selected section, click 'Add field' and choose 'Text field'. You can create as many text fields as you prefer in a single section. Next, in the Settings panel, add a field name, set its position among the fields and define whether it's a required field and/or a key field. Optionally, add a field description to ensure better understanding by reviewers and high quality extraction.

Data extraction form with highlighted steps to add new text field to the form

An optional step when adding a text field to the Data Extraction form is to define Input Validation, which allows you to check that the extracted value matches your expectations.

To enable validation of the extracted data, expand the ‘Advanced settings’ tab, click the ‘Input validation’ button and choose one of three options:

Only numbers - the tool verifies whether the extracted data is a number. If not, the tool informs the researcher that the extracted value should be numeric.
CAS number - a CAS number is a numerical designation assigned to chemical substances by the U.S. Chemical Abstracts Service (CAS). You can ask the tool to verify whether the extracted data is a CAS number.
Custom rule (RegExp) - you can set your own validation rule using regular expressions.
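To give a feel for what the 'Only numbers' and 'CAS number' options check, here is a minimal sketch in Python. It is not Laser AI's own validation code; the function names are hypothetical, and the number pattern (optional minus sign, digits, optional decimal part) is an assumption. The CAS check uses the standard CAS check-digit rule: the digits before the check digit are weighted 1, 2, 3, ... from right to left, and the sum modulo 10 must equal the last digit.

```python
import re

# Assumed numeric format: optional minus sign, digits, optional decimal part.
NUMBER_PATTERN = re.compile(r"-?\d+(\.\d+)?")

# CAS format: 2 to 7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_number(value: str) -> bool:
    """Return True if the whole value is a plain integer or decimal number."""
    return NUMBER_PATTERN.fullmatch(value.strip()) is not None


def is_cas_number(value: str) -> bool:
    """Return True if the value has the CAS format and a valid check digit."""
    match = CAS_PATTERN.fullmatch(value.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(
        int(digit) * weight
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return total % 10 == check_digit


water_ok = is_cas_number("7732-18-5")   # CAS number of water
typo_bad = is_cas_number("7732-18-4")   # same number with a wrong check digit
```

A format-only rule would accept the mistyped value, which is why the check digit is worth verifying in addition to the pattern.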

Data extraction form with highlighted Advanced settings and input validation options

Fields management - Enable model suggestions in the text field

AI models
The final step of adding a new field, which is also optional, is to enable model suggestions. Not all fields are covered by models yet, but many of the fields are supported by AI.

Enabling AI model suggestions can speed up the data extraction process and improve its quality. Model suggestions can be used for various concepts, e.g. study design, population, dosage or inclusion and exclusion criteria.

To add an AI model to the Data Extraction form, expand the ‘Advanced settings’ tab and go to the ‘Model suggestion’ section with the list of available models. Choose the model that best matches the concept that you want to extract. Note that models belong to different categories - LaserLLM, ToxLLM, Test model etc. These categories are dedicated to different types of analyses. For example, if you are performing a toxicology study, select ToxLLM, whereas LaserLLM is dedicated to human research. A list of currently available models with descriptions is available here.

Data extraction form with adding model suggestions to the text field

Once you have selected a model, its category will appear above the description field. You will also see that in the preview the colour of the field will turn yellow, which means that the model is activated, and the name of the model will also appear.

To see how model suggestions look in the Researcher's Focus Mode, click here.

Regular Expressions
In addition to AI models, suggestions can also be generated by defining regular expression (RegExp) rules. For example, to extract a PROSPERO protocol ID, you can specify a rule that searches for strings beginning with “CRD” followed by exactly eleven digits.
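The PROSPERO rule described above can be written as the regular expression CRD\d{11}. The short Python sketch below shows how such a pattern behaves on sample text; the function name is hypothetical, the word boundaries (\b) are an added assumption to avoid matching inside longer strings, and the exact RegExp dialect accepted by Laser AI is not specified here.

```python
import re

# "CRD" followed by exactly eleven digits, bounded so that longer
# digit runs (twelve or more digits) are not partially matched.
PROSPERO_ID = re.compile(r"\bCRD\d{11}\b")


def find_prospero_ids(text: str) -> list:
    """Return all PROSPERO-style protocol IDs found in the text."""
    return PROSPERO_ID.findall(text)


sample = "The protocol was registered in PROSPERO (CRD42021234567)."
found = find_prospero_ids(sample)
```

Strings with fewer or more than eleven digits after "CRD" are not returned, so the rule produces a suggestion only when the ID has the expected length.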

Fields management - Adding vocabulary field

To add new Vocabulary field, in the selected section:

Click 'Add field' and choose 'Vocabulary field’ option.
Select relevant vocabulary from the list. You have several options:

a. Vocabulary available in your organization’s Controlled vocabularies library,

b. MeSH terms,

c. Temporary set - a vocabulary you create ad hoc as you build the data extraction form. It's especially useful if you want to add smaller vocabularies that are specific to a particular topic and don't want to share or save them. To create a temporary set, type in a term and press Enter to save it.

Panel for adding vocabulary fields to the Data extraction module

If you select option 'a' or 'b', follow the next steps.

Vocabularies have a hierarchical structure - each concept may have many levels, e.g. the intervention vocabulary may consist of the different drug names, and for each drug we may build a separate vocabulary level for dosage. Another example may be baseline variables (1st level), some of which may be dichotomous or continuous (2nd level). Continuous baseline variables can be, for example, age, BMI or mean blood pressure (3rd level).

You don't have to select all levels or elements of the created vocabulary - you can customise the vocabulary according to your needs and project.

3. To create a field with a list of terms on a chosen level, click the ‘Create Field’ button. You can create fields from all the levels of the vocabulary or just some of them.

To create two or more levels of vocabulary fields (multi-level vocabulary), add field names for both the 'parents' and ‘children’. The tool will create an appropriate number of data extraction field levels that are related to each other.

Panel for creating multi-level vocabulary fields in the Data extraction form

You can see the ready vocabulary in the 'Preview' panel. If you want to make minor changes and add additional terms from the original vocabulary, you can select 'Show terms'. If you want to add additional terms outside the vocabulary, enter them in the ‘Additional terms’ section, but remember that they won’t be saved in the Organization settings.

From the extractor’s point of view: Once you have extracted the lowest level, e.g. Baseline variable name, the tool will automatically extract higher levels (e.g. Baseline variable category). You can also start coding from the higher levels and the tool will automatically limit the number of codes available on the list in the lower levels.

Multi-level vocabularies from the extractor perspective during the Data extraction
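The two-way behaviour of a multi-level vocabulary (a lower-level term fills in its higher level, and a higher-level choice narrows the lower-level list) can be sketched with a simple child-to-parent mapping. This is an illustration only; the mapping, the function names and the 'Sex' term are hypothetical, while the continuous examples come from the text above.

```python
from typing import List, Optional

# Lower-level term -> higher-level category (illustrative vocabulary).
PARENT_OF = {
    "Age": "Continuous",
    "BMI": "Continuous",
    "Mean blood pressure": "Continuous",
    "Sex": "Dichotomous",  # hypothetical dichotomous variable
}


def higher_level(term: str) -> Optional[str]:
    """Fill in the higher level once a lower-level term is extracted."""
    return PARENT_OF.get(term)


def lower_level_options(category: str) -> List[str]:
    """Limit the lower-level list after a higher-level code is chosen."""
    return sorted(t for t, parent in PARENT_OF.items() if parent == category)


category_for_bmi = higher_level("BMI")
continuous_terms = lower_level_options("Continuous")
```

Starting from 'BMI' yields the category 'Continuous', and starting from 'Continuous' leaves only the continuous variables on the list.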

Single and multiple selection

A vocabulary field can be configured to allow either a single selection (the extractor can select one code from the list) or multiple selections (the extractor can select more than one code from the list).

By default, if an AI model is enabled for the vocabulary field, only single selection is supported.

Fields management - Enable model suggestions in the vocabulary field

Model suggestions can also be enabled for the vocabulary fields. The model will generate a list of possible suggestions and these suggestions can be checked against the vocabulary terms automatically or manually by the reviewer if the suggested value differs from the vocabulary. Model suggestions can also be added as new vocabulary terms - after checking by the reviewer, any accepted suggestions can be added to the existing vocabulary.

Enable model suggestions for vocabulary fields in the data extraction form.

Remember that any new term added to the vocabulary must be verified in the Data Cleaning module.

To see how model suggestions look in the Researcher's Focus Mode, click here.

Relations between models

Several models cooperate with each other during the data extraction process. Within Laser AI, you will find two types of combined models:

Models supporting outcome data extraction
Models supporting intervention extraction

In the first example, you can see how models extracted all data for a specific outcome for two study arms and the effect measure (comparison for those two study arms). In the second example, models extracted all data that defined one study arm, including substance name, dose, route of administration, etc. The researcher's task is to verify these suggestions.

Overview of Data extraction with cooperating models in the Intervention section

Overview of Data extraction with cooperating models in the Outcomes section

To enable model suggestions, select from the list a model dedicated to specific data extraction fields. For instance, to automatically extract the outcome name, select the model: 'Outcome: Name of the outcome'.

Enable combined model suggestion for the Outcomes section in the Data extraction form

All models that support outcomes data extraction are grouped in the category 'Outcome'. All models that support interventions extraction are grouped in the category 'Study arm'.

Enable combined model suggestion for the Intervention section in the Data extraction form

A full list of cooperating models is available here.

Subsection management

General information

Subsections are groups of fields that can be multiplied within the object. They are used in three cases:

Basic subsection - to extract dependent values and multiply fields, e.g. in the case of multiple countries, inclusion/exclusion criteria or combination treatments in one study arm.
For group level data - to extract values per single group/study arm (outcomes, baseline characteristics)
For comparison - to extract comparison values between study arms (outcomes, baseline characteristics)

To add a new subsection, click on the ‘Add subsection or field’ button at the bottom of the relevant section, and select ‘Subsection’.

Data extraction form with highlighted option to add new subsection to the form

The next step is to choose the appropriate subsection type - Basic subsection, For group level data or For comparison.

Panel for adding subsections in the Data Extraction Form creator

Basic subsection

After choosing ‘Basic subsection’ from the list of available subsection types, enter a subsection name and click ‘Add subsection’.

Panel for adding subsections in the Data Extraction Form creator

Once you have created a basic subsection, you need to add a field to it. As with the standard field addition, there are two types to choose from: Text Field or Vocabulary Field. You can find more information about managing fields in the two sections above - 'Fields management - Adding new text field' and 'Fields management - Adding vocabulary field'.

Data extraction form creator with highlighted panel to add new field to the subsection.

Subsection ‘For group level’

To add a new subsection for group level data, after selecting the 'For group level' option from the list of available subsection types, you need to select the section in which the different groups are generated - for example, if you have two treatment arms and they are extracted in the 'Treatment' section, select this section among others.

Adding new basic subsection

After selecting the appropriate section, you can add sample fields that are most commonly used for these types of fields (see screenshot). Otherwise you will need to add fields manually. The default fields shown can be modified when added to the form.

After selecting the section, click on 'Add Subsection'.

Adding new basic subsection

In the dashboard for building the Data extraction form you will see an added subsection below the main section. ‘Per Treatments’ means that this subsection with the list of fields can be multiplied and extracted separately for each of the treatment arms.

If you have more than two study arms, you can extract a few comparisons with different treatments in the Data extraction form. The tool will generate combinations of comparisons based on the extracted study arms.

Remember that you can add more text or vocabulary fields to this subsection. As with the standard field addition, there are two types to choose from: Text Field or Vocabulary Field. To do so, click ‘Add field’ below the appropriate subsection in the left panel.

Data extraction form creator with subsection added to the Outcomes section

You can find more information about managing fields in the two sections above - 'Fields management - Adding new text field' and 'Fields management - Adding vocabulary field'.

Subsection ‘For comparisons’

To add a new subsection for comparing values between groups or arms, after selecting the 'For comparisons' option from the list of available subsection types, you need to select the section with arms - for example, if treatment arms are extracted in the 'Treatment' section, and you need a field to compare these arms, select the ‘Treatment’ section among others.

After selecting the section, click on 'Add Subsection'.

Adding new 'For comparison' subsection

In the dashboard for building the Data extraction form you will see an added subsection below the main section. ‘Comparisons’ means that this subsection contains fields that enable comparison between study arms.

If you have more than two study arms, you can create a few comparisons with different treatments in the Data extraction form.
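With more than two study arms, the number of possible comparisons grows quickly. The sketch below lists every pair of arms using Python's itertools.combinations. It assumes that a comparison is an unordered pair of two arms, which this article does not state explicitly; the function name and arm names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def comparison_pairs(arms: List[str]) -> List[Tuple[str, str]]:
    """List every unordered pair of study arms that could be compared."""
    return list(combinations(arms, 2))


arms = ["Drug A", "Drug B", "Placebo"]
pairs = comparison_pairs(arms)
# Three arms give three pairwise comparisons:
# ("Drug A", "Drug B"), ("Drug A", "Placebo"), ("Drug B", "Placebo")
```

Four arms would give six pairs and five arms ten, so it is worth deciding in advance which comparisons actually need to be extracted.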

You can find more information about managing fields in the two sections above - 'Fields management - Adding new text field' and 'Fields management - Adding vocabulary field'.
