From Data Chaos to Clarity: Smarter Extraction Starts with the Data Extraction Form Creator

Building a Data Extraction Form (DEF) is more than just listing variables—it's the backbone of any systematic review, especially when you're navigating complex studies, multiple interventions, or layered outcomes. 


The 10-step framework by Afifi et al. (2023) breaks this process down into three phases: Database Planning, Database Building, and Data Manipulation.


 


But even the best framework needs the right tools. That’s where Laser AI comes in.


Laser AI doesn't just support data extraction; it transforms it. From defining what matters, to connecting extracted values, to ensuring clean, structured outputs, we make every step smarter, faster, and more error-resistant.


Importantly, Laser AI fully supports this process, not by treating the data extraction form as a flat checklist, but by structuring it as a relational framework. Each form is built using a database-like format, where data is organized into Tabs, Sections, Subsections, and Fields, mirroring a relational database structure. This allows for both single-entry fields (e.g., study-level data) and repeating entities (e.g., multiple arms, outcomes, or time points), enabling scalable and analysis-ready data capture.










In this guide, we’ll walk you through each stage of DEF development—and show you how Laser AI empowers you to do it better.


Let’s get started.


Step 1: Determine Data Items and Define What to Extract


The first step in developing a Data Extraction Form (DEF) is to identify the key data items needed to answer your systematic review’s research question(s). These typically include:


  • Study-level information (e.g., design, setting, funding)
  • Population and intervention details
  • Outcomes of interest
  • Risk of bias assessment 


To define these elements, your team should rely on:

  • Domain knowledge
  • Key example studies
  • Previous relevant systematic reviews


It's also important to decide how you will capture risk of bias assessment data, either directly within the DEF or using external tools such as RoB 2 (for RCTs).

Support from Laser AI
Our team is currently developing an AI-supported protocol generator that will help streamline this entire process—from defining objectives to generating the initial DEF draft.


In the meantime, you can explore our library of Ready-to-Use Data Extraction Forms, specifically designed to support:

  • Effectiveness reviews (with templates tailored for different intervention types)
  • HEOR and economic evaluations
  • Risk of bias assessments, using commonly accepted tools



All forms can be easily adapted—you can add, remove, or rename fields to match your review’s scope.

Visit our Knowledge Base to view the full list of available templates with descriptions.
Use them as inspiration or directly integrate them into your project.
 



Step 2: Group and Organize Your Data Extraction Concepts
 


Once you’ve identified all the data items you plan to extract, the next step is to group them according to their role and level in the study hierarchy.

Why Grouping Matters


In complex systematic reviews, some data points occur once per study (e.g. study design or funding source), while others may appear multiple times—such as:

  • Several intervention arms
  • Multiple outcomes
  • Studies conducted across different countries


Recognizing these patterns is essential for building a well-structured data extraction form that reflects the actual structure of the evidence.

Think Excel, but Smarter


If you’re familiar with structuring data in Excel, this concept will feel intuitive:
In Excel, you might use separate sheets for different entities and color-code related fields.




In Laser AI, this hierarchy is built directly into the form using a clear and consistent layout:



Tabs → Sections → Subsections → Fields




Tabs = Separate thematic areas or entities (e.g., Study characteristics, Outcomes, Risk of Bias)
Sections = Groups of related fields within each tab (e.g., Population characteristics)
Subsections = Used for repeating data (e.g., country) or outcome data (for both study arms and comparisons)
Fields = Individual data entry points (e.g., study type, sponsor)




This structure lets you separate single-entry data from multi-entry data, ensuring clarity for both extractors and statistical analysts.
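
To make the hierarchy more concrete, here is a minimal sketch in Python of how such a form could be modeled as nested data structures, separating single-entry fields from repeating subsections. The class names and example fields are purely illustrative and do not reflect Laser AI's internal format:

```python
# A conceptual sketch (not Laser AI's data model) of the
# Tab -> Section -> Subsection -> Field hierarchy described above.
from dataclasses import dataclass, field
from typing import List

@dataclass
class FormField:
    name: str                  # e.g. "Study type" or "Sponsor"
    field_type: str = "text"   # text, number, date, vocabulary, ...

@dataclass
class Subsection:
    name: str                  # e.g. "Per arm" or "Per country"
    repeating: bool            # True when the block repeats per arm/outcome/country
    fields: List[FormField] = field(default_factory=list)

@dataclass
class Section:
    name: str                  # e.g. "Population characteristics"
    fields: List[FormField] = field(default_factory=list)        # single-entry data
    subsections: List[Subsection] = field(default_factory=list)  # repeating data

@dataclass
class Tab:
    name: str                  # e.g. "Study characteristics", "Outcomes"
    sections: List[Section] = field(default_factory=list)

# Study-level data is entered once; arm-level data repeats per intervention arm.
study_tab = Tab(
    name="Study characteristics",
    sections=[
        Section(
            name="General information",
            fields=[FormField("Study type"), FormField("Sponsor")],
            subsections=[
                Subsection(
                    name="Per arm",
                    repeating=True,
                    fields=[FormField("Arm name"), FormField("Sample size", "number")],
                )
            ],
        )
    ],
)
```

The point of the sketch is simply that single-entry data lives once at the section level, while anything that can occur several times per study is pushed into a repeating subsection.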

How Laser AI Helps
Laser AI fully supports this hierarchical structure:


  • Easily define which fields should repeat per group (e.g., per arm, per outcome).
  • Avoid data duplication by grouping once-entered variables at the top level.
  • Use subsections for nested data without creating extra forms or files.

Summary
Grouping your data into logical, hierarchical entities:


  • Makes your extraction form easier to navigate
  • Reduces errors during data entry
  • Supports complex study designs with multiple layers of data



Use the Tab → Section → Subsection → Field layout in Laser AI to reflect your review structure clearly and efficiently. 



Click here to learn more about these data extraction form elements.



Step 3: Specify Relationships Among Data Extraction Concepts


Once you’ve listed and grouped your data extraction fields, the next critical step is to define the relationships between them—just as you would when organizing data in a structured Excel workbook.



The Idea Behind Relationships
In many cases, a single data item at a higher level (e.g., a study) corresponds to multiple values at a lower level (e.g., arms, outcomes, countries). This is a classic one-to-many (1:M) relationship, and it needs to be made explicit in your data extraction structure.
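
For illustration, here is a rough sketch of the 1:M idea using plain Python dictionaries. The identifiers and field names below are hypothetical and are not Laser AI's export schema:

```python
# One study -> many arms -> many outcome records.
study = {"study_id": "STUDY-001", "design": "RCT", "funding": "Public"}

# Each arm points back to its parent study instead of repeating study details.
arms = [
    {"arm_id": "A1", "study_id": "STUDY-001", "label": "Treatment A", "n": 120},
    {"arm_id": "A2", "study_id": "STUDY-001", "label": "Placebo", "n": 118},
]

# Outcome records reference the arm, so study- and arm-level data are stored
# once and linked, never re-typed for every outcome row.
outcomes = [
    {"arm_id": "A1", "outcome": "Pain at 12 weeks", "mean": 3.1, "sd": 1.2},
    {"arm_id": "A2", "outcome": "Pain at 12 weeks", "mean": 4.4, "sd": 1.5},
]
```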

How Laser AI Supports This

In Laser AI, relationships between data fields are built directly into the Data Extraction Form (DEF):


  • You can design connections between sections during form creation.
  • Instead of duplicating data across columns (as you might in Excel), you define logical links—saving time and reducing errors.
  • This setup ensures that the same data only needs to be extracted once, even when it's referenced across multiple parts of a study.



Examples of Relationships in Practice Using Subsections

  • Baseline characteristics: When extracting baseline data (e.g., age, sex, comorbidities), you need to connect these fields to specific study arms. In Laser AI, this is done using the "Per Arm" subsection, which lets you extract data for each group within the same structured form. See example.
  • Outcome comparisons: For comparative outcome data (e.g., treatment A vs. B), you can use the "Comparison" subsection, which connects values across groups for direct analysis.






Other Connection Types
Laser AI supports different connection logic:


  • Autoconnect – Laser AI links relevant fields automatically (e.g., all sections from other tabs are linked with the first section in the first tab).
  • Connect: Many – Link a field to multiple other fields (e.g., one RoB domain tied to several outcomes).
  • Connect: Only one – Link a field to a single relevant value (e.g., analysis type per outcome).



These options allow you to model complex relationships between data elements accurately and efficiently.
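
As a rough illustration of what a "many" connection means structurally, the sketch below represents it as a simple link table tying one risk-of-bias record to several outcomes. The identifiers are invented and say nothing about how Laser AI stores connections internally:

```python
# Illustrative only: a link table expressing a Connect: Many relationship.
rob_assessments = [
    {"rob_id": "ROB1", "domain": "Missing outcome data", "judgement": "Some concerns"},
]

outcome_records = [
    {"outcome_id": "O1", "name": "Pain at 12 weeks"},
    {"outcome_id": "O2", "name": "Function at 12 weeks"},
]

# One RoB record linked to many outcomes; an "Only one"-style connection would
# simply restrict each outcome to a single linked value instead.
links = [
    {"rob_id": "ROB1", "outcome_id": "O1"},
    {"rob_id": "ROB1", "outcome_id": "O2"},
]
```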


Summary

By specifying relationships between fields in your DEF:

  • You eliminate unnecessary duplication.
  • You create a structured, relational model that mirrors how real-world study data are reported.
  • You enable cleaner, more reliable datasets—essential for high-quality analysis.



Want to go deeper? Learn more in our guide on 
Connecting Fields and Using Subsections in Laser AI



Step 4: Develop a Data Dictionary and Vocabulary



A well-structured data dictionary is a key part of a high-quality data extraction form. In addition to defining variable names, labels, types, formats, and any special requirements, Laser AI emphasizes the use of controlled vocabularies to ensure consistency and accuracy during data extraction and analysis.



Why Building Vocabularies Matters
Using standardized vocabularies improves both the quality and usability of extracted data:


  • When extractors use consistent codes for the same concepts, even if the wording differs slightly, data analysis becomes significantly easier.
  • Controlled vocabularies help minimize discrepancies between reviewers and reduce interpretation errors.
  • They facilitate accurate comparison and aggregation of data across studies.
  • By creating structured, coded datasets, vocabularies streamline the statistical analysis phase.



Conclusion: Investing time in building vocabularies early pays off during analysis. Do it once—save it, and reuse it.
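
As a conceptual illustration of what a controlled vocabulary buys you, the sketch below maps free-text study design labels to a small set of codes. The vocabulary, synonyms, and helper function are made up for this example and are not Laser AI terminology:

```python
# A tiny controlled vocabulary: codes on the left, human-readable labels on the right.
from typing import Optional

STUDY_DESIGN_VOCAB = {
    "RCT": "Randomized controlled trial",
    "NRSI": "Non-randomized study of interventions",
    "COHORT": "Cohort study",
}

# Common free-text variants mapped to the same code, so "randomised controlled
# trial" and "RCT" end up identical in the final dataset.
SYNONYMS = {
    "randomised controlled trial": "RCT",
    "randomized controlled trial": "RCT",
    "non-randomised study": "NRSI",
}

def code_study_design(raw_text: str) -> Optional[str]:
    """Return the vocabulary code for a reported study design, or None if unknown."""
    text = raw_text.strip().lower()
    if text.upper() in STUDY_DESIGN_VOCAB:
        return text.upper()
    return SYNONYMS.get(text)

print(code_study_design("Randomised controlled trial"))  # -> RCT
print(code_study_design("registry study"))               # -> None (needs a decision)
```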



From the Extractor’s Perspective

When a field requires coding, use vocabulary fields instead of free-text:


  • Select the most appropriate term from a predefined list.
  • Vocabularies are prepared and uploaded by a Data Manager or Project Manager.
  • As a researcher, you can suggest new vocabulary terms if you encounter concepts not yet included.




Go deeper



From the Form Builder’s Perspective

Laser AI provides full support for controlled vocabularies:


  • All vocabularies can be saved as organizational assets.
  • They can be modified and reused across multiple projects, saving time and ensuring methodological consistency.


Options:

  • Already have your own vocabulary? Great—just import it directly into Laser AI.
  • No vocabulary yet? No problem: you can use one of our ready-to-use vocabularies tailored for common concepts in clinical and HEOR reviews.
  • Need a simple solution? Use a temporary vocabulary for basic response fields (e.g., Yes/No). These are ideal for one-off use and don’t need to be saved in the organizational library.




Cleaning Extracted Data


Even with vocabularies in place, mistakes can happen—for example:


  • Reviewers may apply incorrect codes.
  • New terms may arise that weren’t in the original vocabulary.


Laser AI provides a dedicated Data Cleaning module for this purpose:

  • Accessible from the Data Extraction Stage dashboard.
  • Allows managers to review, edit, and standardize coded values.
  • Ensures a clean, consistent dataset ready for statistical analysis.
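
For illustration only, the sketch below shows roughly what such a standardization pass does: map known variants to canonical codes and flag anything outside the vocabulary for human review. The column names and fixes are invented for the example and say nothing about how the Data Cleaning module is implemented:

```python
import pandas as pd

VALID_CODES = {"RCT", "NRSI", "COHORT"}
FIXES = {"RTC": "RCT", "Randomized trial": "RCT"}   # known typos / stray free text

extracted = pd.DataFrame({
    "study_id": ["S1", "S2", "S3"],
    "study_design": ["RCT", "RTC", "Registry study"],
})

# Apply the known corrections, then flag anything still outside the vocabulary.
extracted["study_design"] = extracted["study_design"].replace(FIXES)
needs_review = extracted[~extracted["study_design"].isin(VALID_CODES)]
print(needs_review)   # S3 "Registry study" is left for a human decision
```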



It May Look Like a Lot of Work...


  • But it’s not—do it once, save it, and reuse it.
  • Use our ready-made, controlled vocabularies curated by the Laser AI team.
 
  • Save your own vocabularies to the organization library for future projects. 


Step 5: Design the Extraction Form
 


Once your data items are defined and grouped, the next step is to create the data entry forms that your reviewers will use during extraction. In Laser AI, this step is seamlessly integrated with the previous stages—designing the form structure, defining relationships, and building vocabularies.

Smart Features That Make Extraction Easier

Laser AI offers a range of tools that improve the reviewer experience and reduce manual effort:

  • AI model suggestions
Automatically pre-populate fields with suggested values extracted from the article. Reviewers simply accept, reject, or edit.
  • Field-specific instructions
 Add detailed guidance for each field—especially useful for complex data points—so extractors know exactly what to do.


Built-in Quality Control Features
Laser AI helps prevent extraction errors through robust validation options:


  • Input validation
 Define acceptable formats, e.g., restrict a field to numeric values only, or enforce date formats using controlled vocabularies (see the sketch after this list).
  • Required fields and key fields
 Ensure that required information is always extracted.
  • Preview Mode
 Before assigning tasks to reviewers, you can use Preview to test the form with real PDFs. This helps ensure the form setup is complete and logical, and nothing has been missed.
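
As a hedged sketch of what this kind of input validation checks, the helpers below accept only numbers, dates in an agreed format, and non-empty required values. The function names are illustrative and are not Laser AI configuration syntax:

```python
from datetime import datetime

def validate_numeric(value: str) -> bool:
    """Accept only numbers, e.g. for sample sizes or mean values."""
    try:
        float(value)
        return True
    except ValueError:
        return False

def validate_date(value: str, fmt: str = "%Y-%m-%d") -> bool:
    """Accept only dates in the agreed format, e.g. 2023-05-31."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

def validate_required(value: str) -> bool:
    """Required fields must not be left empty."""
    return bool(value and value.strip())

print(validate_numeric("120"), validate_date("2023-05-31"), validate_required(""))
# -> True True False
```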




Best Practice
Your form should follow the flow of information as it appears in source articles, making it intuitive for extractors. Laser AI’s structured layout (Tabs → Sections → Fields) makes it easy to align the form with real-world reporting formats.


Summary


  • AI-powered suggestions to reduce workload
 
  • Clear field-level guidance to reduce confusion

  • Preview mode to test forms before use

  •  Input rules and validation to reduce errors



Together, these features ensure your data entry forms are accurate, user-friendly, and almost review-ready from the start. Why almost? Check the next step.



Step 6: Test and Calibrate the Extraction Form with Your Team

 

Before launching full-scale data extraction, it’s essential to pilot your DEF using a small, purposive sample of studies—ideally ones with varied study designs, outcomes, and reporting styles. This step ensures your form works as intended and that reviewers are aligned in how they extract and interpret data.


How Laser AI Supports Piloting

In Laser AI, you can create a separate project just for piloting or calibration exercises. This allows you to test your form using selected articles and refine it before distributing full extraction tasks.


The pilot helps you:


  • Confirm that reviewers understand the extraction process
  • Identify missing fields, confusing instructions, or mismatched logic
  • Verify the form structure reflects how data is actually reported in studies
  • Detect technical issues such as incorrect validation rules or missing codes



Key Considerations During Piloting



Validate AI Model Suggestions 


If your form uses AI model suggestions (from text), verify that:

  • The model is activated for relevant fields
  • The suggestions are accurate and meaningful for your sample studies
  • Model coverage is appropriate—some models may not perform well on certain studies

If the model underperforms or causes confusion, it may be better to switch the field to manual extraction for clarity and accuracy.


Test Vocabulary Fields

  • Ensure vocabulary terms are visible, complete, and easy to apply
  • Add or adjust values if needed, especially for complex or domain-specific vocabularies; for large vocabularies, it may be worth adding multilevel vocabulary fields to improve usability for extractors.



Check Semi-Automated Table Extraction 


  • Make sure the form structure supports AI-assisted table reading (especially for outcome data and comparisons)
  • Validate that extracted values are linked correctly to the appropriate arms or comparisons (subsections)
  • If layout or data variation across studies is high (e.g., many subgroups or time points in one table), consider using manual extraction instead, or extract the table data in steps. Discuss any difficulties with the team and agree on the approach that will be most suitable for your project.



Use 2–5 studies with different structures or reporting styles. These will help surface hidden issues and ensure your form performs well across a variety of study types.





Step 7: Prepare a Document with Detailed Instructions for the Team if Necessary
 


Effective data extraction relies not only on a well-designed form, but also on clear guidance and proper team alignment.


If your team is:

  • new to Laser AI,
  • handling a complex review, or
  • dealing with confusion identified during the calibration phase,

it is highly recommended to prepare a detailed reviewer manual.



Why Documentation Matters
A good manual ensures:

  • Consistency across reviewers
  • Clear understanding of each field and its purpose
  • Fewer errors and less need for rework

How Laser AI Helps
If you're using a ready-to-use data extraction template from the Laser AI Knowledge Base, you'll find practical examples with such instructions. These can be:

  • Used as-is
  • Modified and expanded to reflect the specific needs of your review

This allows you to avoid starting from scratch and quickly tailor training materials to your project.

Summary


  • Provide detailed documentation with examples
 
  • Use templates to save time and ensure best practices
 
  • Train the full team using diverse sample studies


A well-trained team + a clear manual = reliable, high-quality data extraction


 



Step 8: Review, Export, and Organize Extracted Data
 


Once data extraction is complete, Laser AI provides flexible and structured tools to export, review, and finalize your dataset.

As reviewers complete extraction study by study, all data is automatically compiled and available in the Extraction Summary module—your centralized view of all collected data.


After piloting, inspect the Extraction Summary:

  • Is the data well-organized and consistently labeled?
  • Are vocabulary terms and field types used properly?
  • Does the output appear ready for downstream analysis?


Who Should Be Involved?

Statisticians, to confirm the output is suitable for analysis.

Export Options:
Choose from multiple formats:

  • Statistical export – structured and analysis-ready
  • Simplified export – cleaner for human reading or stakeholder review



Export specific parts of the form, such as:

  • Export one tab
  • Selected sections from one or multiple tabs (customize)
  • Export all




Subfield Control:
You can also decide which subfields to export for each data extraction field:

  • Extracted value (what the reviewer extracted)
  • Author-reported value (verbatim from the article)
  • Comments (supporting notes or clarification)


You can export all three or limit the output to only the fields you need—tailoring the dataset for different audiences (e.g., statisticians, clinical reviewers, or publication teams).
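
To illustrate what tailoring the export to selected subfields means in practice, here is a small sketch assuming a hypothetical record layout with the three subfields named above; it is not the actual export format:

```python
# One extracted field with its three subfields (layout is invented for this example).
records = [
    {
        "field": "Sample size",
        "extracted_value": 120,
        "author_reported_value": "one hundred and twenty participants",
        "comment": "Reported in the flow diagram",
    },
]

def export_subset(rows, subfields):
    """Keep only the requested subfields (plus the field name) for each record."""
    return [{k: r[k] for k in ["field", *subfields]} for r in rows]

# A statistician might only need the coded extracted values:
print(export_subset(records, ["extracted_value"]))
# A clinical reviewer might also want the verbatim text and notes:
print(export_subset(records, ["extracted_value", "author_reported_value", "comment"]))
```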


If you're using the recommended Tab → Section → Subsection → Field structure, your data will be cleanly organized for both targeted export and easy merging.




