Laser AI Data Extraction: Moving Beyond Spreadsheets with a Database Approach

Traditionally, systematic review data extraction relies on spreadsheets, a familiar format that nonetheless struggles with complex research data. Laser AI addresses this issue by organizing data extraction forms as a database, which enables advanced AI features and streamlines workflows. This article explains the concept and structure of Laser AI's database-based organization of data extraction fields and highlights the benefits of this format compared with traditional spreadsheet methods.

1. Enabling Powerful AI Assistance

The database structure is fundamental to Laser AI's AI assistance. Laser AI uses trained, specialised AI models, each dedicated to a specific extraction task. To operate effectively, these models require the data to be properly structured.

What can be extracted using Laser AI models?
There are countless concepts that one might wish to extract from a study. For an RCT study, for example, these concepts may range from the selection criteria for eligible patients and their detailed baseline characteristics to the methodological details of the study design and execution, details on intervention and study arms, through to the data on the outcomes and endpoints measured, up to the final study conclusions.

Even within a single therapeutic area, many types of outcomes can be measured. Therefore, to cover this breadth of possibilities while still providing dedicated, pre-trained models, the data extraction form is structured as shown in the following example.

Example: In a traditional spreadsheet extraction form, if one were interested in assessing the impact of a drug on blood pressure, the outcome of interest would be called “Blood Pressure”. Since such an outcome may be reported in several ways, e.g. as an absolute value of the blood pressure, a mean blood pressure, a blood pressure change, or a range of blood pressures measured within study arms, each of these options would probably need a separate column. Laser AI does not have a dedicated model for blood pressure; instead, we provide a few collaborating models which recognise:

Outcome name (text field, here: Systolic Blood Pressure)
Outcome measure (field with pre-defined values, such as mean, median, n, %)
Outcome variance type (field with pre-defined values, such as 95% CI, range, SD, SE)
Outcome value (text field)
Outcome variance (text field)
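
To make the idea more concrete, the five fields above can be pictured as one record type that is reused for every outcome, however it is reported. The following is a minimal illustrative sketch, not Laser AI's actual schema: the class name, attribute names, and the numeric values are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Pre-defined value lists, taken from the examples given in the text.
# The real lists in Laser AI may differ (assumption).
MEASURES = {"mean", "median", "n", "%"}
VARIANCE_TYPES = {"95% CI", "range", "SD", "SE"}


@dataclass(frozen=True)
class OutcomeRecord:
    """One reported outcome, described by the five fields listed above."""

    name: str                             # text field, e.g. "Systolic Blood Pressure"
    measure: str                          # must be one of MEASURES
    value: str                            # text field, kept as reported
    variance_type: Optional[str] = None   # one of VARIANCE_TYPES, if reported
    variance: Optional[str] = None        # text field, if reported

    def __post_init__(self):
        # Reject values outside the pre-defined lists.
        if self.measure not in MEASURES:
            raise ValueError(f"unknown measure: {self.measure!r}")
        if self.variance_type is not None and self.variance_type not in VARIANCE_TYPES:
            raise ValueError(f"unknown variance type: {self.variance_type!r}")


# Two reporting styles of the same outcome fit the same five fields,
# so no extra spreadsheet column is needed. The numbers are invented.
records = [
    OutcomeRecord("Systolic Blood Pressure", "mean", "128.4", "SD", "11.2"),
    OutcomeRecord("Systolic Blood Pressure", "median", "130", "range", "110-150"),
]

for record in records:
    print(record)
```

In a spreadsheet, each of these reporting styles would typically need its own column; here the reporting style is itself a piece of data.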

We are continuously working on developing new and improving existing models. The complete list of currently available models can be found here.

2. AI-supported extraction for tables

Laser AI recognizes tables with data (e.g. outcomes or baseline characteristics) and can extract all their components: rows, columns and values. However, this is only possible when the data extraction form has correctly structured corresponding fields. Matching the columns and rows of the table to the data extraction form enables the simultaneous extraction of information from the entire table. As in the example described above, the model can extract the following:

Baseline characteristics name (e.g. age)
Baseline characteristics value type (e.g. mean, median, n, %)
Baseline characteristics variance type (e.g. 95% CI, range, SD, SE)
Baseline characteristics value (e.g. 59 years)
Baseline characteristics variance (e.g. ± 4 years)

Instructions on how to perform an AI-supported extraction from a table are given here.
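
As a rough illustration of why matching rows and columns to form fields allows a whole table to be captured at once, the sketch below turns every cell of a small table into one record with the five baseline-characteristics fields. It is a simplified assumption of how such a mapping could look, not a description of Laser AI's models: the table layout, the `table_to_records` helper, the arm names, and all numbers other than the age example are hypothetical, and the step that recognises the value type and variance type of each row is taken as already done.

```python
import re

# A cell such as "59 (4)" holds a value followed by its variance in parentheses.
CELL = re.compile(r"^\s*(?P<value>[^()]+?)\s*\((?P<variance>[^()]+)\)\s*$")

# Hypothetical table: columns are study arms, rows are baseline characteristics.
table = {
    "columns": ["Drug A", "Placebo"],
    "rows": [
        {"name": "age", "value_type": "mean", "variance_type": "SD",
         "cells": ["59 (4)", "61 (5)"]},
        {"name": "female", "value_type": "%", "variance_type": None,
         "cells": ["48", "52"]},
    ],
}


def table_to_records(table):
    """Create one record per table cell (row x column)."""
    records = []
    for row in table["rows"]:
        for arm, cell in zip(table["columns"], row["cells"]):
            match = CELL.match(cell)
            if match:
                value, variance = match["value"], match["variance"]
            else:
                # No parentheses: the cell holds only a value.
                value, variance = cell.strip(), None
            records.append({
                "arm": arm,
                "name": row["name"],
                "value_type": row["value_type"],
                "variance_type": row["variance_type"],
                "value": value,
                "variance": variance,
            })
    return records


records = table_to_records(table)
for record in records:
    print(record)
```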

3. Optimised Data Extraction Forms

Laser AI optimises data extraction forms by using repeating groups of core fields (sections and subsections), which users duplicate dynamically for each specific characteristic encountered. Thanks to this structure, there is no need to create long forms full of fields that stay empty because the data are not reported. Instead, the form contains only data relevant to the given study, which makes it easier to get an overview of the entire data extraction form and facilitates further statistical analysis. For example, if a project aims to extract information on four baseline characteristics (age, sex, height, and weight), the group of fields related to baseline characteristics would be duplicated four times if all four are reported in a study. However, if information on height is missing in a given study, the user would duplicate the group only three times, directly reflecting the available data.

This method allows capturing diverse data points within a compact, vertical structure, resulting in significantly shorter and more intuitive forms compared to a long horizontal list of columns in a spreadsheet.
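
The difference between the two layouts can be sketched in a few lines. The example below follows the scenario described above (four characteristics of interest, height not reported) and compares repeating groups with a single wide spreadsheet row. The field names, helper functions, and the sex and weight values are assumptions chosen for illustration only.

```python
# Core fields of one repeating group (hypothetical names).
CORE_FIELDS = ("value_type", "variance_type", "value", "variance")

# Characteristics the project wants to extract.
project_characteristics = ["age", "sex", "height", "weight"]

# What one study actually reports; height is missing. Values are illustrative.
study_reported = {
    "age": {"value_type": "mean", "variance_type": "SD",
            "value": "59 years", "variance": "4 years"},
    "sex": {"value_type": "%", "value": "48"},
    "weight": {"value_type": "mean", "variance_type": "SD",
               "value": "82 kg", "variance": "9 kg"},
}


def build_groups(characteristics, reported):
    """Vertical layout: one group of core fields per reported characteristic."""
    groups = []
    for name in characteristics:
        if name not in reported:
            continue  # nothing reported, so no group is created
        group = {"name": name}
        for field in CORE_FIELDS:
            group[field] = reported[name].get(field)
        groups.append(group)
    return groups


def build_wide_row(characteristics, reported):
    """Horizontal layout: one column per characteristic and field, filled or not."""
    return {
        f"{name}_{field}": reported.get(name, {}).get(field)
        for name in characteristics
        for field in CORE_FIELDS
    }


groups = build_groups(project_characteristics, study_reported)
wide_row = build_wide_row(project_characteristics, study_reported)

print(len(groups), "groups in the vertical form")
print(len(wide_row), "columns in the wide row")
```

The wide row keeps every height column even though the study reports nothing for it, while the list of groups simply has no height entry.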

4. Seamless Data Transfer for Meta-analysis

Another advantage of Laser AI's database structure is that it organizes extracted data in a format inherently ready for analysis, allowing for direct querying and export. This enables seamless data transfer into statistical software commonly used for meta-analysis, such as R or similar tools. By contrast, data collected in traditional spreadsheets typically require significant manual cleaning and reformatting before they can be used effectively for statistical analysis.
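
To illustrate what "direct querying and export" can mean in practice, the sketch below stores a few outcome records in an in-memory SQLite database, selects the rows suitable for pooling, and writes them as CSV text that a tool such as R could read. This is a generic illustration under stated assumptions, not Laser AI's export mechanism: the table name, column names, and numeric storage of values are hypothetical, and all numbers are invented.

```python
import csv
import io
import sqlite3

# Illustrative records: (study, arm, name, measure, value, variance_type, variance)
rows = [
    ("Study 1", "Drug A", "Systolic Blood Pressure", "mean", 128.4, "SD", 11.2),
    ("Study 1", "Placebo", "Systolic Blood Pressure", "mean", 135.0, "SD", 12.1),
    ("Study 2", "Drug A", "Systolic Blood Pressure", "median", 130.0, "range", None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE outcome (
        study TEXT, arm TEXT, name TEXT, measure TEXT,
        value REAL, variance_type TEXT, variance REAL
    )
    """
)
conn.executemany("INSERT INTO outcome VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Because the reporting style is stored as data, one query can pick
# exactly the rows reported as mean with SD.
cursor = conn.execute(
    """
    SELECT study, arm, value AS mean, variance AS sd
    FROM outcome
    WHERE name = ? AND measure = 'mean' AND variance_type = 'SD'
    ORDER BY study, arm
    """,
    ("Systolic Blood Pressure",),
)
header = [column[0] for column in cursor.description]
selected = cursor.fetchall()
conn.close()

# Write the result as CSV text; a file could be used instead of StringIO.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(selected)
csv_text = buffer.getvalue()

print(csv_text)
```

No manual reshaping is needed between the query and the export, because each record already carries its own outcome name, measure, and variance type.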

5. Coherence and Reusability

Laser AI's database structure ensures a consistent schema across all projects, which greatly facilitates the creation of new data extraction forms and enables easy reuse of existing ones. This consistency also makes it significantly easier to compare data systematically between different projects or aggregate data across the entire organization. The result is more standardized, coherent, and ultimately more manageable data company-wide.

Laser AI's database approach to data extraction is a fundamental design choice that moves beyond spreadsheet limitations to enable powerful AI features, streamline data entry, and facilitate robust data analysis and organization-wide coherence. While the underlying database structure represents a shift from traditional methods, its design is fundamentally aimed at making the data extraction process more efficient and ensuring the resulting data is more valuable for analysis and aggregation.
