Cleaning extracted data

Once your team has completed the data extraction tasks, you can start cleaning the extracted data.

To access the Data Cleaning module, go to the Data Extraction stage dashboard, where you will find a dedicated section for data cleaning tasks.

Note: You can start cleaning as soon as at least one study has been sent by the extractors.

Dashboard of the Data Extraction stage with the "Data Cleaning" section highlighted

In the left panel of the Data Cleaning module, you will find all the data extraction fields that were predefined as vocabulary (coded) fields. Note that data cleaning applies only to vocabulary fields.

Once you select a field, the central part of the page shows all values extracted using this vocabulary field across all studies.

Next to some values you will see a yellow pen icon. These values are new suggestions for the existing vocabulary. Values without the icon were selected from the vocabulary used in the given field.

The values are presented in two columns:

Extracted values: the extractors' suggestions, that is, all the values provided by the extractors.
Reported values: the document values, that is, the values stated in the study for the selected field.

Within this panel, action buttons let you do the following:

Merge: You can merge similar extracted values into a single code.
Split: If previously merged values need to be separated, you can split them.
Exclude: If certain extracted values are deemed irrelevant, you can exclude them.

Once you select a value, the right panel offers two options:

Assign Extracted Values: You can associate extracted values with the appropriate code from the controlled vocabulary. If an extractor made an incorrect decision while selecting a code from the list, you can select the relevant code that already exists in the vocabulary.
Create New Code: If a code does not exist in the controlled vocabulary, you can create a new one (you must select one of the extractor suggestions). Click +Add to additional terms, and the suggested term is added to the current project and becomes visible right away to other researchers during extraction. The new code will be used in the data export for this project and may also be considered as a suggestion by the user responsible for managing organizational vocabularies. To use the code in future projects, navigate to the controlled vocabulary section, where the term will be visible in the ‘Suggestions’ tab.

Data Cleaning module with three panels: a list of vocabulary fields on the left, the selected type of vocabulary field in the middle, and a coding panel on the right.

Merging values

To merge values, follow these steps:

Identify the values on the list that represent the same concepts and tick the checkboxes next to them.
Click the 'Merge' button.

Tip: Highlight the value that best names the concept, because it will be used as the new code name. For example, you may highlight the value that is most commonly reported.
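Conceptually, a merge groups several extracted values under one code that takes its name from the highlighted value. The Python sketch below illustrates that idea only. The function `merge_values` and the data shapes are hypothetical and are not part of the product. Falling back to the most commonly reported value when nothing is highlighted is an assumption made for this example.

```python
from collections import Counter


def merge_values(selected, highlighted=None):
    """Group the selected extracted values under a single code name.

    Hypothetical illustration, not the product's implementation.
    Assumption: if no value is highlighted, the most commonly
    reported value is used as the code name.
    """
    if not selected:
        raise ValueError("select at least one value to merge")
    if highlighted is None:
        highlighted = Counter(selected).most_common(1)[0][0]
    elif highlighted not in selected:
        raise ValueError("the highlighted value must be one of the selected values")
    # Every distinct selected value now maps to the same code name.
    return {value: highlighted for value in dict.fromkeys(selected)}


# Three spellings of the same concept, one entry per study.
selected = ["RCT", "randomised controlled trial", "RCT", "randomized trial"]
mapping = merge_values(selected)
print(mapping["randomized trial"])
```

Passing `highlighted="randomised controlled trial"` would name the merged code after that value instead.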

Data Cleaning module displaying the "Merge" action for selected terms in the central part of the module.

Data Cleaning module displaying the merged codes in the central section

Coding

Once you merge similar values into one code, you need to decide which code best represents the extracted data.

If an extractor made an incorrect decision while selecting a code from the list, you can choose the relevant code that already exists in the vocabulary. To code values, simply select the appropriate code from the vocabulary list.
If a code does not exist in the controlled vocabulary, you can create a new code (you must either select one of the extractor suggestions in the central panel or create a new term in the right panel). This new code will be used in the data export for this project.

Additionally, the new code can serve as a suggestion for the user responsible for organizational vocabularies (click ‘Add to additional terms’). If accepted, it can be used in future projects.
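The coding decision can be pictured as a simple lookup: use the code if the controlled vocabulary already contains it, otherwise keep it as an additional term for the current project. The sketch below is a rough illustration under that reading. `assign_code`, `vocabulary`, and `additional_terms` are hypothetical names, not product features or API calls.

```python
def assign_code(code_name, vocabulary, additional_terms):
    """Report where the code for a merged group of values comes from.

    Hypothetical illustration, not the product's implementation.
    vocabulary: set of codes in the controlled vocabulary.
    additional_terms: set of project-level terms, updated in place
    when a new code is created.
    """
    if code_name in vocabulary:
        return "existing code"
    # Not in the controlled vocabulary: keep it as a project-level term.
    additional_terms.add(code_name)
    return "new additional term"


vocabulary = {"RCT", "cohort study"}
additional_terms = set()
first = assign_code("RCT", vocabulary, additional_terms)
second = assign_code("case series", vocabulary, additional_terms)
print(first, "|", second, "|", sorted(additional_terms))
```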

Data Cleaning module showing how merged codes are added to vocabulary terms in the right panel.

Once you have added an additional vocabulary term during data cleaning, you can see this term in the vocabulary of the data extraction form created for this project. It is visible when you expand the list of vocabulary terms and also in the ‘Additional Terms’ tab. Extractors will have access to this new term in focus mode.

Data Extraction facilitator showing vocabulary fields with a new code added during the data cleaning process.
