NLP Labeling

Bulk Label with the Search Extension

In this feature highlight, we walk through an example use case for the Search Extension: a labeling tool that lets you label in bulk.

Jonathan Bruce

November 9, 2023

Datasaur is built to make manual labeling fast, smooth, and accessible to non-technical users. Built-in hotkeys and keyboard shortcuts speed up individual labeling actions, and beyond those shortcuts, our labeling tools help you handle repetitive labeling tasks with ease.

When specific words or tokens require the same label throughout a dataset, applying that label becomes tedious. For instance, labeling the word "CNN" as "Org" is straightforward. However, if "CNN" appears 250 times in a dataset, the labeler has to repeat the same action 250 times.

Our Search Extension feature resolves this issue by bringing bulk labeling to the manual labeling process. With the Search Extension, we can tell Datasaur to label a specific set of tokens or words everywhere they appear. In the earlier example, we would be able to label all 250 occurrences of "CNN" in a single action.

Let's explore this example together:

By clicking on “Search all files,” we enable a search for each instance across every file in the project, not just the currently viewed file. This allows us to label each instance across all files as well.

In the first dropdown box, we choose whether to search the text of the dataset or the labels that have already been applied. Then, we choose the type of search query: contains any word, exact word, or regex.
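To make the difference between the three search types concrete, here is a minimal Python sketch. It is not Datasaur's implementation: the helper `build_pattern`, the mode names, and the exact matching rules (for example, whether matching is case-sensitive or how "contains any word" splits the query) are assumptions made for illustration only.

```python
import re


def build_pattern(query: str, mode: str) -> "re.Pattern[str]":
    """Hypothetical helper: turn a search query into a compiled regex."""
    if mode == "contains_any_word":
        # Assumed meaning: match any one of the whitespace-separated words,
        # even when it appears inside a longer word.
        words = [re.escape(word) for word in query.split()]
        return re.compile("|".join(words))
    if mode == "exact_word":
        # Assumed meaning: match the query only as a standalone word.
        return re.compile(r"\b" + re.escape(query) + r"\b")
    if mode == "regex":
        # The query is already a regular expression; use it as written.
        return re.compile(query)
    raise ValueError(f"unknown mode: {mode}")


text = "CNN reported that CNNs outperform older models."

exact_hits = [m.group() for m in build_pattern("CNN", "exact_word").finditer(text)]
any_hits = [m.group() for m in build_pattern("CNN models", "contains_any_word").finditer(text)]

print(exact_hits)
print(any_hits)
```

In this sketch, the exact-word search skips "CNNs" because the match must end at a word boundary, while the contains-any-word search also picks up the "CNN" inside "CNNs".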

Note: You can use regex to search for several different words at once. Select "Regex," then separate your keywords or phrases with the pipe character, which means "or" in regex. For example: CNN|fox|dog|tree.
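The alternation pattern from the note can be tried out in Python's standard `re` module, as sketched below. This only illustrates how the pattern itself behaves; the sample sentences are made up, and Datasaur's regex engine may differ from Python's in details such as case sensitivity.

```python
import re

# The pattern from the note: match "CNN", "fox", "dog", or "tree".
pattern = re.compile(r"CNN|fox|dog|tree")

sentence = "CNN aired a story about a fox and a dog under a tree."
matches = [m.group() for m in pattern.finditer(sentence)]
print(matches)

# Plain alternation also matches inside longer words such as "dogma".
inside_words = [m.group() for m in pattern.finditer("dogma treetop")]
print(inside_words)

# Wrapping the alternatives in word boundaries restricts matches to whole words.
whole_words = re.compile(r"\b(?:CNN|fox|dog|tree)\b")
bounded = [m.group() for m in whole_words.finditer("dogma treetop")]
print(bounded)
```

If partial matches such as "dog" in "dogma" would be labeled by mistake, the word-boundary form `\b(?:CNN|fox|dog|tree)\b` is the safer pattern to use.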

Continuing with our example, we searched for “CNN.” In this sample project, the results show a total of 16 instances of "CNN." The list is grouped by the file that contains each instance, and we can use it to auto-scroll to any of them.

We may label every instance in this list by selecting the desired label from the “Label results as” dropdown menu and then clicking “Label all.” At this point, the selected label will be applied to every instance across the project, saving the labeler a significant amount of time.
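Conceptually, "Search all files" followed by "Label all" amounts to finding every match in every file and attaching the same label to each one. The Python sketch below models that idea under stated assumptions: `LabeledSpan`, `label_all`, and the sample file names and contents are hypothetical and do not represent Datasaur's API or data model.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LabeledSpan:
    """Hypothetical record of one labeled match."""
    file_name: str
    start: int  # character offset where the match begins
    end: int    # character offset just past the match
    text: str
    label: str


def label_all(files: Dict[str, str], pattern: str, label: str) -> List[LabeledSpan]:
    """Apply one label to every regex match in every file."""
    compiled = re.compile(pattern)
    spans = []
    for file_name, content in files.items():
        for match in compiled.finditer(content):
            spans.append(
                LabeledSpan(file_name, match.start(), match.end(), match.group(), label)
            )
    return spans


# Made-up sample project with two files.
files = {
    "news_01.txt": "CNN reported the outage. Later, CNN issued a correction.",
    "news_02.txt": "The anchor left CNN in 2019.",
}

spans = label_all(files, r"\bCNN\b", "Org")
per_file = Counter(span.file_name for span in spans)

print(len(spans))
print(dict(per_file))
```

Grouping the spans by file name mirrors the way the results list is divided by file, and one call labels every match regardless of how many files the project contains.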

Imagine a project with 400 files, each containing 3 to 10 instances of your search query: that is between 1,200 and 4,000 tokens labeled in a single action. Herein lies the power of the Search Extension.

In conclusion, Datasaur's Search Extension makes manual labeling faster and less repetitive. Bulk labeling lets users apply a label to every matching instance across a project with a few clicks instead of repeating the same action by hand. Whether you are searching for a single word or several terms at once, the available search types, including regex, let you target exactly the text you want to label. Applying one label to every match also helps keep annotations consistent across the project. This reflects Datasaur's commitment to providing a user-friendly, efficient, and reliable data annotation tool for individuals and teams looking to optimize their data labeling projects.
