Tutorial

Actions: Automating Project Creation

We have released a feature that enables you to automatically create projects from your preferred workflow settings.
Jonathan Cesario
March 28, 2023
March 27, 2023

We have released a new feature called "Actions." This feature automates project creation to make the end-to-end data labeling process even more efficient. In this blog post, we will explore this new feature and provide an example of how it can be utilized in your project creation process.

What is Actions?

For our initial release of Actions, the automated task is creating projects. Creating projects can be a repetitive task, especially if you’re using Datasaur for only one or two workflows.

With Actions, you can tell Datasaur the project settings you would like to automate, including:

  1. Project configuration through a Project Template.
  2. Data preparation from your own bucket with External Object Storage integration.
  3. Additional tags to be added to the project if needed.
  4. Assignment distribution from a pool of labelers and reviewers for each project created.
  5. Conflict resolution settings.

This feature is designed to help users save time with an easy-to-use experience directly on the app, without the need for scripting.

How can I automate projects using Actions?

1. Creating an Action

Creating an Action is a straightforward process. To get started, navigate to the "Automations" tab on the sidebar. From there, click the "New Action" button on the right. For a detailed explanation of the form, click here. By creating a new Action, you specify the project creation process you want to automate.

2. Run and Monitor the Action

Once you have successfully created an Action, a new card representing it will be generated. To run the Action, simply click the "Run" button on the card and wait.


To view more detailed information about the Action Run that you have just triggered, click on "View Run". This will redirect you to an Action Run page where each row in the table represents one run of the Action.

If you wish to see the details of all projects created in an Action Run, click on "View Details". This will redirect you to an Action Run Detail page where each row in the table represents one project created through the Action.

Automation Requirements

We will use a fairly advanced example here to showcase the full capabilities of Actions:

1. We want to automate the creation of Token labeling projects with the following settings:

  • Line separator: new line
  • Tokenizer: white space
  • Use Default NER as the label set
  • Mask PII: on
  • Display labeler names in Review mode: on
  • Add the "Phase 1" tag to each project created

2. We want to store the files used to create projects on AWS S3 in a bucket named datasaur. Let’s say we want to label snippets from popular novels, which can be accessed here.

  • The input folder should be used as the prefix for each uploaded file.
  • The resulting files from the newly created projects should be stored in the output folder.

3. We have 5 reviewers and 12 labelers. Every time a project is created, we want to distribute the work, assigning one reviewer and three labelers to each project. However, since there will be multiple documents in each project, we want to assign only two labelers to each document.

4. For consensus, we want the system to automatically accept a label if both labelers agree.
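
To make requirement 4 concrete, here is a minimal Python sketch of a threshold-based consensus rule. It is only an illustration of the idea, not Datasaur's implementation, which this post does not describe. The function name `resolve_by_consensus`, the labeler names, and the label values are hypothetical placeholders.

```python
from collections import Counter


def resolve_by_consensus(labels_by_labeler, threshold=2):
    """Return the label to accept automatically, or None if no label
    reaches the threshold and a reviewer has to decide.

    labels_by_labeler maps a labeler name to the label that labeler
    applied to a single piece of text.
    """
    if not labels_by_labeler:
        return None
    # Find the most frequently applied label and how many labelers chose it.
    label, votes = Counter(labels_by_labeler.values()).most_common(1)[0]
    return label if votes >= threshold else None


# Two labelers per document and a threshold of 2: both must agree.
agreed = resolve_by_consensus({"labeler-1": "PERSON", "labeler-2": "PERSON"})
conflict = resolve_by_consensus({"labeler-1": "PERSON", "labeler-2": "LOCATION"})

print(agreed)    # PERSON
print(conflict)  # None
```

With two labelers per document, a threshold of 2 means a label is accepted only on full agreement; any disagreement is left for the reviewer.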

Actions Tutorial

  1. Set up an AWS S3 bucket named datasaur.
  2. Prepare the data and upload it to your bucket. Set up two folders: name one input, the other output. Actions interprets the contents of the input folder as follows:
    • Each folder inside the input folder represents a project. The Datasaur project’s name will be based on the folder's name. If there are two folders inside input, the Action will create two projects.
    • Each file inside a project's folder is treated as a document belonging to that particular project.
    • For example, we can create three projects by uploading the sample files, separated into folders according to their publication dates:
      • 19th Century Novels will have Pride and Prejudice, Moby Dick, and Sherlock Holmes:
        • input/19th Century Novels/Pride and Prejudice.txt
        • input/19th Century Novels/Moby Dick.txt
        • input/19th Century Novels/Sherlock Holmes.txt
      • 20th Century Novels will have Harry Potter, The Great Gatsby, and To Kill a Mockingbird:
        • input/20th Century Novels/Harry Potter.txt
        • input/20th Century Novels/The Great Gatsby.txt
        • input/20th Century Novels/To Kill a Mockingbird.txt
      • 21st Century Novels will have The Hunger Games:
        • input/21st Century Novels/The Hunger Games.txt
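
If you want to check your folder layout before uploading, the convention above can be sketched in a few lines of Python. This is a local sanity check only; it does not call S3 or Datasaur. The helper name `group_keys_by_project` is hypothetical, and the sketch assumes that only files exactly one folder below input count as documents, which the post does not state explicitly.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_keys_by_project(keys, input_prefix="input"):
    """Group object keys shaped like <prefix>/<project>/<file> by project folder."""
    projects = defaultdict(list)
    for key in keys:
        parts = PurePosixPath(key).parts
        # Assumption: only keys exactly one folder deep under the prefix count.
        if len(parts) == 3 and parts[0] == input_prefix:
            _, project, filename = parts
            projects[project].append(filename)
    return dict(projects)


keys = [
    "input/19th Century Novels/Pride and Prejudice.txt",
    "input/19th Century Novels/Moby Dick.txt",
    "input/19th Century Novels/Sherlock Holmes.txt",
    "input/20th Century Novels/Harry Potter.txt",
    "input/20th Century Novels/The Great Gatsby.txt",
    "input/20th Century Novels/To Kill a Mockingbird.txt",
    "input/21st Century Novels/The Hunger Games.txt",
]

projects = group_keys_by_project(keys)
for name, documents in projects.items():
    print(f"{name}: {len(documents)} document(s)")
```

Each dictionary key corresponds to one project the Action would create, and each list holds the documents for that project.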

3. In Datasaur, create a Token labeling project using the configuration provided above.

Here is the breakdown of each step.

  • Step 1
    • Upload any file from the sample above.
  • Step 2
    • Make sure the line separator is new line.
    • Make sure the tokenizer is white space.
  • Step 3
    • Use an existing label set and choose Default NER.
  • Step 4
    • Assign anyone. This step won’t affect the Action later, because the Action has its own assignment configuration.
  • Step 5
    • Enable Mask PII.
    • Enable Display labeler names in Review mode.

4. Once the project is created, click the triple-dot menu next to your newly created project. You will see an option to "Save as Template." Select it to save the project's configuration as a Project Template.

5. Go to the Action page and start creating the Action by filling out the wizard form:

  • Step 1
    • Fill out the name.
    • Use the Project Template saved in step 4.
    • Use the External Object Storage from step 1.
    • Fill in the input folder path value with input.
    • Fill in the result folder path value with output.
    • Fill in the tags attribute with Phase 1. Don't forget to press Enter after typing it, since the field accepts multiple tags.
  • Step 2
    • Search for and select all 5 reviewers.
    • Search for and select all 12 labelers.
    • Set the reviewer distribution for each project to 1.
    • Set the labeler distribution for each project to 3.
  • Step 3
    • Set the labeler distribution for each document to 2.
    • Enable peer review consensus and set the threshold to 2, so that a label is accepted automatically when both labelers agree.
  • Step 4
    • Check that the configuration is correct, along with the assignment illustration. Then, create the Action.

6. You will be redirected to the Action page once again, and there will be a new Action card.

7. Click the Run button and wait. Your projects will be created. To see detailed information about the automation process, click on View Run.
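
For intuition about what the assignment settings in the wizard mean for the three sample projects, here is a minimal Python sketch. It assumes a simple round-robin strategy, which is a choice made for illustration; this post does not describe how Datasaur actually picks people from the pool. The function names and the reviewer and labeler names are hypothetical placeholders.

```python
from itertools import cycle, islice


def take(pool, n):
    """Take the next n members from a cycling pool iterator."""
    return list(islice(pool, n))


def assign_projects(project_docs, reviewers, labelers,
                    reviewers_per_project=1, labelers_per_project=3,
                    labelers_per_document=2):
    """Round-robin assignment (an assumed strategy, for illustration only).

    project_docs maps a project name to its list of document names.
    """
    reviewer_pool = cycle(reviewers)
    labeler_pool = cycle(labelers)
    plan = {}
    for project, docs in project_docs.items():
        project_reviewers = take(reviewer_pool, reviewers_per_project)
        project_labelers = take(labeler_pool, labelers_per_project)
        # Rotate through the project's labelers so that every document
        # gets exactly labelers_per_document of them.
        doc_pool = cycle(project_labelers)
        documents = {doc: take(doc_pool, labelers_per_document) for doc in docs}
        plan[project] = {
            "reviewers": project_reviewers,
            "labelers": project_labelers,
            "documents": documents,
        }
    return plan


# Pool sizes match the tutorial: 5 reviewers and 12 labelers.
reviewers = [f"reviewer-{i}" for i in range(1, 6)]
labelers = [f"labeler-{i}" for i in range(1, 13)]

plan = assign_projects(
    {
        "19th Century Novels": ["Pride and Prejudice.txt", "Moby Dick.txt",
                                "Sherlock Holmes.txt"],
        "20th Century Novels": ["Harry Potter.txt", "The Great Gatsby.txt",
                                "To Kill a Mockingbird.txt"],
        "21st Century Novels": ["The Hunger Games.txt"],
    },
    reviewers, labelers,
)

for project, details in plan.items():
    print(project, details["reviewers"], details["labelers"])
```

Under this assumed strategy, every project gets one reviewer and three labelers, and each document inside a project is seen by two of those three labelers.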

Conclusion

The Actions feature in Datasaur is a powerful tool that can streamline your data labeling processes. By automating repetitive tasks, you can save time and focus on the labeling process itself. This is especially helpful for larger datasets that require a significant amount of labeling and need to be supported through a batch process, which can be time-consuming and tedious.

At this time, we are launching Actions for project creation. As a next step, we are working on supporting project export. This means that you will soon be able to export multiple projects in a variety of formats with the click of a button, removing the manual work needed to export the results of your labeling. If this sounds helpful to you, please send us an email at support@datasaur.ai!
