Build a Model using Amazon Comprehend

Build a model with the new integration between Comprehend and Datasaur.
Published on April 28, 2023
April 26, 2023

How does Datasaur help?

At Datasaur, we are committed to making your labeling process as efficient and effective as possible. Success in machine learning also depends on being able to use your labeled data across a variety of platforms, including Amazon Comprehend, the natural language processing (NLP) service from Amazon Web Services (AWS).

In this post, we provide a step-by-step guide to taking your labeled data from Datasaur and training a custom model with Amazon Comprehend. Let's begin by labeling our dataset!

Labeling with Datasaur

  1. Prepare your preferred dataset. We use the following dataset for this tutorial: Dataset link. Shortcut: if you want to skip labeling, you can use this labeled data and jump straight to the last step, step 11.
  2. Log in to your Datasaur account. We recommend using a team workspace.
  3. Go to your team workspace and create a project using the Datasaur project template with the DOC (document) type. This template applies the settings for a classification project automatically.

  4. Upload your data to start the labeling process. Use the dataset linked in step 1.

  5. After uploading your data, you can preview your dataset.

  6. Add a question set to define your labelers' task; the question set defines the goals for your labeling project. Name the question set “Category”, select “Dropdown” in the “Question Type” dropdown, and enter the following labels: Business, Technology, Politics, Sports, and Entertainment.

  7. Assign roles. When you label data as a team, clearly assigned responsibilities keep collaboration efficient. In Datasaur, you can assign each team member as a labeler or a reviewer based on their role in the project.

  8. This is the last step of project creation: the project is now ready to be launched.

  9. You are now ready to annotate the data.
  10. After you finish labeling, export your data in Amazon Comprehend's native format: open the “File” menu, go to the Export submenu, and select “Amazon Comprehend” from the list of options. Datasaur then produces a file in Amazon's format that you can use to train and deploy machine learning models.
  11. Your dataset is now ready for your machine learning project. Comprehend reads training data from Amazon S3, so upload the exported file to an S3 bucket before moving on to the next section.
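Before uploading the export to S3, it can be worth sanity-checking it. The sketch below assumes the exported file is a header-less CSV with the label in the first column and the document text in the second, which is the layout Comprehend uses for single-label (multi-class) training data. The function name `summarize_export` and the sample rows are hypothetical; the label set is the one from the “Category” question set above.

```python
import csv
import io
from collections import Counter

# The five labels configured in the "Category" question set.
EXPECTED_LABELS = {"Business", "Technology", "Politics", "Sports", "Entertainment"}


def summarize_export(content: str) -> Counter:
    """Count documents per label in a header-less "label,text" CSV string.

    Raises ValueError if a row does not have exactly two columns or uses a
    label outside EXPECTED_LABELS.
    """
    counts = Counter()
    for row_no, row in enumerate(csv.reader(io.StringIO(content)), start=1):
        if not row:
            continue  # csv.reader yields [] for blank lines; skip them
        if len(row) != 2:
            raise ValueError(f"row {row_no}: expected 2 columns, got {len(row)}")
        label, _text = row
        if label not in EXPECTED_LABELS:
            raise ValueError(f"row {row_no}: unexpected label {label!r}")
        counts[label] += 1
    return counts


# Hypothetical sample rows; the quoted field contains a comma.
sample = (
    'Sports,"The home team won 3-1, securing the title."\n'
    "Business,Shares rose after the earnings report.\n"
)
print(summarize_export(sample))
```

To check a real export, read the file with `open(path, newline="", encoding="utf-8")` and pass its contents to `summarize_export`; the per-label counts also make class imbalance easy to spot before training.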

Prepare the model endpoint using Amazon Comprehend

In this section, we show you how to train a ready-to-use machine learning model with Amazon Comprehend.

Create Training Process

  1. Go to Amazon Comprehend > Custom classification > Create new model.
  2. Set a model name and the dataset language, and optionally a model version. In this tutorial, we use “v0” as the version name and “English” as the language.

  3. Choose the classifier mode (single-label or multi-label). Because each document in our dataset has exactly one category, we use single-label mode.
  4. Import the training data in CSV format from your S3 bucket: browse the bucket and select the labeled dataset you exported from Datasaur.
  5. Select the source for the test dataset: either autosplit the training dataset or upload a separate test dataset. We use “Autosplit”.

  6. Optionally, select an S3 folder path where the training output is written.
  7. Set the IAM role and any other optional settings. We use an existing IAM role; the role must allow Comprehend to read your training data in S3.
  8. Select Create. Amazon Comprehend then starts training your model.
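If you prefer scripting to the console, the same choices map onto Comprehend's `CreateDocumentClassifier` API. The following is a minimal sketch built around the boto3 Comprehend client's `create_document_classifier` method. The helper names, bucket paths, and role ARN are hypothetical placeholders, and leaving out `TestS3Uri` is our assumption for reproducing the console's “Autosplit” option.

```python
def build_classifier_request(
    name: str,
    version: str,
    train_s3_uri: str,
    output_s3_uri: str,
    role_arn: str,
) -> dict:
    """Keyword arguments for create_document_classifier, mirroring the
    console choices above: English, single-label, CSV data in S3, autosplit."""
    return {
        "DocumentClassifierName": name,
        "VersionName": version,
        "LanguageCode": "en",
        "Mode": "MULTI_CLASS",  # the console's "single-label" mode
        "DataAccessRoleArn": role_arn,  # IAM role that can read the S3 data
        "InputDataConfig": {
            "DataFormat": "COMPREHEND_CSV",
            "S3Uri": train_s3_uri,
            # No TestS3Uri: Comprehend holds out part of the training data.
        },
        "OutputDataConfig": {"S3Uri": output_s3_uri},
    }


def start_training(client, request: dict) -> str:
    """Submit the training job; `client` is a boto3 Comprehend client."""
    response = client.create_document_classifier(**request)
    return response["DocumentClassifierArn"]


# Usage (requires AWS credentials; all names below are hypothetical):
#   import boto3
#   comprehend = boto3.client("comprehend")
#   arn = start_training(comprehend, build_classifier_request(
#       "news-classifier", "v0",
#       "s3://my-bucket/datasaur/train.csv",
#       "s3://my-bucket/datasaur/output/",
#       "arn:aws:iam::123456789012:role/ComprehendDataAccess"))
```

Passing the client in, rather than creating it inside the helper, keeps the request-building logic testable without AWS access.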

Training Status Interface

To check your training status, go to Amazon Comprehend > Custom classification. This page lists all of your models along with their training history.
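Training status can also be polled from a script. This sketch assumes a boto3 Comprehend client and calls `describe_document_classifier` until the job reaches a status we treat as final. The helper name `wait_for_training`, the 60-second interval, and the choice of which statuses count as terminal are our own assumptions.

```python
import time

# Statuses this sketch treats as final (our assumption; the API also
# reports SUBMITTED, TRAINING, DELETING, and STOP_REQUESTED).
TERMINAL_STATUSES = {"TRAINED", "TRAINED_WITH_WARNING", "IN_ERROR", "STOPPED"}


def wait_for_training(client, classifier_arn: str,
                      poll_seconds: float = 60.0, sleep=time.sleep) -> str:
    """Poll the classifier until it reaches a terminal status and return it."""
    while True:
        props = client.describe_document_classifier(
            DocumentClassifierArn=classifier_arn
        )["DocumentClassifierProperties"]
        status = props["Status"]
        print(f"{classifier_arn}: {status}")
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)  # wait before asking again
```

The `sleep` parameter is injectable so the loop can be exercised with a stub client and no real waiting; in normal use you would pass only the client and the ARN returned when the job was created.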

Metric Information

To view metrics for your trained model, navigate to Amazon Comprehend > Custom classification > {MODEL_NAME}. Here, you can review the model's version and its performance metrics in detail. Our trained model achieved an F1 score of 91%.
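The same metrics are also returned by `describe_document_classifier`, under `ClassifierMetadata.EvaluationMetrics` in the response. The sketch below uses a hypothetical helper name and assumes a boto3-style response dictionary; it pulls out the headline numbers shown on the console page.

```python
def evaluation_summary(describe_response: dict) -> dict:
    """Extract headline metrics from a describe_document_classifier response."""
    properties = describe_response["DocumentClassifierProperties"]
    metadata = properties["ClassifierMetadata"]  # populated after training
    metrics = metadata["EvaluationMetrics"]
    return {
        "version": properties.get("VersionName"),
        "test_documents": metadata["NumberOfTestDocuments"],
        "accuracy": metrics["Accuracy"],
        "precision": metrics["Precision"],
        "recall": metrics["Recall"],
        "f1": metrics["F1Score"],
    }


# Usage (requires AWS credentials; the ARN is a placeholder):
#   import boto3
#   response = boto3.client("comprehend").describe_document_classifier(
#       DocumentClassifierArn="<classifier-arn>")
#   print(evaluation_summary(response))
```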


Congratulations! You have trained a model with Amazon Comprehend in just a few minutes. As you continue to use Amazon Comprehend, you will find it a powerful tool for deriving valuable insights from your data. We hope you found this guide informative and enjoyable, and we look forward to helping with your future projects. Thank you for choosing Amazon Comprehend and Datasaur! If you have any questions, please contact us at – we would love to hear from you!
