The most advanced NLP data labeling tool now partnering with AWS

The most advanced NLP data labeling tool has partnered with AWS, combining cutting-edge technology with the power and scalability of the AWS cloud to revolutionize NLP projects and accelerate machine learning model training.
AWS x Datasaur
The most robust NLP labeling & LLM platform choice
for cutting-edge organizations around the world.
Datasaur is designed to integrate seamlessly with your existing AWS services, providing a flexible and feature-rich solution for all your data labeling needs. You can also host your own instance of Datasaur on AWS and customize the server specs to meet the specific needs of your organization. This includes the ability to scale up or down as your data labeling needs change, as well as take advantage of the security and reliability features that come with AWS.

Featured Integrations

  1. Integrate your own S3 buckets to transfer data seamlessly to Datasaur projects. This lets you read data from and write data to your bucket directly while using the app.
  2. Integrate results from Amazon Textract directly as OCR annotations when creating projects in Datasaur.
  3. Focus on labeling on the Datasaur platform, and use our SageMaker integration to automatically train an NLP model in minutes.
  4. Alternatively, export the annotation results in a format that is compatible with Amazon Comprehend.
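
To give a sense of what a Comprehend-compatible export can look like, here is a minimal sketch in Python. It writes span labels to the annotations CSV layout that Amazon Comprehend custom entity recognition accepts (columns File, Line, Begin Offset, End Offset, Type). The SpanLabel class and the to_comprehend_csv function are hypothetical names used only for illustration. They are not Datasaur's export schema or API, and the offset conventions (zero-based line numbers, offsets relative to the start of the line) are stated here as an assumption that should be checked against the Comprehend documentation.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass


@dataclass
class SpanLabel:
    """Hypothetical span label for illustration; not Datasaur's export schema."""
    line: int    # zero-based line number in the training document (assumed)
    start: int   # character offset where the entity begins, relative to the line
    end: int     # character offset where the entity ends
    label: str   # entity type, for example "PERSON"


# Column names of the Comprehend entity annotations CSV.
COMPREHEND_HEADER = ["File", "Line", "Begin Offset", "End Offset", "Type"]


def to_comprehend_csv(document_name: str, labels: list[SpanLabel]) -> str:
    """Serialize span labels as annotations CSV text, sorted by position."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COMPREHEND_HEADER)
    for span in sorted(labels, key=lambda s: (s.line, s.start)):
        if span.end <= span.start:
            raise ValueError(f"empty or inverted span: {span}")
        writer.writerow([document_name, span.line, span.start, span.end, span.label])
    return buffer.getvalue()


if __name__ == "__main__":
    example = [
        SpanLabel(line=1, start=0, end=6, label="ORG"),
        SpanLabel(line=0, start=5, end=9, label="PERSON"),
    ]
    print(to_comprehend_csv("docs.txt", example))
```

The resulting text can be saved next to the training documents in an S3 bucket. In practice the platform's own export option handles this step, so the sketch is only meant to show the shape of the target format.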

Self-Hosted with AWS

Datasaur can also be deployed as a self-hosted solution on AWS. This gives you complete control over your data labeling environment and lets you comply with any specific security or compliance requirements. The self-hosted application uses the AWS services below as its building blocks.

Amazon Elastic Kubernetes Service (EKS)

The foundation of the solution. A managed service makes it much easier to configure a Kubernetes cluster.

Amazon Relational Database Service (RDS)

Database service of choice, so you never have to worry about resilience, quality, or backups.

Amazon MQ

RabbitMQ service to handle long-running jobs with complex processes.

Amazon ElastiCache

Redis service to handle sessions.

Amazon S3

Object storage service that will be used to store labeling data.

Amazon SageMaker

Machine learning platform that supports Datasaur Assist and ML-Assisted Labeling features.

Amazon Simple Email Service (SES)

Email platform used to send notifications and messages for Datasaur users.

Amazon Textract

Automatically extracts printed text, handwriting, and data from any document.

Amazon Bedrock

AWS service for building and scaling generative AI applications.

We [Consensus] had a very complex and specific set of annotation needs. Datasaur was able to address those needs efficiently and effectively.

Eric Olson, Co-founder and CEO, Consensus

Information labeling tasks have been reduced by 80%, which has allowed us to optimize our workflow much more and focus on other areas that are also priorities for us.

Product Manager, LegalTech

"We looked at Prodigy, LightTag, LabelBox, Scale and more. You really can't beat Datasaur for their suite of features and price point."

Director of Data Science, Financial Institution