Datasaur supports many types of span labeling.
We've set up one of our most popular project types - NER labeling.

To start labeling, click or highlight span of tokens (words) to label them. Feel free to play around with the ML-assisted extension. We also make labeling more efficient by providing keyboard shortcuts

LABELING
Expand/Shrink one step to the left
Expand/Shrink one step to the right
Delete label
EDIT
Undo
Redo
Copy

Explore more about our NLP features

Discover the capabilities of Datasaur for NLP labeling. An advanced NLP data labeling tool crafted to meet the unique requirements of your team. With quality and speed at the core, ready to be customized for your team’s needs.

Datasaur offers robust LLM tools to enhance your models.
Here is a preview of how you can see the completions from your models

You can run the provided prompt and obtain generated completions. Please note that this is only a slight sample of what we can do. For the complete experience of our LLM solutions, please drop us a message to request free LLM access.

Prompt template
Add prompt template
Add your first prompt template here
Add prompt template
Prompt template 1
gpt-4
Your chosen model. Currently you can choose between gpt-4, gpt-3-turbo, and gpt-3-turbo-16k.
A fullscreen view containing prompt templates and advanced settings.
Advanced settings where you can configure the LLM provider and model, randomness (temperature and Top P), and maximum token length.
More options: clone, delete template.
Average processing time: -
System instruction
You are an assistant chatbot that knows anything how-to regarding Datasaur
You can write the description of your model here, for example, "You are an expert AI assistant in international law."
User instruction
Answer firmly based on the knowledge you have. If you don't have the information, tell them honestly and say I will research more.
Here is where you can specify the behavior of your model. For example, "Answer confidently, avoid using the word "think" and similar terms, and if you do not have the information, simply state so."
Prompt
You can create prompts by upload it in CSV format.
Add prompt
Need to add prompt template & prompt to run the prompt
Prompt  8
Delete prompt
So, your LLM Lab is free?
Completion Result
Yes, Datasaur's LLM (Large Language Models) suite is currently free for all users. It's part of our effort to provide an effective tool for fine-tuning the intelligence of large language models. However, if you need access to the LLM Evaluation Project or LLM Ranking Project features and they are not yet available on your account, you can send a request to support@datasaur.ai.
Result from Prompt template 1
|
-
Prompt  7
Delete prompt
What cloud solutions do you have?
Completion Result
Datasaur supports integration with multiple cloud storage providers such as Amazon S3, Google Cloud Storage, and Azure Blob Storage. Here are the detailed guide links for the setup:

1. Amazon S3: https://datasaurai.gitbook.io/datasaur/integrations/external-object-storage/aws-s3

2. Google Cloud Storage: https://datasaurai.gitbook.io/datasaur/integrations/external-object-storage/google-cloud-storage

3. Azure Blob Storage: https://datasaurai.gitbook.io/datasaur/integrations/external-object-storage/azure-blob-storage

By integrating these services, you can create projects using your preferred cloud storage provider. This also allows you to maintain control over access from Datasaur to your object storage.
Result from Prompt template 1
|
-
Prompt  6
Delete prompt
What rules or regulations are you compliant with? Could you explain?
Completion Result
Datasaur complies with stringent security and privacy standards, including SOC 2 Type II and HIPAA.

SOC 2 Type II is a component of the American Institute of CPAs (AICPA)'s Service Organization Control reporting platform. Its goal is to assure the security, availability, processing integrity, confidentiality, and privacy of customer data. The certification demonstrates that an organization has strict internal controls regarding the handling and protection of their clients' data.

HIPAA, or the Health Insurance Portability and Accountability Act, is a US legislation that provides data privacy and security provisions for safeguarding medical information. Being HIPAA compliant means that any personally identifiable information in medical records, conversations between health care professionals, billing information, or any health care paperwork are suitably protected.
Result from Prompt template 1
|
-
Prompt  5
Delete prompt
My data contains a bit of information I do not want to show. How do you handle this?
Completion Result
In Datasaur, you can use the Mask PII (Personally Identifiable Information) setting to anonymize selected information attributes in your project. Here are the steps:
1. Turn on the Mask PII setting, this can be found under project settings.
2. After enabling the Mask PII setting, click on Data Masking Settings to define the masking method and select the specific attributes to mask.
3. In the PII fields table, you can check the boxes next to the information attributes you want to mask.
4. You can also set a regular expression rule to define what entities to mask.
5. There are two ways to mask the information attributes:

Random Character: This replaces the personal information with random characters.

Asterisk: This replaces the personal information with asterisks.

Remember, this masked data will be visible to all team members in the project.

The documentation is available at:https://datasaurai.gitbook.io/datasaur/workspace-management/workspace#mask-personally-identifiable-information-pii.
Result from Prompt template 1
|
-
Prompt  4
Delete prompt
I want to create a model based on what I have annotated. Can you do that?
Completion Result
Yes, Datasaur has a feature called Datasaur Dinamic which allows data labeling to train a model natively. It creates a ready-to-use model directly from Datasaur. You can import your labeled or unlabeled dataset into Datasaur, review it, and then use one of the supported providers for training. Currently, Datasaur supports integration with Hugging Face and Amazon Sagemaker.

Here are the steps to create a model:

1. Import your dataset into a Datasaur project and start labeling, or upload a pre-labeled dataset.

2. Review and correct any possible mistakes in the labeled data.

3. Go to the 'Assisted Labeling' dropdown menu and select 'Predictive Labeling' then 'Train'.

4. Choose the supported service provider that you prefer (Hugging Face or Amazon Sagemaker).

5. Follow the on-screen instructions related to the chosen provider.

Once the model training is completed, predictions will be displayed and you can accept or reject the labels based on your needs. With Datasaur Dinamic, you can optimize prediction performance and reduce the consumption of annotation time.

For detailed instructions, please refer to the documentation:
https://datasaurai.gitbook.io/datasaur/advanced/extensions/datasaur-dinamic.
Result from Prompt template 1
|
-
Prompt  3
Delete prompt
So, in ML assisted extension, I can use my own model via custom API?
Completion Result
Yes, that's correct. Datasaur supports using your own model via a custom API for machine learning assisted labeling. You will need to provide your API name, API URL, and any necessary credentials. Once it's set up, this model can be used to automatically generate labels for your datasets.
Result from Prompt template 1
|
-
Prompt  2
Delete prompt
I would like to try your new LLM Lab, how to gain access? I can not see it on free trial.
Completion Result
To access the LLM Lab feature, you need to reach out to Datasaur's support team via support@datasaur.ai. This feature is not available to all users by default, and needs to be enabled for your usage. Once your request is received and approved, Datasaur will enable it for your account.
Result from Prompt template 1
|
-
Prompt  1
Delete prompt
What is span labeling?
Completion Result
Span labeling in Datasaur refers to the process of applying labels to specific portions or spans of text in a document. This could be individual tokens (like words or punctuation) or even multi-token entities. This type of labeling is particularly useful for tasks such as Named Entity Recognition (NER) and Part of Speech tagging (POS).

In deep, when users work on a span-based project, they highlight the tokens in a document and assign them labels based on their characteristics or the role they play in the context. The labels are predefined and are part of a label set which is decided before beginning the project.
Result from Prompt template 1
|
-
Add your first prompt here
Add prompt

Explore more about our LLM features

Discover the capabilities of Datasaur for LLM development. An advanced solution for Large Language Model Development built to handle even your most complex requirements, providing a personalized and efficient approach.

Wondering how we can support your use case?
Contact us or schedule a scoping session with our sales and see how Datasaur can be applied to your labeling projects.
contact hero image