Datasaur

April Feature Updates: Mixed Labeling, Smarter Search, and More Project Control

We’re back with a fresh batch of updates aimed at making your labeling experience faster, more precise, and easier to manage. From new labeling types to better search functionality and project organization, here’s what’s new in the Datasaur platform.

Datasaur

April 28, 2025

Data Studio

Span + Line Labeling

We’ve introduced a new mixed labeling type that supports labeling spans within a line while also categorizing entire lines separately. This allows you to configure questions for each line and still perform regular span labeling. Perfect for use cases that require dual-level granularity like transcripts, contracts, or forms.

Learn more

Amazon Transcribe for Audio Labeling

You can now select Amazon Transcribe as an Automatic Speech Recognition (ASR) option when labeling audio files. This brings another powerful transcription engine to your toolkit, giving you more flexibility when comparing accuracy and usability between providers.

Learn more

Bottom-Level Label Selection in Span Labeling

To reduce accidental selections and ensure more precise annotations, we’ve added an option to restrict span labels to only bottom-level labels in your hierarchy. You can now prevent the selection of parent labels/categories when more specific labels are intended.

Learn more

Bulk Applying Project Tag

You can now apply tags to multiple projects in bulk directly from the Projects table. Whether you're organizing by client, task type, or deadline, bulk tagging makes streamlines project management.

Search and Show Only Matching Lines

A new setting in the Search extension allows you to filter and display only matching lines. This helps reviewers and labelers focus on the most relevant sections, speeding up review and annotation.

Learn more

LLM Labs

Direct Access LLMs Expansion

We've significantly expanded our Direct Access LLMs lineup, giving you instant access to the latest state-of-the-art models without complex API setup or configuration. Our newest additions include:

Gemma 3 27B, available from Google AI and Hugging Face
Deepseek R1, now available as an alternative option through Amazon Bedrock
GPT 4.1, GPT 4.1. mini, and GPT 4.1. nano, available from both OpenAI and Azure OpenAI
GPT o3 and GPT 4 mini, also available from both OpenAI and Azure OpenAI
Grok 3 from xAI

Automatic Sync for External Object Storage

Managing your knowledge base just got easier with automatic synchronization for External Object Storage. This new feature allows you to schedule periodic syncs between your external storage (AWS S3, Google Cloud Storage, Azure Blob) and LLM Labs. This automation saves valuable time and ensures your RAG applications always have access to the latest data without requiring manual intervention.

Conversational Prompting in Sandbox

Our Sandbox environment now supports conversational prompting, allowing you to replicate multi-turn dialogues and historical conversations within your testing environment. This feature is invaluable for prompt engineers and developers who need to troubleshoot or fine-tune conversational AI applications.

By simulating realistic conversation flows, you can better understand how your models respond to context-aware queries and make more informed adjustments to your prompting strategies. This brings the Sandbox experience closer to real-world chat applications, helping you develop more natural and effective AI interactions.

We’re excited to bring these features to your workflow. As always, your feedback helps shape Datasaur. Let us know what you'd like to see next!

No items found.