Want to Use ChatGPT at Work But Can't

Explore the various ways your company's data can be exposed through the use of LLMs and, more importantly, what you can do to protect it.
by Datasaur on June 18, 2025


A founder recently had a startling experience. He asked a popular AI chatbot how a competitor might structure a legal contract with a major restaurant chain. To his astonishment, the AI didn't just provide a generic template; it returned what appeared to be the precise contract, complete with accurate unit pricing and specific signatories. This incident is a stark reminder that as we increasingly turn to Large Language Models (LLMs) like ChatGPT to answer a vast range of questions, the risk of sensitive data leakage is becoming a significant concern.

The issue of data privacy with AI is so critical that it's already making its way into the courtroom. In a recent development, a court ordered OpenAI to retain even temporary and deleted user conversations as part of a lawsuit, highlighting the potential for your data to be preserved in ways you may not expect.

In this post, we'll explore the various ways your company's data can be exposed through the use of LLMs and, more importantly, what you can do to protect it.

Your Data as a Training Ground: The Proprietary Model Dilemma

When your employees use free or consumer-grade versions of proprietary LLMs, there's a significant chance that the information they input is being used to train the model. This means your confidential strategies, code, or customer data could be absorbed into the model's knowledge base and potentially be surfaced in response to another user's query.

Solution: Go Pro or Go Open

The most direct way to mitigate this is to upgrade to an enterprise-tier subscription with the model provider. These premium, and often pricey, services typically come with contractual guarantees that your data will not be used for training and offer more robust security features.

Alternatively, for organizations that require complete control over their data, investing in open-source LLMs is a compelling option. By hosting these models on your own infrastructure, you can ensure that your sensitive information never leaves your environment, providing the highest level of data sovereignty.
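If you go the self-hosted route, prompts and responses never leave your network. As a rough sketch, assume an open-source model is served behind an OpenAI-compatible endpoint on your own hardware (vLLM and Ollama both expose one); the endpoint URL, model name, and prompt below are placeholders, not a prescribed setup.

```python
from openai import OpenAI

# Point the client at a model served on your own infrastructure.
# Nothing in this request path touches a third-party provider.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")  # placeholder local endpoint

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # any locally hosted open model
    messages=[
        {"role": "user", "content": "Summarize our Q3 pricing strategy in three bullet points."},
    ],
)
print(response.choices[0].message.content)
```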

The Internal Threat: When Knowledge Sharing Goes Wrong

LLMs can be connected to internal knowledge bases through popular techniques like Retrieval-Augmented Generation (RAG), allowing employees to quickly find information within your company's vast repositories. However, this convenience can become a liability if not managed correctly. Imagine an employee in one department updating a central knowledge base with sensitive financial data. Without proper access controls, an employee from a different department who is not authorized to view this information could inadvertently access it simply by asking the integrated LLM the right question.

Solution: Segregate and Secure

The key to preventing this internal data leakage is to implement a robust system of access control and data segregation. Knowledge bases should not be monolithic entities, nor should they allow unrestricted data uploads. Instead, data sources should be separated based on sensitivity and access privileges. By enforcing role-based access control (RBAC), you can ensure that the LLM only surfaces information the specific user is authorized to see, effectively creating a personalized and secure information retrieval system for each employee.
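To make the idea concrete, here is a minimal sketch of enforcing role-based filtering at the retrieval layer. The Document structure, role names, and retrieve function are illustrative rather than taken from any particular framework; the essential point is that access control runs before anything reaches the model's context window.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    text: str
    allowed_roles: set[str] = field(default_factory=set)  # roles permitted to see this document

def retrieve(query: str, user_roles: set[str], index: list[Document]) -> list[Document]:
    """Return only documents the user's roles permit; the LLM never sees the rest."""
    permitted = [doc for doc in index if doc.allowed_roles & user_roles]
    # Relevance ranking (e.g., vector similarity against the query) would run here;
    # it is omitted to keep the access-control step in focus.
    return permitted[:5]

index = [
    Document("FY2025 budget forecast and salary bands", {"finance"}),
    Document("Employee onboarding checklist", {"finance", "engineering", "hr"}),
]

# An engineering query never surfaces the finance-only document.
for doc in retrieve("What is the budget forecast?", {"engineering"}, index):
    print(doc.text)
```

The same check can often be pushed down into the vector store itself via metadata filters, which avoids retrieving restricted chunks at all.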

The Compliance Minefield: PII and PHI in the Age of AI

For industries that handle Personally Identifiable Information (PII) or Protected Health Information (PHI), the use of third-party LLMs presents a significant compliance risk. Sending any of this sensitive data to an external AI service without the proper safeguards can be a direct violation of regulations like GDPR, HIPAA, and CCPA, leading to hefty fines and reputational damage.

Solution: Clear Guidelines and a Vetted Tech Stack

The first step is to establish clear and unambiguous guidelines for your employees on the acceptable use of AI tools. This policy should explicitly state that no PII, PHI, or other sensitive data is to be entered into public or non-vetted AI platforms.

For organizations that need to leverage AI to process sensitive data, the solution lies in building a fully vetted and compliant tech stack. This involves using open-source LLMs deployed within your own secure environment. By controlling the entire technology stack, from the underlying infrastructure to the model itself, you can implement the necessary safeguards, such as data anonymization and encryption, to stay compliant with the relevant regulations.
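As one illustration of the anonymization step, the sketch below strips a few common PII patterns from text before it reaches any model. The regexes are deliberately simplistic stand-ins; a production pipeline would typically rely on a dedicated PII-detection tool (such as Microsoft Presidio) and handle names with entity recognition rather than hand-rolled patterns.

```python
import re

# Illustrative patterns only: real deployments need broader coverage and review.
REDACTION_RULES = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matched PII with a labeled placeholder before the text is sent anywhere."""
    for label, pattern in REDACTION_RULES.items():
        text = pattern.sub(f"[{label}]", text)
    return text

prompt = "Patient follow-up for jane.doe@example.com, phone 555-867-5309, re: chest pain."
print(redact(prompt))
# Patient follow-up for [EMAIL], phone [PHONE], re: chest pain.
```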

In conclusion…

The rise of powerful AI assistants offers immense potential for productivity and innovation. However, as with any transformative technology, it comes with a new set of risks. By understanding the potential for data leakage and proactively implementing the right strategies and tools, you can harness the power of AI while ensuring your most valuable asset—your data—remains secure.

Book a 30-minute scoping session.
