Safe & Secure AI Database Access
by Alex Glebov, CTO
AI API Database Integration
The intersection of AI and databases has opened up a realm of possibilities. AI APIs can enrich databases, provide new analyses of your data, and even let users chat with your data using embeddings and vector search.
AI can become a powerful user of the database alongside other users, but that also comes with its own risks and potential security concerns. It's essential to put the right safeguards in place.
Plan Your AI API's Capabilities
Before linking your database, it’s important to answer some basic questions about AI API access:
- Which data do we want the AI to access, and which data should be blocked?
- What security categories do we assign to the various types of data we have? Which security category is safe for the AI to gain access to?
- How are the AI API calls authenticated?
- Which users can trigger an AI API call?
- What is our AI API budget per month?
- Who will own and control the data sent in an AI API call?
- Will your business data sent through an API call be retained by the AI model owner?
- Will your data sent through API calls be used to train future AI models?
- What security compliance and encryption does the AI model provider have?
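The answers to several of these questions can be written down as an explicit policy that is checked before every AI API call. The following is a minimal sketch, not a definitive implementation: the class name `AIAccessPolicy`, its fields, and the category and role names are all hypothetical and would need to match your own data classification scheme.

```python
from dataclasses import dataclass


@dataclass
class AIAccessPolicy:
    """Hypothetical container for answers to the planning questions."""
    allowed_categories: set[str]   # security categories the AI may read
    allowed_roles: set[str]        # user roles that may trigger an AI API call
    monthly_budget_usd: float      # spending cap for AI API calls
    spent_usd: float = 0.0         # running total for the current month

    def can_call(self, user_role: str, data_category: str, est_cost_usd: float) -> bool:
        """Allow the call only if role, data category, and budget all permit it."""
        return (
            user_role in self.allowed_roles
            and data_category in self.allowed_categories
            and self.spent_usd + est_cost_usd <= self.monthly_budget_usd
        )

    def record_spend(self, cost_usd: float) -> None:
        """Add the cost of a completed call to the monthly total."""
        self.spent_usd += cost_usd


# Example values are assumptions for illustration only.
policy = AIAccessPolicy(
    allowed_categories={"public", "internal"},
    allowed_roles={"analyst"},
    monthly_budget_usd=100.0,
)

print(policy.can_call("analyst", "public", 0.05))      # allowed category
print(policy.can_call("analyst", "restricted", 0.05))  # blocked category
```

Keeping the policy in one place makes it easier to review and audit than scattering the same checks across application code.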
OpenAI’s Privacy for Enterprise
Here is a summary of how OpenAI approaches privacy for enterprise users. Depending on the client’s needs, AI PWRD can conduct a security assessment and issue a report comparing different AI models for API access.
- Ownership Commitments:
  - Users have full control and ownership of their data.
  - OpenAI doesn't train models on user-specific business data.
  - Enterprise users can dictate data retention duration with platforms like ChatGPT Enterprise. Typically, deleted conversations are removed from OpenAI's systems within 30 days.
- Control Aspects:
  - Enterprise-level authentication is provided via SAML SSO.
  - Users have granular control over access and features.
  - Custom models remain the exclusive property of users.
- Security Protocols:
  - OpenAI has undergone SOC 2 compliance audits.
  - AES-256 (at rest) and TLS 1.2+ (in transit) encryption protocols are in place.
  - Detailed security measures can be explored in OpenAI's Trust Portal.
- Training and Data Protocols:
  - OpenAI models are not trained on individual business data.
  - Models fine-tuned by users remain proprietary.
  - OpenAI retains rights for service provision and legal compliance.
- Data Security:
  - Strict encryption standards apply to both stored and in-transit data.
  - A 24/7 security team and bug bounty programs are in place.
- Compliance Support:
  - OpenAI aids in GDPR and other privacy law compliance.
  - OpenAI offers a Data Processing Addendum (DPA) for user assistance.
- Monitoring Protocols:
  - Automated classifiers are used to check data.
  - Human review is restricted and performed only as needed.
What is Personally Identifiable Information (PII)?
Even with strong privacy commitments from AI model providers like OpenAI, some data is still too private to send through an API call. Personally Identifiable Information (PII) refers to any information that can be used to identify an individual. Examples include names, addresses, e-mail addresses, and many other data points, such as passport numbers or vehicle identifiers. In the era of data breaches and increased scrutiny over privacy concerns, safeguarding PII has never been more critical.
PII examples:
- Date of Birth
- Names of individuals
- Personal Address
- E-mail Address of individuals
- Telephone Number(s) of individuals
- Driver's License Number
- National ID number
- Employer Identification Number
- Bank Account Information - Account IDs, Routing Numbers, SWIFT IDs, etc. of third parties
- Payment Card Numbers
- Gender
- Ethnicity
- Usernames, ID Numbers of third parties
- Passport Number
- Marital Status
- Number of Allowances/Exemptions
- Dependent Names
- Vehicle Identifiers (VIN, License Plates, etc.)
- Any other unique identifying number, characteristic, or code that could identify an individual consumer, family, or device over time or across services.
When integrating AI systems or making API calls that might access databases containing PII, it's important to ensure this data remains confidential and isn't exposed inadvertently through an API call. One of several approaches to protecting PII is to simply not send it in the first place.
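One simple way to avoid sending PII is to build the API payload from an explicit allowlist of fields, so that any column not named is dropped by default. The sketch below assumes a record arrives as a Python dictionary; the field names and the `strip_to_safe_fields` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical column names; adapt the allowlist to your own schema.
SAFE_FIELDS = {"order_id", "product", "quantity", "order_total"}


def strip_to_safe_fields(row: dict) -> dict:
    """Keep only explicitly allowlisted fields and drop everything else."""
    return {key: value for key, value in row.items() if key in SAFE_FIELDS}


# Example record mixing business data with PII (values are made up).
row = {
    "order_id": 1001,
    "product": "Desk lamp",
    "quantity": 2,
    "order_total": 59.98,
    "customer_name": "Jane Doe",
    "email": "jane@example.com",
}

payload = strip_to_safe_fields(row)
print(payload)  # contains only the four allowlisted fields
```

An allowlist is used here rather than a blocklist so that a newly added column stays out of the payload until someone deliberately approves it.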
Auto-Redaction
Auto-redaction involves automatically removing or obscuring portions of data to prevent the disclosure of sensitive information.
How it Works:
- Before data is returned from an API call, the system scans for patterns resembling PII (e.g., strings that match email formats or digit sequences that resemble credit card numbers).
- The detected PII is then either masked or completely removed from the data set before it's transmitted.
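The two steps above can be sketched with regular expressions. This is an illustrative example only, not a production redactor: the patterns, the placeholder tokens `[EMAIL]` and `[CARD]`, and the helper names are assumptions, and real systems typically need many more patterns plus handling for names and addresses. A Luhn checksum is added as one possible way to reduce false positives on long digit sequences.

```python
import re

# Simplified email pattern; an assumption, not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:       # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def _mask_card(match: re.Match) -> str:
    """Mask a digit sequence only if it looks like a valid card number."""
    digits = re.sub(r"\D", "", match.group())
    return "[CARD]" if luhn_valid(digits) else match.group()


def redact(text: str) -> str:
    """Mask email addresses and likely payment card numbers in text."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return CARD_RE.sub(_mask_card, text)


print(redact("Contact jane@example.com, card 4111 1111 1111 1111."))
# prints: Contact [EMAIL], card [CARD].
```

Masking with a labeled token rather than deleting the value keeps the surrounding sentence readable for the AI model while still withholding the sensitive data.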
Benefits:
- Real data is used, so results remain consistent with actual data trends.
- Supports compliance with privacy regulations, because detected PII is masked or removed before the data is transmitted.
The merger of AI and databases heralds an era of enhanced capabilities and insights, but it also comes with the grave responsibility of data privacy. Businesses must remain vigilant and proactive to ensure that while they harness the power of AI, the sanctity and privacy of personal data are never compromised. As technology continues to evolve, it's imperative for organizations to stay updated with the latest best practices and tools to ensure data protection.
If your organization has further questions about privacy and security when integrating your business with AI, feel free to contact our team and book a complimentary consultation call.