Confidential AI Models

Others claim privacy. We prove it. Access frontier AI models in the cloud, with proof that your data is protected end-to-end.

Phala Confidential AI
No storage · No logs · End-to-end encryption

Your requests flow encrypted from you to every model behind the gateway:

  • Google: Gemma 3 27B (Encrypted)
  • OpenAI: gpt-oss-20b (Encrypted)
  • OpenAI: GPT OSS 120B (Encrypted)
  • Qwen: Qwen3 Coder (Encrypted)
  • Qwen: Qwen2.5 VL 72B Instruct (Encrypted)
  • DeepSeek: DeepSeek V3 0324 (Encrypted)
  • Qwen2.5 7B Instruct (Encrypted)
  • Meta: Llama 3.3 70B Instruct (Encrypted)

Win Your Users' Trust

Differentiate with verifiable privacy, build customer confidence with audit-ready cryptographic proofs, and enter regulated markets faster.

Traditional AI vs. Confidential AI

Integrate in Minutes

The easiest way to add cryptographic privacy to your AI applications. Drop-in replacement for OpenAI, Anthropic, and other major providers.

Supported providers:

OpenAI · Anthropic · Google · Meta
Traditional AI: api.openai.com/v1/chat/completions
Phala Confidential AI: encrypted-ai.phala.com/v1/chat/completions
  • Real-time proof generation
  • Audit-ready documentation
  • Customer-facing dashboard

Enterprise features

  • SLA Support
  • Custom Models
  • Volume Discounts
  • Priority Access
Pricing (per 1M tokens, input / output):

  • GPT OSS 120B: $0.14 / $0.49
  • Qwen3 Coder: $0.90 / $1.50
  • Llama 3.3 70B: $0.10 / $0.25
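
Per-request cost follows directly from these per-million-token rates. A quick sketch of the arithmetic, using the GPT OSS 120B rates above; the token counts are assumptions for illustration only:

```python
# Estimate the cost of one request to GPT OSS 120B at $0.14 / $0.49 per 1M tokens.
# The token counts below are illustrative assumptions, not measurements.
input_rate, output_rate = 0.14, 0.49      # USD per 1M tokens (from the pricing above)
input_tokens, output_tokens = 2_000, 500  # assumed request size

cost = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
print(f"≈ ${cost:.6f} per request")       # ≈ $0.000525
```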

Drop-in Replacement

Simply replace your API endpoint. No other code changes required. Works with your existing SDKs and frameworks.
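
For example, with the standard OpenAI Python SDK the switch is a single base-URL change. A minimal sketch, assuming a hypothetical PHALA_API_KEY environment variable; the endpoint and the phala/gpt-oss-120b model ID are the ones shown on this page:

```python
# Minimal sketch: only the base URL differs from a stock OpenAI integration.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://encrypted-ai.phala.com/v1",  # Phala Confidential AI endpoint
    api_key=os.environ["PHALA_API_KEY"],           # hypothetical env var holding your key
)

response = client.chat.completions.create(
    model="phala/gpt-oss-120b",                    # listed under Available Models below
    messages=[{"role": "user", "content": "Summarize this contract clause for a customer."}],
)
print(response.choices[0].message.content)
```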

Built-in Trust Center

Every request generates cryptographic proof. Show customers exactly how their data is protected with our Trust Center. View demo →
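
This page doesn't document the proof format or a verification API, so the sketch below is purely hypothetical: the response header name is an assumption used only to illustrate how a per-request proof reference could feed your audit logs and the Trust Center.

```python
# Hypothetical sketch only: the X-Phala-Proof-Id header is assumed for illustration
# and is not documented on this page.
import os

import requests

resp = requests.post(
    "https://encrypted-ai.phala.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['PHALA_API_KEY']}"},
    json={
        "model": "phala/gpt-oss-120b",
        "messages": [{"role": "user", "content": "hello"}],
    },
)
proof_id = resp.headers.get("X-Phala-Proof-Id")  # assumed header carrying the proof reference
print(f"Attach to your audit log and look up in the Trust Center: {proof_id}")
```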

Enterprise Ready

Competitive pricing with enterprise features. Scale with confidence knowing costs won't surprise you.

Available Models

Access the latest frontier AI models with cryptographic privacy protection

Google: Gemma 3 27B

Encrypted · phala/gemma-3-27b-it

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128K tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open-source model and the successor to [Gemma 2](google/gemma-2-27b-it).

54K context | $0.11/M input tokens | $0.40/M output tokens
OpenAI: gpt-oss-20b

Encrypted · phala/gpt-oss-20b

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. The model is trained in OpenAI’s Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs.

131K context | $0.10/M input tokens | $0.40/M output tokens
OpenAI: GPT OSS 120B

Encrypted · phala/gpt-oss-120b

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.

131K context | $0.14/M input tokens | $0.49/M output tokens
Qwen: Qwen3 Coder

Encrypted · phala/qwen3-coder

Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. The model features 480 billion total parameters, with 35 billion active per forward pass (8 out of 160 experts). Pricing for the Alibaba endpoints varies by context length: requests with more than 128K input tokens are billed at the higher rate.

262K context | $0.90/M input tokens | $1.50/M output tokens
Qwen: Qwen2.5 VL 72B Instruct

Encrypted · phala/qwen2.5-vl-72b-instruct

Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images.

128K context | $0.59/M input tokens | $0.59/M output tokens
DeepSeek: DeepSeek V3 0324

Encrypted · phala/deepseek-chat-v3-0324

DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team.

164K context | $0.49/M input tokens | $1.14/M output tokens
Qwen2.5 7B Instruct

Encrypted · phala/qwen-2.5-7b-instruct

Qwen2.5 7B belongs to the latest series of Qwen large language models. Qwen2.5 brings the following improvements over Qwen2:

  • Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains.
  • Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON. More resilient to diverse system prompts, enhancing role-play implementation and condition-setting for chatbots.
  • Long-context support up to 128K tokens, with generation of up to 8K tokens.
  • Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

Usage of this model is subject to the [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

33K context | $0.04/M input tokens | $0.10/M output tokens
Meta: Llama 3.3 70B Instruct

Encrypted · phala/llama-3.3-70b-instruct

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model with 70B parameters (text in / text out). The Llama 3.3 instruction-tuned, text-only model is optimized for multilingual dialogue use cases and outperforms many available open-source and closed chat models on common industry benchmarks. Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. [Model Card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md)

131K context | $0.10/M input tokens | $0.25/M output tokens
Enterprise

Build Your Own PCC

Go beyond shared APIs. With our Confidential GPUs, you can deploy private, fully audited AI clouds tailored to your business or product. It's the same technology behind Apple's Private Cloud Compute (PCC), but more open and transparent, and now available for your own models and workloads.

Talk to Experts

Private Infrastructure

Private dedicated infrastructure for your AI workloads.

Custom Models

Deploy your own custom AI models securely.

Full Audit Trails

Complete compliance and audit documentation.

24/7 Support

Dedicated enterprise support team.

Fast Performance

Optimized for speed and efficiency.

Secure Processing

Hardware-protected confidential computing.

Compliance Ready

Meet regulatory requirements easily.

Scalable Solution

Grow with your business needs.

Frequently Asked Questions

Everything you need to know about Confidential AI

Ready to Build AI People Trust?

Join 500+ teams deploying trustworthy AI in production

No credit card required. Deploy your first model in 5 minutes.

© 2025 Phala. All rights reserved. Privacy · Terms