Mistral Moderation API

By Mistral May 28, 2026

Safety plays a key role in making AI useful. At Mistral AI, we believe that system level guardrails are critical to protecting downstream deployments.That's why we are releasing a new content moderation API. It is the same API that powers the moderation service in Le Chat. We are launching it to empower our users to utilize and tailor this tool to their specific applications and safety standards.

Mistral Moderation API Mistral AI has released a new content moderation API to enhance AI safety, providing system-level guardrails for deployments. The API features an LLM classifier trained to identify undesirable content across nine categories in multiple languages, offering raw text and conversational endpoints. This tool aims to make moderation more scalable and robust, addressing model-generated harms.

Mistral AI is releasing a new content moderation API.
The API uses an LLM classifier trained to categorize text inputs into 9 categories.
It is designed to be multilingual, supporting languages including Arabic, Chinese, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, and Spanish.
Two endpoints are available: one for raw text and one for conversational content, classifying the last message within a conversational context.
The classifier addresses model-generated harms such as unqualified advice and PII.
Mistral AI is sharing performance metrics (AUC PR) internally and aims to contribute safety advancements to the research community. Continue reading https://foxvector.com/articles/0391ad37-c585-439d-a672-cf53e41c4338

Reference: https://foxvector.com/articles/0391ad37-c585-439d-a672-cf53e41c4338

Write a comment

No comments yet.