
# T2: Responsible AI

You are part of the trust chain. Every infrastructure decision you make, from region selection to content filtering configuration, directly impacts whether your AI system is safe, fair, and trustworthy. Responsible AI isn't a checkbox; it's a design discipline woven into every layer. For content safety implementation patterns, see T3: Production Patterns. For grounding and accuracy, see R3: Deterministic AI.

## Microsoft's 6 Responsible AI Principles

| Principle | What It Means | Your Responsibility |
|---|---|---|
| Fairness | AI treats all people equitably | Test across demographics, monitor for bias |
| Reliability & Safety | AI performs as intended | Retry logic, fallbacks, circuit breakers |
| Privacy & Security | AI protects data and access | Managed Identity, Key Vault, RBAC, encryption |
| Inclusiveness | AI is accessible to everyone | Multi-language support, accessibility, diverse testing |
| Transparency | People understand how AI works | Source citations, confidence scores, AI labels |
| Accountability | People are accountable for AI | Audit logs, human-in-the-loop, incident response |

## Infrastructure Decisions That Impact Safety

Every "infrastructure" choice is actually a safety decision:

| Decision | Safety Impact | Recommendation |
|---|---|---|
| Region selection | Data residency, compliance | Match to user geography + regulations |
| Content filtering | Blocks harmful outputs | Enable on ALL endpoints; never disable |
| Logging strategy | Audit trail for incidents | Log all AI interactions (without PII) |
| Rate limiting | Prevents abuse and cost explosion | Per-user + per-tenant limits |
| Key management | Prevents unauthorized access | Key Vault + Managed Identity, never hardcode |
| RBAC | Least-privilege access | Separate roles for dev/deploy/admin |
| Private endpoints | Network isolation | Required for production PaaS services |
| Model selection | Capability vs. risk tradeoff | Smaller models for narrow tasks (less hallucination) |
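The per-user rate limiting recommended above can be enforced with a token bucket in the API layer. A minimal sketch, assuming illustrative names and limits (this is not an Azure SDK API; a per-tenant bucket can wrap the same class):

```python
import time


class TokenBucket:
    """Per-user token bucket: refills at `rate` tokens/sec up to `capacity`."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# One bucket per user id; hypothetical default: burst of 5, 1 request/sec refill.
buckets: dict[str, TokenBucket] = {}


def check_rate_limit(user_id: str, capacity: float = 5, rate: float = 1.0) -> bool:
    bucket = buckets.setdefault(user_id, TokenBucket(capacity, rate))
    return bucket.allow()
```

With these illustrative defaults, a user gets a burst of five immediate requests and is then throttled to one per second.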

## Azure AI Content Safety

Azure AI Content Safety provides real-time detection across four harm categories:

```
User Input ──▶ [Input Filter] ──▶ [Model] ──▶ [Output Filter] ──▶ Response
                     │                              │
                     ▼                              ▼
                Block/Flag                     Block/Flag
         if severity ≥ threshold        if severity ≥ threshold
```
| Category | Severity Scale | Default Block | Description |
|---|---|---|---|
| Hate | 0-6 | ≥ 2 | Discrimination, slurs, dehumanization |
| Self-Harm | 0-6 | ≥ 2 | Instructions or encouragement of self-harm |
| Sexual | 0-6 | ≥ 2 | Explicit sexual content |
| Violence | 0-6 | ≥ 2 | Graphic violence, weapons instructions |

Additional protections:

- **Prompt Shields**: detect jailbreak and indirect prompt injection attempts
- **Groundedness detection**: flag ungrounded claims in model outputs
- **Protected material detection**: identify copyrighted text in outputs
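Prompt Shields is a hosted detection service; as a purely local illustration of where such a check sits in the request path, here is a crude keyword heuristic. The patterns and function name are illustrative only; production systems should call the service rather than pattern-match:

```python
import re

# Illustrative patterns only; real jailbreak attempts are far more varied.
JAILBREAK_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now (dan|unrestricted)",
    r"reveal (your|the) system prompt",
]


def looks_like_jailbreak(user_input: str) -> bool:
    """Cheap pre-filter run before the model call; not a substitute for Prompt Shields."""
    text = user_input.lower()
    return any(re.search(pattern, text) for pattern in JAILBREAK_PATTERNS)
```

A pre-filter like this can short-circuit obvious attempts cheaply, but the hosted classifier should remain the authoritative check.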

:::info Content Safety Implementation
Configure content filtering in your `guardrails.json`:

```json
{
  "content_safety": {
    "hate": { "threshold": 2, "action": "block" },
    "self_harm": { "threshold": 2, "action": "block" },
    "sexual": { "threshold": 2, "action": "block" },
    "violence": { "threshold": 2, "action": "block" }
  },
  "prompt_shields": { "enabled": true },
  "groundedness": { "enabled": true, "threshold": 4.0 }
}
```

:::
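A sketch of how such a config could be applied to per-category severity scores returned by a moderation call. The function name and config shape mirror the illustrative `guardrails.json` above; this is not an Azure SDK API:

```python
# Mirrors the content_safety section of the illustrative guardrails.json.
GUARDRAILS = {
    "content_safety": {
        "hate": {"threshold": 2, "action": "block"},
        "self_harm": {"threshold": 2, "action": "block"},
        "sexual": {"threshold": 2, "action": "block"},
        "violence": {"threshold": 2, "action": "block"},
    }
}


def apply_content_safety(severities: dict[str, int], config: dict = GUARDRAILS) -> list[str]:
    """Return the categories whose severity meets or exceeds its block threshold."""
    blocked = []
    for category, rule in config["content_safety"].items():
        if rule["action"] == "block" and severities.get(category, 0) >= rule["threshold"]:
            blocked.append(category)
    return blocked
```

A response is rejected if the returned list is non-empty; for example, `{"hate": 0, "violence": 4}` trips only the violence rule.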

## OWASP LLM Top 10 Risks

The OWASP Top 10 for LLM Applications identifies the most critical security risks:

| # | Risk | Mitigation |
|---|---|---|
| 1 | Prompt Injection | Input validation, Prompt Shields, system prompt isolation |
| 2 | Insecure Output Handling | Sanitize AI output before rendering; never exec AI output |
| 3 | Training Data Poisoning | Curate data sources, validate training sets |
| 4 | Model Denial of Service | Rate limiting, token budgets, timeout enforcement |
| 5 | Supply Chain Vulnerabilities | Pin model versions, audit dependencies |
| 6 | Sensitive Information Disclosure | PII detection, output filtering, data minimization |
| 7 | Insecure Plugin Design | Least-privilege tool access, input validation |
| 8 | Excessive Agency | Human-in-the-loop for critical actions, action confirmation |
| 9 | Overreliance | Confidence scores, source citations, user education |
| 10 | Model Theft | Private endpoints, access controls, monitoring |
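Risk #2 (Insecure Output Handling) is the most direct to mitigate in code: treat model output as untrusted user input and escape it before rendering. A minimal sketch using only the standard library:

```python
import html


def render_ai_output(raw: str) -> str:
    """Escape model output before inserting it into an HTML page.

    Never pass AI output to eval(), exec(), a shell, or raw innerHTML.
    """
    return html.escape(raw)


# A prompt-injected response trying to smuggle a script tag is neutralized.
payload = "<script>steal(document.cookie)</script>"
safe = render_ai_output(payload)
# safe == "&lt;script&gt;steal(document.cookie)&lt;/script&gt;"
```

The same principle applies to SQL, shell commands, and markdown renderers: the model's output crosses a trust boundary, so it gets the same sanitization any user input would.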

## EU AI Act Overview

:::warning EU AI Act: Know Your Risk Classification
The EU AI Act entered into force in August 2024 with phased enforcement. If your AI system operates in the EU or serves EU users, you must classify it. High-risk systems face mandatory conformity assessments, transparency obligations, and human oversight requirements. Non-compliance penalties reach up to €35M or 7% of global annual turnover.
:::

| Risk Level | Examples | Requirements |
|---|---|---|
| Unacceptable | Social scoring, real-time biometric surveillance | Banned |
| High-Risk | Hiring, credit scoring, medical diagnosis, law enforcement | Conformity assessment, logging, human oversight |
| Limited Risk | Chatbots, deepfake generation | Transparency obligations (label as AI) |
| Minimal Risk | Spam filters, game AI | No specific requirements |

Most enterprise AI applications (RAG chatbots, document processing, IT assistants) fall under limited risk, which requires transparency labels. If your system influences decisions about people (hiring, lending, medical), it is likely high-risk.

## Content Safety Pipeline

A production content safety pipeline has four stages:

```
1. INPUT FILTERING            2. MODEL GENERATION
├─ Prompt Shields             ├─ Content filter (built-in)
├─ PII detection              ├─ Token budget enforcement
├─ Input sanitization         └─ System prompt guardrails
└─ Rate limiting

3. OUTPUT FILTERING           4. LOGGING & MONITORING
├─ Content Safety API         ├─ Log interaction (no PII)
├─ Groundedness check         ├─ Correlation ID tracking
├─ Citation verification      ├─ Alert on blocked content
└─ PII redaction              └─ Audit trail retention
```
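The four stages can be wired together in a single request handler. A sketch with stubbed filters, assuming illustrative function names (each stub stands in for the real checks listed in its stage):

```python
import logging
import uuid

logger = logging.getLogger("ai_pipeline")


def input_filter(text: str) -> bool:
    # Stage 1 stub: Prompt Shields, PII detection, sanitization, rate limiting.
    return "ignore previous instructions" not in text.lower()


def generate(text: str) -> str:
    # Stage 2 stub: model call (the built-in content filter runs service-side).
    return f"Answer to: {text}"


def output_filter(text: str) -> bool:
    # Stage 3 stub: Content Safety API, groundedness check, PII redaction.
    return True


def handle_request(user_input: str) -> str:
    # Stage 4: one correlation ID ties every log line for this request together.
    correlation_id = str(uuid.uuid4())
    if not input_filter(user_input):
        logger.warning("blocked input [%s]", correlation_id)
        return "Request blocked by input filter."
    response = generate(user_input)
    if not output_filter(response):
        logger.warning("blocked output [%s]", correlation_id)
        return "Response withheld by output filter."
    logger.info("ok [%s]", correlation_id)
    return response
```

Note that blocked requests are logged (for the audit trail and alerting) but the user-facing message reveals nothing about which filter fired.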

## Evaluation for Trust

Responsible AI requires continuous evaluation, not one-time checks:

| Metric | Target | What It Measures |
|---|---|---|
| Groundedness | ≥ 4.0 / 5.0 | Are claims supported by provided context? |
| Relevance | ≥ 4.0 / 5.0 | Does the response address the question? |
| Coherence | ≥ 4.0 / 5.0 | Is the response logically consistent? |
| Safety | 0 violations | Are harmful content filters effective? |
| Fairness | < 5% variance | Do responses vary by demographic group? |
```python
# Evaluation pipeline example
# (model_config, credential, and azure_ai_project are configured elsewhere)
from azure.ai.evaluation import ContentSafetyEvaluator, GroundednessEvaluator

groundedness = GroundednessEvaluator(model_config)
safety = ContentSafetyEvaluator(credential, azure_ai_project)

result = groundedness(
    query="What are the payment terms?",
    response="The contract requires 30-day payment terms.",
    context="Section 4.2: Payment shall be made within 30 days...",
)
assert result["groundedness"] >= 4.0
```
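The fairness target in the table (< 5% variance) can be checked by slicing an evaluation metric by demographic group and comparing group means. A minimal sketch; the grouping scheme and function name are illustrative:

```python
def fairness_variance(scores_by_group: dict[str, list[float]]) -> float:
    """Relative spread of mean metric scores across demographic groups."""
    means = [sum(scores) / len(scores) for scores in scores_by_group.values()]
    overall = sum(means) / len(means)
    return (max(means) - min(means)) / overall


# Hypothetical groundedness scores sliced by demographic group.
scores = {
    "group_a": [4.2, 4.4, 4.3],
    "group_b": [4.3, 4.2, 4.4],
}
assert fairness_variance(scores) < 0.05  # within the 5% target
```

If the spread exceeds the target, investigate whether retrieval, prompting, or training data treats the underperforming group differently.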

## Key Takeaways

1. **You are the trust chain**: infrastructure choices are safety choices
2. **Enable content filtering everywhere**: never disable it, even in dev
3. **Know your OWASP LLM risks**: prompt injection is #1 for a reason
4. **Classify under the EU AI Act**: know your obligations before deployment
5. **Evaluate continuously**: groundedness ≥ 4.0, zero safety violations

Next: T3: Production Patterns, which covers taking AI from prototype to production with resilience, cost control, and monitoring.