AI Security and Governance on AWS: How to Control LLMs in Production

Written by Bion DevOps Team | Jun 10, 2026 12:44:23 PM

Most AI security discussions start with prompt injection, data leakage and guardrails.

Those risks matter.

But once an AI feature moves into production, security becomes a wider platform problem.

A production AI system does not only send a prompt to a model. It may retrieve customer data, assemble context, route requests across models, call tools, apply guardrails, log activity, attribute cost and return outputs into business workflows.

At that point, AI security is no longer only about stopping unsafe responses.

It is about controlling the full request path.

For teams building AI systems on AWS, this means answering practical engineering questions:

Who is allowed to call each AI capability?
Which tenant, customer or workflow does the request belong to?
Which data sources can be retrieved?
Which model route is allowed for this task?
Which prompts and guardrails apply?
What should be logged, redacted or retained?
Which outputs require validation before they reach the user?
Which actions should an AI agent never be allowed to perform directly?

This is where AI security and governance on AWS becomes a production architecture concern.

Not a policy document.

Not only a model setting.

Not a guardrail added at the end.

A secure AI platform needs governance built into the runtime path.

AWS frames generative AI security across governance, privacy, risk management, controls and resilience, and its security scoping model is based on how much ownership the organisation has over the model, data and application layer. The AWS Generative AI Security Scoping Matrix is useful because it helps teams classify what they own and which controls they are responsible for. The AWS Well-Architected Generative AI Lens also treats security, reliability, cost, observability and responsible AI as lifecycle concerns rather than one-off implementation tasks.

Why AI Governance Is Different from Normal Application Security

Traditional application security is built around relatively predictable execution paths.

A user calls an API. The API checks identity and permissions. The application executes code. The database returns data. Logs record what happened.

AI systems introduce a different pattern.

The behaviour of the system can be influenced by:

user input
system prompts
retrieved documents
conversation history
metadata filters
model selection
tool access
guardrail configuration
output validation rules

This creates a new security boundary. The model is not the only risk.

The context around the model is the risk.

Area	Traditional application	Production AI system
Input	Form fields, API payloads	User prompts, documents, embeddings, retrieved context
Execution	Deterministic code path	Model behaviour influenced by prompt and context
Data access	Query-level or API-level permissions	Retrieval-level and context-level permissions
Output	Structured response	Generated response that may need validation
Audit	API logs and database logs	Prompt version, retrieved document IDs, model route, guardrail result, output status
Abuse pattern	API misuse, injection, privilege escalation	Prompt injection, data exfiltration, tool misuse, unsafe generation, over-permissive agents

This is why production AI governance needs to be designed as part of the platform architecture.

A secure AWS AI platform should not let every product team decide model access, prompt handling, retrieval permissions, logging, fallback behaviour and guardrails independently.

Those controls need to become shared platform capabilities.

The Production AI Security Problem

A prototype AI feature often starts like this:

Application
→ Prompt
→ Model
→ Response

A production AI feature usually looks more like this:

User request
→ Identity and tenant context
→ Capability authorisation
→ Data access policy
→ Retrieval decision
→ Document filtering
→ Prompt assembly
→ Prompt redaction
→ Model route selection
→ Guardrail application
→ Inference
→ Response validation
→ Audit logging
→ Cost attribution
→ Product response

Every step is a possible control point. Every missing control becomes a production risk.

The most common mistake is to treat AI security as a model-layer issue only.

In practice, many production failures happen outside the model:

a user retrieves documents they should not see
a low-risk workflow accidentally uses a high-risk model route
a prompt includes sensitive data that should have been redacted
a guardrail is applied inconsistently across services
an agent has access to tools that are too broad
logs store full prompts and responses without retention controls
teams cannot explain which prompt, model, guardrail or retrieved documents produced a response

This is not only a security issue. It is also an operational governance issue.

1. Start by Classifying the AI Workload

Before designing controls, teams need to understand what they actually own.

The security model changes depending on whether the organisation is using:

a third-party AI-enabled SaaS tool
a pre-trained foundation model through Amazon Bedrock
a fine-tuned model
a self-hosted open-weight model
an agentic workflow with tool access
a custom model trained on proprietary data

The AWS Generative AI Security Scoping Matrix separates use cases based on the level of ownership over the model and data, from lower-ownership usage of external AI services to higher-ownership custom or self-trained models.

For SaaS, fintech, healthcare, AI-native and mid-market engineering teams building on AWS, the most common production patterns are usually:

Pattern	Typical example	Main governance concern
Amazon Bedrock with pre-trained model	Support assistant, summarisation, internal knowledge assistant	Access control, prompt governance, guardrails, logging
RAG on AWS	Customer knowledge search, internal policy assistant, finance or healthcare document Q&A	Permission-aware retrieval, document filtering, citation quality
Fine-tuned or customised model	Domain-specific classification, specialist summarisation	Training data governance, model versioning, evaluation
Self-hosted LLM on Kubernetes	Cost-sensitive or latency-sensitive workloads	Runtime isolation, GPU access, scaling, model lifecycle
Agentic workflow	AI assistant calling APIs, creating tickets, querying systems	Tool permissions, approval gates, action logging

A governance model that works for a simple summarisation feature may not be enough for a RAG workflow that retrieves customer records.

A governance model that works for RAG may not be enough for an agent that can call business systems.

The first security decision is therefore not:

Which model are we using?

It is:

What level of control does this AI workload need?

2. Put Model Access Behind a Controlled Runtime Layer

One of the fastest ways AI security becomes fragmented is direct model access from multiple services.

This often starts innocently.

One team calls Amazon Bedrock from a backend service. Another team builds a RAG assistant. Another adds a summarisation feature. Another runs a self-hosted model on Kubernetes.

After a few months, the platform has several independent model access paths.

Each path may have its own:

IAM role
prompt format
logging behaviour
retry logic
model selection
guardrail configuration
token limit
error handling
cost attribution

That is difficult to govern.

A stronger pattern is to place model access behind an AI gateway or inference control layer.

Product services
→ AI gateway / inference control layer
→ Policy engine
→ Retrieval and prompt assembly
→ Model route
→ Guardrails and validation
→ Observability and audit trail

This layer does not need to be heavy at the start.

But it should create one controlled interface for production AI access.

The AI gateway should decide:

whether the user can use the AI capability
whether retrieval is allowed
which data sources can be queried
which prompt template applies
which model route is allowed
which guardrail must be used
whether streaming is allowed
what should be logged
what should be redacted
what cost centre, tenant or product feature owns the request

Amazon Bedrock supports IAM-based control, including service-specific actions, resources and condition keys for inference, guardrails, inference profiles and prompt routers. These controls are important, but they should be part of a wider runtime governance pattern rather than the only layer of control. See the AWS documentation for Amazon Bedrock actions, resources and condition keys.

Illustrative IAM Policy Pattern

This is not a copy-paste policy. It shows the kind of restriction pattern teams should think about.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "InvokeApprovedBedrockModelWithGuardrail",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:eu-west-2::foundation-model/example-model-id"
      ],
      "Condition": {
        "StringEquals": {
          "bedrock:GuardrailIdentifier": "arn:aws:bedrock:eu-west-2:123456789012:guardrail/example-guardrail-id"
        }
      }
    }
  ]
}

The production version should be tied to your AWS account structure, model selection, environment strategy, IAM roles, logging requirements and guardrail design.

The wider principle is simple:

Product services should not have unlimited model access.

They should have controlled access to approved AI capabilities.

3. Secure RAG Before the Prompt Is Built

RAG security is often misunderstood.

Teams focus on whether the final answer is safe.

But the more important question is:

Was the model allowed to see the retrieved context in the first place?

A RAG system can leak data even if the model behaves correctly.

If retrieval returns the wrong documents, the model may expose information that the user should never have accessed.

This is why secure RAG design needs access control before prompt assembly.

AWS Prescriptive Guidance highlights that RAG workloads can face risks such as data exfiltration from RAG data sources, indirect prompt injection through malicious documents, unauthorised access through weak controls and lack of provenance for auditability. It recommends defence-in-depth across ingestion, storage, retrieval and inference. See AWS guidance on secure access to data and systems for generative AI agents and RAG.

Secure RAG Control Points

Layer	Control	What it prevents
Ingestion	Classify documents before indexing	Sensitive data entering the index without controls
Chunking	Keep document, tenant and permission metadata attached	Context losing its access boundary
Embeddings	Version embedding jobs and index updates	Stale or inconsistent retrieval
Retrieval	Apply tenant, role and document-level filters	Cross-tenant or unauthorised context exposure
Prompt assembly	Include only authorised chunks	Sensitive content entering the model path
Response	Validate grounding and citations	Unsupported or misleading answers
Audit	Log document IDs and retrieval scores	Impossible investigations after incidents

The weakest pattern is this:

Search first
→ Retrieve documents
→ Add to prompt
→ Ask model to follow access rules

The stronger pattern is this:

Authenticate user
→ Resolve tenant and role
→ Apply document-level permissions
→ Retrieve authorised context only
→ Assemble prompt
→ Apply guardrails and validation
→ Log retrieval trace

The prompt should never be the first place where access control appears.

If a user is not allowed to see a document, that document should not be retrieved into the model context.

4. Treat Prompts as Governed Production Artifacts

In many teams, prompts start as text inside application code.

That works for early development.

It does not work well for production governance.

In production, prompts should be treated more like versioned configuration or policy-controlled application artifacts.

A production prompt should have:

a version
an owner
an approval path
a target workflow
a model compatibility note
allowed data classes
output format requirements
evaluation test coverage
rollback capability
change history

Prompt changes can affect security.

A small prompt edit may change whether the model:

reveals internal instructions
includes sensitive data
refuses unsafe requests
follows citation rules
calls tools
uses retrieved context correctly
generates structured output reliably

That means prompt changes should not be invisible.

A practical prompt governance model separates:

System instruction
→ Policy instruction
→ Developer instruction
→ Workflow instruction
→ Retrieved context
→ User input
→ Output schema

Prompt component	Governance requirement
System instruction	Restricted editing, version control, review
Policy instruction	Security-owned or platform-owned
Workflow instruction	Product-owned but reviewed
Retrieved context	Permission-filtered and traceable
User input	Sanitised and monitored
Output schema	Validated before use
Tool instruction	Least-privilege and approval-aware

This is especially important when AI outputs feed downstream systems.

If a generated answer is only displayed to a user, validation may be light.

If a generated output updates a ticket, triggers a workflow, changes a record or calls an API, prompt governance needs to be much stricter.

5. Use Guardrails, But Do Not Treat Them as the Whole Governance Model

Amazon Bedrock Guardrails can provide configurable safeguards across foundation models. AWS documents support for content filters, denied topics, word filters, sensitive information filters, contextual grounding checks and Automated Reasoning checks.

That is useful.

But guardrails are not the entire governance model.

They are one control layer inside the runtime path.

A guardrail can help detect or block certain unsafe inputs or outputs.

It does not replace:

identity and access control
tenant isolation
secure retrieval
prompt versioning
model allowlists
audit logging
tool-level permissions
human approval for high-risk actions
cost and usage governance

What Guardrails Can and Cannot Solve

Control	Helps with	Does not replace
Content filters	Harmful or unsafe content categories	Data access control
Denied topics	Blocking disallowed topics	Workflow-level authorisation
Sensitive information filters	Detecting or masking PII patterns	Full privacy architecture
Contextual grounding	Checking whether output is grounded in reference content	Retrieval quality and permissions
Automated Reasoning checks	Validating responses against logical rules	End-to-end application validation
Custom blocked words	Domain-specific restrictions	Tenant-aware policy enforcement

A better model is defence in depth.

Identity control
→ Data access control
→ Prompt control
→ Model route control
→ Guardrail control
→ Output validation
→ Audit logging

Guardrails are important.

But the platform should not depend on guardrails alone.

6. Control Model Routing by Risk, Not Only by Cost or Latency

Model routing is often introduced for cost optimisation.

Simple tasks can use cheaper models. Complex tasks can use more capable models. Latency-sensitive workflows can use faster routes.

That is useful, but incomplete.

In production, model routing should also consider security and governance.

A customer-facing workflow may need stricter guardrails than an internal summarisation task.

A regulated workflow may need a model route that keeps data within a specific AWS region or account structure.

A high-risk task may require human review before the response is used.

A low-risk internal draft may tolerate more flexible behaviour.

Workflow	Routing decision	Governance requirement
Internal summarisation	Lower-cost model route	Basic logging and prompt versioning
Customer support assistant	Approved model with guardrail	PII masking, response validation, audit trail
Regulated document Q&A	Restricted model route	Permission-aware RAG, citation checks, retention policy
Finance or healthcare workflow	High-control route	Stronger validation, human review, strict logging
Agentic action workflow	Tool broker route	Action allowlist, approval gates, transaction limits

This is why model routing should be part of the governance layer.

The platform should know which route is allowed for each use case.

Not every workflow should be able to call every model.

Not every model should be able to receive every data class.

Not every response should be treated as equally safe.

Amazon Bedrock also supports inference profiles, including application inference profiles that can help teams track costs and model usage by application-level dimensions. This is useful when governance, usage visibility and cost attribution need to be handled together.

7. Design Logging Without Creating a New Data Risk

AI teams need traceability.

Security teams need auditability.

Engineering teams need observability.

But logging AI systems can create a new data exposure problem if prompts, retrieved documents and responses are stored without care.

Amazon Bedrock supports model invocation logging using CloudWatch Logs and Amazon S3. AWS also provides monitoring across model invocations, knowledge bases, guardrails, agents, runtime metrics, EventBridge and CloudTrail through the Amazon Bedrock monitoring capabilities.

The question is not simply whether to enable logging.

The question is what to log, where to log it, how long to retain it and what to redact.

Minimum Useful AI Audit Record

A production AI request should usually produce a structured audit event containing:

request ID
timestamp
user ID or service identity
tenant ID
product feature
workflow type
prompt template version
model route
guardrail ID and result
retrieval query ID
retrieved document IDs
validation result
input token count
output token count
latency
retry count
fallback path
cost attribution tag
final response status

For many workloads, this is more useful than storing the full prompt and full response.

Full prompt and response logging may be needed for debugging, investigation or regulated workflows, but it should be controlled with:

redaction
encryption
restricted access
retention limits
environment separation
clear incident review process

Logging Decision Table

Log item	Usually useful	Risk
Request ID	Yes	Low
Tenant ID	Yes	Medium
Prompt version	Yes	Low
Model route	Yes	Low
Guardrail result	Yes	Low
Retrieved document IDs	Yes	Medium
Full prompt	Sometimes	High
Full response	Sometimes	High
Retrieved document text	Rarely	Very high
User feedback	Yes	Medium

The goal is to make AI behaviour explainable without turning logs into a second copy of sensitive business data.

For cost and usage analysis, Amazon Bedrock also supports per-request metadata tagging, which can help teams attach key-value metadata to inference calls and analyse usage by team, application, environment or experiment.

8. Secure AI Agents with Tool Boundaries

Agentic workflows introduce another layer of risk.

A chatbot that answers questions is one thing.

An AI agent that can call APIs, query systems, create tickets, change records, trigger workflows or interact with cloud resources is different.

The risk is not only what the model says.

The risk is what the model can do.

The OWASP Top 10 for Large Language Model Applications highlights risks such as prompt injection, sensitive information disclosure, insecure plugin design and excessive agency. Those risks become more serious when the AI system can act through tools or APIs.

A secure agent architecture should avoid giving the model direct access to powerful credentials or broad APIs.

A better pattern is:

Model suggests action
→ Tool broker evaluates request
→ Policy engine checks permissions
→ Human approval if required
→ Tool executes with scoped credentials
→ Result is logged
→ Response is returned

Agent Governance Controls

Control	Purpose
Tool allowlist	Limits what the agent can call
Scoped credentials	Prevents broad system access
Parameter validation	Blocks unsafe or malformed tool calls
Human approval	Adds control for high-risk actions
Dry-run mode	Lets teams test suggested actions safely
Transaction limits	Prevents unbounded changes
Full action logging	Supports audit and investigation
Rollback plan	Reduces operational impact

The agent should not be the authority.

The platform should be the authority.

The model can propose.

The control plane decides.

9. Build Governance into the AWS Architecture

A practical AWS architecture for AI security and governance may include the following layers.

Users / product services
→ API Gateway, ALB or service mesh entry point
→ Authentication and tenant resolution
→ AI gateway / inference control layer
→ Policy engine
→ Secure retrieval layer
→ Prompt assembly layer
→ Model route
→ Guardrails and validation
→ Observability, audit logs and cost attribution
→ Product response

Inside the policy engine, the platform should usually define:

capability policy
tenant policy
data access policy
model route policy
logging policy

Inside the secure retrieval layer, the platform should usually enforce:

metadata filtering
tenant filtering
document permissions
ingestion validation

Inside the prompt assembly layer, the platform should usually manage:

prompt versioning
context limits
redaction
output schema selection

The model route may include:

Amazon Bedrock
self-hosted model on Kubernetes
restricted route
batch route

This architecture can be implemented in different ways depending on maturity.

For a smaller team, the AI gateway may start as a shared service inside the application platform.

For a scale-up, it may become a dedicated platform component with policy-as-code, centralised logging and standard integration patterns.

For a regulated environment, it may need stronger separation across AWS accounts, environments, data classes and tenant boundaries.

The important point is not the exact implementation.

The important point is ownership.

AI governance needs a clear owner inside the platform.

If no team owns the runtime control layer, governance will spread across product code, prompts, IAM policies, notebooks, vector stores and dashboards.

That does not scale.

10. Production Readiness Checklist for AI Security on AWS

Before scaling an AI system in production, engineering teams should be able to answer these questions.

Area	Readiness question
Ownership	Who owns the AI runtime governance layer?
Identity	Can every request be tied to a user, tenant, service or workflow?
Authorisation	Are AI capabilities authorised before model invocation?
Model access	Can product services call models directly, or only through approved paths?
RAG security	Is retrieval permission-aware before context enters the prompt?
Data protection	Is sensitive data redacted, masked or blocked where required?
Prompt governance	Are prompts versioned, reviewed and traceable?
Guardrails	Are guardrails applied consistently across workflows?
Model routing	Are model routes selected by task, sensitivity, latency and cost?
Logging	Is audit data useful without storing unnecessary sensitive content?
Monitoring	Can teams see latency, errors, token usage, throttling and guardrail results?
Agents	Are tool calls mediated through policy and approval controls?
Compliance	Can the team explain how a response was produced?
Cost governance	Are token usage and model routes attributed by tenant, feature and workflow?
Incident response	Can a risky prompt, model route or retrieval source be disabled quickly?

If many answers are unclear, the AI feature may work, but the platform is not yet ready for controlled production use.

Common Failure Patterns

1. "We use Bedrock, so security is handled"

Amazon Bedrock reduces the operational burden of model hosting.

It does not remove the need to design application-level controls around identity, retrieval, prompts, logging, validation and governance.

2. "We added guardrails, so governance is covered"

Guardrails help with specific classes of unsafe input and output.

They do not replace access control, secure retrieval, prompt governance, audit design or agent permissions.

3. "The model should not reveal sensitive data because the prompt says so"

Prompts are not access control.

Sensitive or unauthorised data should not enter the model context in the first place.

4. "We log everything for debugging"

Full prompt and response logs may help debugging, but they can also create a sensitive data store.

AI observability should be designed with redaction, retention and access controls.

5. "Each team can choose its own model route"

That may work during experimentation.

In production, model access needs shared rules around security, cost, latency, data sensitivity and compliance.

FAQ: AI Security and Governance on AWS

What is AI governance on AWS?

AI governance on AWS is the set of controls used to manage how AI systems access data, call models, apply prompts, use guardrails, generate outputs, log activity and operate in production. For LLM systems, governance should sit across the request path, not only at the model endpoint.

Is Amazon Bedrock secure enough for production AI workloads?

Amazon Bedrock provides managed access to foundation models and integrates with AWS controls such as IAM, CloudWatch, CloudTrail, logging and guardrails. But production security also depends on how the application controls identity, retrieval, prompt handling, model routing, validation, logging and tenant boundaries.

Are Amazon Bedrock Guardrails enough for AI governance?

No. Bedrock Guardrails are an important runtime control, but they are not the full governance model. Teams still need access control, secure RAG design, prompt versioning, audit logging, model allowlists, tool permissions and incident response processes.

How do you secure RAG applications on AWS?

Secure RAG starts before retrieval. Teams should classify documents, preserve tenant and permission metadata, apply access filters before prompt assembly, validate retrieved context, monitor document ingestion and log which sources influenced each answer.

What should be logged in a production LLM system?

A useful AI audit event should include request ID, tenant, workflow, prompt version, model route, guardrail result, retrieval document IDs, validation status, token usage, latency, retry behaviour and cost attribution. Full prompts and responses should be logged only when necessary and protected with redaction, encryption, access controls and retention limits.

Where Bion Helps

AI security on AWS is not a single configuration choice.

It is a platform design problem.

A working AI prototype may prove the product idea, but production requires stronger controls around model access, RAG permissions, prompt governance, guardrails, observability, audit logging, cost attribution and operational ownership.

Bion helps teams design and implement production-ready AI platforms on AWS, including secure LLM deployment, Amazon Bedrock integration, RAG architecture, AI platform operations, Kubernetes-based inference workloads, observability and governance controls.

For teams moving beyond prototype stage, the key question is not only:

Can this AI feature produce useful output?

It is:

Can we operate this AI capability securely, reliably and transparently in production?

If your AI product is starting to face security, governance, retrieval, observability or compliance challenges, Bion can help you review the architecture and define the controls needed before those risks scale further.

Book a technical strategy call to review your AWS AI security and governance architecture.

View full post