# Using TensorFlow Securely

This document discusses the TensorFlow security model. It describes the
security risks to consider when using models, checkpoints, or input data for
training or serving. We also provide guidelines on what constitutes a
vulnerability in TensorFlow and how to report one.

This document applies to other repositories in the TensorFlow organization,
covering security practices for the entirety of the TensorFlow ecosystem.

## TensorFlow models are programs

TensorFlow
[**models**](https://developers.google.com/machine-learning/glossary/#model) (to
use a term commonly used by machine learning practitioners) are expressed as
programs that TensorFlow executes. TensorFlow programs are encoded as
computation
[**graphs**](https://developers.google.com/machine-learning/glossary/#graph).
Since models are practically programs that TensorFlow executes, using untrusted
models or graphs is equivalent to running untrusted code.

If you need to run untrusted models, execute them inside a
[**sandbox**](https://developers.google.com/code-sandboxing). Memory corruptions
in TensorFlow ops can be recognized as security issues only if they are
reachable and exploitable through production-grade, benign models.

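To make the point concrete, here is a minimal sketch (the class name, file
paths, and payload are hypothetical) of a SavedModel whose graph performs a
filesystem write as a side effect of ordinary inference:

```python
import tensorflow as tf

# A toy "model" whose graph writes a file on whatever machine runs it.
class SideEffectingModel(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([], tf.float32)])
    def predict(self, x):
        # Stateful op kept by automatic control dependencies; a malicious
        # SavedModel could embed similar side-effecting ops.
        tf.io.write_file("/tmp/dropped_by_model.txt", "attacker-chosen bytes")
        return x * 2.0

tf.saved_model.save(SideEffectingModel(), "/tmp/untrusted_model")

# Anyone who loads and calls the model triggers the write as a side effect.
loaded = tf.saved_model.load("/tmp/untrusted_model")
loaded.predict(tf.constant(1.0))
```
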
### Compilation

Compiling models via the recommended entry points described in the
[XLA](https://www.tensorflow.org/xla) and
[JAX](https://jax.readthedocs.io/en/latest/jax-101/02-jitting.html)
documentation should be safe. However, some of the testing and debugging tools
that come with the compiler are not designed to be used with untrusted data and
should be used with caution when working with untrusted models.

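For illustration, one such recommended entry point in TensorFlow is
`tf.function(jit_compile=True)`; the sketch below (the function and values are
illustrative) compiles a small function with XLA:

```python
import tensorflow as tf

# jit_compile=True asks TensorFlow to compile the traced function with XLA.
@tf.function(jit_compile=True)
def scaled_dot(a, b):
    return tf.reduce_sum(a * b) * 0.5

scaled_dot(tf.ones([8]), tf.ones([8]))
```
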
### Saved graphs and checkpoints

When loading untrusted serialized computation graphs (in the form of a
`GraphDef`, `SavedModel`, or equivalent on-disk format), the set of computation
primitives available to TensorFlow is powerful enough that you should assume
that the TensorFlow process effectively executes arbitrary code.

The risk of loading untrusted checkpoints depends on the code or graph that you
are working with. When loading untrusted checkpoints, the values of the traced
variables from your model are also untrusted. That means that if your code
interacts with the filesystem, network, etc. and uses checkpointed variables as
part of those interactions (for example, using a string variable to build a
filesystem path), a maliciously created checkpoint might be able to change the
targets of those operations, which could result in arbitrary reads, writes, or
code execution.

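A minimal sketch of that checkpoint scenario (the class, variable, and paths
are hypothetical) could look like this:

```python
import tensorflow as tf

class Exporter(tf.Module):
    def __init__(self):
        # The export destination is a string variable, so it is saved to and
        # restored from checkpoints along with the rest of the model state.
        self.export_path = tf.Variable("/srv/exports/out.txt")

    def export(self, payload):
        # The restored (and therefore untrusted) variable decides where we write.
        tf.io.write_file(self.export_path, payload)

exporter = Exporter()
tf.train.Checkpoint(exporter=exporter).restore("untrusted_ckpt-1")

# If the checkpoint was crafted maliciously, export_path may now point at a
# startup script, cron directory, etc., turning this into an arbitrary write.
exporter.export(tf.constant("model output"))
```
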
### Running a TensorFlow server

TensorFlow is a platform for distributed computing, and as such there is a
TensorFlow server (`tf.train.Server`). The TensorFlow server is intended for
internal communication only. It is not built for use in untrusted environments
or networks.

For performance reasons, the default TensorFlow server does not include any
authorization protocol and sends messages unencrypted. It accepts connections
from anywhere, and executes the graphs it is sent without performing any
checks. Therefore, if you run a `tf.train.Server` in your network, anybody with
access to the network can execute arbitrary code with the privileges of the
user running the `tf.train.Server`.

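The sketch below (using the TF1-style API via `tf.compat.v1`, with hypothetical
paths) illustrates the trust model: any client that can reach the server's gRPC
endpoint can send it a graph of its choosing and have it executed.

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # use the TF1-style graph/session API

# Start an in-process server; a real deployment would listen on a network port.
server = tf.compat.v1.train.Server.create_local_server()
print("Server listening at", server.target)  # e.g. grpc://localhost:<port>

# An attacker-chosen graph: here it only reads a file on the server's host, but
# it could just as well write files or run any other TensorFlow op.
secret = tf.io.read_file("/etc/hostname")

with tf.compat.v1.Session(server.target) as sess:
    print(sess.run(secret))
```
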
## Untrusted inputs during training and prediction

TensorFlow supports a wide range of input data formats. For example, it can
process images, audio, videos, and text. There are several modules specialized
in taking those formats, modifying them, and/or converting them to intermediate
formats that can be processed by TensorFlow.

These modifications and conversions are handled by a variety of libraries that
have different security properties and provide different levels of confidence
when dealing with untrusted data. Based on the security history of these
libraries, we consider that it is safe to work with untrusted inputs for PNG,
BMP, GIF, WAV, RAW, RAW\_PADDED, CSV, and PROTO formats. All other input
formats, including tensorflow-io, should be sandboxed if used to process
untrusted data.

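As an illustration (the file path below is hypothetical), a format on the safe
list such as PNG can be decoded in-process, while anything outside the list
should be decoded in a sandboxed worker before the tensors reach TensorFlow:

```python
import tensorflow as tf

# PNG is on the list of formats considered safe to parse in-process.
png_bytes = tf.io.read_file("upload.png")
image = tf.io.decode_png(png_bytes, channels=3)

# Formats outside the safe list (e.g. video handled via tensorflow-io) should
# instead be decoded inside a sandbox before the data is handed to TensorFlow.
```
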
For example, if an attacker were to upload a malicious video file, they could
potentially exploit a vulnerability in the TensorFlow code that handles videos,
which could allow them to execute arbitrary code on the system running
TensorFlow.

It is important to keep TensorFlow up to date with the latest security patches
and follow the sandboxing guideline above to protect against these types of
vulnerabilities.

## Security properties of execution modes

TensorFlow has several execution modes, with Eager mode being the default in
v2. Eager mode lets users write imperative-style statements that can be easily
inspected and debugged, and it is intended to be used during the development
phase.

As part of the differences that make Eager mode easier to debug, the [shape
inference
functions](https://www.tensorflow.org/guide/create_op#define_the_op_interface)
are skipped, and any checks implemented inside the shape inference code are not
executed.

The security impact of skipping those checks should be low, since the attack
scenario would require a malicious user to be able to control the model, which,
as stated above, is already equivalent to code execution. In any case, the
recommendation is not to serve models using Eager mode, since it also has
performance limitations.

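A minimal sketch of that recommendation (the class and path names are
illustrative) is to wrap the serving entry point in `tf.function` and export
it, so inference runs as a traced graph, where shape inference applies, rather
than eagerly:

```python
import tensorflow as tf

class ServingModel(tf.Module):
    # The input signature fixes the traced graph used for serving, so shape
    # inference checks run when the function is traced.
    @tf.function(input_signature=[tf.TensorSpec([None, 4], tf.float32)])
    def serve(self, x):
        return tf.reduce_sum(x, axis=-1)

tf.saved_model.save(ServingModel(), "/tmp/serving_model")
```
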
## Multi-tenant environments

It is possible to run multiple TensorFlow models in parallel. For example,
`ModelServer` collates all computation graphs exposed to it (from multiple
`SavedModel`) and executes them in parallel on available executors. Running
TensorFlow in a multitenant design mixes the risks described above with the
inherent ones from multitenant configurations. The primary areas of concern are
tenant isolation, resource allocation, model sharing, and hardware attacks.

### Tenant isolation

Since any tenants or users providing models, graphs or checkpoints can execute
code in the context of the TensorFlow service, it is important to design
isolation mechanisms that prevent unwanted access to the data from other
tenants.

Network isolation between different models is also important, not only to
prevent unauthorized access to data or models, but also to prevent malicious
users or tenants from sending graphs to execute under another tenant's
identity.

The isolation mechanisms are the responsibility of the users to design and
implement, and therefore security issues deriving from their absence are not
considered a vulnerability in TensorFlow.

### Resource allocation

A denial of service caused by one model could bring down the entire server, but
we don't consider this as a vulnerability, given that models can exhaust
resources in many different ways and solutions exist to prevent this from
happening (e.g., rate limits, ACLs, monitors to restart broken servers).

### Model sharing

If the multitenant design allows sharing models, make sure that tenants and
users are aware of the security risks detailed here and that they will
effectively be running code provided by other users. Currently there are no
good ways to detect malicious models, graphs, or checkpoints, so the
recommended way to mitigate the risk in this scenario is to sandbox the model
execution.

### Hardware attacks

Physical GPUs or TPUs can also be the target of attacks. [Published
research](https://scholar.google.com/scholar?q=gpu+side+channel) shows that it
might be possible to use side channel attacks on the GPU to leak data from
other running models or processes in the same system. GPUs can also have
implementation bugs that might allow attackers to leave malicious code running
and leak or tamper with applications from other users. Please report
vulnerabilities to the vendor of the affected hardware accelerator.

## Reporting vulnerabilities

### Vulnerabilities in TensorFlow

This document covers different use cases for TensorFlow together with comments
on whether these uses are recommended or considered safe, and where we
recommend some form of isolation when dealing with untrusted data. As a result,
this document also outlines what issues we consider to be TensorFlow security
vulnerabilities.

We recognize issues as vulnerabilities only when they occur in scenarios that
we outline as safe; issues that have a security impact only when TensorFlow is
used in a discouraged way (e.g. running untrusted models or checkpoints,
parsing data outside of the safe formats, etc.) are not treated as
vulnerabilities.

### Reporting process

Please use the [Google Bug Hunters reporting form](https://g.co/vulnz) to
report security vulnerabilities. Please include the following information along
with your report:

- A descriptive title.
- Your name and affiliation (if any).
- A description of the technical details of the vulnerability.
- A minimal example of the vulnerability. It is very important to let us know
  how we can reproduce your findings. For memory corruption triggerable in
  TensorFlow models, please demonstrate an exploit against one of Alphabet's
  models in <https://tfhub.dev/>.
- An explanation of who can exploit this vulnerability, and what they gain
  when doing so. Write an attack scenario that demonstrates how your issue
  violates the use cases and security assumptions defined in the threat model.
  This will help us evaluate your report quickly, especially if the issue is
  complex.
- Whether this vulnerability is public or known to third parties. If it is,
  please provide details.

We will try to fix the problems as soon as possible. Vulnerabilities will, in
general, be batched to be fixed at the same time as a quarterly release. We
credit reporters for identifying security issues, although we keep your name
confidential if you request it. Please see the Google Bug Hunters program
website for more information.