SynthID: Tools for watermarking and detecting LLM-generated Text

Generative artificial intelligence (GenAI) can generate a wide array of highly diverse content at previously unimagined scale. While the majority of this use is for legitimate purposes, there is concern that it could contribute to misinformation and misattribution problems. Watermarking is one technique for mitigating these potential impacts. Watermarks that are imperceptible to humans can be applied to AI-generated content, and detection models can score arbitrary content to indicate the likelihood that it has been watermarked.

SynthID is a technology from Google DeepMind that watermarks and identifies AI-generated content by embedding digital watermarks directly into AI-generated images, audio, text, or video. SynthID Text has been open sourced to make watermarking for text generation available to developers. You can read the paper in Nature for a more complete technical description of the method.

A production-grade implementation of SynthID Text is available in Hugging Face Transformers v4.46.0+, which you can try out in the official SynthID Text Space. A reference implementation is also available on GitHub, which may be useful for open source maintainers and contributors looking to bring this technique to other frameworks.

Watermark application

Practically speaking, SynthID Text is a logits processor, applied to your model's generation pipeline after Top-K and Top-P, that augments the model's logits using a pseudorandom g-function to encode watermarking information in a way that helps you determine whether the text was generated by your model, without significantly affecting text quality. See the paper for a complete technical description of the algorithm and analyses of how different configuration values affect performance.
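To build intuition, here is a toy sketch of what a pseudorandom g-function looks like. The function name, hashing scheme, and score range are illustrative choices, not the production implementation described in the paper:

```python
import hashlib

def g_value(key: int, ngram_context: tuple[int, ...], candidate_token: int) -> float:
    """Toy g-function: deterministically map (key, n-gram context, candidate
    token) to a pseudorandom score in [0, 1). Whoever holds the key can
    recompute the score; to anyone else it looks like noise."""
    payload = f"{key}|{ngram_context}|{candidate_token}".encode()
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

# Score two candidate next tokens given the last four token ids.
context = (101, 7592, 2088, 102)
print(g_value(42, context, 2003))
print(g_value(42, context, 2001))
```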

A watermarking configuration parameterizes the g-function and how it is applied during generation. Each watermarking configuration you use should be stored securely and privately; otherwise, your watermark may be trivially replicable by others.

You must define two parameters in every watermarking configuration:

  • The keys parameter is a list of unique, random integers that are used to compute g-function scores across the model's vocabulary. The length of this list determines how many layers of watermarking are applied. See Appendix C.1 in the paper for more details.
  • The ngram_len parameter is used to balance robustness and detectability; the larger the value, the more detectable the watermark will be, at the cost of being more brittle to changes. A length of 5 is a good default value; a minimal configuration sketch follows this list.
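A minimal configuration sketch, assuming the SynthIDTextWatermarkingConfig class from Hugging Face Transformers v4.46.0+ (the key values below are placeholders; generate and store your own):

```python
from transformers import SynthIDTextWatermarkingConfig

# Placeholder keys: generate your own unique, random integers and keep them
# secret; the watermark is only as hard to replicate as these keys.
watermarking_config = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340, 443, 597, 160, 57, 29],
    ngram_len=5,  # good default: reasonably detectable without being too brittle
)
```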

You can further configure the watermark based on your performance needs:

  • A sampling table is configured by two properties, sampling_table_size and sampling_table_seed. You want to use a sampling_table_size of at least 2^16 (65,536) to ensure an unbiased and stable g-function when sampling, but be aware that the size of the sampling table impacts the amount of memory required at inference time. You can use any integer you like as the sampling_table_seed.
  • To improve detectability, n-grams repeated within the preceding context_history_size tokens are not watermarked. Both properties appear in the configuration sketch below.
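The same configuration class also accepts these performance-related properties. The parameter names below match the Transformers implementation at the time of writing, but verify them against the version you run:

```python
watermarking_config = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340, 443, 597, 160, 57, 29],  # placeholders
    ngram_len=5,
    sampling_table_size=2**16,  # at least 2^16 for an unbiased, stable g-function
    sampling_table_seed=0,      # any integer is acceptable
    context_history_size=1024,  # window in which repeated n-grams go unwatermarked
)
```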

No additional training is required to generate text with a SynthID Text watermark using your models; you only need a watermarking configuration that is passed to the model's .generate() method to activate the SynthID Text logits processor. See the blog post and Space for code examples showing how to apply a watermark in the Transformers library.
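For orientation, here is a minimal sketch of that flow; the model name, prompt, and keys are placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, SynthIDTextWatermarkingConfig

model_name = "google/gemma-2-2b"  # any causal LM supported by Transformers
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

watermarking_config = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340, 443, 597, 160, 57, 29],  # placeholder keys
    ngram_len=5,
)

inputs = tokenizer("Write a short poem about the sea.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    watermarking_config=watermarking_config,  # activates the SynthID Text logits processor
    do_sample=True,   # the watermark modifies sampling, so sampling must be enabled
    max_new_tokens=128,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```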

Watermark detection and verifiability

Watermark detection is probabilistic. A Bayesian detector is provided with Hugging Face Transformers and on GitHub. This detector can output three possible detection states (watermarked, not watermarked, or uncertain), and its behavior can be customized by setting two threshold values to achieve specific false positive and false negative rates. See Appendix C.8 in the paper for more details.
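The three-state output can be illustrated with a simple two-threshold rule; the score scale and threshold values below are hypothetical stand-ins for whatever your trained detector produces:

```python
def classify(score: float, lower: float, upper: float) -> str:
    """Map a detector score (e.g., a posterior probability that text is
    watermarked) to one of three states. The two thresholds are tuned on
    held-out data to hit target false positive and false negative rates."""
    if score >= upper:
        return "watermarked"
    if score <= lower:
        return "not watermarked"
    return "uncertain"

# Hypothetical calibration: scores above 0.95 count as watermarked, below
# 0.12 as not watermarked, and anything in between as uncertain.
print(classify(0.97, lower=0.12, upper=0.95))  # watermarked
print(classify(0.50, lower=0.12, upper=0.95))  # uncertain
```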

Models that use the same tokenizer can also share a watermarking configuration and detector, and thus a common watermark, so long as the detector's training set includes examples from all models that share the watermark.

Once you have a trained detector, you can choose whether and how to expose it to your users, and to the public more generally.

  • The fully-private option does not release or expose the detector in any way.
  • The semi-private option does not release the detector, but does expose it through an API.
  • The public option releases the detector for others to download and use.

You and your organization need to decide which verification approach is best for your needs, based on your ability to support the associated infrastructure and processes.

Limitations

SynthID Text watermarks are robust to some transformations—cropping pieces of text, modifying a few words, or mild paraphrasing—but this method does have limitations.

  • Watermark application is less effective on factual responses, as there is less opportunity to augment generation without decreasing accuracy.
  • Detector confidence scores can be greatly reduced when AI-generated text is thoroughly rewritten or translated into another language.

SynthID Text is not designed to directly stop motivated adversaries from causing harm. However, it can make it harder to use AI-generated content for malicious purposes, and it can be combined with other approaches to give better coverage across content types and platforms.