
Random Number Generation

Random number generation is a fundamental aspect of many applications, particularly those involving cryptography, simulations, and statistical analysis. Ensuring the quality and unpredictability of random numbers is crucial for maintaining security and integrity in these systems. This document provides an overview of random number generation and its importance in various contexts.

Importance for Anonymizing PII

When anonymizing Personally Identifiable Information (PII), original values are replaced with surrogate values, so the strength of the random number generator (RNG) that produces those surrogates matters. If the surrogates come from a weak or predictable RNG, the anonymization can be compromised: an attacker may be able to predict surrogate values or reverse the replacement.

The primary goal of PII anonymization is to prevent the re-identification of individuals. If an RNG is predictable, an attacker might be able to determine the sequence of "random" numbers used. By discovering the seed or the algorithm's pattern, they could potentially reverse the anonymization process and map surrogate values back to the original PII.
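The risk described above can be illustrated with a short sketch. It is not the Philterd Data Services implementation; the function names and the 9-digit surrogate format are assumptions made for the example. It contrasts Python's seedable `random.Random` (a Mersenne Twister, which is not cryptographically secure) with the `secrets` module, which draws from the operating system's CSPRNG.

```python
import random
import secrets

SURROGATE_RANGE = 10**9  # hypothetical: surrogates are numbers below one billion


def weak_surrogates(seed, count):
    """Generate surrogates from a seedable, non-cryptographic RNG."""
    rng = random.Random(seed)
    return [rng.randrange(SURROGATE_RANGE) for _ in range(count)]


def strong_surrogates(count):
    """Generate surrogates from the operating system's CSPRNG."""
    return [secrets.randbelow(SURROGATE_RANGE) for _ in range(count)]


# Anyone who learns or guesses the seed can regenerate the exact sequence.
issued = weak_surrogates(seed=1234, count=5)
replayed_by_attacker = weak_surrogates(seed=1234, count=5)
print("weak RNG replayable:", issued == replayed_by_attacker)

# The CSPRNG exposes no seed to recover, so there is nothing to replay.
print("strong surrogates:", strong_surrogates(5))
```

With the weak generator, the whole surrogate sequence is a function of one small secret, which is exactly what an attacker would try to recover.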

Our Random Number Generator

Philterd Data Services implements a high-performance cryptographically secure pseudo-random number generator (CSPRNG) based on the ChaCha20 stream cipher. All random values used for anonymization, such as salts and initialization vectors, are drawn from this generator. ChaCha20 is widely deployed: it is a standard cipher in TLS 1.3 (as ChaCha20-Poly1305) and is used in WireGuard and OpenSSH.

We use ChaCha20 because it is a modern stream cipher that is fast in software and has withstood extensive public cryptanalysis. Unlike older stream ciphers such as RC4, whose output has known statistical biases, the full 20-round ChaCha20 has no known practical attacks, and it is widely recommended for use in secure systems.
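To make the idea of a ChaCha20-based generator concrete, here is a minimal sketch of the ChaCha20 block function as specified in RFC 8439, wrapped in a small class that hands out keystream bytes as random output. This is an illustration only, not the Philterd Data Services implementation: the class name `ChaCha20Rng` and its interface are hypothetical, and production code should rely on a vetted cryptographic library rather than a hand-written cipher.

```python
import struct

MASK32 = 0xFFFFFFFF
# The ASCII string "expand 32-byte k" as four little-endian 32-bit words.
CONSTANTS = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)


def _rotl32(value, shift):
    """Rotate a 32-bit word left by `shift` bits."""
    return ((value << shift) & MASK32) | (value >> (32 - shift))


def _quarter_round(s, a, b, c, d):
    """Apply the ChaCha quarter round to four words of state `s` in place."""
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = _rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = _rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = _rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = _rotl32(s[b] ^ s[c], 7)


def chacha20_block(key, counter, nonce):
    """Return one 64-byte keystream block (RFC 8439 layout)."""
    if len(key) != 32 or len(nonce) != 12:
        raise ValueError("key must be 32 bytes and nonce must be 12 bytes")
    initial = (
        list(CONSTANTS)
        + list(struct.unpack("<8I", key))
        + [counter & MASK32]
        + list(struct.unpack("<3I", nonce))
    )
    state = initial.copy()
    for _ in range(10):  # 10 double rounds = 20 rounds
        _quarter_round(state, 0, 4, 8, 12)   # column rounds
        _quarter_round(state, 1, 5, 9, 13)
        _quarter_round(state, 2, 6, 10, 14)
        _quarter_round(state, 3, 7, 11, 15)
        _quarter_round(state, 0, 5, 10, 15)  # diagonal rounds
        _quarter_round(state, 1, 6, 11, 12)
        _quarter_round(state, 2, 7, 8, 13)
        _quarter_round(state, 3, 4, 9, 14)
    mixed = [(x + y) & MASK32 for x, y in zip(state, initial)]
    return struct.pack("<16I", *mixed)


class ChaCha20Rng:
    """Hypothetical generator: the seed is the key, output is the keystream."""

    def __init__(self, seed, nonce):
        self._key = seed
        self._nonce = nonce
        self._counter = 0
        self._buffer = b""

    def random_bytes(self, count):
        while len(self._buffer) < count:
            if self._counter > MASK32:
                raise RuntimeError("block counter exhausted; reseed required")
            self._buffer += chacha20_block(self._key, self._counter, self._nonce)
            self._counter += 1
        out, self._buffer = self._buffer[:count], self._buffer[count:]
        return out
```

A caller would construct the generator with a 32-byte secret seed and a 12-byte nonce, for example `ChaCha20Rng(os.urandom(32), os.urandom(12))`, and then request bytes with `random_bytes`. The output is only as unpredictable as the seed, which is why seeding matters as much as the cipher.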

The Security page in the Philterd Data Services dashboard shows a chart of values produced by our random number generator. The chart is intended to give a visual indication of the generator's randomness. Note that the values on the chart follow no visible pattern; the comparison image below illustrates this:

[Image: Comparison of RNGs]

Seeding with High Entropy

To prevent repeated output patterns and replay attacks, each document redaction uses a unique, high-entropy nonce. Our hybrid approach combines entropy from the AWS Nitro System's dedicated security hardware with entropy from an offsite device certified to NIST SP 800-90B, so that the seed does not depend on a single entropy source.
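One common way to combine two entropy sources is to hash them together, so the result is hard to predict as long as at least one input is. The sketch below shows that pattern under stated assumptions: `fetch_external_entropy` is a hypothetical stand-in for the offsite device (it simply calls `os.urandom` here), SHA-256 is an illustrative choice of combiner, and none of this is confirmed to match how Philterd Data Services mixes its sources.

```python
import hashlib
import os
import secrets

SEED_BYTES = 32
NONCE_BYTES = 12  # matches the 96-bit ChaCha20 nonce size in RFC 8439


def fetch_external_entropy(num_bytes):
    """Hypothetical placeholder for the offsite entropy device."""
    return os.urandom(num_bytes)


def derive_seed(local_entropy, external_entropy):
    """Mix two entropy inputs into one 32-byte seed with SHA-256."""
    digest = hashlib.sha256()
    for part in (local_entropy, external_entropy):
        # Length-prefix each input so different splits of the same bytes
        # cannot produce the same hash input.
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.digest()


def new_document_nonce():
    """Draw a fresh nonce for one document redaction."""
    return secrets.token_bytes(NONCE_BYTES)


seed = derive_seed(os.urandom(SEED_BYTES), fetch_external_entropy(SEED_BYTES))
nonce = new_document_nonce()
print(len(seed), len(nonce))
```

Drawing a new nonce for every document means that two redactions never share a keystream, even when they use the same seed.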

Security and Performance

By combining the cryptographic strength of ChaCha20 with high-entropy seeding, Philterd Data Services provides:

  • Unpredictability: Ensuring that generated sequences cannot be predicted or reversed.
  • High Performance: ChaCha20 is designed to be fast in software, ensuring that the anonymization process remains efficient even at scale.
  • Compliance: Meeting industry standards for the use of cryptographically secure random number generation in data privacy applications.