Phirestream with Amazon Managed Streaming for Apache Kafka (MSK)
Phirestream can redact sensitive information, such as personally identifiable information (PII) and protected health information (PHI), from streaming text in Amazon Managed Streaming for Apache Kafka (MSK) clusters. This guide assumes you already have an Apache Kafka cluster running in Amazon MSK. Refer to the AWS documentation for instructions on creating an MSK cluster.
Launch Phirestream in your AWS account.
Phirestream AWS Architecture
Phirestream works as a proxy in front of Apache Kafka and Amazon MSK. Phirestream exposes a REST interface that accepts messages, redacts the sensitive information they contain, and then produces the redacted messages to the Kafka brokers.
AWS MSK Cluster Configuration
An example MSK cluster configuration is shown below:
AWS MSK Security Group
The following are example security group rules to allow communication with the brokers using TLS. Customize these rules per your VPC and subnet settings. See the AWS MSK documentation for other ports.
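As an illustration of a TLS rule, the sketch below builds an inbound permission for TCP port 9094, the port Amazon MSK brokers use for TLS client connections. The helper name `tls_ingress_rule`, the rule description, and the `10.0.0.0/16` CIDR are placeholders chosen for this example and are not taken from this guide; replace them with values that match your VPC. The dictionary follows the `IpPermissions` shape accepted by the EC2 `authorize_security_group_ingress` call in boto3.

```python
from typing import Any, Dict, List

# Amazon MSK brokers listen for TLS client traffic on port 9094.
MSK_TLS_PORT = 9094


def tls_ingress_rule(
    source_cidr: str,
    description: str = "Phirestream to MSK (TLS)",  # hypothetical label
) -> List[Dict[str, Any]]:
    """Build an IpPermissions list allowing TCP 9094 from source_cidr."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": MSK_TLS_PORT,
            "ToPort": MSK_TLS_PORT,
            "IpRanges": [{"CidrIp": source_cidr, "Description": description}],
        }
    ]


if __name__ == "__main__":
    # 10.0.0.0/16 is a placeholder CIDR; substitute the range Phirestream runs in.
    permissions = tls_ingress_rule("10.0.0.0/16")
    print(permissions)
    # To apply the rule with boto3 (not executed here):
    #   boto3.client("ec2").authorize_security_group_ingress(
    #       GroupId="<your security group ID>", IpPermissions=permissions)
```

Restricting the source range to the subnet or security group where Phirestream runs keeps the brokers closed to the rest of the VPC.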
Phirestream Settings
Edit the /opt/phirestream/config/application.properties file to set the addresses of the MSK cluster:
As an example:
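The broker addresses themselves can be looked up through the MSK API. The following is a minimal sketch, assuming boto3 is available and that the TLS listeners are the ones you want: it extracts the comma-separated `BootstrapBrokerStringTls` value from a `GetBootstrapBrokers` response. The helper name `tls_broker_addresses` and the sample hostnames are made up for illustration; the property name to set in `application.properties` is not shown here and should be taken from the Phirestream documentation.

```python
from typing import Any, Dict, List


def tls_broker_addresses(response: Dict[str, Any]) -> List[str]:
    """Return the TLS broker host:port pairs from a GetBootstrapBrokers response."""
    brokers = response.get("BootstrapBrokerStringTls", "")
    # The API returns one comma-separated string; split it and drop empty entries.
    return [b.strip() for b in brokers.split(",") if b.strip()]


if __name__ == "__main__":
    # With boto3, the response would come from (not executed here):
    #   boto3.client("kafka").get_bootstrap_brokers(ClusterArn="<your cluster ARN>")
    # The hostnames below are placeholders, not real brokers.
    sample = {
        "BootstrapBrokerStringTls": (
            "b-1.example.kafka.us-east-1.amazonaws.com:9094,"
            "b-2.example.kafka.us-east-1.amazonaws.com:9094"
        )
    }
    # Print the comma-separated list in the form a Kafka client setting expects.
    print(",".join(tls_broker_addresses(sample)))
```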
Restart Phirestream for the change to take effect.
Phirestream is now ready to receive your text via its Kafka-compliant REST API. The redacted text will be written to the MSK cluster on the appropriate topic. See the Getting Started guide for text redaction examples and refer to the AWS MSK documentation for consuming the redacted text.
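To make the request flow concrete, here is a hedged sketch of building a produce request in Python. It assumes that "Kafka-compliant" means the Confluent REST Proxy v2 convention (a `POST` to `/topics/<topic>` with the `application/vnd.kafka.json.v2+json` content type); that assumption, the helper name `build_produce_request`, and the `http://localhost:8080` address are not confirmed by this guide, so check the Getting Started guide for the actual endpoint, port, and payload format.

```python
import json
import urllib.parse
import urllib.request


def build_produce_request(base_url: str, topic: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST that submits one text record to a topic."""
    # Assumed payload shape: Confluent REST Proxy v2 JSON records.
    body = json.dumps({"records": [{"value": text}]}).encode("utf-8")
    # Percent-encode the topic so it is safe to embed in the URL path.
    path = "/topics/" + urllib.parse.quote(topic, safe="")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
        method="POST",
    )


if __name__ == "__main__":
    # The host, port, and topic name are placeholders for this example.
    request = build_produce_request(
        "http://localhost:8080", "notes", "Patient John Smith was seen today."
    )
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the URL, headers, and body easy to inspect before any text reaches the cluster.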