Redacting Datasets

Configuring S3 Buckets for Read and Write Access

Dataset redaction reads your dataset from an S3 bucket and writes the redacted dataset to an S3 bucket. The source and destination can be the same bucket or two different buckets. We recommend creating a separate bucket for redacted datasets, both to avoid accidentally overwriting your original dataset and to keep the original data and the redacted output under separate permissions.
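If you do use a single bucket, the source and destination locations should not overlap, or redacted output could land on top of the original objects. The following minimal Python sketch checks for that overlap. It assumes locations are written as `s3://bucket/prefix` URLs; the function names are hypothetical and are not part of Philterd Data Services.

```python
from urllib.parse import urlparse


def split_s3_url(url: str) -> tuple[str, str]:
    """Split 's3://bucket/some/prefix' into ('bucket', 'some/prefix/')."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URL: {url!r}")
    prefix = parsed.path.lstrip("/")
    # Normalize with a trailing slash so 'data' does not falsely match 'data-redacted/'.
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return parsed.netloc, prefix


def locations_overlap(source_url: str, destination_url: str) -> bool:
    """Return True if writing under destination could overwrite objects under source."""
    src_bucket, src_prefix = split_s3_url(source_url)
    dst_bucket, dst_prefix = split_s3_url(destination_url)
    if src_bucket != dst_bucket:
        return False
    # Overlap when either prefix contains the other; an empty prefix means the whole bucket.
    return src_prefix.startswith(dst_prefix) or dst_prefix.startswith(src_prefix)


print(locations_overlap("s3://datasets/raw", "s3://datasets/redacted"))  # False
print(locations_overlap("s3://datasets", "s3://datasets/redacted"))      # True
print(locations_overlap("s3://datasets/raw", "s3://redacted-datasets"))  # False
```

The bucket names and prefixes above are placeholders.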

To configure your S3 buckets for read and write access, follow the AWS-provided instructions, which we have summarized below.

For Read Access to Datasets to Be Redacted

Create an IAM policy that allows s3:GetObject on the bucket that contains the datasets you want to redact. We recommend scoping the permission to a specific location (key prefix) in the bucket.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::source-bucket-name/*"
        }
    ]
}
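To scope the read permission to one location, narrow the Resource ARN from every object in the bucket to the objects under a key prefix. A minimal Python sketch that builds such a policy follows; `source-bucket-name`, the `datasets/to-redact` prefix, and the `read_policy` helper are hypothetical placeholders.

```python
import json


def read_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy allowing s3:GetObject only on objects under bucket/prefix."""
    # Object ARNs have the form arn:aws:s3:::bucket/key; the trailing '*' matches
    # every key that starts with the prefix (or every key if the prefix is empty).
    prefix = prefix.strip("/")
    resource = f"arn:aws:s3:::{bucket}/{prefix + '/' if prefix else ''}*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": resource,
            }
        ],
    }


print(json.dumps(read_policy("source-bucket-name", "datasets/to-redact"), indent=4))
```

With no prefix, the helper reproduces the bucket-wide policy shown above; the printed JSON can be pasted into the JSON editor when creating the policy in the IAM console.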

Next, create an IAM role. For the "Trusted entity type", select AWS account and enter the Philterd Data Services AWS account ID, which is TODO:123. Attach the policy you just created to the role. After creating the role, copy its ARN and enter it in the Philterd Data Services Settings as the Dataset role ARN. We will use this role to read the datasets to redact from your S3 bucket.
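Selecting AWS account as the trusted entity gives the role a trust policy that lets the named account call sts:AssumeRole. The Python sketch below builds that trust policy document. `111122223333` is a placeholder, not the Philterd Data Services account ID, and the `trust_policy` helper is hypothetical.

```python
import json
import re


def trust_policy(trusted_account_id: str) -> dict:
    """Build a role trust policy that lets another AWS account assume the role."""
    # AWS account IDs are exactly 12 digits.
    if not re.fullmatch(r"\d{12}", trusted_account_id):
        raise ValueError("AWS account IDs are exactly 12 digits")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # ':root' names the trusted account as a whole, which then decides
                # which of its own principals may assume this role.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Placeholder account ID: substitute the ID that Philterd Data Services provides.
print(json.dumps(trust_policy("111122223333"), indent=4))
```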

For Write Access to Redacted Datasets

Create an IAM policy that allows s3:PutObject on the bucket you want to write redacted datasets to. We recommend scoping the permission to a specific location (key prefix) in the bucket.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::destination-bucket-name/*"
        }
    ]
}
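To see which object keys a Resource pattern covers before attaching a policy, a rough local approximation of IAM's wildcard matching (`*` matches any run of characters, `?` matches one character) is often enough. This is a simplified sketch, not AWS's policy evaluation logic: it ignores Deny statements, conditions, NotAction/NotResource, bucket policies, and policy variables. The helper names, bucket names, and keys are hypothetical.

```python
import re


def wildcard_match(pattern: str, value: str) -> bool:
    """Approximate IAM wildcard matching: '*' = any characters, '?' = one character."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def allows(policy: dict, action: str, bucket: str, key: str) -> bool:
    """Return True if some Allow statement grants `action` on the object bucket/key."""
    object_arn = f"arn:aws:s3:::{bucket}/{key}"
    for stmt in policy["Statement"]:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt["Action"] if isinstance(stmt["Action"], list) else [stmt["Action"]]
        resources = stmt["Resource"] if isinstance(stmt["Resource"], list) else [stmt["Resource"]]
        # Action names are case-insensitive in IAM, so compare them lowercased.
        if any(wildcard_match(a.lower(), action.lower()) for a in actions) and any(
            wildcard_match(r, object_arn) for r in resources
        ):
            return True
    return False


write_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::destination-bucket-name/*",
        }
    ],
}

print(allows(write_policy, "s3:PutObject", "destination-bucket-name", "redacted/part-0001.json"))  # True
print(allows(write_policy, "s3:PutObject", "source-bucket-name", "raw/part-0001.json"))  # False
print(allows(write_policy, "s3:GetObject", "destination-bucket-name", "redacted/part-0001.json"))  # False
```

For authoritative answers, test the policy with AWS's own tooling rather than relying on this approximation.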

Next, create an IAM role. For the "Trusted entity type", select AWS account and enter the Philterd Data Services AWS account ID, which is TODO:123. Attach the policy you just created to the role. After creating the role, copy its ARN and enter it in the Philterd Data Services Settings as the Dataset role ARN. We will use this role to write redacted datasets to your S3 bucket.