Redaction Policy Syntax Reference

Important: Knowledge of the policy syntax is not necessary to use Philterd Data Services. Creating or editing policies by hand allows for powerful control over the redaction process, but it is not required.

Philterd Data Services uses a powerful and flexible JSON-based syntax for defining redaction policies. This structure allows you to exercise precise control over how sensitive information is identified, classified, and subsequently protected within your documents and datasets.

This reference guide provides a deep dive into the individual components of a policy, the available filters, and the diverse processing strategies at your disposal.

If you need help creating a policy that meets your needs, please contact us.

Core Policy Structure

A Philterd redaction policy is defined as a JSON object containing several mandatory and optional top-level fields. This structured approach ensures that your data protection rules are both machine-readable and easy to manage.

Top-Level Fields

version (Text, Required): The version of the policy syntax being used. The current stable version is 1.0.
name (Text, Required): A unique, descriptive name for the policy (e.g., "Standard-HIPAA-Ruleset"). The name is to allow you to quickly understand the policy's purpose.
description (Text, Optional): A brief explanation of the policy's purpose or the regulatory standard it is designed to meet. The description is to help you remember extra information about the policy.
notes (Text, Optional): Additional information about the policy that you would like to keep with the policy. Notes are useful to remember what the policy is for, why changes were made to the policy, or any other extra information that might be helpful.
filters (Object, Required): A mapping where each key is a PII/PHI type and the value is a list of redaction strategies to be applied to that type.
termsToIgnore (Array of Text, Optional): A "whitelist" of specific terms that the system should never redact, even if they match a filter.
termsToAlwaysRedact (Array of Text, Optional): A "blacklist" of specific terms that must be redacted every time they are encountered.
highlightChangesinWordDocuments (Boolean, Optional): Specifically for .docx files. If set to true, the redacted areas in the output document will be highlighted (usually in yellow) for easy visual identification.
turnOnRevisionsinWordDocuments (Boolean, Optional): Specifically for .docx files. If set to true, the platform will enable Microsoft Word's "Track Changes" (Revisions) feature, showing exactly what was removed and what it was replaced with.
disambiguationScope (Text, Optional): The scope of the disambiguation during redaction. Valid values are Document and Context. Defaults to Document if not specified.
lens (Text, Optional): The lens to apply during redaction. Valid values are general, financial, healthcare, legal, and news. Defaults to general if not specified. See Redaction Lenses for more details.

Example Policy

The following example shows a policy that handles multiple data types with different strategies:

{
  "version": "1.0",
  "name": "comprehensive-privacy-policy",
  "description": "A robust policy for general PII and PHI protection",
  "notes": "I created this policy to redact emails and truncate SSNs.",
  "filters": {
    "PERSON": [
      {
        "strategy": "REDACT",
        "simplifiedCondition": {
          "confidence": 90
        }
      }
    ],
    "EMAIL_ADDRESS": [
      {
        "strategy": "ANONYMIZE"
      }
    ],
    "SSN": [
      {
        "strategy": "LAST_4"
      }
    ]
  },
  "termsToIgnore": ["Philterd", "Headquarters", "Global Sales"],
  "termsToAlwaysRedact": ["Confidential Project Phoenix", "Top Secret"],
  "highlightChangesinWordDocuments": true,
  "turnOnRevisionsinWordDocuments": false,
  "disambiguationScope": "Document"
}

Supported PII/PHI Filters

Filters are the engines that identify specific categories of sensitive information. Philterd supports a wide array of filters, leveraging both pattern matching (RegEx) and advanced AI-based Named Entity Recognition (NER).

Filter Key	Description	Detection Method
`PERSON`	Full names of individuals.	AI / NER
`FIRST_NAME`	Common first names.	Dictionary / Pattern
`SURNAME`	Common surnames/last names.	Dictionary / Pattern
`EMAIL_ADDRESS`	Electronic mail addresses.	Pattern
`SSN`	Social Security Numbers.	Pattern
`AGE`	References to a person's age.	AI / Pattern
`PHONE_NUMBER`	Telephone numbers (various formats).	Pattern
`STREET_ADDRESS`	Physical street addresses.	AI / NER
`LOCATION_CITY`	City names.	AI / NER
`LOCATION_STATE`	State or province names.	AI / NER
`ZIP_CODE`	Postal/Zip codes.	Pattern
`DATE`	Calendar dates.	AI / Pattern
`CREDIT_CARD`	Credit and debit card numbers.	Pattern
`BANK_ROUTING_NUMBER`	Financial institution routing codes.	Pattern
`IP_ADDRESS`	IPv4 and IPv6 addresses.	Pattern
`URL`	Website addresses and links.	Pattern
`PASSPORT_NUMBER`	National passport numbers.	Pattern
`DRIVERS_LICENSE_NUMBER`	Motor vehicle license numbers.	Pattern
`VIN`	Vehicle Identification Numbers.	Pattern
`IDENTIFIER`	Custom identifiers (MRNs, Employee IDs).	Pattern / AI

Redaction Strategies

A strategy determines the specific transformation applied to a piece of information once it has been identified by a filter.

REDACT: The most common strategy. It replaces the sensitive text with a descriptive label enclosed in brackets. For example, "John Doe" becomes [PERSON].
ANONYMIZE: Replaces the identified text with a synthetically generated but realistic value of the same type. For example, one email address might be replaced with a different, fake email address.
LAST_4: Useful for identifiers like SSNs or bank accounts. It replaces all characters except for the final four. For example, "123-456-7890" becomes *******7890.
MASK: Replaces every character in the sensitive string with a specific masking character (typically * or X).
TRUNCATE_TO_YEAR: Specifically for dates. It removes the month and day, preserving only the year. "January 15, 2023" becomes 2023.
SHIFT: For dates. It shifts the date forward or backward by a random number of days (useful for maintaining chronological relationships in datasets while protecting specific dates).
FPE (Format-Preserving Encryption): Encrypts the sensitive data using a key while ensuring the output has the same format and length as the input. This is ideal for maintaining database schema compatibility.

Confidence Thresholds (AI Filters Only)

For filters that utilize AI and machine learning (like PERSON or STREET_ADDRESS), you can specify a confidence threshold within the simplifiedCondition block. This value (0-100) represents the level of certainty the AI must have before it applies the redaction strategy.

High Confidence (e.g., 95+): Minimizes "false positives" (redacting things that aren't actually PII) but may miss some actual sensitive information.
Low Confidence (e.g., 70-80): Ensures more comprehensive redaction but increases the risk of "over-redaction."
Default: If not specified, the system typically defaults to a confidence score of 85.

Advanced Management of Terms

Whitelisting via `termsToIgnore` (Never Redact)

The termsToIgnore list is critical for preventing the accidental redaction of non-sensitive information that may resemble PII. This is effectively a list of terms to never redact. Common use cases include:

Your organization's name or brand.
Publicly known office locations or city names that are central to the document's context.
Industry-specific terminology that might be misidentified as names or locations.

Blacklisting via `termsToAlwaysRedact` (Always Redact)

The termsToAlwaysRedact list ensures that specific, high-risk terms are caught every time, regardless of the automated filters. This is effectively a list of terms to always redact. This is perfect for:

Sensitive project codenames (e.g., "Project X").
Specific internal identifiers that do not follow a standard pattern.
Any proprietary terms that must remain confidential.

Implementation Guide

To implement a policy using this syntax:

Access the Dashboard: Navigate to the Redaction Policies section.
Enter the Advanced Editor: Click the Advanced Editor button for your chosen policy.
Construct Your JSON: Using the rules above, build your JSON configuration.
Save and Validate: Click Save. The platform will validate your JSON structure.
Verify via Testing: Always test your new policy on a representative sample document in the Redacting Documents section to ensure it behaves as expected before wide-scale deployment.