What is Philter?

Philter is an application that finds and removes sensitive information, such as protected health information (PHI) and personally identifiable information (PII), from natural language text in text files or PDF documents. The types of sensitive information that Philter identifies are configurable, so you can add custom types specific to your domain and use-case.

Given text as input, Philter applies a sequence of filters to find and remove the desired sensitive information, then returns the filtered text. Philter was designed with simplicity in mind so that it is easy to integrate into existing systems.

Philter is ideal for text processing pipelines in which sensitive information needs to be removed or redacted from text. Philter runs in your cloud and is available on the AWS, Azure, and GCP cloud marketplaces for easy deployment into virtual private clouds. Philter supports AWS GovCloud.

How does Philter work?

At a high level, when you send text to Philter, Philter looks for sensitive information in the text, manipulates the sensitive information based on how Philter is configured, and returns the filtered (redacted) text.
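The flow described above (find, manipulate, return) can be sketched in a few lines of Python. This is only an illustration of the idea, not Philter's implementation: the `FILTERS` list, the two regular expressions, and the `[REDACTED-...]` placeholder format are all assumptions made for this example.

```python
import re

# Hypothetical filters: each pairs a type label with a pattern.
# These are illustrative regular expressions, not Philter's actual filters.
FILTERS = [
    ("email-address", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
]


def filter_text(text: str) -> str:
    """Apply each filter in turn and replace every match with a placeholder."""
    for label, pattern in FILTERS:
        # The placeholder format is an assumption chosen for this sketch.
        text = pattern.sub("[REDACTED-" + label + "]", text)
    return text


print(filter_text("Contact jsmith@example.com, SSN 123-45-6789."))
```

In a real deployment the manipulation step is driven by configuration rather than hard-coded, so the same input could be redacted, masked, or otherwise replaced depending on how Philter is set up.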

Where can I run Philter?

Anywhere! Philter is not constrained to any cloud provider or on-premises environment. You can run Philter in AWS, Azure, GCP, or any other cloud provider, in a Kubernetes cluster, or on bare metal.

How do I send text to Philter?

All interactions with Philter are through its API. The API has a method that accepts text as input and returns the filtered text. See Philter's API documentation for details.
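As a rough sketch of what such a call might look like from Python, the example below posts plain text over HTTP and reads the filtered text from the response body. The URL (`https://localhost:8080/api/filter`), the `text/plain` content type, and the function names are assumptions for illustration; check the API documentation for your Philter version for the actual endpoint and parameters.

```python
import urllib.request

# Assumed values: adjust the host, port, and path to match your deployment
# and the API documentation for your Philter version.
PHILTER_URL = "https://localhost:8080/api/filter"


def build_filter_request(text: str, url: str = PHILTER_URL) -> urllib.request.Request:
    """Build a POST request that carries the input text as a plain-text body."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def send_for_filtering(text: str, url: str = PHILTER_URL) -> str:
    """Send the text to the service and return the response body as a string."""
    with urllib.request.urlopen(build_filter_request(text, url)) as response:
        return response.read().decode("utf-8")
```

Calling `send_for_filtering("His SSN is 123-45-6789.")` requires a running Philter instance at the assumed URL. If the instance uses a self-signed certificate, the default certificate verification in `urllib` will reject the connection until the certificate is trusted.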

What types of sensitive information does Philter support and can I customize the types?

The predefined types of sensitive information supported by Philter are PII and PHI identifiers like names, dates, email addresses, and social security numbers.

Yes, you can customize the types of sensitive information. For example, for a given use-case you may only be interested in removing names and phone numbers and not be concerned about email addresses.

You can also create new types of sensitive information through custom patterns and dictionaries. Dictionary types can be "fuzzy" to allow for misspellings through user-configurable sensitivity levels.
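To illustrate what a "fuzzy" dictionary match means, here is a minimal sketch using Python's standard `difflib` module. It is not how Philter implements fuzzy matching; the function name, the similarity measure, and the `threshold` parameter (standing in for a sensitivity level) are assumptions for this example.

```python
from difflib import SequenceMatcher


def find_dictionary_terms(text, dictionary, threshold=0.8):
    """Return (word, term) pairs where a word in the text resembles a dictionary term.

    A threshold of 1.0 requires an exact match; lower values tolerate
    more misspelling.
    """
    matches = []
    for word in text.split():
        # Strip surrounding punctuation and ignore case before comparing.
        cleaned = word.strip(".,;:!?").lower()
        for term in dictionary:
            similarity = SequenceMatcher(None, cleaned, term.lower()).ratio()
            if similarity >= threshold:
                matches.append((cleaned, term))
    return matches


# "metforman" is a misspelling of the dictionary term "metformin".
print(find_dictionary_terms("Patient takes metforman daily.", ["metformin"]))
```

Lowering the threshold catches more misspellings at the cost of more false positives, which is the trade-off a user-configurable sensitivity level exposes.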

Why would I use Philter instead of a list of regular expressions or other manual scripts?

It's a fair question. A list of regular expressions executed sequentially to find patterns in text is actually what led to the development of Philter. The list became long, convoluted with logic, and hard to manage. As the list grew, it failed to scale to multiple use-cases. More time was spent managing the list than actually using it.

Philter solves these problems by providing a centralized means of defining and executing the filters. Policies define the filters and the logic required to apply them. The policies are modular and can be interchanged based on the input text.
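The idea of modular, interchangeable policies can be sketched as a mapping from a policy name to the filter types it enables. The policy names, filter types, patterns, and placeholder format below are hypothetical and do not reflect Philter's actual policy format.

```python
import re

# Hypothetical filter library: type name -> pattern (illustrative only).
FILTER_LIBRARY = {
    "email-address": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone-number": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Hypothetical policies: each one lists the filter types it enables.
POLICIES = {
    "clinical-notes": ["ssn", "phone-number", "email-address"],
    "support-tickets": ["phone-number"],
}


def apply_policy(text: str, policy_name: str) -> str:
    """Redact only the types of information that the named policy enables."""
    for filter_type in POLICIES[policy_name]:
        pattern = FILTER_LIBRARY[filter_type]
        text = pattern.sub("[REDACTED-" + filter_type + "]", text)
    return text


message = "Call 555-123-4567 or mail a@b.org"
print(apply_policy(message, "support-tickets"))
print(apply_policy(message, "clinical-notes"))
```

Because the filters are defined once and the policies only reference them, supporting a new use-case means adding a policy rather than copying and editing another list of expressions.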

Philter's API provides a standard interface for filtering text and is consumable by virtually any programming or scripting language, making it easy to integrate Philter into any new or existing system.

Philter's ability to find people's names in text relies on state-of-the-art natural language processing techniques and technologies. The models employed by Philter were trained on text from various domains to improve Philter's performance across many use-cases. Regular expressions can't do that.

Last modified: 08 November 2023