Airlock User's Guide Help

What is Airlock?

Airlock is an application that finds and removes sensitive information, such as protected health information (PHI) and personally identifiable information (PII), from natural language text in text files or PDF documents. The types of sensitive information that Airlock identifies are configurable, and you can add custom types specific to your domain and use case.

Given text as input, Airlock applies a sequence of filters to find and remove the desired sensitive information, then returns the filtered text. Airlock was designed with simplicity in mind so that it is easy to integrate into existing systems.

Airlock is ideal for text processing pipelines in which sensitive information needs to be removed or redacted from text. Airlock runs in your cloud and is available on the AWS, Azure, and GCP cloud marketplaces for easy deployment into virtual private clouds. Airlock supports AWS GovCloud.

How does Airlock work?

At a high level, when you send text to Airlock, Airlock looks for sensitive information in the text, manipulates the sensitive information based on how Airlock is configured, and returns the filtered (redacted) text.
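The find, manipulate, and return flow described above can be illustrated with a minimal Python sketch. This is not Airlock's implementation: the two regular-expression filters, the `{{TYPE}}` placeholder format, and the `filter_text` function name are all assumptions made for illustration only.

```python
import re

# Hypothetical filters for illustration only. Airlock's real filters and
# replacement formats are defined by its configuration, not by this sketch.
FILTERS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
]


def filter_text(text: str) -> str:
    """Apply each filter in order, replacing every match with a placeholder."""
    for label, pattern in FILTERS:
        text = pattern.sub("{{" + label + "}}", text)
    return text


print(filter_text("Contact jane@example.com, SSN 123-45-6789."))
# prints: Contact {{EMAIL}}, SSN {{SSN}}.
```

Replacing a match with a placeholder is only one possible manipulation; how sensitive information is handled depends on how Airlock is configured.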

Where can I run Airlock?

Anywhere! Airlock is not constrained to any cloud provider or on-premises environment. You can run Airlock in AWS, Azure, GCP, or any other cloud provider. You can also run Airlock in a Kubernetes cluster or on bare metal.

How do I send text to Airlock?

You send text to Airlock through its API, and all interactions with Airlock go through that API. The API has a method that accepts text as input and returns the filtered text.
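As a rough sketch of what calling such a method over HTTP might look like from Python, the example below builds a POST request that carries the text to be filtered. The endpoint path `/api/filter`, the `text/plain` content type, the host name, and both function names are assumptions, not details taken from this guide; consult Airlock's API documentation for the actual method, path, and parameters.

```python
import urllib.request

# Hypothetical endpoint path; check Airlock's API documentation for the real one.
FILTER_PATH = "/api/filter"


def build_filter_request(base_url: str, text: str) -> urllib.request.Request:
    """Build a POST request whose body is the text to be filtered."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + FILTER_PATH,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def filter_text(base_url: str, text: str) -> str:
    """Send the text to a running Airlock instance and return the response body."""
    with urllib.request.urlopen(build_filter_request(base_url, text)) as response:
        return response.read().decode("utf-8")


# Building the request does not contact any server.
request = build_filter_request("https://airlock.example.com", "His SSN is 123-45-6789.")
print(request.get_method(), request.full_url)
```

Calling `filter_text` requires a reachable Airlock instance, so it is defined here but not invoked.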

What types of sensitive information does Airlock support and can I customize the types?

The predefined types of sensitive information supported by Airlock are PII and PHI identifiers like names, dates, email addresses, and social security numbers.

Yes, you can customize the types of sensitive information. For example, for a given use case you might only be interested in removing names and phone numbers and not be concerned about email addresses.

You can also create new types of sensitive information through custom patterns and dictionaries. Dictionary types can be "fuzzy" to allow for misspellings through user-configurable sensitivity levels.
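To make the idea of a "fuzzy" dictionary concrete, here is a minimal Python sketch that redacts words approximately matching a custom dictionary. It uses the standard library's `difflib` similarity ratio as a stand-in; Airlock's actual matching algorithm and sensitivity settings are not described in this guide. The dictionary entries, the `cutoff` parameter, and the `{{TERM}}` placeholder are hypothetical.

```python
import difflib
import re

# Hypothetical custom dictionary of terms to treat as sensitive.
DICTIONARY = ["metformin", "lisinopril"]


def redact_terms(text: str, cutoff: float = 0.8) -> str:
    """Replace words that approximately match a dictionary entry.

    `cutoff` is a similarity threshold between 0 and 1. Lower values
    tolerate more misspelling; 1.0 requires an exact match.
    """
    def replace(match: re.Match) -> str:
        word = match.group(0)
        hits = difflib.get_close_matches(word.lower(), DICTIONARY, n=1, cutoff=cutoff)
        return "{{TERM}}" if hits else word

    return re.sub(r"[A-Za-z]+", replace, text)


print(redact_terms("Patient takes metformn daily."))
# prints: Patient takes {{TERM}} daily.
```

With `cutoff=1.0` the misspelled "metformn" is left untouched, which shows the trade-off a sensitivity level controls: a looser threshold catches more misspellings but also risks redacting unrelated words.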

Why would I use Airlock instead of a list of regular expressions or other manual scripts?

Great question, and a fair one. A list of regular expressions executed sequentially to find patterns in text is actually what led to the development of Airlock. The list became long, convoluted with logic, and hard to manage. As the list grew, it failed to scale to support multiple use cases. More time was spent managing the list than actually using it.

Airlock solves these problems by providing a centralized means of defining and executing the filters. Policies define the filters and the logic required to apply them. The policies are modular and can be interchanged based on the input text.
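The sketch below illustrates, in Python, how a policy can name the filters to apply so that different inputs use different policies without changing the filtering code. The policy names, filter names, patterns, and in-memory layout are hypothetical and do not reflect Airlock's actual policy format.

```python
import re

# Hypothetical library of available filters, defined once in a central place.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

# Hypothetical policies: each one lists the filters to apply, in order.
POLICIES = {
    "support-tickets": ["email", "phone"],
    "call-notes": ["phone"],
}


def apply_policy(policy_name: str, text: str) -> str:
    """Apply only the filters named by the selected policy."""
    for filter_name in POLICIES[policy_name]:
        placeholder = "{{" + filter_name.upper() + "}}"
        text = PATTERNS[filter_name].sub(placeholder, text)
    return text


text = "Call 555-123-4567 or mail bob@example.org"
print(apply_policy("call-notes", text))
# prints: Call {{PHONE}} or mail bob@example.org
print(apply_policy("support-tickets", text))
# prints: Call {{PHONE}} or mail {{EMAIL}}
```

Because the policy is data rather than code, supporting a new use case means adding a policy entry instead of editing a growing list of expressions.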

Airlock's API provides a standard interface for filtering text and can be consumed from virtually any programming or scripting language, making it easy to integrate Airlock into any new or existing system.

Airlock's capability to find person names in text uses state-of-the-art natural language processing techniques and technologies. The models employed by Airlock were trained on text from various domains to improve Airlock's performance across many use cases. Regular expressions can't do that.

Last modified: 17 November 2023