August 15, 2016
In our previous article, we introduced HIPAA and PHI and gave an overview of their implications for implementing software solutions. Now we'll look more specifically at what this means for caching PHI in the Redis in-memory data store. Given the compliance requirements discussed earlier, implementing a caching solution that satisfies them may not be straightforward: Redis does not include a “secure” configuration out of the box.
The solution that follows (inspired by an article from Benjamin Cane, and adjusted to meet our needs) was built on behalf of a healthcare client in need of a HIPAA-compliant caching solution deployed to a public cloud infrastructure. Their application had grown to the point where caching was required, and no off-the-shelf solution was available. We'll start by documenting what needs to be accomplished.
Here’s what we require in order to ensure HIPAA compliance of our cache that contains PHI:
We don’t need to account for the following:
The solution we’re describing has been implemented on an AWS EC2 instance running Ubuntu 14.04 within a VPC with dedicated tenancy. If you’re deploying to a different Linux version, adjust these steps as appropriate.
The first line of defense is the network. When using redis as an application cache, you must ensure that the outside world can't access the instance. In our Virtual Private Cloud, cache instances run in their own private subnet, accessible only from the DMZ and the bastion host. In AWS, this can be accomplished via a network Access Control List configured to allow access to the caching subnet only from those two CIDR ranges, and a Security Group that denies all traffic except the following:
Allow application hosts (or their CIDR block) to access port 6379 (redis).
Allow bastion hosts to access port 22 (ssh).
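In AWS, the two Security Group rules above could be created with the AWS CLI's `ec2 authorize-security-group-ingress` command. The sketch below is one way to do it, not part of the original setup: the group ID and CIDR blocks are hypothetical placeholders, and `DRY_RUN=1` prints the commands for review instead of running them.

```shell
#!/bin/sh
# Hypothetical values: substitute your own Security Group ID and CIDR blocks.
SG_ID="${SG_ID:-sg-0123456789abcdef0}"
APP_CIDR="${APP_CIDR:-10.0.1.0/24}"        # application hosts
BASTION_CIDR="${BASTION_CIDR:-10.0.0.0/28}" # bastion host(s)

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Security groups deny inbound traffic that no rule allows,
# so these two ingress rules are the only openings on the cache host.
open_cache_ports() {
  # Application hosts reach stunnel (and, through it, redis) on 6379.
  run aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
    --protocol tcp --port 6379 --cidr "$APP_CIDR"
  # Bastion hosts reach sshd on 22.
  run aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
    --protocol tcp --port 22 --cidr "$BASTION_CIDR"
}
```

Running `DRY_RUN=1 open_cache_ports` first shows exactly what will be applied; `open_cache_ports` on its own then creates the rules.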
Also, recall that HIPAA requirements dictate that each person signing into a host should do so using their own (auditable) credentials. In general, HIPAA compliance is best served by using thorough network and host security practices.
We'll set up the server first: install the appropriate packages, configure redis, generate an x509 certificate, and configure stunnel.
As an administrative user (i.e., with sudo access), install the redis-server and stunnel4 packages on the redis host:
apt-get install redis-server stunnel4
update-rc.d redis-server enable
If openssl isn't already present on the system, it should be pulled in by stunnel4. That's it.
After installing the redis package, you'll need to edit its config file, which can usually be found at /etc/redis/redis.conf. In particular, we will add a passphrase, disable writes to disk, and rename commands that pose security risks. In the persistence section of redis.conf, comment out (or remove) all the lines starting with save, or replace them with:
save ""
We also want to disable problematic commands. Include the following, choosing a new, hard-to-guess name for the CONFIG command:
rename-command CONFIG <NEW_CONFIG_NAME>
rename-command SAVE ""
rename-command BGSAVE ""
rename-command BGREWRITEAOF ""
rename-command DEBUG ""
You can disable as many of redis's standard commands as you like. That's a basic list, but other commands are worth considering (SHUTDOWN, FLUSHDB, and FLUSHALL come immediately to mind).
Finally, we want to ensure that applications (and users) connecting to redis must pass an additional layer of authentication. Add the following to redis.conf, substituting a strong passphrase:
requirepass <YOUR_PASSPHRASE>
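Both the renamed CONFIG command and the redis password should be hard to guess. One way to produce such values, assuming openssl is available (it is already needed for the certificate), is `openssl rand`; the helper names below are hypothetical.

```shell
#!/bin/sh
# 32 random bytes, hex-encoded: 64 characters from [0-9a-f], safe to use unquoted in redis.conf.
new_secret() {
  openssl rand -hex 32
}

# Print candidate values for the two placeholders in redis.conf.
print_redis_secrets() {
  echo "rename-command CONFIG $(new_secret)"
  echo "requirepass $(new_secret)"
}
```

Paste the output over the placeholder lines rather than appending it: a second `rename-command CONFIG` line would fail, because CONFIG no longer exists once the first rename has taken effect. Store the requirepass value wherever your application reads its configuration, since it needs it to authenticate.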
By default, redis listens on port 6379 on localhost. Review redis.conf to confirm this (look for bind 127.0.0.1 and port 6379). Applications should not be able to connect from outside the host without going through stunnel4.
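Because a single stray directive can undo this lockdown, it can help to spot-check redis.conf after editing it. This is a minimal sketch, assuming one directive per line as in the stock Ubuntu file; `audit_redis_conf` is a hypothetical helper, and it also checks that append-only persistence hasn't been switched on, since that would write data to disk as well.

```shell
#!/bin/sh
# Spot-check a redis.conf for the hardening settings described above.
# Prints one FAIL line per problem; returns 0 only if everything passes.
audit_redis_conf() {
  conf="${1:-/etc/redis/redis.conf}"
  status=0
  # An active "save <seconds> <changes>" line re-enables RDB snapshots to disk.
  if grep -Eq '^[[:space:]]*save[[:space:]]+[0-9]' "$conf"; then
    echo "FAIL: snapshot 'save' lines are still active"; status=1
  fi
  # AOF persistence would also write cached data to disk.
  if grep -Eqi '^[[:space:]]*appendonly[[:space:]]+yes' "$conf"; then
    echo "FAIL: appendonly is enabled"; status=1
  fi
  for cmd in CONFIG SAVE BGSAVE BGREWRITEAOF DEBUG; do
    if ! grep -Eqi "^[[:space:]]*rename-command[[:space:]]+$cmd[[:space:]]" "$conf"; then
      echo "FAIL: $cmd is not renamed or disabled"; status=1
    fi
  done
  if ! grep -Eq '^[[:space:]]*requirepass[[:space:]]+[^[:space:]]' "$conf"; then
    echo "FAIL: requirepass is not set"; status=1
  fi
  if ! grep -Eq '^[[:space:]]*bind[[:space:]]+127\.0\.0\.1([[:space:]]|$)' "$conf"; then
    echo "FAIL: redis is not bound to 127.0.0.1"; status=1
  fi
  return $status
}
```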
Before we can configure stunnel4, however, we need an x509 certificate we can share between the server and any clients.
An x509 certificate can be purchased or generated on a host you control, as long as you have access to openssl. First you’ll generate a key, then a new certificate from that key:
openssl genrsa -out /etc/stunnel/key.pem 4096
openssl req -new -x509 -key /etc/stunnel/key.pem -out /etc/stunnel/cert.pem -days 1826
The -days flag specifies the number of days the certificate will remain valid; 5 years seems like a reasonable duration. You’ll also need to fill in a number of fields to complete the certificate. When complete, merge the key and certificate into a single .pem file, and change the permissions:
cat /etc/stunnel/key.pem /etc/stunnel/cert.pem > /etc/stunnel/private.pem
chmod 640 /etc/stunnel/key.pem /etc/stunnel/cert.pem /etc/stunnel/private.pem
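The `openssl req` step prompts for the certificate fields interactively; when scripting the setup, `-subj` supplies them on the command line instead. A sketch wrapping the commands above in a function, assuming a hypothetical subject of `/CN=redis-cache` and taking the target directory as an argument:

```shell
#!/bin/sh
# Non-interactive version of the key/certificate steps above.
# The subshell body keeps the umask change local to this function.
make_stunnel_cert() (
  dir="${1:-/etc/stunnel}"
  subject="${2:-/CN=redis-cache}"   # hypothetical subject; use your own
  umask 077                         # new files start out readable by the owner only
  # With no separate key option, stunnel reads the private key from its cert
  # file, hence the combined private.pem. Each step runs only if the previous one succeeded.
  openssl genrsa -out "$dir/key.pem" 4096 &&
  openssl req -new -x509 -key "$dir/key.pem" -out "$dir/cert.pem" \
    -days 1826 -subj "$subject" &&
  cat "$dir/key.pem" "$dir/cert.pem" > "$dir/private.pem" &&
  chmod 640 "$dir/key.pem" "$dir/cert.pem" "$dir/private.pem"
)
```

Run as root against /etc/stunnel, the files end up owned by root, so mode 640 keeps them out of reach of unprivileged application users.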
This file will need to be present on the server (we’ll assume that the server is where it was generated), and copied to all clients that intend to connect.
Having generated the certificate, we’re ready to configure stunnel.
Configuring stunnel requires enabling the service and creating a tunnel file; each tunnel runs in its own process and connects to a single IP address and port. We’ll be setting up a redis-specific configuration.
In /etc/default/stunnel4, enable the service by setting ENABLED=1. When enabled, you can create a tunnel configuration in the /etc/stunnel directory. For redis, create /etc/stunnel/redis-server.conf:
cert = /etc/stunnel/private.pem
pid = /var/run/stunnel.pid
[redis]
accept = external-host-ip:6379
connect = 127.0.0.1:6379
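As written, this tunnel encrypts traffic, but stunnel does not check client certificates unless a verify level is set, so any TLS client that can reach the port could open a connection and would then face only the redis password. The sketch below writes a server tunnel file that also requires the shared certificate. It assumes stunnel 4.x's `verify` and `CAfile` options, where level 3 accepts only peers whose certificate is installed locally (here, in CAfile). `write_server_tunnel` is a hypothetical helper.

```shell
#!/bin/sh
# Write the server-side tunnel file with client-certificate checking enabled.
# $1: stunnel directory (normally /etc/stunnel); $2: the redis host's private IP.
write_server_tunnel() {
  dir="$1"
  external_ip="$2"
  cat > "$dir/redis-server.conf" <<EOF
cert = $dir/private.pem
pid = /var/run/stunnel.pid
; Assumption: verify = 3 refuses peers whose certificate is not in CAfile.
verify = 3
CAfile = $dir/cert.pem

[redis]
accept = $external_ip:6379
connect = 127.0.0.1:6379
EOF
}
```

Adding the same `verify` and `CAfile` lines to the client's tunnel file would let the client check the server in turn, though that requires copying cert.pem to the client alongside private.pem.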
That’s it! Restart the redis and stunnel services, and move on to the client.
Having configured the server, you’ve already taken most of the steps you’ll need for a client configuration. You’ll need to repeat these steps on each client intending to access redis.
The stunnel service runs transparently: as long as it is configured and running, redis will appear to any client to be running locally, accessible at 127.0.0.1 on port 6379. We've offloaded securing the connection from the client application to a system service for a couple of reasons.
First, stunnel is fast, and will likely encrypt and decrypt the connection to redis faster than a client library would: that's the only thing it's built for.
Second, it allows us to create a wall between the application and the credentials necessary to access our redis cache. While the application will need to know the password we chose earlier, it won’t be able to view the x509 certificate. As long as your application is running as a user without administrative privileges, you’ll be able to sleep a little better at night.
We’ll need to install and configure stunnel4 on the client as well as the server; optionally, you may install redis-tools if you want to test the connection from the client before you trust your application with it.
apt-get install stunnel4 redis-tools   # omit redis-tools if you don't need it
Before we can configure stunnel, we need a copy of the x509 certificate generated on the server. For simplicity, copy private.pem to the same location on the client, /etc/stunnel/private.pem, and give it the same 640 permissions.
As before, we'll need to set ENABLED=1 in /etc/default/stunnel4 and create a tunnel file in /etc/stunnel. On the client, a couple of things are switched: client = yes is added, and the accept and connect addresses are swapped. In /etc/stunnel/redis-client.conf:
cert = /etc/stunnel/private.pem
client = yes
pid = /var/run/stunnel.pid
[redis]
accept = 127.0.0.1:6379
connect = ip-of-redis-server:6379
Restart the stunnel service, and you’re good to go. Yes, that’s all there is to it.
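If you installed redis-tools on the client, a quick end-to-end check through the tunnel can be scripted. This sketch assumes REDIS_PASSWORD holds the requirepass value and that the client tunnel listens on 127.0.0.1:6379 as configured above; `check_tunnel` and the `REDIS_CLI` override are hypothetical. AUTH is sent on stdin because `redis-cli -a` would expose the password in the process list.

```shell
#!/bin/sh
# Confirm that the local stunnel endpoint reaches a password-protected redis.
# REDIS_CLI (hypothetical knob) can point at a different redis-cli command.
check_tunnel() {
  host="${1:-127.0.0.1}"
  port="${2:-6379}"
  if [ -z "${REDIS_PASSWORD:-}" ]; then
    echo "REDIS_PASSWORD is not set" >&2
    return 2
  fi
  # Feed AUTH and PING on stdin so the password never appears on the command line.
  reply=$(printf 'AUTH %s\nPING\n' "$REDIS_PASSWORD" |
    "${REDIS_CLI:-redis-cli}" -h "$host" -p "$port" 2>&1)
  # Require the AUTH reply (OK) before PONG, so a server without requirepass fails.
  case "$reply" in
    OK*PONG*) echo "tunnel OK: $host:$port" ;;
    *) echo "tunnel check failed: $reply" >&2; return 1 ;;
  esac
}
```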
Let’s talk about the questions that are likely on your mind before setting this up, namely: security in practice, trade-offs, scalability, and automation.
Beginning with the understanding that there is no such thing as perfect security, this system provides a solid defense, with a few conditions: it only works if you're a responsible administrator and you've developed a secure application.
As we've defined it above, to compromise redis an attacker must have access to the x509 certificate to make the connection (provided stunnel is set to verify client certificates), and must know the redis password defined in its config file. Both are easy to obtain if an attacker can compromise the server or client hosts.
By keeping security patches current on the host, limiting access to the host via the network and firewall rules, and limiting the number of people able to sign into a user account on the host, your bases are largely covered. Remain vigilant and responsible.
Likewise, if an attacker can compromise a client application that connects to redis, the information in the cache is compromised, and they'll likely have access to other data sources as well. Just as you need to remain vigilant to protect your network and hosts, use good security practices and tools when developing or integrating the client software.
Remember, however, that our goal is not to create a perfectly impregnable environment, but to ensure that we're meeting HIPAA compliance recommendations and providing our users and customers with as much risk mitigation as we can, subject to our chosen trade-offs. We're also scoping this to the connection between a client application and a redis datastore running remotely. From that point of view, this arrangement is sufficiently secure.
All technical decisions have trade-offs. In this case, we’re trading a certain amount of performance (sending and receiving data via stunnel carries an execution cost) for the ability to run a cache remotely, likely because it needs to be accessible from a number of application hosts.
We’re also creating a certain amount of complexity: an additional service must be configured and running on both client and server in order for the system to communicate. This is exactly the kind of thing one might neglect to implement in a test environment, so there is the potential for differences in performance and function between test and production. This risk can be mitigated by ensuring the test environment and the production environment match configurations.
Finally, because we're choosing a simple method and a single redis server, our ability to scale our caching layer is limited. This is perhaps the greatest weakness of this system; however, for most applications, by the time you need automatically scaling distributed caching, you'll need to re-evaluate your overall caching strategy anyway. This solution is most appropriate for applications that have grown enough to need a caching layer but have not yet undergone major performance optimization.
In our first article in this series, we provided an overview of HIPAA compliance and PHI. Here, we've provided a straightforward lockdown of a Redis instance that satisfies the HIPAA compliance requirements outlined above. While we didn't address documentation requirements or personnel responsibilities in this article, they will mirror those present in the rest of your application environment.
In this article, we configured the system manually: there is value in understanding every step needed to make this (relatively simple) lockdown a reality. In Levvel's practice, we try to automate provisioning whenever possible, and we're firm believers in open-source solutions. We'll be following up this article with two additional libraries, a Chef cookbook and a Puppet module, each of which implements the solution described above for easy integration into your application infrastructure.
As in all cases of implementation, there are details that will vary between our system and your setup, and it should be easy to adjust accordingly. We hope you’ve gained another technique for building secure and performant software. Please feel free to contact us with any questions.