Containerizing A Modern Application, Part 2: Your Application
In our last post, we explored the process of selecting an upstream container and preparing a version of that image for our application. In this post, we’ll explore making the container fully our own by putting our source code into the container and starting our main process.
Step 3: Acquire Your Code
At this point, the container doesn’t know anything about your application, but your preparations allow you to leverage your working familiarity with how to run your code. For a Ruby on Rails application, assuming your gem bundle is up to date, running ‘rails s’ starts the service you’re writing. Node applications are very similar. For PHP, files are read and interpreted at request time, so all you need is a web server (such as Apache) configured to run PHP. For a Java web application, even though you might use your integrated development environment’s (IDE) special capabilities for more flexibility while you develop, there’s an application server (e.g. Tomcat) where you drop your WAR files, and the server takes it from there. Most other programming languages have a story similar to the ones above, and they can all vary by individual circumstances. Sometimes your code needs to be packaged and sometimes it doesn’t; sometimes it needs to run its own special process and sometimes your code will run alongside the web server itself.
In many cases, the Dockerfile we’ve been working on lives alongside the codebase in the application repository. This is a natural placement for the Dockerfile for many reasons—beyond just the convenience of having your application code immediately available.
One reason is that the ultimate goal of the Dockerfile is to represent all the engineering tasks required to prepare a host operating system that will run your application. As time passes, it’s common for applications to add features that require additional libraries or dependencies. Keeping the Dockerfile in the repository allows for tracking necessary changes to the operating system alongside changes to your codebase and dependencies.
Another reason is that the directory from which you issue the command to build a Docker container has important implications for the build cache, a key architectural concept in container construction. We’re going to explore this concept more in Step 4 as we show how this sort of preparation is accomplished.
For those cases where the application server works with the files in place, we’ll be copying them into the container file system from our own using a COPY command. You can use a simple COPY for now, and we’ll be exploring a potential improvement in our next section.
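As a sketch, assuming your Dockerfile sets a working directory of /app and you build from the repository root (both are assumptions; match them to your own image), that copy might look like:

```dockerfile
# Copy the entire build context (here, the repository root)
# into the container's /app directory.
COPY . /app/
```

Because COPY paths are resolved relative to the build context, this is one reason the directory you build from matters.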
For the purposes of this post, we’re going to ask those of you whose code is packaged for runtime to run that process yourself. (If you have a CI/CD pipeline that produces WARs for you, so much the better!) Download the results and use a similar COPY command to the one below to place your WAR into the proper location (/usr/local/tomcat/webapps for the official containers).
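For instance, assuming your build leaves the packaged artifact at target/myapp.war (a hypothetical path and name; substitute your own), the placement might look like:

```dockerfile
# Deploy the WAR into Tomcat's webapps directory; the official
# tomcat images auto-deploy applications found there on startup.
COPY target/myapp.war /usr/local/tomcat/webapps/
```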
Step 4: Try to Run it Until it Runs
Let’s examine a section of the example Dockerfile we referenced in our last post. For those of you who aren’t familiar at all with Ruby, the bundle command uses a Gemfile and a Gemfile.lock to resolve, download, and make library dependencies available to the application, somewhat similar to npm.
```dockerfile
RUN mkdir -p /app
ENV PORT 9292
ENV DATABASE_URL postgres://postgres@citygram_db/citygram_development
COPY docker/citygram/env.docker /app/.env
COPY Gemfile Gemfile.lock /app/
WORKDIR /app
RUN bundle install
COPY . /app/
CMD bundle exec rackup -o 0.0.0.0 -p $PORT
```
These commands set up a working directory for us to use, and then set some default environment variables. The final CMD command indicates the purpose of this container: running an application server. When we issue a docker run command using this container image, that process will start. Our goal is for it to start successfully. The middle commands in the Dockerfile help to illustrate some of the powerful (and bizarre) concepts underpinning containerized operations.
When executing the bundle command for the first time for a particular project on a machine, the command will typically complete in approximately 5 minutes as it processes which dependencies you need and downloads them from the network. (You can imagine the same sort of delay coming from installing operating system dependencies with an apt-get or yum command.) As part of the initial process of trying to make sure your application runs inside the container, you might expect to make dozens of attempts. Despite the undeniable allure of sword fighting on chairs, you don’t really want to sit through minutes of idle time each iteration. Understanding and using the build cache appropriately can prevent that.
We mentioned the layered filesystem of containers in our last post, and it explains the improved behavior we observe from the above two-phase copy. Generally, the software development process changes the source files native to the application more often than it changes the libraries it uses. Therefore, in many cases the Gemfile and Gemfile.lock files won’t have any changes at all to support a new feature branch. Ideally, we’d like to be able to use the results of previous times that command had been issued to download and install gems. In fact, that’s exactly what the Docker build cache delivers.
You can think of the layered file system as a construct similar to source control for your code, except that the domain of control for a Dockerfile is the environment and filesystem of an operating system. Beginning from a known state, if an identical command has been run before and this resulted in a new filesystem layer, the build process will use that layer immediately rather than recomputing the output of the process. Imagine we’ve worked on a branch where no changes to the Gemfile or Gemfile.lock were necessary. When we copy the Gemfile and Gemfile.lock into our working directory separately, and ahead of the balance of the application code, the container build cache still recognizes no state change. As a result, bundle installation will proceed immediately with a message like this:
```
Step 13/15 : RUN bundle install
 ---> Using cache
 ---> 458247338f8c
```
Of course, this analogy isn’t perfect. With no other activity but the passage of time, the results of a particular operating system command might still change. The default invocations of apt-get and yum, for example, don’t have the same version locking mechanisms we associate with language-specific dependency managers. For this reason, among others, you can issue a docker build --no-cache command to ignore the cache and rebuild from scratch. It’s also somewhat common, particularly in environments where the CI/CD process is responsible for distributing built containers, for container authors to include a CACHED_ON timestamp or similar in their base image. This allows them to trigger a rebuild of downstream containers across the board without requiring any content to be changed inside downstream Dockerfiles.
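A minimal sketch of that base-image pattern (the CACHED_ON name and the date are illustrative conventions, not a Docker feature):

```dockerfile
# In the shared base image. Bumping this value changes the layer,
# which invalidates the build cache for every image built FROM
# this one, even when the downstream Dockerfiles are unchanged.
ENV CACHED_ON 2017-06-01
```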
With that understanding to help you tighten your iteration cycle, it’s time to proceed with your feedback loop. In general, you’ll be building the container and then trying to run it. Here is an example command pair:
```shell
docker build --rm -f docker/citygram/Dockerfile -t my/citygram .
docker run my/citygram
```
If the docker run command gives you an error, adjust your Dockerfile, your application code, or both, and try again. This step is the hardest to speak generally about, since so many applications have unique peculiarities in function and configuration, but keep at it and you’ll get something running! For example, it’s at this stage that you’ll need to ensure the FROM line in your Dockerfile uses the correct JRE (e.g. tomcat:8.0-jre8) if you want to use classes compiled by Java 8 and have the Tomcat process start successfully. Our Ruby application will probably need to define a specific Ruby version. Requirements of this type are not always in the project documentation, and the process of containerization helps make them explicit. That’s a good thing!
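To make that concrete, this kind of pinning happens on the FROM line. A sketch for the Java case (the Ruby case is analogous, with an image tag such as ruby:2.3 chosen to match your application’s requirement):

```dockerfile
# Base image pinned to Tomcat 8.0 running on a Java 8 JRE,
# matching the Java version used to compile the WAR.
FROM tomcat:8.0-jre8
```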
At this step, your container process can run without crashing, and that’s a significant achievement. However, to function properly, most web applications need supporting services (like a database) and the correct configurations for their use. Those are the topics we’ll cover in our next post.