Development, Staging, and Production

This post is the third in a series that explains how data scientists can turn their scripts into consumer-facing web applications. The intro post for this series explains the scope of the project and contains an index of other posts.

The code seen in this post can be found in the backend branch of the GitHub repo.

cd /to/desired/directory
git clone -b backend https://github.com/coshx/portfolio_optimizer.git
cd portfolio_optimizer

Problem, Goal, Solution

  • Problem: Our backend Tornado HTTP server only runs locally and is not publicly accessible.
  • Goal: Set up a decoupled system that allows access to our backend from any frontend framework.
  • Solution: Use a reverse proxy to handle requests to the backend.

The Architecture

In Part 0, we briefly discussed the server architecture seen below. In Part 2, we constructed a Tornado HTTP server, which acts as a data service. In this post, our goal is to implement an Nginx reverse proxy, which is the public-facing component of the data service. This architecture offers a few benefits. First, it allows connections from anywhere, including separately hosted frontend applications. It also allows us to spin up as many Tornado backends as we want and to use Nginx to load balance among them. Finally, the architecture does not limit our choice of frontend frameworks because all data access occurs over HTTP requests. Because I am using a Linode as a virtual private server, I will supplement general configuration instructions with Linode-specific ones.

Tornado from Development to Production

At the end of Part 2, we were able to run our Tornado HTTP server locally using python -m backend.app --port=XXXX. In production, we would like to start up multiple Tornado HTTP servers, each on a different port. To manage these separate processes, we will use Supervisor. Then, to provide a single point of access and load balancing to these multiple processes, we can use Nginx.
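For orientation, here is a minimal sketch of how backend/app.py wires the --port option through tornado.options. It is abbreviated from Part 2 (the real MainHandler also handles the POSTed portfolio request), so treat it as a reference point rather than the exact file in the repo.

# backend/app.py (abbreviated sketch of the Part 2 server)
import tornado.ioloop
import tornado.options
import tornado.web
from tornado.options import define, options

define("port", default=8000, type=int, help="port for this Tornado instance")

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Success!")  # the "Success!" page referenced later in this post

def make_app():
    tornado.options.parse_command_line()  # reads --port from the command line
    return tornado.web.Application([
        (r"/", MainHandler),
    ])

if __name__ == "__main__":
    app = make_app()
    app.listen(options.port)  # Supervisor will pass a different --port to each process
    tornado.ioloop.IOLoop.current().start()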

Supervisor

We will largely follow this configuration to set up Supervisor for managing our Tornado HTTP servers.

First, install Supervisor. Supervisor will not run on Windows systems. On Debian-based systems (like Ubuntu) Supervisor can be installed using apt-get. Python’s easy_install can be used on other systems. More details on installation can be found in the documentation.

apt-get install supervisor  # Debian-based install
easy_install supervisor  # other systems

Next, we need to configure Supervisor to know about and handle our Tornado HTTP servers. Supervisor’s configuration file could be in one of many locations based on your system and install method. If using apt-get on Ubuntu, it is located at /etc/supervisor/supervisord.conf. We want to append the following configuration to the end of the .conf file.

; name the program stocks-backend
[program:stocks-backend]
; use the process_num variable to give each process a unique name
process_name=STOCKS-BACKEND%(process_num)s
; the directory the command is run from
directory=/var/www/portfolio_optimizer/
; start the app with the Python interpreter from the stocks conda env
command=</path/to/'which python'> -m backend.app --port=%(process_num)s
; minimum runtime (in secs) to be considered successfully started
startsecs=2
user=<yourusername>
stdout_logfile=/var/log/myapp/out-%(process_num)s.log
stderr_logfile=/var/log/myapp/err-%(process_num)s.log
; number of processes
numprocs=2
; start of the process_num range, which doubles as the start of the port range
numprocs_start=8001

Note: Replace </path/to/'which python'> with the result of running which python while inside the stocks conda environment, and replace <yourusername> with your actual username. We set each Tornado app's port by passing the %(process_num)s variable through the command: Supervisor creates numprocs processes, numbering them starting at numprocs_start, so the configuration above launches two servers listening on ports 8001 and 8002.

Now, we can load our Supervisor configuration and tell Supervisor to spin up our Tornado HTTP servers. You will need sudo privileges for this. A good resource for this process can be found here.

sudo supervisord
sudo service supervisor status  # check that supervisord is running
sudo supervisorctl update  # apply the new configuration
sudo supervisorctl status  # verify that we are up and running

Supervisor will ensure that each Tornado HTTP server is up and running at all times. If one of the servers goes down, Supervisor will bring it back up again. The guide linked above also has directions for adding an init script that starts supervisord automatically on boot.

Nginx as a Reverse Proxy for Tornado

Now that we have our supervised Tornado servers running, we could begin developing our frontend application and point directly at our data service using the ports we specified above. However, we can construct a more scalable, load-balanced solution by putting Nginx in front of our data services as a reverse proxy that sits between the clients making requests and our Tornado servers.

Configuring Nginx

First, install Nginx. Then, get a handle on general Nginx configuration. Below I show only the parts most relevant to our setup: the upstream, server, and location blocks.

http {
    ##
    # Set localhost ports 8001 and 8002 as an upstream group called tornado-backend
    ##
    upstream tornado-backend {
        server 127.0.0.1:8001 max_fails=3 fail_timeout=3s;
        server 127.0.0.1:8002 max_fails=3 fail_timeout=3s;
    }

    server {
        listen 80;
        server_name stocks.yourdomain.com;
        access_log /var/log/nginx/stocks.yourdomain.com.access.log;
        error_log /var/log/nginx/stocks.yourdomain.com.error.log;

        ##
        # Set stocks.yourdomain.com/backend to reference upstream group
        ##
        location /backend {
            proxy_set_header  Host                     $host;
            proxy_set_header  X-Real-IP                $remote_addr;
            proxy_pass        http://tornado-backend;
        }
    }
}

Note: We use localhost (127.0.0.1) as the upstream addresses rather than our server’s public IP address, since Nginx and the Tornado servers run on the same machine. Also note that the location block passes the original requester’s IP address to the Tornado server via the X-Real-IP header. Finally, proxy_pass references tornado-backend, the name given to our group of upstream Tornado servers.
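If the backend ever needs the original client’s address (say, for logging or per-IP limits), a handler can read that header. The snippet below is an illustrative sketch rather than code from the repo:

import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        # Behind the proxy, self.request.remote_ip is Nginx itself (127.0.0.1);
        # the X-Real-IP header set in the location block carries the client's address.
        client_ip = self.request.headers.get("X-Real-IP", self.request.remote_ip)
        self.write("Success! (request from %s)" % client_ip)

Tornado can also be told to trust proxy headers like this one by passing xheaders=True to listen(), in which case remote_ip itself reports the original client.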

Updating Tornado Route Handlers

Given the above configuration, stocks.yourdomain.com will be listening on port 80. In order to hit our proxied and load balanced Tornado backends, we need to hit stocks.yourdomain.com/backend. That means we need a new handler for /backend in our backend’s app.py.

def make_app():
    tornado.options.parse_command_line()
    return tornado.web.Application([
        (r"/", MainHandler),         # direct access, e.g. during local development
        (r"/backend", MainHandler)   # path forwarded by the Nginx location block
    ])

Now, navigating to stocks.yourdomain.com/backend should return the Tornado server’s “Success!” page. We can also test that the data service works by using a cURL request.

curl -H "Content-Type: application /json" -X POST -d '{"symbols": ["AAPL", "GOOG", "FB"], "start_date": "01-01-12", "end_date": "03-20-16", "principle": 1000.00}' http://stocks.coshx.com/backend

You might consider limiting the traffic to your backend service to only those requests emanating from your frontend application. Similarly, configuring the Tornado HTTP servers to only accept requests from the Nginx reverse proxy can limit your system’s exposure to folks with bad intentions.
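One way to approach the latter is to bind each Tornado server to the loopback interface so the raw ports are unreachable from outside the machine. The snippet below is a sketch against the app.py structure shown earlier, not something the repo does out of the box:

# backend/app.py (__main__ block), binding to localhost only
if __name__ == "__main__":
    app = make_app()
    # Only processes on this machine (i.e., the Nginx reverse proxy) can reach
    # these ports; outside traffic has to come through Nginx on port 80.
    app.listen(options.port, address="127.0.0.1")
    tornado.ioloop.IOLoop.current().start()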

In the next post, we will begin our look at Angular 2 and TypeScript, focusing on strategies to get off the ground quickly.