Engineering

SOA on Rails Pt. 1: Intro

“When you hit the Amazon.com gateway, the application calls more than one hundred services to collect data and construct the page for you.” – Werner Vogels, Amazon CTO

Amazon’s Mandate

About 15 years ago, back when Amazon was still just an online bookseller, Jeff Bezos, Founder and CEO, gave every team at Amazon a mandate: rebuild Amazon’s infrastructure into a Service-Oriented Architecture from top to bottom. But this was around 2001, and the term SOA hadn’t even been coined yet. So what exactly did Bezos ask his teams to do? This is the mandate as expressed by Jeff Bezos:

1. All teams will henceforth expose their data and functionality through service interfaces.

2. Teams must communicate with each other through these interfaces.

3. There will be no other form of inter-process communication allowed: no direct linking, no direct reads of another team’s data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.

4. It doesn’t matter what technology they use.

5. All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions.

Bezos ended his mandate with this: Anyone who doesn’t do this will be fired. Thank you; have a nice day!

Even if they couldn’t guess what Bezos’ mandate would mean for them at the time, no one can dispute the results. Nor can anyone deny that Bezos was years ahead of the curve. Does anyone think of Amazon as a bookseller now?

What Came Before

Big Bad Monolith (The Ol’ Days)

A few years after Bezos’ mandate, a small band of opinionated Ruby developers released Rails into the ether, introducing Ruby to the ways of distributed systems and enterprise architecture. Their first attempt didn’t break any records, but through a small, dedicated (and growing) following, Rails muscled its way into the mainstream. Most Rails apps at the time were small MVC stacks on top of a single database, on which the application depended heavily. This was an obvious flaw, but one forgiven because of Rails’ seemingly magical ability to produce a complete, functional (though somewhat brittle) web stack with a trivial amount of time and effort.

Don’t get me wrong. The Rails community, along with the developers of Ruby gems and of related technologies such as Passenger (ModRails), Nginx, Memcached, PostgreSQL, MySQL, and many others, worked tirelessly to make Rails the center of a scalable, multi-tiered platform worthy of even the most sophisticated application architectures. With the help of load balancers and other scaling strategies, many developers have found success with more traditional Rails stacks.

Common Rails deployments, even those that have not fully embraced SOA, have service-oriented modules within the web and application layers. For instance, Rails controllers conventionally implement RESTful endpoints for their resources. And ModRails has favored an approach that allows many app instances to run concurrently. However, encapsulation has mostly been a province of the code layer.
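To make that RESTful-endpoints convention concrete, here is a toy sketch in plain Ruby (not actual Rails code) of how each resource action corresponds to an HTTP verb + path pair. The `WidgetsController` name, routes, and return strings are all hypothetical:

```ruby
# A toy sketch of the Rails routing convention: each RESTful controller
# action corresponds to one HTTP verb + path pair.
class WidgetsController
  def index;       "listing all widgets";     end  # GET    /widgets
  def show(id);    "showing widget #{id}";    end  # GET    /widgets/:id
  def create;      "creating a widget";       end  # POST   /widgets
  def update(id);  "updating widget #{id}";   end  # PATCH  /widgets/:id
  def destroy(id); "deleting widget #{id}";   end  # DELETE /widgets/:id
end

# Dispatch a verb/path pair to the matching action, Rails-router style.
def route(controller, verb, path)
  id = path[%r{\A/widgets/(\d+)\z}, 1]
  case
  when verb == "GET"    && path == "/widgets" then controller.index
  when verb == "GET"    && id                 then controller.show(id)
  when verb == "POST"   && path == "/widgets" then controller.create
  when verb == "PATCH"  && id                 then controller.update(id)
  when verb == "DELETE" && id                 then controller.destroy(id)
  end
end
```

In a real Rails app the router does this mapping for you via `resources :widgets`; the point here is only the shape of the convention.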

Don’t feel bad if your app still looks like that. Everyone starts out that way. The important thing is that you are reading this. Because if you are, it means the time has come to take things to the next level, and you’re ready to act. Let’s go build a web service!

*…Hold on… What exactly is a web service anyway? I’m glad you asked!*

What is a Web Service

Working Behind The Scenes

Web services are self-contained, modular, distributed web applications that can be described, published, located, and invoked over a network to execute some action specified in the request. These applications are ideally small, each representing a single, specific function or domain model. Distributed over a network and usually web-based, they can be internal, with some managing application as the primary consumer or with other services acting as clients. They can also be external and web-facing, accepting requests from 3rd-party client applications (the Facebook API, Twitter, and Google Maps come to mind). Regardless, they are all built on top of open standards such as TCP/IP, HTTP, JSON, and XML. Well, that’s what I heard anyway.
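As a minimal sketch of "small application, single function," here is a Rack-style, single-purpose service boiled down to one Ruby function: a request goes in, and a status, headers, and JSON body come out. The temperature data and route are hypothetical; a real service would sit behind an HTTP server.

```ruby
require "json"

# A hypothetical single-purpose service: look up a temperature by city.
TEMPERATURES = { "SF" => 58, "NYC" => 73 }

# Takes a request hash (method, path) and returns [status, headers, body],
# in the spirit of a Rack application.
def temperature_service(request)
  city = request[:path].sub("/temperature/", "")
  if request[:method] == "GET" && TEMPERATURES.key?(city)
    [200, { "Content-Type" => "application/json" },
     JSON.generate(city: city, temperature: TEMPERATURES[city])]
  else
    [404, { "Content-Type" => "application/json" },
     JSON.generate(error: "not found")]
  end
end
```

A client would hit `GET /temperature/SF` and get back a small JSON document; everything else about the service is an implementation detail hidden behind that interface.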

API (The Contract):

In order to provide the service for which it is intended, a web service needs several things. It needs a publisher or broker so clients can find it. But knowing where to send the envelope is only the beginning. Clients also need to know how to communicate with the service. The service contract is the means by which a service communicates its API to potential clients. An Application Programming Interface (API) specifies how software components should communicate and interact. It describes the location, scheme, protocol, and functions used by a service to accept and process requests, as well as to prepare and return responses.

Messaging (The Transaction):

The client and service talk to each other via messages. Clients send a request to the server, and the server replies with a response. Apart from the actual data, these messages also contain metadata (HTTP Method, headers, etc.) about the message. The structure of the message depends in large part on the protocol the service implements.

The two most common protocols are SOAP and REST (with REST emerging as the clear frontrunner in later years). The figure below shows a simple RESTful web service using JSON as the message format. I’ll get into them more later, but I’ll summarize the difference like this:

SOAP – A SOAP-based web service has an explicit service contract in the form of a WSDL document. A WSDL file contains a very detailed description of the service’s API. Given the WSDL file, a client knows exactly how to interact with the service, including the data it needs and the format of the request and response. Its explicit nature gives the service developer much more control over how the service is accessed, and in what context.

REST – A RESTful web service is the triumph of convention over configuration, given that its contract is implicit. It provides no detailed description, because the request’s envelope is the contract. To be RESTful, a service must adhere to the HTTP protocol and its resource-based methodologies. Most would say that what REST sacrifices in control, it more than makes up for in simplicity and maintainability.
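Here is a sketch of that request/response message pair in a RESTful JSON exchange: the metadata (HTTP method, headers) travels alongside the data payload, and the shape of the envelope itself acts as the implicit contract. All endpoint and field names here are illustrative.

```ruby
require "json"

# A hypothetical client request: create an invoice resource.
request = {
  method:  "POST",
  path:    "/invoices",
  headers: { "Content-Type" => "application/json" },
  body:    JSON.generate(customer_id: 42, amount_cents: 1999)
}

# The service parses the body, acts on it, and replies in kind.
payload  = JSON.parse(request[:body])
response = {
  status:  201,  # Created
  headers: { "Content-Type" => "application/json" },
  body:    JSON.generate(id: 1001,
                         customer_id: payload["customer_id"],
                         state: "created")
}
```

No WSDL anywhere: the verb, the path, and the JSON shapes are the whole agreement between client and service.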

What is a Service-Oriented Architecture (SOA)

Concepts and Methodology

SOA is a distributed software architecture designed to facilitate the communication, interaction, and collaboration of loosely coupled, self-contained web services. Like traditional multi-tiered architectures, SOA is based on a strategy wherein software components are distributed across a network. SOA, however, tries to represent business processes and domains as shared, reusable components that can be combined in different ways. Service-orientation takes much of its inspiration and many of its principles from object-orientation. Just as with object-orientation, concepts such as encapsulation, abstraction, and reusability are fundamental to the composition of services within an SOA platform.

Separation of Concerns:

Separation of concerns is essentially a methodology that breaks large applications into a set of individually defined functions, or “concerns.” The logic required to solve the larger problem can then be broken down into individual units of logic that each address a specific concern. Many design patterns and distributed architectures have applied this concept in different ways. SOA evolved out of an ambition to realize a separation of concerns among stand-alone services that can then be accessed and reused by multiple applications. For example, an “Authentication” service can be used for single sign-on across numerous applications. Or a “PDF Generation” service can be used by a banking platform to generate invoices or render bank statements. Hell, we could spend all week coming up with ways an email scheduling service could be used by any application… but I digress.
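The auth and PDF examples above can be sketched as two narrowly scoped "services," each addressing one concern and reusable by any application that needs it. The class names and data here are hypothetical, and real services would of course sit behind service interface calls over the network rather than plain Ruby objects:

```ruby
# One concern: authentication, reusable for single sign-on anywhere.
class AuthenticationService
  def initialize(users)
    @users = users  # username => password, standing in for a user store
  end

  def authenticate(name, password)
    @users.key?(name) && @users[name] == password
  end
end

# Another concern: PDF generation, reusable by any platform.
class PdfGenerationService
  def render(title, body)
    "%PDF-sketch #{title}: #{body}"  # stand-in for a real PDF engine
  end
end

# A banking app and a reporting app could each reuse the same services:
auth = AuthenticationService.new("alice" => "s3cret")
pdf  = PdfGenerationService.new
```

Neither service knows or cares which application is calling it — that indifference is the separation of concerns paying off.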

Our SOA Platform (At a Glance)

While 3rd-party provider services are well-known, Grand Rounds has mainly implemented internal services. We maintain several domain-driven web services that handle requests from our main web application and from other services. We do employ some provider APIs, for bulk mail and analytics, for example, but it is the internal type I will be discussing in this article. All of our services are self-contained, compact Rails applications, the only exceptions being a few Sinatra apps running internal, non-critical processes. Our frontend UI and our mobile applications are clients of these services. Some services respond to clients directly, returning data and other resources, but most are asynchronous and event-driven in nature. A high-level, extremely simplified view of our system would look like this:

Why SOA

Benefits:

Let’s take a look at a simple SOA architecture. Then we can see what benefits we glean by comparing it with the monolithic approach we started with. Here’s our platform. What have we gained by moving to this approach?

Each service becomes a surprisingly simple component of a complex application platform. Independently, each focuses on a single task, but together they form an aggregate application that is both fast and durable, able to carry larger loads and yet absorb more losses. The termination of a single worker process or a single server instance doesn’t cripple the larger application as a whole.

Growing Pains:

The first and most obvious benefit of our new architecture is redundancy. With a system like this we can start eliminating the single points of failure of the first approach. Instantly, we have a more robust architecture. In the case of the database, it is best practice to set up redundancy through methods such as master/slave replication, sharding, etc.

Encapsulation:

Now we can pull critical domain paths out of our monolithic Rails application and move them into independent services. Think about it this way: if Service A is an authentication/authorization service and Service B is a PDF-generating service, both having been lifted as separate domain functions from your original application, then you can now update the PDF generation engine without affecting the auth service at all. You can move PDF generation to a 3rd-party service without changing the rest of the application. Neither of these things was possible in a single, massive Rails app, where adding or updating features risks introducing bugs or worse into the rest of the application. Changing anything might break everything. But best of all, the very concept of SOA makes it easy to move over to a full SOA platform in careful, compact steps. The flexibility it gives you allows you to move critical domain logic into self-contained services a piece at a time.
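The swap described above can be sketched in a few lines: because the PDF engine sits behind its own interface, a consumer can switch implementations — even to a 3rd-party provider — without touching anything else. All class names here are hypothetical:

```ruby
# The in-house engine we started with.
class LocalPdfEngine
  def render(doc)
    "local-pdf: #{doc}"
  end
end

# A drop-in replacement, e.g. a 3rd-party provider behind the same interface.
class ThirdPartyPdfEngine
  def render(doc)
    "vendor-pdf: #{doc}"
  end
end

# The consumer depends only on the interface, not the implementation.
class InvoiceMailer
  def initialize(pdf_engine)
    @pdf_engine = pdf_engine  # any object responding to #render will do
  end

  def prepare(invoice)
    @pdf_engine.render(invoice)
  end
end
```

Swapping `LocalPdfEngine` for `ThirdPartyPdfEngine` changes one constructor argument; the auth service, the mailer, and everything else stay untouched.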

Scalability:

I’d say the most important reason is scalability. Each of our services can be scaled independently. In the old days, our public UI and our internal application ran on the same server instances and were bound to live and die together. By making them separate services, we were able to adjust the resources given to each separately. The same goes for all of our other services. As you grow and begin to get more traffic, scalability will ascend your priority list and stay there.

Next:

Now that the What and the Why are clear, let’s start putting the pieces together.

SOA on Rails Pt. 2: Web Services in Rails

Web QA Tools for Smarter Testing

Reduce repetitive typing tasks with Dash

Dash is a free OSX app that allows you to save text snippets and input them anywhere using custom keywords.

For example, if you have to enter a console command 20 times a day, you can save this command as a snippet in Dash and title it `my_command`. If you want to change a variable in this command each time, you can signify a manual entry point by using “__” on either side of the variable:

Dash snippet

When you run the snippet by typing `my_command` in the console, a box will pop up showing the full snippet being entered, with a prompt to enter your variable (in this case, the last name value) manually:

Entering a dash snippet with variable

Dash works within ANY application on OSX. I highly recommend it to reduce typing and speed up those repetitive tasks.

Download Dash

Set up a lightweight web server for realistic JavaScript testing

When testing web code, getting the content onto an internet-addressable server is necessary to replicate a live web environment. This has been important at Grand Rounds for testing embedded JavaScript with cross-site requests.

Using the npm serve package, it only takes a few minutes to set up.

1. Install the web server:

npm install -g serve

2. Edit your hosts file (/etc/hosts) to include a hostname other than localhost:

127.0.0.1       www.l.grandrounds.com

Why add a new hostname? When your browser connects to a hostname other than localhost, it treats it as an external, internet-addressable server. This is important for replicating a live environment.

3. Create a directory containing the pages you want to test:

mkdir Pages

4. In a terminal, go to that directory and serve it:

cd Pages
serve

If successful, you should see:

serving /Users/Pages on port 3000

5. In a web browser, open that directory from the server: http://www.l.grandrounds.com:3000/

In your browser, you should see the directory being served, and be able to view your pages from here in a live environment.

Happy Testing!

DataMapper: Special Brownies Pt. I

Here at Grand Rounds, we use Integrity as our CI system. In working with Integrity, I’ve had the pleasure, adventure… of learning about DataMapper. In this entry, I’ve decided to write about one of my adventures with querying in DataMapper.

Background

Let’s say I really like cats, and I am interested in querying the local pet stores for cats, but my house can only hold ten cats. No more, and no less. In other words, I want to assert that ten records are returned when I add a limit: 10 to my query of the catabase. No problem!

For this example, let us say I have the option of two different shops to pick up my cats from: Discount Pets and Pet Shop Boys. I need to pick up ten, so I’ll just check the .count on my limit: 10 query.

DataMapper Query

First, let’s read the DataMapper documentation for querying a range:

“If you have guaranteed the order of a set of results, you might choose to only use the first ten results, like this.”

  @zoos_by_tiger_count = Zoo.all(:limit => 10, :order => [ :tiger_count.desc ])

Cool, simple enough!

Let’s query Discount Pets!

  pry(main)> discount_pets_cats = DiscountPets::Cat.all(breed: 'Norwegian Forest Cat', limit: 10)
  => [#<Cat @id=498 @name="Aristocat" @breed="Norwegian Forest Cat">, #<Cat @id=741 @name="Apple" @breed="Norwegian Forest Cat">,
  #<Cat @id=838 @name="Anteater" @breed="Norwegian Forest Cat">]

Looks like there are only three.

  pry(main)> discount_pets_cats.count
  => 3

Correct! I guess I cannot go to Discount Pets to pick up my ten cats.

Let’s query Pet Shop Boys!

  pry(main)> pet_shop_boys_cats = PetShopBoys::Cat.all(breed: 'Norwegian Forest Cat', limit: 10)
  => [#<Cat @id=498 @name="Bob" @breed="Norwegian Forest Cat">, #<Cat @id=741 @name="Bernadette" @breed="Norwegian Forest Cat">,
  #<Cat @id=838 @name="Bill" @breed="Norwegian Forest Cat">, #<Cat @id=223 @name="Benson" @breed="Norwegian Forest Cat">,
  #<Cat @id=198 @name="Beavis" @breed="Norwegian Forest Cat">, #<Cat @id=444 @name="Butthead" @breed="Norwegian Forest Cat">,
  #<Cat @id=568 @name="Basil" @breed="Norwegian Forest Cat">, #<Cat @id=782 @name="Brent" @breed="Norwegian Forest Cat">,
  #<Cat @id=366 @name="Beaver" @breed="Norwegian Forest Cat">, #<Cat @id=324 @name="Bruno" @breed="Norwegian Forest Cat">]

Neat! If I can count, there are ten records. Right?

  pry(main)> pet_shop_boys_cats.count
  => 23

Wat.

Five minutes of wat-ing, printing out pet_shop_boys_cats, counting to ten. I must be missing something, or I’m really dumb.

Trying to examine this, what’s the sixth cat?

  pry(main)> pet_shop_boys_cats[5]
  => #<Cat @id=444 @name="Butthead" @breed="Norwegian Forest Cat">

Makes sense, but since it’s telling me there are twenty-three cats, what’s the twentieth?

  pry(main)> pet_shop_boys_cats[19]
  => nil

More wat.

  pry(main)> PetShopBoys::Cat.all(breed: 'Norwegian Forest Cat').count
  => 23

Okay, so that’s how they’re getting 23… So what is pet_shop_boys_cats?

  pry(main)> pet_shop_boys_cats.class
  => DataMapper::Collection

So apparently, the limit: 10 does not carry through to the count. Great!

Sigh, yes, I’ll just have to convert pet_shop_boys_cats to an array. But should I really have to? To confirm that my expectation is reasonable, let’s try it in ActiveRecord!

ActiveRecord Query

  pry(main)> pet_shop_boys_cats = PetShopBoys::Cat.where(breed: 'Norwegian Forest Cat').limit(10)
  => [#<Cat @id=498 @name="Bob" @breed="Norwegian Forest Cat">, #<Cat @id=741 @name="Bernadette" @breed="Norwegian Forest Cat">,
  #<Cat @id=838 @name="Bill" @breed="Norwegian Forest Cat">, #<Cat @id=223 @name="Benson" @breed="Norwegian Forest Cat">,
  #<Cat @id=198 @name="Beavis" @breed="Norwegian Forest Cat">, #<Cat @id=444 @name="Butthead" @breed="Norwegian Forest Cat">,
  #<Cat @id=568 @name="Basil" @breed="Norwegian Forest Cat">, #<Cat @id=782 @name="Brent" @breed="Norwegian Forest Cat">,
  #<Cat @id=366 @name="Beaver" @breed="Norwegian Forest Cat">, #<Cat @id=324 @name="Bruno" @breed="Norwegian Forest Cat">]

Cool.

  pry(main)> pet_shop_boys_cats.count
  => 10

Yep.

  pry(main)> pet_shop_boys_cats.class
  => ActiveRecord::Relation

Thank you, DataMapper.

Conclusion

A query with a limit in DataMapper is simply the query with the addition of nilling out the elements past the given limit, and to_a performs a .compact. Weird!
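That conclusion can be modeled in a few lines of plain Ruby. This is only an illustration of the observed behavior, not DataMapper’s actual implementation: the collection counts ALL matches regardless of the limit, indexing past the limit yields nil, and #to_a compacts down to the limited set.

```ruby
# A pure-Ruby model of the surprising collection semantics above.
class LimitedCollection
  def initialize(all_matches, limit)
    @all_matches = all_matches
    @limit = limit
  end

  def [](index)
    index < @limit ? @all_matches[index] : nil  # nil past the limit
  end

  def count
    @all_matches.count  # counts ALL matches, ignoring the limit
  end

  def to_a
    @all_matches.first(@limit)  # effectively compacts away the nils
  end
end
```

With 23 matches and a limit of 10, this model reproduces the session above: `count` says 23, `[19]` is nil, and `to_a.count` finally says 10.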

CSS Best Practices

Examples are in Slim and SCSS.

Use a grid system

DO consistently use a grid layout where contents on the page are laid out in increments of the smallest block in the grid. Try Bootstrap. The example below uses Bootstrap 3’s 12-column grid layout.

  .row
    .col-md-4
      Name
    .col-md-8
      input type=text

DO NOT modify position (top, left, right) and margins to lay out your content

Use relative sizes

DO use ems to specify sizes for fonts, margins, and paddings. You can set the base values at the document level and everything else will scale relatively

.restaurant {
  font-size: 1.5em;
}

DO NOT use px to specify size. It breaks responsiveness

Be Specific

DO specify accurate/unambiguous selectors. The order of specificity, from highest to lowest, is inline styles, IDs, attributes/classes, and elements.

.restaurant > div:first-child {
  font-weight: bold;
}

DO NOT use !important. It is almost never needed and is often reached for because people fail to be specific in their styles, so their intended rules don’t apply to the elements they want. DO NOT use IDs for styling. IDs are specific to the contents of the DOM rather than to its layout. Styles should be general and reusable, not tied to IDs.

Use classes sparingly

DO use existing DOM elements for styling instead of adding a class to every DOM element that needs styles

.restaurant > div:first-child {
  font-weight: bold;
}

instead of

.restaurant .name {
  font-weight: bold;
}

DO NOT use a class for every element you want to style. There is probably a natural order of elements you can reference instead. DO NOT use very long class names. Class names add up and significantly increase the asset size that needs to be loaded by a page, degrading performance.

Modularity

DO use mixins to build modular visual components that are reusable

@mixin circle($width) {
  width: $width;
  height: auto;
  @include border-radius($width);
  border: 0.2em white solid;
}

and you can reference them in other mixins as well

@mixin avatar($width) {
  @include circle($width);
  height: $width;
  border: none;
  background-position: center top;
}

DO NOT copy and paste styles. Whenever you see yourself do that, create a more general class or a mixin.

Even precision

DO use precisions that are even splits of 100. This will help with responsiveness and an even layout

margin: 1.25em;
padding: 1.125em;

DO NOT use random precisions to match a mock pixel-perfectly; this will throw your entire layout into disarray

Shameless footer: We would love smart engineers to join us to change the world of healthcare. Inquire here

Docker Storage Driver Performance Issues

Here at Grand Rounds, we’re pragmatists when it comes to choosing the technologies we work with on a daily basis. Often, that means using what’s tried-and-true, such as Ruby on Rails. But not always. Sometimes the state of the art can deliver win after win, even amongst the inevitable trials of using unproven software. Plus, the cutting edge is exciting and…actually, screw that. Stable systems are vastly superior to any of that nonsense.

“Honesty”, our custom CI system, was originally built on CoreOS. A year and many trials later, we’ve moved completely off the CoreOS ecosystem. We’re still using Docker, but now we’re on the very familiar Ubuntu, and using Docker Swarm for clustering. To document all of the weird problems we worked around or never quite got a handle on, while keeping the team’s builds moving along smoothly towards our Series C round and beyond, would be a sort of fishing tale: of interest mainly to those who were there, but pretty boring otherwise.

One struggle that popped up recently may be helpful to document though. After flipping the switch on our new Ubuntu cluster, we started seeing significantly slower build times, on the order of an additional five minutes. Digging in, we found that the slowdown was coming from our parallelized RSpec processes starting up. Eventually, we figured out that the additional time was due to the default Docker --storage-driver being different between Ubuntu and CoreOS. Ubuntu was using aufs, and CoreOS overlay. With aufs, it appeared that the parallel RSpec processes were loading their respective files in serial, each taking about 15 seconds. At 24 processes, the 6 minute slowdown is accounted for. With overlay, all 24 processes loaded in the expected total time of ~15 seconds. After re-provisioning our Ubuntu instances to use --storage-driver=overlay, we started to see expected build times again. Huge relief.

In The Flesh

We’ll probably want a multi-core system to reproduce this behavior, so I spun up an m4.2xlarge EC2 instance with the official Ubuntu 14.04 AMI, with a 50GB EBS volume attached (called /dev/sdb). Let’s get on the machine:

me@localhost$ ssh ubuntu@<instance_ip>

To run the overlay storage driver, we’ll need at least the 3.18 Linux Kernel. Install:

ubuntu@ec2-instance$ sudo apt-get upgrade -y linux-generic-lts-vivid && sudo shutdown -r now

Get back on the train, and install Docker:

ubuntu@ec2-instance$ sudo -i

# paste all these lines at once:
apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
echo 'deb https://apt.dockerproject.org/repo ubuntu-trusty main' > /etc/apt/sources.list.d/docker.list
apt-get update -y
apt-get purge -y lxc-docker*
apt-get install -y docker-engine
usermod -a -G docker ubuntu

Log out and back in again to pick up the group change, and do some additional setup:

ubuntu@ec2-instance$ sudo -i

# paste all these lines at once:
service docker stop  # we'll be running our own docker daemon
mkfs.ext4 /dev/sdb
mount /dev/sdb /var/lib/docker

Run the docker daemon:

ubuntu@ec2-instance$ sudo -i

# paste all these lines at once:
rm -rf /var/lib/docker
docker daemon -D --storage-driver=aufs

Instead of the Grand Rounds secret sauce, we’ll be using discourse as an example project. The load time for its Ruby runtime is large enough to represent our problem well.

I’ve stuck all the discourse setup into a docker image. Run the datastores required for our test along with the discourse container:

ubuntu@ec2-instance$ docker run -d --name=redis --net=host redis:3.0
ubuntu@ec2-instance$ docker run -d --name=pg --net=host postgres:9.4
ubuntu@ec2-instance$ docker run -it --rm --name=storage-driver-test --net=host -w /root/discourse grnds/docker-storage-driver-test bash
root@ec2-instance:~/discourse# . ../setup.sh  # now we're in the container

We’re ready to test our parallel problems, but let’s get a baseline figure first:

# Ctrl-C the rspec command once it starts printing dots, because we only care about the load time
root@ec2-instance:~/discourse# rspec
...^C
Finished in 6.68 seconds (files took 5.3 seconds to load)
...

The “files took 5.3 seconds to load” is our baseline. Let’s run the aufs test:

# this will print out lots of stuff, and we don't care about most of it, but
# we can't kill the process early or the information we want won't get printed.
root@ec2-instance:~/discourse# parallel_rspec ./spec
8 processes for 330 specs, ~ 41 specs per process
...
Finished in 3 minutes 16.2 seconds (files took 25.16 seconds to load)
560 examples, 13 failures

Failed examples:
...

Eight rspec processes have run, and the “files took 25.16 seconds to load”. It’s clearly not (baseline x 8), but it’s much higher than expected. And compared with overlay?

We’ll have to re-do our setup, but dockerization ends up saving us quite a bit of headache:

# kill the aufs docker daemon from before.
# this is exactly the same as above, other than the --storage-driver.
ubuntu@ec2-instance$ sudo -i
root@ec2-instance# rm -rf /var/lib/docker
root@ec2-instance# docker daemon -D --storage-driver=overlay

ubuntu@ec2-instance$ docker run -d --name=redis --net=host redis:3.0
ubuntu@ec2-instance$ docker run -d --name=pg --net=host postgres:9.4
ubuntu@ec2-instance$ docker run -it --rm --name=storage-driver-test --net=host -w /root/discourse grnds/docker-storage-driver-test bash
root@ec2-instance:~/discourse# . ../setup.sh  # now we're in the container

root@ec2-instance:~/discourse# parallel_rspec ./spec
8 processes for 330 specs, ~ 41 specs per process
...
Finished in 1 minute 6.27 seconds (files took 6.04 seconds to load)
592 examples, 21 failures

Failed examples:
...

Here, the eight rspec processes took 6.04 seconds to load their files. It’s very close to the baseline.

Conclusion

This analysis certainly has confounders, but it captures, in an easily reproducible way, the pattern of poor performance we were seeing in production on nearly the same stack.

I don’t think we necessarily learned any profound lessons from this. It mainly served to reinforce the fact that you can’t plan for everything when you make big changes, and that an in-depth knowledge of your system and good troubleshooting skills are the only things that can save you from yourself.