Engineering

Burying Your Dead Code

As our code evolves and changes, little bits of leftovers from refactors or changed paths can build up as dead code. It’s like the plaque that builds up on your teeth over time: not really noticeable at first, but after a trip to the dentist, you feel a whole lot better.

Removing dead code from your codebase is a simple process at its core: find the dead parts, prove that they’re dead, and delete them. The tricky part is that second step, but after a little trial and error, we discovered a couple of tricks to speed up the process.

Start small

Before digging into a gigantic repo to see what you can rip out of there, consider the impact of what you’re about to do and start with just a small piece of it. Because I was investigating one of our biggest repos, I decided to tackle a single Model first rather than, say, the entire lib directory.

Finding the dead parts

I’m sure there are other gems out there that perform a similar assessment, but our favorite for this is debride by the Seattle Ruby Brigade. It’s not 100% perfect, but it will give you a good place to start digging around. The whitelisting option is particularly helpful after your initial investigation. Another option is the brute-force method, where you comb through a file and investigate each piece one by one. If your debride output is suspiciously large, this might be worth a try.
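As a sketch of how that might look (the paths and the whitelist file name here are hypothetical, and the gem installs with gem install debride): a first pass over a single model keeps the output manageable, and a whitelist records methods you’ve verified are alive so later runs only report new suspects.

```shell
# First pass: scan just one model so the report stays small.
# Second pass: --whitelist takes a file of names (one per line) that
# debride should stop flagging. All paths here are hypothetical.
if command -v debride >/dev/null 2>&1 && [ -e app/models/my_model.rb ]; then
  debride app/models/my_model.rb
  debride --whitelist known_live_methods.txt app/
fi
```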

Prove that they’re dead

This is the fun part.

The specific service I was trying to clean up is one of our oldest repositories, and some of the functions showing up as dead ends had been sitting there for months or even years. What’s a girl to do, dig through hundreds or thousands of pull requests and commits until coming up with the right one? Oh no, we have a tool for that. It’s called git bisect, and it’s one of the coolest things I’ve learned so far about git.

git bisect is generally used to track down when a bug was introduced, using a binary search through as many commits as you’d like; you can start at the very beginning of your git history or somewhere in the middle. It checks out the repo at each commit, where you can test for your issue and mark the commit as “good” or “bad.” Rather than testing for a bug, though, I used it to grep, one at a time, for the method names that appeared dead in my previous step. If grep gave only one result (barring tests that might still exist), then nothing else was calling the method anymore and I’d mark the commit “bad.” If it turned up more results, I’d mark it “good.” When searching for a bug, the point is to find the last “good” commit before the bug was introduced; when looking for dead code, the point is to track down when the consumer was removed.

Here’s an example to illustrate the steps you’ll take to do the same thing:

~/my_repo [master] $ git bisect start
~/my_repo [master] $ git bisect bad
~/my_repo [master] $ git bisect good ecef47e2fefc4c8ac6f3a358a4961332d24a46e3
Bisecting: 4825 revisions left to test after this (roughly 12 steps)
[0a9fc468c6efb6465c9fa96232b9f61dee01a12b] Merge pull request #5902 from my_repo/branch_5902
~/my_repo [:0a9fc46|…15375] $ git grep old_method_name
app/models/my_model.rb:  def old_method_name
~/my_repo [:0a9fc46|…15375] $ git bisect bad
Bisecting: 2412 revisions left to test after this (roughly 11 steps)
[d30229d184d5e93d39683b1f129745a211368890] Merge pull request #5346 from my_repo/branch_5346
~/my_repo [:d30229d|…15375] $ git grep old_method_name
app/models/my_model.rb:  def old_method_name
~/my_repo [:d30229d|…15375] $ git bisect bad
Bisecting: 1205 revisions left to test after this (roughly 10 steps)
[d116a5bf4d26ad1e33a099b5027883d16bbe68f2] changed a thing
~/my_repo [:d116a5b|…15375] $ git grep old_method_name
app/models/my_model.rb:  def old_method_name
~/my_repo [:d116a5b|…15375] $ git bisect bad
Bisecting: 602 revisions left to test after this (roughly 9 steps)
[ff2693d90e72798abbb8adbf57f331e661b6445b] fixed a bug
~/my_repo [:ff2693d|…15375] $ git grep old_method_name
app/helpers/my_helper.rb:    old_method_consumer = (model.try(:old_method_name) || model.try(:something_else?))
app/models/my_model.rb:  def old_method_name
~/my_repo [:ff2693d|…15375] $ git bisect good
Bisecting: 296 revisions left to test after this (roughly 8 steps)
[c6e79d2aa3185774be46473f51027d65aca6216e] Merge pull request #5040 from my_repo/branch_5040
~/my_repo [:c6e79d2|…15375] $ git grep old_method_name
app/models/my_model.rb:  def old_method_name
~/my_repo [:c6e79d2|…15375] $ git bisect bad
Bisecting: 152 revisions left to test after this (roughly 7 steps)
[74a8add547a62de49d535df07587703186057f24] Merge pull request #5017 from my_repo/branch_5017
~/my_repo [:74a8add|…15375] $ git grep old_method_name
app/helpers/my_helper.rb:    old_method_consumer = (model.try(:old_method_name) || model.try(:something_else?))
app/models/my_model.rb:  def old_method_name
~/my_repo [:74a8add|…15375] $ git bisect good
Bisecting: 68 revisions left to test after this (roughly 6 steps)
[c0d8f4c89964e95b5e5a823d0ecf36533a8c67b0] Merge pull request #5008 from my_repo/branch_5008
~/my_repo [:c0d8f4c|…15375] $ git grep old_method_name
app/models/my_model.rb:  def old_method_name
~/my_repo [:c0d8f4c|…15375] $ git bisect bad
Bisecting: 42 revisions left to test after this (roughly 5 steps)
[2a26af52b5c6234709f9816425c2cb3be7c1d3c3] changed error handling
~/my_repo [:2a26af5|…15375] $ git grep old_method_name
app/helpers/my_helper.rb:    old_method_consumer = (model.try(:old_method_name) || model.try(:something_else?))
app/models/my_model.rb:  def old_method_name
~/my_repo [:2a26af5|…15375] $ git bisect good
Bisecting: 21 revisions left to test after this (roughly 5 steps)
[37f592be9e75a68a497055109047d2e8d478cc64] Merge pull request #5038 from my_repo/branch_5038
~/my_repo [:37f592b|…15375] $ git grep old_method_name
app/helpers/my_helper.rb:    old_method_consumer = (model.try(:old_method_name) || model.try(:something_else?))
app/models/my_model.rb:  def old_method_name
~/my_repo [:37f592b|…15375] $ git bisect good
Bisecting: 10 revisions left to test after this (roughly 4 steps)
[6af2e9adc4556d153436880fb4f25e6cfa33dda0] move this thing
~/my_repo [:6af2e9a|…15375] $ git grep old_method_name
app/helpers/my_helper.rb:    old_method_consumer = (model.try(:old_method_name) || model.try(:something_else?))
app/models/my_model.rb:  def old_method_name
~/my_repo [:6af2e9a|…15375] $ git bisect good
Bisecting: 5 revisions left to test after this (roughly 3 steps)
[9ad36a544f3995e85556e9f58561d648c503ce47] added a thing
~/my_repo [:9ad36a5|…15375] $ git grep old_method_name
app/helpers/my_helper.rb:    old_method_consumer = (model.try(:old_method_name) || model.try(:something_else?))
app/models/my_model.rb:  def old_method_name
~/my_repo [:9ad36a5|…15375] $ git bisect good
Bisecting: 2 revisions left to test after this (roughly 2 steps)
[1ede562f1ab630df8b6a14dd3177413373738dca] fixed a bug
~/my_repo [:1ede562|…15375] $ git grep old_method_name
app/helpers/my_helper.rb:    old_method_consumer = (model.try(:old_method_name) || model.try(:something_else?))
app/models/my_model.rb:  def old_method_name
~/my_repo [:1ede562|…15375] $ git bisect good
Bisecting: 0 revisions left to test after this (roughly 1 step)
[8d15ccfd9d7fb0b0a6609ad87af4f5cc5566ee21] Merge pull request #5039 from my_repo/branch_5039
~/my_repo [:8d15ccf|…15375] $ git grep old_method_name
app/models/my_model.rb:  def old_method_name
~/my_repo [:8d15ccf|…15375] $ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[c62e10dbe9436b9a7ad5afd37e495414591f3160] fixed another bug
~/my_repo [:c62e10d|…15375] $ git grep old_method_name
app/models/my_model.rb:  def old_method_name
~/my_repo [:c62e10d|…15375] $ git bisect bad
c62e10dbe9436b9a7ad5afd37e495414591f3160 is the first bad commit
commit c62e10dbe9436b9a7ad5afd37e495414591f3160
Author: Jane Doe <jane@Doe.com>
Date:   Thu Mar 3 12:34:56 2016 -0800

    Refactored a thing

:040000 040000 8a3d0400211d91920ea2caba83cf73f80c858b3f b39651a7ef882af4e5be2fe6f2a06a5baa4cccbe M    app
~/my_repo [:c62e10d|…15375] $
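The grep-and-mark loop above can also be automated. git bisect run invokes a script at each step and reads its exit code: 0 counts as “good,” and 1 through 127 (except 125) counts as “bad.” Here’s a sketch using the same hypothetical method name:

```shell
# Hypothetical helper for git bisect run, reusing the article's
# placeholder method name. Invoke it like:
#   git bisect start && git bisect bad && git bisect good <old-sha>
#   git bisect run sh check_dead.sh

still_called() {
  # Sum git grep's per-file match counts; more than one total match
  # means a caller, not just the definition, is still around.
  total=$(git grep -c "$1" 2>/dev/null | awk -F: '{ sum += $NF } END { print sum + 0 }')
  [ "${total:-0}" -gt 1 ]
}

if still_called old_method_name; then
  echo "consumer still exists: git bisect good"
else
  echo "definition only (or no repo here): git bisect bad"
fi
# In a real check_dead.sh you would exit 0 / exit 1 instead of echoing.
```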

Delete them

As you can see, I finally found the commit whose diff showed the changed or removed method call; with that proof in hand, I could delete the dead method in my current branch. After deleting all the dead code, I made sure to comment on each removed function in my pull request with the commit number where its consumer was removed (because documentation is rad). Many code reviews later, our Model was done with her teeth cleaning and felt just a little bit better.

Learning to Sight Along the Space Capsule

Testing does not have to be complicated, but sometimes it can turn into spaghetti just like any other code. This talk by Sandi Metz at RailsConf 2013 inspired me to revisit some of the unit tests for one of our services that relies heavily on FTP and SFTP connections.

Why are my tests failing?

The previous test suite for this service used a gem originally created as a mock framework for testing FTP connection behaviors. However, this solution had a few key problems:

  • The gem hasn’t been maintained in a couple years, which started to cause intermittent spec failures.
  • The gem doesn’t support SFTP connections, which our service relies much more heavily on than FTP.
  • The gem doesn’t actually mock anything. It starts up a local FTP server, runs the tests, and then shuts the server down. Each test that’s run against it is generating a real response, albeit locally. This not only took up a lot of time (starting and stopping the local server before and after each spec), but was also another source of intermittent failures.

So, after creating a plan of attack, I rewrote or deleted every single test that used this gem. Each test was also re-evaluated for “sighting along the space capsule,” as Sandi Metz calls it. Any test that was written for a private method, a message sent to the object’s self, or an implementation detail was scrutinized (and most of them deleted), which exposed the true underlying issue: almost every single test we’d written that used the gem was an implementation test, not an interface test.

In unit tests, external responses are supposed to be completely mocked out; the tests should not make actual calls to an API or server. Tests exist to make sure our code works as expected, not to verify that no errors are raised when making external requests. A unit test that requires a response from an outside source is already a broken test; save those for integration tests instead.

Reframing our point of view

Here’s an example to illustrate my point. Let’s say we have an object whose job is to get a list of files from a specific folder on an FTP server. For brevity’s sake, let’s define it like this, with the nitty gritty details fleshed out elsewhere:

class FtpThing
  def get_list_of_files(username, password, folder_name)
    @some_connection.open(username, password)
    list_of_files = @some_connection.list(folder_name)
    @some_connection.close
    list_of_files
  end
end

Now, one option for testing this is the way we did it before, something like this:

let(:ftp_thing) { FtpThing.new }
let(:list) { ['file1.txt', 'file2.txt'] }

before do
  # spin up the "mock" (real) server
  # put a (real) file or two in a folder on the server
end

it 'gives me the right list of files' do
  expect(ftp_thing.get_list_of_files('username', 'password', 'folder_name')).to eq(list)
end

after do
  # shut down the server
end

On the surface, this doesn’t look terrible. It’s testing the value returned from get_list_of_files, right? Yay, that’s what we want! But… not quite. We’ve started a real server, and we’re hoping that a) it started in time for our tests to run, and b) it doesn’t fail or raise any errors; that’s the only way our tests will pass. Instead, we should allow a mocked response and set up our expectation this way:

let(:mock_connection) { double('ftp') }
let(:ftp_thing) { FtpThing.new }
let(:list) { ['file1.txt', 'file2.txt'] }

before do
  ftp_thing.instance_variable_set(:@some_connection, mock_connection)
  allow(mock_connection).to receive(:open)
  allow(mock_connection).to receive(:list).and_return(list)
  allow(mock_connection).to receive(:close)
end

it 'gives me the right list of files' do
  expect(ftp_thing.get_list_of_files('username', 'password', 'folder_name')).to eq(list)
end

See the difference? We aren’t starting a server or making a real external call; we’re just pretending that it happened and gave us the right response. The true test is whether our code gives us the right value back after it makes an external call. This makes it easy to test error responses, too! I can allow my connection to raise an error and then test my error handling from there, rather than trying to fake out a whole server.
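To sketch that error-handling case outside RSpec (the class names and the empty-list fallback here are hypothetical; the hand-rolled fake stands in for a double configured with something like allow(mock_connection).to receive(:open).and_raise(...)):

```ruby
# A tiny fake connection whose #open always raises, standing in for a
# mocked connection configured to raise Errno::ECONNREFUSED.
class FailingConnection
  def open(*)
    raise Errno::ECONNREFUSED
  end
end

# Same shape as the FtpThing above, plus hypothetical error handling:
# a refused connection is treated as "no files".
class RescuingFtpThing
  def initialize(connection)
    @some_connection = connection
  end

  def get_list_of_files(username, password, folder_name)
    @some_connection.open(username, password)
    list_of_files = @some_connection.list(folder_name)
    @some_connection.close
    list_of_files
  rescue Errno::ECONNREFUSED
    []
  end
end

RescuingFtpThing.new(FailingConnection.new).get_list_of_files('u', 'p', 'dir')
# => []
```

The point is the same as in the specs above: the fake exercises our rescue path without any server, real or local.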

Bonus points: If I want to change the implementation of my server call—maybe I want to change how it connects or opens the connection, or something similar—then my tests are not broken and I can continue on my merry way.

The moral of the story

No project is complete without some beautiful numerical data to back it up, so here are some numbers to show off just how much time this saves us. Running the entire test suite on the application before:

Finished in 2 minutes 56.1 seconds (files took 2.72 seconds to load)
172 examples, 0 failures

And after:

Finished in 5.67 seconds (files took 2.77 seconds to load)
155 examples, 0 failures

That’s a 97% improvement in runtime. Precious minutes regained, and no more intermittent failures, either.

The moral of the story? Don’t make real calls to real servers in your tests, even if they’re local to your machine or CI container. Set up mocked responses when needed, and test that your code is actually working, not theirs. (And seriously: watch that talk by Sandi Metz at least three more times until it really sticks. Your future self will thank you.)

Jq - the Cool Tool for School

There’s something incredibly satisfying about finding the perfect tool for a job. If you deal with JSON and you haven’t played with jq, you’re in for a real treat.

The description from their site states:

“jq is like sed for JSON data – you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text… …jq can mangle the data format that you have into the one that you want with very little effort, and the program to do so is often shorter and simpler than you’d expect.”

I’ll go over a few use cases I’ve had for it so far.

Simple Case

The most basic and neat thing you can do with jq is to pipe JSON into it and pretty-print it with the identity filter, '.'. Whether the JSON is from a curled response:

curl 'https://api.github.com/repos/stedolan/jq/commits?per_page=5' | jq '.'

or just some JSON you need to use:

cat template.json | jq '.'

It’s pretty and readable! This becomes more useful when you’re dealing with messier JSON.

Validation

I’ve also used it as a way to find errors in what I assumed was valid JSON. My use case: DynamoDB expects imports in a special JSON format, and I found out that an import had failed, but not much more than that. Knowing the most likely culprit was illegitimate JSON somewhere in the mountains of data, we piped it all through jq, which pointed to the exact line where an extra quotation mark had crept into one file. Talk about a needle in a haystack.

This is also especially useful when you need to validate JSON that could contain sensitive information.
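As a sketch of that workflow (the file and its typo are invented): jq exits non-zero on invalid input and writes a parse error describing where things went wrong, which is what makes the needle findable.

```shell
# A deliberately broken document: a stray token after the first pair.
printf '{\n  "name": "widget" "extra"\n}\n' > suspect.json
# jq . exits non-zero on invalid JSON and sends the parse error,
# including its location, to stderr.
if ! jq . suspect.json > /dev/null 2> jq_error.txt; then
  cat jq_error.txt
fi
```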

Sorting Keys

Have you ever had to diff one large piece of JSON against another, only to find out that the keys are in a different order? Your diff tool tells you that all of the things are not like the others, which ends up not being useful at all.

My generated CloudFormation template keys were in a completely different order than GetTemplate from the current stack, and I needed a way to tell what the delta between the two was. The -S (--sort-keys) option sorts the keys of your hash all the way down, so two sorted equivalent hashes will be identical. Knowing that, I was able to create two sorted files and diff them from there.

cat actual.json | jq -S . > asorted.json

cat proposed.json | jq -S . > bsorted.json

Use your favorite diffing tool, and the problem is solved in three lines: meld asorted.json bsorted.json
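As a quick sanity check that -S really normalizes key order (the file contents here are made up; diff stands in for meld, which is graphical):

```shell
# Skip gracefully on machines without jq.
command -v jq >/dev/null 2>&1 || exit 0
# Two equivalent documents with keys in different orders...
printf '{"b": 2, "a": 1}\n' > actual.json
printf '{"a": 1, "b": 2}\n' > proposed.json
# ...become byte-identical once the keys are sorted, so diff stays quiet.
jq -S . actual.json > asorted.json
jq -S . proposed.json > bsorted.json
diff asorted.json bsorted.json && echo "identical"
```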

More Info

There are other neat features in jq, such as very powerful filtering. You can find out more in the manual: https://stedolan.github.io/jq/manual/

Rails Active Records: Assigning a Non-boolean to a Boolean

We’ll discuss how Active Record handles data assignment to a numeric attribute. I observed this odd behavior while working on my last epic.

We’ll work with a boolean attribute here because MySQL treats boolean data as numeric. MySQL doesn’t have an inherent ‘boolean’ data type; when you create a column of type boolean, it internally stores the binary state in a ‘tinyint’ (a 1-byte datatype that holds integer values in the range -128 to 127). TRUE and FALSE are simply constants that evaluate to 1 and 0.

Now let’s imagine Active Record trying to work with a boolean column in MySQL. What happens when we assign string data to a boolean attribute in AR? Active Record will try to coerce the assigned value to a number (because boolean is numeric in MySQL). Great! How does it convert a string to an integer?

I know of two different ways this can be achieved in Ruby.

pry(main)> Integer('23')
=> 23
[4] pry(main)> '23'.to_i
=> 23

It’s interesting to see how these methods behave when we try to cast a non-integer to an integer.

pry(main)> 'asdf'.to_i
=> 0
[2] pry(main)> Integer('asdf')
ArgumentError: invalid value for Integer(): "asdf"
from (pry):2:in `Integer'

The #Integer method complains, whereas #to_i returns 0. Unfortunately, Active Record uses #to_i to set a boolean attribute, which results in FALSE for any non-boolean assignment. :-(
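A rough sketch of that coercion in plain Ruby (this mimics the behavior described above; it is not Active Record’s actual casting code, and the method name is made up):

```ruby
# Anything that isn't already a boolean gets run through #to_i, and only
# a result of 1 counts as true -- so arbitrary strings silently become false.
def legacy_boolean_cast(value)
  case value
  when TrueClass, FalseClass then value
  else value.to_i == 1
  end
end

legacy_boolean_cast('1')     # => true
legacy_boolean_cast('asdf')  # => false (the surprise)
legacy_boolean_cast(nil)     # => false
```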

Here’s what happened:

pry(main)> ds = DataSource.find(5)
  DataSource Load (0.3ms)  SELECT `data_sources`.* FROM `data_sources` WHERE `data_sources`.`id` = 5 LIMIT 1
=> #<DataSource id: 5, auto_approval_enabled: true>
[2] pry(main)> ds.auto_approval_enabled
=> true
[3] pry(main)> ds.auto_approval_enabled = 'asdf'
=> "asdf"
[4] pry(main)> ds.save!
=> true
[5] pry(main)> ds.reload.auto_approval_enabled
=> false

The world is bad. Well, not really…

We only observe this behavior in Active Record 3. Active Record 4 throws a much-needed deprecation warning for non-boolean to boolean assignment.

DEPRECATION WARNING: You attempted to assign a value which is not explicitly `true` or `false` 
("asdf") to a boolean column. Currently this value casts to `false`. This will change to match Ruby's 
semantics, and will cast to `true` in Rails 5. If you would like to maintain the current behavior, you 
should explicitly handle the values you would like cast to `false`.

Much better, right?

Starting to Work With Databases on Linux


It irks and surprises me when I see engineers invoke mysql or postgres on their development box (i.e., localhost) in some funky way, such as mysql -uroot or psql -Uroot. You should be able to work locally without pretending to be someone else. It’s just weird.

So, do yourself a favor and start right with the correct set of permissions.

MySQL

After you sudo apt-get install -y mysql-server-5.6 (installing without a root password), issue: mysql -uroot -e"grant ALL on *.* to $USER" mysql

PostgreSQL

After you sudo apt-get install -y postgresql, issue: sudo su -c "psql -c\"create user $USER with SUPERUSER\"" postgres