
Guarding an Open Door: Authorizing Your GraphQL Fields

Can you imagine what it must be like to work security at the U.S. Mint?

On one hand, you’re tasked with safeguarding one of the juiciest targets in the country. On the other, you have a third-grade class rubbing their hands all over the bullion chained up in the lobby.

Everybody walks in through the front door – thieves and students alike.

I thought about that problem recently in the context of our GraphQL schema. Here we have another open door: the controller endpoint that receives and processes the query string. Authentication happens early; the user is admitted before we know what they’re after.

Let’s say our user makes a query that looks like this:

query {
  delayedJobs {
    createdAt
    failedAt
    handler
  }
}

When our server processes the query (using the graphql gem), it executes three steps in the following order:

1. Parse

The server constructs an abstract syntax tree (AST) representing the query and validates the query syntax.

2. Validate

The server validates the query path, ensuring that the hierarchy of requested fields is correct and that any required arguments are provided, among other checks. A query analyzer visits each field without executing the field’s resolver and collects metadata before returning a final result.

3. Execute

The server resolves each field, beginning with the query root and working its way down the AST until it hits a leaf or an ExecutionError.

In the example above, assuming our query syntax is properly formed, the server should quickly iterate through all three steps and return an array of DelayedJob objects, each with the createdAt, failedAt, and handler properties.

Of course, we’ve got a problem. It’s perfectly reasonable for an administrator to request our delayedJobs – however, we don’t want to share that information with just anybody. In fact, we’d like to prevent the majority of our users from even being able to access this field.

Ahead-of-Time Authorization

Conveniently, access-level authorization lines up nicely with the second step in the query processing flow. We only need to know enough about the field to make a quick decision about whether a user should be able to request it – we don’t actually care what the field returns (in fact, we don’t even want the resolver to execute).

Let’s see if we can use a query analyzer to prevent the field from surfacing:

TestSchema = GraphQL::Schema.define do
  query_analyzer AuthorizationAnalyzer.new
end

We’ll define our new AuthorizationAnalyzer using the template provided in the graphql gem documentation:

class AuthorizationAnalyzer
  # Called before the visit.
  # Returns the initial value for `memo`
  def initial_value(query)
  end

  # This is like the `reduce` callback.
  # The return value is passed to the next call as `memo`
  def call(memo, visit_type, irep_node)
  end

  # Called when we're done with the whole visit.
  # The return value may be a GraphQL::AnalysisError (or an array of them).
  # Or, you can use this hook to write to a log, etc
  def final_value(memo)
  end
end

We’re interested in two methods: call, which iterates over each node of the AST, and final_value, which we’ll use to return an AnalysisError if the user requests an unauthorized field.

In our call method, we’ll want to take a good look at the irep_node given as the third argument:

def call(memo, visit_type, irep_node)
  irep_node.definition # nil
end

The first time we hit this method, our irep_node will be the query node (one of three root types in our schema: query, mutation, and subscription). The second time, we’ll get the delayedJobs node:

def call(memo, visit_type, irep_node)
  irep_node.definition.name # "delayedJobs"
end

Now that we know which field the user is trying to get access to, we just need to surface the current_user object so we can check for the relevant permissions. That’s as easy as attaching the current_user to the query context:

# Called from the controller endpoint handling GraphQL requests
TestSchema.execute(
  "your_query_string_here",
  context: { current_user: current_user },
  operation_name: "your_operation_name_here",
  variables: { ... }
)

Then, we can access the current_user in our query analyzer:

def call(memo, visit_type, irep_node)
  irep_node.query.context[:current_user]
end

Checking for the right permissions goes something like this:

def call(memo, visit_type, irep_node)
  memo ||= {} # initial_value returns nil, so seed the memo hash here
  # Check each field once (on :enter) and skip the root node, whose definition is nil
  return memo unless visit_type == :enter && irep_node.definition
  current_user = irep_node.query.context[:current_user]
  requested_node_name = irep_node.definition.name.to_sym
  ability = Ability.new(current_user) # Using CanCanCan
  memo[:unauthorized_nodes] ||= []
  memo[:unauthorized_nodes] << irep_node if ability.cannot?(:access, requested_node_name)
  memo
end

Finally, we can return an AnalysisError once query analysis has completed:

def final_value(memo)
  unauthorized_nodes = memo[:unauthorized_nodes] || []
  return if unauthorized_nodes.empty? # nothing unauthorized; let the query through
  unauthorized_node_names = unauthorized_nodes.map { |node| node.definition.name }
  GraphQL::AnalysisError.new("You do not have permission to access #{unauthorized_node_names.join(', ')}")
end

If we want to get fancy, we can build some instrumentation to attach metadata to each field indicating which action and subject we’d like to authorize against.
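For example, here’s a rough sketch using the graphql gem’s 1.x define-style API. The authorize key and its action/subject payload are names we’re inventing here (accepts_definitions and assign_metadata_key are real gem features), and DelayedJobType stands in for whatever type backs the field:

# Teach field definitions to accept a made-up `authorize` metadata key
GraphQL::Field.accepts_definitions(
  authorize: GraphQL::Define.assign_metadata_key(:authorize)
)

QueryType = GraphQL::ObjectType.define do
  name "Query"

  field :delayedJobs, types[DelayedJobType] do
    # The analyzer could read this via irep_node.definition.metadata[:authorize]
    # instead of hard-coding :access
    authorize action: :access, subject: :delayed_jobs
    resolve ->(obj, args, ctx) { DelayedJob.all }
  end
end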

Key Takeaways:

  • Ahead-of-time authorization is used when the conditions for access do not depend on the values returned by each field

  • Authorization happens after the query has been parsed, but before the query is executed. A query analyzer makes a good hook for authorization checks

  • Ahead-of-time authorization usually takes two parameters: the query context and the metadata for each field

Runtime Authorization

Let’s go back to the third-grade class in the lobby. We’re not worried if they’re playing with the gold bar on display; in fact, it’s encouraged (what else is there to do at the Mint?). We’d be a little more concerned if we found a third-grader in the vaults, though.

In that case, we can’t forbid everyone from handling our gold. But we can be very clear about which ingots can be touched.

Let’s take a look at a different query:

query {
  user(id: 123) {
    email
    name
  }
}

We can’t apply our binary field-access authorization policy here; at least, not if we want to support certain features (like allowing a member to see their own information on a profile page). We need a more fine-grained authorization strategy that takes into account the user object being requested.

It’s clear that we won’t be able to perform this authorization in the first two steps of the query process. We’ll have to wait until the third step – execution – when we have access to the return values of the field resolvers.

The obvious solution, then, is to perform authorization in the resolver itself. Let’s pretend the resolver for our user field looks like this:

resolve ->(obj, args, ctx) {
  User.find(args[:id])
}

We can take the User instance and authorize against it before returning anything; if authorization fails, we can simply return nil instead.

resolve ->(obj, args, ctx) {
  ability = Ability.new(ctx[:current_user]) # Using CanCanCan
  user = User.find(args[:id])
  return user if ability.can?(:read, user)
  nil
}

By definition, queries are free of side effects, so waiting until the end of the resolver to perform authorization should be safe (that is, no unintended changes are made to our server’s state).

If we’re authorizing a mutation, however, we need a different approach. Let’s take a look at this mutation:

mutation {
  updateUser(id: 123, input: { name: "<script>alert('You got hacked!')</script>" }) {
    id
  }
}

If our resolver looks like this:

resolve ->(obj, args, ctx) {
  ability = Ability.new(ctx[:current_user]) # Using CanCanCan
  user = User.find(args[:id])
  user.update_attributes(args[:input]) # "You got hacked!"
  return user if ability.can?(:update, user) # false
  nil
}

… someone is going to be very annoyed. Clearly, we need to authorize before we perform the update:

resolve ->(obj, args, ctx) {
  ability = Ability.new(ctx[:current_user]) # Using CanCanCan
  user = User.find(args[:id])
  if ability.can?(:update, user) # false
    user.update_attributes(args[:input])
    user
  else
    nil
  end
}

With just a bit of work, we can parameterize this solution for reuse:

# Performs runtime authorization when resolving GraphQL mutation fields.
class MutationAuthorization
  # @param action [Symbol] the action to authorize on
  # @param subject [Proc] returns the subject to call cannot? on
  # @param resolver [Proc] the resolver to be called if authorization succeeds
  def initialize(action, subject, resolver)
    @action = action
    @subject = subject
    @resolver = resolver
  end

  # @return [NilClass] if unauthorized
  # @return [Result] if authorized
  def call(obj, args, ctx)
    ability = Ability.new(ctx[:current_user]) # Using CanCanCan
    subject = @subject.call(obj, args, ctx)
    return nil if ability.cannot?(@action, subject) # Early return if authorization fails
    @resolver.call(obj, args, ctx, subject)
  end
end

Using our new class, our mutation resolver looks like this:

resolve MutationAuthorization.new(
  :update,
  ->(obj, args, ctx) { User.find(args[:id]) },
  ->(obj, args, ctx, subject) {
    subject.update_attributes(args[:input])
    subject # return the user so the requested fields (like id) can be resolved
  }
)

Now we’re covered for resource-level mutations. We may not need this authorization strategy for all mutations, however. For example, consider the following mutation:

mutation {
  createUser(input: { role: "admin" }) {
    id
  }
}

In this case, we might simply perform an ahead-of-time authorization check to see if the user has permission to create admin-level users.
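To sketch what that might look like (hedged: this reuses our analyzer’s call method, assumes the gem’s Query#arguments_for for reading the literal arguments, and invents a :manage_admins ability):

def call(memo, visit_type, irep_node)
  memo ||= {}
  # Only inspect the createUser field, once, on the way in
  return memo unless visit_type == :enter &&
                     irep_node.definition &&
                     irep_node.definition.name == "createUser"
  args = irep_node.query.arguments_for(irep_node, irep_node.definition)
  role = args[:input] && args[:input][:role]
  ability = Ability.new(irep_node.query.context[:current_user])
  if role == "admin" && ability.cannot?(:manage_admins, User)
    memo[:unauthorized_nodes] ||= []
    memo[:unauthorized_nodes] << irep_node
  end
  memo
end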

Key Takeaways:

  • Runtime authorization is used when the resolvers in each field must be executed in order to find the subjects to be authorized on

  • Runtime authorization is performed anywhere in the resolver body

  • Authorization must be performed before stateful changes are made

Persisted Queries

Why do we even need to watch those third-graders so carefully? Couldn’t we just tell them exactly where to go, and trust the teacher to enforce that?

There is a similar concept in GraphQL: persisted queries. Persisted queries are immutable; their fields cannot be modified after they’re stored, and clients reference them by ID.

Instead of sending this:

post({
  query: "query { user(id: 123) { name } }",
});

… clients referencing persisted queries would send this:

post({
  operationId: "yourQueryIdHere",
});

At runtime, the server uses the operationId to look up the query string in the relevant table. Once the query is found, it is handed to the query processor and the result is returned to the client.
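Server-side, the lookup can be as simple as this sketch, where PersistedQuery is a hypothetical model mapping an operation_id to its stored query string:

# Hypothetical table: persisted_queries(operation_id, query_string)
def execute_persisted_query(operation_id, variables, current_user)
  persisted = PersistedQuery.find_by!(operation_id: operation_id)
  TestSchema.execute(
    persisted.query_string,
    context: { current_user: current_user },
    variables: variables
  )
end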

There are some key benefits to this strategy:

  • It is easier to authorize the entire query than it is to authorize individual fields. For example, consider the following query:
query {
  user(id: 123) {
    hashedId
    name
  }
}

We may have different, less restrictive permissions for the hashedId field. name is confidential information, but hashedId doesn’t reveal anything personal about the user (at least on its own).

What if we had multiple fields that we wanted to restrict more thoroughly? What if we decided to reveal them only to admin-level users? We could store all personal fields in a persisted query:

# operationId "1234abcdef"
query($id: ID!) {
  user(id: $id) {
    address
    email
    name
    ...otherPersonalFields
  }
}

We can refuse to perform the query (in a query analyzer, say) if we find that the user does not have an admin-level role. If we allowed arbitrary queries, we would have to authorize each field individually, greatly increasing the surface area for error (multiple authorization checks versus a single check).

  • Persisted queries help prevent complicated queries from exhausting your resources. An attacker may be able to bring your server to a crawl with a deeply-nested query (see the sketch after this list)

  • Referencing a query by ID saves a bit of bandwidth versus sending the entire query string in a request
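On that point about resource exhaustion, the graphql gem also ships schema-level guards; max_depth and max_complexity are real gem options, though the limits shown here are arbitrary:

TestSchema = GraphQL::Schema.define do
  query_analyzer AuthorizationAnalyzer.new
  max_depth 10        # reject queries nested more than 10 levels deep
  max_complexity 300  # reject queries whose summed field complexity is too high
end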

That’s it! Persisted queries can be an invaluable tool for creating a tightly-controlled API. Facebook, for example, references all of its queries by ID, guaranteeing a high degree of security and supervision for its various APIs.

Conclusion

Authorization always requires a good deal of forethought and careful planning. This is especially true when creating a GraphQL API – because the request vector is harder to predict, each field’s permissions must be considered across multiple contexts. A good understanding of the various authorization strategies is critical.

If nothing else, please remember the analogy of the third-grade class at the Mint; if you always expect honesty from a third-grader, you may be encouraging a budding kleptomaniac.

Burying Your Dead Code

As our code evolves and changes, little bits of leftovers from refactors or changed paths can build up as dead code. It’s like the plaque that builds up on your teeth over time: not really noticeable at first, but after a trip to the dentist, you feel a whole lot better.

Removing dead code from your codebase is a simple process at its core: find the dead parts, prove that they’re dead, and delete them. The tricky part is in that second step, but after a little trial and error, we discovered a couple tricks to speed up the process.

Start small

Before digging into a gigantic repo to see what you can rip out of there, consider the impact of what you’re about to do and start with just a small piece of it. Because I was investigating one of our biggest repos, I decided to tackle a single Model first rather than, say, the entire lib directory.

Finding the dead parts

I’m sure there are other gems out there that perform a similar assessment, but our favorite gem for this is debride by the Seattle Ruby Brigade. It’s not 100% perfect, but it will give you a good place to start digging around. The whitelisting option is particularly helpful after your initial investigation. Another option is the brute-force method, where you comb through a file and investigate each piece one by one. If your debride output is suspiciously large, this might be worth a try.

Prove that they’re dead

This is the fun part.

The specific service I was trying to clean up is one of our oldest repositories, and some of the functions that were showing up as dead ends had been sitting in there for months or even years. What’s a girl to do, dig through hundreds or thousands of pull requests and commits until coming up with the right one? Oh no, we have a tool for that. It’s called git bisect, and it’s one of the coolest things I’ve learned so far about git.

git bisect is generally used to track down when a bug was introduced via a binary search through as many commits as you’d like; you can start at the very beginning of your git history or somewhere in the middle. It checks out the repo at each commit, where you can test for your issue and mark the commit as “good” or “bad.” Rather than testing for a bug, though, I used it to grep, one at a time, for the method names that appeared dead in my previous step. If grep gave only one result (barring tests that might still exist), then nothing else was calling the method anymore and I’d mark the commit “bad.” If it came back with more results, I’d mark it “good.” When searching for a bug, the point is to find the last “good” commit before the bug was introduced; when using it to look for dead code, the point is to track down when the consumer was removed.

Here’s an example to illustrate the steps you’ll take to do the same thing:

~/my_repo [master] $ git bisect start
~/my_repo [master] $ git bisect bad
~/my_repo [master] $ git bisect good ecef47e2fefc4c8ac6f3a358a4961332d24a46e3
Bisecting: 4825 revisions left to test after this (roughly 12 steps)
[0a9fc468c6efb6465c9fa96232b9f61dee01a12b] Merge pull request #5902 from my_repo/branch_5902
~/my_repo [:0a9fc46|…15375] $ git grep old_method_name
app/models/my_model.rb:  def old_method_name
~/my_repo [:0a9fc46|…15375] $ git bisect bad
Bisecting: 2412 revisions left to test after this (roughly 11 steps)
[d30229d184d5e93d39683b1f129745a211368890] Merge pull request #5346 from my_repo/branch_5346
~/my_repo [:d30229d|…15375] $ git grep old_method_name
app/models/my_model.rb:  def old_method_name
~/my_repo [:d30229d|…15375] $ git bisect bad
Bisecting: 1205 revisions left to test after this (roughly 10 steps)
[d116a5bf4d26ad1e33a099b5027883d16bbe68f2] changed a thing
~/my_repo [:d116a5b|…15375] $ git grep old_method_name
app/models/my_model.rb:  def old_method_name
~/my_repo [:d116a5b|…15375] $ git bisect bad
Bisecting: 602 revisions left to test after this (roughly 9 steps)
[ff2693d90e72798abbb8adbf57f331e661b6445b] fixed a bug
~/my_repo [:ff2693d|…15375] $ git grep old_method_name
app/helpers/my_helper.rb:    old_method_consumer = (model.try(:old_method_name) || model.try(:something_else?))
app/models/my_model.rb:  def old_method_name
~/my_repo [:ff2693d|…15375] $ git bisect good
Bisecting: 296 revisions left to test after this (roughly 8 steps)
[c6e79d2aa3185774be46473f51027d65aca6216e] Merge pull request #5040 from my_repo/branch_5040
~/my_repo [:c6e79d2|…15375] $ git grep old_method_name
app/models/my_model.rb:  def old_method_name
~/my_repo [:c6e79d2|…15375] $ git bisect bad
Bisecting: 152 revisions left to test after this (roughly 7 steps)
[74a8add547a62de49d535df07587703186057f24] Merge pull request #5017 from my_repo/branch_5017
~/my_repo [:74a8add|…15375] $ git grep old_method_name
app/helpers/my_helper.rb:    old_method_consumer = (model.try(:old_method_name) || model.try(:something_else?))
app/models/my_model.rb:  def old_method_name
~/my_repo [:74a8add|…15375] $ git bisect good
Bisecting: 68 revisions left to test after this (roughly 6 steps)
[c0d8f4c89964e95b5e5a823d0ecf36533a8c67b0] Merge pull request #5008 from my_repo/branch_5008
~/my_repo [:c0d8f4c|…15375] $ git grep old_method_name
app/models/my_model.rb:  def old_method_name
~/my_repo [:c0d8f4c|…15375] $ git bisect bad
Bisecting: 42 revisions left to test after this (roughly 5 steps)
[2a26af52b5c6234709f9816425c2cb3be7c1d3c3] changed error handling
~/my_repo [:2a26af5|…15375] $ git grep old_method_name
app/helpers/my_helper.rb:    old_method_consumer = (model.try(:old_method_name) || model.try(:something_else?))
app/models/my_model.rb:  def old_method_name
~/my_repo [:2a26af5|…15375] $ git bisect good
Bisecting: 21 revisions left to test after this (roughly 5 steps)
[37f592be9e75a68a497055109047d2e8d478cc64] Merge pull request #5038 from my_repo/branch_5038
~/my_repo [:37f592b|…15375] $ git grep old_method_name
app/helpers/my_helper.rb:    old_method_consumer = (model.try(:old_method_name) || model.try(:something_else?))
app/models/my_model.rb:  def old_method_name
~/my_repo [:37f592b|…15375] $ git bisect good
Bisecting: 10 revisions left to test after this (roughly 4 steps)
[6af2e9adc4556d153436880fb4f25e6cfa33dda0] move this thing
~/my_repo [:6af2e9a|…15375] $ git grep old_method_name
app/helpers/my_helper.rb:    old_method_consumer = (model.try(:old_method_name) || model.try(:something_else?))
app/models/my_model.rb:  def old_method_name
~/my_repo [:6af2e9a|…15375] $ git bisect good
Bisecting: 5 revisions left to test after this (roughly 3 steps)
[9ad36a544f3995e85556e9f58561d648c503ce47] added a thing
~/my_repo [:9ad36a5|…15375] $ git grep old_method_name
app/helpers/my_helper.rb:    old_method_consumer = (model.try(:old_method_name) || model.try(:something_else?))
app/models/my_model.rb:  def old_method_name
~/my_repo [:9ad36a5|…15375] $ git bisect good
Bisecting: 2 revisions left to test after this (roughly 2 steps)
[1ede562f1ab630df8b6a14dd3177413373738dca] fixed a bug
~/my_repo [:1ede562|…15375] $ git grep old_method_name
app/helpers/my_helper.rb:    old_method_consumer = (model.try(:old_method_name) || model.try(:something_else?))
app/models/my_model.rb:  def old_method_name
~/my_repo [:1ede562|…15375] $ git bisect good
Bisecting: 0 revisions left to test after this (roughly 1 step)
[8d15ccfd9d7fb0b0a6609ad87af4f5cc5566ee21] Merge pull request #5039 from my_repo/branch_5039
~/my_repo [:8d15ccf|…15375] $ git grep old_method_name
app/models/my_model.rb:  def old_method_name
~/my_repo [:8d15ccf|…15375] $ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[c62e10dbe9436b9a7ad5afd37e495414591f3160] fixed another bug
~/my_repo [:c62e10d|…15375] $ git grep old_method_name
app/models/my_model.rb:  def old_method_name
~/my_repo [:c62e10d|…15375] $ git bisect bad
c62e10dbe9436b9a7ad5afd37e495414591f3160 is the first bad commit
commit c62e10dbe9436b9a7ad5afd37e495414591f3160
Author: Jane Doe <jane@Doe.com>
Date:   Thu Mar 3 12:34:56 2016 -0800

    Refactored a thing

:040000 040000 8a3d0400211d91920ea2caba83cf73f80c858b3f b39651a7ef882af4e5be2fe6f2a06a5baa4cccbe M    app
~/my_repo [:c62e10d|…15375] $

Delete them

As you can see, I finally found the commit where the diff showed the changed or removed method call; thus, I could remove the call in my current branch. After deleting all the dead code, I made sure to comment on each removed function in my pull request with the commit number where the consumer was removed (because documentation is rad). Many code reviews later, our Model was done with her teeth cleaning and felt just a little bit better.
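One last trick: rather than running the grep by hand at every step, git bisect run can drive the whole search for you. Here’s a sketch of a helper script (the file name and the filtering are ours to adjust; exit 0 marks a commit “good”, exit 1 marks it “bad”):

#!/usr/bin/env ruby
# check_consumers.rb (hypothetical) -- usage:
#   git bisect start && git bisect bad && git bisect good <known_old_sha>
#   git bisect run ruby check_consumers.rb old_method_name
method_name = ARGV.fetch(0)
hits = `git grep #{method_name}`.lines
# Ignore the definition itself; anything left over is a live consumer
consumers = hits.reject { |line| line.include?("def #{method_name}") }
exit(consumers.any? ? 0 : 1)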

Learning to Sight Along the Space Capsule

Testing does not have to be complicated, but sometimes it can turn into spaghetti just like any other code. This talk by Sandi Metz at RailsConf 2013 inspired me to revisit some of the unit tests for one of our services that relies heavily on FTP and SFTP connections.

Why are my tests failing?

The previous test suite for this service used a gem that was originally created as a mock framework for testing FTP connection behaviors. However, this solution had a few key problems:

  • The gem hasn’t been maintained in a couple years, which started to cause intermittent spec failures.
  • The gem doesn’t support SFTP connections, which our service relies much more heavily on than FTP.
  • The gem doesn’t actually mock anything. It starts up a local FTP server, runs the tests, and then shuts the server down. Each test that’s run against it is generating a real response, albeit locally. This not only took up a lot of time (starting and stopping the local server before and after each spec), but was also another source of intermittent failures.

So, after creating a plan of attack, I rewrote or deleted every single test that used this gem. Each test was also re-evaluated for “sighting along the space capsule,” as Sandi Metz calls it. Any test that was written for a private method, a message sent to the object’s self, or an implementation detail was scrutinized (and most of them deleted), which exposed the true underlying issue: almost every single test we’d written that used the gem was an implementation test, not an interface test.

External responses are supposed to be completely mocked out in unit tests, not produced by actual calls to an API or server. Tests exist to make sure our code works as expected, not to verify that no errors are raised when making external requests. A unit test that requires a response from an outside source is already a broken test; save those checks for integration tests instead.

Reframing our point of view

Here’s an example to illustrate my point. Let’s say we have an object whose job is to get a list of files from a specific folder on an FTP server. For brevity’s sake, let’s define it like this, with the nitty gritty details fleshed out elsewhere:

class FtpThing
  def get_list_of_files(username, password, folder_name)
    @some_connection.open(username, password)
    list_of_files = @some_connection.list(folder_name)
    @some_connection.close
    list_of_files
  end
end

Now, one option for testing this is the way we did it before, something like this:

let(:ftp_thing) { FtpThing.new }
let(:list) { ['file1.txt', 'file2.txt'] }

before do
  # spin up the "mock" (real) server
  # put a (real) file or two in a folder on the server
end

it 'gives me the right list of files' do
  expect(ftp_thing.get_list_of_files('username', 'password', 'folder_name')).to eq(list)
end

after do
  # shut down the server
end

On the surface, this doesn’t look terrible. It’s testing the value returned from get_list_of_files, right? Yay, that’s what we want! But… not quite. We’ve started a real server, and we hope that a) it started in time for our tests to run, and b) it doesn’t fail or raise any errors, which is the only way our tests will pass. Instead, we need to mock the response and set up our expectation this way:

let(:mock_connection) { double('ftp') }
let(:ftp_thing) { FtpThing.new }
let(:list) { ['file1.txt', 'file2.txt'] }

before do
  ftp_thing.instance_variable_set(:@some_connection, mock_connection)
  allow(mock_connection).to receive(:open)
  allow(mock_connection).to receive(:list).and_return(list)
  allow(mock_connection).to receive(:close)
end

it 'gives me the right list of files' do
  expect(ftp_thing.get_list_of_files('username', 'password', 'folder_name')).to eq(list)
end

See the difference? We aren’t starting a server or making a real external call; we’re just pretending that it happened and gave us the right response. The true test is whether our code gives us the right value back after it makes an external call. This is an easy way to test error responses, too! I can allow my connection to raise an error, and then test my error handling from there, rather than trying to fake out a whole server.
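For example, here’s a quick sketch; it assumes FtpThing rescues connection errors and returns an empty list, which our example class above doesn’t actually do yet:

it 'returns an empty list when the connection fails' do
  allow(mock_connection).to receive(:open).and_raise(Net::FTPError)
  expect(ftp_thing.get_list_of_files('username', 'password', 'folder_name')).to eq([])
end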

Bonus points: If I want to change the implementation of my server call—maybe I want to change how it connects or opens the connection, or something similar—then my tests are not broken and I can continue on my merry way.

The moral of the story

No project is complete without some beautiful numerical data to back it up, so here are some numbers to show off just how much time this saves us. Running the entire test suite on the application before:

Finished in 2 minutes 56.1 seconds (files took 2.72 seconds to load)
172 examples, 0 failures

And after:

Finished in 5.67 seconds (files took 2.77 seconds to load)
155 examples, 0 failures

That’s a 97% time improvement. Precious minutes regained, and no more intermittent failures, either.

The moral of the story? Don’t make real calls to real servers in your tests, even if they’re local to your machine or CI container. Set up mocked responses when needed, and test that your code is actually working, not theirs. (And seriously: watch that talk by Sandi Metz at least three more times until it really sticks. Your future self will thank you.)

Jq - the Cool Tool for School

There’s something incredibly satisfying about finding the perfect tool for a job. If you deal with JSON and you haven’t played with jq, you’re in for a real treat.

The description from their site states, 

“jq is like sed for JSON data – you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text… …jq can mangle the data format that you have into the one that you want with very little effort, and the program to do so is often shorter and simpler than you’d expect.”

I’ll go over a few use cases I’ve had for it so far.

Simple Case

The most basic and neat thing you can do with jq is to pipe JSON into it and pretty-print with the identity filter, '.'. Whether the JSON is from a curled response:

curl 'https://api.github.com/repos/stedolan/jq/commits?per_page=5' | jq '.'

or just some JSON you need to use:

cat template.json | jq '.'

It’s pretty and readable! This becomes more useful when you’re dealing with messier JSON.

Validation

I’ve also used it as a way to find errors in what I assumed was valid JSON. My use case was DynamoDB, which expects imports in a special JSON format. I found out that the import failed, but not much more than that. Knowing that the most likely culprit was invalid JSON somewhere in the mountains of data, we piped it all through jq to find the exact line where an extra quotation mark had crept into one file. Talk about a needle in a haystack.

This is especially useful when you want to validate JSON that contains sensitive information, since jq runs locally instead of requiring you to paste the data into an online validator.

Sorting Keys

Have you ever had to diff a large piece of JSON against another only to find out that the keys are in a different order? Your tooling tells you that all of the things are not like the others, which ends up not being useful at all.

My generated CloudFormation template keys were in a completely different order than the GetTemplate output from the current stack, and I needed a way to tell what the delta between the two was. The -S or --sort-keys option will sort the keys of your hash all the way down, so that two sorted equivalent hashes will be identical. Knowing that, I was able to create two sorted files and diff them from there.

cat actual.json | jq -S . > asorted.json

cat proposed.json | jq -S . > bsorted.json

Use your favorite diffing tool and the problem is resolved in three lines: meld asorted.json bsorted.json

More Info

There are other neat features in jq, such as very powerful filtering. You can find out more in the manual: https://stedolan.github.io/jq/manual/

Rails Active Record: Assigning a Non-boolean to a Boolean

We’ll discuss how Active Record handles data assignment to a numeric attribute. I observed this odd behavior while working on my last epic.

We’ll work with a boolean attribute here because MySQL treats boolean data as numeric. MySQL doesn’t have an inherent ‘boolean’ data type; when you create a column of type boolean, it internally stores the binary state in a TINYINT (a 1-byte data type which holds integer values in the range -128 to 127). TRUE and FALSE are simple constants which evaluate to 1 and 0.

Now let’s imagine Active Record working with a boolean column in MySQL. What happens when we assign string data to a boolean attribute? Active Record will try to coerce the assigned value to a number (because boolean is numeric in MySQL). Great! How does it convert a string to an integer?

I know two different ways this can be achieved in Ruby.

pry(main)> Integer('23')
=> 23
pry(main)> '23'.to_i
=> 23

It’s interesting to see the behavior of the above methods when we try to cast a non-integer to an integer.

pry(main)> 'asdf'.to_i
=> 0
pry(main)> Integer('asdf')
ArgumentError: invalid value for Integer(): "asdf"
from (pry):2:in `Integer'

The #Integer method complains, whereas #to_i results in 0. Unfortunately, Active Record uses #to_i to set a boolean attribute, which results in FALSE for any non-boolean assignment. :-(

Here’s what happened:

pry(main)> ds = DataSource.find(5)
  DataSource Load (0.3ms)  SELECT `data_sources`.* FROM `data_sources` WHERE `data_sources`.`id` = 5 LIMIT 1
=> #<DataSource id: 5, auto_approval_enabled: true>
pry(main)> ds.auto_approval_enabled
=> true
pry(main)> ds.auto_approval_enabled = 'asdf'
=> "asdf"
pry(main)> ds.save!
=> true
pry(main)> ds.reload.auto_approval_enabled
=> false

The world is bad. Not really…

We only observe this behavior in Active Record 3. Active Record 4 throws a much-needed deprecation warning for non-boolean to boolean assignment.

DEPRECATION WARNING: You attempted to assign a value which is not explicitly `true` or `false` 
("asdf") to a boolean column. Currently this value casts to `false`. This will change to match Ruby's 
semantics, and will cast to `true` in Rails 5. If you would like to maintain the current behavior, you 
should explicitly handle the values you would like cast to `false`.

Much better, right?
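And if you’re stuck on Active Record 3 in the meantime, one possible workaround (our own sketch, not something Rails provides) is to guard the writer yourself:

class DataSource < ActiveRecord::Base
  # Hypothetical guard: refuse anything that isn't literally true or false,
  # so an assignment like 'asdf' raises instead of silently casting to false
  def auto_approval_enabled=(value)
    unless [true, false].include?(value)
      raise ArgumentError, "expected true or false, got #{value.inspect}"
    end
    super
  end
end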