We fall back on common sense when all else fails. People make mistakes, technologies break, processes fail, and then there are angry customers, angry executives, confusion, and ambiguity. Now what? All that remains is common sense.
The problem with common sense, though, is that it's different for everyone. For some people, common sense might be "don't get fired" or "wait for someone else to step up." That sort of common sense isn't useful in a crisis, but it's not unusual. For others, common sense could be "take ownership", "get help", and "communicate progress." That sort of common sense is useful. But why do some people have the right common sense and others not?
Good companies don't leave common sense to chance. They indoctrinate it. Much of what I consider common sense is a product of Amazon.com's systematic program of indoctrinating its employees in its core values. Though I left Amazon seven years ago after two inconsequential years, the company's values are fresh in my mind. Everyone, not just those with titles, needs to be a leader. Everyone should "think like an owner." Everyone should have a "bias for action." No leader should be above "diving deep." "Frugality" pervaded everything we did.
What's absent from Amazon's values is also noteworthy. There's no mention of code quality or agility or process or scalability. The values are more fundamental: the sorts of things that need to be present 100 years from now regardless of technology shifts, market trends, or process fads.
Amazon indoctrinated us by extreme repetition and enforcement. I only saw Jeff Bezos speak 4 or 5 times, but he talked about company values at every opportunity. At each company meeting, the company handed out Just Do It awards (an old used Nike shoe) to those who exemplified the company's bias for action. The orientation program for all developers repeated this message. We all used door desks which, though they weren't the cheapest desks on the market, were the best at enforcing the company's value of frugality. All of this probably seemed numbingly repetitive to old-timers at Amazon. But numbing repetition is what indoctrination is all about.
I quit Amazon in under two years, and yet the values burned in as common sense. A year after I left Amazon, a crisis erupted at Vontu. While I was on a database troubleshooting call with our biggest customer, the executive sponsors got on the phone and started lambasting us for problems they were having with the product. I had no idea what to do. The most knowledgeable guy was on vacation. I fell back on Amazon-indoctrinated common sense. I took ownership and, over the next week, helped to solve every problem that the customer was having. Making this customer happy was a big deal for Vontu. My boss and Vontu's CEO were grateful and thanked me publicly. But all I was really doing was following common sense. And the common sense I was following was the list of values that Amazon had burned into my mind.
Seven years after Amazon, I'm an "old guy" in software industry terms and I've come to realize it's my job to indoctrinate common sense in others. I need to repeat myself more than is sensible or comfortable. I need to think of creative ways (such as door desks and old, cheap shoes) to reinforce what common sense is. When everything else has failed, when we're sleep deprived and confused and under pressure, common sense is what remains. Developing the right sort of common sense is fundamental to building a company.
I forbade myself from coding this weekend.
I desperately want to code. That's why I've forbidden myself. I've spent the last two weeks writing a sweet set of iOS build scripts. At first, the entire iOS build process made no sense and I was hacking blindly. Then I started to figure things out. And then the vision of the Perfect Build started to burn into my mind. I could think about nothing else. And I willingly flung myself at it. Each commit brought me closer, but my progress enlightened me to how much better things could be. I left work on Friday night at 10 pm happy, but exhausted. On my Saturday train ride to Philadelphia, I mentally obsessed about how to eliminate three more configuration settings and how to blow away a hard-coded value.
This feeling of exhausted, obsessed happiness is what first got me excited about coding. I remember experiencing it for the first time in 1992, when my friend Jed and I were on the phone late every night trying to figure out our AP Pascal assignments. I didn't sleep much. I didn't care. Nothing felt as great or as fulfilling as a problem's transformation from impossible to solvable to perfectable.
There's a paradox, though. You need to obsess about code to be a good engineer. To obsess about code, you need to willfully ignore the big picture sometimes. You can't get to an adequate depth of understanding if you always stop at the point where customers are happy and where the business makes money. Increasing your understanding enables you to do things that make customers even happier and that make the business even more successful. But losing sight of the big picture makes your work less valuable, and that makes you a bad engineer.
After working all Friday night on my build scripts, dreaming about them, and then mulling them on my train ride, I decided it was time to step away. My build scripts will benefit GameChanger, but were they really the best way for me to spend a working Saturday? Reluctantly, I decided that they were not. I probably should be spending my time regaining the perspective that I suspend when I write code. So I spent a few hours actually using our product at the Penn-Princeton baseball game. And I learned a few things while doing so.
I stuck to my pledge not to code (save for a short Facebook API emergency). It wasn't easy. I really wanted my coding fix. But my weekend of perspective was valuable. I should do this every weekend.
For the past few months at GameChanger, my team has had a "nobody codes alone" policy. Our three developers — Nick, Ben, and I — work together on the same features at the same time. It's an unusual practice. Conventional wisdom says that we should work on separate projects to minimize coordination overhead.
Conventional wisdom is wrong for two reasons. The first reason is pretty intuitive, but the second one is surprising.
The intuitive reason: working together simplifies code and reduces bugs. Any time wasted on coordination is gained back by time NOT spent on fixing bugs. I've lost count of the number of times that I've taken a complex, esoteric idea and simplified it after talking with someone else. Bugs reveal themselves more quickly in simple code. People working together are good at simplifying and distilling each other's ideas.
The surprise, though, is that co-development makes non-engineers more productive. Designers, testers, and product managers have to juggle every project that's in flight. When a bunch of projects are happening simultaneously, the burden falls on non-engineers who have to deal with interruptions and last-second requests. In most places I've worked, PMs, testers, and designers are overworked and stressed out. They crave the opportunity to do few things and to do them well. Co-developing features helps non-engineers become happier and more productive.
Not every team is capable of working together on everything. Ben, Nick and I can do it because we don't freak out at ambiguous situations. Our work collides frequently, and, when it does, someone invariably speaks up. When that happens, we stop what we're doing and we talk. Then we figure out a plan and get back to coding.
Rails and Rspec
If you've been practicing Test Driven Development, you are familiar with the normal cycle:
Earlier this year I started developing in Rails with RSpec and found my cycle to be closer to the following:
These wait stages will vary depending on your environment, but in my case it can take as long as 15 seconds to run a test whose own execution time is measured in milliseconds. After a while I found myself starting to skip steps, writing larger and larger tests, filling in more code at once, etc. My rationale was, if it's going to take so long to run one test, I might as well write a few at once. When I realized what I was doing I thought there had to be a better way. After some brief searching, I found it!
Spork is a service that preloads your Rails environment, and then forks a copy of your server when you run tests. This reduces your TDD cycle back to the normal 3 step process, saving you valuable time.
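To make the preload-then-fork idea concrete, here is a minimal plain-Ruby sketch. It is not Spork's implementation, and the names preload_environment and run_in_fork are hypothetical. It only illustrates why paying the boot cost once in a parent process and forking a child for each run can be cheap. It assumes a platform that supports fork, such as Linux or macOS.

```ruby
# Illustration only: a plain-Ruby sketch of "preload once, fork per run".
# This is NOT Spork's implementation; all names here are hypothetical.
# Requires a platform that supports fork (e.g. Linux or macOS).

# Stand-in for the slow part (booting Rails). Runs once, in the parent.
def preload_environment
  sleep 0.2 # pretend this is the expensive boot
  { app_name: "demo", booted_at: Time.now }
end

# Forks a child that inherits the already-loaded environment, runs the
# given block there, and sends the block's result (as a string) back
# to the parent over a pipe.
def run_in_fork(env)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(yield(env).to_s)
    writer.close
    exit!(0) # leave immediately, skipping at_exit hooks from the parent
  end
  writer.close
  result = reader.read
  Process.wait(pid)
  result
end

if __FILE__ == $PROGRAM_NAME
  env = preload_environment # the boot cost is paid once
  3.times do |i|
    puts run_in_fork(env) { |e| "run #{i} saw #{e[:app_name]}" }
  end
end
```

Because each child is a copy of the parent, anything a run changes in memory is discarded when the child exits, so every run starts from the same preloaded state.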
Setting up Spork only takes a few minutes. For the latest instructions, see the spork-rails gem.
This assumes you have already installed the rspec-rails gem, and configured your spec_helper.rb file.
Add spork-rails to your Gemfile:

    group :test, :development do
      ...
      gem "spork-rails"
      ...
After installing the gem, you need to configure Spork. You can bootstrap your test helper file by running:
spork rspec --bootstrap
When it completes, it will tell you to modify your spec_helper.rb file and follow the instructions within. The bootstrap command will have edited your spec_helper.rb file and added two new blocks at the top:
    require 'spork'
    #uncomment the following line to use spork with the debugger
    #require 'spork/ext/ruby-debug'

    Spork.prefork do
      # Loading more in this block will cause your tests to run faster. However,
      # if you change any configuration or code from libraries loaded here, you'll
      # need to restart spork for it take effect.
    end

    Spork.each_run do
      # This code will be run each time you run your specs.
    end

    # The previous contents of the file will be at the bottom
    ...

Generally, all you need to do is move everything that was in the file before, and is now below the Spork sections, into the Spork.prefork block. This block instructs Spork to perform this work only once, when it starts your initial Rails environment. Thus, when your environment is forked by Spork, all this overhead is avoided.
If you need to do some activity for each forked environment, place it in the Spork.each_run block.
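As a rough sketch, a spec_helper.rb might look something like this after the move. It assumes the helper started out as the default file generated by rspec-rails; your own file may contain different requires and settings, so treat the contents of the prefork block as an example rather than a template. It is a configuration fragment and only runs inside a Rails project with spork and rspec-rails installed.

```ruby
require 'spork'

Spork.prefork do
  # Everything below is assumed to be what the helper contained before
  # bootstrapping (the rspec-rails defaults); yours may differ.
  ENV["RAILS_ENV"] ||= 'test'
  require File.expand_path("../../config/environment", __FILE__)
  require 'rspec/rails'

  # Load shared helpers from spec/support, if the project has any.
  Dir[Rails.root.join("spec/support/**/*.rb")].each { |f| require f }

  RSpec.configure do |config|
    config.use_transactional_fixtures = true
  end
end

Spork.each_run do
  # Work that must happen fresh in every forked run goes here.
  # Left empty in this sketch.
end
```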
Now that Spork is configured, you can start it and see how much faster your test runs are.
To start Spork:
    $ spork
    Using RSpec, Rails
    Preloading Rails environment
    Loading Spork.prefork block...
    Spork is ready and listening on 8989!

Spork is now up and running, and listening on port 8989.
To run your tests using Spork:
rspec --drb spec/
You will know it's working because:
- your tests run immediately
- in your spork terminal you see a message indicating it is running your tests
If you want RSpec to default to using Spork, you can edit your .rspec file and add the --drb option to it. This way when you run RSpec it will look for Spork and use it if available, otherwise it will load your Rails environment normally.
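For example, assuming the .rspec file sits in the project root and has no other options yet, it would contain just this line:

```
--drb
```

Any options already in the file can stay; the --drb flag goes alongside them.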
For those of you using RubyMine, you can also leverage Spork. They have a great help page that provides the instructions here: Using DRB Server
If you are making changes that would normally require you to restart Rails, you will now need to remember to restart Spork instead. This can be automated using tools like Guard which I'll cover in another post.
Two weeks ago, I got frustrated with the hundreds of bugs and feature requests in our database. So I deleted the whole thing. Then an odd thing happened: we started fixing bugs. We fixed almost 30 bugs last week, easily our best bug fix rate since I've been at GameChanger.
Joel Spolsky inspired me with his post on Software Inventory. I followed his advice almost to the letter:
At some point you realize that you’ve put too much work into the bug database and not quite enough work into the product.
- Suggestion: use a triage system to decide if a bug is even worth recording.
- Do not allow more than two weeks (in fix time) of bugs to get into the bug database.
- If you have more than that, stop and fix bugs until you feel like you’re fixing stupid bugs. Then close as “won’t fix” everything left in the bug database. Don’t worry, the severe bugs will come back.
In our release cycle, two weeks is an eternity. So we don't wait for two weeks of bugs to pile up. We wait until our bug column in Trello is roughly a screen-and-a-half tall.
Then we fix bugs! Our new bug column is too small to ignore. Some of the bugs that popped up were old bugs that customers had complained about for months. They had nowhere to hide in our tiny Trello column. So we fixed them.
The normal Huge Bug Database works in the opposite way. It requires a triage system (meeting), which requires agreement on a system of priority and severity (something to argue about). Then there needs to be some sort of scheme (meeting) for scheduling bug fixes along with feature work. And someone's got to make sure (emails) that the small percentage of Chosen Bugs are actually fixed before the release goes out. That's a ton of meetings, arguments, emails, and management for little benefit.
Instead of documenting, categorizing, and scheduling bugs, we're fixing them and going back to writing features. Yay!
This week, an intriguing question made the rounds at StackExchange: "I've inherited 200k lines of spaghetti code — what now?" Since I apparently lack enough karma to post on StackExchange — I created my account today — I'll respond here.
At its core, this is a people problem and not a technology problem. I've failed when I haven't recognized this fact. The core question isn't one of SCM, build systems, or coding practices. The core problem is convincing a group of people (scientists, in this case) to adopt a new set of practices and to change the behaviors that led to spaghetti code.
To make this transformation, some of the same people who are writing spaghetti code today will have to become evangelists for your ideas.
Here's what I'd do:
- Observe for a couple of weeks. Understand where the team is experiencing the most pain. What short term pressures does the team face? Learn what people are good at. Figure out who's most excited about making changes.
- Form a rough vision of what success looks like. The leading answer at StackExchange is quite thorough and it's a great starting point. But remember that this vision will necessarily be unique to each organization.
- Start by solving a problem that people already care about. Are people complaining about lost source code? Embarrassing bugs? Time spent on supporting legacy code? Too many feature requests and too little time? Bad user experience? Late releases? Whatever it is, a 200k-line code base is going to have lots of problems. The team is going to care about some of those problems more than others. With your first big initiative, earn people's gratitude! At a past job, the team became resentful when I tried to solve an important problem that the team didn't care about. I likely would have succeeded if I'd spent my first few months on problems that mattered to them.
- Support other people's good ideas. You can't tame a 200k-line code base by yourself. If you need other people to come up with good ideas, you better support those people when their ideas come along. My colleague Andrew was incensed at our group chat software and drove our adoption of HipChat. A few weeks later, our entire development workflow is built on HipChat. Build notifications, deploy notifications, code review requests, and production alerts all route to HipChat rooms. At a company that hates email, nobody's ideas on continuous integration, monitoring, and alerting would have worked without a chat platform that the team loved.
- Be optimistic. Every problem has a solution. Your 200k-line code base won't be beautiful overnight, but there are going to be some great victories along the way. Enjoy them.
- Don't obsess over your failures. When you're dealing with a 200k-line code base, you're going to have to make a lot of changes. Not every change will work. Don't worry about it. I tried to turn every Monday into a bug-fixing day. It worked for a little bit but it proved hard to keep our attention on bugs when we were hustling to wrap up higher priority work on Mondays. I didn't force the issue. We'll find another way to prioritize bug fixes.
- Sell, sell, sell. When you make something better, make sure that other people understand it and can adopt it themselves. Do demos. Hold training classes. Pair program. Do what you need to do to make sure that good ideas get critical mass.
Transforming a codebase is really about transforming a team. And transforming a team is about getting people excited to make big changes. Enjoy the challenge, and good luck!
Risk is invisible but reward is not. This basic fact has been at the root of more than a few calamities, not the least of which are the recent financial crisis and the dot-com boom and bust. A company's or a society's attitude towards risk is a core part of its culture. And we'd like to have a culture that encourages smart risks and discourages stupid ones. How do we do that?
Software engineers are responsible for avoiding stupid risks. But we often don't. Under pressure to meet a public deadline or to ship a highly visible feature, engineers routinely take reckless — but invisible — risks. We can introduce security loopholes, skimp on testing, or skip monitoring entirely. To make matters worse, customers — who can't see the risks — are usually thrilled by this behavior and software managers sometimes reward it. When you aren't attuned to the risks of software development, it's easy to mistake recklessness for "customer focus." How do we prevent this from happening?
It obviously helps when managers already understand risk. I was pleasantly surprised and impressed when our investor Jos White stopped by GameChanger and advised us to prioritize technical architecture. Jos is a wildly successful three-time entrepreneur. He founded three $100M+ companies, but he's never been an engineer. So when he came to speak with us, I was excited to hear his story. And I wasn't surprised to hear him talk about finding untapped markets and finding great teams who believed in their bones that they could bring their products to life. But I wasn't expecting a self-described marketing guy to talk about the importance of doing architectural work. Jos understood that good architecture was essential for scaling at low risk, even if that work doesn't pay off instantly. Jos has been around the block a few times, and so he seemingly "gets" the risk management balance that software companies need to master. Others learn to "get it" by working with technical people they trust.
As with many interesting problems, the root answer is cultural. Companies need to create a culture that values good risks and that discourages bad ones. In software, this comes down to valuing risk reduction and the people who practice it well:
- Does your company have a technical career path that rewards people for paying attention to details?
- Do engineers get public attention and praise for risk mitigation? Or, alternatively, do they get attention and praise for heroic responses to problems that they should have prevented?
- Do engineers have prestige in your company's culture?
- Do engineers get rewarded or chastised for speaking truth to power?
For our part, engineers need to use direct no-nonsense language when informing others about risk. At GameChanger, I was impressed when my colleague Doug made risk tangible by illustrating risks with graphs. A line that moved down and to the right was a good way to illustrate that we lowered our risk of crippling performance problems in the future.
It's hard to create a culture that takes good risks and avoids foolish ones. But it's an essential challenge for any company to grasp.