How should we respond to accusations of sexual violence?

by Seth Petry-Johnson 9. October 2013 05:25

This post was triggered by an event unfolding on Twitter in which a female developer accused her male employer of sexually assaulting her at a conference. It's a difficult post to write; I am a man and I'm writing about a topic that, historically, men don't have a great track record with. Because this is such a sensitive topic I run the risk of inelegantly expressing my thoughts and offending the very people I hope to help. Please; if you take offense to anything written here, trust that it's written from a position of honestly wanting to make things better.

The first thing I'm struggling with is how to respond to the accusation itself. Details are scarce, so is "sit back and wait for more details" the correct response? Is "publicly shame the accused" the correct response? I don't know these people personally, so is "butt out of their business" the correct response? I honestly don't know.

On the one hand, immediately shaming the accused seems like an unfair response. The original, public accusation was a 140 character tweet. It didn't specify the type of assault, when it occurred, where it occurred, or anything else except the name of the alleged attacker. If someone accused ME of something heinous, and provided little data to back it up, I certainly wouldn't want my community to lynch me without evaluating the facts.

On the other hand, I think women have a legitimate complaint that men don't take this stuff seriously, that we don't understand the dangers they face, and that we don't realize how utterly and totally destructive an assault can be to their sense of security and self-worth. Any reaction short of a firm, immediate challenge to the accused could be evidence of this.

Here's an example: the accusation is that an "assault" occurred. So my first instinct is to say, "well, what sort of assault was it? Did he force intercourse on you, or did he just touch you inappropriately?" In my mind, those are drastically different assaults with different levels of trauma associated with them. But to a woman, especially one that has been assaulted, they might not be different at all. They are both examples of a man abusing power and privilege to violate her sense of self and safety, and to dismiss the latter as "only" inappropriate touching sends the message that it's not really a big deal. And while logically I can understand that point of view, my initial gut reaction is still to base my response on "what type of assault was it?".

That brings me to the second thing I'm struggling with. I feel like we as a community should withhold judgement until all the facts are out and a full and complete accusation is made. But the reality is that even making a vague, 140-character accusation late at night on Twitter requires a great deal of courage. I've read far too many stories about women who have spoken out, even about things less serious than assault, and have been subjected to an unbelievable amount of harassment including name calling, stalking and even threats of death and violence. How can we afford to withhold judgement until the facts are in when it's so dangerous for women to publicly provide those facts?

And the last thing I'm struggling with is "witness responsibility". One tweet suggested that this assault occurred in front of ~20 other people, most or all of them men. Only one that I know of has come forward to publicly support the victim's accusation. If I witness something like this, do I have a duty to the community to speak up in case there are other victims out there? Or do I have a duty to respect the victim's wishes for privacy and right to decide for herself when to go public? And what about the witnesses' moral obligation to intervene during the assault?

I guess what I want to know is, how should I as a man, a community member, or a witness support a victim in a case like this?

Serious comments welcome. 

Core principles: your compass in the storm

by Seth Petry-Johnson 30. April 2013 04:33

Software development can be chaotic. We often need to make decisions with incomplete data (or data we know is likely wrong), and it's difficult to ask outsiders for advice because the "right" answer is often context-dependent. In essence, successful software development depends on repeatedly selecting the least bad option from a set of imperfect solutions.

In practice, this means that developers cannot simply memorize solution patterns or "recipes". If I say "authentication" and you immediately think Forms Auth, then you're short-circuiting the selection process without evaluating the options. Same thing if I say "shorter feedback" and you immediately think "two week sprints". You can't make a good decision without evaluating your options, and just because you chose Solution A for a similar problem a month ago doesn't make it the appropriate solution to today's problem.

"Been there, done that" is not a decision making process! 

Making decisions is hard. The deeper you analyze a problem the more variables you identify, and the more variables you identify the harder it is to reason through the myriad ways they interact. It's so much easier to look at a problem, wait a few nanoseconds while the pattern matching functions of your subconscious mind do their magic, and then do the same thing that you did the last time you had a similar problem. After all, you tell yourself, it's the "pragmatic thing to do" because you don't have to "waste time" on analysis or research. "The devil you know", and all that.

Not so fast.

Pattern matching is a great heuristic for quickly identifying potential courses of action, but not for selecting the best one. Making the best possible decision requires greater attention to detail and greater appreciation of nuance. If you get the details wrong then it might seem like a good decision at a high level, but eventually you'll suffer death by a thousand papercuts. [Or you'll go broke under technical debt, etc. Insert your favorite metaphor here]

So how do we select from that set of imperfect solutions?

The key to making good decisions is to articulate your core values and principles, and then use them to derive a solution. Rather than memorizing specific solutions, memorize the steps you follow and the questions you ask to arrive at a solution.

For example, at Heuristic Solutions we have identified four core values that guide everything we do:

  • Understanding: we can't be successful unless we know what "success" looks like
  • Predictability: surprises are disruptive; we value procedures that minimize their impact
  • Productivity: success requires efficient operations
  • Quality: we value doing it right the first time; re-work is anathema to us

When making a decision, we frame it in context of these values to better see the trade-offs at play. For example, a low degree of Understanding means we can't be very Predictable, so we do more up-front analysis when predictability is crucial. When Productivity is necessary then we invest in Quality so that we can preserve velocity over time. 

This process forces us to consider those pesky (yet all-important) details specific to each situation. Sometimes this leads us to take radically different approaches to similar problems, but in each case we know we're maximizing for the things that truly matter to our success.

What are your core values?

What matters most to your organization? If you haven't already articulated your core values, take a minute to do so. Do you care about speed to market? What are you prepared (or not prepared) to sacrifice to get it? What does "quality" mean to you? How important are estimates to your planning process or stakeholders? Is it more important to maximize developer productivity, or team productivity?

When you're done, write them on your team board. Repeat them out loud each time you make a decision. Have discussions about which values are more important in each scenario, and then brainstorm ways to maximize those specific values. 

One parting word of advice: don't be afraid to follow your values, even if they contradict "best practices". While it's never a good idea to blindly ignore prevailing wisdom, realize that only YOU can fully appreciate the nuances of your specific situation. Core values are your compass, and by trusting them you allow yourself to select the best possible solution for this specific decision, ignoring "one size fits all" advice that might otherwise get in your way.

(Of course, if you frequently find yourself ignoring best practices then you might be thinking your situation is more unique than it really is. More on that in a later post!)

Bottom line: articulate what really matters to you, and then consciously and intentionally use those values every time you make a significant decision. You might be surprised at where this process takes you.

Architecture and design are negotiable; clean code is not

by Seth Petry-Johnson 29. November 2012 04:42

In a perfect world, each and every feature we build would be lovingly crafted, properly factored, elegantly architected and fully tested... and we'd have enough budget for all of it.

I'm not lucky enough to live in that world. My job is to help clients use their limited budgets in ways that maximize their overall business objectives. Sometimes that means minimizing software maintenance costs, other times it means getting an imperfect feature into production as fast as possible. These types of decisions always involve trade-offs, often sacrificing some sacred agile calves on the altar of "getting it done".

So what's a pragmatic craftsman to do? How can we intentionally leverage technical debt to meet short term goals and still maintain a high bar of general quality in our code?

The principle I use in these situations is that architecture and design are negotiable, but clean code is not. This is best explained by breaking it down into component parts:

  • Architecture is negotiable. Not every project needs an n-tier separation of concerns. Not every project needs DI/IOC. Same for message buses, impersonation frameworks, 2nd level caching and so on. These things are often valuable and should not be forsaken lightly, but they do have costs. A pragmatic craftsman should be able to articulate those costs and weigh them against their value over time.
     
  • Design is negotiable. By "design" I mean the low level feature code. Sometimes you can get away with a switch statement instead of a Strategy, or tight coupling or low cohesion or large method bodies. Same for violating SOLID principles. I'm not saying do these things lightly, but be pragmatic about it. Learn to identify scenarios when their benefits will be realized, and scenarios when they won't.
     
  • Clean code is NOT NEGOTIABLE. Sacrificing architecture or design can be forgiven if you make it easy for future programmers to clean up and improve. This means that no matter how "dirty" your architecture is, be damn sure your code is easy to read, clearly communicates your intent, and documents WHY you've made the decisions you have.

But aren't architecture and design part of "clean code"?

Absolutely. Clean design trumps comments explaining bad design every day of the week. But you will face scenarios when you have to trade away something in favor of something else (time to market, hitting a budget, risk aversion, etc). This blog post is all about identifying those parts of clean code that you can give up, and which parts you should defend to the death.

Here are some of the things I consider inviolable and strategies for preserving them:

  • Code should always be easy to read and understand. I don't care how nasty the architecture or design is, I don't care how stripped down the feature is, and I don't care what your budget is. You should always make it easy for the next programmer down the road to understand your intent (what you mean the code to do) and your implementation (what the code actually does).
     
  • The messier the architecture or design, the more you should document with comments. Well written clean code doesn't need a lot of "here's what I was thinking" commentary. That commentary is much more valuable when you're taking shortcuts and incurring technical debt, because it can make it much easier for someone to pay back that debt later. My rule of thumb is, do it right yourself or provide clues to help the next dev make it right later. (With a huge preference for the former!)
     
  • Keep your eye on the prize. In other words, always have an idea of what the end goal would be. Think about what you wish you could be implementing and try to "lean" the code in that direction. For example, think about what the very first step of refactoring would look like. Can you go ahead and take that first step now? (See the sketch after this list.)
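For instance, here's a minimal sketch, with hypothetical shipping-rate names, of what a "clean shortcut" might look like: the design is a compromise, but the code stays readable and the comments record why the shortcut exists and what the first refactoring step would be.

    // Hypothetical example: shipped under deadline pressure with a switch
    // statement instead of the polymorphic design we actually want.
    public decimal GetShippingRate(Order order)
    {
        // SHORTCUT: a switch was faster to ship than the planned strategy
        // classes. First refactoring step: extract each case into its own
        // IShippingRateStrategy implementation and resolve it from the
        // order's shipping method.
        switch (order.ShippingMethod)
        {
            case ShippingMethod.Ground:
                return 4.95m;
            case ShippingMethod.Overnight:
                return 24.95m;
            default:
                throw new NotImplementedException(
                    "Unhandled shipping method '" + order.ShippingMethod + "'");
        }
    }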

Closing thoughts

Like many of my posts, I'm talking about edge cases. Most of the time we should be striving for clean code including architecture and design. Too many programmers (and clients!) are far too quick to take shortcuts, and this post is NOT about taking more of them.

But if you've fought the good fight and tried everything else first, and you still need a shortcut, then be sure you do it cleanly and in a way that can be easily fixed later.

Happy coding!

JOIN-less lookup fields using enums and metadata attributes

by Seth Petry-Johnson 19. September 2012 18:10

One of the projects I work on contains a large database with a lot of lookup fields containing status codes, record types, processing flags, etc. A great many of these are implemented in a typical normalized fashion with two tables and a foreign key relationship.

Pretty standard stuff, right? Sure, yet at the start of a new development phase last year I decreed Thou Shalt No Longer Do This!  

What's the big deal with lookup tables?

On this project (and on many of my others) I had noticed the following patterns:

  1. The vast majority of the lookup tables contained a single "Name" field containing a human-readable description of that status code or record type.
  2. Because the database is so large, a typical query might need to do five or six joins just to get the names of the lookup values.
  3. The values in the lookup table rarely changed. When they did change, it was always as part of a scheduled release.

In short, we were paying a performance penalty on each and every query to obtain unchanging metadata about a small, discrete set of known values.
 
In addition, dealing with these joins by hand was an annoyance whenever we needed to write manual T-SQL queries or express ad-hoc queries directly against the Linq to Sql data context. 

There's Got To Be A Better Way! ™

The solution that we implemented, and that we're still using nearly two years later, is simple:

  • All lookup-style data (status codes, record types, etc) have a corresponding C# Enum
    • A custom Attribute associates each value with a human-readable string
    • A custom Attribute associates each value with a database key representation
  • There are no lookup tables or foreign keys.
    • The domain model contains properties of the Enum types
    • In the database, each lookup field is a string, not an integer foreign key
    • When we write to the database, we convert the enum into its database representation and store that value
    • When we read from the database, we convert the stored string into an enum instance
  • The parsing and conversion is handled via extension methods:
    • String.ToEnum<T>
    • Enum.ToDescription()
    • Enum.ToStringConstant()

A picture is worth a thousand words here:
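In lieu of a picture, here's a rough C# sketch of the pattern. The extension method names are the ones listed above; the attribute class and the OrderStatus enum are hypothetical examples, not the exact code from the project.

    using System;
    using System.ComponentModel;
    using System.Linq;

    // Associates an enum value with the string stored in the database.
    [AttributeUsage(AttributeTargets.Field)]
    public class StringConstantAttribute : Attribute
    {
        public string Value { get; private set; }
        public StringConstantAttribute(string value) { Value = value; }
    }

    // A hypothetical lookup enum; [Description] supplies the human-readable name.
    public enum OrderStatus
    {
        [Description("Awaiting payment"), StringConstant("AWAITING_PAYMENT")]
        AwaitingPayment,

        [Description("Shipped"), StringConstant("SHIPPED")]
        Shipped
    }

    public static class EnumConversionExtensions
    {
        // Enum -> database representation
        public static string ToStringConstant(this Enum value)
        {
            var field = value.GetType().GetField(value.ToString());
            var attr = field.GetCustomAttributes(typeof(StringConstantAttribute), false)
                            .Cast<StringConstantAttribute>().FirstOrDefault();
            return attr != null ? attr.Value : value.ToString();
        }

        // Enum -> human-readable description
        public static string ToDescription(this Enum value)
        {
            var field = value.GetType().GetField(value.ToString());
            var attr = field.GetCustomAttributes(typeof(DescriptionAttribute), false)
                            .Cast<DescriptionAttribute>().FirstOrDefault();
            return attr != null ? attr.Description : value.ToString();
        }

        // Database representation -> enum
        public static T ToEnum<T>(this string stored) where T : struct
        {
            foreach (var candidate in Enum.GetValues(typeof(T)).Cast<Enum>())
            {
                if (candidate.ToStringConstant() == stored)
                    return (T)(object)candidate;
            }
            throw new ArgumentException("No " + typeof(T).Name + " maps to '" + stored + "'");
        }
    }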

Was it worth the effort? 

After nearly two years of use I'm pleased to say that this pattern has served us well. The extension methods make the lookup values easy to use, avoiding joins improves system performance, and storing strings (rather than foreign key integers) in the tables makes the raw data a little bit easier to use. 

Of course, your mileage may vary. This technique isn't appropriate if your lookup values are dynamic (rather than a fixed set) or if you need to track a large amount of metadata in the lookup table. But if your project has the same characteristics that mine does, I recommend you give this a shot.

Happy coding!

 

Appendix: the source code

I slopped the code for the attribute classes and extension methods onto my Github repo.

Avoid heroics; real value comes from discipline

by Seth Petry-Johnson 1. August 2012 19:11

Spend any amount of time in this industry and you'll eventually end up playing the hero. Maybe you meet that deadline by pulling a 70-hour week, or you fix that production issue by editing a script or database procedure directly on the server. You shipped the product, you fixed the bug, you "got the job done". You're a hero, right?

The only problem is, heroic behavior is dangerous. I've played the hero enough times to know what happens after the dust settles:

  • You pull a 70-hour week and hit the deadline, but the code sucks. It isn't tested, it has bugs, or it just feels like a half-assed feature. 
  • You hot-fix a file on the web server, but forget to update source control. The next deployment replaces your fix and re-introduces the bug.
  • You hot-fix the database server, and the next deployment crashes because a table or column already exists.
  • You burn out, lose focus, and make stupid mistakes.

The common pattern here is that you've achieved a short-term goal at the cost of highly unpredictable future results. Someone, somewhere, will have to clean up the mess when it catches them by surprise.

In other words, you've created bad technical debt that is unintentional, hard to manage, and hard to quantify.

So what's the solution? 

It's certainly easier said than done, but the solution is to stay disciplined and stick to your process.

If that process says you write tests first and get QA feedback before committing to trunk, then that's what you need to do... even if it means missing a deadline.

If that process says you must create a formal release package to modify the production database, then that's what you do... even if it means taking longer to fix the bug.

Discipline yields predictability by forcing you to be proactive. It helps minimize future surprises and prevents you from becoming overly reactive, which can often lead to a cascading series of errors when you start jumping from fire to fire.

When to cheat

There are obviously exceptions. If the server is totally down, and you know of a quick fix to bring it back online, then maybe you should fix it. But if you've internalized these principles then you'll feel real damn uncomfortable doing it, and that discomfort will remind you to take the necessary "after-action" steps to pay back that technical debt immediately after the crisis passes.

Remember kids: avoid heroics. Real, lasting value comes from staying disciplined... especially when you feel pressure not to. 

Test Data Setup: Staying clean, DRY, and sane

by Seth Petry-Johnson 24. July 2012 18:27

There are many good reasons to avoid hitting a database in your tests. I agree with all of them, and I try my best to avoid doing it.

However, some tests do need to hit the database. Even the most dependency-injected and mock-infested system should hit the database when testing the data access layer... after all, what good is a test suite that doesn't test any of your actual data access logic? And if you're smart and follow the testing pyramid then you'll have some integration and acceptance tests that need a database as well.

In "Rules for Effective Data Tests" I mentioned some strategies for setting up those data tests. This post expands on those ideas and shows how to keep your setup code clean, DRY and maintainable.

What's so difficult about setting up a data test?

First, a definition. When I say "data setup" I'm talking about anything you do in the body of a test [or a setup method] to create the database records needed for a given test to execute.

While similar to the setup of a "true" unit test, interacting with a Real Life Database™ makes things a little more interesting. Some of the challenges we have to overcome are:

  • Test residue: Unless we delete it, data created by each test remains in the database when the test exits. At best this just wastes space; at worst, it starts to interfere with other tests. (See here for a common solution to this problem) 
  • Database constraints: Foreign key constraints are a real pain. When setting up test data you need to create the entire data graph to satisfy the database constraints, regardless of whether those relationships are actually relevant to the test.
  • Verbosity: Because of the foreign key issues mentioned above, setting up data tests requires more code than setting up a unit test. This makes tests harder to write, harder to maintain, and harder to keep DRY. 
  • False negatives: The more complex the setup, the greater the chance that tests will fail not because your application logic is wrong, but because you screwed up the setup.
  • Painful to debug: Debugging a data test is more difficult and time consuming than a unit test. Not only does the test take longer to run, but debugging it often means poking around in both the application debugger and a database tool.

A daunting list to be sure, but it's manageable.

Characteristics of good setup code

The primary contributor to the quality and maintainability of your data tests is the setup code; the easier it is for someone to understand the specific scenario you are creating, the better equipped they are to maintain that test.

Conversely, the harder the scenario is to understand and maintain, the less value that test will provide over time. Tests that contain an unintelligible jumble of setup code have a very real risk of being deleted (rather than fixed) if they ever break due to new code changes.

So what is "good" setup code? It should be: 

  • Highly expressive (high signal-to-noise ratio). Readers should be able to very quickly understand the scenario(s) you are creating without mentally parsing code. 
  • Highly reusable through the use of default values. If I just need to create a Person, let me call "CreatePerson()" and fill in the details for me. 
  • Easily customizable to each test's needs. Since the customized data are usually very relevant to the test at hand, it should be easy for a reader to spot them.  
  • Maintainable; databases change, and it's not uncommon to add a new required field. The fewer changes you need to make to existing test code to support these changes, the better.

These characteristics aren't specific to data tests, of course. They apply equally well to setup code of any kind.
 
So what happens when we apply these principles? Read on for specific suggestions...

Data Helpers: the Object Mother pattern for DB entities

The Object Mother pattern describes a special kind of factory class that encapsulates the instantiation of an object (or group of objects) in a specific state, usually mirroring a common scenario in the underlying domain. For instance, you might have an Object Mother that creates an Order object, adds some Order Items and marks it as Shipped. The goal is to turn a complex initialization process into a one-liner so that it is easier to read and maintain.

We can use this same approach in a data test, except that instead of constructing an object in code we need to create one or more records in the database. I call these classes "Data Helpers" and they generally:
  • Are static classes: These classes have no need to ever be mocked out, and making them static makes them easier to invoke in your tests. Eliminating the need to instantiate them increases the signal-to-noise ratio and keeps setup code lean.
  • Follow a naming convention: It's important that other developers can discover and use your helpers, so follow an obvious naming convention. I recommend:
    • Put all Data Helpers in the same namespace
    • Name according to the primary entity being created. OrderHelper, CustomerHelper, etc.
  • Create a single "primary" entity: I find that Data Helpers are best focused around a single primary entity, such as an Order. It's fine if they create child or related data for the primary entity, but they should avoid creating a large number of collaborating entities. See below for how to use "scenario" objects for more complicated setups.
  • Treat performance as an important, but secondary, concern: Data Helpers provide their primary value by reducing the cost to create and maintain data tests, so whenever "speed of execution" and "ease of use" are at odds with each other, favor ease of use. That doesn't mean you shouldn't care about performance, and in fact you should care very much. Just not so much that you erode the overarching goal. You can easily offload the performance hit to the CI server.  (You do have a CI server, right?)

The methods exposed by a Data Helper class should (see the sketch after this list):
  • Use optional parameters for as many values as possible: A primary benefit of Data Helpers is dramatically increasing the signal-to-noise ratio within setup logic. Callers should only have to specify the specific values that are significant to their test; all other properties should be created using reasonable defaults.
  • Be semantic: Don't be afraid to create highly specialized methods, such as CreateOrderWithBackorderedItems(), which usually just delegate to a more general method with a specific combination of arguments. This can dramatically improve maintainability; if you add a new field to the database, and you can easily infer the correct default value based on the semantics of the method call, then you can implement that new field in the helper method without touching any of the existing tests.
  • Return the created entity: The caller probably needs to know about the data that was created, so return the entity object that you just created. 
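To make that concrete, here's a minimal sketch of a Data Helper. The Order/Customer entities, the OrderStatus enum and the Db.Insert persistence call are all hypothetical stand-ins for whatever your project uses.

    public static class OrderHelper
    {
        // Optional parameters provide reasonable defaults; callers specify
        // only the values that matter to their test.
        public static Order CreateOrder(
            Customer customer = null,
            OrderStatus status = OrderStatus.AwaitingPayment,
            decimal total = 100m)
        {
            var order = new Order
            {
                Customer = customer ?? CustomerHelper.CreateCustomer(),
                Status = status,
                Total = total
            };

            Db.Insert(order);   // persist it so the record actually exists
            return order;       // hand the caller a pointer to the new data
        }

        // A semantic method that just delegates with a specific combination
        // of arguments. A new required field can often be defaulted here
        // without touching any existing tests.
        public static Order CreateShippedOrder(Customer customer = null)
        {
            return CreateOrder(customer, status: OrderStatus.Shipped);
        }
    }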

Data Scenarios: a bunch of Object Mothers working together

Data Helpers are great when you need to create test data, especially if you want to specify a few key properties and use defaults for the rest.

But what if you want to create multiple related entities, or you want to reuse a specific setup in multiple tests? For instance, you need to create a Customer with completed Orders in the past and an in-progress Order that's ready for checkout. In these cases, I create a special type of Data Helper that I call a "Data Scenario".

Scenario objects have these characteristics (a sketch follows the list):

  • Create a large or complex set of data: Just like Data Helpers reduce individual object setup to a one-liner, Scenarios reduce multiple object setup to a one-liner.
  • Model real-world scenarios: The whole point of a Scenario is to encapsulate realistic data patterns that might exist in production.
  • Expose a smaller set of configurable defaults: Scenarios tend to expose fewer arguments than Data Helpers because they are better suited to creating general purpose groups of data rather than highly-specific records.
  • Are often used in fixture-level setup: A common pattern is for a group of tests to share a Scenario object that is created in the test fixture's setup routine, and then provide test-specific adjustments to the Scenario via inline Data Helper calls. 
  • Are instantiated, not static: Scenario objects are NOT static methods of a helper class. Instead, they are objects that get instantiated and perform their data manipulations in the constructor. This allows Scenarios to be created, manipulated and passed around as needed.
  • Expose pointers to the interesting data: A Scenario object should contain public properties containing references to the entities it creates (or at least their IDs). This allows test code to further manipulate the Scenario data or to make assertions against it. 
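Here's a minimal sketch of a Scenario object, reusing the hypothetical helpers from the previous sketch.

    // Models a returning customer: completed orders in the past, plus an
    // in-progress order that's ready for checkout.
    public class ReturningCustomerScenario
    {
        // Pointers to the interesting data, so tests can tweak it or
        // assert against it.
        public Customer Customer { get; private set; }
        public Order CompletedOrder { get; private set; }
        public Order InProgressOrder { get; private set; }

        // Instantiating the scenario performs the data setup.
        public ReturningCustomerScenario()
        {
            Customer = CustomerHelper.CreateCustomer();
            CompletedOrder = OrderHelper.CreateShippedOrder(Customer);
            InProgressOrder = OrderHelper.CreateOrder(Customer);
        }
    }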

Common objections to these techniques

Some of the specific objections that I've heard are:

  • It takes a lot of time/code to write/maintain helpers: Yes, on a complex system you'll end up with a decent amount of non-production code implementing these helpers. And yes, it requires an investment of time to get started. But I've been using these patterns for two years on a large application and I'm absolutely convinced the effort is justified. Once you get a decent library of helpers set up it becomes really, really easy to write tests... sometimes even easier than setting up expectations in a true unit test!
  • The tests take a long time to run: Yes, they do. You should do your best to avoid hitting the database except when necessary, and you should lean on your CI server to run the whole suite for you. If you can find a way to test the data access code without hitting the database, I'll eat my hat.
  • It's hard to know what helpers exist: True, if you're not the author of the helpers then they are harder to use. That's why it's so important to follow good naming conventions. You can also, you know, talk to your teammates if you create a new helper or wonder if one exists.
  • I don't wanna: If you don't care about testing the data access code, or you don't care about writing good tests, then I got nothin'. Go play in traffic.

Let's face it: data tests suck, but they are a necessary evil. The goal is to maximize their value while minimizing their cost, and that's what these techniques do.

Closing thoughts

In my experience it works best to think of Scenarios as the broad context in which a test will execute; they create all of the background data that is necessary for a test to run, but isn't very significant by itself. Data Helpers are used to create specific data records that are significant to a specific test. Used together, they create a very rich language for setting up your test data in an easy to write, easy to read, and easy to maintain form.

I've been using these techniques on a multi-year, multi-developer, multi-hundreds-of-thousands-LOC project and I am convinced that they are directly responsible for allowing us to maintain high test coverage on a very data-intensive app. 

Happy testing!  

Defensive Programming: Avoid Tomorrow's Debugging, Today

by Seth Petry-Johnson 18. July 2012 04:38

Just as I was trying to write a good intro to this post, Jimmy Bogard tweeted:

I've felt that frustration myself many times. I work on large software systems and often have to troubleshoot hard-to-replicate, data-specific defects given only an error message and limited access to the production environment. Turning this limited data into an actionable bug report can be very, very difficult.

This experience has shown me that there are two types of programmers: those that intentionally craft code that is easy to debug, and those that don't. Programmers that don't do this are, unfortunately, incredibly common and incredibly costly to an organization. Don't be that guy/gal whose code everyone hates to debug!

This post explains some coding techniques that will make your systems easier to troubleshoot and less costly to maintain. Use them; your team will love you for it!

What does "defensive programming" look like?

"Defensive Programming" refers to a collection of coding techniques that decrease maintenance costs by surfacing defects as early as possible, and by making them easy to troubleshoot. There are many articles on this topic, some arguing for and against it, and I encourage you to read them for additional insight.

Specifically, defensive programming means that you:

Write clean, simple, intent-revealing code

This is a universal requirement; I don't care if you're coding defensively, offensively or somewhere in the middle. The easiest defect to fix is the one that never occurs, and simple code is less likely to contain defects than complex code, so keep your designs as simple as possible.

(If you don't agree with this statement, stop reading and go play in traffic... your team will thank you!)

Assume inputs are tainted until proven otherwise

Most applications need data to function and many programmers make assumptions about their data, such as "this string will never be empty" or "this value will always be positive". 

Unfortunately, that string can be empty in some cases, and that value will be zero at some point in time. If you don't validate your assumptions before using the data then you risk intermittent, hard-to-troubleshoot errors. 

Therefore, do sanity checks on your input BEFORE you use it. Use a "design by contract" tool like Code Contracts for .NET if you can, or do it manually if you must. In any case, validate your input before you use it and display a helpful error message if validation fails. (See below for more on helpful exceptions)

In addition to making these errors easier to diagnose, treating all input as potentially hostile is also a security best practice. Sanity check your data and make both your teammates AND your security team a little happier!
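To make that concrete, here's a minimal sketch of a manual sanity check; the Order properties and the flat tax rate are hypothetical. A design-by-contract tool like Code Contracts could express the same preconditions more declaratively.

    public decimal CalculateSalesTax(Order order)
    {
        // Validate assumptions up front, with messages that name the data.
        if (order == null)
            throw new ArgumentNullException("order");
        if (order.Items == null || !order.Items.Any())
            throw new ArgumentException(
                "Cannot calculate sales tax for order " + order.Id + "; it has no items.");

        // The "real" logic can now safely assume its inputs are sane.
        return order.Items.Sum(i => i.Price) * 0.07m;   // hypothetical flat rate
    }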

Fail early, with useful messages

This is as important as it gets.

Imagine you get an error report that says "Sequence contains no elements". What do you do next? If you're lucky enough to get a stack trace then you can trudge through the code looking for the offending line, but what happens if the offending line contains multiple statements chained together? 

Now imagine the error report says "Could not obtain order items for order 1234; sequence contains no elements". You haven't looked at a single line of code yet, and you already have way more information about the problem!

Same goes for null reference exceptions: Would you rather see "Object reference not set to an instance of an object" or "Cannot calculate sales tax for order 1234; Tax Calculator object was null"?  

The key principle here is that you should anticipate errors that might occur and throw exceptions that provide key debugging info directly in the error message:

  • Help the programmer locate the statement that failed and understand WHY it failed.
  • Include key pieces of data needed to reproduce it: order ID, customer ID, etc. (Obviously, be careful not to expose identifiers that could compromise the security of your system!)

Ask yourself, "if this occurs in production 6 months from now, what pointers would I need to zero in on the problem?" and then include those pointers in the exception.
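For example, here's a minimal sketch, with hypothetical data-access names, of wrapping a low-level failure in an exception that carries those pointers:

    public IList<OrderItem> GetItemsForOrder(int orderId)
    {
        try
        {
            // Single() throws "Sequence contains no elements" if the order is missing
            var order = db.Orders.Single(o => o.Id == orderId);
            return order.Items.ToList();
        }
        catch (InvalidOperationException ex)
        {
            // Same failure, but now the report tells us where to start looking
            throw new InvalidOperationException(
                "Could not obtain order items for order " + orderId + "; " + ex.Message, ex);
        }
    }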

Use "fail safe" default values, where appropriate

In many cases, invalid data may not necessarily require an exception. For example, ask yourself these questions about each variable or statement you write:

  • Can I treat null strings the same as empty strings?
  • Can I treat null sequences (lists, arrays, etc) the same as empty sequences?
  • If string parsing fails, can I substitute a default value instead of throwing an exception?

If the answer to any of these questions is "yes" then use the null coalescing operator or conversion helpers to convert null or invalid values into something less "exception prone". I rarely need to differentiate between null and empty sequences, so I've written a .ToEmptyIfNull() extension method that I use whenever I need to iterate over a collection. Major reduction in null reference exceptions for negligible effort.
 
Of course, sometimes you DO care about differentiating between null and empty, or ensuring a parse succeeds. In those cases just throw a helpful error message (see above) as soon as you detect the problem. 
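For reference, here's one possible implementation of a ToEmptyIfNull() helper along the lines described above; the real one may differ, but it doesn't take much more than this.

    public static class SequenceExtensions
    {
        // Treat null sequences the same as empty ones.
        public static IEnumerable<T> ToEmptyIfNull<T>(this IEnumerable<T> source)
        {
            return source ?? Enumerable.Empty<T>();
        }
    }

    // Usage: safe to iterate even if Items was never initialized.
    // foreach (var item in order.Items.ToEmptyIfNull()) { ... }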

"Future proof" your program flow

I've seen a lot of defects occur when business conditions change, and something that "could never happen" when the code was written suddenly becomes possible. 

Examples:

  • When you write a switch statement, always include a default branch. It's better to have the default branch throw an exception like "not implemented condition 'FOO'" than silently fall through and cause a potentially harder-to-debug error; see the sketch after this list. (Of course, you do your best to avoid switch statements, don't you?)
  • When you have a chain of if/else-ifs, always include an else branch. If it should never be reached, throw an exception that explains the conditions that occurred and why you expected them to never happen.  
  • If you're dealing with combinations of different states or variables, and certain combinations "should never occur", go ahead and handle those combinations anyway. It's better to throw an exception you can control than to let the system fail on its own.  (For example, "Order 123 has status SHIPPED, but IS_CANCELLED was true; is the update service malfunctioning?")
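Here's a minimal sketch of that first suggestion, using hypothetical order statuses:

    switch (order.Status)
    {
        case OrderStatus.AwaitingPayment:
            SendPaymentReminder(order);
            break;
        case OrderStatus.Shipped:
            SendTrackingEmail(order);
            break;
        default:
            // A status added next year can't silently fall through
            throw new NotImplementedException(
                "Unhandled order status '" + order.Status + "' for order " + order.Id);
    }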

Go, make the world a brighter place! 

Using these techniques can help you avoid errors in production and can make it easier to resolve errors that do occur.  Using them will bring joy to the hearts of men and will make you beloved amongst your teammates. Use them; do it for the children.

A Good Consultant Is Always Selling

by Seth Petry-Johnson 16. May 2012 19:24

When I was just starting out as a consultant, a friend and co-worker commented that "a good consultant is always selling". Three and a half years later I've come to agree, and in fact I think an appreciation of this concept is critical to becoming a truly effective consultant.

In consulting, like many industries, attracting new business is time consuming and sometimes expensive. It's hard to stay fully utilized during project transitions and gaps in the work schedule can quickly eat away at your profit margin. Good consultants understand that adopting a sales mindset allows them to convert existing engagements into larger ones, and one-time customers into repeat clients.

Done right, this is a service to the customer, not a sleazy "sales technique" to increase billed hours. I'm not talking about artificially inflating schedules or gold-plating unnecessary features, but rather caring enough about the customer's end goals that you can help them identify opportunities they might otherwise miss.
 

To set the stage, a quick example

My team recently spent a week making changes to an existing feature for a customer. During the iteration demo, I realized that the changes we'd made were going to interact awkwardly with a feature coming in the next iteration. Though it was implemented "per the spec", the combination of the two features was going to be confusing and hard to use.

At the same time, I realized that by re-designing an existing feature of the site we could not only improve that feature, but also improve the feature we'd just built and completely avoid the need for the upcoming changes that were going to be so difficult. In short, by totally re-envisioning a section of the site we could radically simplify some key functionality.

This re-design was totally out of scope for the project goals and would cost the customer at least a week of additional work, but they were thrilled to add it to the backlog anyway! 

Why? Because I had found an idea that was worth more to them than the cash they would trade for it. I had successfully "sold" them a concept for improving their software. 
 

To sell successfully, you must understand what your customer values

I'd been working with this customer long enough to know what they consider "valuable". In this example, I knew that they cared deeply about their public UI and that the improvements I suggested would significantly simplify it. Making suggestions that are aligned with your customer's values increases your chances of success and avoids coming across as "that guy who is always trying to sell us work we don't need".

Conversely, I know that they have very different values regarding their administrative interfaces. Suggestions to spend money simplifying the admin UI would likely fall on deaf ears; they generally favor more powerful and complex admin features that allow their staff to be more productive.

Key point: spend time understanding why your customers value the features they build. The deeper your understanding of their business model, the personas of their users, and their core strategic values, the better the chance you'll have of spotting sales opportunities. It doesn't feel like "selling" when you're aligned with your customer's needs.
 

To sell successfully, you must be creative

"Selling" in this context means coming up with new and valuable ideas on your own. I find that thinking about my software from the user's (or customer's) perspective often yields valuable results.

In the example above, I was trying to determine how an end user would use a new feature in combination with an existing one, rather than testing it in isolation. Once I put myself in the user's shoes I immediately noticed some issues that the entire team had overlooked, and once I had identified the problem the creative juices kicked in and the solution became very clear.

It also helps to practice wearing your analyst hat. Don't assume that just because the client asks for a specific interface that they considered other (better) options. Clients, especially non-technical ones, lack your sophisticated understanding of software systems. They may have ruled out the best solution because it seemed "too hard", when in fact it might be totally doable. 

Key points: innovation comes from creativity, and creativity comes from considering different viewpoints. Try "thinking like a user" to identify sales opportunities, and look at requirements from multiple angles. Practice those BA skills!
 

To sell successfully, you must have a "customer service" mindset

Let's be clear: I'm not talking about extending contracts to "milk" a budget or convincing customers they need something they don't. There's a big difference between being motivated to serve the client and being motivated to increase billing revenues; in my experience, if you focus on the customer service the revenue will follow.

In fact, being motivated by customer service sometimes means you will identify a less expensive solution than the client asked for and/or approved. Don't be afraid to share those cost-saving ideas! In many cases the customer will just swap in additional "nice-to-haves" rather than reduce the budget, but even if they do reduce the budget you've demonstrated integrity and built trust. More often than not, that customer will come back.

Key point: truly care about the customer. Focus on what's good for them, and you'll benefit as well.
 

In conclusion: always be offering, not selling

Perhaps a better way to think of this is that "a good consultant is always offering additional value". Done right there's really no selling involved; show your customers ideas that are clearly aligned with their values and worth the development cost and they will often do the rest.

And if they say no, that's cool. Offer something else of value. As long as you're properly motivated you'll find most customers are grateful for the suggestions and come back to you for additional work many times over.

 

Rules for effective data-driven tests

by Seth Petry-Johnson 17. December 2011 21:36

Tests that hit the database are slow, fragile, and difficult to automate. All the cool kids are using mocking and stubbing and in-memory databases to keep their unit tests isolated and fast, and that's awesome. I do that too (though my "cool kid" status is debatable). 

However, there are times when talking to a real database is necessary. Maybe you're testing actual data access logic, or maybe you're writing some high end integration/acceptance tests, or maybe you're just working in an architecture that doesn't let you mock/stub/inject your way to isolated bliss. If any of that sounds familiar, then this post is for you!

Below is a list of strategies and suggestions for effective data testing that I've collected from years of experience testing large, "enterprisey", data-driven applications. Data tests will never be painless, but following these rules makes it suck less.
 

Rules for good data tests

  1. Tests should create their own scenario data; never assume it already exists. Magic row IDs kill kittens!
  2. Make liberal use of data helper and scenario setup classes.
  3. Don't use your data access layer to test your data access layer.
  4. Tests should make no permanent changes to the database - leave no data behind!

Rule 1: Tests should create their own data

One of the worst things you can do in a data test is to assume that some record (a customer, an order, etc) exists that fulfills your scenario requirements. This is a cardinal sin for many reasons:

  1. It's extremely fragile; databases change over time, and tests that rely on pre-existing data often break (causing false-negative test failures).
  2. It obscures the test's purpose. A test's setup communicates the data context in which our assertions are valid. If you omit that setup logic, you make it hard for other programmers to understand the scenario that you are testing.
  3. It's not maintainable; other programmers won't know what makes customer ID 5 appropriate for one test and customer ID 7 appropriate for another. Once a test like this breaks, it tends to stay broken or get deleted.

In other words: relying on pre-existing data means your tests will break often, are painful to maintain when they do break, and don't clearly justify why another programmer should spend time fixing them.
 
The solution is simple: each test should create each and every test record it will rely on. If that sounds like a lot of work, it can be... but keep reading to see how to keep it manageable.
 

Rule 2: Liberal use of data helper and scenario setup classes

Setting up the supporting data for a test sucks. It's time consuming and generally results in a lot of duplication, which then reduces test maintainability and readability. Test code is real code and should be kept DRY like anything else!

I've found it useful to create two different types of helper classes to attack this problem:

  • Data helpers are utility classes that expose methods for quickly creating entities in the system. These classes:
    • Are generally static, for convenience.
    • Exist in the test project, not the main data access project.
    • Create a single object (or object graph), such as a Customer or an Order with its OrderItem children.  
    • Create data with meaningful defaults, but allow the important fields to be explicitly specified where needed. (Optional parameters in .NET 4 FTW!)
       
  • Scenario objects (aka "fixtures") represent specific data scenarios that might apply to multiple tests, such as the scenario in which a Customer has placed an Order and one of the items is backordered. These classes:
    • Exist in the test project.
    • Have public properties that identify key data in the scenario (e.g. the Customer ID, Order ID, and backordered Item ID).
    • Are instantiated by a test, at which time the scenario data is created.

In short, data helpers are low-level utilities for creating a specific data record in a specific state, while scenario classes represent larger contexts consisting of multiple entities. I have found that while the time needed to create these objects is not trivial, it quickly pays off as new tests are easier and easier to write.
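Put together, a data test built on these classes might read something like this. This is an NUnit-style sketch; the helper, scenario, and service names are hypothetical.

    [TestFixture]
    public class CheckoutTests
    {
        private ReturningCustomerScenario scenario;

        [SetUp]
        public void CreateScenario()
        {
            // Broad context shared by every test in the fixture
            scenario = new ReturningCustomerScenario();
        }

        [Test]
        public void Backordered_items_prevent_checkout()
        {
            // Data specific to this test, created inline via a helper
            OrderItemHelper.CreateBackorderedItem(scenario.InProgressOrder);

            var canCheckOut = new CheckoutService().CanCheckOut(scenario.InProgressOrder.Id);

            Assert.IsFalse(canCheckOut);
        }
    }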

Rule 3: Don't use your DAL to test your DAL

Tests for DAL code generally set up some data in the database, invoke the DAL, and then verify that the database was properly modified. I've generally found it difficult to use the primary DAL to quickly and concisely verify those assertions.

In some cases, the primary DAL may not expose a suitable API for doing record-level verification. For example, when there are significant differences between the logical schema exposed through the domain layer and the physical schema of the database, it may be impossible (or at least difficult) to write low-level data assertions.

In other cases, especially early in development, using the DAL to test the DAL creates dependency issues. For instance, many tests involve a sequence of events like "get entity by ID, save changes to it, then verify it was changed". If both the GetById() and Save() methods are currently under development then your test will give you inconclusive results until both methods are implemented. 

In all of these cases I've found it valuable to verify data assertions using a LINQ to SQL data context. This provides a convenient, object-based representation of the data schema that is perfectly suited for verifying row-level operations were performed properly. This data context lives in the test project and is automatically regenerated (using SQLMetal.exe) whenever the schema changes, so it's a close-to-zero-effort solution.

You could also use a micro ORM like Massive, or anything else that makes it quick and easy to interact directly with the database.
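For instance, a row-level assertion through a separate LINQ to SQL context might look roughly like this; the generated context is hypothetically named TestDb, and the columns are examples.

    // Verify that the DAL's Save() really wrote the row, using the
    // test project's own LINQ to SQL context rather than the DAL itself.
    using (var db = new TestDb(connectionString))
    {
        var row = db.Orders.Single(o => o.Id == savedOrderId);

        Assert.AreEqual("SHIPPED", row.Status);
        Assert.IsNotNull(row.ShippedDate);
    }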

 

Rule 4: Tests make no permanent changes to the database

Tests should be run often, and if you follow Rule #1 your tests create a lot of new data when they run. If you don't clean up after them, your test database will quickly grow in size. Also, if you point your test suite at the same database you use to run your app, you'll quickly get tired of seeing that test data accumulate in your views.

The easiest way to prevent this is to wrap each test in a database transaction, and then rollback that transaction at the end of the test. This performs the desired cleanup and also isolates tests running in parallel from interfering with each other's data.

There are a few different ways to approach this. Depending on your needs, check out this or this.
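One common approach is a base class that opens a TransactionScope during setup and disposes it without completing it during teardown; disposing an incomplete scope rolls back everything the test wrote. A minimal NUnit-style sketch (the class name is just an example):

    using System.Transactions;
    using NUnit.Framework;

    public abstract class RollbackDataTest
    {
        private TransactionScope transaction;

        [SetUp]
        public void BeginTransaction()
        {
            // Everything the test writes happens inside this transaction
            transaction = new TransactionScope();
        }

        [TearDown]
        public void RollbackTransaction()
        {
            // Never calling Complete() means Dispose() rolls the work back
            transaction.Dispose();
        }
    }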

 

Conclusion

None of these techniques are particularly clever or game changing, but when used together they can significantly improve your data tests:

  • When tests create their own scenario data, you don't need to run them against a particular known state. This reduces maintenance costs significantly.
     
  • Investing in data helpers and scenario classes makes it easy to add new tests. The easier it is to write tests, the more likely that developers will actually do it.
     
  • "Close to the metal" abstractions like LINQ to SQL make it easy to write row- and field-level assertions against the database.
     
  • Adding some "auto rollback" behavior to your data tests keeps your database trim and tidy, no matter how many times you run your test suite.

Happy data testing!

Seth Petry-Johnson

I'm a software architect and consultant for Heuristic Solutions.

I value clean code, malleable designs, short feedback cycles, usable interfaces and balance in all things.

I am a Pisces.
