Beyond Paving the Cow Paths

Some data warehouse designers want to declare victory after merely replicating the organization’s top five reports. They’re satisfied with this deliverable because “that’s what the users asked for.” However, this approach is akin to paving the cow paths. In some communities, the roadways resemble a tangled web because early roads were built on preexisting cow paths. Unfortunately, the cows didn’t meander along straight grid lines. Similarly, merely using the data warehouse to pave reporting “cow paths” doesn’t push the organization beyond what it has today. This is where the analytic life cycle can help.

In “The Promise of Decision Support” (Dec. 5, 2002), we introduced the five-stage analytic life cycle.

  • Stage 1: Publish reports. Supports standard operational and managerial reporting on the current state of the business.
  • Stage 2: Identify exceptions. Pinpoints unusual performance situations that warrant further attention.
  • Stage 3: Determine causal factors. Seeks to understand the causal factors behind the exceptions.
  • Stage 4: Model alternatives. Synthesizes what’s been learned to build a model for evaluating alternatives and trade-offs.
  • Stage 5: Track actions. Analyzes the effectiveness of the recommended actions and feeds the results back to the operational and data warehouse systems. We then return to Stage 1 to report on these results, thereby closing the loop.

To move beyond replicating reports, you can use the analytic life cycle for gathering more in-depth business requirements. It provides a framework to collaborate with users to understand their analytic processes. It forces data warehouse designers to ask the second- and third-level questions, the “hows” and “whys,” to understand how the organization could leverage the data warehouse for analysis.

Begin with Reported Results

Most analyses start with a report that details business performance metrics. Our challenge is to push beyond the report into the more detailed analytic requirements.

Let’s walk through a real-world experience, buying a house, to understand how the analytic life cycle guides the analytics requirements gathering process. Let’s say that you’ve been transferred to a new city, and you have to find a new house. What sort of process do you use to find that ideal house? You might start with a couple of real estate listings (and the guidance of a knowledgeable real estate agent) and begin asking a lot of questions:

  • What neighborhoods have the best schools?
  • What neighborhoods are closest to my job?
  • What can I afford?

For the data warehouse designer, reporting requirements are the starting point. You need to take the time to identify and understand which reports the business relies on to monitor its performance. However, users can’t possibly look at all the data. You need to take the analysis process to the next level.

Identify Criteria and Threshold Tolerances

When house hunting, you need to limit your search; otherwise you’ll be inundated by all the housing options (especially considering that houses are constantly moving on and off the market). You can reduce the number of housing options by considering only those properties that meet a certain set of criteria. You’ve now moved into the identify exceptions stage (Stage 2). In the housing example, these critical criteria might include:

  • Price range
  • Quality of schools
  • Safety of neighborhood
  • Square footage of the house.

Stage 2 guides the data warehouse designer to look for requirements that pin down the factors and thresholds that flag unusual situations worthy of further analysis. These exception identification factors typically manifest themselves as new facts and dimension attributes.
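To make the idea of factors and threshold tolerances concrete, here is a minimal Python sketch that flags results deviating from plan by more than a tolerance. Everything in it is an assumption made for illustration: the `RegionResult` fields, the 10 percent tolerance, and the sample figures do not come from a real warehouse.

```python
from dataclasses import dataclass


@dataclass
class RegionResult:
    region: str           # hypothetical dimension attribute
    actual_sales: float   # hypothetical fact
    planned_sales: float  # hypothetical fact


def find_exceptions(results, tolerance=0.10):
    """Return (region, variance) pairs whose variance from plan exceeds the tolerance."""
    exceptions = []
    for r in results:
        variance = (r.actual_sales - r.planned_sales) / r.planned_sales
        if abs(variance) > tolerance:
            exceptions.append((r.region, round(variance, 3)))
    return exceptions


# Illustrative figures only.
results = [
    RegionResult("East", 118_000, 100_000),
    RegionResult("West", 97_000, 100_000),
    RegionResult("South", 82_000, 100_000),
]
flagged = find_exceptions(results)
print(flagged)
```

With these sample figures, East and South fall outside the tolerance and West does not. In requirements terms, the tolerance and the planned value are exactly the sort of new attributes and facts the users would need the warehouse to carry.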

Understand Cause and Effect

After identifying the factors that you’ll use to scope your search, you need to understand why these drivers are critical to your housing decision. You need to understand the relationship between these driving factors (what makes them important) and the ultimate housing choice. You have now moved into the determine causal factors stage (Stage 3). Here you refine your selection criteria, defining each one and its acceptance threshold in more detail, such as:

  • School ranking in the top five in the city over the past year (because you have three school-age children)
  • Minimum of 3,200 square feet with four bedrooms and two bathrooms
  • One-half acre of a usable, mostly flat lot (room to play catch with the kids)
  • No more than a 30-minute drive to work (you don’t want to spend more than five hours a week driving to work)
  • No more than a 20-minute drive to downtown shopping
  • In the price range of $350,000 to $400,000 (because you’re not rich).

During Stage 3, the data warehouse designer focuses on understanding why these variables are important, how they interrelate, and how they’ll be used in making the final decision. This phase typically yields even more detailed dimension tables, new data sources (typically third-party or nonelectronic causal data), and statistical routines to quantify the cause and effect of the relationships.
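The acceptance criteria listed above translate almost directly into a filter. The following Python sketch is one possible encoding, not a prescribed design: the `Listing` fields and the sample listings are hypothetical, the bedroom and bathroom counts are treated as minimums (an assumption), and the “usable, mostly flat” quality of the lot is not modeled because it needs a judgment the data may not hold.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    address: str
    price: int
    square_feet: int
    bedrooms: int
    bathrooms: int
    lot_acres: float
    school_rank: int          # 1 = best in the city
    minutes_to_work: int
    minutes_to_shopping: int


def meets_criteria(h: Listing) -> bool:
    """Apply the Stage 3 acceptance criteria from the house-hunting example."""
    return (
        h.school_rank <= 5
        and h.square_feet >= 3200
        and h.bedrooms >= 4          # assumption: "four bedrooms" is a minimum
        and h.bathrooms >= 2         # assumption: "two bathrooms" is a minimum
        and h.lot_acres >= 0.5
        and h.minutes_to_work <= 30
        and h.minutes_to_shopping <= 20
        and 350_000 <= h.price <= 400_000
    )


# Hypothetical listings for illustration.
listings = [
    Listing("12 Oak St", 385_000, 3400, 4, 3, 0.6, 3, 25, 15),
    Listing("48 Elm Ave", 360_000, 3000, 4, 2, 0.5, 2, 20, 10),   # too small
    Listing("7 Pine Ct", 410_000, 3600, 5, 3, 0.75, 1, 28, 18),   # over budget
]
shortlist = [h.address for h in listings if meets_criteria(h)]
print(shortlist)
```

Each field in the sketch corresponds to an attribute or fact the warehouse would have to supply, and some of them (school rankings, drive times) would come from the third-party sources mentioned above.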

Evaluate the Options

After doing all the research and house tours, you can now create some sort of model to help you with the inevitable trade-offs in your final housing decision. You have now moved into the model alternatives stage (Stage 4).

Models can be quite advanced statistical or spreadsheet algorithms, or they can be simple heuristics, rules of thumb, or gut feeling. Whatever type of model is used, its basic purpose is to provide a framework against which these different trade-off decisions can be evaluated. The model doesn’t make the simple decision mundane, but helps make the seemingly impossible decision manageable.

You can employ your housing “model” to help you with the following types of housing trade-off decisions, perhaps using weighted averages in a spreadsheet to make the decision more quantitative rather than entirely qualitative:

  • Price of the house vs. the average neighboring prices
  • Price per square foot of the house vs. the neighborhood average
  • Price of the house vs. ranked quality of the school
  • Ranked quality of the school vs. number of minutes to work
  • Number of bedrooms vs. extra rooms (dens or sun rooms)
  • Square footage of the house vs. usability of the lot.

For the data warehouse designer, the analytics requirements gathering process focuses on the “model” that will be used in evaluating the different decision alternatives. This includes the metrics that will drive the ultimate decision (independent variables) and their relationships to the ultimate decision (dependent variable).
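The weighted-average approach mentioned earlier in this section could be sketched in Python as follows. The factor names, the weights, and the 1-to-10 ratings are all assumptions chosen for illustration; a real model would use whatever independent variables the users identify.

```python
# Hypothetical weights: larger means the factor matters more to the buyer.
WEIGHTS = {
    "price_value": 4,
    "school_quality": 3,
    "commute": 2,
    "lot_usability": 1,
}


def weighted_score(ratings, weights=WEIGHTS):
    """Weighted average of the ratings (independent variables)."""
    total_weight = sum(weights.values())
    return sum(weights[k] * ratings[k] for k in weights) / total_weight


# Hypothetical 1-10 ratings gathered during the house tours.
candidates = {
    "House A": {"price_value": 6, "school_quality": 9, "commute": 7, "lot_usability": 8},
    "House B": {"price_value": 8, "school_quality": 6, "commute": 8, "lot_usability": 5},
}

scores = {name: weighted_score(r) for name, r in candidates.items()}
best = max(scores, key=scores.get)  # the decision (dependent variable)
print(scores, best)
```

Here the ratings play the role of independent variables and the chosen house is the dependent variable. Changing a weight and rerunning the scores is a quick way to see how sensitive the decision is to each trade-off.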

Track Actions for Future Optimization

Finally, once a decision has been made, you need to track its effectiveness in order to fine-tune future decisions. That’s the goal of the track actions stage (Stage 5).

This stage is often skipped in the analytics process. Few people or organizations seem willing to spend the time to examine the effectiveness of their decisions. In our housing example, the same probably holds true. Few home buyers consciously examine the effectiveness of their decision until it comes time to sell the house. Then they quickly learn whether the general marketplace values the factors that they valued.

  • Did I get the price appreciation that other neighborhoods got?
  • Was the quality of school what I thought it would be?
  • Did I have the access to work that I thought I would have?

For the data warehouse designer, the analytics requirements gathering process needs to capture the decision or actions taken, ideally in the data warehouse. With this information captured, the business user can see if an action had the desired impact upon the key driving business metrics (such as revenue, share, profitability, or customer satisfaction).
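As a rough illustration of capturing a decision alongside its outcome, the Python sketch below records an action with a baseline, a target, and (once known) the actual result. The `TrackedAction` structure, the metric name, the date, and the dollar figures are hypothetical, and the comparison assumes that a higher value of the metric is better.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TrackedAction:
    description: str
    decided_on: date
    metric: str
    baseline: int
    target: int
    actual: Optional[int] = None  # filled in once results are known


def evaluate(action: TrackedAction):
    """Compare the realized change with the intended change.

    Assumes a higher metric value is better. Returns None until the
    actual result has been recorded.
    """
    if action.actual is None:
        return None
    return {
        "expected_lift": action.target - action.baseline,
        "actual_lift": action.actual - action.baseline,
        "met_target": action.actual >= action.target,
    }


# Hypothetical example: the purchase, and the resale years later.
purchase = TrackedAction(
    description="Bought 12 Oak St",
    decided_on=date(2003, 6, 1),
    metric="resale_value",
    baseline=385_000,
    target=450_000,
)
pending = evaluate(purchase)   # no outcome recorded yet
purchase.actual = 430_000
outcome = evaluate(purchase)
print(outcome)
```

Storing the action, the metric it was meant to move, and the eventual result together is what lets the business user close the loop back to Stage 1 reporting.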

As you can see, reporting is typically the starting point for the analysis, but it isn’t the end-state goal. Only when an organization moves beyond reporting does it start to see the business return associated with making better decisions.