Monday, April 23, 2012

Evaluation Rubrics for Measuring Staff Skills and Behaviors

I've had to create several rubrics for managers to use to measure a set of staff skills and behaviors.

Here's the setup. A library I am working with has defined several outcomes they want to achieve for staff - staff will do such and such behavior related to excellent customer service, or staff will know how to do such and such a task related to technology skills. We've decided to measure these outcomes using rubrics that managers will fill out for each staff member (think of how the SAT grades writing on a 1-to-6 scale). Because the library is trying to develop new skills and behaviors for staff, the purpose of the rubrics is not to serve as a performance review that punishes or rewards staff, but to identify areas where further training or focus is needed.

Everyone at the library is busy and has too much on their plate as it is. So we wanted a measurement that triangulates quick, painless, and accurate. Rubrics seemed like an interesting way to go.

I did some investigating and started following the standard format that rubrics take. (btw, I found this website to be a great resource for baseline info on rubrics: http://www.carla.umn.edu/assessment/vac/evaluation/p_7.html.) Essentially, you end up with a scoring grid with two axes. On the vertical axis you have the different categories that behavior is measured against. If the overall outcome has to do with customer service, then the categories might be "attitude, accessibility, accuracy" (brief nod to the alliteration). On the horizontal axis, you have the scoring levels. There are often 4 or 6 levels (even numbers, to avoid the tendency to put everyone in the center), for example "exemplary, superior, fair, needs work".

Looks something like this:

[drawing: a blank rubric grid - categories down the left side, scoring levels across the top]

The final step is to fill in each cell of your table with a description of what the outcome would look like, for each category and each level. The problem is, this leaves you with a pretty dense table of text. If you have three categories and four levels, that's 12 paragraphs of text that someone has to read through and take a measurement on.

Now looks something like this:

[drawing: the same grid with every cell filled in with a paragraph of text]

Not the quick and painless solution we were looking for.
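
If it helps to see why the table gets so dense, here's the standard format written out as data - a rough Python sketch, where the category names come from my customer service example and each cell's text (which would really be a full paragraph) is just a placeholder:

# Standard rubric format: one description for every category at every level.
# Three categories x four levels = 12 blocks of text for a manager to read.
standard_rubric = {
    "Attitude":      {4: "full paragraph...", 3: "full paragraph...", 2: "full paragraph...", 1: "full paragraph..."},
    "Accessibility": {4: "full paragraph...", 3: "full paragraph...", 2: "full paragraph...", 1: "full paragraph..."},
    "Accuracy":      {4: "full paragraph...", 3: "full paragraph...", 2: "full paragraph...", 1: "full paragraph..."},
}
cells = sum(len(levels) for levels in standard_rubric.values())
print(cells)  # 12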

After wrestling with this for a while, here's the solution I came up with. Instead of describing each level for each category, I preface the table with a general description of each level for performance. Then the table describes the ideal performance for each category.

So managers start off reading something like this:

4 – Exemplary. Matches the Ideal perfectly. You would describe every characteristic with words like “always, all, no errors, comprehensive”.
3 – Excellent. A pretty close match to the Ideal, but you can think of a few exceptions. You would describe some characteristics with words like “usually, almost all, very few errors, broad”, even if other characteristics are at a 4 level.
2 – Acceptable. Matches the Ideal in many respects, but there are definitely areas for improvement. You would describe some characteristics with words like “often, many, few errors, somewhat limited”, even if other characteristics are at a 4 or 3 level.
1 – Not there yet. Some matches with the Ideal, but many areas where improvement is needed. You would describe some characteristics with words like “sometimes, some, some errors, limited”.

Then they look at the table of idealized characteristics, and jot down their ranking, which looks something like this:

[drawing: a table with one Ideal description per category and a column for the manager's score]

The nice thing about this is that once they read through that initial description of performance levels, they can fill out any number of rubrics for various outcomes and know exactly what the scoring criteria are, without having to read something new each time. Triangulation of quick, painless, and accurate. Check!
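
For the curious, here's the simplified version written out as data, too - a rough Python sketch. The category names and wording are placeholders from my example, not the library's actual rubric:

# One shared scoring guide, read once and reused for every rubric,
# one "Ideal" description per category, and a single number jotted down per category.
level_guide = {
    4: "Exemplary - matches the Ideal perfectly (always, all, no errors, comprehensive)",
    3: "Excellent - a close match, with a few exceptions (usually, almost all, very few errors, broad)",
    2: "Acceptable - matches in many respects, with clear areas for improvement (often, many, few errors)",
    1: "Not there yet - some matches, many areas for improvement (sometimes, some, some errors, limited)",
}

ideals = {
    "Attitude":      "Greets every patron warmly and stays positive, even when busy.",
    "Accessibility": "Is easy to find and easy to approach at every service point.",
    "Accuracy":      "Gives correct answers and points patrons to reliable sources.",
}

# A manager's completed rubric is just one score per category.
scores = {"Attitude": 4, "Accessibility": 3, "Accuracy": 3}

for category in ideals:
    print(category, "-", scores[category], "-", level_guide[scores[category]])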

Note: drawings done using http://dabbleboard.com

Wednesday, April 18, 2012

How to analyze interview data

Every time I conduct structured or semi-structured interviews as part of an evaluation, I feel a bit overwhelmed when I open the folder on my computer with the interview transcripts. How do I take all of this text and turn it into something meaningful?

I'm still working out different techniques, but I've been surprised how much I end up getting out of going through the following simple routine.

Step 1. I open a new Word document and write out the main questions that the interview was meant to answer. If I was looking for anything specific (e.g. a story about a frustrating visitor experience or an idea for how the library could be more user-friendly for seniors), then I'll also write that down.

These become my main section headings for analyzing the data.

Step 2. I read through each interview and copy/paste sections of the interview under the appropriate section heading. I try to do this thoughtfully but not agonizingly. I usually set a timer to keep myself from getting bogged down in hyper-interpretation. I often set the original interview text in italics once it's been copy/pasted; that way I can skip sections that I don't know what to do with and come back to them later. (If you like a bit of scripting, there's a rough sketch of this bookkeeping at the end of the post.)

Step 3. I read through my categorized document and start shifting quotes around, as needed. Sometimes I'll put a quote in 2 different places, but not often. I've found that if I do that too much, I end up with way too many categories and subcategories. Keep it simple.

Step 4. I take some time away from the analysis. A day or two, if possible.

Step 5. I go through step 3 again.

Step 6. I look over my data and ask myself, "so what?" This is where the fun interpretation stage comes in. Once I get to this point, I've found that I'm familiar enough with the data to really be able to question my assumptions and be intellectually honest about whether my assessment is founded in fact or in preconception.
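
As promised in step 2, here's what that copy/paste bookkeeping can look like as plain data, if you'd rather work outside of Word - just a list of quotes per question heading. This is a rough Python sketch; the headings, quotes, and output file name are all made up for illustration:

# Steps 1-3 as plain data: a heading per main question, with quotes pasted underneath.
sections = {
    "Frustrating visitor experiences": [],
    "Ideas for serving seniors better": [],
    "Not sure yet - come back later": [],
}

# Step 2: copy excerpts from each transcript under the right heading.
sections["Frustrating visitor experiences"].append(
    "I waited twenty minutes and never found anyone to ask."
)
sections["Ideas for serving seniors better"].append(
    "Larger-print signs near the entrance would help my mother a lot."
)

# Step 3: write out the categorized document to read through and reshuffle.
with open("categorized_quotes.txt", "w") as f:
    for heading, quotes in sections.items():
        f.write(heading + "\n")
        for quote in quotes:
            f.write("  - " + quote + "\n")
        f.write("\n")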

Thursday, April 5, 2012

Simple evaluation tool - the card sort

One of my favorite evaluation tools is the Card Sort. To understand visitors' perspective on something, hand them a stack of cards, each with a 1-2 word descriptor related to the topic you have in mind. Ask visitors to pull out the words that best and least describe the topic (limit them to 3-4; many people will want to pull out 8 or 9 cards). Then follow up and ask them why they chose the cards they did.

When you analyze the data, it's interesting to add up the numbers for which cards were chosen the most and which cards were ignored the most. Add to that some really great qualitative information about why people made their selections. Often, you find that people have different interpretations of the words than you, as the researcher, had.

Here's an example. Let's say you want to know how people perceive the library's current collection of fiction. Put together a list of 10-15 words that could possibly describe the collection - and don't be afraid to include some negative words (good selection, new materials, lots of options, worn out, not relevant to me, etc). The words you choose are important. Keep them simple, so people can process them quickly. But be specific and even a bit daring - that will bring out interesting comments and discussion. Above all, make sure they are relevant to what you want to know about. Test your words with a few people from the organization. Then try the list out on a few patrons before going live with the study.

As a rule of thumb, when you go live, keep asking people to do the card sort until you feel like the answers are getting redundant. If you must have a number, I would recommend 30 people as a minimum.

To analyze the results, tally up the total of "best" and "least" hits each word got. Then compare the reasons people gave for choosing those words. What patterns or themes do you see? One note: the more accurately you transcribe people's responses about why they chose their words, the better your qualitative analysis will be. Resist the temptation to summarize people's statements when collecting the data. Try to write down what they say as close to word-for-word as possible. It's tough, but worth it. You don't want to add your layer of interpretation until all the data is collected.
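
If you type up the card sort results afterward, the tallying part is easy to script. Here's a rough Python sketch - the card words and quotes are invented examples, not real data:

from collections import Counter

# Each response records the cards a person pulled as "best" and "least",
# plus their reason, written down as close to word-for-word as possible.
responses = [
    {"best": ["good selection", "new materials"], "least": ["worn out"],
     "why": "I always find something new on the shelf by the front door."},
    {"best": ["lots of options"], "least": ["not relevant to me", "worn out"],
     "why": "Plenty of books, but nothing on the topics I actually read about."},
]

best_counts = Counter(card for r in responses for card in r["best"])
least_counts = Counter(card for r in responses for card in r["least"])

print("Chosen most as best:", best_counts.most_common())
print("Chosen most as least:", least_counts.most_common())

# Keep the verbatim reasons alongside the tallies for the qualitative read.
for r in responses:
    print(r["best"], r["least"], "-", r["why"])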