Optimizing Survey Questionnaire Design in Political Science: Insights from Psychology
Abstract and Keywords
This article provides a summary of the literature's suggestions on survey design research.
In doing so, it points researchers toward question formats that appear to yield the highest
measurement reliability and validity. Using the American National Election Studies as a
starting point, it shows the general principles of good questionnaire design, desirable
choices to make when designing new questions, biases in some question formats and
ways to avoid them, and strategies for reporting survey results. Finally, it offers a
discussion of strategies for measuring voter turnout in particular, as a case study that
poses special challenges. Scholars designing their own surveys should not presume that
previously written questions are the best ones to use. Applying best practices in
questionnaire design will yield more accurate data and more accurate substantive
findings about the nature and origins of mass political behavior.
Keywords: survey questionnaire design, political science, American National Election Studies, optimization,
question formats, voter turnout
Josh Pasek and Jon A. Krosnick
The Oxford Handbook of American Elections and Political Behavior
Edited by Jan E. Leighley
Print Publication Date: Feb 2010 Subject: Political Science, U.S. Politics, Political Behavior
Online Publication Date: May 2010 DOI: 10.1093/oxfordhb/9780199235476.003.0003
QUESTIONNAIRES have long been a primary means of gathering data on political beliefs,
attitudes, and behavior (F. H. Allport 1940; G. W. Allport 1929; Campbell et al. 1960; Dahl
1961; Lazarsfeld and Rosenberg 1949–1950; Merriam 1926; Woodward and Roper 1950).
Many of the most frequently studied and important measurements made to understand
mass political action have been done with questions in the American National Election
Studies (ANES) surveys and other such data collection enterprises. Although in
principle, it might seem desirable to observe political behavior directly rather than
relying on people's descriptions of it, questionnaire‐based measurement offers
tremendous efficiencies and conveniences for researchers over direct observational
efforts. Furthermore, many of the most important explanatory variables thought to drive
political behavior are subjective phenomena that can only be measured via people's
descriptions of their own thoughts. Internal political efficacy, political party identification,
attitudes toward social groups, trust in government, preferences among government
policy options on specific issues, presidential approval, and many more such variables
reside in citizens' heads, so we must seek their help by asking them to describe those
constructs for us.
A quick glance at ANES questionnaires might lead an observer to think that the design of
self‐report questions need follow no rules governing item format, because formats have
differed tremendously from item to item. Thus, it might appear that just about any
question format is as effective as any other format for producing valid and reliable
measurements. But in fact, this is not true. Nearly a century's worth of survey design
research suggests that some question formats are optimal, whereas others are
suboptimal.
In this chapter, we offer a summary of this literature's suggestions. In doing so, we point
researchers toward question formats that appear to yield the highest measurement
reliability and validity. Using the American National Election Studies as a starting point,
the chapter illuminates general principles of good questionnaire design, desirable choices
to make when designing new questions, biases in some question formats and ways to
avoid them, and strategies for reporting survey results. Finally, the chapter offers a
discussion of strategies for measuring voter turnout in particular, as a case study that
poses special challenges. We hope that the tools we present will help scholars to design
effective questionnaires and utilize self‐reports so that the data gathered are useful and
the conclusions drawn are justified.
The Questions We Have Asked
Many hundreds of questions have been asked of respondents in the ANES surveys,
usually more than an hour's worth in one sitting, either before or after a national election.
Many of these items asked respondents to place themselves on rating scales, but the
length of these scales varies considerably. For example, some have 101 points, such as
the feeling thermometers:
Feeling Thermometer. I'd like to get your feelings toward some of our political
leaders and other people who are in the news these days. I'll read the name of a
person and I'd like you to rate that person using something we call the
feeling thermometer. The feeling thermometer can rate people from 0 to 100
degrees. Ratings between 50 degrees and 100 degrees mean that you feel
favorable and warm toward the person. Ratings between 0 degrees and 50
degrees mean that you don't feel favorable toward the person. Rating the person
at the midpoint, the 50 degree mark, means you don't feel particularly warm or
cold toward the person. If we come to a person whose name you don't recognize,
you don't need to rate that person. Just tell me and we'll move on to the next one.
Other ratings scales have offered just seven points, such as the ideology question:
Liberal–conservative Ideology. We hear a lot of talk these days about liberals and
conservatives. When it comes to politics, do you usually think of yourself as
extremely liberal, liberal, slightly liberal, moderate or middle of the road, slightly
conservative, conservative, extremely conservative, or haven't you thought much
about this?
Still others have just five points:
Attention to Local News about the Campaign. How much attention do you pay to
news on local news shows about the campaign for President—a great deal, quite a
bit, some, very little, or none?
Or three points:
Interest in the Campaigns. Some people don't pay much attention to political
campaigns. How about you? Would you say that you have been very much
interested, somewhat interested or not much interested in the political campaigns
so far this year?
Or just two:
Internal efficacy. Please tell me how much you agree or disagree with these
statements about the government: “Sometimes politics and government seem so
complicated that a person like me can't really understand what's going on.”
Whereas the internal efficacy measure above offers generic response choices (“agree”
and “disagree”), which could be used to measure a wide array of constructs, other items
offer construct‐specific response alternatives (meaning that the construct being
measured is explicitly mentioned in each answer choice), such as:
Issue Importance. How important is this issue to you personally? Not at all
important, not too important, somewhat important, very important, or extremely
important? (ANES 2004)
Some rating scales have had verbal labels on all points and no numbers, as in the
above measure of issue importance, whereas other rating scales have numbered points
with verbal labels on just a few, as in this case:
Defense Spending. Some people believe that we should spend much less money
for defense. Suppose these people are at one end of the scale, at point number 1.
Others feel that defense spending should be greatly increased. Suppose these
people are at the other end, at point 7. And, of course, some other people have
opinions somewhere in between at points 2, 3, 4, 5 or 6. Where would you place
yourself on this scale, or haven't you thought much about this?
In contrast to all of the above closed‐ended questions, some other questions are asked in
open‐ended formats:
Candidate Likes–dislikes. Is there anything in particular about Vice President Al
Gore that might make you want to vote for him?
Most Important Problems. What do you think are the most important problems
facing this country?
Political Knowledge. Now we have a set of questions concerning various public
figures. We want to see how much information about them gets out to the public
from television, newspapers and the like. What job or political office does Dick
Cheney now hold?
Some questions offered respondents opportunities to say they did not have an opinion on
an issue, as in the ideology question above (“or haven't you thought much about this?”).
But many questions measuring similar constructs did not offer that option, such as:
U.S. Strength in the World. Turning to some other types of issues facing the
country. During the past year, would you say that the United States' position in the
world has grown weaker, stayed about the same, or has it grown stronger?
Variations in question design are not, in themselves, problematic. Indeed, one cannot
expect to gather meaningful data on a variety of issues simply by altering a single word in
a “perfect,” generic question. To that end, some design decisions in the ANES represent
the conscious choices of researchers based on pre‐testing and the literature on best
practices in questionnaire design. In many cases, however, differences between question
wordings are due instead to the intuitions and expectations of researchers, a desire to
retain consistent questions for time‐series analyses, or researchers preferring the ease of
using an existent question rather than designing and pre‐testing a novel one.
All of these motivations are understandable, but there may be a better way to go about
questionnaire design to yield better questions. Poorly designed questions can produce (1)
momentary confusion among respondents, (2) more widespread frustration, (3)
compromises in reliability, or (4) systematic biases in measurement or analysis results.
Designing optimal measurement tools in surveys sometimes requires expenditure of more
resources (by asking longer questions or more questions to measure a single construct),
but many measurements can be made optimal simply by changing wording without
increasing a researcher's costs. Doing so requires understanding the principles of optimal
design, which we review next.
Basic Design Principles
Good questionnaires are easy to administer, yield reliable data, and accurately measure
the constructs for which the survey was designed. When rapid administration and data
quality conflict, however, we lean toward placing priority on acquiring accurate data. An
important way to enhance measurement accuracy is to ask questions
that respondents can easily interpret and answer and that are interpreted similarly by
different respondents. It is also important to ask questions in ways that motivate
respondents to provide accurate answers instead of answering sloppily or intentionally
inaccurately. How can we maximize respondent motivation to provide accurate self‐
reports while minimizing the difficulty of doing so? Two general principles underlie most
of the challenges that researchers face in this regard. They involve (1) understanding the
distinction between “optimizing” and “satisficing,” and (2) accounting for the
conversational norms and conventions that shape the survey response process. We
describe these theoretical perspectives next.
Optimizing and Satisficing
Imagine the ideal survey respondent, whom we'll call an optimizer. Such an individual
goes through four stages in answering each survey question (though not necessarily
strictly sequentially). First, the optimizer reads or listens to the question and attempts to
discern the question's intent (e.g., “the researcher wants to know how often I watch
television programs about politics”). Second, the optimizer searches his or her memory
for information useful to answer the question (e.g., “I guess I usually watch television
news on Monday and Wednesday nights for about an hour at a time, and there's almost
always some political news covered”). Third, the optimizer evaluates the available
information and integrates that information into a summary judgment (e.g., “I watch two
hours of television about politics per week”). Finally, the optimizer answers the question
by translating the summary judgment onto the response alternatives (e.g. by choosing
“between 1 and 4 hours per week”) (Cannell et al. 1981; Krosnick 1991; Schwarz and
Strack 1985; Tourangeau and Rasinski 1988; Turner and Martin 1984).
Given the substantial effort required to execute all the steps of optimizing when
answering every question in a long questionnaire, it is easy to imagine that not every
respondent implements all of the steps fully for every question (Krosnick 1999; Krosnick
and Fabrigar in press). Indeed, more and more research indicates that some individuals
sometimes answer questions using only the most readily available information, or, worse,
look for cues in the question that point toward easy‐to‐select answers and choose them so
as to do as little thinking as possible (Krosnick 1999). The act of abridging the
search for information or skipping it altogether is termed “survey satisficing” and
appears to pose a major challenge to researchers (Krosnick 1991, 1999). When
respondents satisfice, they give researchers answers that are at best loosely related to
the construct of interest and may sometimes be completely unrelated to it.
Research on survey satisficing has revealed a consistent pattern of who satisfices and
when. Respondents are likely to satisfice when the task of answering a particular
question optimally is difficult, when the respondent lacks the skills needed to answer
optimally, or when he or she is unmotivated (Krosnick 1991; Krosnick and Alwin 1987).
Hence, satisficers are individuals who have limited cognitive skills, fail to see sufficient
value in a survey, find a question confusing, or have simply been worn down by a barrage
of preceding questions (Krosnick 1999; Krosnick, Narayan, and Smith 1996; McClendon
1986, 1991; Narayan and Krosnick 1996). These individuals tend to be less educated and
are lower in “need for cognition” than non‐satisficers (Anand, Krosnick, Mulligan, Smith,
Green, and Bizer 2005; Narayan and Krosnick 1996). Importantly, they do not represent a
random subset of the population, and they tend to satisfice in systematic, rather than
stochastic, ways. Hence, to ignore satisficers is to introduce potentially problematic bias
in survey results.
No research has yet identified a surefire way to prevent respondents from satisficing, but
a number of techniques for designing questions and putting them together into
questionnaires seem to reduce the extent to which respondents satisfice (Krosnick 1999).
Questions, therefore, should be designed to minimize the incentives to satisfice and
maximize the efficiency of the survey for optimizers.
Conversational Norms and Conventions
In most interpersonal interactions, participants expect a conversant to follow certain
conversational standards. When people violate these conversational norms and rules,
confusion and misunderstandings often ensue. From this perspective, a variety of
researchers have attempted to identify the expectations that conversants bring to
conversations, so any potentially misleading expectations can be overcome. In his seminal
work Logic and Conversation, Grice (1975) proposed a set of rules that speakers usually
follow and listeners usually assume that speakers follow: that they should be truthful,
meaningfully informative, relevant, and to the point. This perspective highlights a critical
point that survey researchers often ignore: respondents enter all conversations with
expectations, and when researchers violate those expectations (which they often do
unwittingly), measurement accuracy can be compromised (Lipari 2000; Schuman and
Ludwig 1983; Schwarz 1996).
Krosnick, Li, and Lehman (1990) illustrated the impact of conversational norms. They
found the order in which information was presented in a survey question could
substantially change how respondents answered. In everyday conversations, when people
list a series of pieces of information leading to a conclusion, they tend to present what
they think of as the most important information last. When Krosnick et al.'s respondents
were given information and were asked to make decisions with that information, the
respondents placed more weight on the information that was presented last because they
presumed the questioner ascribed most importance to that information. In another study,
Holbrook et al. (2000) presented response options to survey questions in either a normal
(“are you for or against X?”) or unusual (“are you against or for X?”) order. Respondents
whose question used the normal ordering were quicker to respond to the questions and
answered more validly. Thus, breaking the rules of conversation distorts responses and
compromises the quality of answers.
Implications
Taken together, these theoretical perspectives suggest that survey designers should
follow three basic rules. Surveys should:
(1) be designed to make questions as easy as possible for optimizers to answer,
(2) take steps to discourage satisficing, and
(3) be sure not to violate conversational conventions without explicitly saying so, to
avoid confusion and misunderstandings.
The specifics of how to accomplish these three goals are not always obvious. Cognitive
pre‐testing (which involves having respondents restate questions in their own words and
think aloud while answering questions, to highlight misunderstandings that need to be
prevented) is always a good idea (Willis 2004), but many of the specific decisions that
researchers must make when designing questions can be guided by the findings of past
studies on survey methodology. The literature in these areas, reviewed below, provides
useful and frequently counter‐intuitive advice.
Designing Optimal Survey Questions
Open‐ended Questions or Closed Questions?
In the 1930s and 1940s, when modern survey research was born, a debate emerged as to
whether researchers should ask open‐ended questions or should ask respondents
to select among a set of offered response choices (J. M. Converse 1984). Each method had
apparent benefits. Open‐ended questions could capture the sentiments of individuals on
an issue with tones of nuance and without the possibility that offered answer choices
colored respondent selections. Closed questions seemed easier to administer and to
analyze, and more of them could be asked in a similar amount of time (Lazarsfeld 1944).
Perhaps more out of convenience than merit, closed questions eclipsed open‐ended ones
in contemporary survey research. For example, in surveys done by major news media
outlets, open‐ended questions constituted a high of 33 percent of questions in 1936 and
dropped to 8 percent of questions by 1972 (T. Smith 1987).
The administrative ease of closed questions, however, comes with a distinct cost.
Respondents tend to select among offered answer choices rather than selecting “other,
specify” when the latter would be optimal to answer a question with nominal response
options (Belson and Duncan 1962; Bishop et al. 1988; Lindzey and Guest 1951;
Oppenheim 1966; Presser 1990b). If every potential option is offered by such a question,
then this concern is irrelevant. For most questions, however, offering every possible
answer choice is not practical. And when some options are omitted, respondents who
would have selected them choose among the offered options instead, thereby changing
the distribution of responses as compared to what would have been obtained if a
complete list had been offered (Belson and Duncan 1962). Therefore, an open‐ended
format would be preferable in this sort of situation.
Open‐ended questions also discourage satisficing. When respondents are given a closed
question, they might settle for choosing an appropriate‐sounding answer. But open‐ended
questions demand that individuals generate an answer on their own and do not point
respondents toward any particular response, thus inspiring more thought and
consideration. Furthermore, many closed questions require respondents to answer an
open‐ended question in their minds first (e.g., “what is the most important problem facing
the country?”) and then to select the answer choice that best matches that answer.
Skipping the latter, matching step will make the respondent's task easier and thereby
encourage optimizing when answering this and subsequent questions.
Closed questions can also present particular problems when seeking numbers. Schwarz
et al. (1985) manipulated response alternatives for a question gauging amount of
television watching and found striking effects. When “more than 2½ hours” was the
highest category offered, only 16 percent of individuals reported watching that much
television. But when five response categories broke up “more than 2½ hours” into five
sub‐ranges, nearly 40 percent of respondents placed themselves in one of those
categories. This appears to occur because whatever range is in the middle of the set of
offered ranges is perceived to be typical or normal by respondents, and this implicit
message sent by the response alternatives alters people's reports (Schwarz 1995).
Open‐ended questions seeking numbers do not suffer from this potential problem.
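For researchers who do elicit numeric quantities with open‐ended questions, the choice of reporting categories can be deferred to the analysis stage, where it cannot signal to respondents what a "typical" answer is. The following minimal Python sketch, which is not part of the original chapter, illustrates that point; the hour values and category boundaries are hypothetical, chosen only to echo the contrast studied by Schwarz et al. (1985).

from collections import Counter

# Hypothetical open-ended reports of weekly hours of television news viewing.
open_ended_hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0]

def bin_hours(hours, upper_bounds):
    """Assign each open-ended report to the first range whose upper bound it does not exceed."""
    labels = []
    for h in hours:
        for bound in upper_bounds:
            if h <= bound:
                labels.append(f"up to {bound} hours")
                break
        else:
            labels.append(f"more than {upper_bounds[-1]} hours")
    return Counter(labels)

# "Low-range" scheme: the top category is "more than 2.5 hours".
print(bin_hours(open_ended_hours, [0.5, 1.0, 1.5, 2.0, 2.5]))
# "High-range" scheme: the same reports binned into finer categories above 2.5 hours.
print(bin_hours(open_ended_hours, [2.5, 3.0, 3.5, 4.0, 4.5]))

# Because the numbers were collected open-ended, the share above 2.5 hours is the
# same under either scheme; no category layout was shown to respondents.
share_above = sum(1 for h in open_ended_hours if h > 2.5) / len(open_ended_hours)
print(f"Share above 2.5 hours: {share_above:.0%}")

Binning after the fact lets the analyst report whichever categories are useful without the instrument itself nudging respondents toward a particular range.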
The higher validity of open‐ended questions does not mean that every question should be
open‐ended. Open‐ended questions take longer to answer (Wason 1961) and must be
systematically coded by researchers. When the full spectrum of possible nominal responses
is known, closed questions are an especially appealing approach. But when the full
spectrum of answers is not known, or when a numeric quantity is sought (e.g., “during the
last month, how many times did you talk to someone about the election?”), open‐ended
questions are preferable. Before asking a closed question seeking nominal answers,
however, researchers should pre‐test an open‐ended version of the question on the
population of interest, to be sure the offered list of response alternatives is
comprehensive.
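One way to carry out such a pre‐test is to code the open‐ended answers and check how much of their distribution a planned closed list would capture. The sketch below is a hypothetical illustration written for this chapter summary; the category codes, planned list, and 90 percent coverage rule are assumptions, not recommendations drawn from the literature reviewed here.

from collections import Counter

# Coded answers from a hypothetical open-ended pre-test of
# "What do you think is the most important problem facing this country?"
pretest_codes = ["economy", "economy", "health care", "terrorism", "education",
                 "economy", "immigration", "health care", "crime", "economy"]

# The closed-question response list a researcher is considering offering.
planned_choices = {"economy", "health care", "terrorism", "education"}

counts = Counter(pretest_codes)
covered = sum(n for code, n in counts.items() if code in planned_choices)
coverage = covered / len(pretest_codes)

print(f"Planned closed list covers {coverage:.0%} of pre-test answers")
for code, n in counts.most_common():
    note = "" if code in planned_choices else "  <- consider adding, or rely on 'other, specify'"
    print(f"{code}: {n}{note}")

# Hypothetical rule of thumb: if coverage is low, expand the list or keep the item open-ended.
if coverage < 0.90:
    print("Coverage below 90 percent: the offered list is probably not comprehensive.")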
Rating Questions or Ranking Questions?
Rating scale questions are very common in surveys (e.g., the “feeling thermometer” and
“issue importance” questions above). Such questions are useful because they place
respondents on the continua of interest to researchers and are readily suited to statistical
analysis. Furthermore, rating multiple items of a given type can permit comparisons of
evaluations across items (McIntyre and Ryans 1977; Moore 1975; Munson and McIntyre
1979).
Researchers are sometimes interested in obtaining a rank order of objects from
respondents (e.g., rank these candidates from most desirable to least desirable). In such
situations, asking respondents to rank‐order the objects is an obvious measurement
option, but it is quite time‐consuming (Munson and McIntyre 1979). Therefore, it is
tempting to ask respondents instead to rate the objects individually and to derive a rank
order from the ratings.
Unfortunately, though, rating questions sometimes entail a major challenge: when asked
to rate a set of objects on the same scale, respondents sometimes fail to differentiate
their ratings, thus clouding analytic results (McCarty and Shrum 2000). This appears to
occur because some respondents choose to satisfice by non‐differentiating: drawing a
straight line down the battery of rating questions. For example, in one study with thirteen
rating scales, 42 percent of individuals evaluated nine or more of the objects identically
(Krosnick and Alwin 1988). And such non‐differentiation is most likely to occur under the
conditions that foster satisficing (see Krosnick 1999).
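Analysts can screen for this pattern directly. The sketch below, which is not from the chapter, illustrates one hypothetical screening rule: flagging respondents whose most frequent rating covers nine or more of thirteen items, echoing the pattern reported by Krosnick and Alwin (1988). The respondent data and the threshold are illustrative only.

from collections import Counter

def flags_non_differentiation(ratings, threshold=9):
    """Return True if the single most common rating was given to `threshold` or more items."""
    most_common_count = Counter(ratings).most_common(1)[0][1]
    return most_common_count >= threshold

# Hypothetical answers to a thirteen-item rating battery.
respondents = {
    "r1": [50] * 13,                                             # straight-lines the battery
    "r2": [70, 65, 80, 40, 55, 60, 75, 50, 45, 85, 30, 90, 20],  # differentiated ratings
    "r3": [60, 60, 60, 60, 60, 60, 60, 60, 60, 70, 40, 60, 60],  # mostly identical ratings
}

for rid, ratings in respondents.items():
    label = "possible non-differentiator" if flags_non_differentiation(ratings) else "differentiated"
    print(rid, label)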
If giving objects identical ratings is appropriate, rating scales are desirable. But
when researchers are interested in understanding how respondents rank‐order objects
when forced to do so, satisficing‐induced non‐differentiation in ratings yields misleading
data (Alwin and Krosnick 1985). Fortunately, respondents can be asked to rank
objects instead. Although ranking questions take more time, rankings acquire responses
that are less distorted by satisficing and are more reliable and valid than ratings (Alwin
and Krosnick 1985; Krosnick and Alwin 1988; Miethe 1985; Reynolds and Jolly 1980).
Thus, ranking questions are preferable for assessing rank orders of objects.
Rating Scale Points
Although the “feeling thermometer” measure has been used in numerous American
National Election Study surveys, it has obvious drawbacks: the meanings of its many
scale points are neither clear nor uniformly interpreted by respondents. Only nine of the
points have been labeled with words on the show‐card handed to respondents, and a huge
proportion of respondents choose one of those nine points (Weisberg and Miller 1979).
And subjective differences in interpreting response alternatives may mean that one
person's 80 is equivalent to another's 65 (Wilcox, Sigelman, and Cook 1989). Therefore,
this very long and ambiguous rating scale introduces considerable error into analysis.
Although 101 points is far too many for a meaningful scale, providing only two or three
response choices for a rating scale can make it impossible for respondents to provide
evaluations at a sufficiently refined level to communicate their perceptions (Alwin 1992).
Too few response alternatives can provide a particular challenge for optimizers who
attempt to map complex opinions onto limited answer choices. A large body of research
has gone into assessing the most effective number of options to offer respondents (Alwin
1992; Alwin and Krosnick 1985; Cox 1980; Lissitz and Green 1975; Lodge and Tursky
1979; Matell and Jacoby 1972; Ramsay 1973; Schuman and Presser 1981). Ratings tend
to be more reliable and valid when five points are offered for unipolar dimensions (e.g.,
“not at all important” to “extremely important”; Lissitz and Green 1975) and seven points
for bipolar dimensions (e.g., “dislike a great deal” to “like a great deal”; Green and Rao
1970).
Another drawback of the “feeling thermometer” scale is its numerical scale point labels.
Labels are meant to improve respondent interpretation of scale points, but the meanings
of most of the numerically labeled scale points are unclear. It is therefore preferable to
put verbal labels on all rating scale points to clarify their intended meanings, which
increases the reliability and validity of ratings (Krosnick and Berent 1993). Providing
numeric labels in addition to the verbal labels increases respondents' cognitive burden
but does not increase data quality and in fact can mislead respondents about the intended
meanings of the scale points (e.g., Schwarz et al. 1991). Verbal labels with meanings that
are not equally spaced from one another can cause respondent confusion (Klockars and
Yamagishi 1988), so the selected verbal labels should have equally spaced meanings
(Hofmans et al. 2007; Schwarz, Grayson, and Knauper 1998; Wallsten et al. 1986).
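As a concrete illustration, fully verbally labeled scales of the recommended lengths might be represented as follows. This sketch is not from the chapter; the particular label wordings and the data structure are hypothetical conveniences rather than prescriptions from the studies cited above.

# A five-point unipolar scale with full verbal labels on every point.
UNIPOLAR_IMPORTANCE_5 = [
    "not at all important",
    "not too important",
    "somewhat important",
    "very important",
    "extremely important",
]

# A seven-point bipolar scale with full verbal labels on every point.
BIPOLAR_LIKING_7 = [
    "dislike a great deal",
    "dislike a moderate amount",
    "dislike a little",
    "neither like nor dislike",
    "like a little",
    "like a moderate amount",
    "like a great deal",
]

def build_question(text, labels):
    """Pair a question stem with fully verbally labeled options; no numeric labels are shown."""
    return {"text": text, "options": labels}

q = build_question("How important is this issue to you personally?", UNIPOLAR_IMPORTANCE_5)
print(q["text"])
for option in q["options"]:
    print(" -", option)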
“Don't Know” Options and Attitude Strength
Although some questionnaire designers advise that opinion questions offer respondents
the opportunity to say they do not have an opinion at all (e.g., Vaillancourt 1973), others
do not advise including “don't know” or “no opinion” response options (Krosnick et al.
2002). And most major survey research firms have routinely trained their interviewers to
probe respondents when they say “don't know” to encourage them to offer a substantive
answer instead. The former advice is sometimes justified by claims that respondents may
sometimes be unfamiliar with the issue in question or may not have enough information
about it to form a legitimate opinion (e.g., P. Converse 1964). Other supportive evidence
has shown that people sometimes offer opinions about extremely obscure or fictitious
issues, thus suggesting that they are manufacturing non‐attitudes instead of confessing
ignorance (e.g., Bishop, Tuchfarber, and Oldendick 1986; Hawkins and Coney 1981;
Schwarz 1996).
In contrast, advice to avoid offering “don't know” options is justified by the notion that
such options can encourage satisficing (Krosnick 1991). Consistent with this argument,
when answering political knowledge quiz questions, respondents who are encouraged to
guess after initially saying “don't know” tend to give the correct answer at better‐than‐
chance rates (Mondak and Davis 2001). Similarly, candidate preferences predict actual
votes better when researchers discourage “don't know” responses (Krosnick et al. 2002;
Visser et al. 2000). Thus, discouraging “don't know” responses collects more valid data
than does encouraging such responses. And respondents who truly are completely
unfamiliar with the topic of a question will say so when probed, and that answer can be
accepted at that time, thus avoiding collecting measurements of non‐existent “opinions.”
Thus, because many people who initially say “don't know” do indeed have a substantive
opinion, researchers are best served by discouraging these responses in surveys.
Converse (1964) did have an important insight, though. Not all people who express an
opinion hold that view equally strongly, based upon equal amounts of information and
thought. Instead, attitudes vary in their strength. A strong attitude is very difficult to
change and has powerful impact on a person's thinking and action. A weak attitude is
easy to change and has little impact on anything. To understand the role that attitudes
play in governing a person's political behavior, it is valuable to understand the strength of
those attitudes. Offering a “don't know” option is not a good way to identify weak
attitudes. Instead, it is best to ask follow‐up questions intended to diagnose the strength
of an opinion after it has been reported (see Krosnick and Abelson 1992).
Acquiescence Response Bias
In everyday conversations, norms of social conduct dictate that people should strive to be
agreeable (Brown and Levinson 1987). In surveys when researchers ask questions, they
mean to invite all possible responses, even when asking respondents whether they agree
or disagree with a statement offered by a question. “Likert scales” is the label often used
to describe the agree–disagree scales that are used in many surveys these days. Such
scales are appreciated by both designers and respondents because they speed up the
interview process. Unfortunately, though, respondents are biased toward agreement.
Some 10–20 percent of respondents tend to agree with both a statement and its opposite
(e.g., Schuman and Presser 1981). This tendency toward agreeing is known as
acquiescence response bias and may occur for a variety of reasons. First, conversational
conventions dictate that people should be agreeable and polite (Bass 1956; Campbell et
al. 1960). Second, people tend to defer to individuals of higher authority (a position they
assume the researcher holds) (Carr 1971; Lenski and Leggett 1960). Additionally, a
person inclined to satisfice is more likely to agree with a statement than to disagree
(Krosnick 1991).
Whatever the cause, acquiescence presents a major challenge for researchers. Consider,
for example, the ANES question measuring internal efficacy. If certain respondents are
more likely to agree with any statement regardless of its content, then these individuals
will appear to believe that government and politics are too complicated to understand,
even if that is not their view. And any correlations between this question and other
questions could be due to associations with the individual's actual internal efficacy or his
or her tendency to acquiesce (Wright 1975).
Agree–disagree rating scales are extremely popular in social science research, and
researchers rarely take steps to minimize the impact of acquiescence on research
findings. One such step is to balance batteries of questions, such that affirmative answers
indicate a high level of the construct for half the items and a low level of the construct for
the other half, thus placing acquiescers at the midpoint of the final score's continuum
(Bass 1956; Cloud and Vaughan 1970). Unfortunately, this approach simply moves
acquiescers from the agree end of a rating scale (where they don't necessarily belong) to the
midpoint of the final score's continuum (where they also don't necessarily belong).
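The following sketch illustrates the arithmetic of such a balanced battery, including the limitation just described: a respondent who agrees with both statements lands exactly at the scale midpoint. Only the first statement comes from the ANES internal efficacy item quoted earlier; the second, positively keyed item and the scoring convention are hypothetical inventions for this illustration.

AGREE, DISAGREE = 1, 0

# The first statement is the ANES internal efficacy item quoted earlier; the second,
# positively keyed statement is a hypothetical counterpart written for this sketch.
items = [
    {"text": "Sometimes politics and government seem so complicated that a person "
             "like me can't really understand what's going on.", "reverse": False},
    {"text": "People like me can generally understand what's going on in government "
             "and politics.", "reverse": True},
]

def complexity_score(answers, items):
    """Average agree (1) / disagree (0) answers after reverse-coding the positively keyed
    item, so higher scores indicate stronger endorsement of the view that politics is
    too complicated to understand."""
    scored = [1 - a if item["reverse"] else a for a, item in zip(answers, items)]
    return sum(scored) / len(scored)

# An acquiescer who agrees with both (logically opposite) statements scores 0.5,
# the midpoint of the balanced scale, rather than being removed from it.
print(complexity_score([AGREE, AGREE], items))
# A consistent respondent who agrees with the first item and disagrees with the second scores 1.0.
print(complexity_score([AGREE, DISAGREE], items))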
A more effective solution becomes apparent when we recognize first that answering an
agree–disagree question is more cognitively demanding than answering a question that
offers construct‐specific response alternatives. This is so because in order to answer most
agree–disagree questions (e.g., “Sometimes politics is so complicated that I can't
understand it”), the respondent must answer a construct‐specific version of it in his or
her own mind (“How often is politics so complicated that I can't understand it?”) and then
translate the answer onto the agree–disagree response continuum. And in this translation
process, a person might produce an answer that maps onto the underlying construct in a
way the researcher would not anticipate. For example, a person might disagree with the
statement, “Sometimes politics is so complicated that I can't understand it,” either
because politics is never that complicated or because politics is always that complicated.
Thus, the agree–disagree continuum would not be monotonically related to the construct
of interest. For all these reasons, it is preferable simply to ask questions with construct‐
specific response alternatives.
Yes/No questions and True/False questions are also subject to acquiescence response bias
(Fritzley and Lee 2003; Schuman and Presser 1981). In these cases, a simple fix involves
changing the question so that it explicitly offers all possible views. For example, instead
of asking “Do you think abortion should be legal?” one can ask “Do you think abortion
should or should not be legal?”
Response Order Effects
Another form of satisficing is choosing the first plausible response option one considers,
which produces what are called response order effects (Krosnick 1991, 1999; Krosnick
and Alwin 1987). Two types of response order effects are primacy effects and recency
effects. Primacy effects occur when respondents are inclined to select response options
presented near the beginning of a list (Belson 1966). Recency effects occur when
respondents are inclined to select options presented at the end of a list (Kalton, Collins,
and Brook 1978). When categorical (non‐rating scale) response options are presented
visually, primacy effects predominate. When categorical response options are presented
orally, recency effects predominate. When rating scales are presented, primacy effects
predominate in both the visual and oral modes. Response order effects are most likely to
occur under the conditions that foster satisficing (Holbrook et al. 2007).
One type of question that can minimize response order effects is the seemingly open‐
ended question (SOEQ). SOEQs separate the question from the response alternatives
with a short pause to encourage individuals to optimize. Instead of asking, “If the election
were held today, would you vote for Candidate A or Candidate B?,” response order effects
can be reduced by asking, “If the election were held today, whom would you vote
for? Would you vote for Candidate A or Candidate B?” The pause after the question and
before the answer choices encourages respondents to contemplate, as if they were answering
an open‐ended question, and then offers the list of possible answers to respondents
(Holbrook et al. 2007). By rotating response order or using SOEQs, researchers can
prevent the order of the response options from coloring results.
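In computerized instruments, rotation is straightforward to implement. The sketch below, not drawn from the chapter, shows one hypothetical way to randomize response order for each respondent and to record the order that was displayed so that primacy and recency effects can be examined later; the candidate names and storage format are placeholders.

import random

def present_with_random_order(question_text, options, rng=random):
    """Shuffle a copy of the options and return the stem together with the order displayed."""
    displayed = options[:]   # copy so the canonical option list is left untouched
    rng.shuffle(displayed)
    return {"question": question_text, "displayed_order": displayed}

stem = "If the election were held today, whom would you vote for?"
options = ["Candidate A", "Candidate B"]

# Each respondent receives an independently randomized order; storing the displayed
# order allows the analyst to test for primacy or recency effects afterward.
for respondent_id in range(3):
    print(respondent_id, present_with_random_order(stem, options)["displayed_order"])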
Response order effects do not only happen in surveys. They occur in elections as well. In
a series of natural experiments, Brook and Upton (1974), Krosnick and Miller (1998),
Koppell and Steen (2004), and others found consistent patterns indicating that a few
voters choose the first name on the ballot, giving that candidate an advantage of about 3
percent on average. Some elections are decided by less than 3 percent of the vote, so
name order can alter an election outcome. When telephone survey questions mirror the
name order on the ballot, those surveys are likely to manifest a recency effect, which
would run in the direction opposite to what would be expected in the voting booth, thus
creating error in predicting the election outcome. Many survey firms rotate candidate
name order to control for potential effects, but this will maximize forecast accuracy only
in states, such as Ohio, that rotate candidate name order in voting booths.
Question Order Effects
In 1948, a survey asked Americans whether Communist news reporters should be allowed
in the United States and found that the majority (63 percent) said “no.” Yet in another
survey, an identical question found 73 percent of Americans believed that Communist
reporters should be allowed. This discrepancy turned out to be attributable to the impact
of the question that preceded the target question in the latter survey. In an experiment, a
majority of Americans said “yes” only when the item immediately followed a question
about whether American reporters should be allowed in Russia. Wanting to appear
consistent and attuned to the norm of even‐handedness after hearing the initial question,
respondents were more willing to allow Communist reporters into the US (Schuman and
Presser 1981).
A variety of other types of question order effects have been identified. Subtraction occurs
when two nested concepts are presented in sequence (e.g., George W. Bush and the
Republican Party) as items for evaluation. When a question about the Republican Party
follows a question about George W. Bush, respondents assume that the questioner does
not want them to include their opinion of Bush in their evaluations of the GOP (Schuman,
Presser, and Ludwig 1981). Perceptual contrast occurs when one rating follows another,
and the second rating is made in contrast to the first one. For example,
respondents who dislike George Bush may be inclined to offer a more favorable rating of
John McCain when a question about McCain follows Bush than when the question about
McCain is asked first (Schwarz and Bless 1992; Schwarz and Strack 1991). And priming
occurs when questions earlier in the survey increase the salience of certain attitudes or
beliefs in the mind of the respondent (e.g., preceding questions about abortion may make
respondents more likely to evaluate George W. Bush based on his abortion views) (Kalton
et al. 1978). Also, asking questions later in a long survey enhances the likelihood that
respondents will satisfice (Krosnick 1999).
Unfortunately, it is impossible to prevent question order effects. Rotating the order of
questions across respondents might seem sensible, but doing so may cause question topics
to jump around in ways that do not seem sensible to respondents and that tax their
memories (Silver and Krosnick 1991). And rotating question order will not
make question order effects disappear. Therefore, the best researchers can do is to use
past research on question order effects as a basis for being attentive to possible question
order effects in a new questionnaire.
Attitude Recall
It would be very helpful to researchers if respondents could remember the opinions they
held at various times in the past and describe them accurately in surveys. Unfortunately,
this is rarely true. People usually have no recollection of how they thought about things at
previous times. When asked, they will happily guess, and their guesses are strongly
biased—people tend to assume they always believed what they believe today (Goethals
and Reckman 1973; Roberts 1985). Consequently, attitude recall questions can produce
wildly inaccurate results (T. Smith 1984). Because of the enormous amount of error
associated with these questions, they cannot be used for statistical analyses. Instead,
attitude change must be assessed prospectively. Only by measuring attitudes at multiple
time points is it possible to gain an accurate understanding of attitude change.
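In practice, prospective measurement means linking the same respondents across waves and computing change scores directly, as in the minimal sketch below, which is not from the chapter; the respondent identifiers and ratings are invented for illustration.

# Hypothetical seven-point policy attitude measured for the same respondents in two waves.
wave1 = {"r1": 4, "r2": 6, "r3": 2}   # measured before the campaign
wave2 = {"r1": 5, "r2": 3, "r3": 2}   # the same item measured after the campaign

# Change is computed directly from the two measurements, not from respondents' recollections.
change = {rid: wave2[rid] - wave1[rid] for rid in wave1 if rid in wave2}
print(change)                                               # per-respondent change scores
print(sum(abs(v) for v in change.values()) / len(change))   # mean absolute change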
The Danger of Asking “Why?”
Social science spends much of its time determining causality. Instead of running dozens
of statistical studies and spending millions of dollars, it might seem much more efficient
simply to ask people to describe the reasons for their thoughts and actions (Lazarsfeld
1935). Unfortunately, respondents rarely know why they think and act as they do (Nisbett
and Wilson 1977; E. R. Smith and Miller 1978; Wilson and Dunn 2004; Wilson and Nisbett
1978). People are willing to guess when asked, but their guesses are rarely
informed by any genuine self‐insight and are usually no more accurate than would be
guesses about why someone else thought or acted as they did. Consequently, it is best not
to ask people to explain why they think or act in particular ways.
Social Desirability
Some observers of questionnaire data are skeptical of their value because they suspect
that respondents may sometimes intentionally lie in order to appear more socially
admirable, thus manifesting what is called social desirability response bias. Many
observers have attributed discrepancies between survey reports of voter turnout and
official government turnout figures to intentional lying by survey respondents (Belli,
Traugott, and Beckmann 2001; Silver, Anderson, and Abramson 1986). Rather than
appearing not to fulfill their civic duty, some respondents who did not vote in an election
are thought to claim that they did so. Similar claims have been made about reports of
illegal drug use and racial stereotyping (Evans, Hansen, and Mittelmark 1977; Sigall and
Page 1971).
A range of techniques have been developed to assess the scope of social desirability
effects and to reduce the likelihood that people's answers are distorted by social norms.
These methods either assure respondents that their answers will be kept confidential or
seek to convince respondents that the researcher can detect lies—making it pointless not
to tell the truth (see Krosnick 1999). Interestingly, although these techniques have often
revealed evidence of social desirability response bias, the amount of distortion is
generally small. Even for voting, where social desirability initially seemed likely to occur,
researchers have sometimes found lower voting rates when people can report secretly
but no large universal effect (Abelson, Loftus, and Greenwald 1992; Duff et al. 2007;
Holbrook and Krosnick in press; Presser 1990a). Even after accounting for social
desirability response bias, surveyed turnout rates remain above those reported in
government records.
A number of other errors are likely to contribute to overestimation of voter turnout. First,
official turnout records contain errors, and those errors are more likely to be omissions of
individuals who did vote than inclusions of individuals who did not vote (Presser,
Traugott, and Traugott 1990). Second, many individuals who could have voted but did not
fall outside of survey sampling frames (Clausen 1968–1969; McDonald 2003; McDonald
and Popkin 2001). Third, individuals who choose not to participate in a political survey
are less likely to vote than individuals who do participate (Burden 2000; Clausen 1968–
1969). Fourth, individuals who were surveyed just before an election may be made more
likely to vote as the result of the interview experience (Kraut and McConahay 1973;
Traugott and Katosh 1979). Surveys like the ANES could overestimate turnout
partially because follow‐up surveys are conducted with individuals who had already been
interviewed (Clausen 1968–1969). All of these factors may explain why survey results do
not match published voter turnout figures.
Another reason for apparent overestimation of turnout by surveys may be acquiescence,
because answering “yes” to a question about voting usually indicates having done so
(Abelson, Loftus, and Greenwald 1992). In addition, respondents who usually vote may not
recall that, in a specific instance, they failed to do so (Belli, Traugott, and Beckmann
2001; Belli, Traugott, and Rosenstone 1994; Belli et al. 1999). Each of these alternative
explanations has received some empirical support. So although social desirability may be
operating, especially in telephone interviews, it probably accounts for only a small
portion of the overestimation of turnout rates.
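When researchers wish to quantify the discrepancy, the basic arithmetic is simply a comparison of a (weighted) estimate of self-reported turnout with an official benchmark. The sketch below is purely illustrative and not part of the chapter; the self-reports, weights, and benchmark figure are hypothetical placeholders rather than real data.

# Hypothetical self-reports (1 = says they voted) and post-stratification weights.
reported_vote = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
weights       = [1.0, 0.8, 1.2, 1.0, 0.9, 1.1, 1.3, 1.0, 0.7, 1.0]

survey_estimate = sum(w * v for w, v in zip(weights, reported_vote)) / sum(weights)
official_rate = 0.55   # hypothetical benchmark from official turnout records

print(f"Survey estimate of turnout: {survey_estimate:.1%}")
print(f"Official turnout rate:      {official_rate:.1%}")
print(f"Gap to be explained by overreporting, frame coverage, nonresponse, and the like: "
      f"{survey_estimate - official_rate:+.1%}")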
Question Wording
Although much of questionnaire design should be considered a science rather than an art,
the process of selecting words for a question is thought to be artistic and intuitive. A
question's effectiveness can easily be undermined by long, awkward wording that taps
multiple constructs. Despite the obvious value of pithy, easy‐to‐understand queries,
questionnaire designers sometimes offer tome‐worthy introductions. One obvious
example is the preamble for the “feeling thermometer.” When tempted to use such a long
and complicated introduction, researchers should strive for brevity.
Choices of words for questions are worth agonizing over, because even very small
changes can produce sizable differences in responses. In one study, for instance, 73
percent of respondents said they strongly or somewhat “favored” policies on average,
whereas only 45 percent strongly or somewhat “supported” the same policies (Krosnick
1989). Many studies have produced findings showing that differences in word choice can
change individuals' responses remarkably (e.g., Rugg 1941). But this does not mean that
respondents are arbitrary or fickle. The choice of a particular word or phrase can change
the perceived meaning of a question in sensible ways and therefore change the judgment
that is reported. Therefore, researchers should be very careful to select words tapping
the exact construct they mean to measure.
Conclusion
Numerous studies of question construction suggest a roadmap of best practices.
Systematic biases caused by satisficing and the violation of conversational conventions
can distort responses, and researchers have both the opportunity and ability to
minimize those errors. These problems therefore are mostly those of design. That is, they
can generally be blamed on the researcher, not on the respondent. And fortunately,
intentional lying by respondents appears to be very rare and preventable by using
creative techniques to assure anonymity. So again, accuracy is attainable.
The American National Election Study questionnaires include a smorgasbord of some
good and many suboptimal questions. Despite these shortcomings, those survey questions
nonetheless offer a window into political attitudes and behaviors that would be impossible
to achieve through any other research design. Nonetheless, scholars designing their own
surveys should not presume that previously written questions are the best ones to use.
Applying best practices in questionnaire design will yield more accurate data and more
accurate substantive findings about the nature and origins of mass political behavior.
References
ABELSON, R. P., LOFTUS, E., and GREENWALD, A. G. 1992. Attempts to Improve the
Accuracy of Self‐Reports of Voting. In Questions About Questions: Inquiries into the
Cognitive Bases of Surveys, ed. J. M. Tanur. New York: Russell Sage Foundation.
ALLPORT, F. H. 1940. Polls and the Science of Public Opinion. Public Opinion Quarterly,
4/2: 249–57.
ALLPORT, G. W. 1929. The Composition of Political Attitudes. American Journal of
Sociology, 35/2: 220–38.
ALWIN, D. F. 1992. Information Transmission in the Survey Interview: Number of
Response Categories and the Reliability of Attitude Measurement. Sociological
Methodology, 22: 83–118.
—— and KROSNICK, J. A. 1985. The Measurement of Values in Surveys: A Comparison of
Ratings and Rankings. Public Opinion Quarterly, 49/4: 535–52.
AMERICAN NATIONAL ELECTION STUDIES 2004. The 2004 National Election Study
[codebook]. Ann Arbor, MI: University of Michigan, Center for Political Studies (producer
and distributor). <http://www.electionstudies.org>.
ANAND, S., KROSNICK, J. A., MULLIGAN, K., SMITH, W., GREEN, M., and BIZER, G.
2005. Effects of Respondent Motivation and Task Difficulty on Nondifferentiation in
Ratings: A Test of Satisficing Theory Predictions. Paper presented at the American
Association for Public Opinion Research Annual Meeting, Miami, Florida.
BASS, B. M. 1956. Development and Evaluation of a Scale for Measuring Social
Acquiescence. Journal of Abnormal and Social Psychology, 53/3: 296–9.
BELLI, R. F., TRAUGOTT, M. W., and BECKMANN, M. N. 2001. What Leads to Voting
Overreports? Contrasts of Overreporters to Validated Voters and Admitted Nonvoters in
the American National Election Studies. Journal of Official Statistics, 17/4: 479–98.
BELLI, R. F., TRAUGOTT, S., and ROSENSTONE, S. J. 1994. Reducing Over‐Reporting
of Voter Turnout: An Experiment Using a “Source Monitoring” Framework. In NES
Technical Reports.
—— TRAUGOTT, M. W., YOUNG, M., and MCGONAGLE, K. A. 1999. Reducing
Vote Overreporting in Surveys: Social Desirability, Memory Failure, and Source
Monitoring. Public Opinion Quarterly, 63/1: 90–108.
BELSON, W. A. 1966. The Effects of Reversing the Presentation Order of Verbal Rating
Scales. Journal of Advertising Research, 6: 30–7.
—— and DUNCAN, J. A. 1962. A Comparison of the Check‐List and The Open Response
Questioning Systems. Applied Statistics, 11/2: 120–32.
BISHOP, G. F., HIPPLER, H.‐J., SCHWARZ, N., and STRACK, F. 1988. A Comparison of
Response Effects in Self‐Administered and Telephone Surveys. In Telephone Survey
Methodology, ed. R. M. Groves, P. P. Biemer, L. E. Lyberg, J. T. Massey, W. L. Nicholls II,
and J. Waksberg. New York: Wiley.
—— TUCHFARBER, A. J., and OLDENDICK, R. W. 1986. Opinions on Fictitious Issues:
The Pressure to Answer Survey Questions. Public Opinion Quarterly, 50/2: 240–50.
BROOK, D. and UPTON, G. J. G. 1974. Biases in Local Government Elections Due to
Position on the Ballot Paper. Applied Statistics, 23/3: 414–19.
BROWN, P. and LEVINSON, S. C. 1987. Politeness: Some Universals in Language Usage.
New York: Cambridge University Press.
BURDEN, B. C. 2000. Voter Turnout and the National Election Studies. Political Analysis,
8/4: 389–98.
CAMPBELL, A., CONVERSE, P. E., MILLER, W. E., and STOKES, D. 1960. The
American Voter: Unabridged Edition. New York: Wiley.
CANNELL, C. F., MILLER, P. V., and OKSENBERG, L. 1981. Research on Interviewing
Techniques. Sociological Methodology, 12: 389–437.
CARR, L. G. 1971. The Srole Items and Acquiescence. American Sociological Review,
36/2: 287–93.
CLAUSEN, A. R. 1968–1969. Response Validity: Vote Report. Public Opinion Quarterly,
32/4: 588–606.
CLOUD, J. and VAUGHAN, G. M. 1970. Using Balanced Scales to Control Acquiescence.
Sociometry, 33/2: 193–202.
CONVERSE, J. M. 1984. Strong Arguments and Weak Evidence: The Open/Closed
Questioning Controversy of the 1940s. Public Opinion Quarterly, 48/1: 267–82.
CONVERSE, P. E. 1964. The Nature of Belief Systems in Mass Publics. In Ideology and
Discontent, ed. D. Apter. New York: Free Press.
COX, E. P., III. 1980. The Optimal Number of Response Alternatives for a Scale: A Review.
Journal of Marketing Research, 17/4: 407–22.
DAHL, R. A. 1961. The Behavioral Approach in Political Science: Epitaph for a Monument
to a Successful Protest. American Political Science Review, 55/4: 763–72.
DUFF, B., HANMER, M. J., PARK, W.‐H., and WHITE, I. K. 2007. Good Excuses:
Understanding Who Votes With An Improved Turnout Question. Public Opinion Quarterly,
71/1: 67–90.
EVANS, R. I., HANSEN, W. B., and MITTELMARK, M. B. 1977. Increasing the Validity of
Self‐Reports of Smoking Behavior in Children. Journal of Applied Psychology, 62/4: 521–3.
FRITZLEY, V. H. and LEE, K. 2003. Do Young Children Always Say Yes to Yes–No
Questions? A Metadevelopmental Study of the Affirmation Bias. Child Development, 74/5:
1297–313.
GOETHALS, G. R. and RECKMAN, R. F. 1973. Perception of Consistency in Attitudes.
Journal of Experimental Social Psychology, 9: 491–501.
GREEN, P. E. and RAO, V. R. 1970. Rating Scales and Information Recovery. How
Many Scales and Response Categories to Use? Journal of Marketing, 34/3: 33–9.
GRICE, H. P. 1975. Logic and Conversation. In Syntax and Semantics, Volume 3, Speech
Acts, ed. P. Cole and J. L. Morgan. New York: Academic Press.
HAWKINS, D. I. and CONEY, K. A. 1981. Uninformed Response Error in Survey
Research. Journal of Marketing Research, 18/3: 370–4.
HOFMANS, J., THEUNS, P., BAEKELANDT, S., MAIRESSE, O., SCHILLEWAERT, N.,
and COOLS, W. 2007. Bias and Changes in Perceived Intensity of Verbal Qualifiers
Effected by Scale Orientation. Survey Research Methods, 1/2: 97–108.
HOLBROOK, A. L. and KROSNICK, J. A. In Press. Social Desirability Bias in Voter
Turnout Reports: Tests Using the Item Count Technique. Public Opinion Quarterly.
—— —— CARSON, R. T., and MITCHELL, R. C. 2000. Violating Conversational
Conventions Disrupts Cognitive Processing of Attitude Questions. Journal of Experimental
Social Psychology, 36: 465–94.
—— —— MOORE, D., and TOURANGEAU, R. 2007. Response Order Effects in
Dichotomous Categorical Questions Presented Orally: The Impact of Question and
Respondent Attributes. Public Opinion Quarterly, 71/3: 325–48.
KALTON, G., COLLINS, M. and BROOK, L. 1978. Experiments in Wording Opinion
Questions. Applied Statistics, 27/2: 149–61.
KLOCKARS, A. J. and YAMAGISHI, M. 1988. The Influence of Labels and Positions in
Rating Scales. Journal of Educational Measurement, 25/2: 85–96.
KOPPELL, J. G. S. and STEEN, J. A. 2004. The Effects of Ballot Position on Election
Outcomes. Journal of Politics, 66/1: 267–81.
KRAUT, R. E. and MCCONAHAY, J. B. 1973. How Being Interviewed Affects Voting: An
Experiment. Public Opinion Quarterly, 37/3: 398–406.
KROSNICK, J. A. 1989. Question Wording and Reports of Survey Results: The Case of
Louis Harris and Aetna Life and Casualty. Public Opinion Quarterly, 53: 107–13.
—— 1991. Response Strategies for Coping with the Cognitive Demands of Attitude
Measures in Surveys. Applied Cognitive Psychology, 5: 213–36.
—— 1999. Maximizing Questionnaire Quality. In Measures of Political Attitudes, ed. J. P.
Robinson, P. R. Shaver, and L. S. Wrightsman. New York: Academic Press.
—— 1999. Survey Research. Annual Review of Psychology, 50: 537–67.
—— and ABELSON, R. P. (1992). The Case for Measuring Attitude Strength in Surveys. In
Questions About Questions: Inquiries into the Cognitive Bases of Surveys, ed. J. M. Tanur.
New York: Russell Sage Foundation.
—— and ALWIN, D. F. 1987. An Evaluation of a Cognitive Theory of Response‐Order
Effects in Survey Measurement. Public Opinion Quarterly, 51/2: 201–19.
—— ——. 1988. A Test of the Form‐Resistant Correlation Hypothesis: Ratings, Rankings,
and the Measurement of Values. Public Opinion Quarterly, 52/4: 526–38.
—— and BERENT, M. K. 1993. Comparisons of Party Identification and Policy
Preferences: The Impact of Survey Question Format. American Journal of Political
Science, 37/3: 941–64.
—— and FABRIGAR, L. R. In press. The Handbook of Questionnaire Design. New York:
Oxford University Press.
—— LI, F., and LEHMAN, D. R. 1990. Conversational Conventions, Order of Information
Acquisition, and the Effect of Base Rates and Individuating Information on Social
Judgments. Journal of Personality and Social Psychology, 59: 1140–52.
—— and MILLER, J. M. 1998. The Impact of Candidate Name Order on Election
Outcomes. Public Opinion Quarterly, 62/3: 291–330.
—— NARAYAN, S., and SMITH, W. R. 1996. Satisficing in Surveys: Initial Evidence. New
Directions for Program Evaluation, 70: 29–44.
—— HOLBROOK, A. L., BERENT, M. K., CARSON, R. T., HANEMANN, W. M., KOPP, R.
J., MITCHELL, R. C., PRESSER, S., RUUD, P. A., SMITH, V. K., MOODY, W. R., GREEN,
M. C., and CONAWAY, M. 2002. The Impact of “No Opinion” Response Options on Data
Quality: Non‐Attitude Reduction or an Invitation to Satisfice? Public Opinion Quarterly,
66: 371–403.
LAZARSFELD, P. F. 1935. The Art of Asking Why: Three Principles Underlying the
Formulation of Questionnaires. National Marketing Review, 1: 32–43.
—— 1944. The Controversy Over Detailed Interviews—An Offer for Negotiation. Public
Opinion Quarterly, 8/1: 38–60.
—— and ROSENBERG, M. 1949–1950. The Contribution of the Regional Poll to Political
Understanding. Public Opinion Quarterly, 13/4: 569–86.
LENSKI, G. E. and LEGGETT, J. C. 1960. Caste, Class, and Deference in the Research
Interview. American Journal of Sociology, 65/5: 463–7.
LINDZEY, G. E. and GUEST, L. 1951. To Repeat—Check Lists Can Be Dangerous. Public
Opinion Quarterly, 15/2: 355–8.
LIPARI, L. 2000. Toward a Discourse Approach to Polling. Discourse Studies, 2/2: 187–
215.
LISSITZ, R. W. and GREEN, S. B. 1975. Effect of the Number of Scale Points on
Reliability: A Monte Carlo Approach. Journal of Applied Psychology, 60/1: 10–13.
LODGE, M. and TURSKY, B. 1979. Comparisons between Category and Magnitude
Scaling of Political Opinion Employing SRC/CPS Items. American Political Science
Review, 73/1: 50–66.
MATELL, M. S. and JACOBY, J. 1972. Is There an Optimal Number of Alternatives for
Likert‐scale Items? Effects of Testing Time and Scale Properties. Journal of Applied
Psychology, 56/6: 506–9.
MCCARTY, J. A. and SHRUM, L. J. 2000. The Measurement of Personal Values in Survey
Research. Public Opinion Quarterly, 64/3: 271–98.
MCCLENDON, M. J. 1986. Response‐Order Effects for Dichotomous Questions. Social
Science Quarterly, 67: 205–11.
—— 1991. Acquiescence and Recency Response‐Order Effects in Interview Surveys.
Sociological Methods & Research, 20/1: 60–103.
MCDONALD, M. P. 2003. On the Overreport Bias of the National Election Study Turnout
Rate. Political Analysis, 11: 180–6.
—— and POPKIN, S. L. 2001. The Myth of the Vanishing Voter. American Political Science
Review, 95/4: 963–74.
MCINTYRE, S. H. and RYANS, A. B. 1977. Time and Accuracy Measures for Alternative
Multidimensional Scaling Data Collection Methods: Some Additional Results. Journal of
Marketing Research, 14/4: 607–10.
MERRIAM, C. E. 1926. Progress in Political Research. American Political Science
Review, 20/1: 1–13.
MIETHE, T. D. 1985. Validity and Reliability of Value Measurements. Journal of
Psychology, 119/5: 441–53.
MONDAK, J. J. and DAVIS, B. C. 2001. Asked and Answered: Knowledge Levels When We
Will Not Take “Don't Know” for an Answer. Political Behavior, 23/3: 199–222.
MOORE, M. 1975. Rating Versus Ranking in the Rokeach Value Survey: An Israeli
Comparison. European Journal of Social Psychology, 5/3: 405–8.
MUNSON, J. M. and MCINTYRE, S. H. 1979. Developing Practical Procedures for the
Measurement of Personal Values in Cross‐Cultural Marketing. Journal of Marketing
Research, 16/1: 48–52.
NARAYAN, S. and KROSNICK, J. A. 1996. Education Moderates Some Response Effects
in Attitude Measurement. Public Opinion Quarterly, 60/1: 58–88.
NISBETT, R. E. and WILSON, T. D. 1977. Telling More Than We Can Know: Verbal
Reports on Mental Processes. Psychological Review, 84/3: 231–59.
OPPENHEIM, A. N. 1966. Questionnaire Design and Attitude Measurement. New York:
Basic Books.
PAYNE, S. L. 1951. The Art of Asking Questions. Princeton, N.J.: Princeton University
Press.
PRESSER, S. 1990a. Can Changes in Context Reduce Vote Overreporting in Surveys?
Public Opinion Quarterly, 54/4: 586–93.
—— 1990b. Measurement Issues in the Study of Social Change. Social Forces, 68/3: 856–
68.
—— TRAUGOTT, M. W., and TRAUGOTT, S. 1990. Vote “Over” Reporting in Surveys:
The Records or the Respondents? In International Conference on Measurement Errors.
Tucson, Ariz.
RAMSAY, J. O. 1973. The Effect of Number of Categories in Rating Scales on Precision of
Estimation of Scale Values. Psychometrika, 38/4: 513–32.
REYNOLDS, T. J. and JOLLY, J. P. 1980. Measuring Personal Values: An Evaluation of
Alternative Methods. Journal of Marketing Research, 17/4: 531–6.
ROBERTS, J. V. 1985. The Attitude‐Memory Relationship After 40 Years: A Meta‐analysis
of the Literature. Basic and Applied Social Psychology, 6/3: 221–41.
RUGG, D. 1941. Experiments in Wording Questions: II. Public Opinion Quarterly, 5/1: 91–
2.
SCHAEFFER, N. C. and BRADBURN, N. M. 1989. Respondent Behavior in Magnitude
Estimation. Journal of the American Statistical Association, 84 (406): 402–13.
SCHUMAN, H. and LUDWIG, J. 1983. The Norm of Even‐Handedness in Surveys as in
Life. American Sociological Review, 48/1: 112–20.
—— and PRESSER, S. 1981. Questions and Answers in Attitude Surveys. New York:
Academic Press.
—— —— and LUDWIG, J. 1981. Context Effects on Survey Responses to Questions About
Abortion. Public Opinion Quarterly, 45/2: 216–23.
SCHWARZ, N. 1995. What Respondents Learn from Questionnaires: The Survey
Interview and the Logic of Conversation. International Statistical Review, 63/2: 153–68.
—— 1996. Cognition and Communication: Judgmental Biases, Research Methods and the
Logic of Conversation. Hillsdale, N.J.: Erlbaum.
—— and BLESS, H. 1992. Scandals and the Public's Trust in Politicians: Assimilation and
Contrast Effects. Personality and Social Psychology Bulletin, 18/5: 574–9.
—— GRAYSON, C. E., and KNAUPER, B. 1998. Formal Features of Rating Scales and the
Interpretation of Question Meaning. International Journal of Public Opinion Research,
10/2: 177–83.
—— and STRACK, F. 1985. Cognitive and Affective Processes in Judgments of Subjective
Well‐Being: A Preliminary Model. In Economic Psychology, ed. E. Kirchler and H.
Brandstatter. Linz, Austria: R. Trauner.
—— —— 1991. Context Effects in Attitude Surveys: Applying Cognitive Theory to
Social Research. European Review of Social Psychology, 2: 31–50.
—— HIPPLER, H.‐J., DEUTSCH, B., and STRACK, F. 1985. Response Scales: Effects of
Category Range on Reported Behavior and Comparative Judgments. Public Opinion
Quarterly, 49/3: 388–95.
—— KNAUPER, B., HIPPLER, H.‐J., NOELLE‐NEUMANN, E., and CLARK, L. 1991.
Rating Scales: Numeric Values May Change the Meaning of Scale Labels. Public Opinion
Quarterly, 55/4: 570–82.
SIGALL, H. and PAGE, R. 1971. Current Stereotypes: A Little Fading, A Little Faking.
Journal of Personality and Social Psychology, 18/2: 247–55.
SILVER, B. D., ANDERSON, B. A., and ABRAMSON, P. R. 1986. Who Overreports
Voting? American Political Science Review, 80/2: 613–24.
SILVER, M. D. and KROSNICK, J. A. 1991. Optimizing Survey Measurement Accuracy by
Matching Question Design to Respondent Memory Organization. In Federal Committee on
Statistical Methodology Conference. NTIS: PB2002‐100103. <http://www.fcsm.gov/
01papers/Krosnick.pdf>.
SMITH, E. R. and MILLER, F. D. 1978. Limits on Perception of Cognitive Processes: A
Reply to Nisbett and Wilson. Psychological Review, 85/4: 355–62.
SMITH, T. W. 1984. Recalling Attitudes: An Analysis of Retrospective Questions on the
1982 GSS. Public Opinion Quarterly, 48/3: 639–49.
—— 1987. The Art of Asking Questions, 1936–1985. Public Opinion Quarterly, 51/2: S95–
108.
TOURANGEAU, R. and RASINSKI, K. A. 1988. Cognitive Processes Underlying Context
Effects in Attitude Measurement. Psychological Bulletin, 103/3: 299–314.
TRAUGOTT, M. W. and KATOSH, J. P. 1979. Response Validity in Surveys of Voting
Behavior. Public Opinion Quarterly, 43/3: 359–77.
TURNER, C. F. and MARTIN, E. 1984. Surveying Subjective Phenomena, Volume 1. New York:
Russell Sage Foundation.
VAILLANCOURT, P. M. 1973. Stability of Children's Survey Responses. Public Opinion
Quarterly, 37/3: 373–87.
VISSER, P. S., KROSNICK, J. A., MARQUETTE, J. F., and CURTIN, M. F. 2000.
Improving Election Forecasting: Allocation of Undecided Respondents, Identification of
Likely Voters, and Response Order Effects. In Election Polls, the News Media, and
Democracy, ed. P. J. Lavrakas and M. W. Traugott. New York: Chatham House.
WALLSTEN, T. S., BUDESCU, D. V., RAPOPORT, A., ZWICK, R., and FORSYTH, B.
1986. Measuring the Vague Meanings of Probability Terms. Journal of Experimental
Psychology: General, 115/4: 348–65.
WASON, P. C. 1961. Response to Affirmative and Negative Binary Statements. British
Journal of Psychology, 52: 133–42.
WEISBERG, H. F. and MILLER, A. H. 1979. Evaluation of the Feeling Thermometer: A
Report to the National Election Study Board Based on Data from the 1979 Pilot Survey.
ANES Pilot Study Report No. nes002241.
WILCOX, C., SIGELMAN, L., and COOK, E. 1989. Some Like It Hot: Individual
Differences in Responses to Group Feeling Thermometers. Public Opinion Quarterly, 53/2:
246–57.
WILLIS, G. B. 2004. Cognitive Interviewing: A Tool for Improving Questionnaire Design.
Thousand Oaks, CA: Sage Publications.
WILSON, T. D. and DUNN, E. W. 2004. Self‐Knowledge: Its Limits, Value, and Potential
for Improvement. Annual Review of Psychology, 55: 493–518.
—— and NISBETT, R. E. 1978. The Accuracy of Verbal Reports About the Effects
of Stimuli on Evaluations and Behavior. Social Psychology, 41/2: 118–131.
WOODWARD, J. L. and ROPER, E. 1950. Political Activity of American Citizens.
American Political Science Review, 44/4: 872–85.
WRIGHT, J. D. 1975. Does Acquiescence Bias the “Index of Political Efficacy?” Public
Opinion Quarterly, 39/2: 219–26.
Notes:
(1) All the question wordings displayed are from the 2004 ANES.
Josh Pasek
Josh Pasek is a Ph.D. candidate in the Department of Communication, Stanford University.
Jon A. Krosnick
Jon A. Krosnick is Professor of Political Science, Communication, and Psychology,
Frederic O. Glover Professor in Humanities and Social Sciences, and Senior Fellow at
Woods Institute, Stanford University.