First, Catch the Rabbit:
The Methodological Imperative
and the Dramatization of Dialogic Reading
[As published in Multidisciplinary Perspectives on Literacy Research, ed. R. Beach, R. J. Green, M. Kamil & T. Shanahan. 69-89. Urbana, Illinois, National Conference on Research in English, 1992, and reprinted in Poetics 20 (1991): 577-595.]
It is astonishing how rarely attempts at "testing" psycho-analytic theory have confronted the question of a possible incompatibility between some of Freud's theoretical assumptions and those necessary for the constitution of the empirical domain in which the testing is to take place. (Danziger, 1988, pp. 91-92)
This paper begins in 1976, when Vipond was a postdoctoral student at the University of Colorado. Another visitor to Boulder, James J. Jenkins, at one point asked Vipond what the subject of his doctoral research had been. The project had been to extend the lexical ambiguity effect to paragraph comprehension, but, Vipond said, problems had arisen when he couldn't replicate the effect. Jenkins observed, "So, you don't have a phenomenon," and asked if Vipond knew the recipe for rabbit stew. We've given it in the title of this paper.
Language and speech communication (as a dialogic exchange of utterances) can never be identical. Two or more sentences can be absolutely identical. . . . But as an utterance (or part of an utterance) no one sentence, even if it has only one word, can ever be repeated: it is always a new utterance . . . . (Bakhtin, 1986, p. 108)
Here we want to discuss some reasons researchers may find it hard to catch their particular rabbits. Because we are so familiar with it, we will use as an illustration some of our own research on point-driven or dialogic reading. We spent six years trying in vain to catch this rabbit, perhaps -- or so it seems now -- simply because we hadn't been looking in the right ways. Unlike Dryden's Diana, whose "chase had a beast in view," we chased a quarry that we continued, on the basis of our experience as readers, to believe must exist -- but of which we could find precious little evidence. To understand why, we will draw first on the work of the psychologist Kurt Danziger, whose notion of the "methodological imperative" in psychology suggests that researchers too often allow methods to dictate to theory how research is to be done. Still following Danziger, we will question the assumption that the purpose of research is the testing of theoretical claims. Testing is a worthy function of research but it is far from the only one. Using Danziger's term, research can also serve to "dramatize" a theory in order that its implications can be explored. When a theory is dramatized, one can learn things one did not expect. In this case, the dramatization gave us a new sense of the importance and place of dialogue in research as well as in reading itself. To make clearer the role we found dialogue playing in our studies, we then turn to the language theories of Mikhail Bakhtin.
We started working together because each of us was dissatisfied with his own discipline. Hunt wanted to understand better why students in English literature classes read and wrote as they did. Traditional literary studies and literary criticism (even reader-response criticism) were not much help, so during a sabbatical in Indiana he read as much as he could in cognitive and developmental psychology, semiotics, and educational theory, believing that insights from these disciplines could be used to understand his students' behaviour. Vipond, meanwhile, was moving in the opposite direction, from cognitive psychology toward literary issues. He was trying to extend work on text comprehension to literary texts, because there seemed to be important aspects of reading and response that were being missed by the paradigm that he was then working in (read a short paragraph and then free recall it).
Vipond's sabbatical in 1983-84 seemed a good opportunity for us to collaborate. The plan was to use some methods supplied by cognitive psychology (Vipond) to study some problems raised by literary studies and rhetoric (Hunt). Empirical and quantitative methods had helped bring about genuine advances in understanding human information processing; it should be possible, we thought, to apply these methods to the problem of what happens when people read literary texts. Although the line between Hunt's expertise and Vipond's quickly blurred (very early we began horning in on each other's territory), what did not become blurred was the central idea that the methods of psychology could be used to test the ideas of literary study.
Intuitively, it appeared that experts read literary texts in a way that psychological studies of text processing did not illuminate very well. Because those studies used what Robert de Beaugrande (1982) attacked as "fragmentary and inane" texts, they limited themselves to considering reading according to a simplistic "conduit" or "transfer of information" model (Reddy, 1979). Such studies tested theories about how information is acquired from text by making statistical inferences from the performance of large numbers of subjects working with simple texts. For instance, experimenters would identify certain text features and determine whether manipulating those features reliably influenced what readers could remember. The problem was that these studies seemed to have, at best, only peripheral implications for the more complex "literary reading" that we were interested in.
By focusing attention on a different set of texts and theories, we planned to turn the old methods to new purposes. We wondered what would happen if literary texts were substituted for the simple or artificial "textoids" traditional studies employed. What would happen if theories drawn from rhetorical, reader-response, and poststructuralist literary theory were substituted for the information-processing theories underlying traditional studies? What would it mean to consider literary reading from the vantage point of "story" and "discourse" -- the Russian Formalist distinction that we had discovered by way of Seymour Chatman (1978)? Finally, we wondered what we would find if we looked for the types of reading described by Louise Rosenblatt (1978), who suggested that there are two quite different kinds of rabbits out there -- one type ("efferent"), concerned with acquiring information from text, and the other ("aesthetic"), concerned with the lived-through experience of engaging in a transaction with a text.
Study 1: Branching text
In 1983-84, partly as a result of an extended discussion of undergraduates' recognition of irony and ironically-compromised narrators (Booth, 1961), we carried out a series of what we called "branching text" studies. We were trying to see if there were a causal relationship between readers' sensitivity to "narrative surface" and their "aesthetic" response. ("Narrative surface" is what Chatman calls "discourse" -- matters of tone and point of view as opposed to story events.) We reasoned that if readers could be induced to pay greater attention to narrative surface, they should be more likely to respond aesthetically rather than simply reading for information or events. Obviously, we needed a way to induce attention to discourse, a way to check that it had been induced, and a way to see if and to what extent aesthetic rather than efferent reading had been engaged in.
These seemed to be straightforward problems. We asked university students, as part of their first-year English classes, to read John Updike's "A&P." "A&P" is narrated in the first person by Sammy, a 19-year-old grocery store checkout clerk. One group of readers, the experimental group, had a task intended to make them more aware of the story's narrative surface. At two places in the story these readers were given three parallel paragraph-length continuations. The three continuations, or "branches," were virtually identical in terms of story events, but they differed in tone and point of view. The first branch was consistent in both tone (colloquial) and point of view (first-person) with the rest of the story; in fact, it was Updike's original version. The second branch varied in tone but not in point of view: It was narrated by a first-person protagonist whose tone was formal (as opposed to Sammy's racy vernacular). The third branch varied both tone and point of view: It described the events from the point of view of an uncharacterized third-person narrator.
Readers in the experimental group were asked to rate the branches, on a 7-point Likert scale, for appropriateness with the rest of the story, whereas readers in the control group did not see the alternatives -- they were merely asked to think about the story for a few moments at those places. Near the end of the story there was a final set of branches, and this time all readers were asked to make appropriateness ratings. The experimental readers, by virtue of having been encouraged to attend to the text's narrative surface in the first two sets, were expected to demonstrate greater perception of surface -- that is, higher ratings for Sammy -- on the third set.
Afterwards, all the readers were given a set of statements (or "probes," as we came to call them) that other readers had allegedly made, and asked to rate their agreement with each statement on a 7-point scale. They also had the option of making written comments on each statement. One of the statements was what we took to be a reasonable "point" for the story. The rating on this critical item was thus taken as indicating the degree to which the reader had constructed a valid point for the story; that is, had read the text with some attention to aesthetic values rather than as a mere exercise in recall. (At that time, evidently, we were bold enough to rush in -- where now we might fear to tread -- and say what a valid point for the story was.)
For reasons that needn't concern us here, we ended up with 14 students in the experimental (branching) group and 17 in the control group. Of greater concern is the way we thought about the experiment and the data it generated. The ratings on two sets of 7-point scales were subjected to analysis of variance (ANOVA), with group as a between-subjects variable. The F-tests, however, were statistically nonsignificant: Readers in the experimental group did not rate the Sammy branch as more consistent than did the control readers, nor were the experimental readers more likely to agree that our point was an appropriate one for the story. And, in several subsequent studies, the ANOVAs remained nonsignificant even though we increased the number of readers (to 51) and the number of branches (to 5), reduced the length of the branches (now called twigs), gave feedback after each one, altered the probes, and so on.
We were taken aback. We had thought the rabbit in question was an obvious, public phenomenon, but there was absolutely no sign of it in our trap.
And yet we had done something right. There was space at the bottom of each scale for written comments, and some readers used it. From their comments we learned something that the F-tests didn't tell us; namely, that most of the students heartily disliked "A&P." Typical reactions were "dumb," "stupid," "boring," "choppy." Again we were surprised. Our expectation -- presumably shared by the many editors of classroom anthologies who include the text -- was that "A&P" is the type of story undergraduates should find engaging. Nevertheless, it was only when we began seriously to ask why the story was so deeply disliked that we began to think that our students were not reading it as though they expected it to have any relevance to them. Instead, they appeared to be reading as though their aim were to remember the information in the story, or to follow the sequence of events.
Neither of these stances toward "A&P" seemed to have much to do with the strategies we ourselves, and other readers we knew, used when reading the story -- for instance, expecting relevance and coherence, and distinguishing between Updike's purposes and Sammy's. After re-reading some work on "point" by sociolinguists such as Livia Polanyi and William Labov, it began to seem that the students' responses might be better accounted for by positing not two modes of reading, as Rosenblatt did, but three. We called these story-, information-, and point-driven modes, and suggested that reading a text in a mode it doesn't "afford" (Gibson, 1979) might result in the sort of disappointment or even anger voiced by some of our readers.
Interestingly, though, when we first wrote about point-driven understanding in Poetics (1984), we didn't have much evidence for it among our student readers. The discussion was almost entirely speculative, based on our intuitions about how we ourselves read "A&P" and our guesses as to how critics must have read it. Clearly, the branching text studies had not produced point-driven reading. Thus we found ourselves in the awkward position of trying to study a phenomenon we couldn't find. We had a new theory, but, so far, no rabbit.
Study 2: Letter frame
It now seemed that what was wrong with the branching text studies was that they were too subtle. We had tried to produce aesthetic reading through the mediating influence of narrative surface, but our readers didn't seem to notice. So we shelved the idea that attention to narrative surface causes aesthetic reading. What seemed much more immediate, in any case, was to find some way to test the new hypothesis about the three modes of reading. To do this we decided to use different orienting tasks to induce the three modes. Again we conducted a series of studies that would yield quantitative data.
In the first, we had 70 first-year university students read three short stories (two by Hemingway, the other by Maeve Brennan), but this time they did so under different task conditions (the task was repeated after each page of reading and was constant for a given reader): (a) to induce information-driven reading, some readers were asked a factual question after each page; (b) to induce story-driven reading, others were asked to predict what might happen next; and (c) to induce point-driven reading, other readers were asked whether they saw any connections developing between the story and a framing letter they had been shown before reading. (The letter was one in which the letter-writer recommended this story to someone else because it illuminated his, the letter-writer's, situation; the ongoing question for our readers, therefore, was, "Are you beginning to see why the letter-writer recommended this particular story?") These task conditions, we expected, would affect the stances readers took toward the texts. Although the tasks wouldn't affect all readers the same way, our methodological assumptions led us to expect some aggregate differences would be visible among the three groups.
ANOVAs were conducted on each of the several dependent measures in this experiment. Task effects were found only for the reading time scores: The readers asked information questions were slowest overall and those asked story questions were fastest, but only the letter frame group slowed down significantly over the last few pages of the story. Other dependent measures, notably recognition test scores and agreement ratings on probes, did not show a task effect. Nevertheless, we were encouraged by the reading time results, which we took to indicate that there are indeed processing differences between readers who are trying to construct a point and readers who are trying to retain information or follow the story. Thus we took these data as a warrant to continue our research, even though, in retrospect, they are only indirect evidence of the different modes. Perhaps now we had seen a few tracks, but the elusive rabbit was still nowhere to be found.
If there were processing differences between these modes, what were they? At about this time we began taking more seriously sociolinguistic work on the structure of conversational narratives. Livia Polanyi (1985) and William Labov (1972) argue that such narratives achieve their interpersonal effects by means of their evaluation structure. We understood "evaluations" as incongruities with the local norm of the discourse, by means of which interlocutors are invited to share the speaker's attitude toward an event, character, or idea. We saw important analogies between sociolinguistic analyses of conversational narratives and ours of printed, literary texts. Eventually (1986) we came to a definition of three distinct types of evaluations in literary texts, and to the hypothesis that point-driven readings should show the kind of sensitivity to evaluations that characterizes listeners to conversational stories.
Meanwhile, we were planning a more elaborate experiment that would test some of these developing ideas. This time we used just one story, in two versions (the original and one with many of the textual evaluations eliminated), two tasks (frame and story), and two modalities (oral reading and silent). Thus the design was a 2x2x2 factorial, with 12 undergraduate readers in each of the 8 cells. The independent variables in the resulting ANOVAs were version, task, and modality. We also had 12 faculty members participate in the study as a reference group. Seven different dependent measures were used, many of them in an attempt to find effects due to evaluative language. In contrast to the previous experiment, readers given the frame task did not show a different pattern of reading times. There were, however, small but statistically reliable differences on the probe agreement scores. Readers given the frame task answered in what seemed a more point-driven way than the story task readers, and the 12 faculty members responded in a more point-driven way than the comparable group of undergraduates.
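The layout of this design can be made concrete with a short sketch. The Python below is illustrative only: the factor-level labels and variable names are invented for the example, and it enumerates just the cell structure described above, not the original analysis.

```python
from itertools import product

# Factor levels as described in the text; the label strings are illustrative.
versions = ["original", "evaluations_reduced"]
tasks = ["frame", "story"]
modalities = ["oral", "silent"]

READERS_PER_CELL = 12  # undergraduates per cell, as reported in the text

# Each cell is one combination of version x task x modality.
cells = list(product(versions, tasks, modalities))
total_undergraduates = len(cells) * READERS_PER_CELL

for version, task, modality in cells:
    print(f"{version:>20} | {task:>5} | {modality:>6} | n = {READERS_PER_CELL}")

print(f"cells = {len(cells)}, undergraduates = {total_undergraduates}")
```

The eight cells of 12 give 96 undergraduate readers in all; the 12 faculty members formed a separate reference group outside the factorial.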
Besides the paper-and-pencil measures, readers were asked open-ended oral questions such as, "What do you make of this story?" "Do you like it?" "What do you (dis)like about it?" Responses to these questions, we assumed, had to be translated into a numerical format that would allow them to be analyzed statistically. Accordingly, the responses were treated as if they were statements in a verbal protocol (Ericsson & Simon, 1984): Complex statements were decomposed into simpler ones, and the simple statements were coded according to whether they were primarily comments on plot, author, character, text, and so on. Chi-square analysis showed that faculty members made significantly more author, theme, and style comments, whereas students commented more often on plot. (We compared students from all eight treatment conditions with faculty members because the students' responses didn't vary much from one condition to another.) We took these results as evidence that the faculty members read in a relatively point-driven way, compared to the students' preference for story-driven mode. The evidence for point-driven reading, however, was still quite indirect. If there was a rabbit here, it wasn't the obvious one we were looking for.
Study 3: Social reading
To summarize so far: we had tried to produce aesthetic reading by encouraging readers to pay attention to narrative surface, but the branching text exercise simply didn't work. Next we tried to produce point-driven reading by giving people, before reading, a context or frame, and asking them during reading to relate the story to the frame. Relative to other task conditions, the letter frame task produced differences in reading times and agreement ratings with probes, but the differences were small and inconsistent across experiments.
Once again we were confronted with the problem of why a phenomenon which intuition told us was common and obvious should be so difficult to find. Turning again to the analogy with conversational stories, we reasoned that point-driven reading may be thought of as an interpersonal phenomenon, like point-driven listening in a conversation. Participants in conversations expect to be able to construct points precisely because the narrator is right there as a warrant that the story is potentially relevant to the situation -- that is, they expect a "tellable" story. A written text, however, offers only a theoretical equivalent of such a warrant, and the experience of many readers leads them to expect that written stories are not always tellable.
It seemed that a point-driven reader might be acting like a point-driven listener; that is, such a reader might construct a notion of the text's author, who acts as a warrant that the text being read is potentially relevant to the situation. According to this analysis, the problem with our manipulations was that they were just make-believe. If we wanted readers who wouldn't otherwise do so to treat text as the product of an intentional author, giving them yet another (equally unwarranted) text was hardly likely to be effective. Suppose, though, we manipulated the situation by having the reading take place in an actual interpersonal setting? If readers had an authentic interpersonal motive for reading, perhaps they would be more likely to read in a point-driven way.
Together with our student Lynwood Wheeler, we had 68 undergraduates look over three short stories, and choose one to work with further. The students then read the story they had selected either to a person who hadn't heard it before (this was the social reading condition), or else they read it aloud into a microphone, expecting to receive a comprehension test later (nonsocial reading). Dependent measures included the quality of oral reading, as determined by miscue analysis, and agreement ratings with probes. The social readers were expected to be more point-driven because they were reading a story they had chosen to someone who hadn't heard it, but as it turned out it was the nonsocial readers who, according to the probe scores, had read in the more point-driven way. Thus, far from producing greater incidence of point-driven reading, the social manipulation had produced less. Our rabbit seemed further away than ever.
Study 4: Discourse-based interview
By this time we were beginning to wonder whether we weren't imagining the whole thing. After three or four years of study, still the strongest evidence for point-driven reading was our own intuitions. On the basis of our felt experience we still thought we knew what "literary reading" ought to look like. But we couldn't seem to get it into our lab in order to find out more about it. As Jenkins might have asked, where was the phenomenon? In a kind of desperation, therefore, we decided simply to see if we could get some clear, convincing examples of point-driven reading.
So this time we wouldn't try to produce point-driven reading. Instead we would try to create conditions so favorable that if there were such a beast, it would appear. (And, of course, we would be waiting, with the tape recorder turned on, to catch it if it did.) What would be the ideal conditions? Like Rosenblatt, we had been saying for some time that aesthetic reading is a transaction between text and reader that is shaped by the particular situation in which it occurs; thus it was necessary to pay attention to all three components. Accordingly, we chose texts that seemed to afford point-driven reading (and for the first time used nonfictional as well as fictional texts). We chose readers from a range of educational levels: first-year undergraduates, as usual, but this time equal numbers of fourth-year undergraduates and faculty members, as well. (Since we planned to study each reader in some detail, we were content to have only five readers in each group.)
But what we changed most of all was the situation. True, the physical environment was much the same (the reading still occurred in a basement "lab" at the university), but the tasks the readers performed, and the type of data collected, were very different. The first difference was that instead of giving the readers photocopies of the texts they were to read, we handed them an actual, published copy of each text. The second was that whereas before the readers performed various tasks with respect to the texts, this time they were engaged in an intensive interview -- a "guided conversation" (Lofland & Lofland, 1984, p. 59) -- about each text. Following Odell and Goswami (1982; see also Odell, Goswami, & Herrington, 1983), this interview was "discourse-based," meaning that the reader was shown sentences from the original text along with alternatives that we had composed. For each alternative, the reader was asked whether it would make a difference if the new phrasing were substituted for the original, and if so, what kind of difference it would make. By composing the alternatives ourselves we were able to highlight issues of interest, with the added advantage that literary nomenclature didn't have to be introduced. (For example, in some cases we replaced metaphoric language with prosaic language. Some readers thought that much was lost, whereas others said that the prosaic alternative, because it was easier to understand, was an improvement on the original.) In each case, however, the alternatives were treated as occasions for talk.
After the discourse-based interview, the readers responded, as in previous studies, to probes we had devised for each text. But this time instead of converting the responses to numbers on a Likert scale and then determining by ANOVA whether there were statistically reliable between-group differences, the probes were occasions for still more talk.
More than 700 pages of transcript resulted. We did not attempt to analyze the corpus into codable statements. Instead we read and reread the transcripts, looking for clear instances of the kind of reading we had been calling point-driven. We were now, taking the idea from Bakhtin's (1981) insights about the status of literary texts, beginning to think of it as "dialogic." That is, as explained more fully below, we were moving even further away from the notion of "point" as a specific, unitary -- and perhaps unproblematic -- phenomenon that a story might in some sense "have," and thinking of it even more as a process of establishing relationships between people by means of texts. In this sense, our conception of the rabbit's nature was shifting, at least slightly, as our investigation continued.
And when we did find instances of dialogic reading (and information- and story-driven modes, too), we tried to account for them by the specific conjunction of reader, text, and situation. In doing this, we learned some new, and perhaps surprising, things about the rabbit.
For example, we learned that people who read dialogically often expect to be able to "converse with" and continue to refer to the text after the immediate reading is finished; they talk about passing texts on to others; and they are more likely to connect what they read to their own knowledge and concerns. (The most dramatic instance of connecting reading to experience: two of our readers had actually met one of the authors used in the study, but one made no connection between the stories handed him and his memory of having heard the author read similar stories in a class the previous term.) Dialogic reading seemed to be more prevalent for some texts than for others, and clearly was engaged in more by the faculty members than by the students. For a fuller discussion of some of the things we learned, see Vipond, Hunt, Jewett, and Reither (1991), and Hunt (1989).
In other words, it was our sense that finally we had a rabbit, and were in a position to learn some new things about its character and habits. For the first time we had, in a recognizable form, the phenomenon itself. The question, of course, is why did it take so long?
Like all stories, the one we've just told can be read in different ways. It might be read, for instance, as a tale of two hard-nosed experimentalists who eventually saw the light and turned into warm-hearted humanists. The studies that began as experiments, committed to quantification and statistical inference, evolved toward smaller-scale projects in more "natural" situations, with an acknowledgment that the data did not necessarily have to be quantified but were to be understood and interpreted as discourse. It might be suggested that we are now studying "everyday" as opposed to "laboratory" reading, and understanding it qualitatively instead of quantitatively.
There are a number of problems in reading the story this way, however. As George Hillocks notes (this volume), realist (quantitative-artificial) and idealist (qualitative-natural) approaches are not the polar opposites they are often made out to be. Instead, they are better seen as complementary metaphors; each informs the other, and we need both. Another problem is that there seems to be an implicit value hierarchy (one we do not share) in such a scheme; it's far too easy to see the bad guys as either the "number crunchers" or the "storytellers." We would not derive the moral that quantified, "artificial" experiments are bad whereas qualitative, "real world" studies are good. As Douglas Mook (1983, 1989) argues, external validity is not necessarily a requirement in research -- for some purposes, quantified experiments done in completely "unnatural" laboratory settings are precisely what is needed, because they provide answers to questions posed by theory.
But if the quantitative/qualitative scheme cannot account for the story of failure and success we have told, what is it about? In the remainder of this paper we will discuss two different though related accounts. In keeping with our different disciplinary backgrounds, one derives from psychology (in particular, Kurt Danziger), and the other from literary theory (Mikhail Bakhtin).
Danziger and the methodological imperative
In "The Methodological Imperative in Psychology" (1985), Danziger considers the relationship between psychological theory and rules of evidence. How are theories to be tested? "In this discipline it is generally assumed without question that the only valid way to test theoretical claims is by the use of statistical inference" (p. 3). According to Danziger, statistical inference is so pervasive that it not only dominates design considerations but works back to data collection procedures themselves: One collects data of a type that can be handled by statistical inference. (We would add that the effects of statistical inference sometimes extend even further back than that, to a researcher's earliest ideas about doable experiments.)
So what's the problem? The problem is that the assumptions made by the psychological theory one is testing may not be congruent with the assumptions of the system that is being used to test it. Specifically, theories (in this case, theories of reading, but as Danziger points out, many others as well) are often theories of individual functioning, whereas statistical inference requires group or aggregate data, ideally with large numbers of subjects. Given inconsistency between theory and method, there is no intrinsic reason why either one should dominate the other. Historically, however, methods have enjoyed higher status than theories; theories have been seen as idiosyncratic and subject to irrational impulses, whereas methods have been seen as universal and rational (Danziger, 1988, p. 87). Thus the problem of inconsistency between theory and method has usually been resolved by accommodating theory to method. The methodological imperative in psychology is thus double-edged: It means not only that statistical inference has been taken as the one legitimate way to test theories, but that theories themselves have come to be constructed in the image of the methodology. Danziger coins the term "methodomorphic theory" as a shorthand description of this state of affairs (1985, p. 9).
Where does this leave theories that have been developed outside the influence of the dominant methodology? This question is relevant to our own work, because the theoretical claims we wish to make about point-driven or dialogic reading have no obvious compatibility with the demands of statistical inference; we were not, that is, trying to develop a model of group (aggregate) functioning. Danziger cites Wundt, Freud, Köhler, Wertheimer, Piaget, and especially Kurt Lewin as examples of psychologists whose theories were developed outside the "charmed circle" of statistical inference. Because the assumptions of statistical inference are incompatible with these psychologists' theoretical assumptions, the methods cannot claim to be neutral with respect to their theories. Consequently, when the theoretical claims of a Freud, a Piaget, or a Lewin are tested by means of statistical inference, the theories go into the test situation with an "absurd handicap" (Danziger, 1985, p. 7).
Let's reconsider Studies 1-3 in light of Danziger's analysis. We did assume "without question" that the only way to test our theoretical claims was to devise tasks that would yield numbers; these numbers were necessary because they are required by statistical inference procedures, preferably ANOVA. So no matter what problem was being investigated, it was an unacknowledged requirement that the study yield numbers amenable to statistical analysis -- reading times (in seconds), agreement ratings (on a 1-7 scale), percent meaning-preserving miscues, and so on. At the same time it was a built-in requirement that the number of readers per experiment be fairly large. Thus the branching text experiments used from 31 to 51 readers; the letter frame experiments used 70 to 96; and the social reading experiment used 68.
In retrospect, it appears that we didn't question whether it was appropriate to test claims about point-driven reading by ANOVA-driven experimentation. Statistically significant results obtained by means of ANOVA are persuasive -- that's all we needed to know. What we didn't notice at the time was that the assumptions made by statistical inference may have been incompatible with our theoretical assumptions about reading. The most important of these assumptions is that reading is a transaction between reader and text, shaped by situation. Because readers are different, and because they represent texts and situations differently, there are likely to be wide individual differences in response. Statistical inference, however, is intolerant of individual variation within a treatment condition -- it is mere "error variance"; consequently the chances of finding overall group differences among the various treatments were diminished. It could be said that what we were really looking for was what Stephen Jay Gould calls "historical explanation," but our methods were giving us "experimental results" (1989, p. 278).
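The arithmetic behind the "error variance" point can be sketched in a few lines. The example is purely illustrative: the scores are invented rather than taken from Studies 1-3, and the names one_way_f and make_group are hypothetical. It computes the one-way ANOVA F ratio for two treatment conditions whose means are identical in both scenarios; only the spread among individual readers within each condition changes.

```python
from statistics import mean


def one_way_f(groups):
    """Return the one-way ANOVA F ratio for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k = len(groups)            # number of treatment conditions
    n_total = len(all_scores)  # total number of readers

    # Variation between treatment means (the "effect").
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation among readers inside each treatment (the "error").
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


def make_group(centre, spread):
    """Five invented ratings placed symmetrically around a condition mean."""
    return [centre + spread * d for d in (-2, -1, 0, 1, 2)]


# Same condition means (4.0 and 5.0) in both scenarios.
similar_readers = [make_group(4.0, 0.25), make_group(5.0, 0.25)]
varied_readers = [make_group(4.0, 1.0), make_group(5.0, 1.0)]

f_similar = one_way_f(similar_readers)  # readers respond alike
f_varied = one_way_f(varied_readers)    # readers respond very differently

print(f_similar, f_varied)  # prints 16.0 1.0
```

With 1 and 8 degrees of freedom the conventional .05 critical value of F is about 5.32, so the same one-point difference between condition means counts as significant when readers resemble one another and disappears when they differ widely. Nothing about the treatments has changed; only the individual variation has.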
In any event, either the theory or the method had to yield. Given the methodological imperative, it was perhaps inevitable that it was the theory that gave ground. Thus not only did the theory of literary reading enter these experiments with an "absurd handicap," but what was even more absurd was that we had handicapped ourselves. We begin to appreciate why the rabbit stayed away.
Now whether Vipond and Hunt test their theories by the "right" or "wrong" methods may not be a question of universal significance, but it is worth trying to understand how we, and psychology in general, got into a situation where theories are tested unfairly by transforming them into methodomorphic mutants. Largely, Danziger suggests, it is because the relation between theory and method has been conceived too narrowly. We need to re-examine the notion that the only reason for doing experiments is to test a theory. Although the testing function is one possible relation between theory and method, it is a more limited one than we have been led to believe. There are other functions that methods may serve. In particular, Danziger draws attention to the demonstration or staging function of methods, in which "a particular methodology [may be used] to construct a working model that demonstrates the theory in action" (1988, p. 92).
For example, neither social psychology experiments nor psychoanalytic sessions provide formal, logical tests of their respective theories; rather, they provide applications or "dramatizations." Thus, when Stanley Milgram conducted experiments on destructive obedience he was not attempting to test specific theoretical claims (Miller, 1986, p. 45). Milgram's experiments were instead demonstrations of the power of situations over behaviour, and nearly thirty years later they still function as exceptionally vivid dramatizations of his theoretical claims.
Returning to our own work, it's now possible to see more clearly why Study 4, the discourse-based interview, was finally successful in catching the rabbit. Dialogic reading is a form of social interaction; in order to test it, situations need to be created that will allow it to exist. The social situation of Studies 1-3 did not afford dialogic reading, whereas the situation of Study 4 did. In that study dialogic reading was not "tested" but dramatized, demonstrated in action. Pushing the dramatization metaphor even further, we could say that in this study a stage was set; participants ("actors") invited onto it; props (texts) supplied. One of the actors, the interviewer, had a rough script to work from; beyond that it was a matter of controlled improvisation. Perhaps one reason the dramatization was effective is that nothing else got in the way. In particular, when we set up the study we made no attempt to provide for the collection of numbers; in Danziger's terminology, we did not impose any kind of numerical ordering on the empirical domain being studied. In this case, therefore, and for the first time, statistical inference did not call the shots. 
In short, what we ended up doing was studying dialogic reading by staging studies as dialogues. The word dialogic brings to mind the name of Bakhtin, and it's to a Bakhtinian account that we now turn.
Bakhtin, text, and utterance
Bakhtin's dialogism -- especially when Bakhtin is considered as a literary theorist rather than as a philosopher of language -- is often taken to be a position from which critics can see aspects of literary texts not noticed before: their many-voicedness, intertextual richness, heteroglossia. But there is a further implication of Bakhtin's ideas, one that is clearer in works such as Marxism and the Philosophy of Language (1973) and Speech Genres and Other Late Essays (1986) -- works that are not always seen as relevant to literary theory. This implication helps us understand what it might mean to read a literary text in an engaged way, and what it might mean to try to understand that engaged reading. It extends the dialogue outside the world of texts to include actual readers and writers.
Even when overt responses are delayed, Bakhtin insists, understanding is actively responsive:
The fact is that when the listener perceives and understands the meaning (the language meaning) of speech, he simultaneously takes an active, responsive attitude toward it. He either agrees or disagrees with it (completely or partially), augments it, applies it, prepares for its execution, and so on. And the listener adopts this responsive attitude for the entire duration of the process of listening and understanding, from the very beginning -- sometimes literally from the speaker's first word. . . . Any understanding is imbued with response and necessarily elicits it in one form or another: the listener becomes the speaker. (Bakhtin, 1986, p. 68)
Bakhtin's distinction between utterance and text is central here (and not to be confused with David Olson's more recent and better-known distinction; cf. Lotto, 1989). The difference for Bakhtin lies not in any characteristics of the discourse itself, but rather in how it functions. A text becomes an utterance when it is used; the same text may constitute quite different utterances in different situations. Bakhtin offers the sentence as an example of text:
The sentence as a unit of language, like the word, has no author. Like the word, it belongs to nobody, and only by functioning as a whole utterance does it become an expression of the position of someone speaking individually in a concrete situation of speech communication. (1986, pp. 83-84)
There are, to be sure, differences between oral and written discourse, but we ignore their similarities at our peril. If we begin, for example, with the assumption of unidirectionality -- that texts have predictable consequences in readers -- we're in methodological trouble. But we're in just as deep trouble if we assume that texts and readers conduct their transactions in pretty much the same ways irrespective of the "concrete situation of speech communication." And we're in particularly deep trouble if we don't notice that some readers are more at the mercy of situation than others, that some are able to exert more control than others over the way in which they construct the situation. It's a common observation among literature teachers that literary texts "create their own context" -- and, as a text-centred shorthand, it's true. But, in any given situation, it's not true for all readers.
The most important single fact about "concrete situations of speech communication" is that they are socially constructed. When the situation affords it, and a reader or listener takes a text as an utterance, it becomes a move in a dialogue. If the reader sees the text as an utterance in one dialogue, she'll tend to expect certain kinds of things from it; if she sees it as a move in another, quite different, dialogue, she'll expect different things from it. In other words, what a given reader does is affected as much by how that reader sees the text as framed by an ongoing dialogue as it is by anything we may be able to identify as "text characteristics." And it is what the reader does that determines the shape of the most fundamental kinds of connections or inferences that will be constructed on the basis of the text. In brief, what that reader does will be profoundly influenced by how she constructs the situation and the text's role in it. 
All this has consequences for the methodological story we've been telling. If we look back at the early experiments (Studies 1-3) from this standpoint, some data that made little sense at the time come into clearer focus. Virtually none of our undergraduate readers of "A&P," for example, took the story as an utterance in a dialogue between them and us about the values we saw in the story. "A&P" was not taken as an utterance -- ours or Updike's -- because the concrete situation in which it was presented did not support such a reading. "A&P" was taken as a move in an experiment rather than as a move in a conversation; it was a mere text, given (in photocopied form) to readers who were asked to "do things to" it. Behind the text was the considerable authority of the institution and of the experiment, but the text itself was little more than an example, for research purposes, of appropriate stimulus materials. No wonder, then, that in this situation the student readers adopted other stances: reading for information or events. (One reader who objected strongly to what she saw as the sexist nature of the story was clearly reading in a dialogic way, although it wasn't clear whether her dialogue was with Sammy, Updike, or the experimenters; unfortunately, we didn't know enough to ask. For a feminist critique of "A&P," see Bogdan, Pitt, and Millen, in press.)
Bakhtin also helps us make more sense of the results of Study 4. This time the texts were not stimulus materials that readers used to complete various tasks; instead, the texts were occasions for response. The interviewer assumed that the readers would have responses to each text, and he shared his. Rather than being "controlled," the readers were participating in the creation of a conversation in which, after the fact, it was possible to find "regularities" (Rubin, 1989). Because of the extent to which the texts were embedded in a flow of conversation -- the interviewer did not merely go on to the next question, but regularly responded to the readers' views -- it was easier for a reader to treat the text as an utterance -- that is, in Bakhtin's terms, "an expression of the position of someone speaking individually in a concrete situation of speech communication" (1986, p. 84). But whose utterances were they? In Bakhtin's view, of course, "our speech is filled to overflowing with other people's words" (1981, p. 337), so the answer need not be simple. It's reasonable to suppose that the texts were seen as both the authors' utterances and the interviewer's. The authors', because the texts were not photocopies but published copies, and thus more readily seen as authored; and the interviewer's, because he personally handed a copy of the text to the reader, saying, "I'd like you to read this."
Now that we have presented two views, from psychology and from literary theory, it's worth considering how different they really are. According to Danziger, theory and methods should be congruent. What this means for a theory of dialogic reading, we suggest, is that one must use "social" methods to study a "social" theory; in other words, one sets up, or stages, a dialogue. Bakhtin, on the other hand, stresses the importance of how the text is framed. Whether a piece of discourse is taken as text or as utterance depends crucially on the "concrete situation" in which it is embedded. For texts to be taken as utterances, they must be embedded in concrete situations that support dialogue.
In the last analysis we think the two explanations illuminate the same ground from different points of view. The methods used to study particular readings constitute one aspect of the concrete situation in which the readings are inevitably embedded. The methods used in Studies 1-3 helped create one type of situation (and thus supported certain types of reading), whereas the different methods of Study 4 helped create a quite different type of situation (and thus supported another type). Whether one focuses on the methods, with Danziger, or the situation, with Bakhtin, both psychological and literary accounts suggest that dialogic reading is appropriately studied by setting up studies as dialogues.
This is not to say, however, that dramatization is the only correct way to do research. On the contrary, we believe there is a place for testing hypotheses, and in many circumstances statistical inference is an essential research strategy. But there is also a place -- and for the study of phenomena such as dialogic reading it's an important place -- for other types of studies. Ultimately it should be theory, not method, that determines the aim and shape of an investigation. The problem is that researchers who blindly obey the methodological imperative are unlikely to catch their rabbits. And if they don't, how will the rest of us ever get any rabbit stew?
Bakhtin, M. M. [V. N. Volosinov.] (1973). Marxism and the philosophy of language. (L. Matejka & I. R. Titunik, Trans.). Cambridge: Harvard University Press. (Original work published 1929)
Bakhtin, M. M. (1981). The dialogic imagination. (M. Holquist, Ed.; C. Emerson & M. Holquist, Trans.). Austin: University of Texas Press.
Bakhtin, M. M. (1986). Speech genres and other late essays. (C. Emerson & M. Holquist, Eds.; V. W. McGee, Trans.). Austin: University of Texas Press.
Beaugrande, R. de. (1982). The story of grammars and the grammar of stories. Journal of Pragmatics, 6, 383-422.
Bogdan, D., Pitt, A., & Millen, J. (in press). Approaches to gender in teaching John Updike's "A&P". In E. Evans (Ed.), Critical approaches to teaching fiction in secondary schools. Australia: St. Clair Press.
Booth, W. (1961). The rhetoric of fiction. Chicago: University of Chicago Press.
Chatman, S. (1978). Story and discourse: Narrative structure in fiction and film. Ithaca: Cornell University Press.
Danziger, K. (1985). The methodological imperative in psychology. Philosophy of the Social Sciences, 15, 1-13.
Danziger, K. (1988). On theory and method in psychology. In W. J. Baker, L. P. Mos, H. V. Rappard, & H. J. Stam (Eds.), Recent trends in theoretical psychology (pp. 87-94). New York: Springer-Verlag.
Ericsson, K. A., & Simon, H. A. (1984). Protocol analysis: Verbal reports as data. Cambridge, MA: MIT/Bradford.
Gibson, J. J. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin.
Gould, S. J. (1989). Wonderful life: The Burgess Shale and the nature of history. New York: W. W. Norton.
Hunt, R. A. (1989). Learning to converse with texts: Some real readers, some real texts, and the pragmatic situation. SPIEL: Siegener Periodicum zur Internationalen Empirischen Literaturwissenschaft, 8(1), 107-130.
Hunt, R. A., & Vipond, D. (1985). Crash-testing a transactional model of literary reading. Reader: Essays in Reader-Oriented Theory, Criticism, and Pedagogy, No. 14 (Fall), 23-39.
Hunt, R. A., & Vipond, D. (1986). Evaluations in literary reading. TEXT, 6, 53-71.
Labov, W. (1972). Language in the inner city: Studies in the Black English Vernacular. Philadelphia: University of Pennsylvania Press.
Lofland, J., & Lofland, L. H. (1984). Analyzing social settings: A guide to qualitative observation and analysis. (2nd edition). Belmont, CA: Wadsworth.
Lotto, E. (1989). Utterance and text in freshman English. College English, 51, 677-687.
Miller, A. G. (1986). The obedience experiments: A case study of controversy in social science. New York: Praeger.
Mook, D. G. (1983). In defense of external invalidity. American Psychologist, 38, 379-387.
Mook, D. G. (1989). The myth of external validity. In L. W. Poon, D. C. Rubin, & B. A. Wilson (Eds.), Everyday cognition in adulthood and later life (pp. 25-43). New York: Cambridge University Press.
Odell, L., & Goswami, D. (1982). Writing in a non-academic setting. Research in the Teaching of English, 16, 201-223.
Odell, L., Goswami, D., & Herrington, A. (1983). The discourse-based interview: A procedure for exploring the tacit knowledge of writers in nonacademic settings. In P. Mosenthal, L. Tamor, & S. A. Walmsley (Eds.), Research on writing: Principles and methods (pp. 221-236). New York and London: Longman.
Polanyi, L. (1985). Telling the American story: A structural and cultural analysis of conversational storytelling. Norwood, NJ: Ablex.
Reddy, M. J. (1979). The conduit metaphor: A case of frame conflict in our language about language. In A. Ortony (Ed.), Metaphor and thought (pp. 284-324). Cambridge: Cambridge University Press.
Rosenblatt, L. M. (1978). The reader, the text, the poem: The transactional theory of the literary work. Carbondale: Southern Illinois University Press.
Rubin, D. C. (1989). Issues of regularity and control: Confessions of a regularity freak. In L. W. Poon, D. C. Rubin, & B. A. Wilson (Eds.), Everyday cognition in adulthood and later life (pp. 84-103). New York: Cambridge University Press.
Vipond, D., & Hunt, R. A. (1984). Point-driven understanding: Pragmatic and cognitive dimensions of literary reading. Poetics, 13, 261-277.
Vipond, D., & Hunt, R. A. (1989). Literary processing and response as transaction: Evidence for the contribution of readers, texts, and situations. In D. Meutsch & R. Viehoff (Eds.), Comprehension of literary discourse: Results and problems of interdisciplinary approaches (pp. 155-174). Berlin: de Gruyter.
Vipond, D., Hunt, R. A., Jewett, J., & Reither, J. A. (1991). Making sense of reading. In R. Beach & S. Hynds (Eds.), Developing discourse practices in adolescence and adulthood (pp. 110-135). Norwood, NJ: Ablex.
Vipond, D., Hunt, R. A., & Wheeler, L. C. (1987). Social reading and literary engagement. Reading Research and Instruction, 26, 151-161.