transcoding, word-for-word, or word-based perspective. As the names clearly suggest, according to this perspective, interpreting is a process of changing linguistic codes from one language into another. The interpreter does not wait for one complete unit of meaning to be fully comprehended; he may start to translate as soon as he hears the smallest meaningful unit in the SL (e.g. a word). In other words, the two processes of comprehension and reformulation take place simultaneously rather than in a sequence. Fabbro and Gran (1994, p. 297) provide the following brief account of this approach:
With the word-for-word translation strategy, the interpreter translates minimal meaningful units of the SL which have an equivalent in the TL. At a phonetic, phonological, morphological, syntactic and semantic level, the two working languages thus tend to short-circuit, and the overall message is not decoded at a cognitive level. In fact, by using this strategy, the interpreter may not even understand the meaning of the incoming message and nevertheless be able to translate every single word, provided he can store the single elements in his short-term memory for the time needed to find an equivalent translation of the single terms and formulate a syntactically correct sentence in the TL.
At first sight, this strategy may seem undesirable as far as the quality of the TL expression is concerned. However, even seasoned interpreters may at times choose it (Fabbro & Gran, 1994), for example when faced with a highly technical topic (mathematics, physics, chemistry, etc.) or when rendering lists of any kind, such as numbers or names (Isham, 1994). Fabbro and Gran’s (1994, p. 304) results show that student interpreters and professional interpreters adopt different approaches to SI: “The latter seem to adopt semantic strategies (meaning-based translation), whereas the former apparently divert more of their attention to the syntactic form of the message (word-for-word translation).”
The selection of one or the other approach is quite closely interconnected with another important concept in SI, namely EVS, which will be elaborated on in what follows.
2.3.3 EVS and TTS
The ‘Ear Voice Span’ or ‘EVS’ can be defined as the “lag time between the moment an incoming message is perceived by a conference interpreter and the moment the interpreter produces his translation of the segment” (Lee, 2002, p. 596). The importance of EVS in SI studies stems from the fact that it is one of the few quantifiable variables in this process.
Indicative of “the interpreters’ time management during SI” (Lee, 2011, p. 153), EVS is the interval during which interpreters perform many simultaneous information-processing operations, including comprehension of the incoming message, conversion of the message, and planning and producing the interpretation, as well as other unknown cognitive processes (Lee, 2002; Lee, 2011). Chernov (2004, p. 14) says of this lag: “that would seem to be the delay necessary for the perception and comprehension of the SL chunk and the planning, formulation and articulation of the corresponding chunk of TL discourse.”
The interpreter has to keep up with the time constraint imposed on him. The ability to anticipate (which will be discussed at length in section 2.4.5) enables the interpreter to start uttering his IT before the incoming sentence comes to an end. As Chernov (2004, p. 91) explains, “the basic mechanism making SI possible is the probability anticipation of the development of the message.” Thus, EVS can be defined as the minimum time an interpreter requires for information processing under heavy cognitive processing (Lee, 2002). According to Kade (1971, as quoted in Lee, 2002), the optimal moment for the interpreter to start uttering the TL is immediately after all syntactic and semantic ambiguities in the unit have been resolved. The word ‘immediately’ here implies that, ideally, the EVS should be as short as the prevailing circumstances allow (de Groot, 1997, as quoted in Lee, 2002). Mizuno (2005, p. 743), too, recommends that “it would be desirable for interpreters to keep the delay time as short as possible,” which “may call for interpreting strategies or processing strategies of some kind.”
Interpreters, however, have to cope with heavy cognitive processing, which inevitably lengthens the EVS. Setton (1999, as cited in Lee, 2002, p. 598) is aware of this limitation and observes that “interpreters are not at liberty to wait indefinitely for possible disambiguating information downstream.” As EVS increases, more information must be stored in the interpreter’s short-term memory. Memory may then become overloaded, which means that the processing of subsequent sentences may be impaired.
Having defined EVS as “a time-lag between the moment an incoming message is perceived by an interpreter and the moment the interpreter produces his converted version of the segment,” (Lee, 2011, p. 152) which clearly emphasizes the beginning of the rendition, Lee moves on to introduce another important concept: ‘tail-to-tail span’ or ‘TTS’.
Unlike EVS, TTS means the time-lag between the end of a speaker’s sentence and that of interpreter’s converted sentence (Lee 2003). This is the span of time the interpreter is lagging behind the speaker in winding up delivery of the interpreters’ rendition. (Lee, 2011, p. 153)
The important point is that EVS and TTS are not necessarily the same since the interpreter may lag behind the speaker for a certain amount of time at the beginning of the segment, but lag further behind at the end of the same segment. This may be due to the fact that the interpreter needs more time to understand the message or reformulate it, or simply because he overtranslates this segment. This explains why Christoffels, de Groot et al. (2003, as cited in Lee, 2011, p. 154) maintained that “if finding an appropriate word for a concept during SI takes a long time, it is likely that the interpreting process may break down due to the loss of valuable processing capacity and time.”
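The distinction between the two measures can be made concrete with a minimal sketch. It assumes that four timestamps are available for each sentence: the onset and end of the speaker’s sentence and the onset and end of the interpreter’s rendition. The class and function names and all timing values below are hypothetical and serve only as illustration; they are not taken from Lee (2002, 2011) or any other cited study.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Segment:
    """Timestamps (in seconds) for one source sentence and its rendition."""
    src_start: float  # speaker begins the sentence
    src_end: float    # speaker finishes the sentence
    tgt_start: float  # interpreter begins the rendition
    tgt_end: float    # interpreter finishes the rendition


def evs(seg: Segment) -> float:
    """Ear-voice span: lag between the onset of the sentence and of its rendition."""
    return seg.tgt_start - seg.src_start


def tts(seg: Segment) -> float:
    """Tail-to-tail span: lag between the end of the sentence and of its rendition."""
    return seg.tgt_end - seg.src_end


# Hypothetical timings, invented for illustration only.
segments = [
    Segment(src_start=0.0, src_end=4.0, tgt_start=3.0, tgt_end=8.5),
    Segment(src_start=5.0, src_end=9.0, tgt_start=9.0, tgt_end=12.0),
]

for seg in segments:
    print(f"EVS = {evs(seg):.1f} s, TTS = {tts(seg):.1f} s")

print(f"mean EVS = {mean(evs(s) for s in segments):.1f} s")
```

In the first hypothetical segment the interpreter starts 3.0 seconds behind the speaker but finishes 4.5 seconds behind, which is the situation described above: the lag at the end of a segment can exceed the lag at its beginning.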
It goes without saying that EVS is not a fixed value (Chernov, 2004); it depends on the language pair at hand, the directionality of interpreting, the interpreter’s habits, the subject matter, the level of difficulty of the speech, etc. For instance, as Schweda-Nicholson (1987, as cited in Lee, 2002, p. 600) states, “some difficult or complicated material may require a longer EVS, while more straightforward parts can be processed with a shorter EVS.”
Even so, several scholars have reported typical values. Barik (1973, as cited in Lee, 2002) found the average EVS to be within the range of two to three seconds. Schweda-Nicholson (1987, as quoted in Lee, 2002) maintains that EVS is five to ten words, and Lederer (1978, as cited in Lee, 2002) reports this value to be between three and six seconds. Lee (2006, cited in Lee, 2011) found an average EVS of 3.13 seconds in English into Korean SI in general. Chernov (2004, p. 14) maintains:
[…] this lag is far from being constant, and in fact varies very widely, although it is usually around 3 seconds. It is interesting to note that this value was quite correctly identified as the average time lag in the very first known research paper on SI, by Eva Paneth (1957), who presumably used no equipment other than a stopwatch.
Regarding the length of EVS, Van Dam (1986, p. 61, cited in Lee, 2002, p. 602) warns:
When, for whatever reason, we fall behind the speaker, the amount of information backlog is proportional to the increasing distance between the speaker and the interpreter. In other words, the further behind we fall, the more information we must store in short-term memory. And the greater the memory load, the greater the stress under which we work. We therefore catch up with the speaker as quickly and unobtrusively as possible, but without losing any substantive portion of the message.
As to the relationship between the length of EVS and the quality of SI, there are opposing views. Moser-Mercer (1997, as cited in Lee, 2002) reported that expert interpreters tend to opt for a longer EVS as they have a more comprehensive, macro view of the evolving message. In line with this, Massaro and Shlesinger (1997, as quoted in Lee, 2011, p. 154) “contended that novice interpreters tend to favor short EVS, failing to make effective use of the text-schematic knowledge available for longer EVS.” On the other hand, Barik (1973, as cited in Lee, 2002, p. 602) argued that the “interpreter will perform better in terms of omitting less material if he does not lag too far behind the speaker.”
What matters most with regard to the length of EVS seems to be the ability to strike a balance between the benefit of obtaining more information and the risk of a cognitive breakdown due to the complexity of all the operations required. “Since no interpreters engaging in SI would wait for the next incoming sentence without delivering the interpreted version for the current sentence, beginning too late means the second sentence pushes into interpreters’ ears even before they finish delivering the first sentence. Thus the interpreter’s capacity diminishes and the final quality suffers if unable to handle the demand.” (Lee, 2011, p. 154)
Lee (2011, p. 153) makes an interesting observation with regard to the relationship between TTS and the quality of SI, or more specifically the relationship between TTS and the omission of certain parts of the ST in the interpretation:
[…] long TTS hurt the quality of the sentence being interpreted likely because longer TTS indicates that the interpreter needs more time and processing capacity for the sentence in question. Therefore, when an interpreter stays with a certain sentence longer than allowed, either due to the problem of understanding or