Analysis of maritime team workload and communication dynamics in standard and emergency scenarios

Lochner, Martin; Duenser, Andreas; Lutzhoft, Margareta; Brooks, Ben; Rozado, David

doi:10.1186/s41072-018-0028-z

Original Article
Open access
Published: 23 February 2018

Journal of Shipping and Trade volume 3, Article number: 2 (2018)

Abstract

The introduction of next-generation technologies to the maritime shipping industry, including Portable Pilotage Units, Remote Pilotage, advanced situation awareness aids, and Autonomous Shipping, creates an urgent need to understand operator workload during Bridge Team operations, and co-operations with shore based personnel. In this paper we analyse mental workload of maritime Captains, Pilots and Tug Masters during standard and emergency scenarios, using traditional measures (SWAT, ISA), communications analysis, and the collection of simultaneous electro-dermal activity (EDA) of team members. Results indicate that the EDA measure overcomes some of the problems with paper-based techniques, and has excellent temporal resolution for emergency events. Implications for testing of novel technologies are discussed.

Introduction

The need to understand operator workload is a key requirement across numerous sectors, including maritime shipping (Lützhöft et al., 2011), nuclear power operations (Sheridan, 1981), air traffic control (Loft et al., 2007), driving (Trick et al., 2009), and many other contexts that impose a high demand on the human attentional system. While the physical elements of workload are generally well understood (e.g. De Zwart et al., 1996), the concept of cognitive workload, or mental workload, is less self-evident. One classic definition of mental workload, by Hart & Staveland (1988), is “the perceived relationship between the amount of mental processing capability or resources and the amount required by the task”. Human attention is by nature a limited resource, and decades of research have been conducted into its strengths and its limitations. We have a remarkable ability to divide attention across multiple foci, both in physical space and conceptually. Nevertheless, under certain conditions our comprehension of a situation can break down, with the result that accidents happen, causing damage to property, human life, and the environment. In the context of shipping and trade, developing a clear methodology for measuring maritime operator workload has the potential benefit of improving efficiency and safety, by better understanding the human error component that is common in many maritime accidents. In addition to understanding the workload of specific individuals during emergency events, the research reported here also investigates how members of a maritime operations team, including tug operators, Vessel Traffic Service, pilots, and the bridge team, react together to deal with emergency events. Our results have implications when considering how to test human performance with novel technological systems, both on the ship’s bridge and at remote locations such as a tug or VTS facility.

Efficiency and safety have long been key drivers of change in the maritime industry. Because of the large volumes and profits involved, and the critical nature of maritime accidents, technological solutions to age-old problems of navigation have been employed to various degrees during the past half-century. Technological systems including RADAR, SONAR, GPS, VTS, ECDIS, AIS, and others have benefitted maritime operations in many ways, but have not necessarily resulted in lower operator workload. Rather, in many cases the result has been just the opposite, where operators in high workload situations have a tendency to ignore maritime decision support systems (Grabowski & Sanborn, 2001). Furthermore, there are examples of technology contributing to failures, for example RADAR (Andrea Doria-Stockholm incident, 1956) and ECDIS (Ovit incident, 2013). To give an idea of the hybrid complexity that can exist on a ship’s bridge, one of the present authors (Lützhöft & Nyce, 2014) reports that a container vessel that was manufactured in the 1960s, and which had been converted to a passenger liner in 1990 prior to being inspected by the author in 2001, had an assortment of 15 different manufacturers’ brands on the bridge equipment, and that an offshore supply ship built in 2005 had close to 30 brands. The integration work required to safely operate such a system is a clear strain on the operator’s physical and mental capacity (Lützhöft, 2004). It is no surprise, then, that a large proportion of modern maritime accidents is attributed to human error, which in turn has been directly linked to mental workload (Hetherington et al., 2006). Note that this does not necessarily mean that a human made an overt mistake.

While there is some general agreement that mental workload is the culprit in many maritime accidents, and thus should be the subject of investigation, there is no such concord on the best way to operationalize the concept of mental workload. A number of methodologies have been employed to this end, each of which has advantages and disadvantages (Tsang & Vidulich, 2006). For the current research we first provide a review of the main methods available in the literature, discuss the most commonly used techniques, and provide a rationale for our use of a simple and effective electrophysiological technique known as Galvanic Skin Response (GSR), or alternately as Electro-Dermal Activity (EDA).

We present three case studies in this paper. In each case, the participants were experienced maritime professionals consisting of Ship’s Master / Captain (responsible for safe conduct of ship), Pilot (a local addition to the bridge team, who in practice takes over the manoeuvring and leads the communication), Tug Master (tugs are small powerful vessels that assist in manoeuvring large ships in restricted waters, either connected by rope/wire or pushing), Helmsman (performs the steering, on orders from master/pilot but no other tasks), and Vessel Traffic Service (VTS) Operator (VTS is a shore-based information service, much like air traffic control but with no mandate to give orders). First of all, we present an analysis of operator workload using the ISA (Instantaneous Self Assessment) and the SWAT (Subjective Workload Assessment Technique) that are commonly employed in the literature (Cain, 2007). These measures were chosen for their prevalence in the workload literature, and because they are straightforward to administer and analyse. Second, an analysis of communications patterns during emergency manoeuvres is presented as an additional means of understanding operator workload within the maritime environment. These studies illustrate some of the drawbacks to using the standard ISA/SWAT methodology, and provide some insight regarding communication patterns during an emergency event. Finally, we conducted a series of maritime operations while collecting GSR/EDA measures for the key team players: the Captain/Master, the Pilot, and the Tug Master. The use of GSR/EDA measurement allowed us to collect workload measures from a distributed team of maritime personnel as they performed routine and emergency manoeuvres in a large maritime ship simulator. This measurement has the clear advantage of detecting the onset and relative level of operator stress (a robust correlate of mental workload), and further, of capturing this information for multiple individuals within a distributed team operating environment.

Mental or ‘cognitive’ workload

The ability to assess and understand human performance, particularly in critical tasks where the actions carried out have major significance for safety and productivity, is a long-standing goal of human factors research. Stemming from capacity-based models of human cognition (e.g., Wickens, 2008; Baddeley, 1992), the concept of cognitive workload is based on the notion that as task demands increase, the individual is required to exert an increasing amount of his or her limited cognitive resources to maintain a steady level of performance. Workload has been assumed to follow the ‘Yerkes-Dodson law’ (i.e. ‘the inverted U’) where performance improves for low to medium levels of workload, but drops with higher workload levels (Staal, 2004). Increasing evidence, however, informs us that the true pattern may vary depending on the type of activity and environmental characteristics. In terms of human cognition, there is evidence that performance tends to decrease in a linear relation to workload (Marshall, 2002).

Individual workload has been assessed in many ways, and a number of detailed reviews are available (e.g., Tsang & Vidulich, 2006). We will briefly mention the main categories, and discuss their applicability to studying team workload in a safety-critical environment – namely the ship’s bridge. First, primary measures of performance on the given task can be used to infer workload. This means a direct measure of performance on the task of interest, with the notion that decreased performance indicates high workload. It should be evident, however, that task performance can be affected by other factors besides workload (e.g., competence, distraction, equipment failure, etc.). Further, a performance failure in such environments could be catastrophic, and it is therefore desirable to have an alternate measure to pick up load before a failure.

Secondary task methods of measuring cognitive workload involve the addition of a so-called secondary task, performance on which varies depending upon the hypothesized ‘spare capacity’ remaining for the user. Wickens’ influential Multiple Resource Theory (MRT; Wickens, 2008) takes advantage of this framework, and extends the concept to include separate resources for different processing modalities, such as for visual versus manual information. MRT has empirically shown that processing resources are parsed along the lines of modality, where a visual task and a manual task, for example, may be performed without immediate processing conflicts. Despite such successes, however, the inclusion of a secondary task is not generally ecologically valid (for example, requiring a participant to detect the onset of a peripheral light while executing the main task concurrently) and can be assumed to impact performance on the primary task. As such, the addition of a secondary task may be considered impractical for assessing performance in a safety-critical environment.

A very popular method of evaluating user workload is to employ subjective questionnaires which are administered to the participant either during^{Footnote 1} or after^{Footnote 2} the activity of interest. Such methodology is widespread in the literature (Funke et al. (2012) provide a review including 18 such studies, spanning from 1987 to 2010, and this is certainly only a cross-section). While easy to administer, and relatively informative and easy to interpret, such methodology has some major problems. In the case of the mid-trial tasks, the questionnaire breaks up the flow of the task – again impacting ecological validity, and limiting its usefulness in real-world situations. The post-test measures are also troublesome, as the results are based on the recall of mental workload, rather than an immediate index at the time of interest. In both cases, the tests are not generally suitable for mission-critical situations where the task flow cannot be interrupted.

A solution to these difficulties exists in physiological (Tsang & Vidulich, 2006) or physio-behavioural (Funke et al., 2012) measurements. A number of techniques available today can directly tap into the physiological signals of individuals, thus gaining insight regarding physiological states, which have been previously shown to be associated with the concept of cognitive workload (Funke et al., 2012). These include but are not limited to: brain imaging techniques (functional MRI, Positron Emission Tomography), electrophysiological techniques (electroencephalography, electromyography, galvanic skin response / electro-dermal activity), respiration, heart rate, heart rate variability, and blood flow. Although almost any measure of physiology can be a potential indicator for stress, and accordingly, workload, the number becomes more restricted when we consider that the measure is desirably carried out within the expected operational environment, with no (or minimal) impact on the ecological validity of the task (i.e. minimally affecting the standard or typical operation). In short, we want to test workload without interfering with the task. To this end, in the current set of studies we first show how typical measures of workload perform in our operational context. Second, we develop our understanding of workload in the maritime operations context by investigating communications patterns within the maritime operations team. Finally, we investigate EDA as a simple but effective technique to measure workload.

Team workload

While individual cognitive workload (Sweller et al., 2011) has been considered extensively over the past 20 years, research into workload within a team setting is still gaining momentum in the literature (Funke et al., 2012). Despite the possibility that factors influencing team workload may be characteristically different (or more complex) from those impacting individual workload, it is often the case that team workload is nevertheless measured using an amalgamation of the individual workload measurements. For example, a researcher may collect workload ratings from each member in the team, and create a team average from these ratings. The applicability of individual workload measures to a team setting is open for discussion; however, well-accepted measures of team workload are currently unavailable. Although there are many ways to assess individual workload, the suitability of these for assessing team workload in a realistic setting must be considered. In this research, we employ simultaneous measurement of workload for team members, and use a team average for comparison across workload levels.

Simulation #1: Using ISA / SWAT to measure team workload

Much of the prior research into cognitive workload has employed subjective paper-based tests of workload, as described above. In order to determine the suitability of such questionnaires for measuring team workload in a maritime operations setting, we tested the maritime operations team (Captain/Master, Pilot, Tug Master and Helmsman) within standard training runs in the maritime full-mission simulator, using both the ISA and the SWAT metrics.

Method

Participants

Participants were all experts recruited for the study. They formed a distributed maritime operations team, comprising an experienced Captain (male, 31 years old, 5 years’ experience in current role), Pilot (male, 52 years old, 10 years’ experience in current role), Helmsman (male, 65 years old, years of experience unknown), and Tug Master (male, 50 years old, 2 years’ experience in current role).

Apparatus

The simulators used were the Australian Maritime College (AMC) full mission ship’s bridge and tug simulators.

Kongsberg Full Mission Ship Simulator: The current simulation was carried out in a Kongsberg custom maritime simulator at the Australian Maritime College, in Launceston, Tasmania. This simulator suite includes a maximum of nine separate operational rooms, seven of which (excluding the main bridge and main control room) can be set up in various arrangements, such as tugs, secondary ships etc. (Fig. 1).

Design and procedure

Two consecutive days of testing were undertaken in the AMC maritime simulator. For each day, participants went through two runs, one of high workload, and one of low workload. In this case, workload was specifically controlled by manipulating simulator parameters such as wind speed, current (and therefore drift), as well as concurrent maritime traffic. Both ISA and SWAT scores were collected. The ISA is a single numeric workload rating (from 1 to 5) requested from the participant at intervals during the run, while the SWAT is more complex, involving ratings in three categories, ‘Time Load’, ‘Mental Effort’, and ‘Psychological Stress’, with ratings taken both during and after the run. For day one, ISA measures were taken at 5 min intervals, and the SWAT measure was taken once during the run at approximately 30 min, and once immediately after the run. For day two, ISA measures were taken at 3 min intervals. Due to poor feedback on the first day, the SWAT measure was not collected on day 2.

Results

The scores for Day 1, Low vs. High workload, are graphed below. In each case, the ISA score is on the y-axis, and the time in minutes is on the x-axis. Low workload runs are displayed on the left, and high workload runs on the right (Fig. 2).

ISA analysis

As a measure of team workload we compared the day 1 mean team ISA scores for the low (M = 2.76, SD = 0.53) and high workload (M = 2.62, SD = 0.59) runs using a t-test. The scores did not differ significantly (t = 0.37, df = 6, p = .72). Similarly, team ISA scores on day 2 did not differ significantly (t = −1.39, df = 12, p = .19) between the low (M = 2.18, SD = 0.31) and high (M = 2.47, SD = 0.46) workload runs. Because the Day 2 results were non-significant, the graphs are omitted here and are available in the Appendix.
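The two comparisons can be approximately reproduced from the reported summary statistics. The sketch below assumes an independent-samples t-test with pooled variance and equal group sizes inferred from the reported degrees of freedom (n = 4 per run for df = 6, n = 7 per run for df = 12). Neither the test variant nor the group sizes are stated in the text, and the function name `pooled_t` is illustrative.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Independent-samples t statistic with pooled variance.

    Returns (t, df). The group sizes passed in are assumptions inferred
    from the reported degrees of freedom, not values given in the text.
    """
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df


# Day 1: low vs. high workload, assuming n = 4 per run (df = 6)
t_day1, df_day1 = pooled_t(2.76, 0.53, 4, 2.62, 0.59, 4)

# Day 2: low vs. high workload, assuming n = 7 per run (df = 12)
t_day2, df_day2 = pooled_t(2.18, 0.31, 7, 2.47, 0.46, 7)

print(f"Day 1: t = {t_day1:.2f}, df = {df_day1}")
print(f"Day 2: t = {t_day2:.2f}, df = {df_day2}")
```

Under these assumptions the statistics come out close to the reported values (about 0.35 and −1.38, against the reported 0.37 and −1.39); the small gaps would be consistent with rounding of the published means and standard deviations.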

SWAT analysis

We analysed the SWAT scores using a 2 × 2 mixed model ANOVA, with the time at which participants filled out the questionnaire (during run / after completion) as the repeated measures factor, low / high workload as the between subjects factor, and mean team SWAT scores as the dependent variable. While neither main effect was significant, the interaction between the two factors was significant (F(1, 5) = 12.71, p = .02). This interaction is visualised in Figs. 3 and 4; the ratings are listed in Table 1.
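In a 2 × 2 mixed design, the interaction test is equivalent to an independent-samples t-test on each unit's difference score (after minus during) between the two workload groups, with F = t². The sketch below illustrates that equivalence only. The scores are hypothetical placeholders, not the study's data, and the split of seven units into groups of three and four is an assumption chosen solely to match the reported error degrees of freedom (5).

```python
import math
from statistics import mean, variance

# Hypothetical (during, after) mean SWAT scores per unit; NOT the study's data.
low_load = [(2.0, 1.5), (1.8, 1.4), (2.2, 1.9)]
high_load = [(1.6, 2.1), (1.5, 2.3), (1.9, 2.2), (1.7, 2.4)]


def interaction_f(group_a, group_b):
    """F statistic for the time x workload interaction in a 2 x 2 mixed design.

    Computed as the squared pooled-variance t statistic on the
    (after - during) difference scores of the two groups.
    """
    diff_a = [after - during for during, after in group_a]
    diff_b = [after - during for during, after in group_b]
    n_a, n_b = len(diff_a), len(diff_b)
    df_error = n_a + n_b - 2
    pooled_var = (
        (n_a - 1) * variance(diff_a) + (n_b - 1) * variance(diff_b)
    ) / df_error
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(diff_a) - mean(diff_b)) / std_err
    return t**2, df_error


f_value, df_error = interaction_f(low_load, high_load)
print(f"F(1, {df_error}) = {f_value:.2f}")
```

Framing the interaction as a comparison of difference scores also makes the later interpretation easier to follow: a significant interaction means the change from mid-run to post-run ratings differed between the low and high workload runs.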

Table 1 SWAT ratings for all runs. Measures were collected during each run at approximately 30 min in, and immediately after each run. * indicates missing scores

Discussion

ISA measures

Collecting workload ratings with this methodology was a challenge in the simulator, and it is likely that this difficulty would increase in real-life operational settings. One issue is the time-stamping, or coordination, of data across experimenters at each station. Although the administration of the ISA demands rigid time intervals (e.g. 5 min) between measurements, these fluctuate slightly depending upon the availability of the participant to reply to the verbal prompt, and the ability of the experimenter to administer timely cues. One possible remedy for this difficulty would be an automated logging system for workload ratings – though the user would still need to respond to the system at the appropriate time. One can likewise infer that the simple act of verbally prompting a participant for his / her workload may impact the operation currently underway.

Despite this issue, the ISA appears to have worked well, in that it shows coherent gradients across users during the simulation run, and the pattern of ratings differs between the ‘high workload’ and ‘low workload’ runs. Although the high and low workload conditions do not differ significantly when comparing the average score over a run, a look at the graphs above, particularly those for the Captain and Pilot, indicates a very different pattern between the High and Low workload conditions. Specifically, in the Low load conditions, ratings for the pilots and captains appear to peak during the middle of the run, whereas in the High load conditions the ratings continue to rise until the end of the run. This pattern is not evident for the helm and tug operators, indicating that perhaps they were not as affected by the workload manipulation.

There were indications that the 5 min intervals may be too long. Specifically, a brief, but nonetheless serious, incident can start and finish within a 5 min time period, and be missed completely on the rating scales. This was seen to happen at the 11:28 mark during the High workload condition on Day 1. In this case, the vessel crashed into the breakwater, and the Pilot voluntarily reported that he was momentarily at a workload of 5. This was missed by the standard ISA recording interval of 5 min. Reducing the interval, however, does not seem to be a viable solution, as verbal feedback from the participants on day 2 indicated that the 3-min ISA scoring interval was “annoying”, because it was given too often.
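The sampling problem described above can be made concrete with a short sketch that checks whether any fixed-interval ISA prompt falls inside a brief event window. The 11:28 onset is taken from the text; the 60 s event duration, the 40 min run length, and the function name are illustrative assumptions.

```python
def prompts_during_event(interval_min, event_start_s, event_duration_s, run_length_min):
    """Return the prompt times (in seconds) that fall within the event window."""
    event_end_s = event_start_s + event_duration_s
    prompt_times = range(0, run_length_min * 60 + 1, interval_min * 60)
    return [t for t in prompt_times if event_start_s <= t <= event_end_s]


EVENT_START_S = 11 * 60 + 28  # 11:28 mark, from the text
EVENT_DURATION_S = 60         # assumed duration of the incident
RUN_LENGTH_MIN = 40           # assumed run length

hits_5min = prompts_during_event(5, EVENT_START_S, EVENT_DURATION_S, RUN_LENGTH_MIN)
hits_3min = prompts_during_event(3, EVENT_START_S, EVENT_DURATION_S, RUN_LENGTH_MIN)

print(hits_5min)  # no 5 min prompt lands inside the window
print(hits_3min)  # the 12:00 prompt (720 s) lands inside the window
```

Under these assumptions the 3 min schedule catches the event only because a prompt happens to fall inside the window; a shorter incident, or one beginning just after a prompt, would still go unrecorded at either interval.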

One participant considered the 1–5 ISA scoring “straightforward and subconscious”. The scale steps 1–5 were judged easy to remember but not fine-grained enough. Some ratings were given between scale steps, which may indicate a preference for a more finely grained scale. Finally, another issue to consider is that the verbalization (speaking out loud) of individuals’ workload scores may have influenced other team members’ interpretation of the workload occurring at any given time.

SWAT measures

The SWAT measure was taken twice for each run: once during the run (at approximately 30 min into the run), and once after the run had completed. No visible relationships were evident, either to the ISA measures for the same run, or to the operational environment of the bridge. Scoring on the SWAT was marginally lower in the high workload run; however, this effect was not significant. This pattern is the opposite of that found in the ISA scores, which correctly identified the high workload runs. It must be considered, however, that the SWAT was not administered close to any high workload situations.

Overall, participants regarded the SWAT measure as too complex and wordy for quick judgements, i.e. during navigation. Further, it was perceived to interrupt the workflow to a greater extent than the ISA. One contradictory comment by a participant was that the SWAT measure works well – in particular that the 3-level scale was good, but that the description had too many words.

The statistical analysis showed an interesting interaction which, given the small amount of data collected in this study, has to be interpreted cautiously. The results indicate that while during the simulation SWAT scores were slightly lower in the high workload condition and slightly higher in the low workload condition, this was reversed after completion of the simulation, where higher SWAT scores were given in the high workload condition and lower scores in the low workload condition. From this it can be concluded that, after the run had completed, the SWAT was able to differentiate between high and low workload conditions, but that it was less successful when administered during the run.

Summary

In summary, though these measures were able to capture the workload of participants in the maritime simulator, it is evident that there are problems on a number of levels. Specifically, the measures themselves may impact workload; the measures were either too far apart to be meaningful, or so close in timing as to seriously disrupt the tasks being measured. The SWAT was unable to capture workload differences between the runs; and finally, it was considered to be too wordy and complex for any rapid assessment of workload in an operational environment. Given this assessment, we undertook to develop more sophisticated methods of workload analysis for operational team environments.

Simulation #2: Using communications patterns to measure team workload

In order to further develop our model for team interactions and workload on a sea-going vessel’s Bridge, between the Maritime Pilot and ship’s Captain, as well as the Vessel Traffic Service (VTS) and auxiliaries such as local tug boats, we deployed a team of 5 researchers during a port operations training simulation being undertaken by a local Port Authority. Researchers logged the verbal, VHF, and local intercom communications from all parties during an 80 min simulation.

For this training operation, a fairway transit into the port of Melbourne, in the state of Victoria, Australia, was chosen. In addition to a team of 5 researchers from our group, there was a multi-disciplinary group from Melbourne Ports, including a Captain and an Officer of the Watch, one Helmsman; two VTS operators (one experienced VTS operator and one trainee); and a control-room based operator for the two tugboats. The purpose of the simulation was to model a standard entry into the port for a Panamax container ship, the Offen 4100. The vessel transit was aided by two tugs. An additional vessel, the Hual Trooper, was in transit directly behind the Offen 4100.

In order to assess how the team reacted to an unexpected serious event, a key emergency situation was timed to occur during a critical stage of the transit. In this case, the Offen 4100 experienced a main engine failure (bow thrusters remained operational) at approximately 40 min into the run. Incidentally, the timing of this failure was particularly serious, as the vessel was undergoing a turn in the fairway, and an oil refinery was visible on the shore at this point.