Benchmarking bulkers delivered between 2010 and 2016, identifying the effect of the EEDI introduction

The EEDI is one measure of the energy efficiency of a vessel. Besides the EEDI, many other aspects are relevant when assessing the energy efficiency of a vessel, most of them based on design insights. In this paper, a set of ten relevant indicators is identified to establish the performance of the vessel and pinpoint the source of an energy efficiency improvement. These indicators are applied to the dry bulk vessels delivered between 2010 and 2016. Finally, for all indicators with a reasonable fit, the significance of the differences between trend lines is investigated. This results in groupings with equal performance, showing a clear split between vessels delivered in 2014-2016 and those delivered earlier. This is in line with the EEDI enforcement starting in 2013. However, for vessels below 125,000 DWT, the only effect traced so far is an increase in contract speed; no other aspects changed significantly. Above this size, a reduction in speed and power can be observed, as predicted by many scholars.


Introduction
At MEPC 72 (Marine Environment Protection Committee, 72nd session) of the IMO (International Maritime Organisation) (IMO 2018), agreement was reached to reduce greenhouse gas (GHG) emissions from shipping by 50% by 2050. In the same meeting, the extension of the EEDI (Energy Efficiency Design Index) was discussed, especially the pace at which its required value is reduced and a potential phase 4 in its implementation. The current EEDI has an introductory phase 0 for 2013-2014, in which the size-dependent reference EEDI has to be achieved, followed by three phases, each requiring an extra 10% reduction in the EEDI value. The phases were timed as 2015-2019 for phase 1, 2020-2024 for phase 2 and phase 3 thereafter. A phase 4 would raise the required reduction to 40%. The EEDI, which has been mandatory for new vessels since 2013, has been criticized extensively in the literature (Devanney 2010a, 2010b; Dulebenets 2016; Papanikolaou 2014; Randers 2012; Stevens et al. 2015), as it singles out speed reduction as the preferred measure to comply with the index. This may lead to dangerous situations in the future, where underpowered vessels run aground in a storm. With bulkers and tankers currently being the slowest ships, the risk is higher for these segments than for others. Besides this, it is often stated that a speed reduction across the entire fleet will increase the number of vessels required to supply the same transport capacity, although this depends on further aspects, such as the activity rate of the ship, as vessels do not sail continuously. Stevens et al. (2015) use a physics-based model of the ship's drive train to show that the EEDI targets fuel consumption rather than emissions. The focus is on installed power, not on efficiency at a certain sailing speed, while engines operating outside their optimal load can be considerably less efficient.
The main reason for the focus on speed lies in the relation between fuel consumption and speed known as the Admiralty constant, in which speed appears to the third power. This may cause the EEDI to miss its intended effect in the end, as demonstrated by Devanney (2010a): a VLCC with a lower EEDI used more fuel in practice. All this makes it rather uncertain whether a 50% reduction in emissions for all shipping will be achieved by 2050.
Nevertheless, the goal of the IMO is important and warrants investigation. Bouman et al. (2017) reviewed 150 papers suggesting improvements to the energy efficiency of ships. In many cases, the reported potential of a measure showed a significant spread. The higher reported values in particular rely on a combination of one or more measures with an alternative fuel. Such combinations of improvements were often studied together, not individually, which requires complex models (Calleya et al. 2015; Chatzinikolaou and Ventikos 2015) to estimate their effect correctly in the early stages of design. Bouman et al. (2017) further identified 14 papers that compared the total reduction in CO2 emissions for shipping to the Business As Usual (BAU) scenario, as first extensively researched by Buhaug et al. (2009). The averages come close to the desired 50% reduction in emissions. Even though many studies (Crist 2009; Eide et al. 2009; Eide et al. 2011; Faber et al. 2011; Longva et al. 2010) show that several GHG-abatement options are even profitable when applied, uptake seems low (Pruyn 2017). This may be because the fuel prices assumed in that research were too high (Buhaug et al. 2009; Eide et al. 2011; Faber 2012); hence later researchers used wide ranges of fuel prices (Bouman et al. 2017; Lindstad 2013). Research (Rehmatulla et al. 2017; Rehmatulla et al. 2013) also showed the reluctance of industry to implement seemingly beneficial measures. In their study, measures with significantly negative marginal costs were only applied to about 50-60% of the fleet, instead of the expected 90-100%. Access to capital, pay-back uncertainty, lack of incentives and lack of reliable information on costs and savings are named as reasons for not implementing them. The lack of incentives was further elaborated by Rehmatulla et al. (2013) and Kosmas and Acciaro (2017).
Finally, Armstrong and Banks (2017) describe a mismatch in priorities, interpretations of data and incentives among the stakeholders involved in the finance, operation and commercial exploitation of the vessel as a possible cause of this.
This paper investigates bulkers. Although the same approach could be used for tankers or container ships, bulkers were selected for two main reasons. First, bulkers form the largest fleet in numbers, so more data points are available. Second, it is also the most accessible fleet, meaning that more data are available on these ships than on, e.g., container ships, which are operated by a small group of large companies. A further, though not decisive, factor is that almost all bulker designs are yard designs, so many ships are comparable and show slight improvements or adjustments over the years. On the other hand, these yard designs can lead ship owners who are not particularly interested in GHG emissions to overlook measures that have been implemented to improve performance. This is also demonstrated by, e.g., Frouws (2018) and Chen et al. (2010), who showed that ship owners tend to overlook important physical qualities of the ship when selecting a ship to buy. This may result in unexpected performance after delivery. Both papers have therefore recommended several indicators to take into account when assessing the quality of a ship. Pruyn (2017) did something similar in his research into the visibility of eco-bulkers in the qualities of the bulk fleet. The next section identifies and expands the set of measures used in these papers and indicates which efficiency improvements each indicator can measure. The intention is to use these measures to further identify the improvements realised through the introduction of the EEDI. Section three discusses the openly available data and the adjustments suggested to make the measures applicable to almost all vessels in the world. Section four then applies this set of benchmark indicators to the dry bulk vessels delivered between 2010 and 2016 and presents the results of the benchmarks. Finally, section five contains conclusions on the usefulness of the selected benchmarks.

Indicators for ship efficiency
The first measure of a ship's efficiency was already discussed above: the EEDI. It relates CO2 emissions, derived from installed power (main and auxiliary engines), to the cargo transport capacity. It therefore only accounts for consumption at sea and does not address port consumption directly. The basic formula is represented as Eq. 1 in Table 1. As discussed, this ratio is very sensitive to the design speed; however, it should also react to other improvements, such as waste heat recovery or lower hull resistance. Applying the full EEDI calculation broadly and quickly is difficult, as it requires detailed information on all engines. Therefore, it is not uncommon to use a simplified version, which only focuses on the installed power and transport capacity. This version is presented as Eq. 2 in Table 1. By dropping the specific fuel consumption and carbon factor, this formula also loses considerable sensitivity. It can still establish whether the vessel design is efficient, but improvements in engine performance, as well as the use of alternative fuels, are not identified.
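To make this loss of sensitivity concrete, the sketch below contrasts a full EEDI-style calculation with a simplified power-based ratio. Since Table 1 is not reproduced here, both forms are assumptions: the full form follows the widely cited basic EEDI structure (carbon factor times SFC times power, divided by capacity times speed, without correction factors), with main-engine power taken as 75% of MCR, an HFO carbon factor of 3.114 t CO2/t fuel and capacity equal to DWT; the simplified form is a hypothetical installed-power-per-transport-capacity ratio. The vessel particulars are illustrative only.

```python
# Sketch only: forms and constants below are assumptions, not the paper's Table 1.
C_F_HFO = 3.114  # assumed carbon factor for heavy fuel oil, t CO2 per t fuel


def eedi_full(mcr_kw, p_aux_kw, dwt_t, v_kn, sfc_me=190.0, sfc_ae=215.0, cf=C_F_HFO):
    """Basic EEDI-style index in g CO2 per tonne-nautical mile (no correction factors)."""
    p_me = 0.75 * mcr_kw  # assumed reference main-engine power: 75% of MCR
    # kW * g/kWh gives g fuel per hour; multiplying by cf gives g CO2 per hour
    co2_g_per_h = cf * (p_me * sfc_me + p_aux_kw * sfc_ae)
    return co2_g_per_h / (dwt_t * v_kn)  # transport work per hour: t * nm/h


def eedi_simplified(mcr_kw, p_aux_kw, dwt_t, v_kn):
    """Hypothetical simplified ratio: installed power per unit of transport capacity."""
    return (mcr_kw + p_aux_kw) / (dwt_t * v_kn)  # kW per (t * kn)


# Illustrative (hypothetical) Kamsarmax-like particulars
print(round(eedi_full(10_000, 500, 82_000, 14.5), 2))  # 4.01
# A more efficient main engine lowers the full index ...
print(eedi_full(10_000, 500, 82_000, 14.5, sfc_me=170.0) < eedi_full(10_000, 500, 82_000, 14.5))  # True
# ... but the simplified ratio has no SFC or carbon-factor input, so it cannot register the change.
print(eedi_simplified(10_000, 500, 82_000, 14.5))
```

Because `eedi_simplified` takes no fuel or SFC input, an engine upgrade or a fuel switch leaves it unchanged, which is the loss of sensitivity described above.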
A second, much older, relation between installed power and ship size is the Admiralty constant (Eq. 3), developed by the British Admiralty in the early nineteenth century (Papanikolaou 2014). It divides the displacement to the power 2/3 multiplied by the speed cubed by the installed engine power. The first term translates a volume into a surface, which relates to frictional resistance. The second term expresses the relation between speed and power for vessels. This means that the Admiralty constant is an efficiency value: the numerator expresses the power that would be expected, while the denominator is the actual installed power. The higher the value of the resulting constant, the more efficient the ship. The size of the vessel is taken into account in this formula; however, the auxiliary power is not considered, which means that efficiency measures not directly related to propulsion are not monitored by this indicator.
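The Admiralty constant as described here can be written as C = Δ^(2/3)·V³/P. The sketch below assumes displacement in tonnes, speed in knots and power in kW (a unit convention that only matters for comparing like with like); the numbers are illustrative. It shows that, for the same displacement and speed, less installed power gives a higher constant, and that the constant scales with the cube of speed.

```python
def admiralty_constant(displacement_t, v_kn, power_kw):
    """C = displacement^(2/3) * V^3 / P; a higher value means less power for the same size and speed."""
    return displacement_t ** (2.0 / 3.0) * v_kn ** 3 / power_kw


base = admiralty_constant(95_000, 14.5, 10_000)   # illustrative values
leaner = admiralty_constant(95_000, 14.5, 9_000)  # same hull and speed, 10% less power
print(leaner > base)                              # True: more efficient ship, higher constant
print(round(leaner / base, 4))                    # 1.1111
print(round(admiralty_constant(95_000, 14.5 * 1.1, 10_000) / base, 3))  # 1.331: cubic in speed
```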
Both Chen et al. (2010) and Frouws (2018) evaluate the design of the vessel by considering four other indicators: a power or consumption ratio (Eq. 4), a block coefficient (Eq. 5), a lightweight (LDT) ratio (Eq. 6) and a speed ratio (Eq. 7). Each is briefly discussed here. The first is the power or consumption ratio. The Admiralty constant discussed before showed a clear link between size, design speed and installed power; however, it primarily concerns low-speed resistance. Using a large number of towing tank tests, Holtrop and Mennen (1982) derived statistical estimates of the total resistance. This resistance estimate is translated into a required installed power (or consumption, using a fixed specific fuel consumption), which can then be compared to the actual installed power of the vessel. This shows whether a vessel lies above or below the estimate, independent of size and speed, which in turn makes comparing trends easier. This ratio therefore captures all improvements related to the propulsion of the vessel, be it on the weight or shape side or on the engine side. The block coefficient expresses the shape of the vessel: if the vessel were a rectangular pontoon, this coefficient would be one. Any reduction in volume compared to the full block formed by length, width and draft makes this value smaller. In general, the smaller the block coefficient, the more streamlined a vessel is, which means that at the same speed it generates smaller waves. This relation has given rise to the correlation between speed and block coefficient: a higher design speed requires a lower block coefficient. To calculate this coefficient, either the displacement or the LDT of the vessel is required. The displacement is the total volume (or weight) of water displaced by the fully loaded vessel; in weight terms, it can be subdivided into lightweight (LDT) and deadweight (DWT).
DWT is the carrying capacity of the ship; LDT is the total weight of the empty ship, including all structure, machinery and other non-consumable items. The lightweight ratio compares the LDT of the vessel with an estimate of the LDT often used by designers and described by Watson (1998). This estimate itself uses an estimated block coefficient, so the outcome is based on two approximations, potentially increasing its uncertainty. Further inputs required are the length, width, draft and depth of the vessel, as well as the main engine power and the revolutions per minute (rpm) of the engine. Obtaining the rpm may require considerable investigation, although it can perhaps be estimated from common engine characteristics of the period.
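As a minimal illustration of the block coefficient, the sketch below derives it from LDT and DWT, assuming a seawater density of 1.025 t/m³ and taking length, width and draft as the dimensions of the enclosing block (which length definition Table 1 uses is an open assumption here). A rectangular pontoon returns exactly one, as described above.

```python
RHO_SEAWATER = 1.025  # t/m^3, assumed density of seawater


def block_coefficient(ldt_t, dwt_t, length_m, beam_m, draft_m, rho=RHO_SEAWATER):
    """C_B = displaced volume / (L * B * T), with displacement = LDT + DWT."""
    displacement_t = ldt_t + dwt_t    # weight of the fully loaded vessel
    volume_m3 = displacement_t / rho  # displaced volume
    return volume_m3 / (length_m * beam_m * draft_m)


# A 100 x 10 x 5 m pontoon displacing 5,125 t fills its block completely.
print(round(block_coefficient(1_125, 4_000, 100, 10, 5), 6))  # 1.0
```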
Another ratio introduced by Chen et al. (2010) and Frouws (2018) is the speed ratio. Here, the actual design speed of the vessel is compared to the speed the vessel would have at the average Froude number of the fleet (Eq. 8). Vessels with the same Froude number have the same wave pattern (not the same wave resistance). The Froude number relates the speed of the vessel to its length: at the same speed, smaller ships have a higher Froude number. The speed ratio is thus a correction for the average wave-making resistance of the fleet, compensated for speed and length. Both Eq. 7 and Eq. 8 rely on the Froude number; Eq. 7 could easily be rewritten as the ratio between the actual and the average Froude number. The values then differ, but any trends discovered are identical. As Eq. 7 gives a more easily interpreted output, it is used in this paper.
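A sketch of the speed ratio, assuming Eq. 7 takes the form V / (Fn_avg·√(gL)), i.e. the design speed divided by the speed the vessel would have at the fleet-average Froude number; the exact form in Table 1 is not reproduced here. Speeds are converted from knots to m/s, g = 9.81 m/s² is assumed, and the small fleet is hypothetical.

```python
import math

G = 9.81               # m/s^2, assumed gravitational acceleration
KNOT_MS = 1852 / 3600  # m/s per knot


def froude_number(v_kn, length_m):
    """Fn = V / sqrt(g * L), with V converted to m/s."""
    return v_kn * KNOT_MS / math.sqrt(G * length_m)


def speed_ratio(v_kn, length_m, fn_fleet_avg):
    """Design speed divided by the speed at the fleet-average Froude number (assumed Eq. 7 form)."""
    v_expected_kn = fn_fleet_avg * math.sqrt(G * length_m) / KNOT_MS
    return v_kn / v_expected_kn


fleet = [(14.0, 180.0), (14.5, 222.0), (15.0, 290.0)]  # hypothetical (speed in kn, length in m)
fn_avg = sum(froude_number(v, L) for v, L in fleet) / len(fleet)
for v, L in fleet:
    print(L, round(speed_ratio(v, L, fn_avg), 3))
```

In this toy fleet the shortest vessel, sailing at a similar speed, has the highest Froude number and therefore the highest speed ratio, in line with the remark that smaller ships at the same speed have a higher Froude number.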
Finally, besides the block coefficient and the Froude number already mentioned, Pruyn (2013) checked the shape of the vessel using two other indicators: length over width (Eq. 9) and the length ratio (Eq. 10). In general, a slenderer vessel (longer, compared to its width and draft) has a lower wave-making resistance. A downside of a longer, slenderer vessel is that the total wetted surface increases, so the lower wave-making resistance comes at the cost of higher frictional resistance. Therefore, at lower design speeds a fuller vessel is preferred. As a side note, this also implies that fast vessels that are slow steaming are not as energy efficient as fuller vessels designed for the lower speed, as the latter have a lower frictional resistance. This trade-off between frictional and wave-making resistance leads to a final ratio to consider, namely the wetted surface ratio. A formula for such a ratio has long been in use (Lewis 1988); it links the wetted surface area to the displacement and length of the vessel (Eq. 11). The lower this value, the lower the frictional resistance of the vessel should be.
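The two slenderness indicators can be sketched as follows. Length over width is unambiguous; for the length ratio, the sketch assumes the common length-displacement form L/∇^(1/3), which may differ from Eq. 10 in Table 1. Displacement in tonnes is converted to volume with an assumed seawater density of 1.025 t/m³. Higher values of both indicate a slenderer hull.

```python
RHO_SEAWATER = 1.025  # t/m^3, assumed


def length_over_width(length_m, beam_m):
    return length_m / beam_m


def length_ratio(length_m, displacement_t, rho=RHO_SEAWATER):
    """Assumed form of Eq. 10: L / (displaced volume)^(1/3)."""
    volume_m3 = displacement_t / rho
    return length_m / volume_m3 ** (1.0 / 3.0)


# Illustrative: a 10% longer hull with the same beam and displaced volume is slenderer on both measures.
print(length_over_width(100, 16), length_over_width(110, 16))                  # 6.25 6.875
print(round(length_ratio(100, 1_025), 3), round(length_ratio(110, 1_025), 3))  # 10.0 11.0
```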
From the discussion above and the summary in Table 1, two points should become clear. First, it is not possible to benchmark vessels or a fleet using one factor. A combination of several, if not all, of the ten factors discussed should be used to pinpoint developments over time or sources of differences between individual vessels. Second, two improvements mentioned in Bouman et al. (2017), ballast water reduction and cold ironing, are not measured by these factors at all. All formulas consider the design condition, which is the laden condition of the vessel; for the ballast (empty) voyage, not enough information is available. Yet one of the most frequently heard considerations in ship design is exactly that vessels should not be optimized for one condition, but for the combination of all prevalent conditions during their lifetime (Andrews et al. 2018; Duchateau and Hopman 2016; Ölçer 2008; Papanikolaou 2010). Perhaps this change in insight will increase the information requested and provided in the future. Until then, such considerations cannot be monitored for large groups of ships, which is the main concern here.

Considering data availability in relation to the selected factors
The ten indicators presented in Section two require specific vessel data to be calculated correctly. This section checks the availability of these data in the Clarkson World Fleet Register and discusses solutions to the issues found before continuing with the actual benchmark. Not all particulars are required: as Eq. 4 shows, relations exist between several variables, which reduces the total number of inputs required. Based on the formulas in Table 1, the following data are required: Length (L), Width (B), Draft (T), Deadweight (DWT), Design Speed (V), Main Engine Power (PME), Auxiliary Engine Power (PAUX), Wetted Surface Area (WSA) and, if the full EEDI is considered, the Specific Fuel Consumption (SFC) of the engines. Besides these particulars, one of the following is needed as well: displacement (volume or weight), block coefficient or lightweight (LDT).
Furthermore, the estimations of power and LDT require another set of inputs. In particular, the method of Holtrop and Mennen (1982) requires the prismatic coefficient, which can be calculated from the block coefficient and the midship coefficient. The latter is the midship section area divided by the width times the draft. Besides these coefficients, the longitudinal centre of buoyancy, the entrance angle at the bow and the depth of the vessel are also required. For the lightweight estimation (Watson 1998), both the block coefficient and the rpm of the engine are required as extra inputs.
As already mentioned, the lightweight or block coefficient is not readily available: in the database by Clarkson (2019), only about 5% of the ships include this information. The other two coefficients (prismatic and midship) are not included in the vessel database at all, and the same goes for the wetted surface, entrance angle and longitudinal centre of buoyancy. If the engine type is known, for both the main and the auxiliary engines, the rpm and SFC could be looked up in the datasheets of the engine manufacturer. However, looking up data for thousands of vessels would be rather cumbersome.
Of the extra inputs for the estimations of power and LDT, only the depth (D) is readily available.
In summary, the introduction of the estimations poses some issues with data availability. Chen et al. (2010) solve this by fixing values for the unknown particulars, such as the midship coefficient, rpm, longitudinal centre of buoyancy and entrance angle. This may introduce small biases, but according to them, either no major variations in these values are expected in bulk ship design, or their influence on the outcome is small enough to be neglected. For the block coefficient, however, this is not possible. Fortunately, Watson (1998) also provides an estimator for the block coefficient. With the estimated block coefficient, the displacement follows from the main dimensions, and as DWT is readily available, LDT can be determined as the difference. This block coefficient estimator only requires the Froude number as input, which can be calculated from readily available data, so the second estimation (that of LDT itself) can be skipped. With lightweight available as an absolute value, the next step can be taken: the ratio between lightweight and cargo carrying capacity is of more interest from an efficiency perspective than lightweight by itself, because, when dimensions and block coefficient are equal, a lighter ship can carry more cargo. Note that this ratio lies in the range of 0.10-0.30, with higher values for smaller vessels, as lightweight and deadweight scale differently with size. Finally, because the block coefficient is now estimated, it is lost as an indicator of efficiency: it depends fully on design speed and length, which are also investigated individually.
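The resulting chain (Froude number → estimated block coefficient → displacement → LDT → LDT/DWT ratio) can be sketched as below. The block coefficient estimator is assumed here to be the widely quoted Watson-Gilfillan relation C_B = 0.70 + (1/8)·arctan((23 − 100·Fn)/4), as presented by Watson (1998); the seawater density and the choice of LDT/DWT as the "lightweight to cargo carrying capacity" ratio are also assumptions. Because LDT is obtained as the difference between two large numbers (displacement and DWT), a small error in the estimated C_B becomes a much larger relative error in LDT.

```python
import math

RHO_SEAWATER = 1.025  # t/m^3, assumed


def watson_block_coefficient(fn):
    """Assumed Watson-Gilfillan estimate: C_B = 0.70 + (1/8) * atan((23 - 100 * Fn) / 4)."""
    return 0.70 + math.atan((23.0 - 100.0 * fn) / 4.0) / 8.0


def ldt_to_dwt_ratio(length_m, beam_m, draft_m, dwt_t, fn, rho=RHO_SEAWATER):
    """LDT from the estimated displacement minus the known DWT, per tonne of DWT."""
    cb = watson_block_coefficient(fn)
    displacement_t = rho * cb * length_m * beam_m * draft_m
    ldt_t = displacement_t - dwt_t
    return ldt_t / dwt_t


print(round(watson_block_coefficient(0.23), 6))                         # 0.7: the arctan term vanishes
print(watson_block_coefficient(0.15) > watson_block_coefficient(0.20))  # True: slower design, fuller hull
```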
A similar discussion applies to the power estimation of the ship. The power estimation links inputs for the form of the ship (both wetted area and shape) to estimates of frictional and wave-making resistance. However, while the required power is unknown at the design stage, the installed power is known once the ship is built. The main reason for Chen et al. (2010) to use these estimators is to make power and lightweight dimensionless over a wide range of vessel sizes and design speeds. This, however, is not strictly necessary. Using absolute values makes a comparison over years more susceptible to variations in size and design speed, but a detailed benchmark comparison corrects for this. Therefore, this research uses the actual installed power rather than a dimensionless value.
Auxiliary engine power is still required for one of the equations but can generally not be found in the available data. However, this is not a real issue, as the IMO (2012) has set a fixed calculation for this term of its formula in MEPC 62. This means that this value is not required either, and that it can be replaced by a formula based on vessel size (in DWT) and main engine power.
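For illustration, the sketch below uses the auxiliary power approximation commonly quoted from the IMO EEDI calculation guidelines, in which auxiliary power depends on the main-engine MCR only. The text above describes the fixed IMO calculation as depending on vessel size and main engine power, so treat this particular form as an assumption rather than the exact formula applied in this study.

```python
def auxiliary_power_kw(mcr_me_kw):
    """Assumed EEDI-guideline approximation of auxiliary engine power (kW) from main-engine MCR."""
    if mcr_me_kw >= 10_000:
        return 0.025 * mcr_me_kw + 250.0
    return 0.05 * mcr_me_kw


for mcr in (8_000, 10_000, 16_000):
    print(mcr, auxiliary_power_kw(mcr))
# 8000 400.0
# 10000 500.0
# 16000 650.0
```

Both branches give 500 kW at an MCR of 10,000 kW, so the approximation has no jump at the threshold.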
Finally, the wetted surface area (WSA) remains to be determined. Calculating it precisely requires detailed hull drawings, which is of course not an option at fleet level. Several estimation formulas exist, the most popular being the Denny-Mumford estimation and that of Holtrop and Mennen (1982), which is part of their power prediction equations. As discussed, the latter requires inputs that are not available and therefore cannot be used. The Denny-Mumford formula only requires length, displacement and draft, yet is seemingly just as accurate as Holtrop and Mennen (Moser et al. 2016). This estimation of the wetted surface is, therefore, used in the benchmark study. As a result, several factors are replaced or dropped; these changes are summarized in Table 2, except that Eq. 1 is dropped completely.
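The Denny-Mumford estimate is commonly written as S = 1.7·L·T + ∇/T, with ∇ the displaced volume. The sketch pairs it with a wetted surface ratio of the assumed form S/√(∇·L), a dimensionless coefficient in the spirit of Lewis (1988); whether this matches Eq. 11 exactly is an assumption. Inputs are in metres and cubic metres, and the example dimensions are hypothetical.

```python
import math


def wetted_surface_denny_mumford(length_m, draft_m, volume_m3):
    """Denny-Mumford estimate of wetted surface area in m^2: S = 1.7*L*T + volume/T."""
    return 1.7 * length_m * draft_m + volume_m3 / draft_m


def wetted_surface_ratio(wsa_m2, volume_m3, length_m):
    """Assumed form of the area ratio: S / sqrt(volume * L), dimensionless."""
    return wsa_m2 / math.sqrt(volume_m3 * length_m)


s = wetted_surface_denny_mumford(100.0, 5.0, 4_000.0)
print(round(s, 6))                                        # 1650.0
print(round(wetted_surface_ratio(s, 4_000.0, 100.0), 3))  # 2.609
```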

Data validation
Before performing the benchmark, the data were extensively checked. Missing data and outliers were cross-checked with other available sources (Fairplay 2015; Shipvault 2015; Marine Traffic 2019; Vesselfinder 2019) to identify whether a peculiar value was consistent across sources or more likely an entry error. The checks also included a visual inspection where photos were available. If a value was suspected to be an error, it was adjusted accordingly. To detect outliers, the Length (L), Width (B) and Draft (T) were checked against the deadweight (DWT), Main Engine Power (PME), Froude number (Fn), lightweight (LDT), length over width (L/B), the EEDI and the area ratio. The resulting data and graphs can be found online in Additional file 1, Benchmark Data and Graphs.

Benchmark results
After the data validation, the final step is the benchmark. For each of the ten variables, the values and trend over the years 2010-2016 are presented first. This should give a first indication of whether the value of the indicator has improved over time. Most relations between DWT, length or speed (Fn) and the selected indicators take the form of a power relation, Y = a·X^b; however, trend line comparison requires a linear form. To convert these relations to the linear form, the natural logarithm is taken of both the indicator and the explanatory variable: ln(Y) = ln(a) + b·ln(X). If the relation between the indicator and the explanatory variable is relevant for most years, the significance of the differences can be tested. This is done in two steps. First, the analysis of covariance (ANCOVA) is used to test the hypotheses of equal slopes and equal elevations. This test is relatively strict and only allows all years to be tested at the same time. It compares the individual results with the pooled results (all data in one pool) to see whether the groups should be considered equal. If it rejects equality, it does not indicate which year(s) differ; however, if both slopes and elevations are equal, there is no difference between any of the trend lines. If equality is rejected, a Student-Newman-Keuls (SNK) test is executed to locate the differences. With SNK there is a risk of false positives; Tukey's test could be selected instead for a stricter approach, at the cost of some statistical power. It was decided to accept the chance of creating extra groups within the results over the risk of missing a significant difference. The SNK test provides pairwise insight into the equality of either slopes or elevations (Pruyn 2013). This grouping process is arbitrary to some extent; for this paper, the goal is to form groups of consecutive years that are as large as possible. Section 4.1 discusses the first phase of the benchmark, the check on yearly trends.
Section 4.2 continues with the trend line estimation and the discussion of significance. This is followed by the ANCOVA and SNK test results in Section 4.3, while Section 4.4 discusses the outcome of the benchmarks.
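The ANCOVA step can be sketched with three nested least-squares fits on the log-transformed data: separate lines per year, lines with a common slope but separate intercepts, and a single pooled line. Comparing their residual sums of squares gives the F statistics for equal slopes and for equal elevations. This is a textbook formulation written in plain Python for illustration, not the software used in this study; p-values could then be taken from an F distribution, for example with `scipy.stats.f.sf(F, df1, df2)`. The SNK follow-up is not shown.

```python
def _centered_sums(xs, ys):
    """Return n, Sxx, Sxy and Syy (sums of squares about the means)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return n, sxx, sxy, syy


def ancova_f(groups):
    """groups: one (xs, ys) pair per delivery year, already log-transformed (k >= 2).

    Returns ((F_slopes, df1, df2), (F_elevations, df1, df2)).
    """
    k = len(groups)
    sums = [_centered_sums(xs, ys) for xs, ys in groups]
    n_total = sum(s[0] for s in sums)

    # Model 1: a separate line per year
    rss_separate = sum(syy - sxy ** 2 / sxx for _, sxx, sxy, syy in sums)
    # Model 2: common (pooled within-group) slope, separate intercepts
    sxx_w = sum(s[1] for s in sums)
    sxy_w = sum(s[2] for s in sums)
    syy_w = sum(s[3] for s in sums)
    rss_common = syy_w - sxy_w ** 2 / sxx_w
    # Model 3: one line through all data
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    _, sxx_t, sxy_t, syy_t = _centered_sums(all_x, all_y)
    rss_pooled = syy_t - sxy_t ** 2 / sxx_t

    df_separate = n_total - 2 * k
    df_common = n_total - k - 1
    f_slopes = ((rss_common - rss_separate) / (k - 1)) / (rss_separate / df_separate)
    f_elev = ((rss_pooled - rss_common) / (k - 1)) / (rss_common / df_common)
    return (f_slopes, k - 1, df_separate), (f_elev, k - 1, df_common)


# Synthetic check: two "years" with the same slope but different elevations.
xs = [float(i) for i in range(1, 11)]
noise = [0.1 if i % 2 == 0 else -0.1 for i in range(10)]
year_a = (xs, [1.0 + 2.0 * x + e for x, e in zip(xs, noise)])
year_b = (xs, [3.0 + 2.0 * x + e for x, e in zip(xs, noise)])
(f_s, _, _), (f_e, _, _) = ancova_f([year_a, year_b])
print(f"F slopes = {f_s:.3g}, F elevations = {f_e:.0f}")  # F slopes ~0, F elevations large
```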

Phase 1: discussion of yearly trends in indicators
The following trends are discussed: DWT, EEDI, Admiralty constant, installed power, LDT fraction, design speed fraction, length over width, length ratio and area ratio. For each year, the average and median are presented in Table 3, as well as the coefficient of the trend line. The R² value is not provided here, as the trend line is only used to indicate direction; significance and fit are investigated in phases 2 and 3 of this benchmark. The graphs of the data have also been studied, but they are not presented here, to avoid cluttering the paper. The difference between the median and the average can indicate a difference in the spread of the values, while the trend line coefficient helps identify the trend in the data. This trend line is based on all data points, not just on the averages, but naturally a similar trend should be visible in the averages as well.
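The Phase 1 summary statistics can be sketched as follows: per delivery year the average and median of an indicator, plus the slope of a least-squares trend line fitted through all individual data points against delivery year (rather than through the yearly averages). The record format and function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def yearly_summary(records):
    """records: iterable of (delivery_year, indicator_value) pairs.

    Returns ({year: (average, median)}, trend_slope_per_year).
    """
    records = list(records)
    by_year = defaultdict(list)
    for year, value in records:
        by_year[year].append(value)
    stats = {year: (mean(vals), median(vals)) for year, vals in sorted(by_year.items())}

    # Least-squares slope over all individual points, not over the yearly averages
    xs = [float(year) for year, _ in records]
    ys = [float(value) for _, value in records]
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return stats, slope


stats, slope = yearly_summary([(2010, 1.0), (2010, 3.0), (2011, 2.0), (2011, 4.0), (2011, 9.0)])
print(stats)             # {2010: (2.0, 2.0), 2011: (5.0, 4.0)}
print(round(slope, 6))   # 3.0
```

In this toy example, one high value pulls the 2011 average above its median, which is the kind of difference in spread the comparison of median and average is meant to reveal.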
Both DWT and building time show a decrease over time; in particular, the average building time falls from almost 40 months in 2010 to about 30 months in 2016. Of course, the 2009 crisis (Chen et al. 2018; Merika et al. 2019) plays a big role in this picture. Before 2009, yards were getting fuller and fuller, whereas since the crisis owners have ordered few ships, significantly reducing the building time as order books were depleted. A small test was performed by sorting the data by year of contract signing instead of year of delivery; the results were similar, and as vessels tend to be known by their year of delivery rather than their year of ordering, the delivery date was retained for this research. Finally, the decrease in DWT is minimal and most likely due to the decline in the number of deliveries of vessels over 300,000 DWT. EEDI and the Admiralty constant show an improvement over time. The average EEDI declines over time, meaning that less power is needed per ton-mile. The Admiralty constant increases, which indicates that the same volume and design speed are obtained with lower installed power, or a higher design speed with the same installed power. It is also important to realise that the trends displayed by both the EEDI and the Admiralty constant go against the trend of declining DWT. Larger vessels are in general more power-efficient, so an increase in size would result in an improvement without any improvement at vessel level. The current trends indicate that there is potentially an improvement at vessel level.
The improved performance could be related to a more efficient conversion of energy from the engine to forward speed, but also to a decrease in lightweight (LDT), which reduces the total weight to move or, for the same displacement, increases the cargo weight. This will be further investigated in the next phase. The trend in the lightweight fraction suggests a decline over time. As stated before, this fraction is higher for smaller vessels; since the average vessel size declined slightly, which would push the fraction up, the observed decline seems to be a genuine improvement over time. This will also be further investigated.
Besides the impact of (extra) weight and installed power, speed is also an important factor in the performance. All of the above could simply be the result of a speed reduction rather than clever design. The speed ratio, which increases over time, contradicts this. Therefore, in combination with the variables discussed above, vessels seem to have become more efficient over the seven years studied. Besides the options already mentioned, this greater efficiency can be achieved by lowering the losses in the conversion from engine power to propulsion, or by lowering the resistance through a different shape. The first option cannot be checked with the current benchmark information. For the second option, the vessels would be expected to have become slenderer. To check this, both the length over width (L/B) and the length ratio indicators are consulted. Both trend lines are rather flat: their coefficients indicate a total increase of 1.5% in both values over 7 years. These trends will be further investigated, but even if the differences turn out to be significant, their absolute impact is minimal. In other words, the fleet did not become slenderer over time, so other factors most likely played a role in the potential increase in efficiency.
The spread of the area ratio is very small and its trend line is very close to horizontal; the increase over time is only 0.2% in 7 years. Of course, the fact that the wetted area is estimated and not measured may play a role in this. In any case, it makes little sense to investigate this value further in this research, as even a significant trend would not result in an absolute difference that helps identify quality.
In conclusion, this first phase shows improvements for almost all indicators identified, except for slenderness and area. In the second phase, trend lines are obtained for each delivery year against two indicators of size, DWT and length, and one of speed, the Froude number.

Phase 2: OLS estimation of trend lines for each year of delivery
Table 4 gives information on the OLS regressions per year against DWT, length and Froude number for all variables. For each input variable, the number of observations (N), the R² value and the coefficients 'a' (elevation) and 'b' (slope) are given for each year. Given the large number of observations, and because ships are a rather heterogeneous group, a very good fit cannot be expected. However, a very bad fit should still not be accepted. As a group of estimations needs to be considered and some variation is to be expected, the average R² is required to be above 0.5, with no individual value below 0.4. These limits are to some extent arbitrary and based on experience in ship design. For reference, an R² of 0.8 is already quite high in design trend line analyses.
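The per-year regressions and the acceptance rule above can be sketched as follows: an ordinary least squares fit of ln(Y) = ln(a) + b·ln(X), with R² computed on the log scale, and a check that the average R² over the years exceeds 0.5 with no single year below 0.4. The function names are hypothetical; this is an illustration, not the code behind Table 4.

```python
import math


def fit_power_law(xs, ys):
    """OLS fit of ln(y) = ln(a) + b*ln(x); returns (a, b, r2) with r2 on the log scale."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    syy = sum((v - my) ** 2 for v in ly)
    b = sxy / sxx                # slope
    ln_a = my - b * mx           # elevation on the log scale
    r2 = sxy ** 2 / (sxx * syy)  # squared correlation equals R^2 for simple OLS
    return math.exp(ln_a), b, r2


def fit_acceptable(r2_by_year, mean_min=0.5, single_min=0.4):
    """Acceptance rule from the text: average R^2 above 0.5 and no year below 0.4."""
    values = list(r2_by_year.values())
    return sum(values) / len(values) > mean_min and min(values) >= single_min


a, b, r2 = fit_power_law([1.0, 2.0, 4.0, 8.0], [2.0 * x ** 0.5 for x in [1.0, 2.0, 4.0, 8.0]])
print(round(a, 6), round(b, 6), round(r2, 6))  # 2.0 0.5 1.0
print(fit_acceptable({2010: 0.6, 2011: 0.45}))  # True
print(fit_acceptable({2010: 0.9, 2011: 0.35}))  # False
```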
With these limits, Table 4 shows that the relations of EEDI with DWT and with length meet the requirements, whereas the Froude number does not have enough explanatory power. Fig. 1 gives the resulting lines for both relevant combinations; for readability, the data points are omitted. Both graphs clearly show that the later years result in lower lines. This is in line with the trend observed in Section 4.1; the additional insight the graphs provide is that this holds over the entire size range.
To give some insight into where data are present for length and DWT, the lines are drawn in steps of 5000 DWT and of five meters of length. A point is calculated and placed on the graph only for step ranges in which one or more vessels were delivered.
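The plotting rule above can be sketched as a small helper that evaluates a fitted power-law line Y = a·X^b only for step ranges containing at least one delivered vessel. Placing each point at the centre of its range is an assumption, as is the helper's name; the fitted coefficients would come from the per-year regressions.

```python
import math


def trend_points(values, a, b, step):
    """Evaluate y = a * x**b at the centre of every step range holding at least one vessel."""
    occupied = sorted({math.floor(v / step) for v in values})  # indices of non-empty ranges
    points = []
    for i in occupied:
        x = (i + 0.5) * step
        points.append((x, a * x ** b))
    return points


# Hypothetical deliveries: two vessels in the 30,000-35,000 DWT range, one at 81,000 DWT.
pts = trend_points([32_000, 33_000, 81_000], a=1.0, b=1.0, step=5_000)
print([x for x, _ in pts])  # [32500.0, 82500.0]; empty ranges get no point
```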
The second indicator is the Admiralty constant; details are provided in Table 4. The R² values here are very low: the maximum is only 0.269, with an average of about 0.1. This means that the variations in the Admiralty constant are not properly explained by any of the variables. This is unexpected and not easily explained. It is most likely caused by large variations in design speed for smaller vessels (10-18 knots), combined with the cubic power of speed in the formula. In any case, the Admiralty constant is unsuitable as a benchmark indicator for bulk ships for the current selection of years.
The third indicator is the installed power (Table 4). Again the overall picture is similar to that of the EEDI: both DWT and length are relevant inputs, whereas the Froude number is not, although in this case it comes closer to the set boundaries. The graphs of installed power against DWT and length (Fig. 2) show a trend towards lower power for the same ship size among the larger vessels (above 125,000 DWT). For a 200,000 DWT vessel, for example, the difference between the lowest and highest line is about 15%.

As mentioned earlier, a good way to reduce the required power is to reduce the design speed, as installed power has a cubic relation to speed. Because design speed may well explain the observed reduction in power, it is considered first; LDT follows afterwards. Table 4 shows the results for the speed ratio (actual speed divided by the speed expected from the average Froude number). Using the Froude number as an input makes no sense here, as it directly enters the output. Both the DWT and the length relations meet the current fitness limits, so both are retained for further investigation. Fig. 3 shows that design speed has not been lowered; it has increased over time, primarily for vessels up to 125,000 DWT. This makes the reduction in power an even greater achievement: although the design speed increase is only about 2-3%, the cubic relation implies a further increase in required power of roughly 5-10%.
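The speed ratio and the cubic power estimate above can be sketched as follows. This is a minimal illustration. It assumes Fn = V/√(gL) with the speed converted from knots to m/s and L taken as the ship length in meters, and it uses a hypothetical three-ship fleet. The paper does not specify which length definition it uses.

```python
import math

G = 9.81                      # gravitational acceleration, m/s^2
KNOT_TO_MS = 1852.0 / 3600.0  # 1 knot in m/s


def froude_number(speed_kn, length_m):
    """Fn = V / sqrt(g * L), with V converted from knots to m/s."""
    return speed_kn * KNOT_TO_MS / math.sqrt(G * length_m)


def speed_ratio(speed_kn, length_m, fn_average):
    """Actual speed divided by the speed expected from the average Froude number."""
    expected_ms = fn_average * math.sqrt(G * length_m)
    return speed_kn * KNOT_TO_MS / expected_ms


def power_increase(speed_increase):
    """Relative power increase if power scales with the cube of speed."""
    return (1.0 + speed_increase) ** 3 - 1.0


# Hypothetical fleet of (speed in knots, length in m)
fleet = [(14.0, 180.0), (14.5, 225.0), (15.0, 290.0)]
fn_avg = sum(froude_number(v, L) for v, L in fleet) / len(fleet)
ratios = [speed_ratio(v, L, fn_avg) for v, L in fleet]
print([round(power_increase(dv), 3) for dv in (0.02, 0.03)])  # [0.061, 0.093]
```

A 2-3% speed increase thus corresponds to about 6-9% more power, which the text rounds to roughly 5-10%.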
Lighter ships can carry the same amount of cargo at a higher speed with lower power. Unfortunately, the fitness values for LDT are all on the low side, and the relation between the Froude number and LDT is particularly weak. This is surprising, as the Froude number (the dimensionless speed) is essentially the input for estimating the block coefficient (Cb). This coefficient, in turn, is used to estimate the displacement, from which LDT follows. A stronger link between LDT and Fn was therefore expected. Neither DWT nor length reaches the required fitness either. However, their lines run rather parallel (see Fig. 4), which was not the case for the other low-fitness results.
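The estimation chain Fn → Cb → displacement → LDT described above can be sketched as follows. The paper does not name its Cb formula. As a stand-in, this sketch uses the Watson and Gilfillan approximation Cb = 0.70 + (1/8)·atan((23 − 100·Fn)/4), with seawater density 1.025 t/m³. The vessel dimensions are hypothetical.

```python
import math

RHO_SEAWATER = 1.025  # t/m^3 (assumed)


def block_coefficient(fn):
    """Watson and Gilfillan estimate: Cb = 0.70 + 1/8 * atan((23 - 100*Fn) / 4).

    Assumption: stand-in formula; the paper does not state which one it used.
    """
    return 0.70 + 0.125 * math.atan((23.0 - 100.0 * fn) / 4.0)


def estimated_ldt(length_m, beam_m, draught_m, fn, dwt_t):
    """LDT = displacement - DWT, with displacement = rho * Cb * L * B * T."""
    displacement_t = RHO_SEAWATER * block_coefficient(fn) * length_m * beam_m * draught_m
    return displacement_t - dwt_t


# Hypothetical vessel: same main dimensions and DWT, reported Fn slightly higher
ldt_low_fn = estimated_ldt(225.0, 32.25, 14.4, 0.150, 80000.0)
ldt_high_fn = estimated_ldt(225.0, 32.25, 14.4, 0.155, 80000.0)
print(ldt_high_fn < ldt_low_fn)  # True: higher Fn -> lower Cb -> lower estimated LDT
```

Because any formula of this kind lowers Cb as Fn rises, a higher reported design speed alone already lowers the estimated LDT, even if the hull is unchanged.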
An explanation for the weak fit between Fn and LDT could be a significant change in vessel dimensions (e.g. caused by higher design speeds): either an increase in DWT for similar-sized vessels or a change in the relation between length and width for vessels of similar DWT. In phase 1, no clear trend in slenderness over the years was found, so the trend lines are not expected to differ from each other. To be certain, however, the relation between length and width is still investigated here. Table 4 presents the estimation results, while Fig. 5 presents the estimated lines beside a graph of the averages for each size step. To explain the patterns seen, the important port and canal limits are added to the right-hand graph. All relations are highly significant, with R² almost always above 0.8. Because the relation between length and width is known to be linear, the estimation in Table 4 uses the actual values rather than their natural logarithms. The slopes differ slightly but confirm the steady state found in phase 1.
Both the length ratio and the area ratio showed similar behaviour and are therefore excluded from further investigation. This leaves only the EEDI, installed power and speed ratio to be tested for significant differences. Given the trends in Fig. 4, LDT will also be considered, though its low R² values will most likely make the results disputable to some extent.

Fig. 3 OLS estimated lines for Speed ratio based on DWT (left) and Length (right)

Phase 3: ANCOVA and SNK investigation of significant differences
Of the original ten indicators, only four remain to be tested for significant differences, and thus significant improvements, over the period of seven years. First, an analysis of covariance (ANCOVA) is performed to test whether all slopes and elevations can be considered equal. If the ANCOVA null hypothesis is rejected, the SNK (Student-Newman-Keuls) pairwise comparison is used to determine which lines are equal and which are not. This results in groups of years that can be considered equal in performance for that particular indicator. Finally, an ANCOVA test is performed once more for each group, to confirm that the results of the slightly less strict SNK test are valid. For details on these tests, please refer to Pruyn (2013).
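The first ANCOVA step can be illustrated with the standard F-tests for comparing regression lines. Equal slopes are tested by comparing separate lines per year against lines sharing one slope. Then, given a common slope, equal elevations are tested against a single line for all years. This is a minimal sketch, not necessarily the exact procedure of Pruyn (2013); the function names and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats


def _rss(design, y):
    """Residual sum of squares of an OLS fit of y on the design matrix."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return float(residuals @ residuals)


def ancova_lines(x, y, groups):
    """F-tests for equal slopes and, given a common slope, equal elevations.

    groups holds one label per observation (e.g. the delivery year).
    Returns (p_equal_slopes, p_equal_elevations).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    _, idx = np.unique(np.asarray(groups), return_inverse=True)
    idx = idx.ravel()
    k, n = int(idx.max()) + 1, len(y)
    dummies = np.eye(k)[idx]  # one intercept column per group

    rss_separate = _rss(np.column_stack([dummies, dummies * x[:, None]]), y)
    rss_common_slope = _rss(np.column_stack([dummies, x]), y)
    rss_single_line = _rss(np.column_stack([np.ones(n), x]), y)

    df_sep = n - 2 * k
    f_slopes = ((rss_common_slope - rss_separate) / (k - 1)) / (rss_separate / df_sep)
    df_cs = n - k - 1
    f_elev = ((rss_single_line - rss_common_slope) / (k - 1)) / (rss_common_slope / df_cs)
    return (float(stats.f.sf(f_slopes, k - 1, df_sep)),
            float(stats.f.sf(f_elev, k - 1, df_cs)))


# Synthetic example: three "years" whose slopes clearly differ
rng = np.random.default_rng(0)
x = np.tile(np.linspace(1.0, 10.0, 30), 3)
years = np.repeat([2010, 2011, 2012], 30)
y = np.repeat([1.0, 2.0, 3.0], 30) * x + rng.normal(0.0, 0.1, x.size)
p_slopes, _ = ancova_lines(x, y, years)
print(p_slopes < 0.05)  # True: equal slopes rejected
```

The elevation test is only meaningful once equal slopes have not been rejected. When they are rejected, as for most indicators in Table 5, the analysis moves on to pairwise comparisons.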
Table 5 shows that the ANCOVA hypothesis of equal slopes is rejected for all variables except the LDT ratio. For the LDT ratio, the larger scatter (lower R²) meant that the ANCOVA test could not reject equal slopes. In all other cases, the period 2014-2016 shows a clear break from the rest. 2013 appears to be a transition year: for the smaller vessels it is close to the improvements of 2014-2016, whereas the larger vessels are closer to the state of 2010-2012. This split is not identical for all variables; some variables show more groups than the two mentioned. These differences in the split may be caused by the choice of SNK rather than Tukey's test: SNK is less conservative, so a false positive might result in extra groups being formed. However, as the break around 2013 is consistent for all cases, Tukey's test might reduce the number of groups but is not expected to alter this aspect of the conclusions. The impact was therefore not investigated further.

Fig. 5 Estimated lines against Length (left) and the pattern of averages for Width against Length (right)
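Why SNK forms more groups than Tukey's test can be seen from the studentized range distribution. Tukey's test judges every pair against the range of all k means. SNK judges a pair only against the number of ranked means its comparison spans. The sketch below uses `scipy.stats.studentized_range` (SciPy 1.7 or later). The q value and error degrees of freedom are hypothetical.

```python
from scipy.stats import studentized_range


def tukey_p(q, n_means, df):
    """Tukey HSD: every pair is judged against the range of all n_means means."""
    return float(studentized_range.sf(q, n_means, df))


def snk_p(q, span, df):
    """SNK: a pair is judged against the range of only the `span` ranked means it covers."""
    return float(studentized_range.sf(q, span, df))


# Hypothetical: 7 delivery years, two years adjacent in the ranking, q = 3.5, 100 error df
q, years, df = 3.5, 7, 100
print(snk_p(q, 2, df) < tukey_p(q, years, df))  # True: SNK rejects more readily
```

For the same test statistic, SNK can declare adjacent years different while Tukey's test cannot. This matches the text's remark that SNK may split off extra groups, while the widest comparisons (spanning all years) are judged identically by both tests.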

Benchmark conclusions
Consulting the figures of the trend lines from phase 2, the following conclusions can be drawn. It is shown that for both situations the years 2014-2016 delivered ships with a significantly lower EEDI value and higher propulsion efficiency. This coincides with the introduction of the EEDI in 2013, as ships designed in 2013 are commonly not delivered before 2014. Many researchers (Devanney 2010a, 2010b; Dulebenets 2016; Papanikolaou 2014; Randers 2012; Stevens et al. 2015) predicted that the EEDI would result in a design speed reduction; however, the data seem to indicate a design speed increase. A closer look at the data reveals the following. The 2014-2016 vessels do have a higher design speed at the smaller sizes, but at the largest sizes, above 125,000 DWT, vessels built in 2011-2013 have a higher speed. The earlier conclusion about increased design speed therefore holds only for vessels below 125,000 DWT. It could very well be that these smaller vessels had a margin in their installed propulsion. Rather than installing less power, the reported design speed was increased, which also resulted in a significant EEDI reduction, at least on paper. This does not mean that speed reduction will not be a major part of the solution; it may only be implemented later. The decrease in lightweight was not significant. Moreover, if the increase in design speed exists only on paper, this apparent reduction is suspect too, because the block coefficient was estimated from the design speed. Confirming the suspicion of higher reported design speeds would require comparing the actual speeds of ships using their AIS data. This is left for further research.

Conclusions
The implications of this paper are twofold. Currently, the main considerations when buying a vessel are DWT, price and perhaps EEDI, due to regulations.
However, considering the goals of the IMO (2018), this is not sufficient. These measures neither focus on energy-efficient vessels nor promote the implementation of CO₂-reducing measures. As a policy recommendation, it would be beneficial for shipping to focus on true improvements of the fleet as well as of individual vessels. The indicators identified in this paper can be of use, especially those concerning the weight of the vessel and the improvements in the speed-power relations. A stronger focus on these values for individual vessels will lead to more efficient ships, while still leaving the design speed open to be selected based on route characteristics. The second implication of this paper is the unmasking of a trend to report higher design speeds for vessels up to 125,000 DWT. To the author's knowledge, this was not foreseen in any of the papers dealing with the EEDI; from a practical point of view, however, it makes sense. A design always includes a margin for error, and in a time of crisis yards would be willing to reduce this margin, especially as the new regulations lend some legitimacy to its effect: a lower EEDI with no loss of design speed or performance and, most importantly, at no extra cost.
The identified benchmark indicators can also be used by yards to identify differences between yard designs, or by ship owners to check individual vessels by comparing their performance with others or with a fleet average. In both situations the user would have access to more detailed data and may be able to use all ten original indicators in the phased test. This more accurate data would improve the relevance of the test, as it removes the need for calculated values.
Finally, the first recommendation for future research is to investigate whether further improvements can be identified with new indicators representing some form of efficiency or efficiency contribution. It would be of great help if propulsion types, hull performance and other contributing factors could easily be identified for large groups of vessels, allowing improvements to be pinpointed further. Second, from a policy point of view, the availability of more detailed data may help policy-making as well, allowing a better focus on the true target of the policy, which should be CO₂ emissions reduction. Also, yards could be challenged to improve their in-house designs, as less efficient aspects of their designs can be indicated using these indicators.
Additional file 1. Benchmark Data and Graphs.