Thorarinsdottir, Thordis L.; Scheuerer, Michael; Heinz, Christopher
Journal of Computational and Graphical Statistics, vol. 25, pp. 105–122, 2016
Any decision-making process that relies on a probabilistic forecast of future events necessarily requires a calibrated forecast. This paper proposes new methods for empirically assessing forecast calibration in a multivariate setting where the probabilistic forecast is given by an ensemble of equally probable forecast scenarios. Multivariate properties are mapped to a single dimension through a pre-rank function, and the calibration is subsequently assessed visually through a histogram of the ranks of the observation’s pre-ranks. Average ranking assigns a pre-rank based on the average univariate rank, while band depth ranking employs the concept of functional band depth, where the centrality of the observation within the forecast ensemble is assessed. Several simulation examples and a case study of temperature forecast trajectories at Berlin Tegel Airport in Germany demonstrate that both multivariate ranking methods can successfully detect various sources of miscalibration and scale efficiently to high-dimensional settings. Supplemental material in the form of computer code is available online.
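The two pre-rank ideas described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' supplemental code: the function names (`univariate_ranks`, `average_prerank`, `band_depth_prerank`, `observation_rank`) are hypothetical, the band depth variant is a simplified form that counts, per dimension, the pairs of scenarios bracketing a point, and it assumes no ties within a dimension. Ties among pre-ranks are resolved at random.

```python
import numpy as np


def univariate_ranks(pooled):
    """Rank each row within each column (1 = smallest value).

    `pooled` has shape (n, d): n scenarios (observation plus ensemble
    members) in d dimensions. Ties are broken by order of appearance.
    """
    order = pooled.argsort(axis=0, kind="stable")
    return order.argsort(axis=0, kind="stable") + 1


def average_prerank(pooled):
    """Pre-rank of each scenario: mean of its univariate ranks."""
    return univariate_ranks(pooled).mean(axis=1)


def band_depth_prerank(pooled):
    """Simplified band depth pre-rank (assumption: no ties per dimension).

    For each dimension, (r - 1) * (n - r) counts the pairs of scenarios
    that bracket the point; the mean over dimensions measures centrality.
    """
    n = pooled.shape[0]
    r = univariate_ranks(pooled)
    return ((r - 1) * (n - r)).mean(axis=1)


def observation_rank(obs, ensemble, prerank, rng):
    """Rank of the observation's pre-rank among all pooled pre-ranks."""
    pooled = np.vstack([obs, ensemble])  # row 0 is the observation
    pre = prerank(pooled)
    below = np.sum(pre < pre[0])
    ties = np.sum(pre == pre[0])  # includes the observation itself
    # Resolve ties at random so a calibrated forecast gives uniform ranks.
    return int(below + rng.integers(1, ties + 1))
```

Collecting `observation_rank` over many forecast cases and plotting the counts for ranks 1 to m + 1 (m being the ensemble size) gives the rank histogram. Under this sketch, a low band depth rank means the observation is outlying relative to the ensemble and a high rank means it is central, whereas average ranks keep the low/high orientation of the univariate ranks.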