WHY Verify?

Like most endeavours in life, the act of verifying forecasts must have a purpose. Allan Murphy, who made his career in the mathematics and science of weather forecast verification, put it this way: "Verification activities are useful only if they lead to some decision regarding the product being verified." This means that someone must actually look at the results of the verification and make a decision based on them. The decision may be to "do nothing for now" (the product is good enough as it stands until more verification information is available), but that is still a decision. It also implies that there must be an interested "user" of verification results who will make the decision. The user may or may not be the same person who carries out the verification.

There are many different kinds of users of verification, and therefore many different purposes. It is, or should be, the user who defines the purpose of the verification. That purpose should be stated clearly in advance so that appropriate verification methods can be chosen. The purposes of verification can be classified into two general types:

Administrative verification: To support decisions about the administration or budgeting of weather forecast services, for example, to justify acquiring a new computer. Administrative verification usually means calculating verification statistics over large data samples.

Scientific verification: To direct research into new or improved forecast products. This may involve large or small data samples depending on the exact purpose, and usually involves more exploratory statistical analysis of verification datasets.
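As a rough illustration of the administrative style of verification, the sketch below reduces a long record of paired temperature forecasts and observations to one mean absolute error (MAE) per year. It is a minimal sketch, not a prescribed method: the function name `annual_mae`, the `(year, forecast, observed)` record layout, and the sample values are hypothetical, and MAE is only one of many possible summary scores.

```python
from collections import defaultdict


def annual_mae(records):
    """Return {year: mean absolute error} for (year, forecast, observed) tuples.

    Hypothetical helper: assumes forecasts and observations are already paired.
    """
    sums = defaultdict(float)   # running sum of absolute errors per year
    counts = defaultdict(int)   # number of forecast/observation pairs per year
    for year, forecast, observed in records:
        sums[year] += abs(forecast - observed)
        counts[year] += 1
    # One summary value per year, in chronological order.
    return {year: sums[year] / counts[year] for year in sorted(sums)}


# Hypothetical daily maximum temperature forecasts and observations (degrees C).
records = [
    (2020, 21.0, 23.0),
    (2020, 18.5, 17.5),
    (2021, 22.0, 21.0),
    (2021, 19.0, 19.5),
]
print(annual_mae(records))  # {2020: 1.5, 2021: 0.75}
```

A single number per year is convenient for tracking long-term trends, but the heavy averaging hides when and why the forecasts went wrong, which is why scientific verification usually works with less aggregated data.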

Below are some examples of verification tasks. Whether you are potentially a user of verification results, someone who is interested in doing your own verification, or both, try to put yourself in the position of someone who is asked to do these verification tasks, and think about what you would want to know about the reasons for doing them.

Are the following verification tasks likely intended for administrative or scientific purposes, or both? Please choose the best answer.

Not likely: this is only one case, which is too little data on which to base a decision.

Yes. The forecaster may be checking how well he (or someone else) performed on yesterday’s forecast, as input to today’s forecast.

Not likely. Try again.

Yes. Here the user is trying to track the long-term trend in temperature forecast errors, for example to convince his superiors that improvements are being made.

Not likely. The verification is summarized into one value per year, which is too much averaging to be useful in determining how to improve the forecast further.

Not likely. Try again.

Yes, possibly. The user may want to know whether the forecasters can still "beat" the model to decide staffing levels in weather offices. For precipitation, one would be wise to use more than 3 months of data.

Yes, possibly. The user may wish to identify those situations or periods where forecasters have the best chance of improving on the model forecasts.

Yes. There could well be both administrative and scientific reasons for comparing forecasts from two sources.

Yes, definitely. This is a request that has been made by high-level management in more than one national meteorological service, usually for feedback to the general public or political leaders.

No. It is difficult to think of any defensible scientific reason for verification scores that try to summarize all aspects of accuracy in one number.

No. Try again.

Yes, possibly. This could be at the request of either weather service managers or boating groups to determine the reliability of forecasts of extreme conditions.

Yes, possibly. Stratification of forecast datasets into extreme and non-extreme values using a threshold is a way of determining the quality of a product for important situations to help direct research efforts.

Yes. There could be administrative or scientific reasons for defining threshold values of the predicted variable for the purpose of verification. The threshold should be specified by the user. Check the other answers for examples.
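Threshold-based stratification can be sketched in a few lines. In this hypothetical example (the function `stratified_mae`, the wind-speed values, and the 25-knot threshold are illustrative assumptions, not values from any real service), paired cases are split according to whether the observed value reaches the user-specified threshold, and the mean absolute error (MAE) is computed separately for the "extreme" and "non-extreme" groups.

```python
def stratified_mae(forecasts, observations, threshold):
    """Split paired cases by whether the observation reaches the threshold,
    and return {stratum: (count, mean absolute error)}.

    Hypothetical helper; the mean is None if a stratum has no cases.
    """
    strata = {"extreme": [], "non_extreme": []}
    for f, o in zip(forecasts, observations):
        # Stratify on the observed value (one possible choice).
        key = "extreme" if o >= threshold else "non_extreme"
        strata[key].append(abs(f - o))
    return {
        key: (len(errors), sum(errors) / len(errors) if errors else None)
        for key, errors in strata.items()
    }


# Hypothetical wind speed forecasts and observations (knots);
# the threshold is chosen by the user.
forecasts = [10.0, 15.0, 28.0, 30.0, 12.0]
observations = [12.0, 14.0, 35.0, 26.0, 12.0]
print(stratified_mae(forecasts, observations, threshold=25.0))
# {'extreme': (2, 5.5), 'non_extreme': (3, 1.0)}
```

Splitting on the observation asks how well the extreme events that actually occurred were forecast; splitting on the forecast instead asks how trustworthy forecasts of extreme conditions are. Which split is appropriate depends on the user's purpose. The extreme stratum is typically small, so its score carries more sampling uncertainty.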