Behind the scenes of a StatsCan crop report

Statistics Canada’s crop reports are often subject to criticism, with analysts, traders and farmers questioning or downplaying the accuracy and relevance of the agency’s estimates. Participating in a StatsCan survey when the agency calls or emails you is technically mandatory under federal law, but the running joke is that farmers don’t provide accurate information to surveyors over the phone.

At the same time, whether you trust StatsCan estimates or not, the agency provides the largest independent national dataset on crop production. In many cases, it’s the best, or only, public information out there.

The skepticism over StatsCan’s relevance was on full display on Friday when the agency published its first acreage estimates for 2018. The report was full of surprises and numbers outside the range of what analysts were expecting.

The planting intentions survey also marked the first time StatsCan has offered respondents the option of filing their answers electronically.

So how does StatsCan come up with its estimates? What’s the agency’s definition of a “farmer”? How does it handle inaccurate or poor quality responses? And if the survey was done in March, why were the acreage estimates only released four weeks later?

Yves Gilbert, head of the field crop reporting unit at Statistics Canada in Ottawa, describes the agency’s collection and validation process, responding to these questions and more in the interview below. Excerpts of our conversation follow:


Highlights:

KH: The StatsCan seeding intentions report published on Friday was based on a survey of around 11,600 farmers. What is the definition of a “farmer”? How does the agency decide who is surveyed and who is qualified to respond?

YG: We have the Census of Agriculture, which is conducted every five years, and we use all of the farms that reported growing field crops at that Census. We make a subset of those farms that are in the field crop business, and we extract a sample of all of these farms… We select a subset of these farms and then we ask to talk to the person in charge of the farm itself. We also make sure we rotate all these farms so we don’t ask them the same questions too many times. We have a maximum of two calls per year, for instance.
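
To make the sampling and rotation steps Gilbert describes a little more concrete, here is a minimal Python sketch of how a sample might be drawn from a census frame while skipping farms already contacted twice in a year. The field names (grows_field_crops, farm_id) and the simple random draw are assumptions for illustration only; StatsCan’s actual design is stratified and considerably more sophisticated.

```python
import random

def draw_sample(census_farms, sample_size, contact_log, max_calls_per_year=2, seed=None):
    """Select field-crop farms from a census frame, skipping farms that have
    already been contacted the maximum number of times this year.
    Illustrative sketch only; not StatsCan's actual sampling design."""
    # Keep only farms that reported growing field crops at the Census
    frame = [f for f in census_farms if f["grows_field_crops"]]
    # Rotation: drop farms already contacted the maximum number of times
    eligible = [f for f in frame if contact_log.get(f["farm_id"], 0) < max_calls_per_year]
    rng = random.Random(seed)
    sample = rng.sample(eligible, min(sample_size, len(eligible)))
    # Record the contact so future draws rotate to other farms
    for f in sample:
        contact_log[f["farm_id"]] = contact_log.get(f["farm_id"], 0) + 1
    return sample
```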

The survey period for the seeding intentions report has traditionally been a two-week period in late March. This year it was extended to four weeks, from March 2 to March 29. Can you explain why the survey period was extended?

What is different this year is we started for the first time to conduct our surveys using an electronic questionnaire… So we sent this electronic questionnaire invitation earlier in the month, and if they had not responded by mid-March, which typically coincides with the time we used to start our survey, we start doing follow-ups and call them at that point to get all the answers we need to conduct the survey properly… The first two weeks were added to let respondents provide their answers at their convenience rather than be caught all of a sudden by our calls in the last two weeks of March.

How does StatsCan account for dishonest or poor quality survey responses?

When we have the evidence, when it’s obvious to us that we have a comment like this, we do not consider their answers and we make sure the imputation from neighbours is being tagged to this particular record. In other words, we take answers that are averages from farms in the same neighbourhood. We want to make sure it’s representative of the area they grow their crops in…
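
As an illustration of the neighbour-based imputation Gilbert describes, here is a minimal Python sketch that replaces a rejected answer with the average reported by other farms in the same area. The record fields and the definition of “neighbourhood” are assumptions for illustration, not StatsCan’s actual edit and imputation system.

```python
from statistics import mean

def impute_from_neighbours(record, all_records, field="seeded_acres_canola"):
    """Return the average value of `field` among other responding farms in the
    same neighbourhood as `record`, or None if no neighbours responded.
    Hypothetical field names; sketch of the neighbour-averaging idea only."""
    neighbours = [
        r[field] for r in all_records
        if r["neighbourhood"] == record["neighbourhood"]
        and r.get(field) is not None
        and r["farm_id"] != record["farm_id"]
    ]
    return mean(neighbours) if neighbours else None
```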

Why does it take StatsCan so long to release survey results?

It may sound like a long time, but we have to keep in mind that when we receive this data, we then have to validate all of it. We have to analyze all of the records to make sure the answers provided are not out of reality…

Of course there are many respondents that we do not have the ability to reach in the end. For those people, we have to impute, we have to fill in their record, but with answers that were provided by their neighbours that would have a similar size of farm…
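
For a sense of what a “not out of reality” check might look like, here is a hypothetical Python sketch that flags records whose reported seeded area exceeds the cropland the farm reported at the Census of Agriculture. The crop list, field names and the rule itself are illustrative assumptions; StatsCan does not publish its edit rules in this form. Imputation for unreachable respondents would then follow the same neighbour-averaging pattern sketched earlier, restricted to farms of similar size.

```python
# Hypothetical crop fields on a survey record; names are illustrative only.
CROPS = ("wheat_acres", "canola_acres", "barley_acres", "oats_acres")

def flag_implausible(record, census_cropland_acres):
    """Return True if the reported total seeded area is larger than the
    cropland area on file for the farm, suggesting the record needs review
    or imputation. Sketch of a plausibility edit, not StatsCan's actual rule."""
    total_seeded = sum(record.get(crop, 0) for crop in CROPS)
    return total_seeded > census_cropland_acres
```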

Is there opportunity to further automate this process (and thereby shorten the lag time)?

Many of the various tasks, like validation, confidentiality and imputation, are mostly automated, but we still need the human expertise to go back into those steps and make sure what has been done automatically also makes sense.

Read more about StatsCan’s field crop data sources and methodology here.
