
In-Language Verbatim Coding – A Question of Sport


Today’s blog focuses on an area where GlobaLexicon has developed substantial expertise: the coding of open-ended data. As we will see, even when no translation is involved, a lack of cultural awareness can still cause problems. We will highlight some of these problems and the strategies for avoiding them.

Imagine that a Brit, an Aussie and an American are all asked to take part in an online poll about sports. They all speak English, so they can all do the same survey, right?

Or so you might think.

But let’s say there is a question that asks, “Are you a fan of football?” At first glance, the question seems straightforward. What possible misunderstanding could there be?

For our three respondents, the answer is most likely an easy one. Each will happily answer, “Yes, I’m a football fan”. Except, of course, they will all be thinking of different sports. For the Brit, that sport is most likely to be what the Americans and Australians call “soccer”. For the American, it’s going to be what the Brits and Aussies call “American football”. And for the Aussie, it’s very likely that he or she will be thinking of what the Brits and Americans know as “Australian Rules football”.

If the company conducting the research actually wanted to capture data about interest in soccer, otherwise known as association football, then two of these three responses would be measuring interest in a different sport altogether.

There are other sports, too, where the same word calls to mind different things depending on cultural background. For example, say “hockey” to a Brit, and the first thing that comes to mind will be what most of the rest of the world calls “field hockey”. In the UK, hockey played on grass is ingrained, and we don’t generally have a culture of winter or ice-based sports, so here we qualify the version that is less common to us: “ice hockey”. In the US and Canada especially, the situation is reversed. There, “hockey” calls to mind the icy kind, and “field hockey” is the term for the turf-based version.

This kind of variation in response data can be mitigated by careful wording of the survey or by localization. For example, if the intent is to capture data about interest in “ice hockey”, then the question can be worded to specify this, ensuring there will be no misunderstanding regardless of where the survey is fielded. Or, to return to the “football” example, if the respondent is presented with a list of options, then all specific variants should be given: Australian Rules Football, Association Football/Soccer and American Football/Gridiron.
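As a rough illustration of localizing a question per market, the sketch below swaps in each market's own word for the sport. The table `LOCAL_TERMS`, the market codes and the function `localize_question` are all hypothetical names made up for this example, and the terms listed cover only the cases discussed in this post.

```python
# Hypothetical term table: unambiguous concept -> local name per market.
# Only the examples from this post are listed; a real table would be
# built with advice from people who know each market.
LOCAL_TERMS = {
    "association football": {"GB": "football", "US": "soccer", "AU": "soccer"},
    "American football": {"GB": "American football", "US": "football",
                          "AU": "American football"},
    "ice hockey": {"GB": "ice hockey", "US": "hockey", "CA": "hockey"},
}


def localize_question(concept: str, market: str) -> str:
    """Return the fan question using the market's own term for the sport.

    Falls back to the unambiguous concept name when no local term is listed.
    """
    term = LOCAL_TERMS.get(concept, {}).get(market, concept)
    return f"Are you a fan of {term}?"


print(localize_question("association football", "GB"))
print(localize_question("association football", "US"))
```

Falling back to the fully specified name for unlisted markets is a deliberate choice here: an over-specific question is still answerable, whereas an ambiguous one is not.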

Data quality can also be affected by the tendency of respondents to be biased by their own cultural and personal preferences when answering an open-ended question. For example, the question “which sports competitions do you follow?” might elicit answers such as “the World Cup”, “the US Open”, or “the Premier League”. In the respondent’s mind, they know exactly what they mean by their answer, but when taken in isolation, how can you be sure which sport they are referring to? By “US Open”, do they mean the US Open golf tournament or the US Open tennis championship? By “Premier League”, are they referring to an association football or a cricket competition? Or to something else entirely?

Often, this comes down to a judgement based on cultural awareness of the respondent’s market. If the respondent is in India, it’s likely they are referring to the Indian Premier League (a cricket competition), rather than the English soccer competition. The “US Open” dilemma is trickier to solve. Perhaps there are some clues in the respondent’s other answers – for example, if they previously said that they enjoy tennis but didn’t mention golf, or if they also listed several other tennis championships in their answer to this question.
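The judgement described above could be sketched as a simple rule chain. This is only an illustration, not GlobaLexicon's actual coding process: the table `AMBIGUOUS_ANSWERS`, the market codes and the function `code_competition` are hypothetical, and checking the respondent's own clues before the market default is an assumption made for the example.

```python
# Hypothetical lookup of answers that could refer to more than one competition.
AMBIGUOUS_ANSWERS = {
    "premier league": {
        "label": "Premier League",
        "by_sport": {
            "cricket": "Indian Premier League (cricket)",
            "football": "Premier League (association football)",
        },
        "by_market": {
            "IN": "Indian Premier League (cricket)",
            "GB": "Premier League (association football)",
        },
    },
    "us open": {
        "label": "US Open",
        "by_sport": {"tennis": "US Open (tennis)", "golf": "US Open (golf)"},
        "by_market": {},  # no market gives a safe default for this one
    },
}


def code_competition(answer, market, other_sports=frozenset()):
    """Pick a code for an ambiguous competition name.

    other_sports holds sports the respondent mentioned elsewhere in the survey.
    """
    key = answer.strip().lower()
    if key.startswith("the "):
        key = key[4:]
    entry = AMBIGUOUS_ANSWERS.get(key)
    if entry is None:
        return "Other"  # not in this small table
    # 1. Clues from the respondent's other answers, if they point one way only.
    matches = {label for sport, label in entry["by_sport"].items()
               if sport in other_sports}
    if len(matches) == 1:
        return matches.pop()
    # 2. Otherwise, the most likely meaning in the respondent's market.
    if market in entry["by_market"]:
        return entry["by_market"][market]
    # 3. No safe inference: keep the answer, flagged as unspecified.
    return f'{entry["label"]} (unspecified)'


print(code_competition("the Premier League", "IN"))
print(code_competition("US Open", "US", {"tennis"}))
print(code_competition("US Open", "US"))
```

In practice a human coder weighs these clues with far more nuance; the point of the sketch is only that the fallback should be an explicit "unspecified" code rather than a guess.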

Employing a coding team that has specialist subject area knowledge and cultural awareness is crucial. If the coders understand how respondents from different countries might give the same response but mean different things, or vice versa, they can properly capture and code data that might otherwise be coded as ‘Other’ at best, or miscoded entirely at worst.

This is important not only at the coding stage but also when setting up the code frames. Just as the survey question needs to be specific, so does the code frame. For the coding to be accurate, and thus for the data to have the most value, the code frame must have sufficient granularity: for example, separate codes for the different types of football, a distinction between rugby league and rugby union, or between field hockey and ice hockey. There should also be codes for the cases where the respondent doesn’t specify and where it’s not possible to make an inference, e.g. “rugby (unspecified)”.
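A code frame with this kind of granularity might be laid out as in the sketch below. The structure, the labels and the helper names `build_codes` and `assign_code` are assumptions made for illustration; real code frames usually carry numeric codes and many more entries.

```python
# Hypothetical code frame: each sport family lists its specific variants.
CODE_FRAME = {
    "football": ["association/soccer", "American", "Australian Rules"],
    "rugby": ["league", "union"],
    "hockey": ["field", "ice"],
}


def build_codes(frame=CODE_FRAME):
    """List every code label, adding an '(unspecified)' code per family."""
    codes = []
    for family, variants in frame.items():
        codes.extend(f"{family} ({variant})" for variant in variants)
        codes.append(f"{family} (unspecified)")
    codes.append("Other")
    return codes


def assign_code(family, variant=None, frame=CODE_FRAME):
    """Return the most specific code the response supports."""
    if family not in frame:
        return "Other"
    if variant in frame[family]:
        return f"{family} ({variant})"
    # The respondent named the family but not which variant they meant.
    return f"{family} (unspecified)"


print(assign_code("rugby", "union"))
print(assign_code("rugby"))
print(build_codes())
```

Keeping the "(unspecified)" codes separate from "Other" means an answer such as plain "rugby" still counts towards rugby overall, without pretending to know whether league or union was meant.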

These examples are just the tip of the iceberg when it comes to the huge variation and complexity in the global sporting arena, but they still demonstrate some key factors at play:

  • Even if the survey is not being translated, check for ambiguity in your master survey. Specify or localize for each market to ensure that your intended meaning is correctly conveyed in all markets.
  • Employ specialist knowledge to advise on the local variations.
  • Ensure your coding team understands the subject and market to maximize the value of your results – even if your survey was perfectly worded, not all respondents will understand questions the same way or give perfectly clear answers.
  • Take care that the code frame has sufficient granularity for the data you want to capture.

To find out more about GlobaLexicon’s coding expertise, visit our Coding page or contact us.
