Corpora as Spell Check tools?

Are there any words you tend to misspell whenever you use them in writing? Which spellcheckers do you use, and which ones (if any) would you recommend to your students? And finally, do you think you can use a corpus as a spellchecking tool?

I’ll show you in a minute that it’s not really a good idea. Corpora can come in handy in many situations but not as a tool to check spelling, I think.

I personally have several pet hates, i.e. words that keep my spellcheckers busy. So, the other day, I opened the enTenTen08 corpus (the largest English corpus in Sketch Engine) to see whether these are among the commonly misspelled words. Apparently, they are!

  • recieve: 7,047 hits vs. receive: 1,107,499 hits
  • managable: 236 hits vs. manageable: 8,287 hits
  • noticable: 1,009 hits vs. noticeable: 15,330 hits
  • prestigeous: 97 hits vs. prestigious: 25,045 hits
  • liason: 516 hits vs. liaison: 24,430 hits
  • medeival: 43 hits vs. medieval: 52,271 hits
  • ocassionally: 183 hits vs. occasionally: 1,035 hits
  • privilige: 78 hits vs. privilege: 85, 0890 hits
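
Raw hit counts are hard to compare at a glance, so here is a minimal Python sketch that turns six of the pairs above into a percentage: the misspelled form's share of all hits for that word. The function name `misspelling_share` and the choice of this particular measure are my own assumptions for illustration, not something Sketch Engine provides.

```python
# Hit counts copied from the list above as (misspelled, correct).
# Only six of the pairs are used here; the names below are illustrative.
counts = {
    "recieve": (7_047, 1_107_499),
    "managable": (236, 8_287),
    "noticable": (1_009, 15_330),
    "prestigeous": (97, 25_045),
    "liason": (516, 24_430),
    "medeival": (43, 52_271),
}


def misspelling_share(wrong_hits, right_hits):
    """Return the misspelled form's share of all hits, as a percentage."""
    return 100 * wrong_hits / (wrong_hits + right_hits)


# Rank the misspellings from most to least common relative to the correct form.
ranked = sorted(counts, key=lambda w: misspelling_share(*counts[w]), reverse=True)

for word in ranked:
    print(f"{word:<12} {misspelling_share(*counts[word]):.2f}%")
```

Seen this way, recieve has by far the most hits in absolute terms, yet noticable accounts for a larger share of its word's occurrences, which could be a talking point in class.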

When you use Google, for example, and type in the word noticable, you’ll immediately be notified of your error and offered the correct version.


This doesn’t happen when you use a corpus, though. When you type something in, you’ll usually get some results (unless it’s a totally nonsensical word) because any corpus is just a collection of language people use.

Now, I asked myself how this could be turned into an advantage in class. Certainly, you can boost your students’ confidence by showing them the incorrect example sentences. You can show them they are not alone in this; people all around the world make mistakes, simply because some words are tricky.

As a second step, you can get them to use or create some mnemonic strategies to remember the correct version of a tricky word: Why receive and not recieve? “I before E, except after C” is a mnemonic rule of thumb for English spelling. If one is unsure whether a word is spelled with the sequence ei or ie, the rhyme suggests that the correct order is ie unless the preceding letter is c, in which case it is ei.

You can make unusual exercises and tests too. Instead of asking students to choose the *correct* alternative, ask them to choose the more frequent one; they can do so by matching the number of hits with the appropriate spelling version.

  1. recieve                                  a) 7,047 hits
  2. receive                                  b) 1,107,499 hits

answer: 1a, 2b
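
If you want to produce such matching exercises for a longer word list, the following Python sketch shows one possible way. The function `matching_exercise` and its parameters are hypothetical names of my own; it assumes every hit count in the list is distinct and that there are at most 26 items.

```python
import random


def matching_exercise(pairs, seed=None):
    """Build a 'match the spelling to its hit count' exercise.

    pairs: list of (spelling, hits) tuples with distinct hit counts.
    Returns (question_lines, answer_key).
    """
    rng = random.Random(seed)
    options = [hits for _, hits in pairs]
    rng.shuffle(options)  # shuffle so that position does not give the answer away

    letters = "abcdefghijklmnopqrstuvwxyz"
    lines = []
    key = []
    for number, ((spelling, hits), option) in enumerate(zip(pairs, options), start=1):
        letter = letters[number - 1]
        lines.append(f"{number}. {spelling:<12} {letter}) {option:,} hits")
        # The answer is the letter whose option equals this spelling's count.
        key.append(f"{number}{letters[options.index(hits)]}")
    return lines, key


lines, key = matching_exercise([("recieve", 7_047), ("receive", 1_107_499)], seed=1)
print("\n".join(lines))
print("answer:", ", ".join(key))
```

Passing a fixed `seed` keeps the shuffled order the same between runs, which is handy when the worksheet and the answer key are printed separately.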

Alternatively, you can ask questions like: Why does recieve only get 7,047 hits in the corpus? Answer: because it’s spelled incorrectly. The correct spelling is ……

This, I believe, draws attention to the fact that 1) corpora exist at all, and 2) there are words which many users of English find tricky (and you can discuss why).

In conclusion, you can get a lot of mileage out of the errors your students make and there are many ways to do so.

Any other ideas for exploiting corpora in relation to spelling?

About Hana Tichá

I'm an EFL teacher based in the Czech Republic. I've been teaching English to learners of all ages for more than 20 years. I love metaphors and inspiring discussions concerning teaching, learning and linguistics.

6 Responses to Corpora as Spell Check tools?

  1. eflnotes says:

    hi Hana
    not focused on spelling for a while; last example in class was spelling of rhythm – something i still have difficulties with myself from time to time, i sometimes put in an extra y; though in this case the issue was one of pronunciation of rhythm for the student.

    looked at corpus resources for this previously but it seems that particular resource is down for maintenance; you may still get some useful info from my post – Corpora for spelling



    • Hana Tichá says:

      Thanks, Mura. I’ll definitely have a look at the post. Rhythm is a tricky word for students, indeed – its spelling as well as its pronunciation.


  2. M. Makino says:

    Maybe it was on this blog that I said this once before, but all language is communication – the (wonderful) idea of using corpora for spellchecking provides more evidence that mistakes are also messages, just messages with some unintended meanings. I like error correcting strategies that avoid reducing errors to red marks from the teacher’s pen.


  3. James Thomas says:

    Hi Hana.

    Nice ideas. With students, I would, however, use SKELL for this, as it gives the number of hits per million. e.g.,
    receive 339.88 hits per million
    recieve 0.87 hits per million

    I certainly agree that it’s useful for learners to see errors in use. If they do have access to larger corpora, they might even see the Text Types in which errors are used. It always amuses me to see spelling mistakes in “spoken corpora”, as it is obviously the transcriber’s mistake, not the speaker’s!


    • Hana Tichá says:

      Hi, James! I’m happy to see you commenting on my blog. Thanks for your advice regarding the numbers – this is something I still struggle with. Anyway, I like what you say about ‘errors in use’ and the possibility of looking at Text Types. And yes, I also looked at some of the mistakes in spoken corpora. Well, transcribers are only human, aren’t they? 🙂

