Word Prediction and Spell Correction Using the Python Library Spello

Vinoth Saravanan
5 min read · Feb 11, 2023


We have all searched for something on Google without knowing the exact spelling of a word. We type something close to it, and Google suggests a correction along the lines of “this is what you are trying to search for, right?”. For example, most of us do not use medical terms in everyday life unless we work in healthcare or a healthcare-related domain.

When I was developing a web application for the healthcare domain, I needed to get a disease name from the user. When a user types a disease name with minor spelling mistakes, the application has to correct the mistakes or predict the intended word from the input. To achieve this, I tried the Spello library in Python.

In this example, I’ve used Jupyter Notebook for the explanation. The same approach can also be used in web applications.

Prerequisites:
1. Install Python and Pandas. In my example, I’m using Python==3.8.10 and Pandas==1.3.4.
2. Install Jupyter Notebook: pip install notebook==6.4.6
3. Install Spello: pip install spello==1.3.0

What is Spello?

Spello is a spell-checking library in Python. It is designed to provide an easy-to-use and flexible interface for users to check the spelling of words in their text.

https://pypi.org/project/spello/ — official documentation

One of the key features of Spello is that it can be trained on your own vocabulary, which allows users to customize the spell-checking process to fit their specific needs.
For example, a user could use a pre-trained English model for general text, and train a separate model on a specialized vocabulary for technical terms.
Spello also supports fuzzy matching, which can help to identify misspelled words that are similar to words in the dictionary. This feature can be especially useful for catching typos or correcting mistakes in the text.
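
To make the idea of fuzzy matching concrete, here is a minimal sketch that uses Python's standard difflib module as a stand-in. It is not how Spello works internally; the keyword list and the closest_keyword helper are made up for illustration.

```python
import difflib

# Hypothetical domain dictionary; in this article the real one comes from a CSV file.
KEYWORDS = ["pneumonia", "diabetes", "malaria", "asthma"]


def closest_keyword(word, keywords=KEYWORDS, cutoff=0.6):
    """Return the most similar keyword, or None if nothing is close enough."""
    matches = difflib.get_close_matches(word.lower(), keywords, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(closest_keyword("nemonia"))   # a misspelling that is close to a keyword
print(closest_keyword("xyz"))       # nothing similar, so None is returned
```

The cutoff value controls how strict the match is: raising it towards 1.0 accepts only near-identical words, while lowering it accepts looser matches.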

How does Spello handle this?

  1. It is built with a combination of two models, Phoneme and Symspell.
  2. The Phoneme model uses the Soundex algorithm in the background and suggests correct spellings by using phonetic concepts to identify similar-sounding words.
  3. The Symspell model uses the concept of edit distance to suggest correct spellings.

Spello gets you the best of both, taking the context of the word into consideration as well.
Currently, this module is available for English (en) and Hindi (hi).
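
The Phoneme and Symspell ideas described above can be sketched in plain Python. This is a simplified illustration, not Spello's actual code: the soundex function follows the classic American Soundex rules, edit_distance is a plain Levenshtein distance (SymSpell itself uses a faster lookup strategy), and the suggest helper and keyword list are hypothetical.

```python
# Letter groups used by classic American Soundex
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of a word (empty string if no letters)."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    encoded = []
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded.append(code)
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (word[0].upper() + "".join(encoded) + "000")[:4]


def edit_distance(a, b):
    """Levenshtein distance: minimum number of inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def suggest(word, keywords, max_distance=2):
    """Pick the closest keyword that sounds alike or is within max_distance edits."""
    word = word.lower()
    candidates = [k for k in keywords
                  if soundex(k) == soundex(word) or edit_distance(word, k) <= max_distance]
    return min(candidates, key=lambda k: edit_distance(word, k), default=None)


keywords = ["pneumonia", "diabetes", "malaria"]  # made-up sample dictionary
print(suggest("nemonia", keywords))     # found through edit distance
print(suggest("dayabetees", keywords))  # found through the phonetic code
```

Soundex keeps the first letter of a word, so “nemonia” and “pneumonia” get different codes and only the edit-distance check catches that typo. “dayabetees” is three edits away from “diabetes” but shares its phonetic code. That is one reason combining the two models helps.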

Let’s dive into the real stuff:

  1. Open the command prompt and move to the folder you would like to work in.
  2. Install all the prerequisites.
  3. Open Jupyter Notebook by typing “jupyter notebook”.
Once Jupyter Notebook starts, the command prompt will look like this:
Opening Jupyter notebook in cmd

After that, the web browser opens as shown below.

Jupyter notebook — home page

To open a new notebook, click the New button in the top right corner and select Python 3 (ipykernel).

Opening a new notebook

An empty notebook will open as shown below.

New notebook

Jupyter Notebook can also act like our normal command prompt: we can install packages and run our code for instant results. Here I’m installing Spello from within Jupyter Notebook.

Installing spello in Jupyter notebook

Import the required packages and run the following block of code to train our model.

import re

import pandas as pd
from spello.model import SpellCorrectionModel

# Define the model
sp = SpellCorrectionModel(language='en')

# Read the keywords from the CSV file into a dataframe
df = pd.read_csv('disease_update.csv')

# Create a list of keywords from the disease_name column
disease_name_list = df['disease_name'].tolist()

# Replace a leading non-word character and any whitespace character with a space,
# then lowercase and trim; entries of two characters or fewer are skipped
list_words = [re.sub(r'^\W|\s', " ", w).lower().strip() for w in disease_name_list if len(w) > 2]

# Train the model with the list of keywords
sp.train(list_words)

In the above code, after importing the required packages, we define the model. Then we read the CSV file of keywords into a dataframe.
In the next step, we create a list from the disease_name column of that file and remove unwanted spaces and characters.
Finally, we train the model with our customized keywords.
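
To see what the cleaning step actually does, here is a small sketch that applies the same expression to a few made-up entries (the sample list and the clean_keywords helper are only for illustration and are not part of Spello).

```python
import re


def clean_keywords(names):
    """Apply the same cleaning as the training code to a list of raw names."""
    return [re.sub(r'^\W|\s', " ", w).lower().strip() for w in names if len(w) > 2]


# Made-up entries: leading space, a tab inside, a too-short name, a leading bracket
sample = [" Pneumonia", "Heart\tAttack", "TB", "(Type 2) Diabetes"]
print(clean_keywords(sample))
```

Only a non-word character at the very start of the entry is replaced, so the closing bracket in the last entry survives. Entries of two characters or fewer, such as “TB”, are dropped before training.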

The CSV file contains data like the sample below.

CSV file data samples
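
As a rough illustration of the layout, the sketch below assumes a file with a single disease_name column, which is the only column the training code reads. The rows are made-up examples, not the actual data, and the standard csv module is used here instead of Pandas only to keep the sketch dependency-free.

```python
import csv
import io

# Made-up rows for illustration only; the real file may have more columns and rows.
SAMPLE_CSV = """disease_name
Pneumonia
Diabetes
Malaria
"""


def read_disease_names(csv_text):
    """Return the values of the disease_name column from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["disease_name"] for row in reader]


print(read_disease_names(SAMPLE_CSV))
```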

If everything goes well, you will see a screen like this:

Training model

Spell-checking

keyword = input('Enter any keyword.. ')
corrected_keyword = sp.spell_correct(keyword)
print('Corrected keyword is : ', corrected_keyword)

In the above code, we get a keyword as input from the user. Then we pass the input to the spell_correct() method to get the corrected keyword.

Getting input from the user
Result

In the above image, we can see that the user entered “nemonia” and our model predicted the correct keyword as “Pneumonia”.
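
According to the Spello README, spell_correct() returns a dictionary rather than a plain string, with the keys original_text, spell_corrected_text and correction_dict. If only the corrected string is needed, for example in a web application, a small helper like the one below can pull it out. The corrected_text helper is hypothetical, and the example dictionary is hand-written to mirror that shape; it is not real model output.

```python
def corrected_text(result):
    """Return the corrected string from a spell_correct()-style result dictionary."""
    # Fall back to the original text if the corrected key is missing
    return result.get("spell_corrected_text", result.get("original_text", ""))


# Hand-written example of the result shape, not actual output from a trained model
example_result = {
    "original_text": "nemonia",
    "spell_corrected_text": "pneumonia",
    "correction_dict": {"nemonia": "pneumonia"},
}
print(corrected_text(example_result))
```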

Conclusion:

Most spell correction and word prediction libraries ship with their own dictionary, and common dictionary words work fine with them. As we discussed at the start of this story, if we need spell correction and word prediction in a particular domain or field, we need the option to set our own dictionary of keywords. The Spello library gives us that option.
