Stemming in Python


We know that we can form different words given a base word. For example, with the base word ‘talk’ we can have ‘talks’, ‘talking’, and ‘talked’.

Did you know that a program can map these forms back to their base word? This is called stemming. In this article, we will discuss stemming using Python. Let’s get introduced to stemming first.

What is stemming in Python?

Stemming is the process of reducing the different morphological variations of a word to a common root. The root is also called the stem, hence the name stemming. For example, the forms ‘likes’, ‘likely’, and ‘liking’ are all variations of the word ‘like’. A stemmer may also return a stem such as ‘lik’, which is not a valid English word. Because related forms share a stem, search engines and other applications can match a query against all of them more easily.

Programs that perform stemming are called stemming algorithms or stemmers. Most of them are based on suffix-stripping rules. The most common one is the Porter stemmer. Applications of stemming include:
1. Information retrieval systems such as search engines.
2. Domain analysis, for determining domain vocabularies.
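
The suffix-stripping idea mentioned above can be sketched in a few lines of plain Python. This is only a toy illustration and not the Porter algorithm: the suffix list, the function name toy_stem, and the min_stem_len parameter are all hypothetical choices made for this example.

```python
# Toy suffix-stripping stemmer: illustrative only, NOT the Porter algorithm.
# The suffix list below is a hypothetical, hand-picked set of rules.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip when enough of the word is left to act as a stem
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


for w in ["talk", "talks", "talking", "talked"]:
    print(w, ":", toy_stem(w))  # every form is reduced to 'talk'
```

A rule list this small breaks quickly. For instance, toy_stem('running') returns 'runn' because nothing handles the doubled consonant. Real stemmers such as the Porter stemmer add further rules for cases like this.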

Importing Modules in Python

To implement stemming using Python, we use the nltk module. We can install this module by running the command below.

pip install nltk

Using this module, we can stem individual words or the words of a whole sentence. We will discuss both in the following sections.

Stemming words using nltk in Python

To stem words, we first create an instance of the PorterStemmer class and then call its stem() method.

Example of stemming a word:

# importing the stemmer class
from nltk.stem import PorterStemmer

ps = PorterStemmer()  # creating an instance of the class

# creating a list of some words to be stemmed
words = ['run', 'ran', 'running']

# printing each word next to its stem
for x in words:
    print(x, ":", ps.stem(x))

Output:

run : run
ran : ran
running : run

For each word, we printed its stem. We can also stem a single word directly. Let us see some examples.

Example of stemming a word:

ps.stem('runs')

Output:

'run'

Example of stemming a word:

ps.stem('runner')

Output:

'runner'
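
One way to see which words a stemmer treats as the same is to group them by stem. The sketch below does this with the same PorterStemmer class. The helper name group_by_stem is a hypothetical name introduced for this example and is not part of nltk.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer

ps = PorterStemmer()


def group_by_stem(words):
    """Map each stem to the list of input words that reduce to it."""
    groups = defaultdict(list)
    for word in words:
        groups[ps.stem(word)].append(word)
    return dict(groups)


groups = group_by_stem(["run", "runs", "running", "ran", "runner"])
for stem, members in groups.items():
    print(stem, ":", members)
```

With the stems shown in the examples above, ‘run’, ‘runs’, and ‘running’ fall into one group, while ‘ran’ and ‘runner’ each stay in a group of their own. The stemmer does not link the irregular form ‘ran’ to ‘run’.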

Stemming a Sentence in Python

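We can also stem the words of a sentence. Here too, we use the stem() method of the PorterStemmer class. Before doing this, we have to split the sentence into words (tokenize it), and the tokenizer needs a package named ‘punkt’, which we download through the nltk module. Newer NLTK releases may ask for a resource named ‘punkt_tab’ instead; if so, download that one in the same way.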

Example of downloading the punkt package:

import nltk
nltk.download('punkt')

Output:

[nltk_data] Downloading package punkt to C:\Users\…
[nltk_data] ……\nltk_data…
[nltk_data] Unzipping tokenizers\punkt.zip.
True

Let us see an example of stemming a sentence for further understanding.

Example of stemming a sentence in Python:

# importing modules
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

ps = PorterStemmer()  # creating an instance of the class

# sentence to be stemmed
sentence = "Runners have planned for 20km run. Previously, they ran a 15km run up."
words = word_tokenize(sentence)  # splitting the sentence into word tokens

# printing each token next to its stem
for x in words:
    print(x, ":", ps.stem(x))

Output:

Runners : runner
have : have
planned : plan
for : for
20km : 20km
run : run
. : .
Previously : previous
, : ,
they : they
ran : ran
a : a
15km : 15km
run : run
up : up
. : .
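
The output above shows that punctuation marks pass through as tokens and that the stems come back in lowercase. If only the stemmed words are wanted as a single string, a sketch like the following may help. It is an assumption-laden variant and not the article's method: it swaps word_tokenize for a simple regular expression so that no punkt download is needed, and the helper name stem_sentence is hypothetical.

```python
import re

from nltk.stem import PorterStemmer

ps = PorterStemmer()


def stem_sentence(sentence):
    """Return the sentence with every word replaced by its stem."""
    # Assumption: a plain regex stands in for word_tokenize here.
    # It keeps runs of letters, digits, and underscores and drops punctuation.
    tokens = re.findall(r"\w+", sentence)
    return " ".join(ps.stem(token) for token in tokens)


print(stem_sentence("Runners have planned for 20km run."))
# prints: runner have plan for 20km run
```

The regex tokenizer is cruder than word_tokenize (it splits contractions such as "don't" into two tokens, for example), so it suits quick experiments more than production text processing.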

Conclusion

We are at the end of the article. We have learned what stemming is and how to implement it using Python's nltk module. We hope the concepts covered here are clear. Happy learning!
