How to read Different File Formats using Python?

Data is an important factor in every domain: industry, education, research and so on. This data does not always come in the same format. Even within a single domain, we deal with different forms of information, and in some cases the data may even be unstructured. Python, being versatile, can read a wide variety of file formats. In this article, we will discuss how to handle the most common file formats, namely text, CSV, XLSX and JSON, using Python.

Modules to be Imported

To read the CSV, Excel and JSON files in this article, we need the pandas module. We can install it by writing the below command:

pip install pandas

Reading from a text file

Text files are among the most widely used forms of data. In Python, we open a text file with the built-in open() function, which accepts a mode that controls how the file is accessed:
1. 'r': read from an existing file (the default)
2. 'w': write to a file, creating it if needed and erasing any existing content
3. 'r+' or 'w+': read and write; 'r+' requires the file to exist, while 'w+' creates or truncates it
4. 'a': append to the end of a file, creating it if it does not exist
5. 'a+': append to a file and also allow reading from it
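To make the difference between these modes concrete, here is a minimal sketch that writes, appends and then reads back a small file. The file name modes_demo.txt and the temporary directory are assumptions made for illustration only; they are not among the files used in this article.

```python
import os
import tempfile

# Work inside a throwaway directory so that no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "modes_demo.txt")  # hypothetical file name

    # 'w' creates the file (or truncates it if it already exists).
    with open(path, "w") as f:
        f.write("first line\n")

    # 'a' keeps the existing content and writes at the end.
    with open(path, "a") as f:
        f.write("second line\n")

    # 'r+' opens an existing file for both reading and writing.
    with open(path, "r+") as f:
        content = f.read()        # the position is now at the end of the file
        f.write("third line\n")   # so this write lands after the old content

    # 'r' is read-only.
    with open(path, "r") as f:
        lines = f.read().splitlines()

print(lines)
```

Opening the same file with 'w' again would discard all three lines, which is why 'a' or 'r+' is the safer choice when existing content has to be kept.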

We will be working with a text file named info.txt that contains the following lines:

Hello!
Welcome to Python Geeks.
Hope you are doing well.

The file object returned by open() provides three methods to read data from a text file:

1. read(n): This method reads at most n characters from a file opened in text mode (n bytes in binary mode). If n is not given, it reads the entire file. Newline characters are returned as part of the string.

Example of reading the first 15 characters of a text file:

with open(r'info.txt','r') as f:
    print(f.read(15))

Output:

Hello!
Welcome

Example of reading the complete text file:

with open(r'info.txt','r') as f:
    print(f.read())

Output:

Hello!
Welcome to Python Geeks.
Hope you are doing well.
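Successive calls to read(n) continue from where the previous call stopped, and read() returns an empty string once the end of the file is reached. The sketch below illustrates this under the assumption that a temporary file (hypothetically named info_copy.txt) holds the same three lines as info.txt.

```python
import os
import tempfile

text = "Hello!\nWelcome to Python Geeks.\nHope you are doing well."

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "info_copy.txt")  # hypothetical stand-in for info.txt
    with open(path, "w") as f:
        f.write(text)

    with open(path, "r") as f:
        first = f.read(6)    # the first 6 characters
        second = f.read(9)   # the next 9 characters, starting where the first call stopped
        rest = f.read()      # everything that is left
        at_end = f.read()    # at the end of the file, read() returns an empty string

print(repr(first))
print(repr(second))
print(repr(at_end))
```

Together, the two partial reads cover the same 15 characters that a single read(15) call returns.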

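2. readline(n): This method reads a single line from the file, including its trailing newline. If n is given, it reads at most n characters of that line and never continues into the next line.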

Example of reading data from a text file:

with open(r'info.txt','r') as f:
    print(f.readline())

Output:

Hello!
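The effect of the optional size argument of readline() can be sketched as follows. To keep the example self-contained, io.StringIO is used as an in-memory stand-in for a text file opened for reading; that substitution is an assumption of this sketch, not something the article's files require.

```python
import io

# io.StringIO behaves like a text file opened for reading; it stands in for info.txt here.
f = io.StringIO("Hello!\nWelcome to Python Geeks.\nHope you are doing well.")

line1 = f.readline()      # the whole first line, including the trailing newline
part = f.readline(7)      # at most 7 characters of the second line
remainder = f.readline()  # the rest of that same line
line3 = f.readline()      # the last line (no trailing newline in this text)
end = f.readline()        # an empty string signals the end of the file

print(repr(line1), repr(part), repr(remainder), repr(line3), repr(end))
```

Note that readline(7) stops inside the second line, and the following readline() call finishes that line rather than skipping to the third one.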

3. readlines(): This method reads all the remaining lines of the file and returns them as a list of strings. The newline character at the end of each line is kept.

Example of reading data from a text file:

with open(r'info.txt','r') as f:
    print(f.readlines())

Output:

['Hello!\n', 'Welcome to Python Geeks.\n', 'Hope you are doing well.']
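When the trailing newlines are not wanted, one common approach is to loop over the file object, which yields one line at a time, and strip each line. The sketch below again assumes an in-memory io.StringIO object in place of a real file.

```python
import io

# In-memory stand-in for a text file with the same three lines as info.txt.
f = io.StringIO("Hello!\nWelcome to Python Geeks.\nHope you are doing well.")

cleaned = []
for line in f:                          # the file object yields one line per iteration
    cleaned.append(line.rstrip("\n"))   # drop the trailing newline, if there is one

print(cleaned)
```

Looping over the file this way avoids building the full list of raw lines first, which can matter for large files.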

Reading Data from CSV file

In CSV (comma-separated values) files, the values in each row are separated by commas. A CSV file can be thought of as a text file that holds tabular data in plain text. To access the data in a CSV file, we can use the pandas module. The following example shows how to load the data from a CSV file named std.csv.

Example of reading data from CSV file:

import pandas as pd  # importing the pandas module

# reading the csv file into a DataFrame; the r prefix marks a raw string, not read mode
df = pd.read_csv(r'std.csv')
# displaying the DataFrame (in a script, use print(df))
df

Output:

reading csv file
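read_csv() can also limit what is loaded at read time. The following sketch uses its usecols and nrows parameters on a small in-memory CSV; the CSV text is an assumption that only borrows the column names Name, RollNo and Section used in this article.

```python
import io
import pandas as pd

# Hypothetical CSV text with the same column names as std.csv.
csv_text = "Name,RollNo,Section\nABC,9,A\nXYZ,5,A\nPQR,8,B\n"

# usecols keeps only the listed columns; nrows stops after that many data rows.
subset = pd.read_csv(io.StringIO(csv_text), usecols=["Name", "Section"], nrows=2)

print(subset)
```

Filtering while reading can save memory on large files, because the unwanted rows and columns never reach the DataFrame.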

Reading certain rows

We can also read a subset of the rows by slicing, in the same way we slice lists and other ordered sequences. The slice 1:3 selects the rows at positions 1 and 2; the end position is excluded.

Example of reading rows from csv file:

df = pd.read_csv(r'std.csv')
# displaying some of the rows of the df
df[1:3]

Output:

reading rows from csv

Reading certain columns

We can also read certain columns by using the loc indexer.

Example of reading certain columns from csv file:

df = pd.read_csv(r'std.csv')
# displaying some of the columns of the df
df.loc[:,['Name','Section']]

Output:

reading column from csv

Reading certain rows and columns

Similarly, we can read certain rows and columns by replacing ':' in loc with the required range of row labels. Unlike ordinary slicing, loc includes the end label, so 2:4 selects the rows labelled 2, 3 and 4.

Example of reading certain rows and columns from csv file:

df = pd.read_csv(r'std.csv')
# displaying some of the rows and columns of the df
df.loc[2:4,['Name','Section']]

Output:

reading rows columns from csv
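Because loc selects by label and includes the end of the range, it can return a different number of rows than positional slicing. The sketch below contrasts loc with iloc, the position-based indexer, on a small in-memory table whose contents are assumed for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV text with the same column names as std.csv.
csv_text = """Name,RollNo,Section
ABC,9,A
XYZ,5,A
PQR,8,B
RST,1,C
ORT,6,A
"""
df = pd.read_csv(io.StringIO(csv_text))

by_label = df.loc[2:4, ["Name", "Section"]]   # labels 2, 3 and 4 (end included)
by_position = df.iloc[2:4, [0, 2]]            # positions 2 and 3 (end excluded)

print(by_label)
print(by_position)
```

With the default integer index the labels and positions coincide, so the only visible difference here is whether the end of the range is included.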

Python CSV module

We can also use the CSV module to read from the CSV file.

Example of reading data from csv file using csv module:

import csv

clmns = []
rows = []
# opening the file; newline='' is recommended by the csv module documentation
with open('std.csv', 'r', newline='') as file:
    reader = csv.reader(file)  # creating a reader object
    clmns = next(reader)       # the first row holds the column names
    for row in reader:         # reading the remaining lines one by one
        rows.append(row)

# printing the column names, then each row, separated by spaces
print(' '.join(clmns))
for row in rows:
    print(' '.join(row))

Output:

Name RollNo Section
ABC 9 A
XYZ 5 A
PQR 8 B
RST 1 C
ORT 6 A
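The csv module also offers DictReader, which uses the header row as keys so that each row can be accessed by column name instead of by position. A minimal sketch, assuming an in-memory CSV with the same column names as std.csv:

```python
import csv
import io

# Hypothetical CSV text with the same column names as std.csv.
csv_text = "Name,RollNo,Section\nABC,9,A\nXYZ,5,A\nPQR,8,B\n"

reader = csv.DictReader(io.StringIO(csv_text))
records = list(reader)  # each row becomes a dict keyed by the header names

# Selecting by column name; every value is a string until it is converted.
section_a = [r["Name"] for r in records if r["Section"] == "A"]
roll_numbers = [int(r["RollNo"]) for r in records]

print(section_a)
print(roll_numbers)
```

Unlike pandas, the csv module does not infer types, which is why RollNo has to be converted with int() explicitly.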

Reading from Excel File

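We can use the pandas module here as well. Since the data is in an Excel file, we use the read_excel() function. Reading .xlsx files with pandas also requires the openpyxl package, which can be installed with pip install openpyxl. We will be reading the following Excel file.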

reading from excel

Example of reading data from excel file:

import pandas as pd  # importing the pandas module

df = pd.read_excel(r'std.xlsx')
# displaying the DataFrame
df

Output:

reading data from excel

We can also use the ExcelFile class to open the Excel workbook. The syntax is almost the same; the only difference is that it returns an ExcelFile object rather than a DataFrame. We can open the above Excel file by writing the below code.

Example of reading excel data:

xl = pd.ExcelFile(r'std.xlsx')
xl

Output:

<pandas.io.excel._base.ExcelFile at 0x26503806580>

As with CSV files, once the data has been read into a DataFrame with read_excel(), we can apply the loc indexer to get the required rows and columns.

Reading from a particular sheet

We can also read data from a particular sheet in an Excel file by passing the sheet name as the sheet_name argument.

Example of reading a sheet in an excel:

df = pd.read_excel(r'std.xlsx',sheet_name='Sheet1')
df

Output:

reading data from excel

Reading more than one sheet

We can read more than one sheet from an Excel file and store each one in a separate variable as a DataFrame. For example, if we want to get two sheets named 'Sheet1' and 'Sheet2' from the Excel file 'exl.xlsx', then we can write the below code.

Example of reading data from two sheets:

import pandas as pd

with pd.ExcelFile('exl.xlsx') as e:
    df1 = pd.read_excel(e, sheet_name='Sheet1')
    df2 = pd.read_excel(e, sheet_name='Sheet2')

Using xlrd module

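We can also use the xlrd module in Python to read Excel files. Note that xlrd version 2.0 and later reads only the older .xls format, so opening an .xlsx file as shown below works only with an xlrd version earlier than 2.0.

Example of reading data from excel file using xlrd:

import xlrd
exl = xlrd.open_workbook('std.xlsx')  # reading the excel file (.xlsx needs xlrd < 2.0)
data = exl.sheet_by_index(0)  # getting the first sheet in the excel file

print("No of rows:", data.nrows)
print("No of columns:", data.ncols)
# row index 0 is the header, so index 1 is the first row of data
print("The first row:", data.row_values(1))
for i in range(data.ncols):
    print(data.cell_value(0, i), end=',')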

Output:

No of rows: 6
No of columns: 3
The first row: ['ABC', 9.0, 'A']
Name,RollNo,Section,

Reading Data from JSON file

JSON (JavaScript Object Notation) is a lightweight, human-readable data format. A JSON object stores key-value pairs within {}, much like a Python dictionary. JSON is language-independent, so it can be used with any programming language. We will be reading the data from the following JSON file using the pandas module.

reading data from json

Example of reading data from json file:

import pandas as pd
df = pd.read_json('std.json')
df

Output:

reading data from json

Using JSON module

We can also use the json module to read the JSON data.

Example of reading data from json file using json module:

import json
import pandas as pd

# opening the json file
with open('std.json','r') as file:
    data = json.load(file)

# a JSON object is loaded as a Python dictionary
print("Type of data:",type(data))

# converting the dictionary into a DataFrame
df = pd.DataFrame(data)
df

Output:

Type of data: <class 'dict'>
reading data from json
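The Python type returned by the json module depends on the top-level JSON value: an object becomes a dict, while an array becomes a list. The sketch below shows this with json.loads(), which parses a string rather than a file; the JSON text is hypothetical, since the actual layout of std.json is not shown in this article.

```python
import json

# Hypothetical JSON text; the real structure of std.json may differ.
array_text = '[{"Name": "ABC", "RollNo": 9}, {"Name": "XYZ", "RollNo": 5}]'
object_text = '{"Name": "ABC", "RollNo": 9}'

records = json.loads(array_text)   # a JSON array becomes a Python list
single = json.loads(object_text)   # a JSON object becomes a Python dict

names = [item["Name"] for item in records]

print(type(records), type(single))
print(names)
```

Checking the type first, as the example above does with type(data), is a simple way to decide whether to index the result by key or by position.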


Conclusion

In this article, we learned how to access data from text, CSV, Excel and JSON files, and we saw different methods to read each of them. Hope all the concepts are clear. Happy learning!
