Learning Objectives

  • Reading a data file
  • Writing a data file

Reading Data Files

  • It is useful to be able to create a DataFrame by hand, but in practice we rarely build our data that way. We usually work with data that already exists.
  • Data comes in several formats. One of the most basic is the CSV file. CSV stands for comma-separated values.

What is a CSV file?

  • CSV (comma-separated values) is a simple file format that stores tabular data (numbers and text), such as a spreadsheet or database table, in plain text.
  • CSV files are often created by programs that handle large amounts of data. They are a convenient way to export data from spreadsheets and databases and to import it into other programs.
  • Each line of the file is a data record.
  • Each record consists of one or more fields, separated by commas.
  • The comma used as the field separator gives the format its name.

What does a CSV file look like?

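As a rough illustration of the records and fields described above, the sketch below holds a tiny CSV document in a Python string and splits it apart by hand. The names and values are made up for this example. Splitting on commas like this is only for illustration; real files with quoted fields need a proper CSV parser.

```python
# A small, made-up CSV document held in a string (hypothetical data).
csv_text = """name,age,city
Asha,29,Pune
Ravi,34,Delhi
"""

# Each line is one record; the first line here holds the column names.
records = csv_text.splitlines()

# Splitting a record on commas yields its fields.
header = records[0].split(",")
first_row = records[1].split(",")

print(header)     # ['name', 'age', 'city']
print(first_row)  # ['Asha', '29', 'Pune']
```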

Working with CSV files in Python

  • Python has a built-in module named csv for working with CSV files.
  • However, a common way to work with CSV files is to use Pandas, which makes importing and analyzing data much easier.
  • One key feature of Pandas is its ability to read and write CSV, Excel, and many other types of files.
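To make the comparison above concrete, here is a minimal sketch that reads the same made-up data once with the built-in csv module and once with Pandas. The sample text is an assumption for this example, and io.StringIO is used only so that no file on disk is needed.

```python
import csv
import io

import pandas as pd

# Hypothetical sample data; io.StringIO lets a string behave like an open file.
csv_text = "name,age\nAsha,29\nRavi,34\n"

# Built-in csv module: every field comes back as a string.
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[0]["age"])  # the string '29', not a number

# Pandas: the data lands in a DataFrame and numeric columns are inferred.
df = pd.read_csv(io.StringIO(csv_text))
print(df["age"].sum())  # 63
```

With the csv module, converting "29" to a number is left to you; Pandas does that conversion while reading.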

Pandas read_csv

  • Functions such as Pandas read_csv() let you work with files effectively.
  • The read_csv() function reads a CSV file into a DataFrame object.
  • A CSV file resembles a two-dimensional table, and a DataFrame represents two-dimensional tabular data, so one maps naturally onto the other.
  • The most basic way to read a CSV file in Pandas:

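A minimal sketch of the basic call might look like the following. The file name people.csv and its contents are invented for this example, and the file is created in a temporary folder first so that the snippet can run anywhere.

```python
import os
import tempfile

import pandas as pd

# Create a small CSV file so the example is self-contained.
# The file name and contents are made up for illustration.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "people.csv")
with open(path, "w", newline="") as f:
    f.write("name,age\nAsha,29\nRavi,34\n")

# The most basic call: pass the file path, get a DataFrame back.
df = pd.read_csv(path)

print(df)
print(df.shape)  # (2, 2): two data rows, two columns
```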

  • Now, let's look at how to provide the filename:

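The filename can be handed to read_csv in more than one form. The sketch below, which uses a hypothetical scores.csv created on the fly, shows three forms that read_csv accepts: a string path, a pathlib.Path object, and an already opened file.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical file created on the fly so the example is self-contained.
folder = Path(tempfile.mkdtemp())
csv_path = folder / "scores.csv"
csv_path.write_text("student,score\nMeera,88\nKabir,92\n")

# 1. A plain string path (relative or absolute).
df_from_str = pd.read_csv(str(csv_path))

# 2. A pathlib.Path object.
df_from_path = pd.read_csv(csv_path)

# 3. An already opened file object.
with open(csv_path, newline="") as handle:
    df_from_handle = pd.read_csv(handle)

# All three routes produce the same DataFrame.
print(df_from_str.equals(df_from_path) and df_from_path.equals(df_from_handle))
```

A relative path such as "scores.csv" is resolved against the current working directory, so an absolute path is the safer choice when you are unsure where the script will run.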

  • The same function can do much more, and its options can change the returned DataFrame considerably.
  • For instance, read_csv can read a CSV file from a URL as well as from a local path, and it can import only selected columns so that we don't have to trim the DataFrame later.
  • These changes are controlled by the arguments the function takes.
  • We don't need to memorize all the arguments, though. A handful of important ones are worth knowing.
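As an illustration of how arguments shape the result, the sketch below tries three commonly used read_csv parameters (usecols, index_col and nrows) on made-up data. The choice of these three is ours, not a complete list, and the URL in the comment is a placeholder rather than a real data source.

```python
import io

import pandas as pd

# Hypothetical sample data kept in memory for the example.
csv_text = (
    "id,name,age,city\n"
    "1,Asha,29,Pune\n"
    "2,Ravi,34,Delhi\n"
    "3,Meera,41,Chennai\n"
)

# usecols: import only the listed columns.
subset = pd.read_csv(io.StringIO(csv_text), usecols=["name", "age"])

# index_col: use a column as the row index instead of 0, 1, 2, ...
indexed = pd.read_csv(io.StringIO(csv_text), index_col="id")

# nrows: read only the first N data rows.
preview = pd.read_csv(io.StringIO(csv_text), nrows=2)

# A URL string can be passed in place of a file path, for example
# pd.read_csv("https://example.com/data.csv"). That URL is a placeholder,
# and the call is left as a comment so the example runs offline.

print(list(subset.columns))    # ['name', 'age']
print(indexed.loc[2, "name"])  # Ravi
print(len(preview))            # 2
```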

Pandas to_csv with example

  • The easiest way to write a DataFrame to a CSV file is the DataFrame's to_csv() method.
  • Syntax:

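The general shape of the call is DataFrame.to_csv(path). A minimal sketch, assuming a small made-up DataFrame and a temporary output folder chosen only for this example:

```python
import os
import tempfile

import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"name": ["Asha", "Ravi"], "age": [29, 34]})

# General shape of the call: DataFrame.to_csv(path).
# The output location is a temporary folder chosen for this sketch.
out_path = os.path.join(tempfile.mkdtemp(), "people.csv")
df.to_csv(out_path)

# Look at the raw text that was written; the first column is the row index.
with open(out_path) as f:
    lines = f.read().splitlines()

print(lines)  # [',name,age', '0,Asha,29', '1,Ravi,34']
```

Note the leading comma in the header line: by default the row index is written as an unnamed first column.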

  • If you want to export without the index, add index=False.

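To see what index=False changes, this sketch compares the two outputs side by side. It relies on the fact that to_csv returns the CSV text as a string when no path is given; the DataFrame itself is made up for the example.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"name": ["Asha", "Ravi"], "age": [29, 34]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_index = df.to_csv().splitlines()
without_index = df.to_csv(index=False).splitlines()

print(with_index[0])     # ,name,age  (empty first field is the index column)
print(without_index[0])  # name,age
print(without_index[1])  # Asha,29
```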

  • Example:

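One possible end-to-end example is sketched below: it builds a small DataFrame, writes it with to_csv, and reads it back with read_csv to confirm the data survived. The sales figures and the file name sales.csv are invented for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data for illustration.
sales = pd.DataFrame(
    {
        "product": ["pen", "notebook", "eraser"],
        "units": [120, 45, 80],
        "price": [1.5, 3.0, 0.5],
    }
)

# Write the DataFrame without the index column.
out_path = os.path.join(tempfile.mkdtemp(), "sales.csv")
sales.to_csv(out_path, index=False)

# Read the file back to check that the same table comes out.
restored = pd.read_csv(out_path)
print(restored)
```

Because the file was written with index=False, reading it back gives the same three columns with a fresh default index.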

Comprehensive Tutorial

You can download the slides for this topic from here.