Learning Objectives

  • Basic Functions to Check Data in R
  • Working with Variables

Dataset

We will be using the Lung patient dataset. You can download the dataset from the given links:

Basic Functions to Check Data in R

head( )

  • Once you have loaded (or imported) data into R, you can check what the data looks like.

  • We can look at the first six observations of the data using the head() function in R, for example head(LungCapData).
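As a minimal sketch of head(), the example below uses a small hypothetical data frame, lung_demo, in place of LungCapData. Its column names and values are invented for illustration and are not taken from the real dataset.

```r
# Hypothetical stand-in for LungCapData; values are made up for illustration.
lung_demo <- data.frame(
  Age    = c(6, 18, 16, 14, 5, 11, 8, 11),
  Height = c(62.1, 74.7, 69.7, 71.0, 56.9, 58.7, 63.3, 70.4)
)

first_six   <- head(lung_demo)         # first six rows by default
first_three <- head(lung_demo, n = 3)  # n sets how many rows are returned

print(first_six)
print(first_three)
```

The n argument is optional; leaving it out gives the default of six rows.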

tail( )

  • We can look at the last six observations of the data using the tail() function in R, for example tail(LungCapData).
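A short sketch of tail(), again on a hypothetical data frame named lung_demo (the values are invented and only stand in for LungCapData):

```r
# Hypothetical stand-in for LungCapData; values are made up for illustration.
lung_demo <- data.frame(
  Age    = c(6, 18, 16, 14, 5, 11, 8, 11),
  Height = c(62.1, 74.7, 69.7, 71.0, 56.9, 58.7, 63.3, 70.4)
)

last_six <- tail(lung_demo)         # last six rows by default
last_two <- tail(lung_demo, n = 2)  # n sets how many rows are returned

print(last_six)
print(last_two)
```

The rows keep their original row numbers, so the printed output shows where in the data they came from.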

Checking Particular Records

  • We can use the concatenate function, c(), to select specific records by their row numbers.
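One way this selection might look is sketched below. The data frame lung_demo and the chosen row numbers are assumptions made for illustration; with the real data you would index LungCapData in the same way.

```r
# Hypothetical stand-in for LungCapData; values are made up for illustration.
lung_demo <- data.frame(
  Age    = c(6, 18, 16, 14, 5, 11, 8, 11),
  Height = c(62.1, 74.7, 69.7, 71.0, 56.9, 58.7, 63.3, 70.4)
)

# c(1, 4, 7) builds a vector of row numbers; the empty slot after the
# comma means "keep every column".
picked <- lung_demo[c(1, 4, 7), ]

print(picked)
```

Leaving out the comma would select columns instead of rows, so it is worth checking the brackets carefully.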

Try Yourself!

If you remember from the earlier lesson, we used the colon ( : ) to generate a sequence. Use it to display a range of consecutive records.

Solution to Previous Problem

One possible solution is to index the rows with a colon sequence, for example LungCapData[1:10, ].
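A sketch of the colon approach on a small hypothetical data frame, lung_demo, is shown here. The row range 2:5 is an arbitrary choice for illustration, not the range used in the original lesson.

```r
# Hypothetical stand-in for LungCapData; values are made up for illustration.
lung_demo <- data.frame(
  Age    = c(6, 18, 16, 14, 5, 11, 8, 11),
  Height = c(62.1, 74.7, 69.7, 71.0, 56.9, 58.7, 63.3, 70.4)
)

row_numbers <- 2:5                       # the sequence 2, 3, 4, 5
rows_2_to_5 <- lung_demo[row_numbers, ]  # rows 2 through 5, all columns

print(rows_2_to_5)
```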

Display the Column Names of the Data

  • We can use the names() function to display the column names of the data, for example names(LungCapData).
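As an illustration, the sketch below calls names() on a hypothetical data frame, lung_demo. The column names Age, Height, and Smoke are assumptions for this example; only Age is confirmed by this lesson as a column of LungCapData.

```r
# Hypothetical stand-in for LungCapData; values are made up for illustration.
lung_demo <- data.frame(
  Age    = c(6, 18, 16, 14),
  Height = c(62.1, 74.7, 69.7, 71.0),
  Smoke  = c("no", "yes", "no", "no")
)

column_names <- names(lung_demo)  # character vector of column names

print(column_names)
```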

Working With Variables

Selecting a Single Column from Data

To select a single variable or column from the dataset, we use the $ operator. For example,

LungCapData$Age

This returns the ages of the patients.
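The following sketch shows the $ operator on a hypothetical data frame, lung_demo, whose values are invented for illustration. It also shows the double-bracket form, which is a standard equivalent in R but is not covered in this lesson.

```r
# Hypothetical stand-in for LungCapData; values are made up for illustration.
lung_demo <- data.frame(
  Age    = c(6, 18, 16, 14, 5, 11, 8, 11),
  Height = c(62.1, 74.7, 69.7, 71.0, 56.9, 58.7, 63.3, 70.4)
)

ages      <- lung_demo$Age       # extract the Age column as a plain vector
same_ages <- lung_demo[["Age"]]  # equivalent extraction by column name

print(ages)
```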

Mean of a Variable

We can use the mean() function to calculate the mean of the data points of a variable.

mean(LungCapData$Age)

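If you refer to the column by name alone, for example mean(Age), R throws an error, because it cannot find an object named Age in the workspace; Age exists only as a column of LungCapData.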
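Both cases can be sketched as follows, using a hypothetical data frame, lung_demo, with invented values. The tryCatch() wrapper is only there so the script keeps running and the error message can be inspected; it assumes no object named Age exists in the workspace.

```r
# Hypothetical stand-in for LungCapData; values are made up for illustration.
lung_demo <- data.frame(
  Age = c(6, 18, 16, 14, 5, 11, 8, 11)
)

# Works: the column is reached through the data frame.
mean_age <- mean(lung_demo$Age)
print(mean_age)  # 11.125

# Fails: Age is not an object in the workspace, only a column of lung_demo.
# tryCatch() captures the error message instead of stopping the script.
error_message <- tryCatch(
  mean(Age),
  error = function(e) conditionMessage(e)
)
print(error_message)
```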

Summary of the Dataset

To get the statistical summary of the data, we can use the summary() function in R.

summary(LungCapData)

The summary of numerical variables includes the minimum, quartiles, median, mean, and maximum. In contrast, factor (categorical) variables are summarized by the number of observations in each level, and character variables by their length, class, and mode.
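A small sketch of summary() on a hypothetical data frame, lung_demo, is given below. The columns and values are invented for illustration, and the Note column exists only to show how a character column is handled.

```r
# Hypothetical stand-in for LungCapData; values are made up for illustration.
lung_demo <- data.frame(
  Age  = c(6, 18, 16, 14, 5, 11, 8, 11),
  Note = c("a", "b", "c", "d", "e", "f", "g", "h"),
  stringsAsFactors = FALSE  # keep Note as character on every R version
)

full_summary <- summary(lung_demo)      # one block of statistics per column
age_summary  <- summary(lung_demo$Age)  # min, quartiles, median, mean, max

print(full_summary)
print(age_summary)
```

Individual statistics can be pulled out by name, for instance age_summary[["Mean"]].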

  • In the video on the next slide, the tutor covers the following in a hands-on tutorial:

    ▶︎ How to check variable names for datasets in R?

    ▶︎ How to extract a variable from a dataset in R?

    ▶︎ How to check the variable type (numeric or categorical) in R?

    ▶︎ How to ask R for different levels/categories of a categorical variable?

    ▶︎ How to produce a summary for a variable in R? The summary() function summarizes variables based on their type: numeric variables are summarized by the mean, median, and quartiles, and factor (categorical) variables are summarized as frequencies.
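Ahead of the video, the questions above might be answered along these lines. This is only a sketch on a hypothetical data frame, lung_demo; the Smoke column and its values are assumptions, and the tutor may use different functions.

```r
# Hypothetical stand-in for LungCapData; values are made up for illustration.
lung_demo <- data.frame(
  Age   = c(6, 18, 16, 14, 5, 11, 8, 11),
  Smoke = factor(c("no", "yes", "no", "no", "no", "no", "no", "no"))
)

age_type     <- class(lung_demo$Age)      # "numeric"
smoke_type   <- class(lung_demo$Smoke)    # "factor"
smoke_levels <- levels(lung_demo$Smoke)   # the categories of the factor
smoke_counts <- summary(lung_demo$Smoke)  # frequency of each category

print(smoke_levels)
print(smoke_counts)
```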