• Angela Yu

R OVERVIEW

Updated: Oct 12, 2019




I. Introduction to R

We use "data.csv" (which you can download here) introduced in Data Manipulation for this section.

1. About R

R is a free software environment for statistical computing and graphics. It has been popular in academia for more than 20 years because of its strong data analysis features. You can read more about the history of R, and comments on it, on its official website.

2. Install R User Interface - RStudio

Software: Before you can download and use RStudio, you need to download and install R. To do so, pick a mirror (for instance, the UCLA mirror) and download the installer from it. On a Mac, make sure that you download a .pkg file and install it by opening it.

Download: Next, visit this webpage and download RStudio for free.

Installation process: See this post for details, or refer to this video for Mac users or this video for Windows users.

Open the UI: After the download and installation, click the RStudio icon to open the user interface.

3. Import and Export Data using R

R can import and export data directly. If you have a data file called "data.csv", put it in the same folder as your R or RMD file. You can then import the data with one line of code and export it with another.
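The import and export steps described above can be sketched as follows. The original code lines are not reproduced on this page, so this sketch assumes base R's read.csv() and write.csv(). The sample data frame, the temporary paths, and the variable names are hypothetical stand-ins so that the example runs without the real "data.csv".

```r
# Hypothetical stand-in for "data.csv" so the sketch runs anywhere.
# With the real file in your working directory, skip this step and
# call read.csv("data.csv") directly.
sample_path <- file.path(tempdir(), "data.csv")
write.csv(data.frame(id = 1:3, score = c(90, 85, 77)),
          sample_path, row.names = FALSE)

# Import: read the CSV file into a data frame
data <- read.csv(sample_path)

# Export: write the data frame to a new CSV file, without row names
export_path <- file.path(tempdir(), "data_export.csv")
write.csv(data, export_path, row.names = FALSE)
```

Setting row.names = FALSE keeps R from adding an extra column of row numbers to the exported file.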

II. Data Cleaning in R - Data Manipulation I

We use "person_info.csv" (which you can download here), introduced in Data Manipulation, for this section.

1. Check and Clean Missing Values / NA

You can use the is.na() and which() functions to locate missing values.
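As a minimal sketch of how is.na() and which() work together, the example below uses a made-up vector and a made-up data frame (the names age and person are hypothetical, not columns of "person_info.csv").

```r
# Hypothetical numeric column with two missing values
age <- c(23, NA, 35, 41, NA)

# is.na() returns TRUE wherever a value is missing
missing_flags <- is.na(age)

# which() turns the TRUE flags into positions
missing_positions <- which(missing_flags)   # positions 2 and 5

# Summing the flags counts the missing values
n_missing <- sum(missing_flags)             # 2

# For a whole data frame, arr.ind = TRUE reports row and column indices
person <- data.frame(name = c("Ann", NA, "Cy"), age = c(23, 35, NA))
na_cells <- which(is.na(person), arr.ind = TRUE)
```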


2. Check and Convert Data Types

You can use the str, summary, class, and typeof functions to check data types.
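The four functions named above answer slightly different questions, which the following sketch illustrates on a small made-up data frame (person and its columns are hypothetical).

```r
# Hypothetical data frame with character, integer, and double columns
person <- data.frame(
  name   = c("Ann", "Bo"),
  age    = c(23L, 35L),
  height = c(1.62, 1.80),
  stringsAsFactors = FALSE
)

# str() prints a compact overview: one line per column with its type
str(person)

# summary() prints per-column summaries (min, median, max, and so on)
summary(person)

# class() gives the high-level class of an object
age_class    <- class(person$age)      # "integer"
frame_class  <- class(person)          # "data.frame"

# typeof() gives the low-level storage type
height_type  <- typeof(person$height)  # "double"
frame_type   <- typeof(person)         # "list"
```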

You can use the as.numeric, as.logical, as.integer, as.double, as.factor, and as.character functions to convert data types.
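A short sketch of these conversion functions on made-up values follows. The last step shows a common pitfall when converting factors to numbers; all variable names are hypothetical.

```r
# Character to number
num  <- as.numeric("3.14")    # 3.14
dbl  <- as.double("2.5")      # 2.5
int  <- as.integer("42")      # 42L

# Character to logical
flag <- as.logical("TRUE")    # TRUE

# Number to character
txt  <- as.character(10)      # "10"

# Character vector to factor (categorical variable)
grp  <- as.factor(c("a", "b", "a"))

# Pitfall: as.numeric() on a factor returns the internal level codes,
# not the labels. Convert to character first to recover the values.
f      <- factor(c("10", "20", "10"))
codes  <- as.numeric(f)                 # 1 2 1
values <- as.numeric(as.character(f))   # 10 20 10
```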


3. Check and Clean Outliers


III. Data Wrangling - Data Manipulation II

We use "person_info.csv" (which you can download here), introduced in Data Manipulation, for this section.

1. Date-Time Manipulation

In R, we recommend that you use the lubridate library for date-time manipulation.
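As a rough sketch of what lubridate offers, the example below parses a few made-up date strings, extracts components, and does simple date arithmetic. It assumes the lubridate package is installed; the values and variable names are hypothetical and not taken from "person_info.csv".

```r
library(lubridate)

# Parse the same date from two different string layouts
updated  <- ymd("2019-10-12")
same_day <- mdy("10/12/2019") == updated   # TRUE

# Extract components
yr <- year(updated)    # 2019
mo <- month(updated)   # 10
dy <- day(updated)     # 12

# Day of the week as a labeled factor
weekday <- wday(updated, label = TRUE)

# Date arithmetic: add 30 days
later <- updated + days(30)   # 2019-11-11

# Parse a full timestamp (UTC by default) and pull out the hour
stamp <- ymd_hms("2019-10-12 14:30:00")
hr    <- hour(stamp)          # 14
```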

2. String Manipulation

In R, we recommend that you use the stringr library for string manipulation. You can use this library to normalize strings and to help with approximate matching.
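The sketch below shows one way to normalize strings with stringr, assuming the package is installed. The source does not say which function it uses for approximate matching, so as an assumption this example measures edit distance with base R's adist() (from the utils package) after the stringr normalization step. The city names are made up.

```r
library(stringr)

# Hypothetical messy entries that should all refer to the same city
raw <- c("  Los Angeles ", "los  angeles", "LOS ANGLES")

# Normalize: lower-case, trim the ends, collapse repeated spaces
normalized <- str_squish(str_to_lower(raw))
# "los angeles" "los angeles" "los angles"

# Exact substring matching misses the misspelled entry
exact_hit <- str_detect(normalized, "angeles")   # TRUE TRUE FALSE

# Approximate matching (assumption: base R adist, not a stringr call).
# adist() returns the edit distance between the target and each entry.
distances <- as.vector(utils::adist("los angeles", normalized))   # 0 0 1

# Treat entries within one edit of the target as matches
close_hit <- distances <= 1   # TRUE TRUE TRUE
```

Normalizing before matching keeps differences in case and spacing from counting as edits, so the distance threshold only has to absorb genuine typos.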

©2019-2020 by the OSCR Project in UCLA