Simple Exploratory Plots

Goal: Learn to create basic plots like scatter plots, line plots, histograms, and box plots using R’s built-in functions.

Prerequisites: Basic understanding of R syntax and data structures (vectors, data frames).

Getting Started: The plot() Function (Scatter Plots)

The most versatile base R plotting function is plot(). By default, when given two numeric vectors, it creates a scatter plot.

Let’s generate some sample data:

Code
# Generate some sample data
set.seed(123) # for reproducibility
x <- 1:50
y <- x + rnorm(50, mean = 0, sd = 10) # y is x plus some noise
z <- x^2 - 2*x + rnorm(50, mean = 0, sd = 50) # another relationship
categories <- sample(c("A", "B", "C"), 50, replace = TRUE)

Scatter plot

Now, let’s create our first scatter plot:

Code
# Basic Scatter Plot
plot(x, y)

Adding Customization to plot():

Code
plot(x, y,
     main = "My First Scatter Plot", # Main title
     xlab = "X-axis Label",          # X-axis label
     ylab = "Y-axis Label",          # Y-axis label
     col = "blue",                   # Color of points
     pch = 19,                       # Plotting character (point type, 19 is a solid circle)
     cex = 1.2,                      # Character expansion (size of points)
     xlim = c(0, 60),                # X-axis limits
     ylim = c(-20, 70)               # Y-axis limits
     )

  • main: Sets the main title of the plot.
  • xlab, ylab: Set the labels for the x and y axes.
  • col: Controls the color of the plotting elements. You can use names ("red", "blue", "green"), hexadecimal codes ("#FF0000"), or integers that index the current color palette (see palette()).
  • pch: Determines the plotting character (the symbol used for points). There are 26 standard symbols, numbered 0 to 25. Run ?pch to see them all.
  • cex: Controls the size of the plotting characters.
  • xlim, ylim: Set the minimum and maximum values for the x and y axes.
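
Because col and pch accept vectors as well as single values, each point can be styled individually. The block below is a minimal sketch of that idea, not part of the original example: the names grp, point_cols, and point_pch are introduced here purely for illustration.

```r
# Recreate the sample data so this block runs on its own
# (the category draw may differ from the earlier one because z is skipped here)
set.seed(123)
x <- 1:50
y <- x + rnorm(50, mean = 0, sd = 10)
categories <- sample(c("A", "B", "C"), 50, replace = TRUE)

# Illustrative helpers (hypothetical names): map each category to a color and a symbol
grp <- factor(categories)
point_cols <- c("red", "#0000FF", "darkgreen")[as.integer(grp)]  # color name, hex code, color name
point_pch  <- c(15, 17, 19)[as.integer(grp)]                     # filled square, triangle, circle

plot(x, y,
     main = "Points Styled by Category",
     xlab = "X", ylab = "Y",
     col = point_cols,
     pch = point_pch)
```

Indexing a short vector of colors by the integer codes of a factor is one common way to build such per-point vectors; other approaches work equally well.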

Line Plots

To create a line plot, you use the same plot() function, but specify the type argument.

Code
# Line Plot
plot(x, z,
     type = "l",           # "l" for line
     main = "Line Plot Example",
     xlab = "Index",
     ylab = "Value",
     col = "darkgreen",
     lwd = 2               # Line width
     )

  • type:
    • "p": points (default)
    • "l": lines
    • "b": both points and lines, with small gaps left around each point
    • "o": both points and lines, with the lines drawn through the points (overplotted)
    • "h": histogram-like vertical lines
    • "s": stair steps
    • "n": no plotting (useful for setting up axes without drawing anything yet)
  • lwd: Line width.
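
As a rough illustration of type = "n", the sketch below sets up empty axes first and then adds layers with lines() and points(). It assumes the same sample data as above; the dashed trend line y = x is added here only as an example of a second layer.

```r
# Recreate the sample data so this block runs on its own
set.seed(123)
x <- 1:50
y <- x + rnorm(50, mean = 0, sd = 10)

# Draw the axes, labels, and title only
plot(x, y, type = "n",
     main = "Building a Plot in Layers",
     xlab = "X", ylab = "Y")

# Add the noise-free trend (y = x) first, then the observed points on top
lines(x, x, col = "gray40", lty = 2, lwd = 2)
points(x, y, pch = 19, col = "steelblue")
```

Drawing in layers like this makes the stacking order explicit: whatever is added last ends up on top.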

Histograms (hist())

Histograms are used to visualize the distribution of a single numeric variable.

Code
# Histogram of y
hist(y,
     main = "Distribution of Y",
     xlab = "Y Value",
     col = "lightblue",
     border = "darkblue" # Border color of bars
     )

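  • breaks: Controls the number of bins or the bin boundaries. Can be a single number (treated as a suggestion, since R rounds the boundaries to "pretty" values), a vector of boundaries, or the name of a rule as a character string ("Sturges", "FD", "Scott").

Code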
hist(y, breaks = 10, main = "Histogram with 10 Bins", xlab = "Y Value", col = "salmon")
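
To see which bins R actually chose, hist() can be called with plot = FALSE, which returns the computed histogram without drawing it. The following is a small sketch under the same sample-data assumptions; the variable names h and edges are illustrative.

```r
# Recreate the sample data so this block runs on its own
set.seed(123)
x <- 1:50
y <- x + rnorm(50, mean = 0, sd = 10)

# Compute the histogram without drawing it
h <- hist(y, breaks = "FD", plot = FALSE)
h$breaks  # bin boundaries chosen via the Freedman-Diaconis rule
h$counts  # number of observations in each bin

# Explicit boundaries give full control; here, bins of width 10 covering the data
edges <- seq(floor(min(y) / 10) * 10, ceiling(max(y) / 10) * 10, by = 10)
hist(y, breaks = edges,
     main = "Histogram with Explicit Bin Edges",
     xlab = "Y Value", col = "salmon")
```

When breaks is a vector, it must span the full range of the data, which is why the edges above are derived from min(y) and max(y).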

Box Plots (boxplot())

Box plots are excellent for visualizing the distribution of a numeric variable across different categories, showing median, quartiles, and outliers.

Code
# Box plot of y grouped by categories
boxplot(y ~ categories, # Formula: numeric_var ~ categorical_var
        main = "Y by Category",
        xlab = "Category",
        ylab = "Y Value",
        col = c("coral", "lightgreen", "skyblue") # Colors for each box
        )

  • The formula numeric_var ~ categorical_var is a common pattern in R for specifying relationships between variables, particularly for group-wise operations.
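
The same formula also works with columns of a data frame through the data argument. The sketch below assumes a data frame named df (a name introduced here for illustration) and also shows that boxplot() returns the summary statistics it draws.

```r
# Build a small data frame from the same kind of sample data
set.seed(123)
df <- data.frame(
  y = (1:50) + rnorm(50, mean = 0, sd = 10),
  categories = sample(c("A", "B", "C"), 50, replace = TRUE)
)

# Same formula as before, with the variables looked up in df
b <- boxplot(y ~ categories, data = df,
             main = "Y by Category (from a data frame)",
             xlab = "Category", ylab = "Y Value",
             col = c("coral", "lightgreen", "skyblue"))

b$names  # group labels, in plotting order
b$stats  # one column per group: lower whisker, lower hinge, median, upper hinge, upper whisker
```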

Bar Plots (barplot())

Bar plots are used to display the count or some other aggregated statistic for categorical variables. First, you usually need to count the occurrences of each category.

table(): Creates a frequency table (counts) for a categorical variable.
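
A minimal sketch of that two-step pattern, counting with table() and then passing the counts to barplot(), might look like this. The variable name counts is chosen here for illustration.

```r
# Recreate the categorical sample data so this block runs on its own
set.seed(123)
categories <- sample(c("A", "B", "C"), 50, replace = TRUE)

# Step 1: count the occurrences of each category
counts <- table(categories)
counts

# Step 2: draw one bar per category, with height equal to its count
barplot(counts,
        main = "Observations per Category",
        xlab = "Category", ylab = "Count",
        col = c("coral", "lightgreen", "skyblue"))
```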

Adding Elements to an Existing Plot

One of the key features of base R graphics is the ability to add elements to an existing plot.

Code
# Recreate the scatter plot
plot(x, y,
     main = "Scatter Plot with Added Elements",
     xlab = "X", ylab = "Y",
     pch = 16, col = "darkgray",
     cex = 1.2
     )

# Add a horizontal line
abline(h = mean(y), col = "red", lty = 2, lwd = 2) # Horizontal line at mean of y

# Add a vertical line
abline(v = median(x), col = "purple", lty = 3, lwd = 2) # Vertical line at median of x

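# Add text next to each reference line
# (pos = 1 places the label below the coordinates, pos = 4 to the right of them)
text(x = 45, y = mean(y), labels = "Mean Y", col = "red", pos = 1)
text(x = median(x), y = 0, labels = "Median X", col = "purple", pos = 4)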

# Add a legend identifying the points and the two reference lines
legend("topleft",       # Position of the legend
       legend = c("Data Points", "Mean Y", "Median X"), # Labels for legend items
       col = c("darkgray", "red", "purple"),
       pch = c(16, NA, NA), # NA for lines, number for points
       lty = c(NA, 2, 3),   # NA for points, number for line type
       lwd = c(NA, 2, 2)
       )

  • abline(h=...): Adds a horizontal line.
  • abline(v=...): Adds a vertical line.
  • abline(a, b): Adds a line with intercept a and slope b.
  • text(x, y, labels): Adds text at specific coordinates.
  • legend(): Adds a legend to the plot. Takes arguments for position, text labels, colors, point types (pch), line types (lty), and line widths (lwd).
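
The abline(a, b) form is listed above but not demonstrated. The sketch below draws two such lines on the scatter plot: the noise-free relationship y = x used to generate the data, and a least-squares line. Using lm() here is an assumption made only to obtain an intercept and slope; the names fit, intercept, and slope are illustrative.

```r
# Recreate the sample data so this block runs on its own
set.seed(123)
x <- 1:50
y <- x + rnorm(50, mean = 0, sd = 10)

# Estimate an intercept and slope from the data (illustrative use of lm())
fit <- lm(y ~ x)
intercept <- coef(fit)[["(Intercept)"]]
slope <- coef(fit)[["x"]]

plot(x, y,
     main = "Scatter Plot with Sloped Lines",
     xlab = "X", ylab = "Y",
     pch = 16, col = "darkgray")

abline(a = 0, b = 1, col = "black", lty = 2)             # noise-free relationship y = x
abline(a = intercept, b = slope, col = "blue", lwd = 2)  # fitted line
```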

Multiple Plots in One Window (par())

The par() function allows you to control many graphical parameters, including arranging multiple plots on a single display.

Code
# Set up a 2x2 grid for plots
par(mfrow = c(2, 2)) # mfrow = c(rows, columns)

# Plot 1: Scatter plot
plot(x, y, main = "X vs Y", xlab = "X", ylab = "Y", col = "blue", pch = 19)

# Plot 2: Histogram
hist(y, main = "Histogram of Y", xlab = "Y Value", col = "lightblue")

# Plot 3: Box plot
boxplot(y ~ categories, main = "Y by Category", xlab = "Category", ylab = "Y Value", col = c("coral", "lightgreen", "skyblue"))

# Plot 4: Line plot
plot(x, z, type = "l", main = "X vs Z (Line)", xlab = "X", ylab = "Z", col = "darkgreen", lwd = 2)

Code
# Reset graphical parameters to default (IMPORTANT!)
par(mfrow = c(1, 1))

  • par(mfrow = c(rows, columns)): Arranges plots in a grid that is filled row by row. par(mfcol = c(rows, columns)) fills the grid column by column instead.
  • par(mfrow = c(1, 1)): Always reset this after you’re done! Otherwise, subsequent plots will still be drawn into the grid.
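
An alternative to hard-coding the reset is to save the previous settings and restore them afterwards, since par() returns the old values of any parameters it changes. This is a sketch of that pattern; the name old_par is illustrative.

```r
# Recreate the sample data so this block runs on its own
set.seed(123)
x <- 1:50
y <- x + rnorm(50, mean = 0, sd = 10)

# par() invisibly returns the previous values of the parameters it changes
old_par <- par(mfrow = c(1, 2))

plot(x, y, main = "X vs Y", xlab = "X", ylab = "Y", col = "blue", pch = 19)
hist(y, main = "Histogram of Y", xlab = "Y Value", col = "lightblue")

# Restore whatever layout was active before
par(old_par)
```

Restoring from the saved value works even when the layout in effect beforehand was not 1 x 1.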

Saving Plots

You can save your R plots to various file formats.

Code
# Save as a PNG file
png("my_scatter_plot.png", width = 800, height = 600) # Open PNG device
plot(x, y, main = "Scatter Plot Saved as PNG", xlab = "X", ylab = "Y", col = "steelblue", pch = 16)
dev.off() # Close the device, saving the file

Code
# Save as a PDF file
pdf("my_box_plot.pdf", width = 7, height = 5) # Open PDF device (dimensions in inches)
boxplot(y ~ categories, main = "Box Plot Saved as PDF", xlab = "Category", ylab = "Y Value", col = "lightcoral")
dev.off() # Close the device, saving the file

  • Functions like png(), jpeg(), pdf(), tiff(), svg() open a graphics device.
  • All plotting commands executed after opening a device and before closing it with dev.off() will be drawn to that file.
  • dev.off() is crucial: the file is only finalized when the device is closed, so a missing dev.off() can leave it empty or incomplete.
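
The open, draw, close sequence can be checked end to end with a short sketch like the one below. It writes to a temporary directory, and the file name exploratory_plots.pdf is a hypothetical choice. With pdf(), each new high-level plot starts a new page by default, so both plots end up in one file.

```r
# Recreate the sample data so this block runs on its own
set.seed(123)
x <- 1:50
y <- x + rnorm(50, mean = 0, sd = 10)

# Hypothetical output path inside the session's temporary directory
out_file <- file.path(tempdir(), "exploratory_plots.pdf")

pdf(out_file, width = 7, height = 5)  # open the device (dimensions in inches)
plot(x, y, main = "Scatter Plot", xlab = "X", ylab = "Y",
     pch = 16, col = "steelblue")     # page 1
hist(y, main = "Histogram of Y", xlab = "Y Value",
     col = "lightblue")               # page 2
dev.off()                             # close the device to finish writing the file

file.exists(out_file)  # confirm that the file was written
```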

Summary and Conclusion

R’s base graphics system is powerful for quick, exploratory visualizations and offers extensive control over plot elements. While packages like ggplot2 provide a more consistent and often more aesthetically pleasing approach for complex, publication-ready graphics, understanding base R plotting is a fundamental skill that will enhance your overall R proficiency. Experiment with these functions and their arguments to get comfortable creating your own visualizations!

Resources