Plot_ss in R. Smoothing splines and polynomial regression plots

Plot your Smoothing Splines regression easily with R! From base stats to ggplot2 geom_smooth(). We show you how to deal with it!

R ggplot2 plot of the lm and smoothing splines with geom_smooth().

What are plot_ss and Smoothing Splines? Work with them in R

Smoothing splines are a method used in statistics and data analysis to create a smooth curve through a set of data points. They are particularly useful in situations where you have noisy data and want to fit a curve that captures the underlying trend without overfitting to the random noise in the data.

Here are some key points about smoothing splines:

  • Purpose: They are used for smoothing data, which means reducing noise and making the underlying pattern in the data more apparent.

  • Mathematical Foundation: A smoothing spline is a type of spline, which is a piecewise polynomial function. In simple terms, it’s a series of connected polynomial segments that create a smooth curve.

  • Flexibility: One of the advantages of smoothing splines is their flexibility. They can fit a wide range of data shapes because the curve is not restricted to a specific form like a straight line or a quadratic curve.

Plot_ss in R using base R

Recently, we dove into teaching the world of smoothing splines in R, and guess what? It’s simpler than you might think, especially with base R functions. If you’re looking to create elegant, smooth curves through your data, smooth.spline() is your new best friend.

n <- 200
x <- seq(0, 1, length.out = n)
fx <- sin(2 * pi * x)

# generate noisy data
y <- fx + rnorm(n, sd = .1)

Use the base smooth.spline() without any restriction on the knots:

ss <- smooth.spline(x, y)
ss  # print the fitted object
## Call:
## smooth.spline(x = x, y = y)
## Smoothing Parameter  spar= 0.7888247  lambda= 0.0007344578 (12 iterations)
## Equivalent Degrees of Freedom (Df): 9.104315
## Penalized Criterion (RSS): 1.619316
## GCV: 0.008887293
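
The call above leaves the choice of knots and smoothing parameter to smooth.spline(). If you want to restrict them yourself, a minimal sketch could use the df and nknots arguments. The seed and the values 4 and 20 are arbitrary choices for illustration, and the data are regenerated so the block runs on its own.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible
n <- 200
x <- seq(0, 1, length.out = n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.1)

# default: knots and smoothing parameter chosen automatically (GCV)
ss_auto <- smooth.spline(x, y)

# ask for about 4 equivalent degrees of freedom (a stiffer curve)
ss_df4 <- smooth.spline(x, y, df = 4)

# restrict the basis to 20 knots instead of the default knot set
ss_k20 <- smooth.spline(x, y, nknots = 20)

# compare the equivalent degrees of freedom of the three fits
c(auto = ss_auto$df, df4 = ss_df4$df, k20 = ss_k20$df)
```

A lower df gives a smoother, less flexible curve; fewer knots limit how much detail the spline can represent at all.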

The data above come from a sinusoidal function with some added noise. This is an easy case where a regression with only linear terms fails to fit the data. In the next plot we draw an lm regression line and a smooth.spline fit on top of that data.

# plot data and f(x)
plot(x, y)                                     # noisy data
lines(x, fx, lwd = 2)                          # true f(x)
abline(lm(y ~ x), lty = 2, col = 2, lwd = 2)   # linear regression line
lines(ss$x, ss$y, lty = 3, col = 3, lwd = 2)   # smoothing spline fit
legend("topright", legend = c("f(x)", "lm", "smooth.spline"),
       lty = 1:3, col = 1:3, lwd = 2, bty = "n")
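
Beyond plotting the fitted values, the spline can also be evaluated at new x values. A minimal sketch, assuming the same kind of simulated data (regenerated here with an arbitrary seed so the block runs on its own), uses predict() on the smooth.spline object; its deriv argument returns derivatives of the fitted curve.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible
n <- 200
x <- seq(0, 1, length.out = n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.1)
ss <- smooth.spline(x, y)

# evaluate the fitted spline at new points; the result is a list with $x and $y
x_new <- c(0.25, 0.5, 0.75)
pred <- predict(ss, x_new)
data.frame(x = pred$x, fitted = pred$y, truth = sin(2 * pi * x_new))

# first derivative of the fitted curve at the same points
slope <- predict(ss, x_new, deriv = 1)
slope$y
```

The fitted values should sit close to the true sine curve, and the slope at x = 0.5 should be negative, since the sine is decreasing there.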

Ggplot geom_smooth for plot ss (smoothing splines) in R

In the modern R ecosystem (as of 2024), ggplot2 is widely used, even by beginners. Its geom_smooth() layer fits a smoother to the plotted data and draws it on the plot.

We can use geom_smooth() for smooth curves and for linear lm regression. Called without arguments, it picks the smoother for you: loess for fewer than 1,000 observations and a spline-based mgcv::gam() fit otherwise, so on our 200 points the default curve is a loess fit that plays the same role as the smoothing spline. But wait, there’s more! Polynomial regression with geom_smooth() is where things get really interesting. It’s like adding swirls and curls to your path, allowing for bends and turns. This is handy when your data’s story is more complex, and a straight line just won’t do. You can add higher-order terms while keeping the coefficients linear, which is a fancy way of saying you can make your line wiggle and waggle in just the right way to fit the ups and downs of your data.
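
To make "higher-order terms while keeping the coefficients linear" concrete, here is a small sketch outside ggplot2. It assumes the same kind of simulated sine data (regenerated with an arbitrary seed) and compares a straight-line lm() fit with a cubic one built with poly(). Both are ordinary linear models; only the columns of the design matrix differ.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible
n <- 200
x <- seq(0, 1, length.out = n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.1)

fit_line  <- lm(y ~ x)            # straight line: 2 coefficients
fit_cubic <- lm(y ~ poly(x, 3))   # cubic polynomial: 4 coefficients, still linear in them

# number of coefficients estimated by least squares in the cubic model
length(coef(fit_cubic))

# share of variance explained by each fit
c(line = summary(fit_line)$r.squared, cubic = summary(fit_cubic)$r.squared)
```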

Let’s check an easy code example:

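library(ggplot2)

df <- data.frame(x = x, y = y)
# each `+` must end a line, otherwise R treats the next line as a new expression
ggplot(df, aes(x, y)) +
  geom_point() +
  geom_smooth() +                                 # default smoother (loess for n < 1000)
  geom_smooth(method = "lm", color = "yellow") +  # straight-line fit
  stat_smooth(method = "lm", formula = y ~ poly(x, 3), color = "green") +  # cubic polynomial
  labs(title = "Plot Smoothing Splines method and Polynomial regression", linetype = NULL)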
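
If you want the curve on the ggplot to be exactly the smooth.spline() fit from the base R section, one option is to compute the fitted values yourself and draw them with geom_line(). This is a sketch under assumptions: the data are regenerated with an arbitrary seed, and the names grid, df_fit and p are made up for the example.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the sketch is reproducible
n <- 200
x <- seq(0, 1, length.out = n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.1)
df <- data.frame(x = x, y = y)

# fit the smoothing spline with base R and evaluate it on a fine grid
ss <- smooth.spline(df$x, df$y)
grid <- seq(0, 1, length.out = 400)
df_fit <- data.frame(x = grid, y = predict(ss, grid)$y)

# draw the points, then the spline as an ordinary line layer
p <- ggplot(df, aes(x, y)) +
  geom_point(alpha = 0.5) +
  geom_line(data = df_fit, color = "blue") +
  labs(title = "smooth.spline() fit drawn with geom_line()")
p
```

Fitting outside ggplot2 keeps full control over df, lambda and the knots, at the cost of not getting the confidence band that geom_smooth() draws by default.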

Math behind Smoothing Splines

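A smoothing spline is the function f that minimizes a penalized residual sum of squares:

$$\sum_{i=1}^{n} \left( y_i - f(x_i) \right)^2 + \lambda \int f''(t)^2 \, dt$$

Here, y_i are the data points, f(x_i) is the value of the spline at point x_i, and f''(t) is the second derivative of the spline. The first term rewards closeness to the data, and the second term penalizes curvature.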

Smoothing Parameter (λ):

There’s a parameter, often denoted as λ, that controls the trade-off between smoothness and data fitting. High λ values give more weight to smoothness, leading to a smoother curve that might not fit the data as closely. Low λ values do the opposite, fitting the data more closely but potentially resulting in a less smooth curve.
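
The trade-off can be seen by fixing λ by hand. In this sketch the two lambda values are arbitrary and the data are regenerated with an arbitrary seed. The smooth.spline() documentation describes its lambda as design-dependent, so treat the specific numbers as illustrative only.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible
n <- 200
x <- seq(0, 1, length.out = n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.1)

fit_low  <- smooth.spline(x, y, lambda = 1e-8)  # small penalty: follows the data closely
fit_high <- smooth.spline(x, y, lambda = 1)     # large penalty: much stiffer curve

# residual sum of squares of a fit on the training data
rss <- function(fit) sum((y - predict(fit, x)$y)^2)

data.frame(
  lambda = c(fit_low$lambda, fit_high$lambda),
  df     = c(fit_low$df, fit_high$df),
  rss    = c(rss(fit_low), rss(fit_high))
)
```

The low-λ fit uses more equivalent degrees of freedom and has a smaller residual sum of squares, while the high-λ fit is smoother and further from the data.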

Carlos Vecina
Senior Data Scientist at Jobandtalent | AI & Data Science for Business