# Plot_ss in R. Smoothing splines and polynomial regression plots

Plot your Smoothing Splines regression easily with R! From base stats to ggplot2 geom_smooth(). We show you how to deal with it!

# What is plot_ss and Smoothing Splines? Work with them in R

Smoothing splines are a method used in statistics and data analysis to create a smooth curve through a set of data points. They are particularly useful in situations where you have noisy data and want to fit a curve that captures the underlying trend without overfitting to the random noise in the data.

Here are some key points about smoothing splines:

• Purpose: They are used for smoothing data, which means reducing noise and making the underlying pattern in the data more apparent.

• Mathematical Foundation: A smoothing spline is a type of spline, which is a piecewise polynomial function. In simple terms, it’s a series of connected polynomial segments that create a smooth curve.

• Flexibility: One of the advantages of smoothing splines is their flexibility. They can fit a wide range of data shapes because the curve is not restricted to a specific form like a straight line or a quadratic curve.

## Plot_ss in R using base R

Recently, we dove into teaching the world of smoothing splines in R, and guess what? It’s simpler than you might think, especially with base R functions. If you’re looking to create elegant, smooth curves through your data, `smooth.spline()` is your new best friend.

```r
n <- 200
x <- seq(0, 1, length.out = n)
fx <- sin(2 * pi * x)

# generate noisy data
set.seed(1)
y <- fx + rnorm(n, sd = 0.1)
```

Use the base `smooth.spline()` with its default settings, so the smoothing parameter is chosen by generalized cross-validation (GCV):

```r
ss <- smooth.spline(x, y)
ss
```

```
## Call:
## smooth.spline(x = x, y = y)
##
## Smoothing Parameter  spar= 0.7888247  lambda= 0.0007344578 (12 iterations)
## Equivalent Degrees of Freedom (Df): 9.104315
## GCV: 0.008887293
```

The data above come from a sinusoidal function with some added noise. As we all know, this is an easy case where a regression with only linear terms fails to fit the data. In the next plot we draw the true function, an `lm` regression line and the `smooth.spline` fit on top of the data.

```r
# plot data and f(x)
plot(x, y)                                # data
lines(x, fx, lwd = 2)                     # f(x)
abline(lm(y ~ x), lty = 2, col = 2)       # linear regression
lines(x, ss$y, lty = 3, col = 3, lwd = 2) # smoothing spline fit
legend("topright", legend = c("f(x)", "lm", "smooth.spline"),
       lty = 1:3, col = 1:3, lwd = 2, bty = "n")
```
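Once the spline is fitted, it can also be evaluated at points that were not in the data. The following is a minimal sketch using `predict()` on the fitted object; the evaluation points in `x_new` are arbitrary values picked for illustration, and the data are regenerated so the snippet runs on its own.

```r
# Same simulated data as above
set.seed(1)
n <- 200
x <- seq(0, 1, length.out = n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.1)

ss <- smooth.spline(x, y)

# Arbitrary new points chosen for illustration
x_new <- c(0.25, 0.5, 0.75)

# predict() on a smooth.spline fit returns a list with components x and y
pred <- predict(ss, x_new)
pred$y

# deriv = 1 asks for the first derivative of the fitted curve
slope <- predict(ss, x_new, deriv = 1)
slope$y
```

Since the true function is `sin(2 * pi * x)`, the fitted values should sit close to 1, 0 and -1, and the slope at 0.5 should be negative.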

## Ggplot geom_smooth for plot ss (smoothing splines) in R

In the modern R ecosystem, ggplot2 is widely used, even by beginners. Its `geom_smooth()` layer fits a smoother to the plotted data.

We can use `geom_smooth()` for a flexible smooth curve or for a linear `lm` regression. Note that with fewer than 1,000 observations its default method is `loess`, a local regression smoother rather than a smoothing spline, although the resulting curve plays the same role in the plot. But wait, there’s more! Polynomial regression with `geom_smooth()` is where things get really interesting. It’s like adding swirls and curls to your path, allowing for bends and turns. This is handy when your data’s story is more complex, and a straight line just won’t do. You can add higher-order terms while keeping the model linear in its coefficients, which is a fancy way of saying you can make your line wiggle and waggle in just the right way to fit the ups and downs of your data.

Let’s check an easy code example:

```r
library(ggplot2)
df <- data.frame(x = x, y = y)

ggplot(df, aes(x, y)) +
  geom_point() +
  # default smoother (loess for fewer than 1,000 observations)
  geom_smooth() +
  # straight-line fit
  geom_smooth(method = "lm", color = "yellow") +
  # cubic polynomial fit
  stat_smooth(method = "lm", formula = y ~ poly(x, 3), color = "green") +
  labs(title = "Smooth curve, linear and polynomial regression fits")
```
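If you want the ggplot2 figure to show the exact `smooth.spline()` fit from the base R section, one option is to compute the fit first and draw it as an ordinary line layer. This is a sketch under that assumption, not the only way to do it; the name `fit_df` is made up for the example, and the data are regenerated so the snippet runs on its own.

```r
library(ggplot2)

# Same simulated data as in the base R section
set.seed(1)
n <- 200
x <- seq(0, 1, length.out = n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.1)
df <- data.frame(x = x, y = y)

# Fit the smoothing spline with base R
ss <- smooth.spline(df$x, df$y)

# ss$x holds the sorted unique x values, ss$y the fitted values
fit_df <- data.frame(x = ss$x, y = ss$y)

# Draw the data and overlay the fitted curve as a line layer
p <- ggplot(df, aes(x, y)) +
  geom_point(alpha = 0.5) +
  geom_line(data = fit_df, color = "red") +
  labs(title = "smooth.spline() fit drawn with ggplot2")
p
```

Computing the fit outside ggplot2 keeps full control over `smooth.spline()` arguments such as `df`, `spar` or `lambda`.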

## Math behind Smoothing Splines

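A smoothing spline is the function $f$ that minimizes a penalized residual sum of squares:

$$\sum_{i=1}^{n} \left( y_i - f(x_i) \right)^2 + \lambda \int f''(t)^2 \, dt$$

Here, $y_i$ are the data points, $f(x_i)$ is the value of the spline at point $x_i$ and $f''(t)$ is the second derivative of the spline. The first term rewards closeness to the data, while the second penalizes curvature.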

Smoothing Parameter (λ):

There’s a parameter, often denoted as λ, that controls the trade-off between smoothness and data fitting. High λ values give more weight to smoothness, leading to a smoother curve that might not fit the data as closely. Low λ values do the opposite, fitting the data more closely but potentially resulting in a wigglier curve.
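The effect of λ can be checked directly, because `smooth.spline()` accepts a `lambda` argument. The sketch below fits the same noisy sine data with a very small and a very large penalty. Both values are arbitrary choices for illustration, and `lambda` here is on the function's own internal scale, so the numbers should not be read as universal.

```r
# Same simulated data as in the base R section
set.seed(1)
n <- 200
x <- seq(0, 1, length.out = n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.1)

# Small penalty: the curve follows the data closely
fit_rough <- smooth.spline(x, y, lambda = 1e-8)

# Large penalty: the curve is pushed towards a straight line
fit_smooth <- smooth.spline(x, y, lambda = 1)

# Equivalent degrees of freedom shrink as lambda grows
c(rough = fit_rough$df, smooth = fit_smooth$df)

# Compare both fits on top of the data
plot(x, y, col = "grey")
lines(fit_rough, col = 2, lwd = 2)
lines(fit_smooth, col = 4, lwd = 2)
legend("topright", legend = c("lambda = 1e-8", "lambda = 1"),
       col = c(2, 4), lwd = 2, bty = "n")
```

In practice you rarely set `lambda` by hand: leaving it out lets `smooth.spline()` pick the smoothing parameter by generalized cross-validation, as in the first fit of this post.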

##### Carlos Vecina
###### Senior Data Scientist at Jobandtalent
