r - smooth.spline(): fitted model does not match user-specified degree of freedom -


here code ran

fun <- function(x) {1 + 3*sin(4*pi*x-pi)} set.seed(1) num.samples <- 1000 x <- runif(num.samples) y <- fun(x) + rnorm(num.samples) * 1.5 fit <- smooth.spline(x, y, all.knots=true, df=3) 

despite df=3, when checked fitted model, output was

call: smooth.spline(x = x, y = y, df = 3, all.knots = true) smoothing parameter  spar= 1.499954  lambda= 0.002508571 (26 iterations) equivalent degrees of freedom (df): 9.86422 

could please help? thanks!

note r-3.4.0 (2017-04-21), smooth.spline can accept direct specification of λ newly added argument lambda. still converted internal 1 spar during estimation. following answer not affected.


smoothing parameter λ / spar lies in centre of smoothness control

smoothness controlled smoothing parameter λ.smooth.spline() uses internal smoothing parameter spar rather λ:

spar = s0 + 0.0601 * log(λ) 

such logarithm transform necessary in order unconstrained minimization, gcv/cv. user can specify spar indirectly specify λ. when spar grows linearly, λ grow exponentially. there need using large spar value.

the degree of freedom df, defined in terms of λ:

edf

where x model matrix b-spline basis , s penalty matrix.

you can have check on relationships dataset:

spar <- seq(1, 2.5, = 0.1) <- sapply(spar, function (spar_i) unlist(smooth.spline(x, y, all.knots=true, spar = spar_i)[c("df","lambda")])) 

let's sketch df ~ spar, λ ~ spar , log(λ) ~ spar:

par(mfrow = c(1,3)) plot(spar, a[1, ], type = "b", main = "df ~ spar",      xlab = "spar", ylab = "df") plot(spar, a[2, ], type = "b", main = "lambda ~ spar",      xlab = "spar", ylab = "lambda") plot(spar, log(a[2,]), type = "b", main = "log(lambda) ~ spar",      xlab = "spar", ylab = "log(lambda)") 

plot

note radical growth of λ spar, linear relationship between log(λ) , spar, , relatively smooth relationship between df , spar.


smooth.spline() fitting iterations spar

if manually specify value of spar, did in sapply(), no fitting iterations done selecting spar; otherwise smooth.spline() needs iterate through number of spar values. if we

  • specify cv = true / false, fitting iterations aims minimize cv/gcv score;
  • specify df = mydf, fitting iterations aims minimize (df(spar) - mydf) ^ 2.

minimizing gcv easy follow. don't care gcv score, care corresponding spar. on contrary, when minimizing (df(spar) - mydf)^2, care df value @ end of iteration rather spar! bearing in mind minimization problem, never guaranteed final df matches our target value mydf.


why put df = 3, df = 9.864?

the end of iteration, either implies hitting minimum, or reaching searching boundary, or reaching maximum number of iterations.

we far maximum iterations limit (default 500); yet not hit minimum. well, might reach boundary.

do not focus on df, think spar.

smooth.spline(x, y, all.knots=true, df=3)$spar   # 1.4999 

according ?smooth.spline, default, smooth.spline() searches spar between [-1.5, 1.5]. i.e., when put df = 3, minimization terminates @ searching boundary, rather hitting df = 3.

have @ our graph of relationship between df , spar, again. figure, looks need spar value near 2 in order result in df = 3.

let's use control.spar argument:

fit <- smooth.spline(x, y, all.knots=true, df=3, control.spar = list(high = 2.5)) # smoothing parameter  spar= 1.859066  lambda= 0.9855336 (14 iterations) # equivalent degrees of freedom (df): 3.000305 

now see, end df = 3. , need spar = 1.86.


a better suggestion: not use all.knots = true

look, have 1000 data. all.knots = true use 1000 parameters. wishing end df = 3 implies 997 out of 1000 parameters suppressed. imagine how large λ hence spar need!

try using penalized regression spline instead. suppressing 200 parameters 3 easier:

fit <- smooth.spline(x, y, nknots = 200, df=3)  ## using 200 knots # smoothing parameter  spar= 1.317883  lambda= 0.9853648 (16 iterations) # equivalent degrees of freedom (df): 3.000386 

now, end df = 3 without spar control.


Comments

Popular posts from this blog

Django REST Framework perform_create: You cannot call `.save()` after accessing `serializer.data` -

Why does Go error when trying to marshal this JSON? -