r - smooth.spline(): fitted model does not match user-specified degree of freedom -
here code ran
fun <- function(x) {1 + 3*sin(4*pi*x-pi)} set.seed(1) num.samples <- 1000 x <- runif(num.samples) y <- fun(x) + rnorm(num.samples) * 1.5 fit <- smooth.spline(x, y, all.knots=true, df=3)
despite df=3
, when checked fitted model, output was
call: smooth.spline(x = x, y = y, df = 3, all.knots = true) smoothing parameter spar= 1.499954 lambda= 0.002508571 (26 iterations) equivalent degrees of freedom (df): 9.86422
could please help? thanks!
note r-3.4.0 (2017-04-21), smooth.spline
can accept direct specification of λ
newly added argument lambda
. still converted internal 1 spar
during estimation. following answer not affected.
smoothing parameter λ
/ spar
lies in centre of smoothness control
smoothness controlled smoothing parameter λ
.smooth.spline()
uses internal smoothing parameter spar
rather λ
:
spar = s0 + 0.0601 * log(λ)
such logarithm transform necessary in order unconstrained minimization, gcv/cv. user can specify spar
indirectly specify λ
. when spar
grows linearly, λ
grow exponentially. there need using large spar
value.
the degree of freedom df
, defined in terms of λ
:
where x
model matrix b-spline basis , s
penalty matrix.
you can have check on relationships dataset:
spar <- seq(1, 2.5, = 0.1) <- sapply(spar, function (spar_i) unlist(smooth.spline(x, y, all.knots=true, spar = spar_i)[c("df","lambda")]))
let's sketch df ~ spar
, λ ~ spar
, log(λ) ~ spar
:
par(mfrow = c(1,3)) plot(spar, a[1, ], type = "b", main = "df ~ spar", xlab = "spar", ylab = "df") plot(spar, a[2, ], type = "b", main = "lambda ~ spar", xlab = "spar", ylab = "lambda") plot(spar, log(a[2,]), type = "b", main = "log(lambda) ~ spar", xlab = "spar", ylab = "log(lambda)")
note radical growth of λ
spar
, linear relationship between log(λ)
, spar
, , relatively smooth relationship between df
, spar
.
smooth.spline()
fitting iterations spar
if manually specify value of spar
, did in sapply()
, no fitting iterations done selecting spar
; otherwise smooth.spline()
needs iterate through number of spar
values. if we
- specify
cv = true / false
, fitting iterations aims minimize cv/gcv score; - specify
df = mydf
, fitting iterations aims minimize(df(spar) - mydf) ^ 2
.
minimizing gcv easy follow. don't care gcv score, care corresponding spar
. on contrary, when minimizing (df(spar) - mydf)^2
, care df
value @ end of iteration rather spar
! bearing in mind minimization problem, never guaranteed final df
matches our target value mydf
.
why put df = 3
, df = 9.864?
the end of iteration, either implies hitting minimum, or reaching searching boundary, or reaching maximum number of iterations.
we far maximum iterations limit (default 500); yet not hit minimum. well, might reach boundary.
do not focus on df
, think spar
.
smooth.spline(x, y, all.knots=true, df=3)$spar # 1.4999
according ?smooth.spline
, default, smooth.spline()
searches spar
between [-1.5, 1.5]
. i.e., when put df = 3
, minimization terminates @ searching boundary, rather hitting df = 3
.
have @ our graph of relationship between df
, spar
, again. figure, looks need spar
value near 2 in order result in df = 3
.
let's use control.spar
argument:
fit <- smooth.spline(x, y, all.knots=true, df=3, control.spar = list(high = 2.5)) # smoothing parameter spar= 1.859066 lambda= 0.9855336 (14 iterations) # equivalent degrees of freedom (df): 3.000305
now see, end df = 3
. , need spar = 1.86
.
a better suggestion: not use all.knots = true
look, have 1000 data. all.knots = true
use 1000 parameters. wishing end df = 3
implies 997 out of 1000 parameters suppressed. imagine how large λ
hence spar
need!
try using penalized regression spline instead. suppressing 200 parameters 3 easier:
fit <- smooth.spline(x, y, nknots = 200, df=3) ## using 200 knots # smoothing parameter spar= 1.317883 lambda= 0.9853648 (16 iterations) # equivalent degrees of freedom (df): 3.000386
now, end df = 3
without spar
control.
Comments
Post a Comment