Using 100 KDE features and all the categorical variables, I end up with a dataset that's 840x6578, so I'm inclined to do ridge regression. I tried to implement it in Stan, but it's taking forever to sample the 6578 parameters, I think because there's so much correlation among the covariates. One trick that speeds things up greatly is to do a QR decomposition in R (see stan-dev/rstanarm#30) and then learn 840 parameters based on an orthogonal design matrix. This works great, but I'm not sure how to recover the original 6578 parameters. In any case, this might all be moot: even with the 6578 parameters in hand, I don't really know how to read off something like effect sizes or percent of variance explained. So now I'm thinking I should just do good old-fashioned forward stepwise regression. But even that is slow in Stan--at the moment I've got a matrix that's 840x100 because I want to use the 100 KDE features for the age variable.
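For the recovery step, when n >= p the QR trick inverts cleanly: if X = QR (thin QR) and you fit theta on Q, then beta = R^{-1} theta. A minimal NumPy sketch with made-up dimensions (not the actual 840x6578 data -- and note that with p > n, R is 840x6578 and not invertible, so the map back to the original coefficients is underdetermined, which may be exactly the obstruction here):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 5                       # illustrative sizes; the inversion needs n >= p
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Thin QR: X = Q R with Q (n x p) orthonormal columns, R (p x p) upper triangular.
Q, R = np.linalg.qr(X)

# Fit on the orthogonal design: theta minimizes ||y - Q theta||^2,
# which is just theta = Q^T y since Q's columns are orthonormal.
theta = Q.T @ y

# Recover original-scale coefficients: X beta = Q theta  =>  beta = R^{-1} theta.
beta = np.linalg.solve(R, theta)

# Sanity check: identical fit to ordinary least squares on X.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
assert np.allclose(beta, beta_ols)
```

In the p > n regime one would have to pick a particular solution (e.g. minimum norm), so the "recovered" betas wouldn't be unique the way they are here.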
So should I forget Stan? Or do something else? Interestingly, the GP regression perspective on this is efficient in Stan: with a linear kernel, the covariance K is only 840x840 and you just sample the observations:
y ~ N(0, K + \sigma^2 I)
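The payoff is that the marginalized form only ever touches an n x n matrix, so the cost is O(n^3) no matter how many features go into the kernel. A hedged NumPy sketch (toy sizes, not the real data) evaluating log p(y) under the linear-kernel model via a Cholesky factor:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
n, p = 40, 200                     # illustrative: p >> n, as in the 840 x 6578 case
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
sigma = 0.5                        # noise standard deviation

# Linear kernel: K = X X^T is n x n, so p never enters the expensive algebra again.
K = X @ X.T
C = K + sigma**2 * np.eye(n)

# log N(y | 0, C) via Cholesky, the numerically stable route.
L = np.linalg.cholesky(C)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # alpha = C^{-1} y
logdet = 2.0 * np.sum(np.log(np.diag(L)))
loglik = -0.5 * (y @ alpha + logdet + n * np.log(2.0 * np.pi))

# Agrees with a direct dense evaluation of the Gaussian density.
assert np.isclose(loglik, multivariate_normal(mean=np.zeros(n), cov=C).logpdf(y))
```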
But again this doesn't give us a clear interpretation of which variables are important. So maybe I should really just do group lasso? I guess there are some group lasso packages in R to try (e.g. grpreg or gglasso)...
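For intuition on why group lasso fits this problem (each KDE feature bank or categorical variable forms a natural group, and the penalty zeroes whole groups at once), here is a bare-bones proximal-gradient sketch in NumPy -- a toy illustration of the algorithm, not a substitute for the R packages, and the group layout is made up:

```python
import numpy as np

def group_soft_threshold(b, lam):
    """Proximal operator of lam * ||b||_2 (block soft-thresholding)."""
    norm = np.linalg.norm(b)
    if norm <= lam:
        return np.zeros_like(b)    # the whole group is zeroed at once
    return (1.0 - lam / norm) * b

def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient for 0.5*||y - X b||^2 + lam * sum_g ||b_g||_2."""
    beta = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(X, 2) ** 2     # 1 / Lipschitz const of the gradient
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y))
        for g in groups:                        # groups: list of index arrays
            beta[g] = group_soft_threshold(z[g], step * lam)
    return beta

rng = np.random.default_rng(2)
n, p = 100, 12
groups = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:4] = [2.0, -1.5, 1.0, 0.5]          # only the first group is active
y = X @ beta_true + 0.1 * rng.normal(size=n)

beta_hat = group_lasso(X, y, groups, lam=20.0)
# The two inactive groups come back exactly zero; the active group survives.
```

The selection question then reads off directly: a variable matters if its group's coefficient block is nonzero.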