We develop methods to accurately predict whether pre-symptomatic individuals are at risk of a disease based on their marker profiles, which offers an opportunity for early intervention well before a definitive clinical diagnosis. We propose a local smoothing large margin classifier, implemented with a support vector machine (SVM), to construct effective age-dependent classification rules. The method adaptively adjusts for the age effect and tunes age and the other markers separately to achieve optimal performance. We derive the asymptotic risk bound of the local smoothing SVM and perform extensive simulation studies to compare it with standard approaches. We apply the proposed method to two studies of premanifest Huntington’s disease (HD) subjects and controls to construct age-sensitive predictive scores for the risk of HD and for the risk of receiving an HD diagnosis during the study period.
Let the dichotomous at-risk status be coded as 1 for subjects at risk of a disease and −1 for subjects not at risk. For each subject we also observe age and a vector of other potential risk-altering markers. The goal is to determine an age-dependent classification rule that uses the markers to predict the at-risk status (the target variable) at each age. For this purpose we first consider a composite predictive score, given as model (1). The fitting criterion involves a tuning parameter that depends on the sample size, a loss function that belongs to a class of margin-based loss functions, and a kernel function with its bandwidth. Examples of margin-based loss functions include the hinge loss, i.e. the SVM loss, which equals max(0, 1 − u) for a margin u. Specifically, we fit (1) by solving problem (2). If the slack variables are viewed as playing a role similar to residuals in a regression model, problem (2) can be thought of as minimizing a penalized, locally weighted “residual” subject to linear constraints. Using Lagrange multipliers, we can derive the corresponding dual form (3) at each age point, which is then extended to decision functions in a reproducing kernel Hilbert space (RKHS).
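The locally weighted criterion described above can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian smoothing kernel over age, the linear decision rule, the squared-norm penalty, and the names `gaussian_kernel_weights`, `local_hinge_objective`, `t0`, `h`, and `lam` are all assumptions made for illustration. The sketch only evaluates a penalized, kernel-weighted hinge objective at one target age; it does not solve the optimization problem.

```python
import math


def gaussian_kernel_weights(ages, t0, h):
    """Weight each subject by a Gaussian kernel in age centred at t0.

    Assumption: a Gaussian smoothing kernel with bandwidth h; the source
    does not say which kernel is used for the age smoothing.
    """
    norm = h * math.sqrt(2.0 * math.pi)
    return [math.exp(-0.5 * ((t - t0) / h) ** 2) / norm for t in ages]


def local_hinge_objective(beta, intercept, X, y, weights, lam):
    """Kernel-weighted mean hinge loss plus a squared-norm penalty.

    X is a list of marker vectors, y holds labels coded 1 / -1, and
    weights are the age-kernel weights for one target age.
    The linear rule and the penalty form are illustrative assumptions.
    """
    n = len(y)
    total = 0.0
    for xi, yi, wi in zip(X, y, weights):
        score = intercept + sum(b * v for b, v in zip(beta, xi))
        total += wi * max(0.0, 1.0 - yi * score)  # hinge loss on the margin
    penalty = lam * sum(b * b for b in beta)
    return total / n + penalty
```

Subjects whose age is close to the target age `t0` receive larger weights, so the fitted rule at `t0` is driven mainly by subjects of similar age, and repeating the fit over a grid of target ages gives an age-dependent rule.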
This locally weighted problem is solved in the dual space by replacing the inner product in (3) with a kernel function, which allows a fully non-parametric boundary. The tuning parameters include the bandwidth and λ.
The asymptotic analysis relies on regularity conditions (C.1) to (C.4). Condition (C.1) is a smoothness condition: the relevant functions are twice-continuously differentiable, and the density is bounded away from zero. Condition (C.2) concerns a conditional distribution and involves the dimension of the markers. The remaining conditions require the tuning parameters to tend to zero and involve the Tsybakov noise exponent. In condition (C.4), as indicated in the proof and also in Steinwart and Scovel (2007), the choice of σ is optimal in terms of approximating the Bayesian error bound using the decision function for the reproducing kernel Hilbert space.
Our main theoretical result is Theorem 1, which gives an asymptotic risk bound for the local smoothing SVM; the bound includes a term that is assumed to vanish as the sample size goes to infinity. Note that the rate of the risk bound is characterized through the geometric noise exponent α, the local kernel smoothing bandwidth, and the tuning parameter for the SVM. In addition, we obtain the supremum norm risk bound over the support of age. The proof of Theorem 1 uses the embedding properties of the reproducing kernel Hilbert space, the large deviation results of empirical processes, and the approximation using the kernel function. The proof also involves a marginal density, and the derived rate is expressed through a constant γ that depends on α.
In the simulation studies, the sample sizes considered included 500 and 1000. For each setting we carried out 200 simulation runs. The standardized ages were generated from a uniform distribution with support (0, 1). In the first set of experiments we simulated data retrospectively: we generated the dichotomous outcomes first and then the markers. We compared three methods: (1) using the markers (but not age) as input variables to train a standard SVM (SVM0); (2) using age and the markers as input variables to train a standard SVM (SVM1); and (3) the proposed local smoothing SVM (KSVM). Within each of these methods we compared a linear kernel for the input variables with a Gaussian kernel.
To evaluate the performance of the different approaches, we recorded the misclassification rate and the area under the receiver operating characteristic (ROC) curve (AUC) at each age point, and computed an overall AUC and a mean misclassification rate by pooling data across all age points. For KSVM the bandwidth and the tuning parameter λ were chosen separately by 5-fold cross validation. For the multiple marker case both markers were included. Results were averaged across simulation repetitions. When the true coefficient function is a sine function, KSVM dominates the alternatives over the entire range. The two subfigures of Figure 1 show a high misclassification rate and a low AUC at the points 0.15 and 0.85.
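The two evaluation metrics can be computed as in the following minimal sketch. It is an illustration under stated assumptions rather than the authors' evaluation code: the function names are hypothetical, the AUC is computed by the pairwise (Mann-Whitney) definition with ties counted as one half, predicted labels are taken as the sign of the score, and "each age point" is approximated by equal-width bins of standardized age.

```python
def misclassification_rate(y_true, y_pred):
    """Fraction of subjects whose predicted label differs from the truth."""
    wrong = sum(1 for a, b in zip(y_true, y_pred) if a != b)
    return wrong / len(y_true)


def auc(y_true, scores):
    """AUC as the proportion of (case, control) pairs ranked correctly.

    Labels are coded 1 / -1; tied scores count as one half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == -1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one subject in each class")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def metrics_by_age_bin(ages, y_true, scores, n_bins=2):
    """Misclassification rate and AUC within equal-width age bins.

    Assumption: ages are standardized to (0, 1) and binning stands in
    for evaluation at individual age points.
    """
    bins = {}
    for t, y, s in zip(ages, y_true, scores):
        k = min(int(t * n_bins), n_bins - 1)
        bins.setdefault(k, []).append((y, s))
    out = {}
    for k, rows in sorted(bins.items()):
        ys = [r[0] for r in rows]
        ss = [r[1] for r in rows]
        preds = [1 if s >= 0 else -1 for s in ss]  # sign of the score
        out[k] = (misclassification_rate(ys, preds), auc(ys, ss))
    return out
```

The pooled metrics correspond to calling `misclassification_rate` and `auc` on all subjects at once, while `metrics_by_age_bin` gives the age-specific view that reveals where a classifier that ignores age performs poorly.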