The study of gene-environment interaction (G × E) has garnered widespread attention. However, when a three-category polymorphic genotype is used, as is commonly done when modeling an additive effect, both false positive and false negative results can occur, and the nature of the interaction can be misrepresented. We present a reparameterized regression equation that accurately captures interaction effects without the constraints imposed by modeling interactions using a single cross-product term. In addition, we provide a series of recommendations for making conclusions about the presence of meaningful G × E interactions, which take into account the nature of the observed interactions and whether they map onto sensible genotypic models.

Let $G$ and $E$ represent the genotypic and environmental variables, respectively, and let $Y$ denote the phenotype. The traditional regression equation with an interaction term has the form:

$$Y = \beta_0 + \beta_1 G + \beta_2 E + \beta_3 (G \times E) \qquad (1)$$

where $\beta_0, \beta_1, \beta_2, \beta_3$ are the regression coefficients (to simplify formulas, we omit covariates in the regression model other than the gene and environment variables and do not make use of a subscript or error term specific to an individual in the sample). The interaction effect is explained by the slopes resulting from the above equation for different levels of the genotypic variable. The phenotype's regression on the environment at each fixed level of the genotype can be modeled as:

$$Y = (\beta_0 + \beta_1 g) + (\beta_2 + \beta_3 g) E$$

where $g$ is the genotypic level, i.e., $G$ takes the value of the genotype, e.g., 0, 1, 2. Figure 1 shows an example of a three-level genetic variant and the regression lines for three genotypes in three-dimensional space and the corresponding projections in two-dimensional space.

Fig. 1 Regression lines as illustrated for a three-category genotype

From here forward, we will use two-dimensional plots of projections of regression lines onto the plane genotype = 0 instead of three-dimensional plots.
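The per-genotype regression lines described above can be sketched in a few lines of Python. This is a minimal illustration, assuming Eq. (1) has the standard cross-product form Y = β0 + β1·G + β2·E + β3·(G × E); the function name `genotype_line` and the coefficient values are hypothetical and chosen only for illustration.

```python
def genotype_line(beta, g):
    """Return (intercept, slope) of the regression of Y on E at genotype level g.

    Assumes the cross-product model Y = b0 + b1*G + b2*E + b3*(G*E),
    so fixing G = g gives Y = (b0 + b1*g) + (b2 + b3*g) * E.
    """
    b0, b1, b2, b3 = beta
    return (b0 + b1 * g, b2 + b3 * g)


# Illustrative (made-up) coefficients: b0, b1, b2, b3
beta = (1.0, 0.5, 2.0, -0.75)

# One regression line per genotype level 0, 1, 2
lines = [genotype_line(beta, g) for g in (0, 1, 2)]

for g, (intercept, slope) in zip((0, 1, 2), lines):
    print(f"G = {g}: Y = {intercept} + {slope} * E")
```

Each genotype level changes both the intercept (by β1) and the slope (by β3), which is why the interaction is read off the slopes of these lines.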
Let us first presume that the genotype has two groups, $G = 0$ and $G = 1$, with group-specific regression lines as follows:

$$Y = a_0 + b_0 E \quad (G = 0), \qquad Y = a_1 + b_1 E \quad (G = 1)$$

We can express the $a$ and $b$ coefficients using the $\beta$ coefficients of the additive G × E regression model for all individuals from Eq. (1). For the subset of individuals with $G = 0$, plugging the value 0 for the variable $G$ into Eq. (1) eliminates the $\beta_1 G$ and $\beta_3 (G \times E)$ terms and we have:

$$Y = \beta_0 + \beta_2 E$$

so that $a_0 = \beta_0$ and $b_0 = \beta_2$. For the case where $G = 1$ we can write:

$$Y = (\beta_0 + \beta_1) + (\beta_2 + \beta_3) E$$

so that $a_1 = \beta_0 + \beta_1$ and $b_1 = \beta_2 + \beta_3$. With two groups, the four $\beta$ coefficients and the four $a$ and $b$ coefficients therefore determine each other exactly.

Now consider the genotypic variable with three groups, $G = 0, 1, 2$, which gives three regression lines and six unknown coefficients. The four coefficients in the interaction regression Eq. (1) cannot in general determine all six unknown coefficients. Hence regression with the traditional cross-product interaction term cannot reliably recover the exact regression lines for all three genotypic groups. In Appendix A we delineate the specific conditions (which are unlikely to occur in practice) under which the traditional cross-product approach yields equivalent estimates.

Having too few coefficients to estimate all six parameters creates potential problems with modeling G × E using the cross-product term as delineated in Eq. (1) when there are three levels, as it imposes a number of constraints which are generally unacknowledged and may not be accurate for the data. The first is that the lines are always ordered; the second is that the slope differences between adjacent regression lines are always the same; and the third is that the lines always cross at the same point or are parallel. Note that the correspondence between the two sets of coefficients as derived above means that the equations for $G = 0, 1, 2$ are, respectively:

$$Y = \beta_0 + \beta_2 E$$
$$Y = (\beta_0 + \beta_1) + (\beta_2 + \beta_3) E$$
$$Y = (\beta_0 + 2\beta_1) + (\beta_2 + 2\beta_3) E$$

which means that the differences between the slopes of the lines are always the same and that the slopes are always ordered from the $G = 0$ line to the $G = 1$ line to the $G = 2$ line.
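The three constraints can be checked numerically. The following is a minimal sketch, assuming Eq. (1) has the standard cross-product form Y = β0 + β1·G + β2·E + β3·(G × E) with G coded 0, 1, 2. The function names and all numeric values are hypothetical. The equal-spacing check follows from the algebra of that assumed form and is not taken from the authors' Appendix A.

```python
import math


def implied_lines(beta):
    """(intercept, slope) for G = 0, 1, 2 under the assumed cross-product model."""
    b0, b1, b2, b3 = beta
    return [(b0 + b1 * g, b2 + b3 * g) for g in (0, 1, 2)]


def betas_from_lines(lines, tol=1e-9):
    """Return (b0, b1, b2, b3) if three lines fit the four-coefficient model exactly.

    Six line coefficients fit only when intercepts and slopes are both
    equally spaced across G = 0, 1, 2; otherwise return None.
    """
    (a0, s0), (a1, s1), (a2, s2) = lines
    same_intercept_step = math.isclose(a1 - a0, a2 - a1, abs_tol=tol)
    same_slope_step = math.isclose(s1 - s0, s2 - s1, abs_tol=tol)
    if same_intercept_step and same_slope_step:
        return (a0, a1 - a0, s0, s1 - s0)
    return None


def crossing_point(beta):
    """Common (E, Y) point of all three lines, or None when b3 == 0 (parallel lines)."""
    b0, b1, b2, b3 = beta
    if b3 == 0:
        return None  # identical slopes: the lines are parallel (or coincide if b1 == 0)
    e = -b1 / b3  # the genotype-dependent part g * (b1 + b3 * E) vanishes here
    return (e, b0 + b2 * e)


beta = (1.0, 0.5, 2.0, -0.75)  # illustrative values only
lines = implied_lines(beta)

# Constraint 2: adjacent slope differences are identical (both equal b3)
slope_steps = [lines[1][1] - lines[0][1], lines[2][1] - lines[1][1]]

# Constraint 3: every genotype line passes through the same point
e_star, y_star = crossing_point(beta)
values_at_e_star = [a + s * e_star for a, s in lines]

# A hypothetical pattern in which G = 1 and G = 2 share one line:
# six coefficients that no choice of four betas can reproduce
non_additive = [(1.0, 2.0), (1.5, 0.5), (1.5, 0.5)]
recovered = betas_from_lines(non_additive)

print(slope_steps, values_at_e_star, recovered)
```

In this sketch the slopes are also monotone in G because each step adds the same β3, which is the ordering constraint. A data pattern such as `non_additive` falls outside what the four coefficients can express, so a fitted cross-product model would have to distort it.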
However, there is no theoretical reason to presume that the slope differences between the lines should always be the same; in theory they could differ, and in practice they will usually differ due to sampling error and possibly from genuine non-linear G × E effects. Further, forcing the lines to be ordered (although this may make biological sense because it constrains the model to an additive effect) could lead to inaccurate representation of the data. Essentially, it will always predict an ordered effect even when the data do not follow this pattern. Another problem is that all three lines must cross at the same point or all must be parallel, depending on whether the coefficient of the cross-product term in Eq. (1), $\beta_3$, is nonzero or zero. A reparameterization of the equation to