Finding Subgroups of Patients Responsive to Treatment

There has been growing interest in modern medicine in identifying patient subgroups, characterized by particular properties, that can receive an individualized treatment. Finding subgroups that respond to a treatment is essentially a classification problem. Candidate methods include logistic regression, discriminant analysis, nearest-neighbor methods, and CART. Traditional approaches typically test whether the treatment effect differs across subgroups by entering the characteristic factor as a moderator in a multiple regression model. For example, the model Y = a + b*X + c*T + d*X*T + e has long been used to test whether the effect of a treatment (T) differs significantly between subgroups (X). If the interaction between X and T (the coefficient d) is significant, the treatment effect varies across subgroups. The subgroup with a positive treatment effect can then be identified as responsive to T.

This approach seems straightforward. In practice, however, it has several drawbacks. First, as the number of characteristic factors grows, testing each one as a moderator of the treatment effect inflates the Type I error rate. A Bonferroni adjustment can be overly conservative and may leave no treatment-by-subgroup effect significant. Second, testing moderator-by-treatment interactions one at a time ignores the joint contribution of multiple moderators, which makes the results difficult to interpret. Third, when the model needs cross-validation, the limited sample sizes in the training and test sets can cause problems. With 2-fold cross-validation the error rate may be large, and with n-fold cross-validation the moderator-by-treatment interactions may be inconsistent across training sets. It is therefore necessary to develop methods that identify subgroups from many characteristics using multivariate modeling techniques.
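The following is a minimal sketch of the moderator approach, written with NumPy least squares on simulated data. The variables `x`, `t`, and `y`, the effect sizes, and the helper names are made up for illustration. The sketch fits Y = a + b*X + c*T + d*X*T + e and reports the interaction coefficient d with its z-ratio. It also shows how quickly the family-wise error rate grows when 20 moderators are each tested at alpha = 0.05, under the simplifying assumption that the tests are independent, and gives the corresponding Bonferroni per-test threshold.

```python
import numpy as np

def fit_interaction_model(y, x, t):
    """OLS fit of Y = a + b*X + c*T + d*X*T + e.

    Returns the coefficients (a, b, c, d) and the standard error of d,
    the treatment-by-subgroup interaction term.
    """
    y, x, t = (np.asarray(v, dtype=float) for v in (y, x, t))
    design = np.column_stack([np.ones_like(x), x, t, x * t])
    coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    dof = len(y) - design.shape[1]
    sigma2 = resid @ resid / dof                      # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)   # covariance of coefficients
    return coef, np.sqrt(cov[3, 3])

def familywise_error(alpha, k):
    """P(at least one false positive) over k tests, ASSUMING the tests are independent."""
    return 1 - (1 - alpha) ** k

def bonferroni_threshold(alpha, k):
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    return alpha / k

rng = np.random.default_rng(0)
n = 400
x = rng.integers(0, 2, n)   # hypothetical binary subgroup indicator
t = rng.integers(0, 2, n)   # hypothetical treatment indicator
# Simulated outcome: the treatment helps only when x == 1 (true d = 2.0)
y = 1.0 + 0.5 * x + 0.0 * t + 2.0 * x * t + rng.normal(0, 1, n)

coef, se_d = fit_interaction_model(y, x, t)
print("a, b, c, d =", np.round(coef, 2), " z for d =", round(coef[3] / se_d, 1))
print("FWER with 20 moderators:", round(familywise_error(0.05, 20), 3))
print("Bonferroni per-test alpha:", bonferroni_threshold(0.05, 20))
```

With 20 independent moderator tests, the chance of at least one spurious "responsive subgroup" is about 64%. The Bonferroni fix shrinks each test's level to 0.0025, which illustrates the conservativeness described above.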

The method used here is latent class analysis (LCA). In the spirit of principal component analysis, it condenses multiple characteristic factors into a single latent variable. Unlike a principal component, however, this latent variable is categorical: it represents subgroup membership. The model estimates the probability of each characteristic factor given group membership, and Bayes' theorem then gives each patient's posterior probability of belonging to each group. The number of subgroups can be chosen by comparing models on criteria such as AIC, BIC, and sample-size adjusted BIC (aBIC). More technical details can be found at http://methodology.psu.edu/ra/lca. The method has also been applied in a diagnostic-testing study, Concordance between Gambling Disorder Diagnoses in the DSM-IV and DSM-5, where it produced readily interpretable results. The difference here is that this analysis adds a distal outcome (responsive/non-responsive to treatment) that is associated with group membership.
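To make the estimation concrete, here is a minimal sketch of latent class analysis for binary characteristic factors, fitted by the EM algorithm in plain NumPy. It is not the software used in the original analysis. The function names, the simulated two-class data, and the choice of binary (0/1) indicators are assumptions for illustration. The E-step applies Bayes' theorem to get posterior class memberships. The M-step updates class prevalences and item-response probabilities. AIC, BIC, and aBIC are then compared across numbers of classes.

```python
import numpy as np

def fit_lca(data, n_classes, n_iter=500, tol=1e-8, seed=0):
    """Latent class analysis for 0/1 indicators, fitted by EM.

    Returns class prevalences pi (K,), item-response probabilities
    rho (K, J) = P(item j = 1 | class k), posterior memberships (N, K),
    and the log-likelihood.
    """
    data = np.asarray(data, dtype=float)
    n, j = data.shape
    rng = np.random.default_rng(seed)
    pi = np.full(n_classes, 1.0 / n_classes)
    rho = rng.uniform(0.25, 0.75, size=(n_classes, j))  # random start
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log P(responses | class), then Bayes' theorem for P(class | responses)
        log_lik = data @ np.log(rho).T + (1 - data) @ np.log(1 - rho).T
        log_joint = log_lik + np.log(pi)
        log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        post = np.exp(log_joint - log_norm)
        ll = log_norm.sum()
        # M-step: update class prevalences and item-response probabilities
        weights = post.sum(axis=0)
        pi = weights / n
        rho = np.clip((post.T @ data) / weights[:, None], 1e-6, 1 - 1e-6)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, rho, post, ll

def information_criteria(ll, n_classes, n_items, n_obs):
    """AIC, BIC and sample-size adjusted BIC (aBIC) for a fitted LCA model."""
    k = (n_classes - 1) + n_classes * n_items   # free parameters
    aic = -2 * ll + 2 * k
    bic = -2 * ll + k * np.log(n_obs)
    abic = -2 * ll + k * np.log((n_obs + 2) / 24)
    return aic, bic, abic

# Simulated data: two hypothetical classes, six binary characteristics
rng = np.random.default_rng(1)
true_rho = np.array([[0.8, 0.8, 0.7, 0.2, 0.2, 0.3],
                     [0.2, 0.3, 0.2, 0.8, 0.7, 0.8]])
z = rng.integers(0, 2, 1000)
data = (rng.random((1000, 6)) < true_rho[z]).astype(int)

for k in (1, 2, 3):
    pi, rho, post, ll = fit_lca(data, k)
    aic, bic, abic = information_criteria(ll, k, data.shape[1], len(data))
    print(k, "classes: AIC", round(aic, 1), "BIC", round(bic, 1), "aBIC", round(abic, 1))
```

In practice EM is usually run from several random starts, and the solution with the highest log-likelihood is kept, because EM can stop at a local optimum.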

The outcome of the study was a positive versus non-positive change in patients' lipid profiles after treatment with pharmaceutical drugs. As seen from the plot above, the patient population was classified into two subgroups: a responsive group and a non-responsive group. Responders were more likely to be male and to have high pre-treatment levels of cholesterol and LDL. They were less likely to have high systolic blood pressure or low HDL before treatment. The finding about male responders is consistent with another study that applied a traditional regression method to the same dataset. Patients' pre-treatment cholesterol, LDL, HDL, and blood pressure appear to be associated with the distal outcome, the change in lipid profile. Further interpretation in terms of the drugs' mechanisms of action is still needed. Even so, this classification method depicts the full profile of patients who respond differently to a given treatment, including the likelihood of each individual characteristic factor. It may therefore serve as a Bayesian-based multivariate machine learning method that can be applied elsewhere.
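One simple way to relate latent classes to a distal outcome is to weight each patient's outcome by their posterior class-membership probabilities, sometimes called proportional assignment. The sketch below shows that idea under stated assumptions. It is not necessarily how the original analysis modeled the distal outcome, and it does not correct for classification error as more formal approaches do. The function names and the four toy patients are hypothetical.

```python
import numpy as np

def distal_outcome_rates(post, outcome):
    """Posterior-weighted proportion of responders in each latent class.

    post:    (N, K) posterior class-membership probabilities from a fitted LCA.
    outcome: (N,) 0/1 distal outcome, 1 = responsive to treatment.
    Each patient contributes to every class in proportion to their
    posterior probability of belonging to it.
    """
    post = np.asarray(post, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    return (post.T @ outcome) / post.sum(axis=0)

def modal_assignment(post):
    """Assign each patient to the class with the highest posterior probability."""
    return np.argmax(np.asarray(post), axis=1)

# Toy posteriors for four hypothetical patients and two classes (made-up numbers)
post = np.array([[0.9, 0.1],
                 [0.8, 0.2],
                 [0.3, 0.7],
                 [0.1, 0.9]])
outcome = np.array([1, 1, 0, 0])  # hypothetical responder indicator

rates = distal_outcome_rates(post, outcome)
print("P(responsive | class):", np.round(rates, 3))
print("modal classes:", modal_assignment(post))
```

Here the first class gets a responder rate of 1.7/2.1 (about 0.81) and the second 0.3/1.9 (about 0.16). A class with a clearly higher rate would be read as the responsive subgroup.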