Logistic Regression With Missing Data
David C. Howell

Missing data are a part of almost all research, and we all have to decide how to deal with them. You can see a discussion of these issues at Missing-Part-One.html. That first page covers the basic issues in the treatment of missing data, so I will not go over that ground here. Instead, I will focus on the process of "imputing" observations to replace missing values. The process is not a lot different, but it gives me a chance to use another data set that provides a convenient reference point, because it started out with complete data.

Logistic regression is very similar to a standard multiple regression where the dependent variable is a dichotomy. In fact, if the outcome proportions are no more extreme than about .20/.80, the results obtained from the two will be very similar.

I am going to use a set of data from a study that I was involved with some time ago and published as Epping-Jordan, Compas, & Howell (1994). We examined psychological outcome (improved versus not improved) as a function of several variables: Survrate, a rating by the oncologist of the individual's expected survival time; Prognos (a four-point scale); Amttreat (amount of treatment); GSI (the Global Symptom Index from Derogatis' Symptom Checklist 90); Avoid (a measure of avoidance behavior); and Intrus (a measure of intrusive thoughts). I doubled the sample size by randomly adding or subtracting random numbers to or from the data in the original set. This left me with the original 66 cases and an additional 66 pseudocases. I did this simply to create a better example.

The data set started out complete, so we can begin with a logistic regression on the full data set and use those results for comparison with what we find with missing data. The first few lines of this data set are shown below.

    count id outcome survrate prognos amttreat gsi avoid intrus

Notice that only four predictors are used for the analysis, and one of those might best be dropped. The SAS commands are:

    input count id outcome01 survrate prognos amttreat gsi avoid intrus;
    proc logistic data = Data;        /* Logistic regression with 4 predictors */
        model outcome01 = survrate gsi avoid intrus / covb;
        ods output ParameterEstimates = lgsparms CovB = lgscovb;
    run;

Near the top of the printout we have a likelihood ratio test of the null hypothesis that there is no relationship between these predictors and the dependent variable. That test is roughly equivalent to the overall F test for a multiple correlation coefficient. Above those statistics are three fit statistics. For example, we might use those statistics to see if there is a significant drop in predictability if we were to drop Intrus as a predictor. Notice that if you take -2 Log Likelihood for a model with no predictors and for one with four predictors, the difference is 154.691 - 57.243 = 97.448, which is the likelihood ratio chi-square that we just used to test the overall model.

Next in the printout we come to the Analysis of Maximum Likelihood Estimates. These are analogous to the standard regression coefficients and their tests in multiple regression, although these are chi-square tests instead of t tests. Survrate, gsi, and avoid are all significant, but the test on intrus is not. We rarely care about the intercept, but the others are important.

Having a full data set, I randomly deleted 35 observations and replaced them with a missing data code. Because this truly was a random process, the data are missing completely at random (MCAR). For this particular set I used "." as a missing data code. For some software I will change "." to "999", and for R I will change "." to "NA". For some software I will include variable labels in line 1, and for other software I will leave the labels out. ("NH" in a file name indicates that there is not a header for that file.)

Analysis with Missing Data

Before we go ahead and impute data for the missing values, we will look at an analysis that is based on the file that contains missing data. The SAS commands would be the same as those that we just used, except that we would first read in the file with missing data and declare the appropriate missing data code. The results of this analysis are quite similar even though we have eliminated 35 observations. The test on the intercept is nowhere near significant, but the other tests are similar to what we found with the complete data. Again, intrus is not a significant predictor. It may just be my imagination, but the tests on the intercept, and, indeed, the value of the intercept itself, seem quite unstable. Since we rarely care about the intercept anyway, I am not going to worry about that.

The corresponding code in R for the full and missing data begins:

    setwd("~/Dropbox/Webs/StatPages/Missing_Data/Missing-Logistic")
    # Logistic regression on survrate data set with missing data

The new data set is named "cleaned," and that is what we will use for the next analysis. Now that we have five data sets with complete data and with Outcome01 coded correctly as 0/1, we will run five separate logistic regressions, saving the results of each. Thus for each run we will have, among other things, regression coefficients and their standard errors for each variable.
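The arithmetic behind the overall likelihood ratio test quoted above (154.691 - 57.243 = 97.448 on 4 degrees of freedom, one per predictor) can be checked by hand. A minimal sketch in Python, using the closed-form chi-square survival function that holds for even degrees of freedom; the helper name `chi2_sf_even_df` is my own, not from the original analysis:

```python
import math

def chi2_sf_even_df(x, df):
    """Survival function of a chi-square with even df:
    P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!"""
    assert df % 2 == 0
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))

# -2 log likelihoods for the no-predictor and four-predictor models (from the text)
lr_chisq = 154.691 - 57.243        # 97.448, on 4 df
p_value = chi2_sf_even_df(lr_chisq, df=4)
print(lr_chisq, p_value)           # the overall test is overwhelmingly significant
```

The same subtraction logic applies to any pair of nested models, which is how one would test whether dropping Intrus costs a significant amount of predictability.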
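The step of deleting 35 observations completely at random can be sketched as follows. This is a hypothetical illustration, not the author's script: the placeholder values, the seed, and the use of `None` as the missing code (standing in for ".", "999", or "NA") are all my assumptions.

```python
import random

random.seed(42)  # assumption: any seed works; fixed only for reproducibility

n_rows, n_cols = 132, 9  # 66 original cases + 66 pseudocases, 9 variables
data = [[float(r * n_cols + c) for c in range(n_cols)] for r in range(n_rows)]

# Choose 35 distinct cells and overwrite each with a missing-data code (None here;
# it would be written out as ".", "999", or "NA" depending on the target software).
cells = random.sample([(r, c) for r in range(n_rows) for c in range(n_cols)], 35)
for r, c in cells:
    data[r][c] = None

n_missing = sum(v is None for row in data for v in row)
print(n_missing)  # 35 -- deletion ignored the values themselves, so the data are MCAR
```

Because the deleted cells are chosen without reference to any observed or unobserved value, the missingness mechanism is MCAR by construction.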
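Once the five logistic regressions have been run, the saved coefficients and standard errors must be combined across the imputed data sets. A minimal sketch of that pooling step using Rubin's rules; the five estimates and standard errors below are made-up numbers for illustration, not values from this analysis:

```python
import math
from statistics import mean, variance

def pool(estimates, std_errors):
    """Combine one coefficient across m imputed data sets via Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)                  # pooled point estimate
    w = mean(se**2 for se in std_errors)     # within-imputation variance
    b = variance(estimates)                  # between-imputation variance (m-1 denom)
    t = w + (1 + 1/m) * b                    # total variance of the pooled estimate
    return q_bar, math.sqrt(t)

# Hypothetical coefficient for one predictor from five separate regressions
est, se = pool([1.0, 1.2, 0.8, 1.1, 0.9], [0.5, 0.5, 0.5, 0.5, 0.5])
print(est, se)   # 1.0 and sqrt(0.25 + 1.2 * 0.025) = sqrt(0.28)
```

Note that the pooled standard error exceeds the average within-imputation standard error: the between-imputation term charges a price for the uncertainty introduced by the missing data.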