A recent post by Dr. Feinman reminded me of a very meaty post by Dr. Mike Eades, so I thought I would share both posts here. Both discuss aspects of Intention To Treat (ITT), an analysis method that can produce misleading or biased results.
Personally, I am from the camp that says you don't just throw out data points to "clean things up," nor do you make up data points so that everything comes out even at the end. Making up data points used to be a common practice before computers did everything. The data wasn't usually invented out of thin air, but here is a common practice from the days of hand-calculated ANOVA: if you were taking, say, 4 samples per treatment and the data came back with only 3 samples for one of the treatments, you would take the average of the three real samples and use that average as the 4th, fake sample. This kept the design balanced and made the calculations go more smoothly.
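That old fill-in-the-missing-cell trick is easy to sketch. The numbers below are made up, and the helper function is mine, not anything from a real stats package; the point is just to show what was done and why it's sneaky:

```python
# Sketch of the old "fill in the missing cell" practice (hypothetical data).
# A balanced ANOVA wants 4 observations per treatment, but one treatment
# came back with only 3, so the missing value is replaced by the group mean.

def impute_treatment_mean(samples, target_n):
    """Pad a treatment group out to target_n observations using its own mean."""
    filled = list(samples)
    mean = sum(filled) / len(filled)
    while len(filled) < target_n:
        filled.append(mean)
    return filled

treatment_a = [12.1, 11.8, 12.4, 12.0]   # complete group
treatment_b = [13.2, 12.9, 13.5]         # one sample lost

treatment_b_filled = impute_treatment_mean(treatment_b, 4)
# The balanced layout makes the ANOVA arithmetic easy, but the imputed
# value adds no real information: the group mean is unchanged, while the
# apparent within-group variability shrinks.
```

The group mean stays exactly where it was, so the fill-in looks harmless, but the fake point sits right on the mean and quietly deflates the within-group variance.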
Kind of like that joke where the guy is looking for his wallet under the street lamp because the light is better.
Anyway, here are the links:
Another big (well, to me) problem with some of these studies is that in addition to the methods of analysis outlined in the above links, researchers attempt to explain the reasons it's OK to make up data. Here's how they do it.
Before any experiment is started, the appropriate sample size is selected. It is usually a dance between what you want to know, and how much $$$$ you have to learn it.
You'll usually find information about how this was done somewhere in the paper, and almost never in the abstract. It is written in code, which is just as well, because if you don't really understand statistical significance and the power of tests, it is probably better to let your eyes glaze over that part anyway.
The study is designed to answer a main question, and the appropriate sample size is determined based on that question, what type of data you are collecting, and what risks you are willing to take. When people drop out of a study, for any reason, sometimes the researchers can find out why, and sometimes they do analyze this information to see if something is up. Trouble is, they are attempting to answer this secondary question with a sample size chosen for a different question entirely, which is usually far smaller than what would be needed to make such a determination.
For example, in a weight loss study, they might choose a sample size big enough to detect a difference of, say, 10 pounds lost over 8 weeks. They pick a sample size they can afford, but it might not be enough to detect a difference of, say, 5 pounds. The study is just not powerful enough for that.
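You can see why halving the detectable difference blows up the budget with the standard normal-approximation formula for a two-group comparison of means, n per group = 2((z_alpha + z_beta) * sigma / delta)^2. The standard deviation below is a number I made up for illustration, not from any real weight-loss trial:

```python
import math

def n_per_group(delta, sigma, z_alpha=1.96, z_beta=0.8416):
    """Normal-approximation sample size per group for a two-sample
    comparison of means: detect a true difference `delta` with
    two-sided alpha = 0.05 and 80% power, given common SD `sigma`."""
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)

sigma = 12.0  # assumed SD of 8-week weight change, in pounds (made up)
print(n_per_group(10, sigma))  # 23 per group to detect a 10 lb difference
print(n_per_group(5, sigma))   # 91 per group for a 5 lb difference
```

Halving the difference you want to detect roughly quadruples the required sample size, which is exactly why the study that can "afford" to see 10 pounds is blind to 5.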
Now, in Dr. Eades' example, he chose a total of 100 participants, and in addition to the difference in weight lost, there was a huge difference between the percentages of participants who dropped out of each group. Usually the drop-out proportions are closer together and the situation is murkier, and because the sample size was never chosen to answer THIS question, it is almost certainly too small. So the usual conclusion is that the drop-out rates are no different. This gives the researchers a green light to go off and feel justified in making up some data, on the assumption that the biases introduced are the same in both groups.
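Here is a sketch of how that "no difference in drop-outs" conclusion gets manufactured. The trial below is entirely hypothetical (it is not Dr. Eades' example), and the test is a plain pooled two-proportion z-test via the normal approximation. Even a drop-out rate that doubles can sail through as "not significant" at these group sizes:

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided p-value for a pooled two-proportion z-test,
    using the normal approximation (standard library only)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                       # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # pooled standard error
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical small trial: 20 per arm, drop-out of 40% vs 20%.
p = two_proportion_z(8, 20, 4, 20)
print(round(p, 3))  # roughly 0.17: "not significant" despite a doubled drop-out rate
```

A twofold difference in drop-outs comes back p ≈ 0.17, so the checkpoint waves everyone through, not because the rates are really the same, but because the test was never given enough data to see the difference.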
So here are the CliffsNotes:
1. Researchers design a study to answer question A.
2. Researchers get a bunch of bad data.
3. Researchers run the data through a bunch of checkpoint Charlies (questions B, C, ...) that are ill-formulated and underpowered.
4. Of course, the checkpoint Charlies say all is OK.
5. Researchers proceed.
There will be a bunch of statistical mumbo-jumbo about the additional questions, to make those checkpoint Charlies look like real statistics. Yes, badges and guns, but no bullets.