Department of Neurology Neurosciences Centre, and Clinical Epidemiology Unit, All India Institute of Medical Sciences, New Delhi Delhi, India
Critical Appraisal Questions for a Therapy Paper
Critical appraisal has four main parts:
To determine if the study at hand is relevant to your question.
To determine if the study is saying the right things (i.e. observed effects are likely to be correct or trustworthy): this is called validity check or quality assessment. In other words, is the information likely to be valid?
To determine what it is saying. In other words, what is the potential impact of the treatment? This may be called ‘results assessment’.
To assess whether the treatment can help you in caring for your patient(s). I call this applicability assessment and application to your patient.
So, we have four main assessment tasks:
Relevance assessment
Validity assessment
Results assessment
Applicability assessment and application
Having found the paper, you need to check if it really addresses your question. Check whether the research question in the paper matches your need closely enough to justify investing time and effort in going further. Sometimes, the population may not match; at other times, the intervention or outcome may not match your need or setting. For example, you wanted to look for the role of surgery in brain haemorrhage, and you found a paper that describes the results of endoscopic neurosurgery in such cases. Your hospital does not have an endoscope for neurosurgery, and nobody there can perform the procedure. In this case you may decide to drop this paper and look for another paper that deals with your question. Alternatively, you may read only the abstract to get some information and then move on to find another paper for the next steps.
Before 1945, researchers commonly used one group of patients to test any new treatment. Typically, they would use the new treatment in a series of patients, count the outcomes (success or failure, death or survival) and compare them with their experience in previous series of patients with the same disease.
This method is acceptable if the treatment effects are ‘huge’. Penicillin in pneumococcal pneumonia may be an example. But such treatments are rare these days. Certainly, approval of any new drug for a given condition requires a two-group study (for details, see Chap. 3: ‘why do we need a control group’).
Typically, the investigations are required to have two groups (one experimental and one control), which are similar or balanced in all the factors that influence the outcome (called ‘prognostic factors’). Both groups receive similar standard treatment, but one group is treated with the new treatment in addition. (The second group is often treated with a matching placebo.)
There are three main questions with regard to validity assessment: did the study start well, run well and finish well?
Start: Did the authors start with ‘balanced’ groups?
Run: Was the initial balance left undisturbed till the end?
In other words, did they maintain the balance during care?
Finish: Did the study end well? That is, were all subjects followed up, were their outcomes assessed properly and was the analysis proper?
Let us deal with them individually.
Q.1. Did the Authors Start with ‘Balanced’ Groups (i.e. at Baseline)?
For this, authors need to plan for it, do it properly and check it.
Here we need to know how balanced groups are formed. The most effective and popular method is similar to ‘coin tossing’. We decide that as soon as an eligible patient comes, we will toss a coin: if it comes up heads, the patient will go to Group A, and if it comes up tails, he will go to Group B. If we repeat this process with a ‘fair coin and fair tossing’, we will find that the two groups do become similar or ‘balanced’ in all prognostic factors, provided there is a sufficient number of patients. Allocating patients to one group or another in this way is one method of ‘random allocation’, also called randomisation.
Nowadays, instead of coin tossing, people use a process similar to it, for example, computer ‘randomisation’. So, planning to create balanced groups requires a plan to randomise. A plan or design to randomise is called a ‘randomised control design’. Sometimes, a chart is kept ready for randomisation. The chart is prepared using coin tossing or random number tables.
Next, they need to do the randomisation properly. In this context, a problem occurs if the investigator knows the group to which the patient under consideration is going to go. For example, suppose that in a trial of surgical versus medical treatment, the investigator used a chart. If his next patient was to go to surgery and happened to be very sick, he would not include him in the study. He would wait till a ‘good risk’ case came. But if he knew that the next patient was to go to the medical group, he would not hesitate to include him in the study. Thus, even though he was using a properly designed randomisation chart, his two groups would not turn out to be ‘balanced’.
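The claim that coin tossing balances prognostic factors when there are enough patients can be explored with a small simulation. The following is a minimal sketch, not a production randomisation routine: the function name `randomise`, the 30 % prevalence of the prognostic factor and the sample sizes are all illustrative assumptions.

```python
import random

def randomise(n_patients, p_factor=0.3, seed=0):
    """Allocate patients by a simulated fair coin toss and return, for each
    group, the share of patients who carry a prognostic factor."""
    rng = random.Random(seed)
    groups = {"A": [], "B": []}
    for _ in range(n_patients):
        # Hypothetical prognostic factor, present in about 30 % of patients.
        has_factor = rng.random() < p_factor
        # The 'coin toss': allocation does not depend on the patient at all.
        group = "A" if rng.random() < 0.5 else "B"
        groups[group].append(has_factor)
    return {g: sum(v) / len(v) if v else float("nan") for g, v in groups.items()}

for n in (20, 200, 20000):
    shares = randomise(n)
    print(n, {g: round(s, 3) for g, s in shares.items()})
```

With small numbers the two shares can differ noticeably by chance; as the number of patients grows, they tend to converge, which is why the balance still has to be checked in any individual study.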
So, it’s very important that the group to which the next patient goes is kept undisclosed (or ‘concealed’) from the investigator who is recruiting the patient. This is best achieved by ‘telephone randomisation’ or use of similar looking placebos in random sequence with the experimental drug. Someone who is otherwise not involved in the study has the chart or computer, which gives group assignment. The recruiting physician checks eligibility and takes consent. Then he calls the randomising person or centre. After checking that the patient fulfils the key eligibility criteria, the randomising person registers the patient into the study, allots a study number and then assigns the patient to one of the groups. Once registered, the patient irrevocably remains in the study. This process ensures that the recruiting physician cannot anticipate the group to which the next patient goes and thus cannot consciously or subconsciously tamper with the randomisation process. Such a process in which the group assignment of the next patient remains undisclosed (or concealed) from the recruiting physician is termed ‘concealed randomisation’.
Randomisation is called ‘concealed’ if the group to which the next patient will be assigned remains undisclosed (concealed) from the recruiting physician.
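The telephone randomisation procedure described above can be sketched in code. This is only an illustration of the order of steps (eligibility and consent first, registration next, group revealed last); the class `CentralRandomiser` and its method names are hypothetical and do not represent any real trial system.

```python
import random

class CentralRandomiser:
    """Hypothetical central randomisation desk. It holds the allocation
    process and reveals a group only after the patient is registered."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._registry = {}  # study number -> (patient_id, group)

    def register(self, patient_id, eligible, consented):
        # Eligibility and consent are confirmed BEFORE any group is revealed.
        if not (eligible and consented):
            raise ValueError("patient must be eligible and must have consented")
        if any(pid == patient_id for pid, _ in self._registry.values()):
            raise ValueError("patient is already registered")
        study_number = len(self._registry) + 1
        group = self._rng.choice(["treatment", "control"])
        # Once registered, the patient stays in the study (no removal method).
        self._registry[study_number] = (patient_id, group)
        return study_number, group

desk = CentralRandomiser(seed=42)
study_number, group = desk.register("P001", eligible=True, consented=True)
print(study_number, group)
```

The recruiting physician only ever calls `register`; the next assignment does not exist anywhere he can see it until the patient is irrevocably in the study.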
Even after planning (randomised control design) and doing (concealed randomisation) everything properly, we cannot be 100 % sure that the resulting groups are balanced (just as even after careful curriculum planning and flawless teaching, we cannot say that 100 % of students will pass). We need to check the results. This means we need to check whether the percentage of patients with each of the various prognostic factors is similar in the two groups. In other words, are the groups prognostically similar at baseline? This can be done by first recollecting the prognostic factors of the condition and then checking, in the table of baseline characteristics, whether the percentages of patients with each factor are similar in the two groups.
Randomisation is no guarantee that the resulting groups will be similar. You need to check the comparability of the groups at baseline.
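The baseline check amounts to computing, for each prognostic factor, the percentage of patients with that factor in each group and comparing the two. A minimal sketch follows; the function `baseline_table`, the factor names and the patient records are made up for illustration and are not data from any trial.

```python
def baseline_table(patients, factors):
    """For each prognostic factor, return the percentage of patients
    with that factor in each group."""
    table = {}
    group_names = sorted({p["group"] for p in patients})
    for factor in factors:
        row = {}
        for group in group_names:
            members = [p for p in patients if p["group"] == group]
            row[group] = 100.0 * sum(p[factor] for p in members) / len(members)
        table[factor] = row
    return table

# Made-up illustrative records.
patients = [
    {"group": "A", "diabetes": True,  "age_over_65": False},
    {"group": "A", "diabetes": False, "age_over_65": True},
    {"group": "A", "diabetes": False, "age_over_65": True},
    {"group": "A", "diabetes": True,  "age_over_65": False},
    {"group": "B", "diabetes": True,  "age_over_65": True},
    {"group": "B", "diabetes": False, "age_over_65": True},
    {"group": "B", "diabetes": False, "age_over_65": False},
    {"group": "B", "diabetes": False, "age_over_65": False},
]

for factor, row in baseline_table(patients, ["diabetes", "age_over_65"]).items():
    print(factor, row)
```

In this toy example the groups match on age but not on diabetes, which is the kind of imbalance a reader would look for in a paper's table of baseline characteristics.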
To summarise, starting well means having a control group that is similar to the experimental one (created through randomisation). So, we should check three things:
Is the study design a randomised control one?
Was the randomisation concealed?
Are the baseline characteristics (prognostic factors) comparable (balanced or similar) between the two groups?
There are three Cs here – (1) control group, (2) concealed randomisation and (3) comparability of groups.
Q.2. Did the Investigators or Subjects Disturb the Balance?
The answer to this question requires checking many things. A story might make it easy to understand. Two housewives, good neighbours and friends, one day went to the market to buy potatoes. Each picked up a box of potatoes weighing 5 kg. The salesman checked both, and each weighed exactly 5 kg. They came home. While relaxing on the sofa and watching TV, they asked the housemaid to bring the potato boxes from the car, wash the potatoes and put them back in the respective boxes. When the housemaid reported back, she was a bit nervous and asked if she could have a balance. She had found that some potatoes had fallen out of the boxes in the car, but she had done her best to put them back in the right boxes. She wanted to check whether the boxes were of equal weight. She got a balance but had no weights. When she put the two boxes, one on each side of the balance, they were unequal. Assuming that they were exactly equal at the shop, what are the possible reasons for the imbalance now? The possible reasons may arise as follows:
On the way to home – unequal number or size of potatoes may fall off (loss to follow-up).
In the washing process – one group may be washed more thoroughly than the other (unequal care).
Some potatoes may have been mixed up between the boxes (crossovers).
The measuring instrument or process may be biased (measurement bias).
Similarly, in treatment studies, imbalance may arise as a result of unequal care, or crossovers from one group to another or losses to follow-up, or from biased measurement or analysis. So, you need to ask:
Were patients in the two groups treated equally? Giving more care to the experimental group with intervention other than the experimental one is called ‘co-intervention’.
Were the crossovers nil or minimal?
Was there adequate compliance?
Again, there are 3 Cs here: co-intervention, crossovers (also called contamination) and compliance (see below for explanation).
Good finish means all patients are followed up (complete follow-up), their outcomes are measured correctly (with reliable and valid instruments and without bias) and the analysis is credible (so that it does not introduce bias).
Again, there are three Cs at finish:
Complete follow-up
Correct outcome measurement
Credible analysis
To summarise, the questions to assess a therapy paper are related to 3 × 3 Cs, as shown in Box 5.1:
Let us take each question in detail.
Start well: 3 Cs
Control group
Concealed randomisation
Comparability of groups at baseline
Run well: 3 Cs
Co-intervention minimal or nil
Contamination minimal or nil
Compliance maximal or adequate
Finish well: 3 Cs
Complete follow-up
Correct outcome measurement
Credible analysis
Q.1. Was There a Control Group?
1A. Why do we ask this question?
We ask this question because improvement may occur in the natural course of the disease or as a result of the Hawthorne effect, the placebo effect or regression to the mean.
Many patients improve in the natural course of their disease or remain symptom free. For example, many patients with stroke recover. Many patients with berry aneurysm remain symptom free for a variable length of time. If we do not have a control group, we would not know whether the improvement is due to the new treatment or to the natural course.
The control group should be contemporaneous (i.e. current, along with the new treatment group), not historical. The reason is that patients in a previous time period may be prognostically different or may not have had the benefit of organisational or technological advances, which occur with time. Also, patients (or people) change their behaviour or perception when they join a research project. This natural human tendency is known as the Hawthorne effect. The effect that we might see in the new treatment group could be due to the Hawthorne effect, not necessarily to the new treatment. But if there is a control group, then any difference which we observe between the two groups can be attributed to the new treatment (provided other conditions given below are met) (for placebo effect and regression to the mean, see Chap. 3: Need for a control group).
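The argument about natural course can be made concrete with a toy simulation. The sketch below assumes a hypothetical disease in which half of all patients recover on their own and a treatment that adds nothing; the function `recovery_rate` and all numbers are illustrative assumptions.

```python
import random

def recovery_rate(n, natural_rate, treatment_effect=0.0, seed=0):
    """Share of n simulated patients who recover, given a natural recovery
    rate plus any additional effect of treatment."""
    rng = random.Random(seed)
    p = natural_rate + treatment_effect
    return sum(rng.random() < p for _ in range(n)) / n

# Hypothetical treatment with no real effect on a disease where about
# half of the patients recover in the natural course.
treated = recovery_rate(5000, natural_rate=0.5, treatment_effect=0.0, seed=1)
control = recovery_rate(5000, natural_rate=0.5, seed=2)

print(f"treated series alone: {treated:.0%} improved")
print(f"treated minus control: {treated - control:+.1%}")
```

Looked at alone, the treated series shows roughly half of the patients improving, which could be mistaken for a treatment effect; only the comparison with a concurrent control group shows that the difference is close to zero.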
1B. How do we answer this question?
It would be apparent from the abstract of the paper if there is any comparison with a control group.
1C. How do we interpret the answer?
If the answer is yes, proceed further to evaluate. If it is no, you cannot consider the study as definitive, unless all alternative (competing) explanations for the observed effect can be dismissed. The alternative explanations can be natural history, Hawthorne effect, placebo effect, regression to the mean, effect of a confounder (see below), or co-intervention or bias in outcome measurement.
Q.2. Was Control Group Created Through Random Allocation (Randomisation)? Was Allocation Concealed?
The question has two parts:
First, is there random allocation and second, if yes, is it concealed?
2A. Why do we ask this question?
We ask this question as a follow-up to the first question. If there is a control group, one needs to know whether it is appropriate. How has it been created or assembled? One intuitively appealing method to create a control group is what is called ‘matching’. One control matched in certain characteristics to each case is included in the control group. There are two main problems with this approach. First, we need to match for all prognostic factors; only then can the treatment and control groups be prognostically balanced and the difference between the two groups be attributed to the new treatment. Unfortunately, it is very hard, nearly impossible, to find controls matching each case in all prognostic factors. Even the usual matching of age and sex is so difficult to do that investigators have to accept controls who are 2–5 years younger or older. (Age- and sex-matched controls are very common in case control studies to determine aetiologic factors. Such studies adjust for other factors in analysis by using multivariate statistics. For treatment studies, this approach is not acceptable.)
Second, matching is possible only for the known prognostic factors. Unfortunately, our knowledge with respect to prognostic factors is very limited. This knowledge is not able to explain, in many diseases, even 50 % of the good and bad outcomes. In other words, there are some known and many unknown prognostic factors. In a treatment study, prognostic factors meet all the epidemiologic criteria of ‘confounders’ or ‘confounding variables’ and are often referred to as confounders. You may wonder, if this is the case, how can one balance for unknown prognostic factors? Well, there is a method which can do this, and the method is as simple as coin tossing, but called, in technical language, randomisation (see Chap. 3).
For randomisation to succeed, patient recruitment (or enrolment) into the study should be done without the knowledge of the group (control or treatment) to which the patient will go (be allocated). This is called concealed (or blinded) allocation. The reason is that if the allocation is unconcealed, then those enrolling the patients may systematically recruit sicker – or less sick – patients into one of the study groups. This will either underestimate (if treatment group is sicker) or overestimate (if control group is sicker) the treatment effect, i.e. the study will give a biased result.
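The bias from unconcealed allocation can be illustrated with a simulation. The sketch below makes several assumptions that are not in the text: a severity score between 0 and 1, a chance of a good outcome that falls with severity, a true treatment benefit of 10 percentage points, and a recruiter who, when he knows the next slot is ‘treatment’, passes over any patient with severity above 0.6 and waits for a less sick one.

```python
import random

def run_trial(n, concealed, true_effect=0.10, seed=1):
    """Return the observed difference in good-outcome rates
    (treatment minus control) in a simulated two-group trial."""
    rng = random.Random(seed)
    outcomes = {"treatment": [], "control": []}
    for _ in range(n):
        group = rng.choice(["treatment", "control"])
        severity = rng.random()  # 0 = mild, 1 = very sick
        # Unconcealed: the recruiter knows the next slot and keeps it for a
        # 'good risk' case, passing over very sick patients.
        while not concealed and group == "treatment" and severity > 0.6:
            severity = rng.random()
        p_good = 0.8 - 0.5 * severity
        if group == "treatment":
            p_good += true_effect
        outcomes[group].append(rng.random() < p_good)
    rate = {g: sum(v) / len(v) for g, v in outcomes.items()}
    return rate["treatment"] - rate["control"]

print("concealed:  ", round(run_trial(20000, concealed=True), 3))
print("unconcealed:", round(run_trial(20000, concealed=False), 3))
```

Under these assumptions the concealed trial recovers a difference close to the true 10 points, whereas the unconcealed trial, whose treatment group is less sick, overestimates it. Reversing the recruiter's preference would produce an underestimate instead.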
Sometimes, there is confusion between ‘concealment’ and ‘blinding’ (e.g. of patients and physicians). One way to distinguish between the two is to consider concealment as blinding before initiation of treatment and the usual blinding as blinding after initiation of treatment. Before randomisation, two research steps are necessary: eligibility assessment and consent (see Fig. 4.1). Bias can be introduced in either of these steps.
‘Concealed randomisation’ is to take place before initiation of treatment, whereas ‘blinding’ involves steps at and after initiation of treatment.