Monday, July 6, 2015

Effects of Exercise on Depressive Symptoms in Adults With Arthritis and Other Rheumatic Disease

A Systematic Review of Meta-analyses

George A Kelley; Kristi S Kelley

Abstract

Background. Depression is a major public health problem among adults with arthritis and other rheumatic disease. The purpose of this study was to conduct a systematic review of previous meta-analyses addressing the effects of exercise (aerobic, strength or both) on depressive symptoms in adults with osteoarthritis, rheumatoid arthritis, fibromyalgia and systemic lupus erythematosus.
Methods. Previous meta-analyses of randomized controlled trials were included by searching nine electronic databases and cross-referencing. Methodological quality was assessed using the Assessment of Multiple Systematic Reviews (AMSTAR) Instrument. Random-effects models that included the standardized mean difference (SMD) and 95% confidence intervals (CIs) were reported. The alpha value for statistical significance was set at p ≤ 0.05. The U3 index, number needed to treat (NNT) and number of US people who could benefit were also calculated.
Results. Of the 95 citations initially identified, two aggregate data meta-analyses representing 6 and 19 effect sizes in as many as 870 fibromyalgia participants were included. Methodological quality was 91% and 82%, respectively. Exercise minus control group reductions in depressive symptoms were found for both meta-analyses (SMD, -0.61, 95% CI, -0.99 to -0.23, p = 0.002; SMD, -0.32, 95% CI, -0.53 to -0.12, p = 0.002). Percentile improvements (U3) were equivalent to 22.9 and 12.6. The number needed to treat was 6 and 9 with an estimated 0.83 and 0.56 million US people with fibromyalgia potentially benefitting.
Conclusions. Exercise improves depressive symptoms in adults with fibromyalgia. However, a need exists for additional meta-analytic work on this topic.

Background

Arthritis is a broad term used to describe more than 100 rheumatic diseases and conditions that affect joints as well as the tissues surrounding them.[1] The most common cause of disability in the United States (US), arthritis affects all racial and ethnic groups and is more common in women than men.[1] Based on 2007–2009 data, the prevalence of doctor-diagnosed arthritis in the US was reported to be 50 million, or about 20%, of all adults.[2] In terms of costs, an increase of 41.8 billion dollars in total costs (from 86.2 to 128 billion dollars) was reported between 1997 and 2003 in the US.[3]
Four common types of arthritis and other rheumatic diseases are osteoarthritis, rheumatoid arthritis, fibromyalgia and systemic lupus erythematosus. More specifically, the prevalence of osteoarthritis, rheumatoid arthritis, fibromyalgia and systemic lupus erythematosus has been estimated to be 27 million,[4] 1.5 million,[5] 5 million,[4] and 161,000,[4] respectively. A common problem among adults with arthritis is depression. For example, a recent study that included 1,793 US adults 45 years of age and older with arthritis found that 18% had depression while only slightly more than half (51.3%) sought help for their depression.[6]
One potential treatment option for adults with arthritis and depression is exercise, a low-cost nonpharmacologic intervention that is available to the vast majority of the general population. Systematic reviews with meta-analysis, a quantitative approach for combining the results of different studies on the same topic,[7] are considered by many to be the most important type of evidence for determining the efficacy and effectiveness of various treatments on selected outcomes.[8,9] Unfortunately, with the proliferation of systematic reviews on the same topic, it becomes difficult to make informed decisions regarding the effects of various interventions on selected outcomes. For example, a recent systematic review identified 33 previous meta-analyses examining the effects of exercise on blood pressure.[10] Given the proliferation of systematic reviews, with or without meta-analysis on the same topic, a need now exists to systematically review these previous reviews in order to provide decision-makers and practitioners with the information they need to make evidence-based decisions regarding the efficacy and effectiveness of various interventions on selected outcomes as well as provide researchers with direction for future research.[11] Given the former, the purpose of the current study was to conduct a systematic review of previous meta-analyses addressing the effects of exercise (aerobic, strength training or both) on depressive symptoms in adults with osteoarthritis, rheumatoid arthritis, fibromyalgia or systemic lupus erythematosus.

Methods

Study Eligibility

The a priori inclusion criteria for this study were as follows: (1) previous systematic reviews with meta-analysis of randomized controlled trials or data reported separately for randomized controlled trials if the meta-analysis included other study designs, (2) adults 18 years of age and older with osteoarthritis, rheumatoid arthritis, fibromyalgia or systemic lupus erythematosus, as defined by the inclusion criteria of the authors of the original meta-analyses, (3) aerobic and/or strength training intervention(s) lasting an average of at least 4 weeks, (4) published and unpublished (dissertations and master's theses) studies in any language, and (5) exercise minus control group difference in depressive symptoms as a primary outcome in the original meta-analysis and reported as the standardized mean difference (SMD). Meta-analyses were limited to randomized controlled trials because they are the only way to control for unknown confounders as well as the fact that nonrandomized controlled trials tend to overestimate the effects of treatment in healthcare interventions.[12,13] In addition, meta-analyses in which the focus was on acute studies, for example studies in which participants would perform one or more bouts of exercise and then immediately be assessed for depressive symptoms, were avoided. Given the different instruments used to assess depressive symptoms, inclusion was limited to meta-analyses in which the SMD was reported. Any studies that did not meet all of the above criteria were excluded. Ineligible studies were excluded based on at least one of the following: (1) inappropriate population (for example, children), (2) inappropriate intervention (for example, pharmacologic), (3) inappropriate comparison (for example, exercise versus pharmacologic), (4) inappropriate outcome (for example, anxiety), (5) inappropriate study type (for example, meta-analysis that included non-randomized controlled trials, systematic review without meta-analysis).

Data Sources

Using the graphical user interfaces for each database, the following electronic sources were searched from their inception forward: (1) PubMed (1966 to July 4, 2013), (2) SPORTDiscus (1975 to July 4, 2013), (3) Web of Science (1955 to July 4, 2013), (4) Scopus (1823 to July 4, 2013), (5) ProQuest (1861 to July 4, 2013), (6) Cochrane Database of Systematic Reviews (1996 to July 4, 2013), (7) Physiotherapy Evidence Database (PEDro) (1929 to July 5, 2013), (8) Database of Abstracts of Reviews of Effects (DARE) (1991 to July 5, 2013), (9) Health Evidence Canada (HEC) (1985 to July 5, 2013). Scopus was included in our database searches because it has been reported to provide coverage of Embase, a database that was not available to us.[14] While the specific search strategies varied depending on the database searched, key terms or forms of key terms included exercise, physical activity, physical fitness, arthritis, fibromyalgia, lupus, randomized, depression, systematic review and meta-analysis. A copy of the search strategies used for each database can be found in Additional File 1. After removing duplicates, the overall precision of the searches was calculated by dividing the number of studies included by the total number of studies screened.[15] The number needed to read (NNR) was then calculated as the inverse of the precision.[15] In addition to electronic database searches, cross-referencing for potentially eligible meta-analyses from retrieved reviews was also conducted. All studies were stored in Reference Manager, version 12.0.[16]
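The precision and NNR calculations described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and rounding the NNR up to a whole number of records is an assumption rather than something the text states. The example inputs (2 meta-analyses included out of 69 records screened) are the counts reported in the Results.

```python
import math


def search_precision(included: int, screened: int) -> float:
    """Precision = studies included / studies screened (after removing duplicates)."""
    if screened <= 0:
        raise ValueError("screened must be a positive count")
    return included / screened


def number_needed_to_read(precision: float) -> int:
    """NNR = 1 / precision, rounded up to a whole record (rounding rule is an assumption)."""
    if precision <= 0:
        raise ValueError("precision must be positive")
    return math.ceil(1 / precision)


# Counts taken from the Results: 2 meta-analyses included, 69 records screened.
precision = search_precision(included=2, screened=69)
nnr = number_needed_to_read(precision)
print(f"precision = {precision:.2f}, NNR = {nnr}")
```

With these inputs the sketch gives a precision of about 0.03 and an NNR of 35, in line with the values reported in the Results.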

Study Selection

All studies were selected by both authors, independent of each other. They then met and reviewed their selections for agreement. Any disagreements were resolved by consensus.

Data Abstraction

Prior to data abstraction, coding sheets were developed in Microsoft Excel 2010.[17] The coding sheets could hold up to 193 items from each included meta-analysis. Both authors coded all studies independent of each other. Upon completion of coding, all coding sheets were merged into one common codebook and reviewed by both authors for correctness. Disagreements were resolved by consensus. Using Cohen's kappa statistic,[18] the overall agreement rate prior to correcting discrepancies was k = 0.88, considered to be "almost perfect".[19]

Methodological Quality

Methodological quality for each included meta-analysis was assessed using the Assessment of Multiple Systematic Reviews (AMSTAR) Instrument.[20–23] AMSTAR was chosen over other instruments[24,25] because of its reported inter-rater reliability (k = 0.70), construct validity (intra-class correlation coefficient = 0.84) and feasibility (average of 15 minutes per study to complete).[22] The 11-item questionnaire is designed to elicit responses of "Yes", "No", "Can't Answer", or "Not Applicable". The response "Can't Answer" is chosen when an item is relevant but not described. The response "Not Applicable" is chosen when an item is not relevant (for example, meta-analysis of data not possible).[20–23] For consistency when summing responses, the following question was modified from "Was the status of publication (i.e. grey literature) used as an inclusion criterion?" to "Was the status of publication (i.e. grey literature) as an inclusion criterion avoided?" In addition, we considered the question regarding conflict of interest as adequately met if the authors of the systematic review provided a statement on conflict of interest versus the reporting of conflict of interest by both the authors of the systematic review and all the original studies included in the meta-analysis. Both authors, independent of each other, assessed methodological quality. They then met and reviewed every item for agreement. Disagreements were resolved by consensus. The overall agreement rate prior to correcting discrepancies was k = 0.94, considered to be "almost perfect".[19]

Data Synthesis

The main results from each meta-analysis were extracted a priori[7] with a focus on random-effects models because they incorporate between-study heterogeneity into the model.[26,27] The SMD, 95% confidence intervals (CIs) and associated z and alpha value for z were abstracted or calculated if sufficient data were available to do so. Standardized mean differences were classified as trivial (<0.20), small (0.20 to 0.49), medium (0.50 to 0.79) or large (≥0.80).[28] Two-tailed alpha values ≤ 0.05 along with non-overlapping 95% CIs were considered statistically significant. The Q statistic, a measure of heterogeneity, was also extracted for each outcome. An alpha value ≤ 0.10 was considered to represent statistically significant heterogeneity.[29] Because of issues surrounding the power of the Q statistic, the I² statistic was also reported if it was provided in the meta-analysis or calculated if sufficient data existed to do so.[29] Values were considered to be representative of low (0 to 25%), moderate (25 to 50%), large (50 to 75%) or very large (>75%) inconsistency.[29] In addition to Q and I², tau-squared (τ²) was also reported or calculated if sufficient data were available. An a priori decision was made to not pool results from the different meta-analyses because of the expectation that many of the same studies would be included in the different meta-analyses, thus violating the assumption of independence. Post hoc, a decision was made to pool the results of one included meta-analysis[30] that reported separate results for studies that met the American College of Sports Medicine (ACSM) Guidelines for aerobic fitness[31,32] (4 studies), those that did not meet the recommendation (1 study), and those in which strength training was performed (1 study). The rationale for this decision was based on the desire to examine exercise in a broader context. However, results were also reported separately as was done in the original meta-analysis.[30]
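As a rough illustration of the classification rules in this paragraph, the following Python sketch labels an SMD by magnitude and derives I² from Q. The function names are hypothetical. Two details are assumptions rather than statements from the text: values that fall exactly on a boundary are assigned to the higher band, and I² is computed with the usual formula 100 × (Q − df) / Q, floored at zero, where df is the number of effect sizes minus one.

```python
def classify_smd(smd: float) -> str:
    """Label an SMD by absolute magnitude using the bands given in the text."""
    size = abs(smd)
    if size < 0.20:
        return "trivial"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "medium"
    return "large"


def i_squared(q: float, df: int) -> float:
    """I^2 (%) = 100 * (Q - df) / Q, floored at 0 (standard formula, assumed here)."""
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)


def classify_inconsistency(i2: float) -> str:
    """Label I^2 (%) using the bands given in the text; boundary handling is an assumption."""
    if i2 < 25:
        return "low"
    if i2 < 50:
        return "moderate"
    if i2 <= 75:
        return "large"
    return "very large"


# Overall SMDs reported in the abstract for the two included meta-analyses.
for smd in (-0.61, -0.32):
    print(smd, classify_smd(smd))

# Antidepressant meta-analysis cited in the Discussion: Q = 6.39 across 10 SMDs (df = 9).
i2 = i_squared(q=6.39, df=9)
print(i2, classify_inconsistency(i2))
```

Because Q is smaller than its degrees of freedom in the last example, I² is floored at 0%, which agrees with the value quoted in the Discussion for that meta-analysis.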
Since it was assumed that none of the eligible meta-analyses would include 95% prediction intervals (PIs), these were calculated if the findings were statistically significant and the results from each study included in each meta-analysis were provided.[33–35] Prediction intervals are used to estimate the treatment effect in a new trial[33–35] and may be more appropriate in decision analysis.[36]
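A prediction interval of this kind can be sketched as follows. The text does not state the exact formula used, so this assumes a common formulation: pooled SMD ± t × sqrt(τ² + SE²), with t taken from a t distribution with k − 2 degrees of freedom for k effect sizes. The function names are hypothetical, the t critical value is passed in by the caller, and all numbers in the example are illustrative placeholders rather than data from the included meta-analyses.

```python
import math


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Recover the standard error of a pooled estimate from its 95% CI."""
    return (upper - lower) / (2 * z)


def prediction_interval(pooled_smd: float, se: float, tau2: float, t_crit: float):
    """Approximate 95% PI: pooled SMD +/- t_crit * sqrt(tau^2 + SE^2) (assumed formulation)."""
    if se < 0 or tau2 < 0:
        raise ValueError("se and tau2 must be non-negative")
    half_width = t_crit * math.sqrt(tau2 + se ** 2)
    return pooled_smd - half_width, pooled_smd + half_width


# Illustrative placeholder inputs only (not values from the included meta-analyses):
# k = 6 effect sizes, so df = k - 2 = 4 and the two-tailed 95% t critical value is 2.776.
lower, upper = prediction_interval(pooled_smd=-0.50, se=0.10, tau2=0.04, t_crit=2.776)
print(f"95% PI: {lower:.2f} to {upper:.2f}")
```

Because τ² enters the half-width directly, a PI is never narrower than the corresponding CI when heterogeneity is present, which is why a statistically significant pooled SMD can still have a PI that includes zero.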
In order to enhance practical application, the number needed to treat (NNT) was calculated for any overall findings that were reported as statistically significant. This was accomplished using the approach suggested by the Cochrane Collaboration and was based on a control group risk of 30%.[8] Since both included meta-analyses were limited to depressive symptoms in those with fibromyalgia,[30,37] a post hoc decision was made to provide a gross estimate, based on the NNT results, of the number of US people with fibromyalgia who could improve their depressive symptoms by starting and maintaining a regular exercise program. Estimates were based on previous research showing that approximately 5 million people in the US have fibromyalgia.[4] In addition, Cohen's U3 index was used to determine the percentile gain in the intervention group.[38] For example, a SMD of 0.50 suggests that on average, a person in the exercise group would be at the 69th percentile in terms of improving their depressive symptoms. This translates into being 19 percentiles higher than the control group.[39]
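The U3, NNT and population estimates described above can be approximated with the short Python sketch below. It rests on several assumptions that the text does not spell out: U3 is taken as the standard normal cumulative probability at the absolute SMD; the SMD is converted to an odds ratio with ln(OR) = (π/√3) × SMD; the 30% control group risk is treated as the risk of the unfavorable outcome; and the NNT is rounded to the nearest whole number. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def u3_percentile_gain(smd: float) -> float:
    """Percentile gain over the control group median, from Cohen's U3."""
    return 100.0 * NormalDist().cdf(abs(smd)) - 50.0


def nnt_from_smd(smd: float, control_risk: float = 0.30) -> float:
    """SMD -> odds ratio -> NNT; a negative SMD means fewer depressive symptoms."""
    odds_ratio = math.exp(math.pi / math.sqrt(3) * smd)
    treated_risk = odds_ratio * control_risk / (1 - control_risk + odds_ratio * control_risk)
    risk_difference = control_risk - treated_risk
    if risk_difference <= 0:
        raise ValueError("no risk reduction, so the NNT is undefined")
    return 1 / risk_difference


US_FIBROMYALGIA_POPULATION = 5_000_000  # approximate figure cited in the text

for smd in (-0.61, -0.32):  # overall SMDs reported in the abstract
    gain = u3_percentile_gain(smd)
    nnt = round(nnt_from_smd(smd))  # rounding to nearest is an assumption
    could_benefit = US_FIBROMYALGIA_POPULATION / nnt
    print(f"SMD {smd}: U3 gain {gain:.1f}, NNT {nnt}, "
          f"could benefit {could_benefit / 1e6:.2f} million")
```

Under these assumptions the sketch yields percentile gains of about 22.9 and 12.6, NNTs of 6 and 9, and roughly 0.83 and 0.56 million people, which agrees with the figures given in the abstract. Other SMD-to-NNT conversions exist and would give somewhat different numbers.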
If not already provided and if sufficient data were available to do so, small-study effects (for example, publication bias) were assessed quantitatively using the regression-intercept approach of Egger et al.[40] One-tailed alpha values ≤ 0.05 for t were considered to be representative of statistically significant small-study effects. In addition, influence analysis was conducted with each SMD deleted from the model once. Cumulative meta-analysis, ranked by year, was also conducted to examine results over time.[41] In addition, since one meta-analysis[37] included studies in which there were active control groups (hot packs, stretching, etc.), we used a two-category, mixed-effects ANOVA model to compare for statistically significant differences between active control groups versus all other included controls (usual care and attention control). A between group alpha level ≤ 0.05 was considered to be statistically significant.
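The regression-intercept idea can be illustrated with a minimal Python sketch. It assumes the usual form of the method, in which each standardized effect (SMD divided by its standard error) is regressed on its precision (1 divided by the standard error) using ordinary least squares, and the intercept is taken as the indicator of small-study effects. The function name and the example data are hypothetical, and the sketch returns only the point estimates; the t test on the intercept mentioned in the text is not implemented.

```python
def egger_intercept(effects, standard_errors):
    """OLS of (effect / SE) on (1 / SE); returns (intercept, slope)."""
    if len(effects) != len(standard_errors) or len(effects) < 3:
        raise ValueError("need at least three effects with matching standard errors")
    precision = [1.0 / se for se in standard_errors]
    z_scores = [e / se for e, se in zip(effects, standard_errors)]
    n = len(precision)
    mean_x = sum(precision) / n
    mean_y = sum(z_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    if sxx == 0:
        raise ValueError("standard errors must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(precision, z_scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: the smaller studies (larger SE) show larger benefits,
# which pulls the intercept away from zero.
hypothetical_smds = [-0.20, -0.35, -0.60, -0.90]
hypothetical_ses = [0.10, 0.15, 0.25, 0.40]
b0, b1 = egger_intercept(hypothetical_smds, hypothetical_ses)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

When every study estimates the same effect regardless of its size, the standardized effect is exactly proportional to precision and the intercept is zero; an intercept far from zero suggests that smaller studies report systematically different effects.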
Negative SMDs were indicative of benefit, i.e., decreases in depressive symptoms. Analyses were carried out using Comprehensive Meta-Analysis (version 2.2)[42] and Microsoft Excel 2010.[17]

Results

Characteristics of Included Meta-analyses

Of the 95 citations initially identified, 69 (72.6%) remained after removing duplicates. Of the 69 articles that were screened, two aggregate data meta-analyses, both in participants with fibromyalgia, met the criteria for inclusion.[30,37] Post hoc, one study that was initially included was removed because it was not focused specifically on the effects of exercise on depressive symptoms in adults with a specific type of arthritis or other rheumatic disease, but rather, in adults with a variety of chronic illnesses.[43] The precision of the searches was 0.03 while the NNR was 35. The major reasons for exclusion of ineligible studies were an inappropriate study design (48.4%) followed by an inappropriate intervention (17.9%), population (18.9%), outcome (9.5%) and comparison (6.3%). No meta-analysis was excluded for failing to report its results as a SMD. A flow diagram that depicts the search process can be found in Figure 1 while a list of all excluded studies, including the reasons for exclusion, is shown in Additional File 2.
For the two included meta-analyses,[30,37] one included studies on aerobic or strength training exercise[30] while the second was limited to aerobic exercise studies but also included studies in which participants performed strength training as long as the number of minutes of strength training did not exceed the number of minutes spent performing aerobic exercise.[37] Both meta-analyses included fibromyalgia participants as defined by the diagnostic criteria of the original studies.[30,37] A general description of the characteristics of each meta-analysis is provided in Table 1.

Figure 1.

Flow diagram for the selection of studies. *, number of reasons exceeds the number of studies because some studies were excluded for more than one reason.

Methodological Quality

Item-by-item results for each meta-analysis using the AMSTAR instrument are shown in Additional File 3. The meta-analysis by Busch et al.[30] satisfied 10 of the 11 AMSTAR criteria (91%) while the study by Hauser et al.[37] met 9 of the 11 criteria (82%). One meta-analysis was judged as (1) not avoiding the status of publication as an inclusion criterion and (2) not providing a list of excluded studies.[37]

Data Synthesis

The overall results for both included systematic reviews with meta-analysis are shown in Table 2. As can be seen, SMD reductions in depressive symptoms included non-overlapping confidence intervals for both, with one meta-analysis yielding a small SMD[37] and one yielding a medium SMD.[30] While the overall SMD was approximately twice as large for the Busch et al. meta-analysis,[30] the between-meta-analysis 95% CIs for both were overlapping, suggesting no statistically significant difference between the two studies.[30,37] In addition, the pooled SMD for the Busch et al. meta-analysis was the result of the current investigative team combining the results from those studies meeting the ACSM guidelines for aerobic exercise with the one strength training study and one aerobic study not meeting the ACSM recommendations, as reported by the authors.[30] Statistically significant and large heterogeneity was found for both meta-analyses, as were overlapping 95% PIs.[30,37] Data for the NNT, number who could benefit and percentile improvement are shown in Table 3. No small-study effects were observed for the overall meta-analysis results of Busch et al.[30] (β0, -3.8, 95% CI, -10.9 to 3.3, p = 0.11) while statistically significant small-study effects were observed for the Hauser et al.[37] meta-analysis (β0, -2.4, 95% CI, -4.8 to -0.01, p = 0.02). With each study deleted from the overall model once for each meta-analysis,[30,37] SMD changes in depressive symptoms remained statistically significant with non-overlapping confidence intervals. Results ranged from -0.49 (95% CI, -0.85 to -0.12, p = 0.009) to -0.72 (95% CI, -1.10 to -0.35, p < 0.0001) in the study by Busch et al.,[30] and -0.23 (95% CI, -0.38 to -0.08, p = 0.003) to -0.36 (95% CI, -0.57 to -0.15, p = 0.001) in the study by Hauser et al.[37] Cumulative meta-analysis, ranked by year, revealed that SMD changes in depressive symptoms have remained statistically significant with non-overlapping confidence intervals since 2001 for the meta-analysis by Busch et al.[30] (range of years, 1996 to 2004) and from 2004 onward in the meta-analysis by Hauser et al.[37] (range of years, 1996 to 2009). No statistically significant difference was found between active control groups and the other types of control groups included in the Hauser et al. meta-analysis (Qb = 1.13, p = 0.29).[37]
Subgroup results were provided for both meta-analyses.[30,37] For the Busch et al. meta-analysis,[30] results for depressive symptoms were reported according to studies meeting the ACSM recommendations for aerobic exercise[30,31] (4 studies), those not meeting the recommendations (1 study) and those limited to strength training exercise (1 study). For those studies meeting the ACSM recommendations, a statistically significant reduction in depressive symptoms was observed (SMD, -0.40, 95% CI, -0.76 to -0.04, p = 0.003). The SMD for the study not meeting the recommendations was -1.22 (95% CI, -1.90 to -0.54) while the SMD for the study in which strength training was performed was -1.14 (95% CI, -2.08 to -0.20). Subgroup analyses in the study by Hauser et al.[37] included results partitioned according to two studies in which low intensity exercise (50% to 60% of maximum heart rate) was compared to moderate intensity exercise (60% to 80% of maximum heart rate) as well as eight studies in which land-based exercise was compared to water-based exercise. The authors reported no statistically significant differences between low and moderate intensity exercise (SMD, -0.16, 95% CI, -0.67 to 0.13, p = 0.53) or between land-based and water-based exercise (SMD, -0.44, 95% CI, -0.88 to 0.01, p = 0.05).

Discussion

Findings

The purpose of the current study was to conduct a systematic review of previous meta-analyses addressing the effects of exercise (aerobic, strength training or both) in the treatment of depressive symptoms in adults with osteoarthritis, rheumatoid arthritis, fibromyalgia or systemic lupus erythematosus. While no meta-analyses were included for participants with osteoarthritis, rheumatoid arthritis or systemic lupus erythematosus, two meta-analyses in fibromyalgia participants met the eligibility criteria.[30,37] Generally speaking, it appears that exercise can reduce depressive symptoms in adults with fibromyalgia. This interpretation is supported by (1) the non-overlapping confidence intervals for SMDs, (2) statistical significance of the SMDs, (3) sensitivity of results with each study deleted from the model once, (4) cumulative meta-analysis, (5) low NNT, (6) absolute number of people in the US who might benefit by starting and maintaining an exercise program, (7) percentile improvements as a result of exercise and (8) good overall methodological quality of each meta-analysis as assessed by the AMSTAR instrument. However, for both,[30,37] a statistically significant and relatively large amount of heterogeneity was observed as well as overlapping prediction intervals. In addition, small-study effects were found for the Hauser et al.[37] meta-analysis and, based on the work of others,[44] the test for small-study effects may have been underpowered for the Busch et al.[30] meta-analysis. Consequently, the strength of the overall findings may be weakened by these results.
The overall findings of the included meta-analyses compare quite favorably to the effects of pharmacologic interventions on depressive symptoms in adults with fibromyalgia. For example, Hauser et al.[45] conducted a meta-analysis of randomized controlled trials on the effects of antidepressants (tricyclic and tetracyclic antidepressants, selective serotonin reuptake inhibitors, serotonin and noradrenaline reuptake inhibitors, monoamine oxidase inhibitors) on depressed mood in adults with fibromyalgia. Across 10 SMDs that included 887 participants (451 treatment, 436 control) a small SMD improvement of -0.26 (95% CI, -0.39 to -0.12, p < 0.001) was reported. However, in contrast to the exercise meta-analyses included in the current study,[30,37] no statistically significant heterogeneity was observed (Q = 6.39, p = 0.70, I² = 0%). Thus, while the effects of antidepressants were generally smaller,[45] the results were more consistent than the two exercise meta-analyses included in the current study.[30,37] The former notwithstanding, one should also consider the potential side-effects and costs of any type of pharmacotherapy, including antidepressants.

Implications for Research

The results of the current systematic review of previous meta-analyses have at least eight implications for future research. First, while the overall quality of both meta-analyses was considered to be good, there are several areas that might be improved upon in future meta-analytic work. These include avoiding the use of publication status as an inclusion criterion as well as documenting and providing a list of not only included studies but also excluded studies, including the reasons for exclusion. While there is little doubt in the investigators' minds regarding the latter recommendation, avoiding the use of publication status as an inclusion criterion could be questioned. For example, van Driel et al. suggested that (1) the difficulty in retrieving unpublished work could lead to selection bias, (2) many unpublished trials are eventually published, (3) the methodological quality of such studies is poorer than that of published studies, and (4) the effort and resources required to obtain unpublished work may not be warranted.[46]
Second, both included studies were aggregate data meta-analyses.[30,37] While aggregate data meta-analysis is still the most common type of meta-analysis, individual-participant data (IPD) meta-analyses have been suggested to be the gold standard when attempting to quantitatively combine data from different studies on the same topic.[47] Thus, future meta-analysts may want to consider using the IPD approach when addressing the effects of exercise in the treatment of depressive symptoms in adults with arthritis and other rheumatic diseases. However, the use of the IPD approach needs to be weighed against the ability to retrieve IPD from investigators as well as the increased costs and time associated with conducting such analyses.[48]
Third, given the apparent paucity of data available on adverse events and cost-effectiveness in the original studies included in both meta-analyses,[30,37] there is a need for future randomized controlled trials to collect and report this information. The inclusion of such information is critical when making decisions regarding which interventions to recommend over others.
Fourth, the dose–response effects of exercise on depressive symptoms in adults with fibromyalgia are still unknown. While the meta-analysis by Hauser et al. found no statistically significant differences between low and moderate intensity aerobic exercise or between land-based and water-based exercise,[37] future research in this area appears warranted. Greater knowledge of the dose–response effects of exercise on depressive symptoms in adults with fibromyalgia should lead to better treatment in this population.
Fifth, no meta-analysis that was limited to the effects of exercise on depressive symptoms in adults with osteoarthritis, rheumatoid arthritis or systemic lupus erythematosus met the eligibility criteria for the current study. Since the effects of exercise on depressive symptoms may vary across different populations, it appears plausible to suggest that future meta-analytic work focus specifically on these groups. This is of course assuming that previous randomized controlled trials have assessed depressive symptoms in these populations.
Sixth, because neither meta-analysis reported NNT with respect to depressive symptoms,[30,37] it is suggested that future meta-analytic work include such. From the investigators' perspective, the inclusion of such information is important because it provides practically relevant information to decision-makers (practitioners, policy-makers, etc.) regarding the effects of exercise on depressive symptoms in adults with fibromyalgia.
Seventh, given the significant heterogeneity in the included meta-analyses, future meta-analytic research on depressive symptoms in adults with fibromyalgia should try to identify the sources of this heterogeneity. Broadly, this may include such things as participant characteristics (for example, age, gender), intervention characteristics (for example, length, frequency, intensity, duration, mode) and outcome assessment methods (for example, type of instrument used to assess depression). Again, this is of course assuming that sufficient data are available to examine these potential predictors.
Eighth, the majority of the participants that comprised both meta-analyses were women.[30,37] The inclusion of primarily women for the studies nested within each meta-analysis appears plausible given that the prevalence of fibromyalgia is greater in women than men.[4] However, it would appear appropriate to suggest that future research examine the effects of exercise on depressive symptoms in men to ensure that no differences in response exist.

Implications for Practice

The results of the current systematic review of previous meta-analyses have important implications for practice. First, while there was a lack of adverse event and cost-effectiveness data as well as substantial between-study heterogeneity in both meta-analyses,[30,37] exercise appears to improve depressive symptoms in adults with fibromyalgia and could be recommended as part of an overall treatment plan that may also include education and/or pharmacotherapy. This exercise recommendation is consistent with previous recommendations on aerobic and strength training for a variety of outcomes in adults with fibromyalgia.[49,50] Second, while the dose–response effects of exercise in the treatment of depressive symptoms in adults with fibromyalgia have not been firmly established,[51] it would appear prudent to recommend that practitioners follow the general recommendations described by Skinner.[51] These include exercise programs that (1) minimize any increase in pain, fatigue or other symptoms, (2) begin at a low level and progress gradually, (3) allow for day to day variations based on how the participant feels, (4) improve the physiological and psychological functioning of the participant and (5) promote long-term adherence.[51] More specifically, a program of low to moderate intensity aerobic exercise (walking and swimming for example) combined with low to moderate intensity strength training may be better tolerated than high intensity activity.[51] Given the day to day variation in how fibromyalgia participants may feel, intensity may be better monitored using something like rating of perceived exertion scales[52–55] versus a percentage of 1-repetition maximum (strength training) and maximum heart rate, heart rate reserve or percentage of maximum oxygen consumption (aerobic training).[51]

Strengths and Potential Limitations of Current Study

There are at least five strengths of the current study. First, to the best of the authors' knowledge, this is the first systematic review of previous meta-analyses that has examined the effects of exercise on depressive symptoms in adults with arthritis and other rheumatic disease, an increasingly important approach for addressing the effects of various healthcare interventions and making subsequent decisions regarding such.[11] Second, the additional analyses conducted based on the available data (small-study effects, influence analysis, NNT, etc.) helped strengthen the information from which conclusions could be drawn from both included meta-analyses.[30,37] Third, the calculation and inclusion of PIs for the overall results from each included meta-analysis provides investigators with information that can aid them in planning future randomized controlled trials examining the effects of exercise on depressive symptoms in adults with fibromyalgia. Fourth, the investigative team believes that the calculation of percentile improvements, NNT and gross estimates of the absolute number of adults with fibromyalgia who could improve their depressive symptoms by initiating and maintaining a program of regular exercise enhances the practical applicability and importance of findings. Fifth, while the inclusion of only two meta-analyses may initially appear to be a limitation of the current study, the investigators view this as a strength given that as many as 32 SMDs on depressive symptoms in as many as 870 participants across multiple studies were included. To put this in perspective, the Cochrane Collaboration suggests that the minimum number of studies needed to conduct a meta-analysis is two.[8] Given this line of thinking, it would appear plausible to suggest that the minimum number of meta-analyses that need to be included in a study of systematic reviews with meta-analyses is one.
While there are several strengths of the current study, there are also at least four potential limitations. First, the investigative team focused on depressive symptoms.[30,37] While more focused and applicable, other relevant outcomes (anxiety, quality of life, quality of sleep, pain, fatigue, stiffness, physical function) were not captured. Second, the gross population estimates for the number of people in the US with fibromyalgia who could improve their depressive symptoms by beginning and maintaining a regular exercise program assumed that none of those with fibromyalgia in the US exercise on a regular basis. Unfortunately, the investigative team is not aware of any current research on the prevalence of physical activity in US adults with fibromyalgia, and thus, was unable to adjust for such. In addition, it was not possible to adjust for any other potentially confounding factors (for example, age). Therefore, the reported estimates might be inflated. Third, as with any systematic review, many of the biases inherent in both the included meta-analyses as well as the randomized controlled trials that comprised each meta-analysis may have also been present in the current study. Fourth, while results were generalized to both men and women, the majority of participants included in both meta-analyses were women.[30,37] Thus, such a generalization may have been inappropriate.
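The percentile improvements (U3) listed among the strengths can be reproduced from the pooled SMDs reported in the abstract. The following is a minimal Python sketch, assuming that the U3 index is taken as the standard normal cumulative distribution function evaluated at the absolute SMD and that the percentile improvement is the distance of that value from the 50th percentile. The function names are illustrative and do not come from the original meta-analyses.

```python
import math


def u3_index(smd):
    """Cohen's U3 under a normal model: the normal CDF evaluated at |SMD|.

    Interpreted here as the proportion of the control distribution that the
    average exercise participant outperforms (assumption stated in the lead-in).
    """
    return 0.5 * (1.0 + math.erf(abs(smd) / math.sqrt(2.0)))


def percentile_improvement(smd):
    """Percentile points gained relative to the 50th percentile."""
    return (u3_index(smd) - 0.5) * 100.0


# Pooled SMDs for the two included meta-analyses, as given in the abstract.
for smd in (-0.61, -0.32):
    print(f"SMD {smd:+.2f}: percentile improvement = {percentile_improvement(smd):.1f}")
```

Under these assumptions the two SMDs correspond to improvements of roughly 22.9 and 12.6 percentile points, in line with the values reported in the abstract.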

Conclusions

The results of the current systematic review of previous meta-analyses suggest that exercise improves depressive symptoms in adults with fibromyalgia. However, a need exists for additional meta-analytic work in this area, including, but not limited to, the inclusion of adults with osteoarthritis, rheumatoid arthritis and systemic lupus erythematosus.



Predicting Response to Physiotherapy Treatment for Musculoskeletal Shoulder Pain

A Systematic Review

Rachel Chester; Lee Shepstone; Helena Daniell; David Sweeting; Jeremy Lewis; Christina Jerosch-Herold


Abstract

Background People suffering from musculoskeletal shoulder pain are frequently referred to physiotherapy. Physiotherapy generally involves a multimodal approach to management that may include exercise, manual therapy and techniques to reduce pain. At present it is not possible to predict which patients will respond positively to physiotherapy treatment. The purpose of this systematic review was to identify which prognostic factors are associated with the outcome of physiotherapy in the management of musculoskeletal shoulder pain.
Methods A comprehensive search was undertaken of Ovid Medline, EMBASE, CINAHL and AMED (from inception to January 2013). Prospective studies of participants with shoulder pain receiving physiotherapy which investigated the association between baseline prognostic factors and change in pain and function over time were included. Study selection, data extraction and appraisal of study quality were undertaken by two independent assessors. Quality criteria were selected from previously published guidelines to form a checklist of 24 items. The study protocol was prospectively registered onto the International Prospective Register of Systematic Reviews.
Results A total of 5023 titles were retrieved and screened for eligibility; 154 articles were assessed as full text and 16 met the inclusion criteria: 11 cohort studies, 3 randomised controlled trials and 2 controlled trials. Results were presented for the 9 studies meeting 13 or more of the 24 quality criteria. Clinical and statistical heterogeneity resulted in qualitative synthesis rather than meta-analysis. Three studies demonstrated that high functional disability at baseline was associated with poor functional outcome (p ≤ 0.05). Four studies demonstrated a significant association (p ≤ 0.05) between longer duration of shoulder pain and poorer outcome. Three studies demonstrated a significant association (p ≤ 0.05) between increasing age and poorer function; three studies demonstrated no association (p > 0.05).
Conclusion Associations between prognostic factors and outcome were often inconsistent between studies. This may be due to clinical heterogeneity or type II errors. Only two baseline prognostic factors demonstrated a consistent association with outcome in two or more studies: duration of shoulder pain and baseline function. Prior to developing a predictive model for the outcome of physiotherapy treatment for shoulder pain, a large adequately powered prospective cohort study is required in which a broad range of prognostic factors are incorporated.

Background

Shoulder pain has a lifetime prevalence of one in three[1] and is the third most common musculoskeletal condition presenting in primary care.[2] However just 50% of people referred to primary care with first episode shoulder pain show complete recovery within six months, rising to only sixty percent after one year.[3]
Shoulder pain is one of the most common musculoskeletal disorders in the working population.[4] In 2011–2012, for the first time in Great Britain, the prevalence of work related upper limb disorders exceeded those of low back pain.[5]
The most effective treatment for musculoskeletal shoulder pain is not known. Reports indicate that up to one third of patients referred to physiotherapy musculoskeletal outpatient services have shoulder pain.[6] However, clear indicators of who will and will not respond favourably to physiotherapy treatment are currently unavailable. When physiotherapy is unsuccessful, other interventions are often considered. However, for some patients, the time spent in an unsuccessful course of physiotherapy may delay referral along another, possibly more appropriate pathway. This increases the likelihood of chronic pain and reduces the effectiveness of future interventions.[5]
The exact cost of shoulder pain to healthcare and the economy is unclear. Studies in the Netherlands[7] and Sweden[8] have demonstrated that 12[7] to 22[8] percent of patients who visit primary care with shoulder pain incur between 74 and 91 percent of the total cost respectively; a relatively small percentage of patients incur a high percentage of the cost. This suggests that for some patients there may be a more effective and efficient management pathway for the resolution of shoulder pain. Between 47[7] and 84 percent[8] of the total incurred cost is related to sickness absence. These same studies demonstrated that physiotherapy accounted for between 37 percent[7] and 60 percent[8] of the mean total healthcare cost. Those patients that used direct access to physiotherapy had lower healthcare and overall costs to the economy.[8] This comparatively low cost, non-invasive resource is therefore an obvious choice as a first line treatment for shoulder pain. However, a greater knowledge of prognostic factors in terms of who is likely to respond to physiotherapy and who will not is vital for patients, healthcare professionals and commissioners and ensures effective and efficient use of limited resources. Referral to physiotherapy for patients who respond favourably will be of considerable benefit. However for those patients who do not respond favourably to physiotherapy, delayed referral along a more effective pathway may be costly. A review of previous research has suggested that a range of biopsychosocial factors are related to outcome following General Practitioner management of shoulder pain.[9] The objective of this systematic review was to identify which prognostic factors are associated with the outcome from physiotherapy treatment for musculoskeletal shoulder pain. Primary outcomes of interest were functional recovery and pain over any time period.

Methods

A systematic review was undertaken. The study protocol was published in advance and may be viewed on the International prospective register of systematic reviews (PROSPERO) (Submitted 21 December 2011, Registration number CRD42011001719, http://www.crd.york.ac.uk/PROSPERO/display_record.asp?ID=CRD42011001719).
PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses)[10] guidelines were followed.

Search Strategy

Medline, EMBASE, CINAHL and AMED were searched via Ovid using the NHS electronic library from inception to January 2013 using medical subject headings (MeSH), text terms and Boolean operators (RC). The full Medline search strategy is presented in Additional file 1. Search terms were adapted for the other databases. No language limits were applied. Reference lists of eligible publications were hand searched.

Study Selection

Two independent reviewers (RC and DS, RC and HD) evaluated all retrieved titles, and abstracts if required, against the pre-defined eligibility criteria. All potentially eligible publications were retrieved in full text and independently evaluated by two reviewers (RC and DS; RC and HD).
To be included in this review study participants had to have received physiotherapy for the management of musculoskeletal shoulder pain. Reports had to be published, at least in part, in a peer reviewed journal.

Study Design

Prospective studies of the following designs were included: i) longitudinal cohort studies, ii) controlled trials which carried out a subgroup analysis relating outcome in one or more arms of physiotherapy treatment to baseline variables and iii) controlled trials in which two or more groups of subjects, different at baseline, received the same physiotherapy treatment/package.
Controlled trials in which two or more groups of participants received (i) different forms of management, not all of which were physiotherapy, and (ii) prognostic factors were presented for all participants, such that prognostic factors for physiotherapy were not differentiated from those for other treatment group(s), were not included. Studies in which retrospective collection of prognostic factors took place were not included.

Participants

Studies could include participants of any age, with musculoskeletal shoulder pain of any duration. Studies in which more than 20% of participants presented post operatively, post fracture or traumatic dislocation or with pathologies or syndromes which referred directly to the shoulder from other regions were excluded. Studies that included anatomical regions in addition to the shoulder but did not report results for the shoulder as a distinct anatomical region were excluded.

Physiotherapy Interventions

Participants must have received at least one session of physiotherapy, delivered by a physiotherapist and involving some direct clinical contact. Ideally all participants should have received a full course of physiotherapy; however this was likely to exclude a high proportion of valuable studies.

Prognostic Factors

Potential prognostic factors had to be collected at baseline and had to include one or more of the following: individual participant characteristics, lifestyle, psychosocial factors, past experience and expectations of physiotherapy, shoulder symptoms and general health, signs of impairment from the objective/clinical examination, activity and participation, radiological imaging. Blood tests, surgical and arthroscopic findings are not usually undertaken prior to commencing physiotherapy and were therefore not considered as prognostic factors in this review.

Outcome Measures

Studies which included any of the following outcome measures at any time point (including time to resolution of outcome) were included: pain, functional/disability scores measured by self-administered validated questionnaires, adverse events, Constant score,[11] quality of life scores, return to work/days off work, range of shoulder movement and shoulder strength.

Data Extraction

Data from each included study were entered onto a custom designed data extraction form (Additional file 2) by two independent reviewers (RC and DS; RC and HD). The form was developed by RC, pilot tested by RC and DS on five studies, and after discussion with all reviewers, refined accordingly. The form included criteria relating to study design and setting, participant characteristics, physiotherapy treatment details, outcome measures and prognostic factors as well as factors relating to study quality and risk of bias. When more than one published paper reported results for the same group of participants all were utilized to gain information. If further clarity was required, attempts were made to contact the original authors.

Quality Assessment of External Validity, Risk of Bias, and Presentation of Results

To the authors' knowledge there is as yet no recommended validated tool for the assessment of quality in reviews of prognostic research using a variety of study designs. In addition, none of the tools identified assessed all the criteria necessary to address the objective of our review. Selection of criteria was therefore based on guidelines published by Hayden et al.,[12] Downs and Black,[13] the Newcastle-Ottawa Scale,[14] relevant PEDro items,[15] criteria previously used by Kuijpers[9] and additional clinical items which may have presented a risk of bias or limited the transferability of findings. These criteria formed a checklist (Additional file 3), each item being referenced to its original source(s), and against which each study was independently assessed by two reviewers (RC and DS; RC and HD). Twenty four items covering 3 domains were included: transferability of findings (A, B, C, L-P), potential for bias (D-P) and reporting quality (Q-Y).

Results

Study Selection

The results of the search strategy are presented in the PRISMA flow diagram in Figure 1. A total of 16 publications were included in the final review. One study included more than one anatomical region and assessed prognostic indicators for conservative management generally rather than physiotherapy specifically.[16] One of the authors revisited study data specifically for this review and provided results for those participants with shoulder pain who had received physiotherapy [Personal communications: Palmer K and Ntani G, University of Southampton, 2012].
Figure 1.
PRISMA Flow chart outlining the literature search and study selection.

Summary Measures

Results are presented for each study and grouped according to outcome measure. Where results are presented in different formats within the same subheadings or full details omitted, this is because further details were unavailable.
Where available all statistical details of multiple regression analysis are tabulated. In view of the high number of potential prognostic factors investigated on univariate analysis and the variation in the measurement tools and categories used, full statistical details of univariate analysis are not included. Instead, to aid comparison between studies, prognostic factors which were investigated but not statistically significant within the final multiple regression analysis are listed and divided into two sections based on whether or not the probability of a random error on univariate analysis was 10% or less.
For studies that divided participants into two or more groups according to i) baseline characteristics[17,18] or ii) successful versus unsuccessful outcome,[19] mean differences plus standard deviation and/or 95% confidence intervals (CI) for each group, and if available between groups are presented. Where studies have performed accuracy statistics for a clinical prediction rule, details of the former are presented.[20,21]
In view of heterogeneity on a number of levels (study design, characteristics of shoulder pain, physiotherapy treatment, prognostic factors, outcome factors and selection of measurement tools), this review provides a best evidence synthesis rather than meta-analysis. Predictive factors demonstrated to have a statistically significant association with outcome on multiple regression analysis (or equivalent) in two or more studies are summarised.

Study Characteristics

Study design, participant and physiotherapy treatment characteristics are outlined for each study in Table 1.
Study Design Of the 16 studies finally selected for the review, eleven were cohort studies and five were controlled trials. Three of the controlled trials randomized participants into 2 or more groups, all of whom received some form of physiotherapy;[18,22,23] two divided participants into two groups according to differences in baseline characteristics and administered the same physiotherapy treatment to both groups.[17,24]
Classification of shoulder pain Clinical eligibility criteria were provided in enough detail to allow transferability of findings to clinical practice in 11 of the 16 studies. However a common omission was clarification that somatic referred pain from the cervical spine, distinct from radiculopathy, was excluded as a source of shoulder pain; one study[21] excluded patients with nerve root signs and another excluded patients with cervical spondylosis,[24] three studies[19,22,23] stated that the cervical spine was excluded as a source of referral, but only one study[22] stated the mechanism by which this decision was made. One study purposely did not exclude participants with cervical spine pathology.[25] Five studies only included participants with adhesive capsulitis,[18,19,24–26] four studies only included participants with subacromial impingement syndrome,[20,22,23,27] one study only included participants with posterior inferior instability of the shoulder[17] and one study only included participants with a positive posterior impingement sign and the presence of a posterosuperior glenoid labral lesion on MRI.[28] One study[29] used the International Classification of Diseases (ICD-9) codes[30] to divide "musculoskeletal shoulder pain" into 8 disease categories. The authors themselves report ICD-9 codes as lacking specificity and reliability, yet rather than report comprehensive results for their full cohort, only report results for these disease specific categories. Within each sub-group of shoulder classification, no two studies used the same eligibility criteria. Five studies[16,21,29,31,32] did not sub-categorize shoulder pain using a clinical diagnosis, all providing minimal details of eligibility criteria. However these results are transferable to the wider range of patients.
Physiotherapy Treatment The number of participants receiving physiotherapy treatment ranged from 14[22] to 5252.[31] Of the 13 (of the total of 16) studies that reported any details of physiotherapy, treatment included home exercises (n = 10), supervised exercises (n = 9), exercises (unable to determine whether supervised or at home, n = 2), manual therapy to the shoulder (n = 7), treatment applied to the spine (n = 1) and electrotherapy (n = 4). Prognostic factors and outcomes varied across studies.

Quality Assessment of External Validity, Risk of Bias, and Presentation of Results

The assessment of study quality based on the 24 items (Additional file 3) is presented in Table 2. Over two thirds of studies identified a priori and reported baseline prognostic factors and outcome measures using standardized measurement tools, and reported percentage loss to follow up. None of the studies stated whether outcome assessors, including participants completing patient rated questionnaires, were blind to baseline prognostic variables.
Population Representation at Baseline Proportional eligibility was often stated but only four studies explicitly reported recruitment rate in proportion to those eligible and/or invited onto the study.[16,18,23,32] Research investigating areas other than shoulder pain has identified differences in baseline characteristics between potential participants who consent and do not consent.[33–35] One study[32] within this review compared demographic variables between participants and non-participants and found no difference between groups with respect to age and sex, although non-participants had a longer duration of symptoms than participants (381 v 229 days, p = 0.07). Generally baseline information for potential participants who do not consent is by definition restricted, making comparisons at best limited.
Appointment Attendance and Exercise Compliance There is evidence that treatment adherence is correlated with a better treatment outcome.[36,37] The number of participants not completing the full course of physiotherapy was either not stated or below 80% in nine of the 16 studies. One study[31] within this review investigated and demonstrated an association between good appointment attendance and better outcome (n = 5252, p < 0.001). Home exercises were prescribed in ten studies; six reported rates of compliance.[17–19,22,25,31] Two studies within this review investigated the association between home exercise compliance and outcome. Deutscher[31] demonstrated that good home exercise compliance was the joint second most predictive variable for a better outcome (p < 0.001). Tanaka[18] demonstrated a significant improvement in range of abduction and over a shorter time period for participants performing their home exercises daily in comparison to those not doing them at all (p < 0.001). A shorter time period to full improvement was also demonstrated for those who exercised daily in comparison to several times a week (p < 0.017). The study by Tanaka was the only one to explicitly state whether participants received additional treatment to the package defined at onset.[18] Appointment attendance and compliance with home exercises should be recorded and analyzed as possible interactions when investigating the correlation between baseline prognostic factors and treatment outcome.
Presentation of results Presentation of results varied considerably. Only two studies[28,29] included a power analysis, one of which was retrospective.[29] Some studies included within the review may have suffered from a type I or type II error. A number of studies demonstrated clear trends between some prognostic factors and outcome which were not statistically significant. Seven studies[22,24–29] omitted details of random variability and measures of association between prognostic variables and outcome (or differences between prognostic groups). The material available to present in our results section for these studies was therefore minimal. In addition four of these studies did not report precise p-values so that the probability of any association (or differences between prognostic groups) being due to chance was not available if more than 5%.[22,24,27,29] None of these studies reported on more than three of nine items assessed within our quality criteria specifically for reporting results. With two exceptions,[22,29] these same studies did not report on more than half the items within our quality assessment criteria specifically selected for external validity and potential for bias (Table 1 and Additional file 3). The limited results described within these seven studies were therefore not reported. The remaining nine studies met between 13.5 and 18 of the 24 criteria. Within the relevant subheadings, studies meeting the highest number of quality criteria are presented first.
Loss to follow up It was important to ascertain whether participants lost to follow up were a random subset of the whole or if there was a systematic difference between groups which if ignored may affect outcome.[38] In 12 of 16 studies for which it was reported, loss to follow up ranged from 0% to approximately 61%. Patients who completed and did not complete final follow up were compared on baseline characteristics in four studies.[16,23,29,31] Two of these studies only included participants who had completed physiotherapy and provided discharge data; this was from a larger group of patients attending physiotherapy whose details had been captured on an electronic database at the start of physiotherapy.[29,31] The high loss to follow up in these two studies is probably reflective of this mechanism of participant selection. The 43–53% (depending on outcome) of patients in Sindhu et al's study who did not complete physiotherapy and/or were lost to follow up at discharge, and therefore not selected for the study, were significantly different (p < 0.05) from those completing physiotherapy and available for follow up at discharge.[29] This was in terms of age, geographic region, pain intensity and function at intake; directional details are not provided. Patients not completing physiotherapy and lost to follow up for a range of conditions in addition to the shoulder in Deutscher et al's[31] study (approximately 61% of patients) were more likely to have a history of 90 or more days of pain and more co-morbidities than those completing physiotherapy and not lost to follow up (p < 0.001). Of the 10 participants lost to follow up in Engebretsen et al's study,[23] 80% (n = 8) were not working at baseline in comparison with 25% (n = 24) of those who completed the one year follow up. Participants lost to follow up were slightly older (57 versus 49 years) and had a higher mean SPADI score at baseline (56 versus 49) compared with the study group as a whole. Ryall[16] reported no significant differences (p > 0.05) in age, gender, somatising tendency, and scores for anxiety, depression, hypochondriasis and health beliefs for those available and not available to follow up. How much the results of these studies reflect the profile of participants lost to follow up in similar studies cannot be gauged.

Results From Individual Studies

Prognostic factors were reported for six different outcome categories. A number of individual studies investigated over 15 prognostic factors.[16,21,23,31,32] For each outcome measure, results of studies meeting the highest number of quality assessment criteria will be presented first. Predictive factors found to have a significant association with any of these outcomes (on multiple regression analysis or equivalent) in two or more of the studies will be summarised in the section following.
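The presentation rule described above (only studies meeting the quality threshold, ordered from the highest to the lowest number of criteria met) can be sketched in Python. This is an illustrative sketch only: the scores are those quoted in this review for each reference number, the cut-off of 13 of 24 criteria is taken from the abstract, and breaking ties by reference number is an assumption made here for determinism rather than a rule stated by the authors.

```python
# Quality criteria met (out of 24), keyed by reference number as quoted in the text.
quality_scores = {
    23: 18, 21: 18,
    31: 16, 20: 16,
    17: 15.5, 19: 15.5,
    32: 14, 18: 14,
    16: 13.5,
}

QUALITY_THRESHOLD = 13  # "13 or more of the 24 quality criteria"


def presentation_order(scores, threshold=QUALITY_THRESHOLD):
    """Return reference numbers meeting the threshold, highest score first.

    Ties are broken by ascending reference number (an assumption made
    for this sketch, not a rule taken from the review).
    """
    eligible = {ref: score for ref, score in scores.items() if score >= threshold}
    return sorted(eligible, key=lambda ref: (-eligible[ref], ref))


print(presentation_order(quality_scores))
```

Within each outcome subsection only the studies reporting that outcome would appear, so the same function could be applied to a subset of the dictionary.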
Patient-rated Functional Outcome Five of the nine studies within this review for which results for patient rated functional outcomes are reported, used a total of seven different questionnaires, none of which were used by more than one study.
One study, meeting 18 of our 24 quality assessment criteria, investigated the potential predictive value of approximately 16 baseline characteristics for the shoulder pain and disability index (SPADI)[39] at one year follow up.[23] Univariate linear regression identified 11 possible predictors (p < 0.1), only three of which were retained in the final backward multiple regression model (Additional file 4). Lower education, previous shoulder pain and high baseline SPADI predicted poor outcome and accounted for 30% of the variance in the final SPADI score at one year.
One study[31] meeting 16 of our quality assessment criteria investigated the association of approximately 22 baseline characteristics with the computerized adaptive test (FT-CAT)[40] at discharge. Only statistically significant results were presented in their multiple regression analysis (Additional file 5). In this table ß is "the coefficient that represents the amount of expected change in discharge [FT-CAT] given a 1-unit change in the value of the variable, given that all other variables in the model are held constant".[31] These factors accounted for 30% of the variance in FT-CAT at discharge; a negative beta value (ß) is associated with a poor outcome and a positive beta value (ß) is associated with a better outcome.
One study[17] meeting 15.5 of our quality assessment criteria divided participants with posterior inferior instability of the shoulder (n = 81) into those with (n = 33) and without a painful jerk test (n = 48).[41] This test involves stabilising the scapula and concurrently applying an axial force along the humerus whilst the shoulder is placed in 90 degrees abduction and internally rotated. The arm is then horizontally adducted whilst maintaining the axial load. A clunk is indicative of the humeral head sliding off the back of the glenoid and is the criterion for a positive test; a second clunk may be observed as the arm is returned to the start position and the humeral head relocates.[17] Clinically and statistically significant improvements (Mann–Whitney U test, p < 0.001) were demonstrated in functional status measured by the i) Rowe Score for Instability,[42] ii) University of California-Los Angeles Shoulder Scale (UCLA)[43] and iii) modified American Shoulder and Elbow Surgeons Shoulder Index (ASES)[44] for participants with a painless compared with a painful jerk test following a 6 month rehabilitation programme (Additional file 6).
Another study[19] meeting 15.5 of our quality assessment criteria divided participants into improvers and non-improvers based upon a positive change of more than 20% on the Flexilevel Scale of Shoulder Function (FLEX-SF)[45] over a 3 month rehabilitation period. Two of three movements of the shoulder complex, detectable by clinical examination (rather than laboratory testing) were significantly different between groups and were included within a clinical prediction model; humeral elevation > 97° and external rotation > 39° at baseline were associated with successful treatment (Additional file 7).
One study[32] meeting 14 of our quality assessment criteria investigated the association between approximately 24 baseline characteristics and i) final Disability of the Arm, Shoulder and Hand (DASH) scores[46] and ii) change in DASH scores 12 weeks after commencing physiotherapy. Twenty one factors were significant (<10% probability of chance) on univariate analysis and advanced to the final multiple regression models (Additional file 8). There is some inconsistency of reporting; the authors' narrative summary states that being female is a predictor of greater disability at discharge, yet the statistical presentation of results suggests the opposite; that being female is a predictor of lower DASH score (i.e. better function) at discharge. Similarly higher pain intensity and previous shoulder surgery appear to be statistically predictive of deterioration yet are reported as predictors of improvement. Only one of five predictive factors, younger age, was common to both outcomes, and predicted a better outcome. This highlights that seemingly similar outcomes can have associations with very different predictive factors.[32]
Global Impression of Change (GROC) Two studies meeting 18[21] and 16[20] of our quality assessment criteria investigated whether treatment success, based on a score of +4 on the 15 point Patient Global Rating of Change (GROC)[47] was associated with approximately 27 and approximately 12 baseline measures respectively. Following logistic regression analysis, Mintken[21] included five factors within a clinical prediction rule developed to identify the patients most likely to improve after 1–2 treatments of cervico-thoracic manipulation. Three of these factors (duration of shoulder pain, range of shoulder flexion and internal rotation) were also investigated in a smaller study by Hung;[20] no association with successful treatment (P ≥ 0.3) was demonstrated (Additional file 9). Hung[20] associated successful treatment with reduced strength of the humeral external rotators (p = 0.076), serratus anterior (p = 0.040) and lower function, indicated by lower FLEX-SF scores[45] (p < 0.00005) at baseline. In their final model only the latter two were included together with an additional measurement from laboratory testing. These factors were not investigated by Mintken.[21]
Pain Two studies[17,16] meeting 15.5 and 13.5 of our quality assessment criteria respectively investigated the association of potential predictive factors with the outcome of shoulder pain following physiotherapy treatment. Kim et al.[17] demonstrated that the group of participants with a painless rather than painful jerk test[41] had significantly lower mean pain scores at follow up (Mann–Whitney U test, p < 0.001) (Additional file 10). Ryall et al.[16] investigated the potential predictive value of approximately 17 baseline characteristics for the prevalence of three aspects of "same site pain" at 12 month follow up. Whilst the odds of continuing pain in terms of point prevalence were higher for a number of baseline characteristics, with only two exceptions (Additional file 10) the confidence intervals passed through one. The lack of statistical significance may reflect the lower power of this sub group analysis specifically undertaken for this review [Personal communications: Palmer K and Ntani G, University of Southampton, 2012].
Work Two studies, both of which met 18 of our 24 quality assessment criteria, investigated baseline characteristics as predictors of whether or not participants were working either 48 hours after the first physiotherapy treatment[21] or at one year follow up[23] (Additional file 11). Within the 48 hour treatment period, high fear avoidance beliefs specific to work, measured by the Fear Avoidance Beliefs Questionnaire – Work Beliefs,[48] were strongly predictive of missing work, although lower scores were not predictive of remaining at work.[21] Fear avoidance specific to physical activity, measured by the Fear Avoidance Beliefs Questionnaire – Physical Activity,[48] was not associated with outcome. At one year follow up Engebretsen[23] identified a number of possible predictors on univariate linear regression, only two of which were included in the final forward logistic regression model. Higher education and better self-reported health status were predictive of working at one year.
Range of Movement One study meeting 14 of our quality assessment criteria investigated the potential predictive value of four baseline characteristics for i) improved range of active abduction and ii) point in time at which improved range had plateaued for more than one month, in 120 participants with adhesive capsulitis.[18] Statistically significant predictors of improved range of abduction included younger age, shorter duration of symptoms and hand dominance (Additional file 12). No difference between categories was detected in time for improvement to plateau.
Adverse Outcomes Two studies reported adverse outcomes,[21,23] one over a year[23] and the other over a maximum two week follow up;[21,49] the latter treatment included spinal manipulation. No adverse events were observed. Four studies reported how many participants were worse[17,21,23,32,49] or remained the same[17] during treatment[23] or at follow up[17,21,32,49] (Additional file 13). Getting worse with physiotherapy was clearly related to a painful jerk test in Kim et al's study.[17] However, in the few studies for which it was reported, fewer than 10 per cent of participants worsened with physiotherapy.

Summary of Results

Some predictive factors were found to have a significant association with outcome from physiotherapy treatment (on multiple regression analysis or equivalent) in two or more of the studies described above. For these predictive factors, the results are synthesised and summarised below.
Function at Baseline Three studies investigated the association of functional disability at baseline with functional outcome. Results were consistently significant in the same direction on multiple regression analysis; high baseline disability was associated with poor functional outcome,[23,32] low baseline disability was associated with a better functional outcome.[31] Two studies investigated the association of baseline disability with successful treatment defined by the global rating of change (GROC). Results were inconsistent; one study did not detect any difference between successful and unsuccessful treatment groups,[21] the other associated higher baseline disability with better outcome.[20]
Duration of Shoulder Symptoms Six studies investigated the association between duration of shoulder symptoms and outcome. Longer duration of symptoms was consistently associated with a poorer outcome[18,31] and shorter duration of symptoms with a better outcome.[21,31,32] Engebretson demonstrated a similar pattern on uni-variate but not multi regression analysis[23] and although statistically insignificant, visual inspection of Hung et al's[20] results indicates a similar trend. The latter were the two studies reported within this review which only included participants with subacromial impingement syndrome.
Age Six studies investigated the association between age and outcome. Two studies demonstrated an association between increasing age and poorer functional outcome on multiple regression analysis[31,32]and one study demonstrated that older age groups experienced less improvement in range of shoulder abduction.[18] No association between age and outcome was demonstrated in the remaining three studies.[16,20,23]
Range of Shoulder Flexion The association between baseline range of movement and outcome was less consistent. Two studies identified range of shoulder flexion at baseline as a predictor of outcome; one study demonstrated that greater restriction of flexion was predictive of a good outcome (GROC),[21]the other demonstrated that less restriction of flexion was predictive of a better functional outcome.[19]Two studies identified an association on uni-variate but not multivariate analysis[23,32] and one study reported no association.[20] The latter included the two studies reported within this review which only included participants with subacromial 

Results

Study Selection

The results of the search strategy are presented in the PRISMA flow diagram in Figure 1. A total of 16 publications were included in the final review. One study included more than one anatomical region and assessed prognostic indicators for conservative management generally rather than physiotherapy specifically.[16] One of the authors revisited study data specifically for this review and provided results for those participants with shoulder pain who had received physiotherapy [Personal communications: Palmer K and Ntani G, University of Southampton, 2012].
Figure 1.
PRISMA Flow chart outlining the literature search and study selection.

Summary Measures

Results are presented for each study and grouped according to outcome measure. Where results are presented in different formats within the same subheadings or full details omitted, this is because further details were unavailable.
Where available all statistical details of multiple regression analysis are tabulated. In view of the high number of potential prognostic factors investigated on univariate analysis and the variation in the measurement tools and categories used, full statistical details of univariate analysis are not included. Instead, to aid comparison between studies, prognostic factors which were investigated but not statistically significant within the final multiple regression analysis are listed and divided into two sections based on whether or not the probability of a random error on univariate analysis was 10% or less.
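The two-way listing described above amounts to a simple partition on the univariate p-value. The sketch below is illustrative only: the function name, factor names and p-values are hypothetical and are not drawn from any included study.

```python
def split_by_univariate_p(p_values, threshold=0.10):
    """Partition factors by whether their univariate p-value is at or below the threshold.

    p_values: mapping of factor name -> univariate p-value.
    Returns (at_or_below, above), each an alphabetically sorted list of names.
    """
    at_or_below = sorted(name for name, p in p_values.items() if p <= threshold)
    above = sorted(name for name, p in p_values.items() if p > threshold)
    return at_or_below, above


# Hypothetical factors and p-values, for illustration only.
example = {"factor_a": 0.03, "factor_b": 0.10, "factor_c": 0.45}
listed_low, listed_high = split_by_univariate_p(example)
print("p <= 0.10:", listed_low)
print("p >  0.10:", listed_high)
```

The 10% cut-off is the one named in the text; a factor sitting exactly on the threshold is placed in the "10% or less" group.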
For studies that divided participants into two or more groups according to i) baseline characteristics[17,18] or ii) successful versus unsuccessful outcome,[19] mean differences plus standard deviation and/or 95% confidence intervals (CI) for each group, and if available between groups are presented. Where studies have performed accuracy statistics for a clinical prediction rule, details of the former are presented.[20,21]
In view of heterogeneity on a number of levels (study design, characteristics of shoulder pain, physiotherapy treatment, prognostic factors, outcome factors and selection of measurement tools), this review provides a best evidence synthesis rather than meta-analysis. Predictive factors demonstrated to have a statistically significant association with outcome on multiple regression analysis (or equivalent) in two or more studies are summarised.

Study Characteristics

Study design, participant and physiotherapy treatment characteristics are outlined for each study in Table 1.
Study Design Of the 16 studies finally selected for the review, eleven were cohort studies and five were controlled trials. Three of the controlled trials randomized participants into 2 or more groups, all of whom received some form of physiotherapy;[18,22,23] two divided participants into two groups according to differences in baseline characteristics and administered the same physiotherapy treatment to both groups.[17,24]
Classification of shoulder pain Clinical eligibility criteria were provided in enough detail to allow transferability of findings to clinical practice in 11 of the 16 studies. However, a common omission was clarification that somatic referred pain from the cervical spine, distinct from radiculopathy, was excluded as a source of shoulder pain. One study[21] excluded patients with nerve root signs and another excluded patients with cervical spondylosis;[24] three studies[19,22,23] stated that the cervical spine was excluded as a source of referral, but only one study[22] stated the mechanism by which this decision was made. One study purposely did not exclude participants with cervical spine pathology.[25] Five studies only included participants with adhesive capsulitis,[18,19,24–26] four studies only included participants with subacromial impingement syndrome,[20,22,23,27] one study only included participants with posterior inferior instability of the shoulder[17] and one study only included participants with a positive posterior impingement sign and the presence of a posterosuperior glenoid labral lesion on MRI.[28] One study[29] used the International Classification of Diseases (ICD-9) codes[30] to divide "musculoskeletal shoulder pain" into 8 disease categories. The authors themselves report ICD-9 codes as lacking specificity and reliability, yet rather than report comprehensive results for their full cohort, only report results for these disease specific categories. Within each sub-group of shoulder classification, no two studies used the same eligibility criteria. Five studies[16,21,29,31,32] did not sub-categorize shoulder pain using a clinical diagnosis, all providing minimal details of eligibility criteria. However, these results are transferable to the wider range of patients.
Physiotherapy Treatment The number of participants receiving physiotherapy treatment ranged from 14[22] to 5252.[31] Of the 13 (of the total of 16) studies that reported any details of physiotherapy, treatment included home exercises (n = 10), supervised exercises (n = 9), exercises (unable to determine whether supervised or at home, n = 2), manual therapy to the shoulder (n = 7), treatment applied to the spine (n = 1) and electrotherapy (n = 4). Prognostic factors and outcomes varied across studies.

Quality Assessment of External Validity, Risk of Bias, and Presentation of Results

The assessment of study quality based on the 24 items (Additional file 3) is presented in Table 2. Over two thirds of studies identified a priori and reported baseline prognostic factors and outcome measures using standardized measurement tools, and reported percentage loss to follow up. None of the studies stated whether outcome assessors, including participants completing patient rated questionnaires, were blind to baseline prognostic variables.
Population Representation at Baseline Proportional eligibility was often stated but only four studies explicitly reported recruitment rate in proportion to those eligible and/or invited onto the study.[16,18,23,32] Research investigating areas other than shoulder pain has identified differences in baseline characteristics between potential participants who consent and do not consent.[33–35] One study[32] within this review compared demographic variables between participants and non-participants and found no difference between groups with respect to age and sex, although non-participants had a longer duration of symptoms than participants (381 v 229 days, p = 0.07). Generally, baseline information for potential participants who do not consent is by definition restricted, making comparisons at best limited.
Appointment Attendance and Exercise Compliance There is evidence that treatment adherence is correlated with a better treatment outcome.[36,37] The number of participants not completing the full course of physiotherapy was either not stated or below 80% in nine of the 16 studies. One study[31] within this review investigated and demonstrated an association between good appointment attendance and better outcome (n = 5252, p < 0.001). Home exercises were prescribed in ten studies; six reported rates of compliance.[17–19,22,25,31] Two studies within this review investigated the association between home exercise compliance and outcome. Deutscher[31] demonstrated that good home exercise compliance was the joint second most predictive variable for a better outcome (p < 0.001). Tanaka[18] demonstrated a significant improvement in range of abduction, and over a shorter time period, for participants performing their home exercises daily in comparison to those not doing them at all (p < 0.001). A shorter time period to full improvement was also demonstrated for those who exercised daily in comparison to several times a week (p < 0.017). Tanaka was the only study to explicitly state whether participants received additional treatment to the package defined at onset.[18] Appointment attendance and compliance with home exercises should be recorded and analyzed as possible interactions when investigating the correlation between baseline prognostic factors and treatment outcome.
Presentation of results Presentation of results varied considerably. Only two studies[28,29] included a power analysis, one of which was retrospective.[29] Some studies included within the review may have suffered from a type I or type II error. A number of studies demonstrated clear trends between some prognostic factors and outcome which were not statistically significant. Seven studies[22,24–29] omitted details of random variability and measures of association between prognostic variables and outcome (or differences between prognostic groups). The material available to present in our results section for these studies was therefore minimal. In addition, four of these studies did not report precise p-values, so that the probability of any association (or differences between prognostic groups) being due to chance was not available if more than 5%.[22,24,27,29] None of these studies reported on more than three of nine items assessed within our quality criteria specifically for reporting results. With two exceptions,[22,29] these same studies did not report on more than half the items within our quality assessment criteria specifically selected for external validity and potential for bias (Table 1 and Additional file 3). The limited results described within these seven studies were therefore not reported. The remaining nine studies met between 13.5 and 18 of the 24 criteria. Within the relevant subheadings, studies meeting the highest number of quality criteria are presented first.
Loss to follow up It was important to ascertain whether participants lost to follow up were a random subset of the whole or if there was a systematic difference between groups which if ignored may affect outcome.[38] In 12 of 16 studies for which it was reported, loss to follow up ranged from 0% to approximately 61%. Patients who completed and did not complete final follow up were compared on baseline characteristics in four studies.[16,23,29,31] Two of these studies only included participants who had completed physiotherapy and provided discharge data; this was from a larger group of patients attending physiotherapy whose details had been captured on an electronic database at the start of physiotherapy.[29,31] The high loss to follow up in these two studies is probably reflective of this mechanism of participant selection. The 43–53% (depending on outcome) of patients in Sindhu et al's study who did not complete physiotherapy and/or were lost to follow up at discharge, and therefore not selected for the study, were significantly different (p < 0.05) from those completing physiotherapy and available for follow up at discharge.[29] This was in terms of age, geographic region, pain intensity and function at intake; directional details are not provided. Patients not completing physiotherapy and lost to follow up for a range of conditions in addition to the shoulder in Deutscher et al's[31] study (approximately 61%) were more likely to have a history of 90 or more days of pain and more co-morbidities than those completing physiotherapy and not lost to follow up (p < 0.001). Of the 10 participants lost to follow up in Engebretson et al's study,[23] 80% (n = 8) were not working at baseline in comparison with 25% (n = 24) who completed the one year follow up. Participants lost to follow up were slightly older (57 versus 49 years) and had a higher mean SPADI score at baseline (56 versus 49) compared with the study group as a whole. Ryall[16] reported no significant differences (p > 0.05) in age, gender, somatising tendency, and scores for anxiety, depression, hypochondriasis and health beliefs for those available and not available to follow up. How much the results of these studies reflect the profile of participants lost to follow up in similar studies cannot be gauged.

Results From Individual Studies

Prognostic factors were reported for six different outcome categories. A number of individual studies investigated over 15 prognostic factors.[16,21,23,31,32] For each outcome measure, results of studies meeting the highest number of quality assessment criteria will be presented first. Predictive factors found to have a significant association with any of these outcomes (on multiple regression analysis or equivalent) in two or more of the studies will be summarised in the section following.
Patient-rated Functional Outcome Five of the nine studies within this review for which results for patient rated functional outcomes are reported, used a total of seven different questionnaires, none of which were used by more than one study.
One study, meeting 18 of our 24 quality assessment criteria, investigated the potential predictive factor of approximately 16 baseline characteristics on the shoulder pain and disability index (SPADI)[39] at one year follow up.[23] Univariate linear regression identified 11 possible predictors (p < 0.1), only three of which were retained in the final backward multiple regression model (Additional file 4). Lower education, previous shoulder pain and high baseline SPADI predicted poor outcome and accounted for 30% of the variance in the final SPADI score at one year.
One study[31] meeting 16 of our quality assessment criteria investigated the association of approximately 22 baseline characteristics with the computerized adaptive test (FT-CAT)[40] at discharge. Only statistically significant results were presented in their multiple regression analysis (Additional file 5). In this table β is "the coefficient that represents the amount of expected change in discharge [FT-CAT] given a 1-unit change in the value of the variable, given that all other variables in the model are held constant".[31] These factors accounted for 30% of the variance in FT-CAT at discharge; a negative beta value (β) is associated with a poor outcome and a positive beta value (β) is associated with a better outcome.
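As a rough illustration of how such a regression coefficient is read, the sketch below applies the quoted definition to made-up numbers. The function name, factor names and coefficient values are hypothetical and are not taken from the model reported by Deutscher.[31]

```python
def expected_discharge_change(beta, delta_units=1.0):
    """Expected change in the discharge score for a change of `delta_units`
    in one predictor, with all other predictors held constant."""
    return beta * delta_units


# Hypothetical coefficients, for illustration only (not values from the study).
example_betas = {"hypothetical_factor_a": -2.5, "hypothetical_factor_b": 1.5}

for name, beta in example_betas.items():
    change = expected_discharge_change(beta)
    direction = "poorer" if change < 0 else "better"
    print(f"{name}: {change:+.1f} points per 1-unit change ({direction} outcome)")
```

The sign convention follows the text: a negative coefficient lowers the expected discharge score (poorer outcome) and a positive one raises it.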
One study[17] meeting 15.5 of our quality assessment criteria divided participants with posterior inferior instability of the shoulder (n = 81) into those with (n = 33) and without a painful jerk test (n = 48).[41] This test involves stabilising the scapula and concurrently applying an axial force along the humerus whilst the shoulder is placed in 90 degrees abduction and internally rotated. The arm is then horizontally adducted whilst maintaining the axial load. A clunk is indicative of the humeral head sliding off the back of the glenoid and is the criterion for a positive test; a second clunk may be observed as the arm is returned to the start position and the humeral head relocates.[17] Clinically and statistically significant improvements (Mann–Whitney U test, p < 0.001) were demonstrated in functional status measured by the i) Rowe Score for Instability,[42] ii) University of California-Los Angeles Shoulder Scale (UCLA)[43] and iii) modified American Shoulder and Elbow Surgeons Shoulder Index (ASES)[44] for participants with a painless compared with a painful jerk test following a 6 month rehabilitation programme (Additional file 6).
Another study[19] meeting 15.5 of our quality assessment criteria divided participants into improvers and non-improvers based upon a positive change of more than 20% on the Flexilevel Scale of Shoulder Function (FLEX-SF)[45] over a 3 month rehabilitation period. Two of three movements of the shoulder complex, detectable by clinical examination (rather than laboratory testing) were significantly different between groups and were included within a clinical prediction model; humeral elevation > 97° and external rotation > 39° at baseline were associated with successful treatment (Additional file 7).
One study[32] meeting 14 of our quality assessment criteria investigated the association between approximately 24 baseline characteristics and i) final Disability of the Arm, Shoulder and Hand (DASH) scores[46] and ii) change in DASH scores 12 weeks after commencing physiotherapy. Twenty one factors were significant (<10% probability of chance) on univariate analysis and advanced to the final multiple regression models (Additional file 8). There is some inconsistency of reporting; the authors' narrative summary states that being female is a predictor of greater disability at discharge, yet the statistical presentation of results suggests the opposite; that being female is a predictor of lower DASH score (i.e. better function) at discharge. Similarly higher pain intensity and previous shoulder surgery appear to be statistically predictive of deterioration yet are reported as predictors of improvement. Only one of five predictive factors, younger age, was common to both outcomes, and predicted a better outcome. This highlights that seemingly similar outcomes can have associations with very different predictive factors.[32]
Global Impression of Change (GROC) Two studies meeting 18[21] and 16[20] of our quality assessment criteria investigated whether treatment success, based on a score of +4 on the 15 point Patient Global Rating of Change (GROC)[47] was associated with approximately 27 and approximately 12 baseline measures respectively. Following logistic regression analysis, Mintken[21] included five factors within a clinical prediction rule developed to identify the patients most likely to improve after 1–2 treatments of cervico-thoracic manipulation. Three of these factors (duration of shoulder pain, range of shoulder flexion and internal rotation) were also investigated in a smaller study by Hung;[20] no association with successful treatment (P ≥ 0.3) was demonstrated (Additional file 9). Hung[20] associated successful treatment with reduced strength of the humeral external rotators (p = 0.076), serratus anterior (p = 0.040) and lower function, indicated by lower FLEX-SF scores[45] (p < 0.00005) at baseline. In their final model only the latter two were included together with an additional measurement from laboratory testing. These factors were not investigated by Mintken.[21]
Pain Two studies[16,17] meeting 15.5 and 13.5 of our quality assessment criteria investigated the association of potential predictive factors with the outcome of shoulder pain following physiotherapy treatment. Kim et al.[17] demonstrated that the group of participants with a painless rather than painful jerk test[41] had significantly lower mean pain scores at follow up (Mann–Whitney U test, p < 0.001) (Additional file 10). Ryall et al.[16] investigated the potential predictive factor of approximately 17 baseline characteristics on the prevalence of three aspects of "same site pain" at 12 month follow up. Whilst the odds of continuing pain in terms of point prevalence were higher for a number of baseline characteristics, with only two exceptions (Additional file 10) the confidence intervals passed through one. The lack of statistical significance may reflect the lower power of this sub group analysis specifically undertaken for this review [Personal communications: Palmer K and Ntani G, University of Southampton, 2012].
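To make concrete what it means for a confidence interval to pass through one, the sketch below computes an odds ratio with a 95% Wald (log-scale) interval from a 2×2 table. This is a generic textbook calculation offered under stated assumptions: the counts are hypothetical, and the interval method is an assumption rather than the method reported for the analysis of Ryall et al.'s data.[16]

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: factor present, outcome present    b: factor present, outcome absent
    c: factor absent,  outcome present    d: factor absent,  outcome absent
    All four cells must be non-zero for this simple formula.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def interval_includes_one(lower, upper):
    """True when the interval spans 1, i.e. no association cannot be ruled out."""
    return lower <= 1.0 <= upper


# Hypothetical counts, for illustration only.
or_, lo, hi = odds_ratio_ci(12, 8, 10, 10)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
print("Interval passes through one:", interval_includes_one(lo, hi))
```

With small cell counts like these the interval is wide, which is the pattern the text attributes to the lower power of the sub group analysis: the point estimate can sit above one while the interval still includes it.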
Work Two studies, both of which met 18 of our 24 quality assessment criteria, investigated baseline characteristics as predictors of whether or not participants were working either 48 hours after the first physiotherapy treatment[21] or at one year follow up[23] (Additional file 11). Within the 48 hour treatment period, high fear avoidance beliefs specific to work, measured by the Fear Avoidance Beliefs Questionnaire – Work Beliefs,[48] were strongly predictive of missing work, although lower scores were not predictive of remaining at work.[21] Fear avoidance specific to physical activity, measured by the Fear Avoidance Beliefs Questionnaire – Physical Activity,[48] was not associated with outcome. At one year follow up Engebretson[23] identified a number of possible predictors on univariate linear regression, only two of which were included in the final forward logistic regression model. Higher education and better self-reported health status were predictive of working at one year.
Range of Movement One study meeting 14 of our quality assessment criteria investigated the potential predictive factor of four baseline characteristics on i) improved range of active abduction and ii) point in time at which improved range had plateaued for more than one month, in 120 participants with adhesive capsulitis.[18] Statistically significant predictors of improved range of abduction included younger age, shorter duration of symptoms and hand dominance (Additional file 12). No difference between categories was detected in time for improvement to plateau.
Adverse Outcomes Two studies reported adverse outcomes,[21,23] one over a year[23] and the other over a maximum two week follow up;[21,49] the latter treatment included spinal manipulation. No adverse events were observed. Four studies reported how many participants were worse[17,21,23,32,49] or remained the same[17] during treatment[23] or at follow up[17,21,32,49] (Additional file 13). Getting worse with physiotherapy was clearly related to a painful jerk test in Kim et al's study.[17] However, in the few studies for which it was reported, less than 10 per cent of participants worsened with physiotherapy.

Summary of Results

Some predictive factors were found to have a significant association with outcome from physiotherapy treatment (on multiple regression analysis or equivalent) in two or more of the studies described above. For these predictive factors, the results are synthesised and summarised below.
Function at Baseline Three studies investigated the association of functional disability at baseline with functional outcome. Results were consistently significant in the same direction on multiple regression analysis; high baseline disability was associated with poor functional outcome,[23,32] low baseline disability was associated with a better functional outcome.[31] Two studies investigated the association of baseline disability with successful treatment defined by the global rating of change (GROC). Results were inconsistent; one study did not detect any difference between successful and unsuccessful treatment groups,[21] the other associated higher baseline disability with better outcome.[20]
Duration of Shoulder Symptoms Six studies investigated the association between duration of shoulder symptoms and outcome. Longer duration of symptoms was consistently associated with a poorer outcome[18,31] and shorter duration of symptoms with a better outcome.[21,31,32] Engebretson demonstrated a similar pattern on univariate but not multiple regression analysis[23] and, although not statistically significant, visual inspection of Hung et al's[20] results indicates a similar trend. The latter were the two studies reported within this review which only included participants with subacromial impingement syndrome.
Age Six studies investigated the association between age and outcome. Two studies demonstrated an association between increasing age and poorer functional outcome on multiple regression analysis[31,32] and one study demonstrated that older age groups experienced less improvement in range of shoulder abduction.[18] No association between age and outcome was demonstrated in the remaining three studies.[16,20,23]
Range of Shoulder Flexion The association between baseline range of movement and outcome was less consistent. Two studies identified range of shoulder flexion at baseline as a predictor of outcome; one study demonstrated that greater restriction of flexion was predictive of a good outcome (GROC),[21] the other demonstrated that less restriction of flexion was predictive of a better functional outcome.[19] Two studies identified an association on univariate but not multivariate analysis[23,32] and one study reported no association.[20] The latter included the two studies reported within this review which only included participants with subacromial impingement syndrome.

Discussion

There was consistent evidence, from two or more studies meeting 13 or more of our 24 quality assessment criteria, of an association between the following predictive factors and outcome: i) higher disability at baseline was predictive of a higher disability at follow up, or low disability at baseline was associated with a lower disability at follow up; ii) longer duration of shoulder symptoms was associated with poorer outcome, or shorter duration of symptoms with better outcome; iii) increasing age was associated with poorer outcome. Restricted range of shoulder flexion predicted outcome in two studies; however, one study demonstrated that higher shoulder flexion at baseline (>97°) was predictive of a good outcome and another demonstrated that lower shoulder flexion at baseline (<127°) was predictive of a good outcome.
For many potential prognostic factors results were inconsistent between studies. Clinical heterogeneity in terms of the presentation of shoulder pain, treatment type, dose, duration, attendance and compliance, as well as differences in follow up period and measurement tools, may account for some of the variability of results and their significance. Physiotherapy attendance rates and adherence to prescribed exercise are important because this review seeks to identify prognostic factors specific to physiotherapy treatment, rather than simply to referral to physiotherapy and, for non-attenders, the natural course of shoulder pain.
Patients present to physiotherapy with shoulder pain arising from a number of potential sources. Studies which included patients with upper quadrant pain but did not clearly state the shoulder as the source of symptoms were excluded from this review. However eligibility criteria differed considerably between studies and it was not always clear that the cervical spine was explicitly cleared as a potential source of symptoms. Based on the patient's history and physiotherapist's clinical examination shoulder pain is sometimes categorised using a number of diagnostic labels. Studies which sub categorise shoulder pain may detect prognostic factors which may not be detected in a more generic patient group.
Within this review two sub-groups of shoulder pain contained more than one study; adhesive capsulitis and subacromial impingement syndrome. Within these subgroups, no two studies used the same eligibility criteria. This lack of standardisation or discrepancy in labelling shoulder pain has been reported previously.[50–52] Differing exclusion as well as inclusion criteria can contribute to heterogeneity between studies seemingly investigating the same subgroup of patients with shoulder pain and hamper effective comparison.[50,51]
Meaningful sub group analysis according to any criteria was limited by heterogeneity in other areas. Studies within both the adhesive capsulitis and subacromial impingement syndrome groups used different outcome measures. In addition the two studies for which results were reported for participants with adhesive capsulitis investigated different prognostic factors,[18,19] rendering comparisons impossible. On final multivariate analysis duration of symptoms and range of shoulder flexion did not demonstrate any statistical association with outcome for the two studies reporting results for participants with subacromial impingement syndrome.[20,23] However a trend was observed between duration of symptoms and outcome in these latter two studies and reflects the findings of the review overall, and the findings in the three studies[16,31,32] reported, which included participants with a variety of shoulder presentations including subacromial impingement.
There is evidence of poor inter-rater reliability for the sub-classification of shoulder pain.[53,54] As stated previously the majority of studies within this review clearly outlined their eligibility criteria. However earlier reviews have demonstrated that most clinical tests used for the sub-classification of shoulder pain demonstrate poor diagnostic accuracy.[55,56] A number of studies used radiological findings as eligibility criteria.[17,18,24,25,27,28] However in the physiotherapy clinic radiological findings are not always clinically indicated and if present, details may not be accessible to physiotherapists at the first appointment. In addition there is often a poor correlation between structural pathology and the clinical presentation of shoulder pain.[57–63] Some researchers have suggested that musculoskeletal shoulder pain should not be sub-categorised according to structural pathology.[50,52,64] The four largest studies in this review did not sub-categorise shoulder pain,[16,29,31,32] two stating that this was an active decision based upon the poor reliability of shoulder classification.[16,32]
In the field of musculoskeletal low back pain, many clinicians base decisions about initial management options upon prognostic indicators[65] rather than diagnostic classifications. This in part builds upon similar observations of the poor reliability of structural diagnoses and their poor correlation with clinical presentation.[66]
To our knowledge, this is the first systematic review of the current literature on potential predictive factors specific to the outcome of physiotherapy management for musculoskeletal shoulder pain. A previous systematic review of cohort studies investigated potential prognostic factors irrespective of management type. Two of their 16 studies included physiotherapy management; these were excluded from our review because they were retrospective analyses.[67,68] Overall, their review reported strong evidence that age 45–54 years in occupational settings and high pain intensity in primary care were predictors of a poor prognosis. Age was not a factor considered in relation to work status for the studies in our review. However, one[32] of two[20,32] studies demonstrated a strong correlation between high pain intensity at baseline and poor prognosis. Within a primary care setting, these same researchers reported some evidence that longer duration of shoulder symptoms and high disability at baseline were predictors of poor prognosis. These were the strongest predictors within our review of outcome specific to physiotherapy treatment.

Potential Biases in the Review Process

The main search for this review was restricted to four databases. Studies presenting interesting or significant findings are more likely to be published than those with non-significant findings.[33] Conference proceedings per se were not included within our search. However, there is evidence that higher quality conference abstracts are more likely to be published as a full article than lower quality abstracts,[34] and the inclusion of unpublished, potentially poorer quality material in reviews may actually be a source of bias.[35] Searching the grey literature is important when the objective is to avoid biasing review results towards significant reports of prognostic factors. However, this review is the first of its kind, and our intention was to gather evidence of the most likely significant predictive factors of outcome for further investigation. We therefore required studies that were presented in enough detail to allow quality assessment and were of a standard appropriate for peer-reviewed publication.
Although a number of validated quality assessment tools are available, none covered all the criteria important for a study addressing our objectives. Criteria were therefore selected from a number of sources. Whilst our method of quality assessment was repeatable within our team, reproducibility has not been tested externally, and the number of criteria met should not be confused with a scoring system. Seven studies provided only minimal reporting of results specific to our objective and were therefore omitted from our results section, both because of the quality of reporting and on a pragmatic basis.

Implications for Future Research

Large, adequately powered prospective studies are required and should include, as a minimum, investigation of the association between baseline disability, age, duration of shoulder symptoms and range of shoulder flexion with functional outcome. Inclusion of the additional significant prognostic factors identified on multiple regression analysis or equivalent by the nine studies presented within this review should also be considered, given the possibility of a type II error in some of these studies. Eligibility criteria should apply to somatic as well as radicular referral of pain from the cervical spine as the primary source of symptoms. Given the common omission of any detail of concurrent pain from other sources in the affected upper quadrant, it would be appropriate to include this as a possible predictive factor. Given the poor reliability of shoulder classification systems based on diagnostic labels and the poor correlation between structural pathology and clinical presentation, eligibility criteria should be based upon patient characteristics and reproducible baseline data. Exercise adherence, treatment attendance and whether or not participants have completed the full course of physiotherapy should be recorded, as these have been demonstrated to have a significant effect on treatment outcome. Comparisons should be made between participants available and not available for follow-up, and the results should inform the final analysis. Information about patients who are eligible and have been invited to take part in a study but have not consented will be absent or limited at best. However, the proportion of eligible patients who were asked and agreed to take part, compared with those who did not agree, should be stated, and differences between the two groups should be reported for those factors on which data are available.
As well as predictors for participants who will improve with physiotherapy, analysis should include predictors for those whose shoulder symptoms may worsen during physiotherapy.

Conclusion

Associations between prognostic factors and outcome were often inconsistent between studies. This may be reflective of a type II error or heterogeneity on a number of levels, including treatment selection, adherence or outcome measure. Only two baseline prognostic factors consistently demonstrated an association with outcome in two or more studies: duration of shoulder pain and baseline function.
Decisions based on prognostic factors may be clinically more useful given the poor reliability of shoulder sub-categorisation based on diagnostic labels. Prior to developing a predictive model for the outcome of physiotherapy treatment for shoulder pain, a large, adequately powered cohort study is required in which a broad range of prognostic factors is incorporated.