
Within-session test-retest reliability of pressure pain threshold and mechanical temporal summation in healthy subjects

  • Catherine Mailloux ,

    Contributed equally to this work with: Catherine Mailloux, Hugo Massé-Alarie

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Writing – original draft

    Affiliation Centre Interdisciplinaire de Recherche en Réadaptation et Intégration Sociale (CIRRIS), Université Laval, Quebec, Canada

  • Louis-David Beaulieu ,

    Roles Conceptualization, Methodology, Supervision, Writing – review & editing

    ‡ These authors also contributed equally to this work.

    Affiliation BioNR Research Lab, Université du Québec à Chicoutimi, Chicoutimi, Canada

  • Timothy H. Wideman ,

    Roles Conceptualization, Methodology, Supervision, Writing – review & editing

    ‡ These authors also contributed equally to this work.

    Affiliation School of Physical and Occupational Therapy, McGill University, Montreal, Canada

  • Hugo Massé-Alarie

    Contributed equally to this work with: Catherine Mailloux, Hugo Massé-Alarie

    Roles Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    hugo.masse-alarie@fmed.ulaval.ca

    Affiliation Centre Interdisciplinaire de Recherche en Réadaptation et Intégration Sociale (CIRRIS), Université Laval, Quebec, Canada

Abstract

Objective

To determine the absolute and relative intra-rater within-session test-retest reliability of pressure pain threshold (PPT) and mechanical temporal summation of pain (TSP) at the low back and the forearm in healthy participants and to test the influence of the number and sequence of measurements on reliability metrics.

Methods

In 24 participants, three PPT and TSP measures were assessed at four sites (2 at the low back, 2 at the forearm) in two blocks of measurements separated by 20 minutes. The standard error of measurement, the minimal detectable change (MDC) and the intraclass correlation coefficient (ICC) were investigated for five different sequences of measurements (e.g. measurement 1, 1–2, 1-2-3).

Results

The MDC for the group (MDCgr) for PPT ranged from 28.71 to 50.56 kPa across the sites tested, whereas MDCgr for TSP varied from 0.33 to 0.57 out of 10 (numeric rating scale). Almost all ICCs showed excellent relative reliability (between 0.80 and 0.97), except when only the first measurement was considered (moderate). Although minimal differences in absolute PPT reliability were present between the different sequences, in general, using only the first measurement increased measurement error. Three TSP measures reduced the measurement error.

Discussion

We established that two measurements of PPT and three of TSP reduced the measurement error and demonstrated an excellent relative reliability. Our results could be used in future pain research to confirm the presence of true hypo/hyperalgesia for paradigms such as conditioned pain modulation or exercise-induced hypoalgesia, indicated by a change exceeding the measurement variability.

1. Introduction

The experience of pain is highly variable and influenced by biological, psychological and social factors [1]. One essential feature of the experience of pain is the capacity of the nervous system to modulate pain through the interplay of multiple areas and mechanisms [2]. The complexity of these mechanisms makes pain modulation difficult to evaluate. In humans, psychophysical experimental paradigms have been developed in research as proxies of pain modulation. Conditioned pain modulation (CPM) and exercise-induced hypoalgesia (EIH), for instance, are used to approximate the efficacy of pain inhibition [3–5]. Recently, these paradigms have been increasingly used in research to determine whether individuals with chronic pain have an altered pain inhibition response. For example, some studies reported an alteration of pain inhibition in individuals with chronic low back pain (CLBP) using CPM [6–9] and EIH [10,11].

CPM refers to the decrease in pain sensitivity after the administration of a painful stimulus on a remote body part (e.g. cold water immersion of the hand [12]). EIH represents the decrease in pain sensitivity that occurs following isometric, resistance or aerobic exercise [13]. Opioidergic, serotonergic and noradrenergic systems contribute to both CPM [2,14–17] and EIH [18–24]. Experimental protocols developed to assess EIH and CPM generally use a within-session design, with pain sensitivity measures collected before and after a conditioning stimulus, and the response is determined by the change in pain sensitivity between test and retest. The pressure pain threshold (PPT) is the most commonly used pain sensitivity measure for these paradigms [13,25,26]. PPT is a static measure of pain and is thought to reflect the basal state of pain perception [4,27]. Temporal summation of pain (TSP) has also been used as a pain sensitivity measure to quantify CPM/EIH [28–30]; it constitutes a dynamic measure of pain sensitivity that refers to the perception of increasing pain in response to stable (continuous or repeated) noxious stimuli [31–33]. To ensure the validity of these paradigms, it is essential that the pain sensitivity measures chosen as testing stimuli (PPT and TSP) present a small measurement error. This makes it possible to define a minimal level of true change, exceeding the measurement variability, to confirm the presence of hyper- or hypoalgesia following a conditioning stimulus. In addition, reliability needs to be considered with respect to the site tested. Therefore, with a view to studying the CLBP population, measuring the variability of PPT and TSP at the low back and at a remote site is essential to provide a general view of pain sensitivity and pain modulation functioning. Some studies documented PPT reliability at the low back [34–37], but the sample sizes tested in these studies remain limited. For TSP, no study has measured reliability at the low back. Moreover, the number of measures needed to reach an acceptable level of test-retest reliability for TSP is not known. This information is essential to ensure the applicability and feasibility of measuring PPT and TSP in research and clinical practice, and to reduce the number of painful stimuli needed to obtain reliable and valid data.

The objectives were to 1) determine the absolute and relative intra-rater within-session test-retest reliability of PPT and mechanical TSP at the low back and at the forearm in healthy participants and 2) test the influence of the number and sequence of measurements for PPT and TSP on reliability metrics. Our study focused on absolute reliability, which provides the minimal change that must be exceeded to indicate a real change in pain sensitivity; this is essential to interpret the presence/absence of hyper/hypoalgesia using pain modulation paradigms such as CPM and EIH.

2. Materials and methods

2.1 Participants

Twenty-four healthy subjects (12 females, 12 males; 28.3 ± 11.0 years old) aged between 18 and 65 years were recruited between December 2018 and June 2019 (see Table 1 for descriptive statistics). Sampling was by convenience, with emails sent to the Université Laval community (60,000 individuals comprising students and employees) and by solicitation at the research center. Selection criteria were based on the consensus statement by the EUROPAIN and NEUROPAIN consortia for quantitative sensory testing (QST)-based studies to ensure validity of the data [38]. Exclusion criteria were: 1) pain lasting three months or longer, located anywhere in the body, 2) severe health problem (such as cancer, major rheumatoid, cardiac, neurologic or psychiatric disease), 3) low back pain lasting more than 7 days in the last 6 months, 4) consultation with a health professional because of low back pain in the last 6 months, 5) current bilateral wrist or forearm pain and 6) current pregnancy and/or having given birth in the last year. Subjects currently taking medication such as antidepressants, opioids, neuroleptics, anticonvulsive drugs or steroids were also excluded. This study was approved by the local ethics committee (CIUSSS-Capitale Nationale, project #2019–1547) and all participants provided informed written consent prior to experimentation. The body mass index was calculated for each participant and the Global Physical Activity Questionnaire (GPAQ) was self-administered to rate the level of physical activity [39].

2.2 Study design

All measurements were collected in a single session at the research center by the same rater (CM), a physical therapist with five years of clinical experience who had undertaken QST training with experienced researchers.

PPT and TSP were tested in two blocks, lasting approximately 15 minutes each, separated by a pause of 20 minutes (Fig 1). The 20-min pause was chosen to reproduce the time interval between blocks of testing when assessing EIH and CPM (e.g. the interval between PPT measured before and after the conditioning stimulus) [26,40–43]. This was done to ensure that PPT and TSP reliability were tested in a design reproducing CPM/EIH protocols.

Fig 1. Study procedure and sequences of measurements analyzed for PPT.

https://doi.org/10.1371/journal.pone.0245278.g001

2.3 Quantitative sensory testing (QST)

PPT and TSP measures were collected in the same environment (same room with stable conditions regarding light, temperature and noise) and the QST testing order within each testing block was randomized. The side tested was randomized independently of dominance and measures. If one side (wrist or low back) presented impairments (not related to pain) other than the exclusion criteria, the opposite side was tested (1 participant had limited wrist extension due to a non-painful scaphoid fracture sustained 10 years earlier and 1 participant had a low back lipoma). QST measures were first applied to the calf or the thigh to familiarize participants with the procedure.

2.3.1 Pressure pain threshold.

PPT was assessed with a handheld digital algometer (1-cm2 probe; FPIX, Wagner Instruments, Greenwich, CT, USA) for 22 participants and with a handheld dial algometer (1-cm2 probe; FPK, Wagner Instruments, Greenwich, CT, USA) for 2 participants. Since the FPK algometer cannot measure between 0 and 1 kg/cm2, the FPIX algometer was used for the remaining participants. To determine whether the use of the FPK algometer affected the reliability, we performed the analysis with and without the first two participants; as both analyses provided similar results, we included the first two participants in our analysis. Pressure was applied at a rate of ~0.5 kg/cm2/s at two back and two upper limb sites: i) lumbar erector spinae (LES), 2–3 cm lateral to L4/L5, ii) S1 spinous process, iii) dorsal aspect of the wrist over the capitate (WD) and iv) wrist flexor muscles (WF), 10 cm distal to the medial humeral epicondyle on a line from the medial epicondyle to the styloid process of the ulna, over the muscle bulk of the wrist flexors. All sites were located and marked before testing. Assessment of the back was done in prone lying with a pillow under the abdomen, and assessment of the forearm in sitting with the arm supported. Standardized verbal instructions were given based on German Research Network on Neuropathic Pain (DFNS) recommendations: “This is a test of your sensitivity to deep pain. Now I will press this pressure meter against your back/wrist/forearm and will gradually increase the pressure. Please say ´Now´ as soon as the pressure starts to be painful. Remember that this is not a pain tolerance test, it is a pain threshold test” [32]. Instructions were freely translated into French. PPT was measured three times with a one-minute break between measurements. To reduce the variability of the measurement and the impact of a potential outlier, a fourth measure was taken if the following two conditions were met: 1) the standard deviation (SD) of the three measures was larger than 1 kg/cm2 and 2) one measure was outside the mean ± SD interval (Fig 1). The same criteria were applied to the four values to determine whether three or four values were kept for analysis. Because of the accuracy limit of the FPIX algometer, PPT data above 11.5 kg/cm2 were recorded as 11.5 kg/cm2 (2 trials at S1). For the FPK, PPT between 0 and 1 kg/cm2 were recorded as 1 kg/cm2 (4 trials at WD, in the first 2 participants). This may have caused a small overestimation of the reliability of the PPT measurement at S1 and WD. PPT values were transformed from kg/cm2 to kPa (1 kg/cm2 = 98.07 kPa) to facilitate comparisons with the PPT literature.
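For readers who want to script this decision rule, a minimal sketch is given below; the function names and the trial layout are ours and only illustrate the two criteria and the unit conversion described above, not the study's actual data pipeline.

```python
import statistics

KPA_PER_KG_CM2 = 98.07  # 1 kg/cm2 = 98.07 kPa

def needs_fourth_trial(trials_kg_cm2):
    """Apply the two criteria described above to three PPT trials (kg/cm2):
    1) SD of the three measures > 1 kg/cm2, and
    2) at least one measure lies outside the mean +/- SD interval."""
    mean = statistics.mean(trials_kg_cm2)
    sd = statistics.stdev(trials_kg_cm2)
    one_outside = any(abs(t - mean) > sd for t in trials_kg_cm2)
    return sd > 1.0 and one_outside

def to_kpa(trials_kg_cm2):
    """Convert PPT trials from kg/cm2 to kPa for comparison with the literature."""
    return [t * KPA_PER_KG_CM2 for t in trials_kg_cm2]
```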

2.3.2 Temporal summation of pain.

TSP was tested using a pinprick stimulator (256 mN, MRC Systems GmbH, Heidelberg, Germany) and a series of ten punctate stimuli delivered at 1 Hz over (i) the L4/L5 interspinous process line and (ii) the hand dorsum. The 1-Hz frequency was paced by a light metronome kept out of the participant’s sight. TSP was calculated as the difference between the highest pain rating on a numeric rating scale (NRS; anchored at 0 [no pain] and 10 [worst imaginable pain]) across the ten stimuli and the pain rating after a single stimulus [27,44]. Standardized verbal instructions were given based on DFNS recommendations: “This is a test of repeated pinpricks. I will now apply a single pinprick. Please give a number between ´0´ and ´10´ for the pain of that stimulus. I will now apply a series of 10 pinpricks in a row. Please give a number between ´0´ and ´10´ for the highest pain over that whole series of 10 pinpricks. This procedure will be repeated 3 times, with a 30-s break between” [32].
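As a small illustration of this subtraction method (a sketch with hypothetical function and variable names, not part of the testing protocol):

```python
def tsp_score(single_stimulus_nrs, series_peak_nrs):
    """Subtraction method: highest pain (0-10 NRS) reported over the series of ten
    pinpricks minus the pain reported after the single pinprick."""
    return series_peak_nrs - single_stimulus_nrs

# e.g. a single stimulus rated 1/10 and a peak of 3/10 over the series give TSP = 2
print(tsp_score(1, 3))
```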

2.4 Statistical analysis

SPSS software was used for statistical analysis (IBM SPSS 25 for Mac, Armonk, NY, USA). First, Shapiro-Wilk’s test and visual appreciation of histograms and normal Q-Q plots were used to assess the normality of the PPT and TSP distributions. Second, the presence of outliers was assessed by inspection of a boxplot for values greater than 1.5 box-lengths from the edge of the box (representing a 99.3% confidence interval). Third, homoscedasticity (i.e. the absence of correlation between the size of the error and the magnitude of the observed scores [45,46]) was assessed. This was done by visual inspection of Bland-Altman plots of the differences between the test and retest values for each participant against the means of these two values [47,48]. The correlation (R2) between the absolute differences and the mean values was also calculated for each variable by linear regression analysis [49]. Heteroscedasticity was considered present when both of the following conditions were met: 1) R2 greater than 0.1 [49] and 2) p<0.05 for the linear regression model. Natural logarithmic transformation was applied to the data in the presence of non-normality or heteroscedasticity, and normality/homoscedasticity were re-tested to ensure that the previous assumptions were met.
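A minimal sketch of this heteroscedasticity check, assuming paired arrays of test and retest scores; the function name is ours, and the thresholds simply restate the criteria above.

```python
import numpy as np
from scipy import stats

def heteroscedasticity_check(test, retest):
    """Regress absolute test-retest differences on the pair means (Bland-Altman logic).
    Heteroscedasticity is flagged when R2 > 0.1 and the regression p-value < 0.05."""
    test, retest = np.asarray(test, float), np.asarray(retest, float)
    abs_diff = np.abs(test - retest)
    means = (test + retest) / 2
    result = stats.linregress(means, abs_diff)
    r_squared = result.rvalue ** 2
    return {"R2": r_squared, "p": result.pvalue,
            "heteroscedastic": (r_squared > 0.1) and (result.pvalue < 0.05)}
```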

2.4.1 Absolute reliability.

Absolute reliability was assessed using the standard error of measurement (SEMeas) [50]. SEMeas represents the within-subject variation and is defined as “the standard deviation of errors of measurement that is associated with the test scores for a specific group of test takers” [50,51]. SEMeas was estimated as the square root of the within-subject mean square error (WMS) calculated by a one-way repeated-measures ANOVA applied to the test and retest measurements (SEMeas = √WMS) [45,52].
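For illustration, SEMeas can be obtained from a subjects × occasions array as sketched below; the function name and data layout are our own and stand in for the SPSS procedure used in the study.

```python
import numpy as np

def sem_within(data):
    """Standard error of measurement from a one-way repeated-measures ANOVA.

    data: array of shape (n_subjects, k_occasions), e.g. test/retest scores.
    SEMeas = sqrt(within-subject mean square error)."""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    subject_means = data.mean(axis=1, keepdims=True)
    ss_within = ((data - subject_means) ** 2).sum()  # within-subject sum of squares
    df_within = n * (k - 1)
    wms = ss_within / df_within                      # within-subject mean square (error term)
    return np.sqrt(wms)
```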

To evaluate the responsiveness of the measures, the minimal detectable change for each individual (MDCind) was estimated for PPT and TSP with the following formula: MDCind = 1.96 × √2 × SEMeas [53]. The MDC for the group (MDCgr) was also computed as MDCgr = MDCind / √n, where “n” represents the sample size [54–56]. The values of SEMeas, MDCind and MDCgr were also presented as percentages of the pooled means (average of test and retest measurements) [50,52], designated %SEMeas, %MDCind and %MDCgr. %SEMeas and %MDC allow comparisons between studies and facilitate interpretation, since they are dimensionless (unit-less) measures [50,52]. If SEMeas and MDC were obtained from a log-transformation of the data, an antilog (i.e. exponential, e^x) was applied, resulting in a multiplication/division (×/÷) factor applied to the test data. This factor indicates the random error component of the mean bias [45]. A constant was added to the distributions before the log-transformation to obtain positive, non-null values.
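Under the same assumed data layout (subjects × occasions), the MDC metrics and their percentage versions could be computed as in the sketch below; the SEMeas step from the previous block is repeated so that the snippet stands alone.

```python
import numpy as np

def mdc_metrics(data, z=1.96):
    """MDC for an individual and for the group, in raw units and as a percentage of
    the pooled test-retest mean (data: subjects x occasions array)."""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    # SEMeas = sqrt(within-subject mean square), as in the previous sketch
    sem = np.sqrt(((data - data.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1)))
    mdc_ind = z * np.sqrt(2) * sem   # MDCind = 1.96 x sqrt(2) x SEMeas [53]
    mdc_gr = mdc_ind / np.sqrt(n)    # MDCgr = MDCind / sqrt(n) [54-56]
    pooled_mean = data.mean()        # average of test and retest measurements
    return {"SEMeas": sem,
            "MDCind": mdc_ind, "MDCgr": mdc_gr,
            "%SEMeas": 100 * sem / pooled_mean,
            "%MDCind": 100 * mdc_ind / pooled_mean,
            "%MDCgr": 100 * mdc_gr / pooled_mean}
```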

2.4.2 Relative reliability.

The relative reliability was quantified using the intraclass correlation coefficient (ICC). ICC model 2 was used since each subject was assessed by the same rater and the rater is considered representative of the population of possible raters [57]. The ICC was calculated from a two-way repeated-measures ANOVA. A “random effects, absolute agreement, average measures” model (ICC(2,k)) was used for sequences including more than one measurement, and a “single measure” model (ICC(2,1)) for the sequence with one measurement only (see the next section for a description of the different sequences of measurements). The ICC makes it possible to detect systematic bias between test and retest (difference between means) [46], and the ICC 95% confidence interval was calculated to represent ICC variability. An ICC > 0.80 was considered “excellent”; between 0.61 and 0.80, “good”; between 0.41 and 0.60, “moderate”; between 0.21 and 0.40, “acceptable”; and 0–0.20, “poor” reliability [58].

Considering that the ICC is influenced by the variability of the measurement, the coefficient of variation (CV) of the data was also calculated using the formula CV = (SD / mean) × 100, where SD is the standard deviation of the test-retest data and mean is the mean of the test-retest data. The CV measures the relative spread of the data and helps to interpret ICC results in light of this variability [50,52].
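For illustration only, the two ICC forms named above (two-way random effects, absolute agreement, single and average measures) and the CV can be computed directly from the classical ANOVA mean squares; the sketch below uses our own function names and the textbook Shrout and Fleiss formulas rather than the SPSS routine used in the study.

```python
import numpy as np

def icc_absolute_agreement(data, average=True):
    """ICC (two-way random effects, absolute agreement) from a subjects x occasions array.
    average=True returns the average-measures form (ICC(2,k));
    average=False returns the single-measure form (ICC(2,1))."""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    ms_subjects = k * ((data.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    ms_occasions = n * ((data.mean(axis=0) - grand) ** 2).sum() / (k - 1)
    ss_total = ((data - grand) ** 2).sum()
    ss_error = ss_total - (n - 1) * ms_subjects - (k - 1) * ms_occasions
    ms_error = ss_error / ((n - 1) * (k - 1))
    if average:  # ICC(2,k): the average of the k occasions is the unit of analysis
        return (ms_subjects - ms_error) / (ms_subjects + (ms_occasions - ms_error) / n)
    # ICC(2,1): a single measurement is the unit of analysis
    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_occasions - ms_error) / n)

def coefficient_of_variation(data):
    """CV (%) of the pooled test-retest data: (SD / mean) x 100."""
    data = np.asarray(data, dtype=float)
    return 100 * data.std(ddof=1) / data.mean()
```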

2.4.3 Comparison of different sequences of measurements.

To determine how many measurements provided the best reliability metrics, five sequences of measurements were considered for PPT (Fig 1): the first measurement only (1), the first two measurements (1–2), the first three measurements (1-2-3), the first three or the first four when applicable (1-2-3(4)) and the last measurements excluding the first, i.e. the second and third plus the fourth when applicable (2-3-4). The sequence 2-3-4 was included because some studies suggest that the first measure of PPT tends to be higher [36,37,59], but this remains debated [34]. For TSP, three sequences were evaluated: three measurements (1-2-3), the first two (1–2) and the first only (1). For all sequences, relative and absolute within-subject short-term test-retest reliability was assessed.

3. Results

3.1 Pain sensitivity outcomes

3.1.1 Missing data.

One participant’s TSP data were excluded from the analysis because of a technical problem with the pinprick stimulator during data collection (n = 23). For PPT, all twenty-four participants were included in the analysis.

3.1.2 Assumptions validation and data transformation.

Two distributions required natural logarithmic (ln) transformation because of non-normality: 1) TSP at the back at retest for sequence 1–2 (p = 0.02) and 2) TSP at the hand at test for sequence 1 (p = 0.03). Data were normally distributed after the transformation. One or two outliers were detected for 20 and 3 of the 52 data distributions, respectively. To determine whether outliers affected the reliability analysis, we calculated the reliability with and without outliers. Since both analyses provided similar reliability estimates that did not affect the interpretation of the results, outliers were included. All data distributions were homoscedastic.

3.1.3 Raw PPT and TSP data.

Group means and SD for PPT and TSP at test and retest are reported in Table 2. The one-way repeated-measures ANOVA showed no statistically significant difference between test and retest occasions for PPT and TSP at any site for any sequence (all p>0.076; Table 2).

Table 2. Means of PPT and TSP at test and retest occasions for each sequence of measurements.

https://doi.org/10.1371/journal.pone.0245278.t002

3.2 Reliability analysis

3.2.1 Absolute reliability for PPT and TSP.

Considering that (i) MDCgr is proportional to SEMeas and MDCind, and that (ii) %MDCgr, %SEMeas and %MDCind results present the same pattern between sequences, only MDCgr and %MDCgr will be described to facilitate the presentation of the results. Each metric is detailed in Table 3. In general, differences in the measures of absolute reliability for PPT are small across the sequences and seem to be dependent on the site tested (Fig 2). For S1, MDCgr varied from 31.10 kPa for sequence 1–2 to 40.71 kPa for sequence 2-3-4 (%MDCgr: 5.91% (1–2) to 7.76% (2-3-4)). For LES, MDCgr ranged from 46.13 kPa (1–2) to 50.56 kPa (1) (%MDCgr: 8.40% (1–2) to 9.14% (1)). For WD, MDCgr varied from 28.71 kPa (1-2-3) to 31.84 kPa (1) (%MDCgr: 7.87% (1-2-3) to 8.58% (2-3-4)). For WF, MDCgr ranged from 28.87 kPa (1-2-3(4)) to 38.52 kPa (1) (%MDCgr: 7.99% (1-2-3(4)) to 10.19% (1)). Sequences 1-2-3(4) and 1-2-3 presented similar results at WD and WF and seem to present a smaller measurement error compared to other sequences regarding absolute reliability. At the low back (S1 and LES), sequence 1–2 presented the lowest MDCgr compared to other sequences although the difference remains minimal. In terms of the different sites, S1 presented the lowest measurement error compared to the three other sites that were very similar. Again, these differences were small.

Fig 2. %MDCgr for PPT at all sites for the five sequences of measurements tested.

https://doi.org/10.1371/journal.pone.0245278.g002

Table 3. Within-session reliability parameters for PPT for each site using different sequences of measurements (n = 24).

https://doi.org/10.1371/journal.pone.0245278.t003

For TSP, absolute reliability is depicted in Table 4. MDCgr at the hand varied from 0.33/10 for sequence 1-2-3 to 0.40/10 for sequence 1–2 (%MDCgr: 13.58% (1-2-3) to 17.29% (1–2)). At the back, MDCgr ranged from 0.46/10 for sequence 1-2-3 to 0.57/10 for sequence 1 (%MDCgr: 23.52% (1-2-3) to 29.22% (1)). Increasing the number of measurements for TSP reduced the measurement error (i.e. improved absolute reliability). %MDCgr was lower at the hand than at the back. Although we present the non-transformed data in Table 4, comparisons with the other sequences must be made with caution: for log-transformed data, calculation of %SEMeas and %MDC was not applicable because transformed results do not correspond to the same ratio scale [54,55]. Antilog factors (×/÷) derived from SEMeas and MDCind were also calculated and are presented in Table 4.

Table 4. Within-session reliability parameters for TSP for each site using different sequences of measurements (n = 23).

https://doi.org/10.1371/journal.pone.0245278.t004

3.2.2 Relative reliability for PPT and TSP.

For PPT, almost all ICC values were above 0.80 (between 0.80 and 0.97), denoting that almost all sequences presented excellent relative reliability, except sequence 1 for PPT at LES and WF, for which good reliability was observed (ICCs of 0.79 and 0.73, respectively; Table 3). Sequence 1 presented lower ICCs for all sites, associated with large confidence intervals at LES, WD and WF. As illustrated in Fig 3, ICCs of sequence 1–2 were larger at S1 and LES than those of the other sequences. ICCs for WD and WF were similar among sequences, except for sequence 1, which presented lower ICCs.

Fig 3. ICC for PPT for all sites by each sequence of measurements.

https://doi.org/10.1371/journal.pone.0245278.g003

For TSP, results are depicted in Table 4. Sequences 1–2 and 1-2-3 showed excellent reliability, with ICCs ranging between 0.81 and 0.91 and narrow confidence intervals. Visually, larger ICC confidence intervals were present at the back compared to the hand. As illustrated in Fig 4, there was a large difference in ICC between sequences 1 and 1-2-3 at both sites, with sequence 1-2-3 presenting larger ICCs than sequence 1.

Fig 4. ICC for TSP at the hand and at the back by each sequence of measurements.

https://doi.org/10.1371/journal.pone.0245278.g004

4. Discussion

The objectives of this study were to determine the reliability of PPT and TSP and to identify the sequence of measurements providing the best reliability metrics. The design of the study was established to measure the minimal change exceeding the variability of the PPT and TSP techniques within a session (blocks 20 minutes apart). Our results could be used in future pain research to confirm the presence of ‘true’ hypo- or hyperalgesia (i.e. a change beyond normal variability) for paradigms such as CPM or EIH. In addition, we established that two measures of PPT and three measures of TSP reduced the measurement error and demonstrated excellent relative reliability.

4.1 PPT reliability

Four studies investigated the intra-rater reliability of PPT at the low back and at the wrist in healthy participants [34–37], but only two of them [34,37] evaluated the within-session test-retest reliability of PPT at the low back. The two other studies based their reliability analysis on the comparison of two or three consecutive measurements from a single block of measurements. Therefore, their results are not suitable in the context of pain modulation evaluation such as CPM/EIH and cannot be directly compared to ours. Balaguier et al. [34] evaluated absolute reliability at the low back over two sessions separated by one hour and observed an MDCind ranging from 94 to 253 kPa. These results are similar to our findings for the low back, which varied from 152.37 to 247.71 kPa. In addition, our ICCs for PPT are consistent with the literature. One study reported excellent relative reliability (ICCs: 0.86 to 0.99) at 14 anatomical locations at the low back and another reported good to excellent reliability (ICCs: 0.40 to 0.99) for three sites over the lumbar erector spinae (L1, L3, L5) [37]. These previous results must be interpreted with caution because of the limited sample sizes (n = 5 [37] and n = 15 [34]).

4.2 TSP reliability

Only three studies investigated mechanical TSP reliability [60–62], but none assessed the low back area. In two studies [60,62], 5 consecutive series of ten punctate stimuli were used instead of two blocks of 3 series tested 20 min apart. One study [60] observed poor reliability for mechanical TSP at the face, hand and foot, and another reported poor to good intra-rater reliability at the tongue, face and gingiva [62]. The third study investigated the test-retest reliability of mechanical TSP at the hand over a two-week period and observed acceptable reliability in younger adults and moderate to good reliability in older adults [61]. These findings are inconsistent with the excellent reliability observed for mechanical TSP at back and hand sites in our study. This discrepancy may be caused by differences in the sites tested, study designs and the calculation of TSP. In two of the above-mentioned studies [60,62], TSP reliability was analyzed using the wind-up ratio (WUR = the mean pain rating of the trains divided by the mean pain rating of the single stimulus). WUR cannot be calculated if the single pinprick stimulus is rated as non-painful (NRS = 0/10, i.e. a null denominator), leading to an undefined division and limiting the number of participants included in the analysis (e.g. up to ~27% of participants excluded from the analysis [60]). We considered that the subtraction method used for TSP also reflects the facilitation of nociceptive inputs, with the advantage of including more participants in the analysis.
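To make the difference between the two calculation methods concrete, the small illustrative sketch below (ours, not taken from the cited protocols) shows why WUR is undefined when the single stimulus is rated 0/10 whereas the subtraction score is not.

```python
def wind_up_ratio(single_stimulus_nrs, series_mean_nrs):
    """WUR: mean pain rating of the pinprick trains divided by the rating of the
    single stimulus. Undefined (returns None) when the single stimulus is 0/10."""
    if single_stimulus_nrs == 0:
        return None  # null denominator: the participant would be excluded
    return series_mean_nrs / single_stimulus_nrs

def tsp_subtraction(single_stimulus_nrs, series_peak_nrs):
    """Subtraction method used in this study: defined even when the single stimulus is 0/10."""
    return series_peak_nrs - single_stimulus_nrs

print(wind_up_ratio(0, 2))    # None -> excluded under the WUR approach
print(tsp_subtraction(0, 2))  # 2 -> still usable
```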

Antilog factors were calculated for two TSP sequences, considering that a log-transformation was applied in the presence of a normality violation. For example, an antilog factor for the MDCind of ×/÷1.60 was obtained for sequence 1–2 at the back. This factor is applied to the test score to obtain lower/higher limits representing the level below/above which the retest score has to move to be considered a real change. For example, if a TSP of 2/10 is measured at test, a TSP ≥ 3.20/10 (2 × 1.60) or ≤ 1.25/10 (2 ÷ 1.60) at retest for the same participant is considered a true change [45,47,54]. Some methodological limitations derive from the application of the antilog factor; for example, it cannot be used if TSP at test is rated 0/10. Also, considering that TSP represents ordinal data, interpretation appears less precise. Therefore, the antilog factor constitutes an alternative way to analyze reliability in the presence of a normality violation, but its concrete and clinical applicability remains challenging with TSP data.
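As a sketch of how such a multiplicative factor would be applied in practice (the 1.60 value is taken from the example above; the function name is hypothetical):

```python
def real_change_bounds(test_score, antilog_factor):
    """Lower and upper limits beyond which a retest score is considered a real change,
    when SEMeas/MDC were derived from log-transformed data (x/÷ factor)."""
    return test_score / antilog_factor, test_score * antilog_factor

lower, upper = real_change_bounds(2.0, 1.60)
print(lower, upper)  # ~1.25 and ~3.20 -> a retest <= 1.25/10 or >= 3.20/10 counts as a true change
```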

4.3 Comparison of sequences of measurements

For PPT, there was substantial heterogeneity in the sequences of measurements producing the least measurement error, which differed as a function of the site tested. Absolute reliability with sequence 1–2 was superior for S1 and LES, sequence 1-2-3 for WD and sequence 1-2-3(4) for WF. However, these differences were minimal across sequences. In addition, all the sequences tested for PPT demonstrated excellent relative reliability at all sites, except sequence 1, which showed good relative reliability at LES and WF. Our results are generally in accordance with the current literature suggesting that the mean of two or three consecutive measurements is enough to provide reliable PPT [34,36]. However, one study proposed that using only the first measure constitutes an excellent method [34], in contrast to our findings. Some studies observed that the first PPT measurement tends to be significantly higher than the subsequent ones and recommended excluding it from the analysis to reduce variability [36,37,59]. Our results suggest that excluding the first measurement (2-3-4) did not reduce the measurement error. However, when the first measure was used alone, reliability usually worsened, meaning that adding only one more measurement seemed sufficient to stabilize the measure. The disparity regarding the first measure may be explained by the fact that some studies did not include a familiarization trial prior to PPT evaluation [36,37] or evaluated PPT at different sites than those evaluated in the present study (i.e. biceps brachii) [59]. Thus, we recommend measuring PPT at least twice in each block of testing to improve reliability.

For TSP, our results suggest that increasing the number of measurements reduces the measurement error (i.e. improves absolute reliability). Two and three measurements showed excellent relative reliability at the hand and the back, whereas a single measurement demonstrated good relative reliability at the hand and moderate reliability at the low back. These results may have been influenced by sample heterogeneity, as reflected by the high coefficients of variation for TSP. Although some research groups recommend measuring 5 series of ten punctate stimuli [32,60,62], our results suggest that a sequence of 3 series separated by 30 seconds provides excellent reliability. Taking 3 measures instead of 5 may reduce the duration of the test and minimize skin irritation due to repeated pinprick stimulation. Therefore, we suggest taking at least three TSP measurements to improve TSP reliability.

4.4 Applicability of absolute reliability in research

The MDC obtained in our study provides specific thresholds to determine whether a change in PPT or TSP in a within-session design exceeds the variability of these measures for a group (MDCgr) or an individual (MDCind). In research, the MDC can serve as a cut-off value to determine whether a change in pain sensitivity (as measured by PPT or TSP) exceeds measurement error and can be considered true hypo- or hyperalgesia following the conditioning stimulus in pain modulation paradigms (e.g. CPM and EIH). For instance, a previous study investigating EIH reported a significant within-session PPT change of 29.78 kPa at the low back in healthy controls following a repetitive lifting task [11]. We determined that the lowest MDCgr at the low back was 46.13 kPa. Thus, a change smaller than this value cannot be considered a real change despite statistical significance.

Also, the response to a conditioning stimulus (e.g. CPM) has been used to stratify healthy and chronic pain participants as a function of the change in pain sensitivity (decrease vs. increase) [9,63,64], suggesting a bias toward inhibitory or facilitatory descending control (i.e. anti- vs. pro-nociceptive). However, this stratification is usually done without considering the measurement error. This may result in stratifying participants whose changes remain within the measurement error (i.e. pain sensitivity did not change following the conditioning stimulus). Our MDCind (or %MDCind) could be used in future studies using a similar design (CPM/EIH) as cut-off values to subgroup participants. For example, an individual change of ~35% is necessary to exceed the %MDCind at S1. Considering that this change is large, it calls into question the validity of stratification methods using PPT.

4.5 Methodological considerations

Our study did not conduct the reliability analysis in different groups for each sequence, and this could have underestimated PPT and TSP measurement variability. Also, considering that this study was conducted in a healthy, pain-free population, the current results are not generalizable to other populations. It is acknowledged that factors such as race/ethnicity, sex (e.g. phases of the menstrual cycle [65]) and age can influence pain sensitivity [66,67], but considering that we measured the reliability of pain sensitivity within a single session rather than pain sensitivity per se, the effect of these factors on our results remains limited. Future studies should be conducted in chronic pain participants, such as those with chronic low back pain.

4.6 Summary

This study shows that PPT and TSP at back and hand sites have a small measurement error and excellent relative reliability using a within-session test-retest design. Our results also suggest that at least two consecutive PPT measures and three consecutive TSP measures are needed to optimize reliability; these recommendations may be used in future research and in clinical practice. Our results also provide cut-off values that may be used with pain modulation paradigms such as CPM and EIH to confirm that changes following a conditioning stimulus exceed PPT and TSP measurement error (true hypo-/hyperalgesia). Further studies are warranted to investigate the within-session test-retest reliability of these parameters in chronic pain populations.

References

  1. Moseley GL, Butler DS. Explain pain supercharged: The clinician’s manual. First ed: NOIgroup Publications; 2017. 225 p.
  2. Ossipov MH, Dussor GO, Porreca F. Central modulation of pain. J Clin Invest. 2010;120(11):3779–87. pmid:21041960
  3. Starkweather AR, Heineman A, Storey S, Rubia G, Lyon DE, Greenspan J, et al. Methods to measure peripheral and central sensitization using quantitative sensory testing: A focus on individuals with low back pain. Appl Nurs Res. 2016;29:237–41. pmid:26856520
  4. Arendt-Nielsen L, Yarnitsky D. Experimental and clinical applications of quantitative sensory testing applied to skin, muscles and viscera. J Pain. 2009;10(6):556–72. pmid:19380256
  5. Graven-Nielsen T, Arendt-Nielsen L. Assessment of mechanisms in localized and widespread musculoskeletal pain. Nat Rev Rheumatol. 2010;6(10):599–606. pmid:20664523
  6. Neelapala YVR, Bhagat M, Frey-Law L. Conditioned Pain Modulation in Chronic Low Back Pain: A Systematic Review of Literature. Clin J Pain. 2020;36(2):135–41. pmid:31764164
  7. Correa JB, Costa LO, de Oliveira NT, Sluka KA, Liebano RE. Central sensitization and changes in conditioned pain modulation in people with chronic nonspecific low back pain: a case-control study. Exp Brain Res. 2015;233(8):2391–9. pmid:25963754
  8. Aoyagi K, He J, Nicol AL, Clauw DJ, Kluding PM, Jernigan S, et al. A Subgroup of Chronic Low Back Pain Patients with Central Sensitization. Clin J Pain. 2019. pmid:31408011
  9. Rabey M, Poon C, Wray J, Thamajaree C, East R, Slater H. Pro-nociceptive and anti-nociceptive effects of a conditioned pain modulation protocol in participants with chronic low back pain and healthy control subjects. Man Ther. 2015;20(6):763–8. pmid:25795107
  10. Falla D, Gizzi L, Tschapek M, Erlenwein J, Petzke F. Reduced task-induced variations in the distribution of activity across back muscle regions in individuals with low back pain. Pain. 2014;155(5):944–53. pmid:24502841
  11. Kuithan P, Heneghan NR, Rushton A, Sanderson A, Falla D. Lack of Exercise-Induced Hypoalgesia to Repetitive Back Movement in People with Chronic Low Back Pain. Pain Pract. 2019. pmid:31187932
  12. Yarnitsky D, Bouhassira D, Drewes AM, Fillingim RB, Granot M, Hansson P, et al. Recommendations on practice of conditioned pain modulation (CPM) testing. European Journal of Pain. 2015;19(6):805–6. pmid:25330039
  13. Naugle KM, Fillingim RB, Riley JL 3rd. A meta-analytic review of the hypoalgesic effects of exercise. J Pain. 2012;13(12):1139–50. pmid:23141188
  14. Sprenger C, Bingel U, Buchel C. Treating pain with pain: supraspinal mechanisms of endogenous analgesia elicited by heterotopic noxious conditioning stimulation. Pain. 2011;152(2):428–39. pmid:21196078
  15. Yarnitsky D, Granot M, Granovsky Y. Pain modulation profile and pain therapy: between pro- and antinociception. Pain. 2014;155(4):663–5. pmid:24269491
  16. Pud D, Granovsky Y, Yarnitsky D. The methodology of experimentally induced diffuse noxious inhibitory control (DNIC)-like effect in humans. Pain. 2009;144(1–2):16–9. pmid:19359095
  17. Gebhart GF. Descending modulation of pain. Neurosci Biobehav Rev. 2004;27(8):729–37. pmid:15019423
  18. Da Silva Santos R, Galdino G. Endogenous systems involved in exercise-induced analgesia. J Physiol Pharmacol. 2018;69(1):3–13. pmid:29769416
  19. Rice D, Nijs J, Kosek E, Wideman T, Hasenbring MI, Koltyn K, et al. Exercise-Induced Hypoalgesia in Pain-Free and Chronic Pain Populations: State of the Art and Future Directions. J Pain. 2019. pmid:30904519
  20. Sluka KA, Frey-Law L, Hoeger Bement M. Exercise-induced pain and analgesia? Underlying mechanisms and clinical translation. Pain. 2018;159 Suppl 1:S91–S7. pmid:30113953
  21. Paungmali A, Joseph LH, Punturee K, Sitilertpisan P, Pirunsan U, Uthaikhup S. Immediate Effects of Core Stabilization Exercise on beta-Endorphin and Cortisol Levels Among Patients With Chronic Nonspecific Low Back Pain: A Randomized Crossover Design. J Manipulative Physiol Ther. 2018;41(3):181–8. pmid:29459120
  22. Crombie KM, Brellenthin AG, Hillard CJ, Koltyn KF. Endocannabinoid and Opioid System Interactions in Exercise-Induced Hypoalgesia. Pain Med. 2018;19(1):118–23. pmid:28387833
  23. Tour J, Lofgren M, Mannerkorpi K, Gerdle B, Larsson A, Palstam A, et al. Gene-to-gene interactions regulate endogenous pain modulation in fibromyalgia patients and healthy controls-antagonistic effects between opioid and serotonin-related genes. Pain. 2017;158(7):1194–203. pmid:28282362
  24. Koltyn KF, Brellenthin AG, Cook DB, Sehgal N, Hillard C. Mechanisms of exercise-induced hypoalgesia. J Pain. 2014;15(12):1294–304. pmid:25261342
  25. Yarnitsky D, Arendt-Nielsen L, Bouhassira D, Edwards RR, Fillingim RB, Granot M, et al. Recommendations on terminology and practice of psychophysical DNIC testing. Eur J Pain. 2010;14(4):339. pmid:20227310
  26. Kennedy DL, Kemp HI, Ridout D, Yarnitsky D, Rice AS. Reliability of conditioned pain modulation: a systematic review. Pain. 2016;157(11):2410–9. pmid:27559835
  27. Coronado RA, Bialosky JE, Robinsen ME, George SZ. Pain sensitivity subgroups in individuals with spine pain: potential relevance to short-term clinical outcome. Physical Therapy. 2014;94(8):1111–22. pmid:24764070
  28. Nahman-Averbuch H, Yarnitsky D, Granovsky Y, Gerber E, Dagul P, Granot M. The role of stimulation parameters on the conditioned pain modulation response. Scand J Pain. 2013;4(1):10–4. pmid:29913877
  29. Marchand S, Arsenault P. Spatial summation for pain perception: interaction of inhibitory and excitatory mechanisms. Pain. 2002;95:201–6. pmid:11839419
  30. Tousignant-Laflamme Y, Page S, Goffaux P, Marchand S. An experimental model to measure excitatory and inhibitory pain mechanisms in humans. Brain Res. 2008;1230:73–9. pmid:18652808
  31. Price DD, Hu JW, Dubner R, Gracely RH. Peripheral suppression of first pain and central summation of second pain evoked by noxious heat pulses. Pain. 1977;3:57–68. pmid:876667
  32. Rolke R, Baron R, Maier C, Tolle TR, Treede RD, Beyer A, et al. Quantitative sensory testing in the German Research Network on Neuropathic Pain (DFNS): standardized protocol and reference values. Pain. 2006;123(3):231–43. pmid:16697110
  33. Herrero JF, Laird JMA, Lopez Garcia JA. Wind-up of spinal cord neurones and pain sensation: much ado about something? Progress in Neurobiology. 2000;61:169–203. pmid:10704997
  34. Balaguier R, Madeleine P, Vuillerme N. Is One Trial Sufficient to Obtain Excellent Pressure Pain Threshold Reliability in the Low Back of Asymptomatic Individuals? A Test-Retest Study. PLoS One. 2016;11(8):e0160866. pmid:27513474
  35. Waller R, Straker L, O'Sullivan P, Sterling M, Smith A. Reliability of pressure pain threshold testing in healthy pain free young adults. Scand J Pain. 2015;9(1):38–41. pmid:29911647
  36. Lacourt TE, Houtveen JH, van Doornen LJP. Experimental pressure-pain assessments: Test-retest reliability, convergence and dimensionality. Scand J Pain. 2012;3(1):31–7. pmid:29913770
  37. Farasyn A, Meeusen R. Pressure pain thresholds in healthy subjects: influence of physical activity, history of lower back pain factors and the use of endermology as a placebo-like treatment. Journal of Bodywork and Movement Therapies. 2003;7(1):53–61.
  38. Gierthmuhlen J, Enax-Krumova EK, Attal N, Bouhassira D, Cruccu G, Finnerup NB, et al. Who is healthy? Aspects to consider when including healthy volunteers in QST-based studies—a consensus statement by the EUROPAIN and NEUROPAIN consortia. Pain. 2015;156(11):2203–11. pmid:26075963
  39. Riviere F, Widad FZ, Speyer E, Erpelding ML, Escalon H, Vuillemin A. Reliability and validity of the French version of the global physical activity questionnaire. J Sport Health Sci. 2018;7(3):339–45. pmid:30356654
  40. Vaegter HB, Handberg G, Emmeluth C, Graven-Nielsen T. Preoperative Hypoalgesia After Cold Pressor Test and Aerobic Exercise is Associated With Pain Relief 6 Months After Total Knee Replacement. Clin J Pain. 2017;33(6):475–84. pmid:27526332
  41. Vaegter HB, Dorge DB, Schmidt KS, Jensen AH, Graven-Nielsen T. Test-Retest Reliabilty of Exercise-Induced Hypoalgesia After Aerobic Exercise. Pain Med. 2018;19(11):2212–22. pmid:29425326
  42. Vaegter HB, Lyng KD, Yttereng FW, Christensen MH, Sorensen MB, Graven-Nielsen T. Exercise-Induced Hypoalgesia After Isometric Wall Squat Exercise: A Test-Retest Reliabilty Study. Pain Med. 2019;20(1):129–37. pmid:29788440
  43. Gehling J, Mainka T, Vollert J, Pogatzki-Zahn EM, Maier C, Enax-Krumova EK. Short-term test-retest-reliability of conditioned pain modulation using the cold-heat-pain method in healthy subjects and its correlation to parameters of standardized quantitative sensory testing. BMC Neurol. 2016;16:125. pmid:27495743
  44. Anderson RJ, Craggs JG, Bialosky JE, Bishop MD, George SZ, Staud R, et al. Temporal summation of second pain: variability in responses to a fixed protocol. Eur J Pain. 2013;17(1):67–74. pmid:22899549
  45. Atkinson G, Nevill AM. Statistical methods for assessing measurement error (reliability) in variables relevant to sports medicine. Sports Med. 1998;26(4):217–38. pmid:9820922
  46. Weir JP. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. Journal of Strength and Conditioning Research. 2005;19(1):231–40. pmid:15705040
  47. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. The Lancet. 1986:307–10. pmid:2868172
  48. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. International Journal of Nursing Studies. 2010;47(8):931–6.
  49. Damron LA, Dearth DJ, Hoffman RL, Clark BC. Quantification of the corticospinal silent period evoked via transcranial magnetic stimulation. J Neurosci Methods. 2008;173(1):121–8. pmid:18588914
  50. Hopkins WG. Measures of reliability in sports medicine and science. Sports Med. 2000;1(30):1–15. pmid:10907753
  51. Harvill LM. Standard error of measurement. Instructional Topics in Educational Measurement. 1991:33–41.
  52. Lexell JE, Downham DY. How to assess the reliability of measurements in rehabilitation. Am J Phys Med Rehabil. 2005;84(9):719–23. pmid:16141752
  53. Beckerman H, Roebroeck ME, Lankhorst GJ, B J.G., Bezemer PD, Verbreek ALM. Smallest real difference, a link between reproducibility and responsiveness. Quality of Life Research. 2001;10:571–8. pmid:11822790
  54. Beaulieu LD, Masse-Alarie H, Ribot-Ciscar E, Schneider C. Reliability of lower limb transcranial magnetic stimulation outcomes in the ipsi- and contralesional hemispheres of adults with chronic stroke. Clin Neurophysiol. 2017;128(7):1290–8. pmid:28549277
  55. Schambra HM, Ogden RT, Martinez-Hernandez IE, Lin X, Chang YB, Rahman A, et al. The reliability of repeated TMS measures in older adults and in patients with subacute and chronic stroke. Front Cell Neurosci. 2015;9:335. pmid:26388729
  56. Terwee CB, Bot SD, de Boer MR, van der Windt DA, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60(1):34–42. pmid:17161752
  57. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin. 1979;86(2):420–8. pmid:18839484
  58. Portney LG, Watkins MP. Foundations of clinical research: applications to practice. 3rd ed. Upper Saddle River (New Jersey): Pearson Education Inc.; 2009. 912 p.
  59. Nussbaum EL, Downes L. Reliability of clinical pressure-pain algometric measurements obtained on consecutive days. Physical Therapy. 1998;78(2).
  60. Geber C, Klein T, Azad S, Birklein F, Gierthmuhlen J, Huge V, et al. Test-retest and interobserver reliability of quantitative sensory testing according to the protocol of the German Research Network on Neuropathic Pain (DFNS): a multi-centre study. Pain. 2011;152(3):548–56. pmid:21237569
  61. Ohlman T, Miller L, Naugle K. (169) Comparison of Temporal Stability of Conditioned Pain Modulation and Temporal Summation of Pain in Healthy Older and Younger Adults. The Journal of Pain. 2019;20(4).
  62. Pigg M, Baad-Hansen L, Svensson P, Drangsholt M, List T. Reliability of intraoral quantitative sensory testing (QST). Pain. 2010;148(2):220–6. pmid:20022428
  63. Fingleton C, Smart KM, Doody CM. Exercise-induced Hypoalgesia in People With Knee Osteoarthritis With Normal and Abnormal Conditioned Pain Modulation. Clin J Pain. 2017;33(5):395–404. pmid:27518487
  64. O'Neill S, Manniche C, Graven-Nielsen T, Arendt-Nielsen L. Association between a composite score of pain sensitivity and clinical parameters in low-back pain. Clin J Pain. 2014;30(10):831–8. pmid:24121529
  65. Pogatzki-Zahn EM, Drescher C, Englbrecht JS, Klein T, Magerl W, Zahn PK. Progesterone relates to enhanced incisional acute pain and pinprick hyperalgesia in the luteal phase of female volunteers. Pain. 2019;160(8):1781–93. pmid:31335647
  66. Ostrom C, Bair E, Maixner W, Dubner R, Fillingim RB, Ohrbach R, et al. Demographic Predictors of Pain Sensitivity: Results From the OPPERA Study. J Pain. 2017;18(3):295–307. pmid:27884689
  67. Riley JL 3rd, Cruz-Almeida Y, Glover TL, King CD, Goodin BR, Sibille KT, et al. Age and race effects on pain sensitivity and modulation among middle-aged and older adults. J Pain. 2014;15(3):272–82. pmid:24239561