

Reactivity to smartphone-based ecological momentary assessment of depressive symptoms (MoodMonitor): protocol of a randomised controlled trial




Ecological momentary assessment (EMA) of mental health symptoms may influence the symptoms that it measures, i.e. assessment reactivity. In the field of depression, EMA reactivity has received little attention. We aim to investigate whether EMA of depressive symptoms induces assessment reactivity. Reactivity will be operationalised as an effect of EMA on depressive symptoms measured by a retrospective questionnaire, and, secondly, as a change in response rate and variance of the EMA ratings.


This study is a 12-week randomised controlled trial comprising three groups: group 1 carries out EMA of mood and completes a retrospective questionnaire; group 2 carries out EMA of how energetic they feel and completes a retrospective questionnaire; group 3 is the control group, which completes only the retrospective questionnaire. The retrospective questionnaire (Centre for Epidemiologic Studies Depression scale; CES-D) assesses depressive symptoms and is administered at baseline, 6 weeks after baseline and 12 weeks after baseline. We aim to recruit 160 participants who experience mild to moderate depressive symptoms, defined as a Patient Health Questionnaire (PHQ-9) score of 5 to 15. This study is powered to detect a small between-groups effect; a difference within the effect size margin −0.25 < d < 0.25 is considered not clinically relevant.


To our knowledge, this is the first study to investigate whether self-rated EMA of depressive symptoms could induce assessment reactivity among mildly depressed individuals.

Trial registration

Netherlands Trial Register NTR5803. Registered 12 April 2016.


Daily, repeated measurements of mental health symptoms, also known as experience sampling or ecological momentary assessment (EMA), enable clinicians, researchers and patients to monitor psychological processes in real time. EMA is usually operationalised as active monitoring, in which patients respond to prompted or self-initiated items or questions [1]. EMA involves repeated self-report assessments of a current state, such as daily assessments of mood. Several advantages of active EMA for assessing mental health problems, such as depression, have been noted [2]: absence of recall bias; sensitivity to mood fluctuations [3]; and real-time insight into treatment response, which can also be used as feedback to individual patients (e.g. [4]). Several studies have employed smartphones for EMA of depressive symptoms (e.g. [5–9]).

Active EMA may not only measure, but also influence mental health symptoms, which is known as assessment reactivity. Studies on alcohol abuse interventions show that repeated assessments draw attention to the monitored behaviour, which can identify problematic behaviour and highlight personal responsibility [10]. In that sense, EMA is similar to a brief intervention and can have a positive influence on treatment outcome [11]. Another type of reactivity is response fatigue, which is a declining response rate or declining response accuracy over time (e.g. [12]). A declining response accuracy could be observed by a declining correlation between a repeated measure and another measure that assesses a theoretically associated construct [12].

In the field of depression, EMA reactivity has received little attention. Kramer et al. [13] found that EMA of positive and negative affect may have a beneficial effect on depressive symptoms after 7 weeks, although the effect had disappeared at the 6-month follow-up [13]. EMA was conducted alongside pharmacotherapy in that study and the EMA was short and intensive, i.e. over 10 self-rated items, each of which had to be rated 10 times a day for 5 consecutive days and for 5 more consecutive days a few weeks later [13]. Positive and negative affect are related to depression, so reactivity to EMA of depressive symptoms may follow a pattern similar to that found by Kramer et al. Little is known about reactivity to EMA of depressive symptoms when conducted for a longer consecutive period, such as 12 weeks.

This study aims to investigate whether EMA of depressive symptoms induces assessment reactivity. First, we will investigate whether EMA of depressive symptoms during a 12-week period has an effect on depressive symptoms measured by a retrospective questionnaire. Second, we will investigate whether response fatigue affects the EMA ratings. Response fatigue will be operationalised as response rate over time and correlations with measures of associated constructs. To minimise response burden, participants will monitor only one symptom, 1 to 3 times a day. Because depression is a multidimensional construct [14], we will conduct EMA of two core depression symptoms in two groups, where group 1 monitors mood and group 2 monitors how energetic they feel. Based on the literature, we expect a small positive short-term effect of EMA on depressive symptoms.


Study design

This study is a 12-week randomised controlled trial among participants who experience mild to moderate depressive symptoms. The trial consists of three groups: group 1 carries out EMA of mood and completes a retrospective questionnaire; group 2 carries out EMA of how energetic they feel and completes a retrospective questionnaire; group 3 is the control group, which completes only the retrospective questionnaire. The retrospective questionnaire is the Centre for Epidemiologic Studies Depression scale (CES-D) [15], which assesses depressive symptoms. The CES-D is administered at baseline (T0), 6 weeks after baseline (T6) and 12 weeks after baseline (T12).

Study population and inclusion criteria

We aim to recruit 160 adult (18+) participants with mild to moderate depressive symptoms from among college students and users of mental health websites. We define mild to moderate depressive symptoms as a PHQ-9 score of 5 to 15 [16]. All participants are required to own a smartphone that runs the Android operating system, version 4.0 or later, because the EMA application runs only on that platform. Recruitment within a student population and among website users will ensure that many of the interested individuals own a smartphone and are used to installing and using apps.

MoodMonitor, a self-monitoring application

Participants in groups 1 and 2 install the MoodMonitor application on their smartphones, which conducts EMA of mood or energy level. This app has been developed by the E-COMPARED consortium [8]. Every day, the participant will receive one notification on his/her smartphone at a random time point between ten o’clock in the morning and ten o’clock in the evening. This notification directs the participant to the question ‘How is your mood right now?’ (group 1) or ‘How energetic do you feel right now?’ (group 2), which the participant can answer on a visual analogue scale from 1 (worst) to 10 (best) with a precision of 1 digit after the decimal point, e.g. 8.1. The notification remains accessible until the question is answered. If a notification remains unanswered, it is replaced when the next notification is sent. The measurement will be time stamped, i.e. the system records the exact time when the participant enters a rating, in addition to the time when the notification was sent. Furthermore, participants are free to provide a rating at any time they want by opening the app. During week 1 and week 12, the participant rates his/her mood three times a day instead of just once, in order to measure mood fluctuations during the day. The entered ratings are instantly visible to the participants on a graph which they can access through the MoodMonitor application.
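The prompting schedule described above could be sketched as follows. This is only an illustration, not the MoodMonitor implementation: the function names are hypothetical, and a uniform draw over the 10:00 to 22:00 window is an assumption, since the protocol does not state how the random time points are generated or how the three daily prompts in weeks 1 and 12 are spaced.

```python
import random
from datetime import date, datetime, time, timedelta
from typing import List

WINDOW_START = time(10, 0)  # ten o'clock in the morning
WINDOW_END = time(22, 0)    # ten o'clock in the evening


def prompts_per_day(study_week: int) -> int:
    """Three prompts a day in weeks 1 and 12, one prompt a day otherwise."""
    return 3 if study_week in (1, 12) else 1


def schedule_day(day: date, study_week: int, rng: random.Random) -> List[datetime]:
    """Return sorted prompt times drawn uniformly within the daily window (assumed)."""
    start = datetime.combine(day, WINDOW_START)
    end = datetime.combine(day, WINDOW_END)
    window_seconds = int((end - start).total_seconds())
    n_prompts = prompts_per_day(study_week)
    offsets = sorted(rng.randrange(window_seconds) for _ in range(n_prompts))
    return [start + timedelta(seconds=s) for s in offsets]
```

Passing an explicit `random.Random` instance keeps the schedule reproducible for testing; a production app would more likely draw each day's time on the device.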


We employ three recruitment strategies. First, we distribute flyers on the university campus. Second, we post advertisements on Dutch websites for mental health issues. Third, advertisements are posted on Facebook and Twitter. The flyers and advertisements specifically target people who experience low mood or mood fluctuations and direct them to a website that contains more detailed information. Interested individuals can apply to participate in the study via this website by completing the screening questionnaire (PHQ-9, age, possession of an Android-compatible smartphone), after which they can read the study information again and can agree to participate by entering a valid email address (electronic informed consent). Applicants who do not meet the inclusion criteria will be notified instantly. If they are excluded because they score above 15 on the PHQ-9, they are advised to contact their general practitioner. Participants who meet the inclusion criteria will be randomised equally to the three groups, i.e. 1:1:1. An independent researcher performs the allocation using a computerised random number generator. Next, participants are sent a link to the baseline questionnaire (CES-D, demographics), and participants in groups 1 and 2 receive an email with instructions to download and install the MoodMonitor app on their smartphones. Even though we do not notify participants directly of the group to which they have been randomised, they can find out easily by reading through the study information on the website, and therefore participants cannot be considered to be blinded. After week 6 and after week 12 participants in all three groups are sent an email containing a link to the questionnaires (CES-D at week 6, CES-D and SUS at week 12). Participants who complete a measurement are offered 7.50 Euro, i.e. 22.50 Euro for all three measurements, and an additional 10 Euro if they respond to 80 % or more of the EMA prompts. Participants will receive this incentive in the form of an electronic gift voucher sent to their email address. We will continue recruitment until 160 applicants have been randomised. Data collection runs from April 2016 to November 2016. This trial’s results will be published on the study homepage when the scientific papers concerning the main objectives have been published.
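As an illustration of 1:1:1 allocation with a computerised random number generator, a minimal sketch is given below. It assumes simple (unrestricted) randomisation, because the protocol does not state whether simple or blocked randomisation is used; the function name and the seed are hypothetical.

```python
import random
from collections import Counter

GROUPS = (1, 2, 3)  # 1 = EMA of mood, 2 = EMA of energy, 3 = control


def allocate(n_participants, seed):
    """Simple randomisation: each participant has an equal chance of each group."""
    rng = random.Random(seed)  # the seed is an arbitrary illustrative value
    return [rng.choice(GROUPS) for _ in range(n_participants)]


allocation = allocate(160, seed=2016)
print(Counter(allocation))  # group sizes vary by chance around 160 / 3
```

Under simple randomisation the group sizes are equal only in expectation; a permuted-block scheme would be one way to keep them balanced throughout recruitment.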


Inclusion criteria will be determined using the Dutch version of the Patient Health Questionnaire (PHQ-9) [16], which will be administered online. The PHQ-9 contains nine items and covers nine criteria listed in the Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (DSM-5), requiring respondents to rate the frequency of present difficulties during the past 2 weeks. Scores indicate the presence and severity of depression symptoms, with a maximum score of 27 and a minimum score of 0. Scores of 5, 10, 15, and 20 indicate mild, moderate, moderately severe, and severe depression, respectively. The internal consistency (Cronbach’s alpha) of the PHQ-9 with a clinical population was in the range of .86–.89 [17]. The instrument has shown good inter-format reliability between the original pen-and-paper and the online version [18]. We aim to include mild to moderately depressed participants, so we will include participants who score within the range of 5 to 15.
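The screening rule can be written down compactly. The sketch below assumes the standard PHQ-9 item scoring of 0 to 3 per item, which is consistent with the 0 to 27 range stated above; the function names and the label "minimal" for totals below 5 are illustrative assumptions, not taken from the protocol.

```python
def phq9_total(items):
    """Sum nine item scores, each assumed to be an integer from 0 to 3."""
    if len(items) != 9 or any(i not in (0, 1, 2, 3) for i in items):
        raise ValueError("PHQ-9 needs nine item scores in the range 0-3")
    return sum(items)


def phq9_severity(total):
    """Map a total score onto the cut-offs 5, 10, 15 and 20."""
    if total >= 20:
        return "severe"
    if total >= 15:
        return "moderately severe"
    if total >= 10:
        return "moderate"
    if total >= 5:
        return "mild"
    return "minimal"  # label for scores below 5 is an assumption


def is_eligible(total, age, has_android_phone):
    """Inclusion: PHQ-9 of 5 to 15 inclusive, adult, owns an Android smartphone."""
    return 5 <= total <= 15 and age >= 18 and has_android_phone
```

Note that a total of exactly 15 falls at the "moderately severe" cut-off but is still inside the inclusion range, since only scores above 15 lead to exclusion.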

The primary outcome measure of this study is retrospectively measured depressive symptoms. We will use the Centre for Epidemiologic Studies Depression scale (CES-D) [15, 19] which will be administered online at baseline, after week 6 and after week 12. This scale consists of 20 self-rated items, each scored from 0 to 3. The total score ranges from 0 to 60, where higher scores represent more severe depressive symptoms. When used online, the CES-D has a good internal consistency (Cronbach’s alpha = .89–.93) and consists of 2 to 4 factors [20].

Several demographic variables (e.g. age, gender, educational level, marital status) will be gathered at baseline to determine the characteristics of the sample. We will also ask the participants how often and for which purposes they use their smartphones. At all measurement points participants are also asked whether they receive any professional help for mental health related problems, because that could influence their depressive symptoms.

The final questionnaire at week 12 contains the System Usability Scale (SUS) [21, 22] to evaluate the usability of the MoodMonitor app. Therefore the SUS will be administered to groups 1 and 2 only. The SUS comprises ten questions (e.g. ‘I would imagine that most people would learn to use this application very quickly.’) with five response options (ranging from ‘strongly disagree’ to ‘strongly agree’). Total scores range from 0 to 100, with higher scores representing higher usability. A SUS score above 70 could be considered adequate [22].
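For reference, the 0 to 100 SUS total is conventionally derived from the ten 1-to-5 responses as sketched below. This follows Brooke's usual scoring rule (odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5); the protocol itself does not spell out the scoring, so treat this as an assumption about how the score would be computed.

```python
def sus_score(responses):
    """responses: ten ratings from 1 (strongly disagree) to 5 (strongly agree)."""
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS needs ten responses in the range 1-5")
    total = 0
    for item_number, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item_number % 2 == 1 else (5 - r)
    return total * 2.5


print(sus_score([3] * 10))  # 50.0
```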

At the end of the study, we will randomly select ten participants in the EMA groups for a semi-structured interview by telephone to obtain qualitative information about: 1) their experience with tracking their mood and energy level; 2) their experience with the app and how it could be improved; 3) participation in this study in general (Table 1).

Table 1 Schedule of enrolment, interventions and assessments

Data management

Data obtained by the online questionnaires are stored in a password-protected database maintained by an independent data manager. The email addresses of participants, which are the only personal information we obtain, will be stored in a separate database, which is also maintained by an independent data manager and will be destroyed after data collection. Neither database is accessible to the researchers; the data of the online questionnaires will become available for research purposes when data collection has been completed. The data obtained by the MoodMonitor app are not stored on the participants’ smartphones, but on a remote server, to which the anonymised data are sent through a secured connection. The app data (i.e. mood and energy ratings) are also not accessible to the researchers until data gathering has ended. The independent data manager monitors the data flow. There is no (other) data monitoring committee, due to the minimal burden and risk associated with participating in this study. After the publication of this study’s main results, the data obtained by this study will become available on request. Requests should be sent to with the topic name MoodMonitor.


The primary outcome is the effect of EMA on depressive symptoms as measured by the CES-D at T6 and T12, comparing both EMA groups with the control group. To test this on all available data, we will conduct a mixed models repeated measures regression analysis, with T6 and T12 CES-D data as the dependent variable, and baseline CES-D scores, group (1, 2, 3), time (T6, T12) and the time*group interaction as independent variables. Effects will be expressed in terms of percentage of variance explained and Cohen’s d (by dividing raw regression parameter point estimates and confidence intervals by the pooled CES-D standard deviation).
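The conversion from a raw regression estimate to Cohen's d described above amounts to dividing the estimate and its confidence limits by a pooled standard deviation. A minimal sketch, assuming the pooled SD is computed across groups with the usual degrees-of-freedom weighting (the protocol does not specify which CES-D scores enter the pooling); all names and numbers are illustrative only.

```python
import math
from statistics import variance


def pooled_sd(*groups):
    """Pooled standard deviation, weighting each group's variance by n - 1."""
    numerator = sum((len(g) - 1) * variance(g) for g in groups)
    denominator = sum(len(g) - 1 for g in groups)
    return math.sqrt(numerator / denominator)


def standardise(estimate, ci_low, ci_high, sd):
    """Express a raw group difference and its confidence interval as Cohen's d."""
    return estimate / sd, ci_low / sd, ci_high / sd


# Made-up CES-D scores, purely to show the arithmetic.
sd = pooled_sd([10, 12, 14], [20, 22, 24])
print(standardise(-1.0, -3.0, 1.0, sd))  # (-0.5, -1.5, 0.5)
```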

For our secondary analysis, we will examine the EMA response rate over time, which gives an indication of response fatigue. Response fatigue will also be analysed by assessing response accuracy, here defined as a declining correlation over time between theoretically associated measures. Literature has shown that mood swings are associated with higher depressive symptoms [23]. Therefore, we will analyse the variance of the EMA and its correlation with the CES-D scores, taking into account the EMA ratings of the two weeks prior to the CES-D (i.e. weeks 5 and 6 for T6 and weeks 11 and 12 for T12). A decreasing variance of EMA ratings (i.e. increasingly stable responses) will result in a decreasing correlation coefficient (r) with the CES-D scores if the CES-D scores do not decrease. This might indicate a decline in validity.
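The two response-fatigue indicators can be illustrated with plain Python. The sketch assumes a hypothetical data layout (one `(week, answered)` pair per prompt, and one list of EMA ratings per participant for the two weeks before a CES-D measurement); the actual MoodMonitor export format is not described in the protocol.

```python
import math
from collections import defaultdict
from statistics import mean, variance


def response_rate_by_week(prompts):
    """prompts: iterable of (week, answered) pairs; returns {week: proportion answered}."""
    sent = defaultdict(int)
    answered = defaultdict(int)
    for week, was_answered in prompts:
        sent[week] += 1
        answered[week] += int(was_answered)
    return {week: answered[week] / sent[week] for week in sorted(sent)}


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def variance_cesd_correlation(ratings_per_participant, cesd_scores):
    """Correlate each participant's EMA variance with his/her CES-D score."""
    ema_variances = [variance(r) for r in ratings_per_participant]
    return pearson_r(ema_variances, cesd_scores)
```

Comparing the correlation obtained for weeks 5 and 6 (against T6) with that for weeks 11 and 12 (against T12) would then show whether the association weakens over time.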

We will maintain a family-wise two-tailed significance level (alpha) of .05 using the Holm-Bonferroni method [24]. IBM SPSS Statistics 23 [25] and R [26] will be used for all analyses.
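The Holm-Bonferroni step-down procedure can be sketched in a few lines: p-values are sorted in ascending order, the k-th smallest (counting from 1) is compared with alpha / (m − k + 1), and testing stops at the first non-significant result. The function name and the example p-values are illustrative only.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):  # rank is 0-based, so m - rank = m - k + 1
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject


print(holm_bonferroni([0.01, 0.04, 0.03]))  # [True, False, False]
```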

Sample size

For the sample size calculation, we assume that a difference between either EMA group and the control group within the margin −0.25 < Cohen’s d < 0.25 is clinically negligible. Therefore, a difference between groups of d = 0.25 should reach statistical significance in our analysis. We ran a power calculation in G*Power [27] for a repeated-measures analysis of variance (ANOVA), which is essentially the same analysis as linear mixed modelling when there are no missing values. A total number of 120 participants (40 per group) is required in order to detect d = 0.25 with three assessment waves, a targeted power of .85, a significance level alpha of .05, and conservative estimations of correlations between measurements (r = .65) and variances of the differences between groups (non-sphericity correction epsilon = .8). Expecting 25 % drop-out at T12, we will recruit 160 participants. All participants will be included in the linear mixed model, which is robust to missing data.
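The step from 120 required participants to a recruitment target of 160 follows from inflating the required number by the expected drop-out, i.e. 120 / (1 − 0.25) = 160. A short check of that arithmetic (the function name is hypothetical; the power calculation itself was done in G*Power and is not reproduced here):

```python
import math


def recruitment_target(required_completers, dropout_rate):
    """Number to recruit so that the required number remains after drop-out."""
    return math.ceil(required_completers / (1.0 - dropout_rate))


print(recruitment_target(120, 0.25))  # 160
```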


This study aims to investigate whether self-rated EMA of depressive symptoms could induce assessment reactivity among mildly depressed individuals. The linear mixed model analyses can show an effect of Cohen’s d > 0.25 (or d < −0.25) on retrospectively measured depressive symptoms between groups. The 12-week study period and repeated measures design enable us to detect both short-term (6 weeks) and longer-term (12 weeks) effects on retrospectively measured depressive symptoms, as well as response fatigue.


This study will give a first indication of EMA reactivity on depressive symptoms. An effect of EMA on depressive symptoms would limit EMA as an instrument, because a change in EMA ratings might be attributed to the EMA itself instead of treatment or other factors. This is of immediate interest to research projects in which participants carry out EMA over longer periods, such as the E-COMPARED project [8]. However, a positive effect would indicate that EMA can be applied as an intervention, e.g. alongside psychotherapy or pharmacotherapy. A negative effect would indicate that it may not be ethical to use EMA as an instrument. If we find indications of response fatigue at T6 and/or T12, we can recommend the optimal period of conducting EMA.


First, a change in retrospectively measured depressive symptoms could be attributed not only to a change in depression severity, but also to a change in response behaviour. If we find an effect, we can explore changes in response behaviour by testing the CES-D outcomes for measurement invariance [28]. Second, one of our secondary aims is to investigate response rate, but the response rate might be artificially high, because participants are given a monetary reward when they complete 80 % or more of the EMA prompts. This reward is necessary to answer the primary research question, because any effect of EMA can only be shown when participants conduct EMA.


To our knowledge, this is the first study to investigate whether self-rated EMA of depressive symptoms could induce assessment reactivity among mildly depressed individuals.



CES-D: Centre for Epidemiologic Studies Depression scale

EMA: Ecological momentary assessment

PHQ-9: Patient Health Questionnaire

SUS: System Usability Scale


  1. Trull TJ, Ebner-Priemer UW. Using experience sampling methods/ecological momentary assessment (ESM/EMA) in clinical assessment and clinical research: introduction to the special section. Psychol Assess. 2009;21:457–62.
  2. Bos FM, Schoevers RA, aan het Rot M. Experience sampling and ecological momentary assessment studies in psychopharmacology: A systematic review. Eur Neuropsychopharmacol. 2015;25:1853–64.
  3. Solhan MB, Trull TJ, Jahng S, Wood PK. Clinical assessment of affective instability: Comparing EMA indices, questionnaire reports, and retrospective recall. Psychol Assess. 2009;21:425–36.
  4. Wichers M, Simons CJP, Kramer IMA, Hartmann JA, Lothmann C, Myin-Germeys I, et al. Momentary assessment technology as a tool to help patients with depression help themselves. Acta Psychiatr Scand. 2011;124:262–72.
  5. Band R, Barrowclough C, Emsley R, Machin M, Wearden AJ. Significant other behavioural responses and patient chronic fatigue syndrome symptom fluctuations in the context of daily life: An experience sampling study. Br J Health Psychol. 2016;21:499–514.
  6. Clasen PC, Fisher AJ, Beevers CG. Mood-reactive self-esteem and depression vulnerability: Person-specific symptom dynamics via smart phone assessment. PLoS One. 2015;10:e0129774.
  7. Juengst SB, Graham KM, Pulantara IW, McCue M, Whyte EM, Dicianno BE, et al. Pilot feasibility of an mHealth system for conducting ecological momentary assessment of mood-related symptoms following traumatic brain injury. Brain Inj. 2015;29:1351–61.
  8. Kleiboer A, Smit J, Bosmans J, Ruwaard J, Andersson G, Topooco N, et al. European COMPARative Effectiveness research on blended Depression treatment versus treatment-as-usual (E-COMPARED): study protocol for a randomized controlled, non-inferiority trial in eight European countries. Trials. 2016;17:1.
  9. Seidel M, Petermann J, Diestel S, Ritschel F, Boehm I, King JA, et al. A naturalistic examination of negative affect and disorder-related rumination in anorexia nervosa. Eur Child Adolesc Psychiatry. 2016 [Epub ahead of print].
  10. Schrimsher G, Filtz K. Assessment Reactivity: Can Assessment of Alcohol Use During Research be an Active Treatment? Alcohol Treat Q. 2011;29:108–15.
  11. Clifford PR, Davis CM. Alcohol treatment research assessment exposure: a critical review of the literature. Psychol Addict Behav. 2012;26:773–81.
  12. Reynolds BM, Robles TF, Repetti RL. Measurement Reactivity and Fatigue Effects in Daily Diary Research With Families. Dev Psychol. 2016;52:442–56.
  13. Kramer I, Simons CJP, Hartmann JA, Menne-Lothmann C, Viechtbauer W, Peeters F, et al. A therapeutic application of the experience sampling method in the treatment of depression: A randomized controlled trial. World Psychiatry. 2014;13:68–77.
  14. World Health Organization. ICD-10 Version: 2016 [Internet]. 2016. Accessed 19 Oct 2016.
  15. Radloff LS. The CES-D scale: A self-report depression scale for research in the general population. Appl Psychol Meas. 1977;1:385–401. doi:10.1177/014662167700100306
  16. Spitzer RL, Kroenke K, Williams JB. Validation and utility of a self-report version of PRIME-MD: the PHQ primary care study. Primary Care Evaluation of Mental Disorders. Patient Health Questionnaire. JAMA. 1999;282:1737–44.
  17. Kroenke K, Spitzer RL, Williams JBW. The PHQ-9: Validity of a brief depression severity measure. J Gen Intern Med. 2001;16:606–13.
  18. Erbe D, Eichert HC, Rietz C, Ebert DD. Interformat reliability of the patient health questionnaire: Validation of the computerized version of the PHQ-9. Internet Interv. 2016;5:1–4.
  19. Bouma J, Ranchor AV, Sanderman R, van Sonderen E. Het meten van symptomen van depressie met de CES-D. Een handleiding. Groningen: Noordelijk Centrum voor Gezondheidsvraagstukken, Rijksuniversiteit Groningen; 1995.
  20. van Ballegooijen W, Riper H, Cuijpers P, van Oppen P, Smit JH. Validation of online psychometric instruments for common mental health disorders: a systematic review. BMC Psychiatry. 2016;16:45.
  21. Brooke J. SUS - A quick and dirty usability scale. Usability Eval Ind. 1996;189:4–7.
  22. Bangor A, Kortum PT, Miller JT. An Empirical Evaluation of the System Usability Scale. Int J Hum Comput Interact. 2008;24:574–94.
  23. Maciejewski DF, Van Lier PAC, Neumann A, Van Der Giessen D, Branje SJT, Meeus WHJ, et al. The development of adolescent generalized anxiety and depressive symptoms in the context of adolescent mood variability and parent-adolescent negative interactions. J Abnorm Child Psychol. 2014;42:515–26.
  24. Holm S. A simple sequentially rejective multiple test procedure. Scand J Stat. 1979;6:65–70.
  25. IBM Corp. IBM SPSS Statistics for Windows, Version 23.0. 2015.
  26. R Core Team. R: A Language and Environment for Statistical Computing [Internet]. Vienna, Austria: R Foundation for Statistical Computing; 2015. ISBN 3-900051-07-0. Accessed 19 Oct 2016.
  27. Faul F, Erdfelder E, Lang A-G, Buchner A. G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav Res Methods. 2007;39:175–91.
  28. Hirschfeld G, von Brachel R. Multiple-Group confirmatory factor analysis in R–A tutorial in measurement invariance with continuous and ordinal indicators. Pract Assess Res Eval. 2014;19:2.



We would like to thank Pepijn van de Ven, Artur Rocha, Mário Ricardo Henriques, Fernando Cassola and Michel Klein for their collaboration and for building and maintaining the MoodMonitor system. We thank Emiel Tybout, Bep Verkerk and Bianca Lever-van Milligen for field work and data management.


This study is funded by two sources. First, it is part of the European COMPARative Effectiveness research on blended Depression treatment versus treatment-as-usual (E-COMPARED) project, which is funded by the European Commission FP7-Health-2013-Innovation-1 programme, grant agreement number 603098. Second, it receives internal funding from the mental health centre GGZ inGeest, Department of Research and Innovation.

Availability of data and materials

Data gathering was not completed when this manuscript was submitted. After the publication of this study’s main results, the data obtained by this study will become available on request. Requests should be sent to with the topic name MoodMonitor.

Authors’ contributions

HR conceived the study. HR, JHS, WvB, EK and JR designed the study. WvB and JR drafted the manuscript. DDE provided feedback on the manuscript and study design. All authors critically reviewed the manuscript. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Not applicable.

Ethics approval and consent to participate

The medical research ethics committee of the Vrije Universiteit Medical Centre (METc VUmc) judged participant risks and burden to be minimal and granted an exemption from requiring ethics approval in accordance with the Dutch Medical Research Involving Human Subjects Act (Wet Medisch-Wetenschappelijk Onderzoek bij Mensen, WMO), file number 15.333. Therefore, it is permitted that participants provide anonymous electronic informed consent.

Author information

Correspondence to Wouter van Ballegooijen.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.


Cite this article

van Ballegooijen, W., Ruwaard, J., Karyotaki, E. et al. Reactivity to smartphone-based ecological momentary assessment of depressive symptoms (MoodMonitor): protocol of a randomised controlled trial. BMC Psychiatry 16, 359 (2016) doi:10.1186/s12888-016-1065-5



  • Ecological momentary assessment
  • Experience sampling
  • Assessment reactivity
  • Smartphones
  • Mobile health
  • Depression
  • Mood