Assessing psychological well-being measures among South African adults in the Birth to Twenty Plus cohort

2 months apart. Overall, the measures of PWB were characterised as having unidimensional factor structures, good model fit indices, and high internal consistency and reliability. This study demonstrated that the PWB measures evaluated here are psychometrically sound and suitable for use in the South African context.

& Bartels, 2018). Subjective well-being can be operationalised with constructs that measure affect as well as those that cover cognitive aspects, for example the Harmony in Life Scale (Nima, Cloninger, Persson, Sikström, & Garcia, 2020). Psychological well-being refers to measures that assess efficacious or non-efficacious functioning at inter- and intra-individual levels, and is operationalised through constructs such as personal growth, purpose in life and self-acceptance (Ryff, 2014).
In this article, we evaluate specific domains of the National Institutes of Health (NIH) Toolbox Emotion Battery (Salsman et al., 2013), namely hope, faith, social support, general self-efficacy and life satisfaction, which are measures of PWB. These PWB scales were selected because they are widely used in South African public health and community research contexts (Brinker & Cheruvu, 2017; Pacico, Bastianello, Zanon, & Hutz, 2013; Van Zyl & Dhurup, 2018). It is important to re-evaluate the psychometric properties of these validated scales, especially in a local context, to see how a particular measurement theory is reflected in local empirical data (Flora & Flake, 2017). There is a paucity of psychometric data on PWB scales in South Africa, which makes it difficult to tell whether the scales are measuring the latent constructs as originally designed.
Hope is considered a psychological strength used to ensure that goals are attained through planning, overcoming behavioural or physical health issues, and dealing with any unintended outcomes from stressful life events (Pacico et al., 2013; Savahl, Casas, & Adams, 2016). Faith has been defined as how an individual understands their 'ultimate reality' (Fowler, 1981) by putting confidence in a higher power or being pious (Bai, Lazenby, Jeon, Dixon, & McCorkle, 2015). The faith construct has been shown to have positive associations with physical and mental health as well as with other measures such as coping and self-esteem (Abdel-Khalek & Tekke, 2019). Social support has been shown to buffer adverse life events through the action of others and the belief of support, which leads to an appraisal of life situations as non-threatening. Social support is widely incorporated into interventions and used to explain behaviour change (Cohen, 2004). Many types of social support were evaluated and shown to be consistent, for example in relationships and risky behaviours, and in promoting physical activity (Brinker & Cheruvu, 2017; Cohen, 2004; Ory et al., 2018; Simoni, Frick, & Huang, 2006; Wright, 2016). Self-efficacy is the belief that one can accomplish tasks and goals in unpredictable circumstances. Efficacious individuals welcome challenging tasks as motivating factors, while inefficacious individuals dwell on their weaknesses (Bandura, 1986; Mpondo et al., 2015). Self-efficacy has been used extensively in health promotion studies and interventions (Dennis, Brennenstuhl, & Abbass-Dick, 2018; Ory et al., 2018). General life satisfaction is an individual's judgement of the consonance of their living conditions and standards without comparing themselves to others (Diener, Emmons, Larsen, & Griffin, 1985). According to Veenhoven (1993, p. 213), 'general life satisfaction is the degree to which a person evaluates their life'.
Recent studies have looked at general life satisfaction in association with self-rated health and social capital constructs (Gigantesco et al., 2019;Maass, Kloeckner, Lindstrøm, & Lillefjell, 2016).
The objective of this study was to assess the psychometric properties of PWB measures in the context of urban South Africa. We conducted exploratory factor analyses (EFAs) to evaluate structure patterns, and confirmatory factor analysis (CFA) to obtain fit indices. We checked internal consistency using Cronbach's alpha and scale validity by calculating correlations between all the scales, and we also assessed test-retest reliability, including intraclass correlations (ICCs).

Sampling
The Birth to Twenty Plus (BT20+) cohort was established to observe the growth, development and health of children and adolescents in an urban setting following the democratic transition in the Republic of South Africa. The cohort enrolled 3273 singleton babies from Soweto and Johannesburg, South Africa, who were born between 23 April and 8 June 1990 and who continued to live in the area for the first 6 months of life. Since birth, information on socioeconomic, family and personal factors influencing physical and psychological health and well-being has been collected 21 times. This article uses data collected when cohort members were 28 years old. A detailed description of the study and its cohort is published elsewhere (Richter, Norris, Pettifor, Yach, & Cameron, 2007). Data used here were collected between June 2018 and June 2019 from 1327 individuals who had data on all the measures.
We collected test-retest reliability data from a subset of cohort participants (n = 43), who were seen at three time points (T1, T2 and T3). The average T1-T2 interval was 57 days, T2-T3 was 14 days, and T1-T3 was 191 days. Participants completed the same questionnaires and were seen by the same assessor at each time point.

Measures
All measures come from the NIH Toolbox Emotion Battery, which identified and developed measures suitable for use in epidemiological research across different ethnicities and cultures in high-income countries (Salsman et al., 2013).
Hope was measured using items from the WHO Quality of Life (WHOQOL) assessment (Group, 1998). The scale comprises four Likert scale items with answer options ranging from 1 (not at all) to 5 (extremely). These items were shown to have good psychometric properties, that is, a coefficient alpha of 0.74 in the original WHOQOL study (Group, 1998) under the psychological facet-spirituality domain.
Faith was also measured using the WHOQOL assessment (Group, 1998). The scale comprises four Likert scale items with answer options ranging from 1 (not at all) to 5 (extremely). This measure had good psychometric properties, that is, a coefficient alpha of 0.74 in the WHOQOL validation study (Group, 1998) under the psychological facet-spirituality domain.
Social support was measured using the NIH Toolbox Social Support questionnaire (Salsman et al., 2013). This scale comprises eight self-report items with response options ranging from 1 (never) to 5 (always). These items have been shown to have a good model fit (i.e., CFI = 0.99; root mean square error of approximation [RMSEA] = 0.112) and excellent psychometric properties (i.e., coefficient alpha = 0.96) in the NIH Toolbox validation study (Salsman et al., 2013).
Life satisfaction was measured using the NIH Toolbox Life Satisfaction Scale (Salsman et al., 2013), which comprises five Likert scale items with response options ranging from 1 (strongly disagree) to 5 (strongly agree). The psychometric properties of the scale in the NIH Toolbox validation study were good (i.e., coefficient alphas of 0.79-0.89; Salsman et al., 2013).
Self-efficacy was measured using the NIH Toolbox General Self-Efficacy Scale, which comprises nine items with response options ranging from 1 (never) to 5 (very often). This scale has been shown to have excellent psychometric properties (i.e., coefficient alpha = 0.93; CFI = 0.99 and RMSEA = 0.73; Salsman et al., 2013).

Analysis
A total sample of 1327 participants was used to conduct EFAs to evaluate the factor structure patterns of the hope, faith, social support, general life satisfaction and self-efficacy measures. We used the Kaiser-Meyer-Olkin (KMO) test for sampling adequacy: KMO values between 0.8 and 1 indicate sampling adequacy, values < 0.6 indicate inadequacy of the sample, and KMO values close to zero indicate widespread correlation. To understand the structure of variable clusters and identify latent variables we used the principal factor (pf) estimation technique. We also used the estat anti command to check for variables that were correlating too highly. We chose oblique oblimin rotation to obtain the simplest factor structure. To decide how many factors to extract, we used Kaiser's criterion (eigenvalues > 1) and inspected the scree plots. Items with loadings of 0.30 or higher were considered components of one domain; at least three items needed to load onto a domain for it to be considered a valid factor. To obtain fit indices we conducted CFA using maximum likelihood (ML) estimation with default bootstrap settings. The fit indices calculated were: chi-square (χ²), the chi-square/degrees of freedom ratio (χ²/df), the comparative fit index (CFI; Hu & Bentler, 1999), the Tucker-Lewis index (TLI; Hu & Bentler, 1999), the RMSEA (Steiger, 1990), and the standardised root mean square residual (SRMR; Hu & Bentler, 1999). Best practice guidelines suggest that χ²/df should be less than 5 and SRMR should be close to zero. An RMSEA < 0.05 indicates a good fit, a value < 0.08 indicates a reasonable fit, and values exceeding that indicate a mediocre or poor fit (Byrne, 2010). For a good fit, the CFI and TLI are recommended to be ≥ 0.90 (Byrne, 2010; Hu & Bentler, 1999). Internal consistency and reliability were determined using Cronbach's alpha (α). To determine scale validity, we used Pearson's correlation matrix. Stata version 14 was used for analysis (StataCorp, 2015).
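The fit-index guidelines quoted above can be written down as a small checker. The following Python sketch is illustrative only: the analysis itself was run in Stata, the class and function names are hypothetical, and the example values are invented rather than taken from this study. Because the text gives no numeric cut-off for "close to zero", SRMR is reported without being classified.

```python
from dataclasses import dataclass


@dataclass
class FitIndices:
    """Fit indices from a fitted CFA model (values supplied by the analyst)."""
    chi2: float
    df: int
    cfi: float
    tli: float
    rmsea: float
    srmr: float


def rmsea_label(rmsea: float) -> str:
    """Label RMSEA using the cut-offs quoted in the text (Byrne, 2010)."""
    if rmsea < 0.05:
        return "good"
    if rmsea < 0.08:
        return "reasonable"
    return "mediocre or poor"


def evaluate_fit(fit: FitIndices) -> dict:
    """Check each index against the guideline stated in the Analysis section."""
    ratio = fit.chi2 / fit.df
    return {
        "chi2_df_ratio": round(ratio, 2),
        "chi2_df_ok": ratio < 5,      # chi-square/df should be below 5
        "cfi_ok": fit.cfi >= 0.90,    # CFI of at least 0.90
        "tli_ok": fit.tli >= 0.90,    # TLI of at least 0.90
        "rmsea": rmsea_label(fit.rmsea),
        # No numeric cut-off is given for SRMR, so it is passed through as-is.
        "srmr": fit.srmr,
    }


# Hypothetical values for illustration only, not results from this study.
example = FitIndices(chi2=12.4, df=5, cfi=0.97, tli=0.95, rmsea=0.06, srmr=0.03)
report = evaluate_fit(example)
print(report)
```

Keeping the thresholds in one function makes it easy to apply the same criteria to each scale's unadjusted and adjusted models.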
For the test-retest reliability, we evaluated practice effects using t-tests and effect sizes. Cohen's d was used to determine the magnitude of the practice effects: 0.2 is interpreted as a small effect, 0.5 as a moderate effect and 0.8 as a large effect (Cohen, 2004). We also used ICCs to determine test-retest reliability. Intraclass correlation coefficients were interpreted as poor (< 0.50), moderate (0.50-0.74), good (0.75-0.90) or excellent (> 0.90) test-retest reliability (Koo & Li, 2016).
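As a rough illustration of how these benchmarks might be applied, the Python sketch below computes a standardised mean difference between two time points and maps effect sizes and ICC values to the labels given above. Several details are assumptions rather than the authors' procedure: the text does not state which variant of Cohen's d was used, so the pooled standard deviation of the two time points is taken as the standardiser; the "negligible" label for |d| < 0.2 is added only for completeness; and the scores are invented.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def cohens_d(first: Sequence[float], second: Sequence[float]) -> float:
    """Standardised mean change from the first to the second time point.

    Assumption: the standardiser is the pooled SD of the two time points.
    """
    pooled_sd = sqrt((stdev(first) ** 2 + stdev(second) ** 2) / 2)
    return (mean(second) - mean(first)) / pooled_sd


def effect_label(d: float) -> str:
    """Map |d| to the benchmarks quoted in the text (0.2, 0.5, 0.8)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"  # assumption: the text names no band below 0.2


def icc_label(icc: float) -> str:
    """Map an ICC to the Koo and Li (2016) bands quoted in the text."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


# Hypothetical summed scores for five participants at two time points.
t1_scores = [14, 15, 13, 16, 12]
t2_scores = [15, 16, 14, 17, 13]

d = cohens_d(t1_scores, t2_scores)
print(round(d, 3), effect_label(d))
print(icc_label(0.62), icc_label(0.81))
```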

Ethical considerations
The Human Research Ethics Committee of the University of the Witwatersrand (South Africa) granted ethical clearance for this study (reference number: M180225), and the study was conducted in line with the principles of the Declaration of Helsinki for research involving human subjects. Participants provided written informed consent.

Results
A total of 1327 participants were interviewed, and about 99% of those had complete data for the variables of interest. Of the participants, 639 (48%) were male and 698 (52%) were female. Results of normality tests are presented in Table 1, and item means for the PWB measures stratified by sex are presented in Table 2.

Factor analysis
All measures had KMO test values between 0.8 and 1 and were thus suitable for further factor analysis. The scree plots showed that all latent variables converged into a single higher-order factor (eigenvalues > 1). Table 3 displays the exploratory factor analysis (EFA) results. Factors were regarded as stable if at least three items had significant loadings; this was the case for all measures.
The CFA results are presented in Table 4. For the Hope scale, the unadjusted model fit was poor, that is, RMSEA = 0.13, CFI = 0.60 and TLI = 0.82. We identified items that may have been ambiguous or may have had an unclear meaning to the participant and lower factor loadings that

Scale consistency and validity
The means and standard deviations of the summed scores of all the measures are presented in Table 4. The individual scales for Hope, Faith, Social Support, General Life Satisfaction and Self-Efficacy produced high internal consistencies (α's). General Life Satisfaction and Hope showed α's > 0.70; Faith, Social Support and Self-Efficacy showed α's > 0.80 (Figure 1).
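For readers less familiar with the statistic, Cronbach's alpha for a scale with k items is k/(k - 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The Python sketch below is a minimal illustration of that formula, not the Stata routine used in this study; the function name and the response matrix (six respondents, four items scored 1 to 5) are hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a participants-by-items response matrix.

    Each row holds one participant's item scores; every row must have
    the same number of items (at least two).
    """
    k = len(responses[0])
    items = list(zip(*responses))  # transpose: one tuple per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: six participants, four Likert items (1 to 5).
responses = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 4, 5],
    [2, 2, 3, 2],
    [4, 4, 4, 3],
    [3, 4, 3, 3],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Sample variances (n - 1 denominator) are used for both the items and the total score, so the ratio is unaffected by the choice of denominator as long as it is applied consistently.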
The Pearson correlation coefficients are presented in Table 5. Most correlations were significant, positive and of medium magnitude; Faith vs. Hope and Self-Efficacy vs. Hope were strongly correlated.

Test-retest
We assessed participants at three time points: T1 and T2, each with 43 participants, and T3, with 30 participants (see Table 6 for means and SDs at each time point). Between T1 and T2, Self-Efficacy showed significant practice effects of small magnitude, and Hope had moderate, non-significant practice effects. All other practice effects between T1 and T2 were small and non-significant. Between T2 and T3, General Life Satisfaction had small, significant effects, Self-Efficacy had large, non-significant effects, and all other measures had small, non-significant effects.

Discussion
This article aimed to evaluate the psychometric properties of PWB measures (Hope, Faith, Social Support, Self-Efficacy and General Life Satisfaction) in a sample of young adult urban South Africans. The factor structures for all the measures were unidimensional, similar to those reported in other studies (De Maria, Vellone, Durante, Biagioli, & Matarese, 2018; Hinz et al., 2018; Nel & Boshoff, 2014; Salsman et al., 2013). We removed some items in our CFA to improve fit indices (for Hope, Faith and Social Support). This suggests that the language of the removed items needs to be re-evaluated to ensure acceptability to local understandings. The correlations allowed comparison of the magnitude of associations between the measures; Faith, Hope, Self-Efficacy and General Life Satisfaction were shown to be valid, as confirmed by good Cronbach's alphas (Westen & Rosenthal, 2003). This result suggests that future studies can potentially assess these measures together. The test-retest results showed small practice effects for Self-Efficacy and General Life Satisfaction. This was to some extent expected, given the relatively short intervals between assessments and participants' growing familiarity with the measures. Intraclass correlations were moderate at T1-T2 for all the measures; for Faith the ICC was good, as was that for Social Support at T2-T3, implying that only small variations originated from the instruments or the circumstances under which measurements were taken. This suggests that the measures were reliable for application in the South African context (De Vet, Terwee, Knol, & Bouter, 2006; Koo & Li, 2016).
Participants reported moderate to high levels of Hope, Faith, Social Support and General Life Satisfaction, and low to moderate levels of Self-Efficacy. Because these measures have been shown to have a buffering effect against mental health disorders, and to enhance one's reserves of social cognitive and problem-solving capabilities, they can be targeted in mental health promotion interventions (Nyqvist, Forsman, Giuntoli, & Cattan, 2013). The interventions can be delivered in various ways: by community healthcare workers using a community-based model, or through digital technology (e.g. zero-rated platforms on cellular phones). The interventions could teach people how to cultivate positive feelings, cognitive flexibility, self-compassion, hope and optimism, while providing and using support resources intentionally.
Another study conducted on coping in the Soweto population showed that religious activity (i.e. gathering for prayer in a group or praying) was perceived to be a good source of resilience and coping (Kim, Kaiser, Bosire, Shahbazian, & Mendenhall, 2019). This is a pre-existing psychosocial resource that can be incorporated into interventions, not to endorse religion per se, in the organisational sense, but to use some of the tenets embodied therein, such as altruism, forgiveness, gratitude and social support, as tools to buffer against mental health issues (Sharma & Singh, 2019).
One limitation of this study pertains to the generalisability of some of the measures (Hope, Faith and Social Support), because some items were removed to improve fit indices and indeed reliability. It may therefore be difficult to compare our findings with those of other validation studies. However, the removal of the items was in line with the purpose of testing the psychometric properties of a scale in a local context. Removing items is warranted when those items have weak loadings or are ambiguous in terms of how a participant interacts with them (e.g. obscure, sophisticated or complex vocabulary). Literature shows that the removal of items from a scale does not compromise the reliability of that scale (McCrae, Kurtz, Yamagata, & Terracciano, 2011). Another limitation is the small sample size used for the test-retest analysis, which might also affect the generalisability of the results. Because of time constraints we could not collect repeat measures for the PWB scales from a bigger sample.

Conclusion
In conclusion, all PWB measures were shown to have high internal consistency, validity and reliability when used within an urban and multicultural context. This is a strength and points to the usefulness of the tools for assessing whether individuals are languishing or thriving. Therefore, the measures are relevant for community and/or research settings and can be administered by trained non-clinical assessors.