Time limits and English proficiency tests: Predicting academic performance

English has become the dominant language of business, public life and higher education (Benzie, 2010; Casale & Posel, 2011; Coleman, 2006; Nunan, 2003). Consequently, formal acquisition of English language skills has become essential for success in both higher education and business, and for economic opportunity in a multinational, international economy (Bedenlier & Zawacki-Richter, 2015; Prinsloo & Heugh, 2013). Higher education plays an essential role in enhancing future career prospects within a competitive social and economic framework, making academic success integral for many young people (Coleman, 2006; Cross & Carpentier, 2009; Prinsloo & Heugh, 2013). Although academic success has become essential for entry into the 21st-century economy (Jackson, 2015), academic English language proficiency remains a challenge for the majority of South African students in a linguistically diverse society (Andrade, 2006; Cross & Carpentier, 2009; Murray, 2010; Trenkic & Warmington, 2018).


Introduction
Apart from basic literacy, higher education often requires content-specific skills (linked to cognitive academic language proficiency [CALP]; Cummins, 2000), which rely on technical vocabulary in addition to general contextual identification and understanding (Dalton-Puffer, 2011; Fenton-Smith, Humphreys, & Walkinshaw, 2018; Millin & Millin, 2018). International research has suggested that basic skills are a necessary foundation for developing technical and academic language (Birrell, 2006; Coleman, 2006). Consequently, students who lack English proficiency skills, or who exhibit competency gaps, may be at an academic disadvantage on entering English-medium institutions.
Internationally, English proficiency tests are frequently administered before admission for selection purposes. Although these tests could also be used with first-year students after admission, they are often time-consuming, expensive and focused on overall proficiency rather than on the critical basic skills more relevant to the post-admission phase (Arrigoni & Clark, 2015; Feast, 2002; Goto, Maki, & Kasai, 2010; Murray, 2010). These traditional gate-keeping tests include the International English Language Testing System (IELTS) and the Test of English as a Foreign Language (TOEFL). Using these assessments after admission to identify competency gaps is therefore of limited viability and financial feasibility. For post-admission screening and diagnosis, other instruments, including the Diagnostic English Language Test and the Diagnostic English Language Needs Assessment, have been used globally with good predictive and diagnostic validity (Doe, 2014; Read, 2008). As with pre-admission tests, these focus on vocabulary, speed reading, listening and the interpretation of texts. In both cases, the tests target complex rather than basic skills. By contrast, other research has indicated that briefer tests of basic ability, including Cloze procedure protocols and vocabulary assessments, are time- and cost-effective whilst retaining sufficient psychometric properties (Goto et al., 2010; Sun & Henrichsen, 2010).
Cloze procedure protocols require the reader to insert missing words or phrases, demonstrating semantic and contextual understanding linked to reading comprehension and writing skills (Gellert & Elbro, 2013; Trace, Brown, Janssen, & Kozhevnikova, 2017). Such skills are considered essential in higher education and are particularly vulnerable among second-language English speakers, perhaps because of difficulty in decoding new information and translating key words within specific contexts (Escamilla, 2009; Huettig, 2015; Staub, Grant, Astheimer, & Cohen, 2015). Decoding, recognition and translation to English (in the case of non-native speakers) have been closely related to Cloze procedure protocol performance in children and adults (Gellert & Elbro, 2013; Keenan, Betjemann, & Olson, 2008). These findings suggest that background and foundational learning could play a role in developing essential skills that transfer to the English language requirements of higher education. Similarly, vocabulary acquisition has been linked to success in higher education. Acquired vocabulary has often been used as a proxy for general proficiency, with demonstrated predictive power (Masrai & Milton, 2018; Trenkic & Warmington, 2018). Non-technical vocabulary levels have further been linked to academic writing, reading comprehension and general academic performance (Harrington & Roche, 2014; Qian, 2002; Schmitt, Jiang, & Grabe, 2011; Snow, Lawrence, & White, 2009; Trenkic & Warmington, 2018). These findings support the inclusion of vocabulary components in traditional gate-keeping tests and the use of vocabulary tests as a proxy for proficiency, even after admission among first-year students. For both Cloze and vocabulary tests, the potential reduction in testing time and cost is a significant benefit.
Although research has demonstrated that both Cloze procedure protocols and contextually based vocabulary tests may be used as proxies for English proficiency, these assessments are often conducted under time constraints, potentially confounding content performance with response time (e.g. Goto et al., 2010; Harrington & Roche, 2014; Masrai & Milton, 2018). Time-constrained administration remains common for a variety of reasons but may reduce validity and reliability (Van der Linden, 2011). A test may then lack accuracy for its stated purpose, which is problematic in both selection and post-admission competency identification contexts. A balance between internal consistency, predictive validity, test length and other administration factors is therefore required to identify the status of students' English language skills accurately. The question then arises whether time constraints allow a sufficient balance of time-effectiveness, practicality and predictive validity.
Researchers have reported improvements in performance on various English language tests when additional time is allowed (Bridgeman et al., 2004; Powers & Fowles, 1997), suggesting that performance on complex comprehension may be more important for academic outcomes than time-constrained responding (Daly & Stahmann, 1968; Harrington & Roche, 2014; MacIntyre & Gardner, 1994). Removing time constraints may also mitigate other factors associated with poorer performance, including inadequate test-taking strategies, test anxiety and unfamiliarity with testing contexts (Anderson, 1991; Fairbairn, 2007; Solano-Flores, 2008). Similar findings have been reported in higher education, where increased predictive validity, reliability and construct validity of Cloze procedure protocols and vocabulary tests have been observed when time constraints are removed (Hajebi, Taheri, & Allami, 2018; Snow et al., 2009; Trace et al., 2017).
Researchers have hypothesised that changes in performance under different time constraints may be linked to the number of items attempted, to changes in item structure or to item content functioning differently (Luke & Christianson, 2016; Talento-Miller, Guo, & Han, 2013; Van der Linden, 2011). Other research has suggested that additional time may allow better translation and internal reconstruction of semantics and syntax, although this may hold only for lengthy fragments in Cloze procedure protocols or when a wide range of possible responses is presented (Hajebi et al., 2018; Staub et al., 2015). Although this research has examined Cloze procedure protocols, vocabulary and other English proficiency tests without time constraints, little published work (e.g. Goto et al., 2010) has compared the predictive validity of short assessments under different time constraints.
The present study assessed the relative influence of time limits on two English language proficiency tests, a Cloze procedure protocol and a contextual vocabulary assessment, and compared their predictive validity for first-test academic outcomes under each time limit. The importance of the study lies in distinguishing English proficiency itself from the impact of time constraints on the expression of that proficiency when predicting academic outcomes. The study thus aims to contribute to the understanding of English proficiency testing with respect to the potentially detrimental impact of time limits on test outcomes. The findings may be useful in enhancing mass post-admission language screening, and thereby in improving skills-targeted interventions that are time-efficient and effective.

Participants
Participants comprised commencing first-year students (n = 81) enrolled in a tourism management national diploma course at an institute, with common first-year academic subjects and admission requirements. Restricting enrolment to one course was intended to standardise minimum English language entry criteria indirectly. Most enrolled first-year students at the institute were aged between 18 and 20 years; the vast majority were black, and the group was equally split between males and females.

Research design
The present research used a cross-sectional, quasi-experimental design to assess the impact of different time limits on performance on both a Cloze procedure protocol and a contextual vocabulary assessment.

Instruments
Kaleidoprax (2014) developed the English Literacy Skills Assessment (ELSA) as two modified tests for the institute conducting the study: the Cloze procedure test and the Vocabulary in Context test. At present, no psychometric properties have been made available for the tests (Kaleidoprax, 2014). The Cloze procedure test requires the insertion of missing words within the context of a sentence. It comprises 20 questions, each with four possible responses, of which one is correct (max = 20). The Vocabulary in Context test presents words in the context of a full sentence and requires extrapolation of their meaning in terms of definitions, synonyms, antonyms and usage. It comprises 30 questions, each with four possible responses, of which one is correct (max = 30). No penalty scoring is applied in either test. In this study, academic performance was assessed using the percentage marks for the first test of each first-year subject of the national diploma course in the department of tourism management (min = 1%, max = 100%). All marks obtained were above 0%.
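Because both tests use four-option items without penalty scoring, a respondent guessing at random would still be expected to earn about one quarter of the available marks, which is a useful reference point when reading raw scores. The sketch below is hypothetical and is not the ELSA scoring software: `score_no_penalty` and `expected_chance_score` are illustrative names, and the five-item key and answer sheet are invented for demonstration.

```python
def score_no_penalty(responses, key):
    """Number-correct score: 1 mark per correct answer, 0 for a wrong or blank answer."""
    if len(responses) != len(key):
        raise ValueError("answer sheet and key must have the same length")
    return sum(1 for given, correct in zip(responses, key) if given == correct)


def expected_chance_score(n_items, n_options=4):
    """Expected score from blind guessing on every item when wrong answers cost nothing."""
    return n_items / n_options


# Hypothetical five-item key and answer sheet (not ELSA items); None marks a blank item.
key = ["B", "D", "A", "C", "A"]
sheet = ["B", "A", "A", None, "A"]

print(score_no_penalty(sheet, key))
print(expected_chance_score(20))  # Cloze procedure: 20 four-option items
print(expected_chance_score(30))  # Vocabulary in Context: 30 four-option items
```

With no penalty, blanks and wrong answers are scored identically, so the chance-level expectation is simply the number of items divided by the number of options.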

Procedure
The ELSA data were generated during the English language portion of an assessment battery requested by the academic departments of the institute as part of a post-admission first-year student assessment. The academic departments granted permission to modify the English language portion for research purposes, and all participants gave informed consent. No data were used for exclusionary, probationary or placement purposes.
The full sample (n = 81) was divided into three groups: normal time limit (n = 44), double time limit (n = 23) and no time limit (n = 15). A separate test session was held for each group, and participants were free to join the group of their choice. Participation in the experimental groups was voluntary; verbal informed consent was obtained and confirmed by written signature. Because participation was voluntary, a convenience sample resulted, and it was therefore not possible to control for Grade 12 English performance or group size. Voluntary participation was nonetheless essential, given the personal-development purpose of the testing and the deviation from the normal testing protocol. It was thus not possible to assign students specifically to experimental and control groups whilst retaining the intent of the testing session and respecting participants' autonomy.
Worked examples were administered, and the test methods were explained, including use of the multiple-choice answer sheet, the demands of the assessment and the purpose of the examples in building familiarity and understanding. Participants were informed of the relevant time limit and provided with a clock to monitor time. Completed answer sheets were collected and checked for clarity of response before optical scanning and scoring by a software program. Electronic scores were collated with first-test subject marks from the institute's management information systems. Data were anonymised and stored securely for analysis.

Data analyses
Data analyses were conducted in SPSS® version 25. The three time-limit groups were compared using a one-way analysis of variance (ANOVA) followed by Tukey's Honest Significant Difference (HSD) post hoc test of pairwise mean differences. Pearson's r correlation coefficients and standard linear regression models were used to assess the relationship between test scores and first-test marks; standardised beta weights are reported because the variables were measured on different scales.

Results
The Cloze procedure subtest yielded a maximum score of 20, whilst the Vocabulary in Context subtest was scored out of a possible 30. First-test subject marks were expressed as percentages. Table 1 shows the means (M) and standard deviations (SD) of the variables, with similar levels of dispersion across groups and subjects. Performance on the ELSA tests improved as time constraints were relaxed, but dispersion remained stable despite the differing group sizes. No substantial differences in academic marks were present between the three time-limit groups.

Differences between the time-limit groups
The one-way analysis of variance showed that the three time-limit groups differed significantly on the Cloze procedure subtest (F = 22.156, p < 0.001), and Levene's test confirmed that the assumption of homogeneity of variance was met (F = 0.100, p = 0.905). The group without a time limit scored highest (M = 13.93, SD = 4.30), followed by the double time-limit group (M = 12.22, SD = 4.40) and the normal time-limit group (M = 6.80, SD = 4.08). Tukey's HSD post hoc test showed that the significant differences involved the normal time-limit group, whose scores were significantly lower than those of the double time-limit group (M Difference = 5.442, p < 0.001) and the no time-limit group (M Difference = 7.138, p < 0.001). The no time-limit and double time-limit groups did not differ significantly, despite slightly better performance by the no time-limit group (M Difference = 1.716, p = 0.440). Similar findings were observed for the Vocabulary in Context subtest.
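Because the means, SDs and group sizes are all reported, the omnibus F can be approximately recomputed from summary statistics as a consistency check. The sketch below assumes the group sizes given under Procedure (44, 23 and 15) and uses the standard between-groups and within-groups sums-of-squares decomposition; it is not the SPSS procedure itself, and `anova_from_summary` is an illustrative name. Small discrepancies from the reported F are expected because the published means and SDs are rounded.

```python
def anova_from_summary(groups):
    """One-way ANOVA F from (n, mean, sd) tuples, using the pooled within-group variance."""
    n_total = sum(n for n, _, _ in groups)
    k = len(groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Between-groups sum of squares: weighted squared deviations of group means
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-groups sum of squares: each group's variance times its degrees of freedom
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between, df_within = k - 1, n_total - k
    ms_within = ss_within / df_within
    f_stat = (ss_between / df_between) / ms_within
    return f_stat, df_between, df_within, ms_within


# Cloze procedure summary statistics as reported in the text: (n, M, SD)
cloze = {
    "normal": (44, 6.80, 4.08),
    "double": (23, 12.22, 4.40),
    "none": (15, 13.93, 4.30),
}

F, df_b, df_w, ms_w = anova_from_summary(list(cloze.values()))
print(f"Cloze: F({df_b}, {df_w}) = {F:.2f}")
```

The same function applied to the Vocabulary in Context summary statistics should likewise land close to the reported F = 10.902.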
The no time-limit group performed best on the Vocabulary in Context subtest (M = 12.07, SD = 4.98), the double time-limit group scored slightly lower (M = 11.13, SD = 5.36) and the normal time-limit group scored considerably lower (M = 6.48, SD = 4.61). The one-way analysis of variance showed that the groups differed significantly (F = 10.902, p < 0.001), and the assumption of homogeneity of variance was satisfied (F = 0.666, p = 0.517). Tukey's HSD post hoc test showed statistically significant differences between the normal time-limit group and both the double time-limit group (M Difference = 4.653, p = 0.001) and the no time-limit group (M Difference = 5.589, p = 0.001). The no time-limit and double time-limit groups did not differ significantly (M Difference = 0.936, p = 0.833). On both subtests, therefore, the normal time-limit group scored significantly lower than either extended-time group, suggesting that time limits influenced the measurement of English language skills by these tests. As a result, the time conditions may also have affected the predictive power of each test.
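The pairwise comparisons can be reconstructed in the same way with the Tukey-Kramer form of the HSD statistic, which accommodates unequal group sizes. The sketch below is an illustration under stated assumptions rather than the SPSS output: it assumes the group sizes from Procedure, uses the reported Vocabulary in Context means and SDs, and the function name `tukey_kramer_q` is hypothetical. It computes mean differences and q statistics only; p values would additionally require the studentized range distribution.

```python
import math
from itertools import combinations


def tukey_kramer_q(groups):
    """Pairwise Tukey-Kramer q statistics from summary data {name: (n, mean, sd)}."""
    n_total = sum(n for n, _, _ in groups.values())
    k = len(groups)
    # Pooled within-group mean square, i.e. the ANOVA error term
    ms_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values()) / (n_total - k)
    results = {}
    for a, b in combinations(groups, 2):
        (n1, m1, _), (n2, m2, _) = groups[a], groups[b]
        # Standard error for unequal group sizes (Tukey-Kramer adjustment)
        se = math.sqrt(ms_within / 2 * (1 / n1 + 1 / n2))
        results[(a, b)] = (m1 - m2, abs(m1 - m2) / se)
    return results, n_total - k


# Vocabulary in Context summary statistics as reported in the text: (n, M, SD)
vocab = {
    "none": (15, 12.07, 4.98),
    "double": (23, 11.13, 5.36),
    "normal": (44, 6.48, 4.61),
}

pairs, df_within = tukey_kramer_q(vocab)
for (a, b), (diff, q) in pairs.items():
    print(f"{a} vs {b}: M difference = {diff:.2f}, q = {q:.2f}")
```

To turn each q into a p value, one option is the studentized range survival function with k = 3 groups and the printed within-group degrees of freedom, for example `scipy.stats.studentized_range.sf(q, 3, df_within)` in SciPy 1.7 or later.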

Prediction of first-test subject marks
Pearson's r correlation coefficients were calculated to examine the association between performance on the ELSA tests and performance in the first test of each subject, followed by separate regression models for each group. Table 2 shows the correlation coefficients between scores in each time-limit group and subject performance. Statistically significant positive correlations were present between the Cloze procedure subtest and the subject 'Communications', which had a strong emphasis on English language. Similar coefficients were observed for the normal time-limit group (r = 0.437, p = 0.003) and the double time-limit group (r = 0.473, p = 0.023), and a stronger correlation for the no time-limit group (r = 0.706, p = 0.003). The no time-limit group's scores were also significantly correlated with first-test marks in 'Tourism Development' (r = 0.574, p = 0.025), whilst the normal time-limit group's correlation was weaker but had a smaller p value (r = 0.373, p = 0.013), reflecting that group's larger sample size. The same pattern held for correlations between Cloze procedure scores and 'Travel and Tourism Practice' for the normal time-limit group (r = 0.450, p = 0.002) and the no time-limit group (r = 0.656, p = 0.008). No other statistically significant correlation coefficients were present. The correlational findings tentatively suggest that higher scores on the Cloze procedure test were associated with better performance in 'Communications', 'Tourism Development' and 'Travel and Tourism Practice'. In most cases, the relationship between test scores and academic performance was strongest when no time limit was imposed, although the double time-limit coefficients were frequently similar. Significant positive correlation coefficients were also observed between Vocabulary in Context scores and first-test subject marks, particularly when no time limit was imposed.
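The significance of a Pearson correlation depends on both r and the sample size through t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom, which is why a weaker correlation in the larger normal time-limit group can carry a smaller p value than a stronger one in the smaller no time-limit group. The stdlib-only sketch below assumes the group sizes from Procedure (44 and 15) and obtains the two-sided p value by numerically integrating the Student's t density with Simpson's rule; the helper names are illustrative, and a statistics library would normally be used instead.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on the density over [0, |t|]."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central)


def pearson_r_test(r, n):
    """t statistic and two-sided p value for H0: rho = 0, given r and sample size n."""
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    return t, two_sided_p(t, df)


# 'Tourism Development' vs Cloze procedure, coefficients as reported in the text
for label, r, n in [("no time limit", 0.574, 15), ("normal time limit", 0.373, 44)]:
    t, p = pearson_r_test(r, n)
    print(f"{label}: r = {r}, n = {n}, t = {t:.2f}, p = {p:.3f}")
```

Run on the reported coefficients, the recomputed p values fall close to the published ones, with the normal time-limit group's p smaller despite its lower r.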
Vocabulary in Context was more strongly associated with academic performance than the Cloze procedure.
Correlations between 'Communications' marks and Vocabulary in Context scores were statistically significant for the normal time-limit (r = 0.313, p = 0.038), double time-limit (r = 0.600, p = 0.002) and no time-limit groups (r = 0.634, p = 0.011). The double time-limit group's scores were also significantly correlated with 'Travel and Tourism Practice' marks (r = 0.544, p = 0.007). However, only the no time-limit group's scores were statistically significantly correlated with first-test marks in 'Marketing for Tourism' (r = 0.648, p = 0.009) and 'Tourism Development' (r = 0.708, p = 0.003); this group's scores were also correlated with 'Travel and Tourism Practice' marks (r = 0.590, p = 0.021). For the Cloze procedure subtest, no statistically significant correlations were present with first-test marks in 'Travel and Tourism Management'. For both tests, the no time-limit group showed the strongest associations with first-test performance across the tourism management subjects, particularly 'Communications'.
Regression models were used to compare the predictive power of each test under the different time limits for each subject. Table 3 shows, for the Cloze procedure subtest, the standardised beta weights, significance levels and coefficients of determination (the proportion of variance explained).
When the Cloze procedure was used as a predictor of first-test marks, prediction of 'Communications' marks was strong, but 'Marketing for Tourism' and 'Travel and Tourism Management' marks were not well predicted. For the no time-limit group, a one-SD increase in Cloze procedure score was associated with statistically significant increases (in SD units) in first-test marks for 'Communications' (β = 0.706, p = 0.003), 'Tourism Development' (β = 0.574, p = 0.025) and 'Travel and Tourism Practice' (β = 0.656, p = 0.008). However, a weak inverse relationship was observed for 'Travel and Tourism Management' (β = -0.265, p = 0.013). First-test marks in 'Communications' were also predicted by Cloze procedure scores in the normal time-limit group (β = 0.437, p = 0.003) and the double time-limit group (β = 0.473, p = 0.023). The same was true for 'Travel and Tourism Practice' in the no time-limit (β = 0.656, p = 0.008), double time-limit (β = 0.407, p = 0.054) and normal time-limit groups (β = 0.450, p = 0.002), although the double time-limit coefficient fell just short of conventional significance. For 'Travel and Tourism Practice', all three conditions had similar predictive power. Overall, for the Cloze procedure, removing the time limit produced stronger prediction than doubling it or applying the normal time limit. Similar findings were present for the Vocabulary in Context test, for which the coefficients of determination, standardised regression coefficients and probability values are shown in Table 4.
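Each of these models has a single predictor (one subtest) and a single outcome (one subject mark). In that case the standardised beta weight is mathematically identical to Pearson's r, which is why the β values reported here match the correlation coefficients reported earlier. The sketch below demonstrates the identity on a small invented data set (not study data): it z-scores both variables, fits an ordinary least-squares slope, and compares that slope with r. The helper names are illustrative.

```python
import math
from statistics import mean, stdev


def zscores(values):
    """Standardise values to mean 0 and sample SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(x, y):
    """Least-squares slope of y on x in a simple regression with an intercept."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return num / sum((a - mx) ** 2 for a in x)


# Hypothetical subtest scores (out of 20) and first-test marks (%), for illustration only
subtest = [6, 9, 11, 12, 14, 15, 17, 18]
marks = [48, 55, 52, 63, 60, 71, 68, 80]

beta_std = ols_slope(zscores(subtest), zscores(marks))  # standardised beta
r = pearson_r(subtest, marks)
print(f"standardised beta = {beta_std:.4f}, r = {r:.4f}")
```

The identity no longer holds once a second predictor is added, because each standardised beta is then adjusted for the other predictors.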
The Vocabulary in Context scores had statistically significant regression coefficients for 'Communications' in the normal time-limit group (β = 0.313, p = 0.038), double time-limit group (β = 0.600, p = 0.002) and no time-limit group (β = 0.634, p = 0.011). For 'Travel and Tourism Practice', higher subtest scores were associated with substantially higher first-test marks in both the double time-limit (β = 0.544, p = 0.007) and the no time-limit groups (β = 0.590, p = 0.021). However, the no time-limit group proved to be the strongest predictor.

Both ELSA tests showed predictive power for the majority of the first-year tourism management subjects, based on statistically significant correlation coefficients and regression models, with the no time-limit condition exhibiting the strongest predictive power. In that condition, between approximately 33% and 50% of the variance in first-test subject marks was explained by English language proficiency as measured by each of the two ELSA tests. Although its scores did not differ significantly from those of the no time-limit group, the double time-limit group did not show the same predictive relationships, potentially because of a truncated range of scores. 'Travel and Tourism Management', however, was not sufficiently associated with scores on either ELSA test in terms of correlation or prediction.
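With one predictor, the coefficient of determination is simply the square of the standardised beta (equivalently, of r), so the stated range of roughly 33% to 50% of variance explained can be checked directly from the coefficients in the text. The calculation below uses the significant no time-limit coefficients reported above for both subtests; the dictionary layout and names are illustrative only.

```python
# Significant no time-limit coefficients reported in the text (single-predictor models,
# so each value is both Pearson's r and the standardised beta)
no_limit_betas = {
    ("Cloze", "Communications"): 0.706,
    ("Cloze", "Tourism Development"): 0.574,
    ("Cloze", "Travel and Tourism Practice"): 0.656,
    ("Vocabulary", "Communications"): 0.634,
    ("Vocabulary", "Marketing for Tourism"): 0.648,
    ("Vocabulary", "Tourism Development"): 0.708,
    ("Vocabulary", "Travel and Tourism Practice"): 0.590,
}

# With one predictor, R^2 equals the squared standardised beta
r_squared = {key: beta ** 2 for key, beta in no_limit_betas.items()}

for (test, subject), value in sorted(r_squared.items(), key=lambda kv: kv[1]):
    print(f"{test:<10} -> {subject:<28} R^2 = {value:.3f}")
print(f"range: {min(r_squared.values()):.1%} to {max(r_squared.values()):.1%}")
```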

Discussion
The findings indicated that performance on both ELSA tests improved as time limits were relaxed. Increased time limits produced statistically significant improvements in performance, whilst the SDs remained stable, indicating that a consistent dispersion of scores was retained. The findings also indicated improved predictive quality of test outcomes when time limits were removed, despite the inherent limitations of comparing groups of differing sizes (Rosenthal & Rosnow, 2008). Similar improvements in English test outcomes were reported by Hajebi et al. (2018) and Snow et al. (2009). In this regard, Harrington and Roche (2014) and Van der Linden (2011) also suggested that such improvements could reflect more accurate assessment of English language constructs, rather than the ability to perform under time constraints. This disparity could partly reflect the long-held notion that time constraints influence the number of items responded to and the internal reliability of English proficiency tests themselves (Evans & Reilly, 1972).
Similar studies have suggested that time constraints could reduce the reliability and validity of psychometric and language tests across a wide variety of constructs (Lu & Sireci, 2007), resulting in a lack of equivalence across instruments (Cronbach & Warrington, 1951). Additionally, English language ability may be represented in a biased way if the number of responses falls below certain thresholds, or if item functions are not readjusted (e.g. Van der Linden, 2011). The present finding of improved performance without time constraints cannot necessarily be equated with changes in reliability or validity per se, because item response functions were not measured, even though studies such as that by Harrington and Roche (2014) focused on similar assessment types. Nonetheless, Talento-Miller et al. (2013) also suggested that increasing the number of items attempted influenced the outcome of English language tests because of the varying difficulty and types of items, rather than because of processing speed. The evidence suggests that internal test-structure issues under time-constrained conditions are influential, and the present findings are consistent with time constraints having negatively affected performance on both ELSA tests for this cohort. Although other research has explored the reliability issues surrounding time limits on English tests, the reviewed literature has not extensively explored the relative impact of differing time limits on predictive validity in higher education. The regression analyses in the present research provided evidence of a predictive component for the two ELSA tests, which strengthened when time limits were extended or removed.
Academic performance was positively and significantly correlated with performance on both ELSA tests in the double time-limit and no time-limit groups, whilst the normal time-limit group demonstrated limited predictive power. Interestingly, predictive performance was similar for the double time-limit and no time-limit groups in most cases. This finding suggests that item response thresholds, such as those discussed by Van der Linden (2011) and Talento-Miller et al. (2013), could be important for predictive power as well as for internal consistency and reliability of measurement. First-test academic performance in the present study could therefore have been at least partly a function of English language ability, as measured by the ELSA tests. The Cloze procedure protocol, one of the two ELSA tests, draws on several English language performance actions. However, in the present research, non-technical vocabulary levels measured in context were the better predictors of academic performance. Non-technical vocabulary levels have been used successfully as predictors in higher education institutions (HEIs), as well as a proxy for general English proficiency and Cloze procedures (Daller & Wang, 2014; Masrai & Milton, 2018; Qian, 2002; Schmitt et al., 2011; Snow et al., 2009; Trenkic & Warmington, 2018). The present findings suggest that vocabulary levels were more important in accurately predicting academic success than the Cloze procedure test, which required semantic manipulation and decision-making within the context of a passage. However, vocabulary ability could be subsumed into a variety of English functions required for performance in HEIs, such as lecture participation and the development of text understanding and technical vocabulary.
Vocabulary may be linked to other aspects of English language performance relevant to higher education, including deliberate performance and response selection (Macalister, 2010), improved heuristic learning of phrases and lexical translation (Koehn, Och, & Marcu, 2003), speed of translation and decoding within a system of finite memory capacity (Sakurai, 2015), and metacognitive focus on syntactic awareness, together with reformulation between languages to aid understanding (Jiménez et al., 2015). Some of these factors, which are known to be influenced by time constraints, relate directly to essential skills measured in vocabulary and Cloze procedure tests, including semantic representation, understanding of words in context, reading speed and quality, and the ability to manipulate syntactic arguments.
The present research, using two ELSA tests, confirmed reported findings that English proficiency tests encompassing vocabulary, grammar and contextual representation are affected by time limits (e.g. Bridgeman et al., 2004; Murray, 2010). Such content-specific skills may need to be measured outside what is generally considered the normal, time-constrained and psychometrically focused framework. Furthermore, development of content- or subject-specific technical language could also play a role in academic outcomes, particularly if basic skills have not been fully developed as a foundation (Birrell, 2006). The present findings also suggest that time limits play an important role in performance and predictive validity, alongside the choice of test for predictive purposes. Removing time limits resulted in more accurate prediction of academic outcomes, and the Vocabulary in Context test yielded the strongest predictive power. These findings suggest that appropriate English proficiency assessment could hinge more on identifying specific academic weaknesses in English language, whilst reducing the role of time limits as an essential factor in predicting performance. Although various findings suggest that time constraints affect many aspects of English proficiency tests, it is unlikely, from a practical perspective, that administering lengthy tests without time limits would be feasible in real-world settings. Nonetheless, studies suggest that time constraints could alter the psychometric properties of tests in a variety of ways.
Despite these findings, the study had limitations that introduce some uncertainty into the interpretation of the results. The unequal group sizes that resulted from voluntary participation may have distorted values obtained with parametric statistics (Rosenthal & Rosnow, 2008). Similarly, the small groups and lack of randomisation could have affected the statistical outcomes. An example could be the negative correlations seen for 'Travel and Tourism Management', although alternative explanations, such as subject content, could also account for this anomaly. Nonetheless, given the unequal score ranges of the different variables, Pearson's r and linear regression remained the most suitable, albeit imperfect, choices. In addition, it was not possible to fully standardise pre-entry (Grade 12) English language performance. This criterion was therefore only passively standardised, as a minimum level, by restricting the sample to a specific qualification grouping of students. Pre-entry English ability could have affected performance on either English proficiency test, introducing bias into the results, or influenced participants' choice of group if they sought to maximise their performance. Nonetheless, present language ability, regardless of prior ability, is considered the most important factor in interpreting the findings, because the intention was to predict academic performance rather than to investigate the validity of the assessments in question. Furthermore, the results appear to indicate that time limits imposed on English proficiency tests are important in applying the concept of language proficiency fully to higher education outcomes.

Conclusion
The present findings demonstrate that performance on, and the predictive power of, the modified ELSA versions of the Cloze procedure and Vocabulary in Context tests improved when time limits were increased or removed. The findings imply that factors such as item completion thresholds, reading speed, semantic understanding and translation for decision-making could contribute to poorer performance under time-constrained conditions. Students may therefore possess some of the English language skills associated with academic performance but be unable to demonstrate them within the imposed time constraints. Although useful, these findings should be treated with caution, because internal reliability and predictive validity data for the tests are not currently available and this pilot study was conducted with small, unequal groups. Nonetheless, it appears that English proficiency as measured by the ELSA could be inaccurately reflected under time-constrained conditions, limiting the test's ability to predict academic performance in tertiary education. Further investigation is required to develop English interventions that adequately target competency gaps, and future research should consider larger-scale studies to identify the specific test components that contribute to academic success in South African HEIs.