Speak - Quick. Accurate. Innovative.

The Research

Valid and reliable assessments

All AI-powered test scores are grounded in extensive research, providing a reliable basis for evaluating the test-taker's language proficiency.

Contact us »

Rooted in sound research

Our assessment evaluates a test-taker’s English language skills in real-life situations, especially in professional and academic contexts where English communication is essential.

To achieve this, the AI-powered test assesses the accuracy, variety, and clarity of spoken and written English, covering both linguistic and some pragmatic aspects.

Our adaptive listening and reading tests are based on authentic audio and text passages.

The test is structured so that its scores can be used to evaluate the test-taker's language proficiency. All score interpretations, as well as the tests themselves, are built on a strong theoretical foundation.

External Validation Study - Higher Education

Background

The Speaknow assessment is a computer-based test of spoken English. One of the primary uses of the exam is for higher education admissions or placement into English language programs. As such, it is necessary to evaluate how well the exam provides information for making decisions about the English levels of candidates for higher education. The ability of an exam to measure what it purports to measure is known as validity, and there are many methods for establishing it. This paper examines the construct validity of the Speaknow assessment for use in higher education English programs.

There are two main methods of making admissions and placement decisions: standardized testing and in-house testing. Standardized testing has the advantage of providing scores on a recognized scale and of being comparable across other, similar programs. External, standardized exams are also more cost effective for the program, because students are generally the ones who pay for them. In-house exams have the advantage of being customizable to the needs of the program. They also benefit students by shifting the cost of testing onto the program, rather than requiring often costly external exams (Ling et al., 2014).

Personal interviews are often a part of English placement and admissions tests, such as the IELTS, and are also a frequent component of in-house tests. As such, interviews are a logical means of testing the validity of a standardized exam for making placement decisions.

The question this study explored was how well the scores of the Speaknow Assessment correspond to the scores of face-to-face interviewers for placement in English programs.

Method

Fifty-one prospective students at a teachers’ college in Israel volunteered to participate in the study. Each student was interviewed by two different faculty members, who scored the student’s speech using CEFR-aligned rubrics. Most, but not all, of the faculty members had experience using the CEFR. Although each student was interviewed by two faculty members, the pair of interviewers was not the same for every student.

After the interviews, the students took the Speaknow Assessment of speaking and listening. The college compiled the exam and interview scores and provided them to Speaknow with identifying information removed.

Results

The agreement of the Speaknow scores with those of the college interviewers is reported below. Both Light’s kappa and intraclass correlation coefficient (ICC) scores are reported. Overall, the agreement between the Speaknow CEFR scores and each college rater’s scores (0.923 and 0.931) is closer than the agreement between the two college raters themselves (0.882).
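Light’s kappa generalizes Cohen’s kappa to more than two raters by averaging the unweighted Cohen’s kappa over all rater pairs; for exactly two raters it reduces to Cohen’s kappa. A minimal pure-Python sketch (the CEFR labels below are fabricated for illustration, not study data):

```python
from itertools import combinations
from collections import Counter

def cohens_kappa(a, b):
    """Unweighted Cohen's kappa for two sequences of categorical labels."""
    n = len(a)
    labels = sorted(set(a) | set(b))
    observed = sum(x == y for x, y in zip(a, b)) / n          # raw agreement
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[l] * cb[l] for l in labels) / n**2       # chance agreement
    return (observed - expected) / (1 - expected)

def lights_kappa(ratings):
    """Light's kappa: mean pairwise Cohen's kappa across all raters."""
    pairs = list(combinations(ratings, 2))
    return sum(cohens_kappa(a, b) for a, b in pairs) / len(pairs)

# Illustrative CEFR scores from two raters
r1 = ["B1", "B2", "B2", "C1", "A2", "B2"]
r2 = ["B1", "B2", "C1", "C1", "A2", "B1"]
print(round(lights_kappa([r1, r2]), 3))  # → 0.571
```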

Cohen’s weighted kappa scores are also reported; these take into account the size of the difference between the scores. By this measure, the agreement of the Speaknow Assessment with each of the raters (0.826, 0.825) is stronger than the agreement between the raters themselves (0.713).
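Weighted kappa penalizes disagreements in proportion to their distance on the ordinal CEFR scale, so a B1-vs-C1 disagreement counts more heavily than B1-vs-B2 (the "Weights: equal" in the tables below corresponds to this equal-spacing, i.e. linear, weighting). A sketch using scikit-learn, with CEFR levels encoded as integers and fabricated scores:

```python
from sklearn.metrics import cohen_kappa_score

# CEFR levels encoded ordinally: A1=0, A2=1, B1=2, B2=3, C1=4, C2=5
machine = [3, 3, 4, 2, 1, 3, 4, 2]  # illustrative machine scores
human   = [3, 2, 4, 2, 1, 4, 4, 3]  # illustrative interviewer scores

unweighted = cohen_kappa_score(machine, human)
# 'linear' weights score each disagreement by |level difference|
weighted = cohen_kappa_score(machine, human, weights="linear")
print(round(unweighted, 3), round(weighted, 3))  # → 0.489 0.667
```

Because all the disagreements in this toy data are only one level apart, the weighted kappa comes out higher than the unweighted one, mirroring the pattern in the results below.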

Conclusions cannot be drawn about the skill of any individual rater because, as stated above, the interviewer pairs varied. Overall, of the 51 exams scored, the Speaknow score agreed with both raters in 27 cases (53%) and with exactly one of the raters in 22 cases (43%). In total, the Speaknow CEFR score agreed exactly with at least one of the interviewers in 96% of the cases.
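The exact-agreement tally above reduces to a simple count over the triples of (machine, rater 1, rater 2) scores. A sketch with fabricated labels:

```python
def agreement_breakdown(machine, rater1, rater2):
    """Count cases where the machine score matches both, exactly one,
    or neither of the two human raters."""
    both = sum(m == a == b for m, a, b in zip(machine, rater1, rater2))
    # XOR: exactly one of the two comparisons is a match
    one = sum((m == a) != (m == b) for m, a, b in zip(machine, rater1, rater2))
    neither = len(machine) - both - one
    return both, one, neither

# Illustrative CEFR labels (not study data)
m  = ["B2", "B1", "C1", "B2", "A2"]
r1 = ["B2", "B1", "B2", "B2", "B1"]
r2 = ["B2", "B2", "C1", "B2", "B1"]
print(agreement_breakdown(m, r1, r2))  # → (2, 2, 1)
```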

Speaknow CEFR Score and 1st College Rater

Interrater Reliability (Light's Kappa)

Subjects 51
Raters 2
Kappa 0.681
z 9.67
p-value 0.00

Interrater Reliability (Cohen's Weighted Kappa)

Method Cohen's Kappa for 2 Raters (Weights: equal)
Subjects 51
Raters 2
Agreement % 74.5
Kappa 0.826
z 9.13
p-value <.001

Intraclass correlation coefficient

Subjects 51
Raters 2
Subject variance 1.87
Rater variance 0.0251
Residual variance 0.132
Consistency 0.934
Agreement 0.923
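The consistency and agreement columns follow directly from the variance components in the ICC tables: consistency excludes rater variance from the denominator, while agreement includes it. A sketch using the values from the first table above:

```python
def icc_from_variances(subject, rater, residual):
    """Consistency and agreement ICCs from two-way variance components."""
    consistency = subject / (subject + residual)            # ignores rater shifts
    agreement = subject / (subject + rater + residual)      # penalizes rater shifts
    return consistency, agreement

c, a = icc_from_variances(subject=1.87, rater=0.0251, residual=0.132)
print(round(c, 3), round(a, 3))  # → 0.934 0.923
```

Because the rater variance here is small, the two coefficients are nearly identical; a larger systematic difference between raters would pull the agreement ICC below the consistency ICC.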

Speaknow CEFR Score and 2nd College Rater

Interrater Reliability (Light's Kappa)

Subjects 51
Raters 2
Kappa 0.657
z 9.41
p-value 0.00

Interrater Reliability (Cohen's Weighted Kappa)

Method Cohen's Kappa for 2 Raters (Weights: equal)
Subjects 51
Raters 2
Agreement % 72.5
Kappa 0.825
z 9.16
p-value <.001

Intraclass correlation coefficient

Subjects 51
Raters 2
Subject variance 1.86
Rater variance 0.00431
Residual variance 0.133
Consistency 0.933
Agreement 0.931

1st and 2nd College Raters

Interrater Reliability (Light's Kappa)

Subjects 51
Raters 2
Kappa 0.444
z 6.57
p-value 4.89e-11

Interrater Reliability (Cohen's Weighted Kappa)

Method Cohen's Kappa for 2 Raters (Weights: equal)
Subjects 51
Raters 2
Agreement % 54.9
Kappa 0.713
z 7.93
p-value <.001

Intraclass correlation coefficient

Subjects 51
Raters 2
Subject variance 1.90
Rater variance 0.00196
Residual variance 0.253
Consistency 0.883
Agreement 0.882

Conclusions

The Speaknow Assessment was more consistent with each interviewer’s assessment of the students’ English proficiency than the individual raters were with each other. The high agreement between the Speaknow Assessment and the human interviewers’ scores (96%) indicates that the Speaknow Assessment is a valid means of assessing students’ English proficiency level for use in a college program.

References

Carlsen, C. H. (2018). The Adequacy of the B2 Level as University Entrance Requirement. Language Assessment Quarterly, 15(1), 75–89. doi:10.1080/15434303.2017.1405962

Chapelle, C. A., & Voss, E. (2014). Evaluation of language tests through validation research. In A. Kunnan (Ed.), The companion to language assessment. New York, NY: Wiley.

Falissard, B. (2012). psy: Various procedures used in psychometry. [R package]

Gamer, M., Lemon, J., Fellows, I., & Singh, P. (2019). irr: Various Coefficients of Interrater Reliability and Agreement. [R package]

Kane, M. (2006). Validation. In R. L. Brennan (Ed.), Educational Measurement (4th ed.). New York, NY: American Council on Education and Praeger.

Ling, G., Wolf, M. K., Cho, Y., & Wang, Y. (2014). English-as-a-Second-Language Programs for Matriculated Students in the United States: An Exploratory Survey and Some Issues. Princeton, NJ: Educational Testing Service. doi:10.1002/ets2.12010

North, B. (2007, February 6). Common European Framework of Reference for Languages (CEFR). Retrieved from https://www.coe.int/en/web/common-european-framework-reference-languages/documents

Seol, H. (2020). seolmatrix: Correlations suite for jamovi. [jamovi module]. Retrieved from https://github.com/hyunsooseol/seolmatrix/

The jamovi project (2020). jamovi (Version 1.2) [Computer software]. Retrieved from https://www.jamovi.org