Validation & Norming
Item Development
Tasks delivered by the PhonePass™ testing system are designed to be simple and intuitive both for native speakers and for proficient non-native speakers of English. Items are designed to cover a broad range of skill levels and skill profiles, and to elicit responses that can be analyzed automatically to produce measures of fluency, listening, vocabulary, pronunciation, and oral reading ability.
To ensure that test content reflects conversational English, the test developers specified that the test materials should conform to the vocabulary actually used in spoken conversation. Spontaneous conversations from 540 North Americans guided the design of test items. The conversation samples were balanced for geography and gender, encompassed a variety of topics, and represented every major dialect of American English. Every word used in the test items appeared at least four times in these spontaneous conversations. To accommodate test-takers who have been trained to a British English standard, the items were reviewed by two British linguists to ensure conformity to colloquial usage in the United Kingdom and Australia. PhonePass™ SET test items were also reviewed for fairness and bias-free usage by an independent committee of language experts. The audio item prompts are spoken by a diverse sample of educated native speakers of North American English.
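The frequency criterion above (every test word must occur at least four times in the conversation corpus) can be expressed as a simple vocabulary filter. The sketch below is a minimal illustration under stated assumptions, not the developers' actual tooling: the function names `tokenize`, `build_lexicon`, and `out_of_lexicon`, the regular-expression tokenizer, and the toy transcripts are all hypothetical.

```python
import re
from collections import Counter

# Threshold taken from the text: each test word must occur at least four times.
MIN_CORPUS_COUNT = 4


def tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer: lowercase, keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_lexicon(transcripts: list[str], min_count: int = MIN_CORPUS_COUNT) -> set[str]:
    """Return every word that occurs at least `min_count` times across all transcripts."""
    counts = Counter()
    for transcript in transcripts:
        counts.update(tokenize(transcript))
    return {word for word, count in counts.items() if count >= min_count}


def out_of_lexicon(item_text: str, lexicon: set[str]) -> list[str]:
    """List the words in a candidate item that fail the frequency criterion."""
    return [word for word in tokenize(item_text) if word not in lexicon]


# Toy example with made-up transcripts (not real corpus data).
transcripts = ["the cat sat"] * 4 + ["a dog barked"]
lexicon = build_lexicon(transcripts)
print(out_of_lexicon("The dog sat", lexicon))  # ['dog']
```

A candidate item with a non-empty result would be revised or discarded; the threshold is a parameter so the same filter can be rerun if the criterion changes.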
Norming
Prototype versions of the PhonePass™ SET-10 test were administered in a series of validation studies to more than 4,000 native and non-native speakers.
The native norming group comprised 376 educated adults aged 18 to 50, geographically representative of the U.S. population. The group had a female/male ratio of 60/40 and was 18% African-American.
The non-native norming group (NNG) consisted of 514 callers, including native speakers of 40 different languages, selected from a larger group of more than 3,500 non-native callers. The language distribution of the NNG is similar to that of the TOEFL (Test of English as a Foreign Language, by ETS) test-taker population, with Arabic, Chinese, Spanish, Japanese, French, Korean, Italian, and Thai each represented by more than 15 speakers. The callers' ages ranged from 17 to 79, and the female/male ratio was 50/50.
Validity
During the development of the PhonePass™ SET-10 test, human graders assigned more than 26,000 scores to responses from hundreds of different test-takers. Item response analysis of these human scores indicates that graders produce relatively consistent grades for fluency, pronunciation, and conversational skill, with inter-grader reliabilities between 0.82 and 0.86. The overall judgment of conversational skill, based on test-taker responses to open questions, had a reliability of 0.93.

Academic and commercial organizations across North America, Europe, and Asia participated in the development and validation testing of the system:
• Bologna University, Italy
• Cañada College, California
• CITO, National Institute for Educational Measurement, Netherlands
• CUNY, New York
• Defense Language Institute English Language Center, Lackland Air Force Base, Texas
• Deloitte and Touche
• Eastern Michigan University
• Economics Institute, Boulder, Colorado
• EF International Language School, Washington
• Foothill College, Los Altos Hills, California
• IIBC, Institute for International Business Communication, Japan
• Indiana University, Indiana
• Iowa State University, Iowa
• Monroe Community College, New York
• Monterey Institute of International Studies, California
• New York University American Language Institute
• Oklahoma State University, Oklahoma
• Point Loma Nazarene College English Institute, San Diego, California
• San Francisco State University American Language Institute
• Sapporo International University, Japan
• Sierra Academy of Aeronautics, Oakland, California
• Stanford University Linguistics Department
• University of North Carolina at Charlotte, Office of International Programs
• University of Findlay, Ohio
• University of Pennsylvania, English Language Programs
• University of Southern Mississippi
Comparison Charts
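The validity of SET-10 is further supported by comparing SET-10 results with scores on other well-known tests. The table below lists the correlation between SET-10 scores and each comparison test: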
| Other Test | Correlation with SET-10 |
|---|---|
| TOEFL | 0.75 |
| TOEFL Reading | 0.64 |
| TOEIC Listening | 0.71 |
| TOEFL Listening | 0.79 |
| New TOEFL Listening | 0.78 |
| TSE | 0.88 |
| New TOEFL Speaking | 0.84 |
| Common European Framework, 1st Experiment | 0.84 |
| Common European Framework, 2nd Experiment | 0.94 |
| Common European Framework, 3rd Experiment | 0.88 |
| ILR Speaking | 0.75 |
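The source does not state how these coefficients were computed. Assuming they are Pearson product-moment correlations between paired scores earned by the same test-takers (the usual convention in concurrent-validity studies), each value could be reproduced from paired score lists as in this minimal sketch; `pearson_r` is a hypothetical helper, not part of any Ordinate tool.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two paired score lists.

    Assumption: x[i] and y[i] are scores earned by the same test-taker
    (e.g. a SET-10 score and a score on the comparison test).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means (covariance numerator).
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (variance numerators).
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / math.sqrt(sxx * syy)


# Toy paired scores (made up, not study data).
print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.8
```

A value near 1 means test-takers are ranked similarly by both tests. On Python 3.10 or later, the standard library's `statistics.correlation` computes the same coefficient.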
SET-10 and TSE Score Comparison

| TSE Score | SET-10 Score |
|---|---|
| 25 | 20-25 |
| 30 | 26-35 |
| 35 | 36-45 |
| 40 | 46-55 |
| 45 | 56-64 |
| 50 | 65-74 |
| 55 | 75-80 |
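The SET-10 and TSE Score Comparison table can be read as a lookup from a SET-10 score to the TSE score of comparable performance. The sketch below transcribes the table rows into a dictionary; the function name `tse_equivalent` is hypothetical, integer SET-10 scores are assumed, and the table is treated as an approximate comparison rather than an official conversion.

```python
from typing import Optional

# Rows transcribed from the "SET-10 and TSE Score Comparison" table:
# TSE score -> inclusive range of comparable SET-10 scores.
TSE_TO_SET10 = {
    25: (20, 25),
    30: (26, 35),
    35: (36, 45),
    40: (46, 55),
    45: (56, 64),
    50: (65, 74),
    55: (75, 80),
}


def tse_equivalent(set10_score: int) -> Optional[int]:
    """Return the TSE score whose SET-10 band contains set10_score, else None."""
    for tse, (low, high) in TSE_TO_SET10.items():
        if low <= set10_score <= high:
            return tse
    return None  # outside the 20-80 SET-10 range covered by the table


print(tse_equivalent(50))  # 40
```

Because the SET-10 bands are contiguous and non-overlapping from 20 to 80, each integer score in that range maps to exactly one TSE row.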