The TILLS (Test of Integrated Language and Literacy Skills), CELF-5 (Clinical Evaluation of Language Fundamentals), and TNL-2 (Test of Narrative Language) are all standardized tests for language abilities in school-aged populations.

Below I discuss the following aspects of these three tests:

organizational/theoretical framework
purpose and design of subtests and composite scores
particularities or challenges with administration and scoring
psychometric properties
appropriateness for children with particular profiles
clinical decision making and application

Organizational/Theoretical Framework

TILLS

The TILLS assesses language and literacy along two dimensions (i.e. sound/word level and sentence/discourse level) and four modalities (i.e. listening, speaking, reading, and writing), reflecting a quadrant model of language abilities (see below) adapted from Bishop and Snowling (2004) and Nelson, Plante, Helm-Estabrooks, and Hotz (2016a). The quadrant model considers levels of phonemic awareness (PA) (sound/word level) and comprehension (sentence/discourse level), which helps determine the type of oral or written language disorder a child may have. The TILLS assesses skills required for academic success, examines strengths and weaknesses, and tracks changes over time (Nelson et al., 2016a). Paul, Norbury, and Gosse (2017) affirm the importance of literacy in academics, explaining how children switch from learning to read to reading to learn while in school.

[Figure: quadrant model of language abilities, adapted from Bishop and Snowling (2004) and Nelson et al. (2016a)]

CELF-5

The CELF-5 examines a child’s language abilities in the areas of phonology, semantics, syntax, and memory, and more broadly examines reading comprehension, writing, and social language skills (Wiig, Semel, & Secord, 2013a). It provides clinicians with information on verbal, written, expressive, and receptive language skills (Wiig et al., 2013a).

TNL-2

The TNL-2 tests narrative language abilities, measuring both comprehension and storytelling ability (Gillam & Pearson, 2017). Specifically, it examines textual memory, cohesion, and organization, as well as the ability to formulate multiple sentences around a common theme (Gillam & Pearson, 2017).

Purpose and Design of Subtests and Composite Scores

TILLS

The TILLS subtests appear in the following order: vocabulary awareness, PA, story retelling, nonword repetition, nonword spelling, listening comprehension, reading comprehension, following directions, delayed story retelling, nonword reading, reading fluency, written expression, social communication, digit span forward, and digit span backward. Together they target listening, speaking, reading, writing, and memory (Elleseff, 2018). The composite scores capture different language dimensions, including a sound/word level composite and a sentence/discourse level composite (Elleseff, 2018). The TILLS is designed to identify language and literacy disorders (Nelson et al., 2016a).

CELF-5

The CELF-5 subtests and composite scores provide information about a child’s language abilities in the following areas: phonology, semantics, syntax, and memory (Wiig et al., 2013a). The subtests appear in the following order: sentence comprehension, linguistic concepts, word structure, word classes, following directions, formulated sentences, recalling sentences, understanding spoken paragraphs, sentence assembly, semantic relationships, reading comprehension, and structured writing, followed by a pragmatics profile and a pragmatics activities checklist (Wiig et al., 2013a). The composite scores are the core language scores (i.e. different combinations of test scaled scores) and the index scores: the Receptive Language Index (RLI), Expressive Language Index (ELI), Language Content Index (LCI), Language Structure Index (LSI), and Language Memory Index (LMI) (Wiig et al., 2013a).

TNL-2

The TNL-2 subtests are designed to identify children with language and learning disorders and appear in the following order: narrative script with a picture, personal story with sequenced pictures, and fictional story with a picture. Each subtest examines comprehension and oral narrative production (Gillam & Pearson, 2017). The composite scores include narrative comprehension and oral narration scores (Gillam & Pearson, 2017).

Particularities or Challenges of Administration and Scoring

The TILLS includes a scoring sheet that tracks a child’s progress over time (Nelson et al., 2016a). Administration takes approximately 90 minutes, but the clinician may administer certain subtests on separate occasions within a maximum window of two weeks (Nelson et al., 2016a). This is consistent with Salvia and Ysseldyke (2004), who report that an interval of days to weeks is appropriate for longer tests. Knowledge of T-units is essential for scoring the written expression subtest (Nelson et al., 2016a). An additional administration challenge is the preparation needed to ensure that the audio files are downloaded and in order. As with the CELF-5, the basal and ceiling scores differ from subtest to subtest, which can be a challenge during administration if the examiner is not aware of this (Wiig et al., 2013a; Nelson et al., 2016a).

Specific to the CELF-5, the double-sided subtest flipbook does not flip in sequential order. This makes administration more challenging, as the examiner must take preparatory steps to ensure subtests are completed in the appropriate order.

The TNL-2 has a relatively short administration time (i.e. 15-25 minutes) (Gillam & Pearson, 2017). However, students’ answers tend to be longer than on other tests, which may make it hard for children to stay focused and, in turn, make administration and scoring challenging. The phrasing of questions in the narrative comprehension section is often unclear about the number of answers the student must provide. As a result, scoring may be confounded by ambiguity in how the questions are interpreted.

Psychometric Properties

TILLS

The TILLS has adequate construct validity, concurrent validity, sensitivity, specificity, internal consistency, test-retest stability, and inter-rater reliability, as well as adequate item review and analysis (Nelson et al., 2016b). Its sensitivity and specificity range from 80% to 90% depending on the age range (Nelson et al., 2016b), which is acceptable for clinical use (Plante & Vance, 1994). More evidence is required regarding the test’s predictive validity (Nelson, 2010). Standardization criteria were inconsistently met, for example with regard to representative demographics (Salvia & Ysseldyke, 1995).
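
For readers who want to see how sensitivity and specificity are calculated, here is a minimal sketch. The counts are hypothetical and are not taken from the TILLS technical manual. The 80% threshold simply follows the reading of Plante and Vance (1994) given above, and the function name is my own.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) as proportions between 0 and 1."""
    # Sensitivity: share of children WITH a disorder that the test flags
    sensitivity = true_pos / (true_pos + false_neg)
    # Specificity: share of children WITHOUT a disorder that the test clears
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical validation sample: 50 children with a known disorder, 100 without
sens, spec = sensitivity_specificity(true_pos=42, false_neg=8,
                                     true_neg=88, false_pos=12)

# Assumed acceptability threshold of 80% for both values
acceptable = sens >= 0.80 and spec >= 0.80

print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}, acceptable = {acceptable}")
# prints: sensitivity = 84%, specificity = 88%, acceptable = True
```

In this made-up sample, 8 of the 50 children with a disorder would be missed and 12 of the 100 typically developing children would be flagged in error, which is the practical meaning of values in the 80-90% range.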

CELF-5

The CELF-5 has inadequate construct validity, concurrent validity, sensitivity, specificity, test-retest stability, and inter-rater reliability because these were not tested across the entire age range (Wiig et al., 2013a). Predictive validity is not mentioned in the manual. Internal consistency, item review and analysis, and standardization measures were adequate (Salvia & Ysseldyke, 2004; Wiig, Semel, & Secord, 2013b).

TNL-2

For the TNL-2, construct validity, predictive validity, internal consistency, test-retest stability, standardization, and inter-rater reliability measures were adequate (Gillam & Pearson, 2017). However, concurrent validity, sensitivity, and specificity measures were inadequate (Gillam & Pearson, 2017). Overall, only the TILLS and TNL-2 have strong psychometric properties. That said, Andersson (2005) mentions that “a test that is generally adequate may be inadequate for use with a specific child, and a test that does not meet adequacy criteria in all areas may nonetheless be adequate for use with a specific child” (p. 222). Finally, Friberg (2010) asserts that the most important psychometric property for a clinician to consider is identification accuracy, which is a strength of the TILLS.

Appropriateness for Children with Particular Profiles

TILLS

The TILLS has a lengthy administration time, and not all of the subtests can be completed on different dates (i.e. some must be administered immediately after one another) (Nelson et al., 2016a). This may be challenging for children with attention deficit disorders (e.g. ADHD). However, children with ADHD were included in the TILLS norming sample, so they can be assessed more accurately (Nelson et al., 2016b). Although the sample included children with speech sound disorders and ADHD, children with disabilities and language impairments were not included (Nelson et al., 2016b). Therefore, it does not meet the Kirk and Vigeland (2014) criteria for inclusivity, as the sample does not include the individuals for whom the test was intended.

CELF-5

With regard to the CELF-5, individuals with a language disorder made up approximately 7% of the norming sample (Wiig et al., 2013a). However, according to Peña, Spaulding, and Plante (2006), including children with language impairments in the norming sample decreases test sensitivity; the CELF-5 may therefore be less accurate in identifying milder language disorders. Thus, although the CELF-5 may assess severe language disorders more accurately, the TILLS may assess mild language disorders more accurately.
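
To illustrate the reasoning behind the Peña et al. (2006) point, here is a rough sketch of how mixing children with a disorder into the norming sample can shift the cut score and lower sensitivity. Only the 7% share comes from the text above. The score distributions, the 1.5 SD cut score, and the variable names are assumptions I made up for illustration, not values from the CELF-5 manual.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical score distributions, chosen only for illustration
typical = NormalDist(mu=100, sigma=15)   # assumed typically developing children
impaired = NormalDist(mu=70, sigma=15)   # assumed children with a language disorder
p_impaired = 0.07                        # share of norming sample with a disorder
cut_sd = 1.5                             # assumed cut score: 1.5 SD below the norm mean

# Mean and SD of the mixed norming sample (two-group mixture formulas)
mix_mean = (1 - p_impaired) * typical.mean + p_impaired * impaired.mean
mix_var = ((1 - p_impaired) * (typical.variance + typical.mean ** 2)
           + p_impaired * (impaired.variance + impaired.mean ** 2)
           - mix_mean ** 2)
mix_sd = sqrt(mix_var)

# Cut score when norms come from typical children only vs. the mixed sample
cut_typical_only = typical.mean - cut_sd * typical.stdev
cut_mixed = mix_mean - cut_sd * mix_sd

# Sensitivity: proportion of children with a disorder scoring below the cut
sens_typical_only = impaired.cdf(cut_typical_only)
sens_mixed = impaired.cdf(cut_mixed)

print(f"cut score: {cut_typical_only:.1f} (typical only) vs {cut_mixed:.1f} (mixed)")
print(f"sensitivity: {sens_typical_only:.2f} (typical only) vs {sens_mixed:.2f} (mixed)")
```

Under these assumptions the mixed sample has a lower mean and a larger spread, so the cut score drops and fewer children with a disorder fall below it. Children with milder difficulties, who score closest to the cut, are the ones most likely to be missed.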

TNL-2

The TNL-2 is inappropriate for children who are nonverbal or highly unintelligible, as the test requires extensive verbal output. The TNL-2 was normed on typically developing children and children with disabilities; therefore, test sensitivity may decrease (Peña et al., 2006), but children can be compared with the population for whom the test was intended (Gillam & Pearson, 2017).

Clinical Decision Making and Application

The following section focuses only on the CELF-5 and TILLS, as they have more overlap.

Choosing between the CELF-5 and the TILLS would largely depend on caseload. For example, the CELF-5 has a memory component, while the TILLS does not. As a result, if your caseload includes children who require assessment of memory, the CELF-5 may be more appropriate. The CELF-5 also has a wider age range (i.e. 5-21) than the TILLS (i.e. 6-18) (Wiig et al., 2013a; Nelson et al., 2016a). The TILLS therefore limits the ability to formally test kindergarteners, for whom the CELF-5 would be more applicable. Snow, Scarborough, and Burns (1999) assert that children as young as four years old are beginning to develop PA; therefore, it is not ideal that the TILLS does not test children until the age of six. Moreover, Blachman (2000) describes how many children begin kindergarten without an understanding of PA; educators must therefore be able to understand and cater to these individual differences in PA. As a result, the ability to administer a standardized test and informally screen for PA in young children improves the utility of the CELF-5.

Moreover, the CELF-5 has a shorter administration time (i.e. 30-50 minutes), while the TILLS takes around 90 minutes (Wiig et al., 2013a; Nelson et al., 2016a). Therefore, the CELF-5 may be easier to complete for children with shorter attention spans. The CELF-5 also examines a wider range of language abilities, as it is not just literacy focused like the TILLS. Thus, the TILLS suits a narrower range of children, while the CELF-5 can be used with a wider range; for example, children who cannot read would likely be better suited to the CELF-5. The CELF-5 also has an item analysis, which makes it easier for clinicians to set goals and understand strengths and weaknesses after scoring and before beginning therapy (Wiig et al., 2013a).

However, the CELF-5 has weak psychometric properties in comparison to the TILLS. For example, the authors do not provide psychometric properties for every age range. Moreover, the TILLS examines similar components to the CELF-5 while also testing PA, which is an important skill for later literacy (Nelson et al., 2016a). The TILLS, unlike the CELF-5, can diagnose a literacy disorder. However, one must consider the relative importance of a literacy diagnosis in the school system. For example, a formal diagnosis may bring increased school supports and funding. Ultimately, the CELF-5 and TILLS offer different strengths and weaknesses, and the usefulness of each depends on the clinical context.
Both tests have merits for different clinical settings and different assessment and treatment purposes. This is affirmed by Andersson (2005), who discusses how a test’s adequacy depends on a particular child’s needs. The CELF-5 can be used to assess language more broadly, while the TILLS can be used for a more intensive analysis of a child’s language. While the CELF-5 provides information on whether a child has a language delay or disorder, the TILLS provides a formal diagnosis as well as information on the child’s academic achievement. Since the children for whom these tests are used are school aged, it is important to understand how their deficits may affect their academics, which the TILLS provides. Given these points and the wide variety of differences listed above, if it were possible I would buy both tests, as both provide benefits that could be useful in various clinical contexts.

Conclusion

Clinicians should consider using both formal and informal means of testing if one test does not cover all targeted areas of assessment. For example, the CELF-5 covers a wide range of language capacities but does not cover PA. Therefore, using the CELF-5 together with an informal measure (e.g. a PA screener) could be an informative and productive combination. Indeed, Friberg (2010) states that “optimally, SLPs should incorporate data from both quantitative and qualitative sources to fully examine the language abilities of any client undergoing a language-based assessment.”

Finally, the importance of literacy for academic success cannot be overstated. Paul, Norbury, and Gosse (2017) explain how children make the switch from learning to read to reading to learn. Children who are unable to overcome reading deficits quickly will continue to fall further behind academically. Additionally, Snow, Scarborough, and Burns (1999) explain that the SLP’s role in school settings with regard to literacy is to provide “in-depth diagnostic assessment of reading-related language abilities” (p. 56). Thus, the impact of literacy on future academic success and achievement should not be underestimated when working in the school system. The TILLS proves useful in this clinical setting.

I hope you found this to be a helpful resource! Feel free to share your thoughts and experiences with these tests in the comments below.

-Shannon

References

Andersson, L. (2005). Determining the Adequacy of Tests of Children’s Language. Communication Disorders Quarterly, 26(4), 207–225. https://doi.org/10.1177/15257401050260040301

Bishop, D. V. M., & Snowling, M. J. (2004). Developmental dyslexia and specific language impairment: Same or different? Psychological Bulletin, 130(6), 858–886. doi: 10.1037/0033-2909.130.6.858

Blachman, B. A. (2000). Phonological awareness. In M. Kamil, P. Mosenthal, P. D. Pearson, & R. Barr (Eds.), Handbook of reading research III (pp. 483-502). Mahwah, NJ: Erlbaum.

Botting, N., & Conti-Ramsden, G. (2003). Characteristics of children with specific language impairment. In Classification of developmental language disorders (pp. 35-50). Psychology Press.

Elleseff, T. (2018). Review of the Test of Integrated Language Skills. Retrieved from https://www.smartspeechtherapy.com/review-of-the-test-of-integrated-language-and-literacy-tills/

Friberg, J. C. (2010). Considerations for test selection: How do validity and reliability impact diagnostic decisions? Child Language Teaching and Therapy, 26, 77-92. doi: 10.1177/0265659009349972

Gillam, R. B., & Pearson, N. A. (2017). Test of narrative language - Second Edition. Austin, TX: Pro-Ed.

Nelson, N. W. (2010). Test of integrated language and literacy skills validation research. Retrieved from https://ies.ed.gov/funding/grantsearch/details.asp?ID=997

Nelson, N. W., Plante, E., Helm-Estabrooks, N., & Hotz, G. (2016a). Test of integrated language and literacy skills: Examiner’s manual. Baltimore, MD: Paul H. Brookes Publishing Co.

Nelson, N. W., Plante, E., Helm-Estabrooks, N., & Hotz, G. (2016b). Test of integrated language and literacy skills: Technical manual. Baltimore, MD: Paul H. Brookes Publishing Co.

Nippold, M. A., & Schwarz, I. E. (2002). Do children recover from specific language impairment? Advances in Speech Language Pathology, 4, 41-49. doi: 10.1080/14417040210001669221

Paul, R., Norbury, C., & Gosse, C. (2017). Language disorders: From infancy through adolescence (5th ed.). St. Louis: Mosby.

Peña, E. D., Spaulding, T. J., & Plante, E. (2006). The composition of normative groups and diagnostic decision making: Shooting ourselves in the foot. American Journal of Speech-Language Pathology, 15(3), 247-254. doi:10.1044/1058-0360(2006/023)

Plante, E., & Vance, R. (1994). Selection of preschool language tests: A data-based approach. Language, Speech, & Hearing Services in Schools, 25, 15-24.

Salvia, J., & Ysseldyke, J. E. (2004). Assessment: In special and inclusive education (9th ed.). Boston: Houghton Mifflin.

Snow, C.E., Scarborough, H.S., & Burns, M.S. (1999). What speech-language pathologists need to know about early reading. Topics in Language Disorders, 20, 48-58.

Wiig, E. H., Semel, E., & Secord, W. A. (2013a). Clinical Evaluation of Language Fundamentals–Fifth Edition (CELF-5) Technical Manual. Bloomington, MN: NCS Pearson Inc.

Wiig, E. H., Semel, E., & Secord, W. A. (2013b). Clinical Evaluation of Language Fundamentals–Fifth Edition (CELF-5). Bloomington, MN: NCS Pearson.