Development and Validation of a Reading Comprehension Scale

This study developed and validated a scale that measures the reading comprehension of Filipino learners. It addressed the need for a reliable, valid, and feasible classroom screening tool that identifies which reading comprehension type a junior high school learner belongs to in the local context. The Reading Comprehension Scale, a 13-item instrument based on four themes (applied, interpretive, affective, and lexical reading comprehension), was designed and crafted. The study used a mixed-method design: 15 students were interviewed for item generation, answering questions based on the Meta-Comprehension Strategy Index of Schmitt (1990), and a sample of 476 students took part in the administration of the initial instrument. Cronbach's alpha was used to gauge the internal consistency of the items, and content validity was used to test the appropriateness of the generated statements. Cronbach's alpha ranged from 0.569 to 0.639, which the study considers good reliability. The Kaiser-Meyer-Olkin measure and Bartlett's test were used to examine the appropriateness of factor analysis. The approximate chi-square was 1408.237 with 465 degrees of freedom, which is significant at the 0.01 level (p < 0.001). The Kaiser-Meyer-Olkin Measure of Sampling Adequacy was 0.734, which is greater than 0.5, so factor analysis was deemed appropriate. Based on these statistical values, the scale developed is valid and reliable.


INTRODUCTION
The ability to read and comprehend texts is imperative in daily life, and it takes on greater importance in education because it is intrinsic to students' cognitive development (Cunha & Capellini, 2013). A Filipino child needs to develop functional literacy and higher-order skills. A child with adequate reading skills is assumed to have better opportunities to succeed in school than one whose reading skills are poor. Poor reading skills manifest as poor comprehension and incorrect pronunciation, and they can affect the child's academic, psychological, and social development if no proper intervention is given early. As such, diagnosing reading disability as quickly as possible appears to be essential (Cayubit, 2012). Reading comprehension is known to be a complex but critical skill that students should master to secure future academic success (Cain et al., 2004).
Currently available reading comprehension scales are insufficient as classroom-based screeners for learners who have difficulty with reading comprehension: several are psychometrically problematic, and some are too long and too complicated to serve well as a screener (Keenan et al., 2008). Research supports the validity and reliability of teacher-completed behaviour rating scales of academic skills as an effective way of assessing students' academic progress, which could address the present instrumentation gap in practice (Demaray & Elliot, 1998; Speece et al., 2010).
According to the 2008 Functional Literacy, Education and Mass Media Survey (FLEMMS), 20.1 million Filipinos aged 10 to 64 do not understand what they read. Although the Philippines reportedly has a high literacy rate of 88.6%, many Filipinos can barely read and write (Selangan, 2015). This is especially evident among Filipinos living in remote areas and in the slum areas of the country. Many researchers have suggested classifications of the different dimensions of reading comprehension (McNamara & Magliano, 2009), yet there is still little agreement in the field on how these reading comprehension processes might validly be classified, or on whether separable sub-skills of comprehension exist at all (Rupp, 2012).

Reading comprehension as a cognitive skill
Reading comprehension is a complex cognitive ability that requires integrating text information with the knowledge of the listener or reader, resulting in the elaboration of a mental representation (Meneghetti et al., 2006). According to Pressley (2002), good text comprehension develops when a reader can predict what the text may be about, monitors their understanding of the text, asks questions while reading, relates information in the text to background knowledge, and summarizes what is being read. In a 2010 interview with PhilStar Global, Dr. Yolanda Quijano, head of the DepEd's Bureau of Elementary Education, stated that reading problems are the main culprit for the poor performance of Filipino students in the National Achievement Test. Orencia (2006) questioned why the Philippines continues to have poor reading proficiency despite its high literacy rate. This is alarming, given that comprehension is the prime goal of reading and that comprehension failures can lead to school failure.
As a component of reading, reading comprehension is best understood through the cognitive processes involved, since current models suggest that such processes play a significant role in comprehension skills (Meneghetti et al., 2006). It is also commonly regarded as a multidimensional form of processing rather than a unitary construct (Van Den Broek, 2012). This view implies that the reading condition, reading purpose, topic, and text affect the nature of reading, so that in assessing reading ability, the choice of reading material, the assumed reading objectives, and item construction all have an important impact on students' test results. Many empirical studies have shown this (Best et al., 2006; Keenan et al., 2008).
Ghafournia and Afghari (2013) found that students with different levels of reading proficiency use different strategies in comprehending reading texts and answering reading questions. Their results also show that students with more linguistic knowledge repeatedly used test-taking strategies to understand the reading text and answer the questions. Students' ability to answer reading questions is also affected by their vocabulary size. Ibrahim et al. (2016), who studied the relationship between vocabulary size and reading comprehension among ESL learners at a public university in Malaysia, found a statistically significant difference between scores on the reading comprehension test and the vocabulary test. Prior knowledge is another factor that contributes substantially to comprehending a reading text (Ozuru et al., 2009; & Chen 2008). Students with high prior knowledge perform better on reading tests than those with low prior knowledge (Abdelaal & Sase, 2014).

Current reading comprehension measures
Reading comprehension assessment grew throughout the 20th century, both in the types of skills measured and in the formats of the tests used (Pearson & Hamm, 2005). Despite years of research and successive iterations of assessment, it remains a subject of debate in the academic field. A review of the comprehension assessment literature reveals many unresolved issues concerning the effective assessment of students in schools, including psychometric problems and issues of classroom utility (Sweet, 2005). Evidence suggests that measures of reading comprehension differ widely in how much word decoding versus oral language comprehension skills contribute to their scores, so outcomes may not correlate well across measures (Keenan et al., 2008). For example, Cutting and Scarborough (2006) examined the relative contributions of reading, language, and cognitive skills to three commonly used standardized measures of reading comprehension: the Wechsler Individual Achievement Test-Passage Comprehension subtest (WIAT), the Gates-MacGinitie Reading Test (G-M; MacGinitie, MacGinitie,), and the Gray Oral Reading Test-Third Edition (GORT-3). Using a sample of 97 students from grades 1 to 10, the study examined the relative contributions of oral language comprehension skills and word decoding skills to each reading comprehension measure, the contributions of other factors including attention, reading rate, and verbal memory, and the correlations across the three measures. The development of an effective reading comprehension assessment remains largely unresolved in the literature. Assessment design, utility, and content vary widely and reflect varying components of reading comprehension, and some assessments lack sound psychometric qualities.
Likewise, despite research supporting the critical role of both metacognitive and cognitive skills in reading comprehension, there is still no available assessment of reading comprehension that integrates the assessment of these observable metacognitive skills (Gebhardt, 2013).

Rating scales of academic behaviors
A few teacher-completed rating scales of academic behaviours exist and have been used in research. For example, Begeny et al. (2008) created the 9-item TRSRP, which assesses students' decoding, reading comprehension, reading fluency, reading accuracy, and application of reading skills to school work, for their teacher judgment study. Begeny et al. (2008) also used the TRSRP in another teacher judgment study. Some teacher judgment studies have used existing, validated brief rating scales of academic skill such as the Social Skills Rating System-Teacher (SSRS-T), which contains questions relating to both reading-specific behaviours and broader academic concepts such as intellectual functioning and motivation (Demaray & Elliot, 1998). Elliot et al. (2007) described how an experimental instrument called the Brief Academic Competence Evaluation Scales System (BACESS) could be used in conjunction with the DIBELS to screen for learners with reading problems.
Numerous studies have also begun to explore the value of academic behaviour rating scales as screeners for student learning difficulties. One such study examined whether teacher ratings of students' progress in phonics are an acceptable screener for learning difficulties in reading (Snowling et al., 2011). First-grade students (n = 146) were evaluated for reading difficulties using the Word Reading, Letter-Sound Knowledge, Sound Isolation, and Sound Deletion subtests from the York Assessment of Reading for Comprehension (YARC), along with several cognitive processing screeners. Classroom teachers then rated all participating students using a researcher-generated rating scale that asked for estimates of student progress through the phases of phonics as defined by the London Department for Children, Families and Schools.
In another study, researchers proposed a model for identifying reading problems that used teacher rating scales as a component of a universal screening battery for upper elementary students (Speece et al., 2010). Another study examined teacher-completed rating scales as an indicator of which students are consistently struggling and which are responding well to intervention (Vaughn et al., 2009). As the studies by Begeny et al. (2008), Cutting and Scarborough (2006), Elliot et al. (2007), Speece et al. (2010), and Vaughn et al. (2009) show, the reading comprehension scales developed so far are psychometrically inadequate, and no tools have been developed for systematically assessing reading comprehension in the classroom. These studies also confirm the lack of content consistency that characterizes current reading comprehension measures and emphasize the need for improved methods of reading comprehension assessment. Thus, this study attempted to address the need for a reliable, valid, and highly feasible screening tool suited to Filipino learners that can be used in the classroom to identify which reading comprehension type a student belongs to, by developing a validated scale of reading comprehension for junior high school learners in the local context.

Theoretical Framework
This study was anchored on Messick's theory of test validity, which is profoundly influential in part because it brings together various contributions into a unified framework for building validity arguments. It builds an argument from multiple sources of evidence, including reflections on one's own values and those of others. In line with this theory, the researcher used interviews to let the participants bring out different ideas about reading comprehension as necessary inputs for scale development.

Conceptual Framework
This study aimed to develop and validate a scale of reading comprehension for the Grade 8 students of the two public schools in Candaba, Pampanga. Figure 1 illustrates the paradigm of the study, which presents the steps carried out to arrive at the results. The scale development process began with item generation, to identify appropriate questions that tap the identified domain, followed by content validation, to assess whether the items adequately measure the domain of interest. For the validation of the scale, factor analysis was used to reduce the set of items, and an internal consistency assessment was used to determine the reliability of the scale. Lastly, the test was assessed by the three sets of participants to establish the validity of the researcher-made scale.

Research Problems
Generally, this study aimed to develop and validate a scale of reading comprehension for the Grade 8 students of the two public schools in Candaba, Pampanga. Specifically, it sought to answer the following questions: 1) How may the reading comprehension scale be developed? and 2) How may the reading comprehension scale be validated?

METHODS
This study employed a mixed-method research design, an evolving methodology that advances the systematic integration of qualitative and quantitative data within a sustained program of inquiry to provide a better understanding of the research problem. The researcher also used the developmental research method. As opposed to simple instructional development, developmental research is the systematic study of designing, developing, and evaluating instructional programs, processes, and products that must meet the criteria of internal consistency and effectiveness.

Participants
The participants of the study were the Grade 8 students of the two public schools in Candaba, Pampanga during School Year 2019-2020. There were two sets of student participants. A sample of 15 students with different levels of intelligence, from the highest to the lowest, participated in the item generation. The second set, comprising 476 students, was used for the administration of the initial instrument. Total enumeration was used: all of the Grade 8 students of the two public schools took part. In addition, a Master Teacher majoring in English and four Grade 8 English teachers from the locale of the study served as the teacher-respondents for the content validation of the scale. The validation tool used was a content rating review form from Aravamudhan and Krishnaveni (2015).

Instruments
The researcher developed a reading comprehension scale. The items for the scale were based on the answers given by the 15 participants in an interview conducted by the researcher. The interview questions were based on the Meta-Comprehension Strategy Index of Schmitt (1990) and were divided into three sections. The first section asked what the students can do to improve their understanding before they read, the second asked what they can do to improve understanding during reading, and the last asked what they can do to improve comprehension after reading.

Procedure
Development Stage
Approval from the school administration to conduct the study and the consent of the teacher- and student-participants were sought before the steps were carried out. The researcher focused first on the development of the scale, which went through three stages: item generation, content validation, and reliability calculation. The scale development process started with an interview in which the researcher used Schmitt's Meta-Comprehension Strategy Index as a guide. The students were asked what they do to improve understanding before, during, and after reading. Their answers were then analyzed into open codes, axial codes, and themes using Colaizzi's method.

Validation Stage
The scale developed by the researcher went through exploratory factor analysis (EFA) to reduce the set of items. The minimum factor loading cutoff in this study was 0.4. As a rule of thumb for assessing the significance of factor loadings, loadings of 0.3 to 0.4 are minimally acceptable (Hair et al., 2010). Also, in the research conducted by Yusoff et al. (2011), all the items included had factor loadings of more than 0.3, which indicated that they were well clustered together. The Kaiser-Meyer-Olkin (KMO) measure and Bartlett's test were used to assess whether factor analysis was appropriate for the proposed scale. Kaiser (1974) recommends a bare minimum of 0.5; values between 0.5 and 0.7 are mediocre, values between 0.7 and 0.8 are good, values between 0.8 and 0.9 are great, and values of 0.9 and above are superb. For the internal consistency assessment, Cronbach's alpha coefficient was used to determine the reliability of the scale. According to Hulin et al. (2001), a generally accepted rule for Cronbach's alpha is that 0.6 to 0.7 indicates an acceptable level of reliability and 0.8 or greater a very good level. Nunnally and Bernstein (1994) likewise suggested that composite reliability values of 0.60 to 0.70 are acceptable in exploratory research, while in more advanced stages of research, values between 0.70 and 0.90 can be regarded as satisfactory. The researcher used total enumeration to identify the student-participants of the study. Finally, the test was assessed by the two sets of participants used for the development and administration of the initial tool to establish the validity of the researcher-made reading comprehension scale.
Sample statements from the interviews:
"In choosing a story to read, I always consider the author."
"I scan first the entire story before I read it."
"I read first the synopsis/description to see what the story is about."
"I check first if no pages are missing."
"I use dictionary to understand unfamiliar words."
"I look at the pictures to better understand the story."
"I use context clues to help me define unfamiliar words."
"I read the story quickly to find out what happened."
"I think of the lesson I learned from the story."
"I think of how I would have acted if I were the main character."
Interview questions:
2. While you are reading, what are the things you usually do to better understand the story?
3. After you've read, what are the things you usually do to better understand the story?
Table 1 shows the phases followed in developing the Reading Comprehension Scale. Phase 1 involved generating the initial pool of items. Fifteen students were interviewed about what they do before, during, and after reading a text (e.g., "What do you usually do to improve your understanding before you read?", "What do you usually do to improve understanding during reading?", and "What do you usually do to improve comprehension after reading?"). The interview questions were based on the Meta-Comprehension Strategy Index of Schmitt (1990). After the interview, the answers were analyzed into open codes, axial codes, and themes using Colaizzi's method. This produced 31 items (e.g., "I use context clues to help me define unfamiliar words."). In Phase 2, the initial instrument was administered to a sample of 476 students to test its reliability. The items in this first version came from the pool generated in Phase 1: 31 statements, which were proofread and edited. This step was not yet the validation. A total of 18 items were deleted, and only 13 items were kept for the final instrument, which was based on four themes: applied, interpretive, affective, and lexical reading comprehension types.
Table 2.1. Ratings given by the five validators
Item | Validators rating the item essential | Content validity ratio
1. I read silently to better understand what I read. | 4 | 0.6
2. I read in a quiet place to better understand what I read. | 4 | 0.6
3. I look for the importance of the story. | 5 | 1
4. I think of the lesson I learned from the story. | 5 | 1
5. I read the title to see what the story is about. | 5 | 1
6. I imagine the scenes in the story. | 5 | 1
7. I reread the best part of the story. | 4 | 0.6
8. I reread some parts to see if things are making sense. | 4 | 0.6
9. I scan first the entire story before I read it. | 5 | 1
10. I think of a better ending for the story. | 5 | 1
11. I use context clues to help me define unfamiliar words. | 5 | 1
12. I look at the pictures to better understand the story. | 5 | 1
13. I look at the pictures to see what the story is about. | 5 | 1
Note: To compute the content validity ratio of each item, the values in the validation tool were converted to fit the descriptions in the Lawshe method (1975): No Fit (1) = Not Necessary; May Fit (2) = Useful but Not Essential; Good Fit (3) and Excellent Fit (4) = Essential.
Table 2.1 shows the ratings given by the five validators in determining whether each item fits the concept or domain being measured. Overall, the validators judged the statements included in the reading comprehension scale to be essential. Items 1 (I read silently to better understand what I read), 2 (I read in a quiet place to better understand what I read), 7 (I reread the best part of the story), and 8 (I reread some parts to see if things are making sense) each obtained a content validity ratio of 0.6, which was still treated as acceptable because more than half of the validators deemed them essential.
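The content validity ratios in Table 2.1 follow from Lawshe's (1975) formula, CVR = (n_e - N/2) / (N/2), where n_e is the number of validators who rate an item essential and N is the panel size. The sketch below is a minimal illustration, not the authors' computation: the function names and the sample ratings are hypothetical, and the rule that ratings of 3 and 4 count as Essential is taken from the table note.

```python
def count_essential(ratings):
    """Count ratings that map to 'Essential' under the table note:
    Good Fit (3) and Excellent Fit (4) are treated as Essential."""
    return sum(1 for r in ratings if r >= 3)


def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's (1975) content validity ratio: (n_e - N/2) / (N/2)."""
    if n_panelists <= 0:
        raise ValueError("the panel must have at least one validator")
    if not 0 <= n_essential <= n_panelists:
        raise ValueError("n_essential must lie between 0 and n_panelists")
    half = n_panelists / 2
    return (n_essential - half) / half


# Hypothetical ratings from five validators for a single item.
ratings = [4, 3, 4, 2, 3]
n_e = count_essential(ratings)                       # 4 of 5 rate it essential
print(content_validity_ratio(n_e, len(ratings)))     # 0.6
print(content_validity_ratio(5, 5))                  # 1.0
```

With a panel of five, four essential ratings give 0.6 and five give 1, which matches the two values that appear in Table 2.1.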
Also, all of the validators rated items 3 (I look for the importance of the story), 4 (I think of the lesson I learned from the story), 5 (I read the title to see what the story is about), 6 (I imagine the scenes in the story), 9 (I scan first the entire story before I read it), 10 (I think of a better ending for the story), 11 (I use context clues to help me define unfamiliar words), 12 (I look at the pictures to better understand the story), and 13 (I look at the pictures to see what the story is about) as essential; each obtained a content validity ratio of 1, which is a highly acceptable rate. These findings are consistent with Pressley (2002), who held that good text comprehension emerges if a reader can predict what the text may be about. This means that these statements are experienced by both English as a Foreign Language (EFL) and English as a Second Language (ESL) learners.
Table 2.2 shows that Cronbach's alpha is 0.610, which the study interprets as a good level of internal consistency for the scale. Nunnally and Bernstein (1994) suggested that composite reliability values of 0.60 to 0.70 are acceptable in exploratory research. According to Hulin et al. (2001), a generally accepted rule for Cronbach's alpha is that 0.6 to 0.7 indicates an acceptable level of reliability and 0.8 or greater a very good level. On this basis, the reading comprehension scale can be regarded as a reliable tool, with alpha values that ranged from 0.569 to 0.638. A total of 18 items were deleted from the initial instrument.
Table 2.4 shows the results of the KMO and Bartlett's tests, which are used to examine the appropriateness of factor analysis and the adequacy of the sample. The approximate chi-square is 1408.237 with 465 degrees of freedom, which is significant at the 0.01 level (p < 0.001). The Kaiser-Meyer-Olkin Measure of Sampling Adequacy is 0.734, which is greater than 0.5.
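The reliability and sampling adequacy figures reported here can be illustrated with a short sketch. It assumes the standard formula for Cronbach's alpha, k/(k-1) * (1 - sum of item variances / variance of total scores), and the usual degrees of freedom for Bartlett's test of sphericity, p(p-1)/2. The function names and the toy response table are hypothetical, and treating the lower bound of each KMO range as inclusive is an assumption; the range labels are the ones this paper attributes to Kaiser (1974).

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items table of numeric scores."""
    n_items = len(scores[0])
    if len(scores) < 2 or n_items < 2:
        raise ValueError("need at least two respondents and two items")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance([row[j] for row in scores]) for j in range(n_items)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


def bartlett_df(n_items):
    """Degrees of freedom of Bartlett's test of sphericity: p(p - 1) / 2."""
    return n_items * (n_items - 1) // 2


def kmo_label(kmo):
    """Label a KMO value with the ranges the paper attributes to Kaiser (1974)."""
    if kmo < 0.5:
        return "below the recommended minimum"
    if kmo < 0.7:
        return "mediocre"
    if kmo < 0.8:
        return "good"
    if kmo < 0.9:
        return "great"
    return "superb"


# Hypothetical 4-respondent, 3-item response table, for illustration only.
toy_scores = [[1, 2, 1], [2, 3, 2], [3, 4, 3], [4, 5, 4]]
print(round(cronbach_alpha(toy_scores), 3))
print(bartlett_df(31))      # 465
print(kmo_label(0.734))     # good
```

The 465 degrees of freedom reported for Bartlett's test are what this formula gives for 31 items, the size of the initial item pool, and the reported KMO of 0.734 falls in the range labelled good.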
As mentioned by Field (2000), the sampling is adequate if the value of the Kaiser-Meyer-Olkin (KMO) measure is larger than 0.5. Kaiser (1974) also recommends a bare minimum of 0.5; values between 0.5 and 0.7 are mediocre, values between 0.7 and 0.8 are good, values between 0.8 and 0.9 are great, and values of 0.9 and above are superb. Hence, the result implies that factor analysis is an appropriate technique for further analysis of the data and that the reading comprehension scale can be regarded as a reliable tool. The minimum factor loading cutoff in this study was 0.4. As a rule of thumb for assessing the significance of factor loadings, loadings of 0.3 to 0.4 are minimally acceptable (Hair et al., 2010). Also, in the research conducted by Yusoff et al. (2011), all the items included had factor loadings of more than 0.3, which indicated that they were well clustered together. Table 2.5 shows the items under the rotated component matrix. Item numbers 1, 2, 4, 6, 8, 10, 11, 15, 17, 18, 21, ... Statements included in the first factor fall under the advanced dimension of reading comprehension, where most of the statements focus on making inferences, identifying facts and opinions, and recognizing arguments. On the other hand, statements included in the second factor fall under the basic dimension of reading comprehension, where the statements focus on understanding vocabulary and the relationship between ideas in a context. Items included in the instrument were based on four themes: applied, interpretive, affective, and lexical types of reading comprehension.
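The role of the 0.4 cutoff can be sketched as a simple retention rule. The paper does not spell out the exact rule applied to the rotated component matrix, so the sketch assumes that an item is kept when its largest absolute loading on any factor reaches the cutoff and is assigned to that factor. The function name and the loadings are hypothetical and are not the values in Table 2.5.

```python
CUTOFF = 0.4  # minimum factor loading stated in the study


def retained_items(loadings, cutoff=CUTOFF):
    """Map each retained item to the index of its strongest factor.

    `loadings` maps an item label to its list of loadings, one per factor.
    An item is retained when its largest absolute loading meets the cutoff
    (an assumed rule, used here for illustration only).
    """
    kept = {}
    for item, row in loadings.items():
        strongest = max(range(len(row)), key=lambda j: abs(row[j]))
        if abs(row[strongest]) >= cutoff:
            kept[item] = strongest
    return kept


# Hypothetical rotated loadings on two factors.
hypothetical_loadings = {
    "item_a": [0.62, 0.10],    # loads on the first factor, retained
    "item_b": [0.21, 0.35],    # no loading reaches 0.4, dropped
    "item_c": [-0.05, 0.48],   # loads on the second factor, retained
}
print(retained_items(hypothetical_loadings))  # {'item_a': 0, 'item_c': 1}
```

Under this assumed rule, an item such as item_b, whose loadings all fall below 0.4, would be among those removed from the initial pool.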
The results imply that, as the statements in the scale suggest, readers need to apply specific reading strategies in order to comprehend a text. This agrees with Ghafournia and Afghari (2013), who found that students with different levels of reading proficiency use different strategies in comprehending reading texts and answering reading questions.

CONCLUSIONS
Based on the findings of the study, the following conclusions were drawn: 1) Following the steps in developing an assessment tool, a Reading Comprehension Scale intended for Filipino learners was designed and crafted: a 13-item scale based on four themes, namely applied, interpretive, affective, and lexical reading comprehension types. 2) The validation process made use of the content validity ratio, Cronbach's alpha, the Kaiser-Meyer-Olkin measure and Bartlett's test, and exploratory factor analysis. Based on the statistical values, the scale developed is valid and reliable.

RECOMMENDATIONS
Considering the aforementioned findings and conclusions, the following recommendations are offered: 1) reading comprehension scale developers should always follow a set standard and procedure to guarantee the quality of the tool; 2) reading comprehension scale developers should also follow a validation process to measure the reliability of the tool; 3) English teachers may use the crafted tool to assess which type of reading comprehension a student belongs to; 4) the Department of Education, Division of Pampanga, may use the tool to assess reading comprehension types among students and use the results as a basis for creating interventions or skill enhancement; 5) other researchers may consider this study as a reference for the same topic and may include other themes beyond those drawn from the responses of the participants in this study.