Using item analysis to evaluate the validity and reliability of an existing online information literacy skills assessment instrument

At the end of 2012 a short course, the Certificate of Information Literacy, was registered by the Cape Peninsula University of Technology Libraries and was first introduced in 2013. The course consisted of five modules and a summative assessment. The assessment, accessed via the Learning Management System, comprised 100 multiple-choice questions, from which each student received a random selection of fifty to answer in ninety minutes. This article reports on the effectiveness and validity of the online assessment for the Certificate of Information Literacy: whether the assessment instrument accurately measures the information literacy skills of the students and whether the test can be considered valid and reliable. The study has two parts; this paper focuses on the first, an item analysis of the assessment data of the students who completed the test in 2013. The results highlight weaknesses and problematic areas in the test. The second part of the study will compare the results students received in the test with the results of a subject-specific essay assignment, to measure how well students apply their information literacy skills.


Introduction
One of the key roles of an academic library is teaching users to find and make sense of information in an information-overloaded society (Kermani 2013: 52). Thus, information literacy education is one of the core functions of a library. The Cape Peninsula University of Technology (CPUT) Council approved an information literacy policy in 2009 which states that all academic programmes at the university "must include a component that offers the student adequate training in the level and use of information skills needed to function within the programme" (CPUT 2009). To support the various faculties in implementing the policy, CPUT Libraries developed a short course, the Certificate of Information Literacy. The course was duly registered with the Centre for Professional and Personal Development (CPPD) and was introduced in 2013.
One reason for registering a short course to support the university's information literacy policy was the need to formalise and standardise the teaching of information literacy skills across all faculties. There was also a desire to increase the number of 'quality' contact hours between librarians and students. More quality contact hours would reduce the number of 'one-shot' requests from lecturers for an instruction session for their students; such sessions generally do not allow enough time to teach the complete information literacy process. In addition to gaining information literacy skills, students who completed the short course would be awarded a certificate to add to their portfolio of evidence for when they leave the university. It was thus hoped that a registered information literacy short course would motivate more students to attend classes. For the certificate to be awarded, the course needed a standard, final summative assessment, as well as subject-specific exercises to be done during class. The assessment was developed by CPUT librarians: teams of two to three librarians were selected to devise twenty relevant questions for each module, giving 100 multiple-choice questions in all. The course was assessed online, with students accessing the test via the Learning Management System (LMS), Blackboard. Each student received a random selection of fifty questions to answer in ninety minutes.
The focus of this study was to determine the effectiveness and validity of the online assessment. Following Venter's approach (2008: 133), it attempted to determine whether the assessment instrument accurately measures information literacy skills and whether it can be considered valid and reliable for determining participants' knowledge levels. The assessment instrument was used for the first time in 2013 as a standard assessment for all students registering for the Certificate of Information Literacy short course. The first part of the study involved an item analysis of the assessment data of those who completed the test during 2013. As stated by Nunnally (1972: 186), an item analysis of test results can guide administrators on which questions (items) to revise or discard and which to continue using in future tests. The second part of the study (in progress) will compare the results students received in the test with the results of a subject-specific essay assignment, to measure how well they apply their information literacy skills. The essay assignment will be assessed with a rubric in which 30% of the weighting is allocated to information literacy skills.

Literature review
A report for the Association of College and Research Libraries (ACRL) (Oakleaf 2010: 98) indicated that most academic librarians do not participate in assessment activities. Yet the report stressed that it is important for librarians to be aware of "current philosophies and movements in higher education assessment". Oakleaf (2008: 233) indicated that the "culture of assessment" in higher education institutions means that academic librarians have to show that the information literacy instruction they provide has a positive impact on student learning. Consequently, libraries are increasingly required to measure student learning outcomes in a meaningful way (Sonntag 2008). For this, librarians must determine "what students know and are able to do" as a result of their intervention (Oakleaf 2008: 233). Continuously evaluating information literacy assessments is therefore crucial but, according to Mery and Newby (2010: 99), librarians should be given training to do so.
According to Oakleaf (2006: 209), various assessment tools are available, and the choice of assessment will depend on the purpose of the assessment together with the capabilities of the various assessment methods. She indicated that librarians should carefully consider the purpose of assessments, meticulously select assessment tools, and have a clear understanding of different assessment approaches. In a later study, Oakleaf (2009: 541) stated that "assessment for learning theory combines teaching, learning, and assessment activities in ways that produce both more knowledgeable students and more skilled teachers". It is therefore important to ensure that the correct assessment tool is chosen and that it is linked to the purpose of the assessment. The ACRL report (Oakleaf 2010) stressed the importance of making academics aware of the assessment of their students that takes place in the library, which they might like to consider counting towards formal courses.

Multiple-choice questions
CPUT Libraries currently uses multiple-choice questions for the summative assessment of the short course, Certificate of Information Literacy. The test is administered via the LMS, Blackboard. Advantages of multiple-choice questions are that they are easier to score and are very reliable (reliability increases with the length of the test) (Oakleaf 2006: 204). They are also less demanding on time, as scoring is done by the LMS, which can collect large amounts of data quickly and provide immediate feedback to students (Oakleaf 2006: 204). Limitations of multiple-choice questions are that they often "do not test higher-level thinking skills" and that they lack authenticity: according to Oakleaf, they "over assess student knowledge and under assess student know-how with knowledge" (Oakleaf 2006: 205). There is also a danger of presenting "over-simplified test items" (Oakleaf 2006: 205).
In a review of the literature on information literacy assessment, Walsh (2009: 22) found that multiple-choice questionnaires were by far the most popular method used in tests, but noted that a good-quality multiple-choice test is not simple to produce. The instruments used by Dunaway and Orblych (2010: 27) in an information literacy skills study included a multiple-choice pre-assessment, given before the training intervention so that the information literacy skills of students could be determined in advance. According to Dunaway and Orblych (2010: 26), a pre-assessment benefits the learner by "reinforcing the material that is the subject of the assessment". They suggested that an information literacy curriculum and teaching plans could be adapted on the basis of pre-assessment results. In a similar study, Fain (2011: 110) used a multiple-choice pre-test and post-test. Fain emphasised the importance of these tests relating to the main focus of the instruction session, and found that they played an important role in measuring what students had learned by the end of the course. Comparing multiple-choice pre-test and post-test performance showed how proficient students had become over the course of the semester (Fain 2011: 111).

Item analysis
Item analysis was performed to evaluate the multiple-choice assessment currently used for the Certificate of Information Literacy short course run by CPUT Libraries. According to Nunnally (1972: 186), item analysis indicates which test questions (items) to revise or discard and which to continue using in future tests. In a similar study, Venter (2008: 133) analysed test items to decide which could be used again; only items meeting the set item analysis criteria were retained in the test going forward. Ondrusek (2005: 392) pointed out that "problematic items" could be identified by counting the frequency of incorrect responses. Mery and Newby (2010: 98) assessed the reliability and validity of a multiple-choice information literacy test, performing item analysis to determine the validity of individual items and how they affected the test as a whole.

Methodology
Taking a positivist approach, a quantitative method was used in the research reported on in this paper. Assessment data from those who registered for the certificate course and completed the online assessment during 2013 were downloaded from the LMS, collated into one file, and analysed. Data included student information (name, surname and student number), the questions each student answered, the answers they selected, and their final marks. Item analysis was performed on these assessment data. Final selection or rejection of test items was based on the item difficulty index (IDI), discrimination index (DI), item-to-total correlation (ITT) and the distribution of answers to alternatives.
During 2013, 2,756 students registered for the Certificate of Information Literacy course, and 1,977 students completed the online assessment (71% of registered students). CPUT is made up of eleven campuses, namely Bellville, Cape Town, Granger Bay, Groote Schuur, Mowbray, Media City, Athlone, Tygerberg, Wellington, Worcester and George, each with a library on campus. At the Wellington campus the language of instruction is mostly Afrikaans for students in the Education Faculty; for those taking the small number of Business and Management Sciences and Applied Sciences courses on this campus, the language of instruction is English. Many students from the Wellington campus therefore completed the test in Afrikaans. Test data in Afrikaans were excluded from this study because the questions could be understood differently in another language, which could influence the item analysis results. The test data of 1,317 students were therefore used for this study. Nunnally (1972: 194) cautioned against using item analysis with a small number of students, indicating that there should be more than forty. The number of students in this study comfortably exceeded that minimum.

Item difficulty index (IDI)
The item difficulty index indicates the percentage of students who answered the question correctly (Nunnally 1972: 186). Nunnally noted that "difficulty" is something of a misnomer, because the higher the percentage, the easier (rather than the more difficult) the item is; the index is therefore sometimes referred to as the easiness percentage. As explained by Mery and Newby (2010: 105), "the higher the index value, the lower the difficulty of an item; the lower the index, the greater the difficulty". The IDI is calculated by dividing the number of students who answered the item correctly (S) by the total number of students (P) (Nunnally 1972: 187), expressed as a percentage:
$$\mathrm{IDI} = \frac{S}{P} \times 100\%$$
Nunnally (1972) indicated that, for items with up to three answers to choose from, the items to keep will have an IDI of 20% to 80%. For items with four or more alternative answers, the range is 35% to 85%, and for true/false questions it is 60% to 95%.
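The IDI calculation and Nunnally's acceptance ranges can be sketched in Python as follows. This is an illustrative sketch, not the procedure used in the study: the function names are hypothetical, and it assumes that P counts only the students who were actually given the item (each student saw a random fifty of the 100 questions) and that a non-response is recorded as incorrect.
```python
def item_difficulty_index(responses: list[bool]) -> float:
    """IDI = S / P x 100%: percentage of students who answered the item correctly.

    responses holds one entry per student who received the item
    (True = correct; a non-response is assumed to be recorded as False).
    """
    if not responses:
        raise ValueError("no responses recorded for this item")
    s = sum(responses)   # S: number of correct answers
    p = len(responses)   # P: number of students given the item
    return 100.0 * s / p


def idi_range(n_alternatives: int, true_false: bool = False) -> tuple[float, float]:
    """Acceptable IDI range in %, following the Nunnally (1972) criteria above."""
    if true_false:
        return (60.0, 95.0)
    if n_alternatives <= 3:
        return (20.0, 80.0)
    return (35.0, 85.0)  # four or more alternatives


def keep_by_idi(idi: float, n_alternatives: int, true_false: bool = False) -> bool:
    """True if the IDI falls inside the acceptable range for this item type."""
    low, high = idi_range(n_alternatives, true_false)
    return low <= idi <= high


# Two of four students correct gives an IDI of 50.0 (%).
print(item_difficulty_index([True, False, False, True]))
# A four-option item with an IDI of 29.20% falls below the 35% to 85% range.
print(keep_by_idi(29.20, n_alternatives=4))
```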

Discrimination index (DI)
It is important to "determine the extent to which each item goes along with, or measures the same thing as, the total test in which it is included" (Nunnally 1972: 191). One way to determine the relevance of items is to identify the top 25% of students (T) and the bottom 25% of students (B) according to their total test scores. Then determine the percentage of students in each group who answered a specific question correctly: $T_p$ for the top group and $B_p$ for the bottom group. The DI is obtained by subtracting the percentage for the bottom group from the percentage for the top group:
$$\mathrm{DI} = T_p - B_p$$
The DI indicates the extent to which each question contributes to the test as a whole. Nunnally (1972: 192) indicated that $T_p$ should exceed $B_p$ by at least 20 percentage points; a very small difference shows that the question does not add much value to the test as a whole. Reasons given by Nunnally (1972: 192) for a question failing include ambiguity of meaning, the question being too easy or too difficult, or an incorrect answer being erroneously labelled in the system as correct. In most cases the DI will be positive, but in exceptional cases it will be negative: more students in the bottom 25% than in the top 25% have answered the question correctly. According to Nunnally (1972), this may be because the teachers did not select the correct answer as the key or because the question was ambiguous.
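A minimal sketch of the DI calculation follows, under stated assumptions: students are ranked by total score, the top and bottom 25% are taken by head count (ties at the cut-off are broken arbitrarily by sort order, which the source does not specify), and the function names are hypothetical.
```python
def discrimination_index(item_correct: list[bool], totals: list[float],
                         fraction: float = 0.25) -> float:
    """DI = T_p - B_p, in percentage points.

    item_correct[i]: whether student i answered this item correctly
    totals[i]: student i's total test score
    fraction: share of students in each extreme group (25% by default)
    """
    n = len(totals)
    if n != len(item_correct) or n == 0:
        raise ValueError("need one item result and one total score per student")
    k = max(1, int(n * fraction))                       # students per extreme group
    ranked = sorted(range(n), key=lambda i: totals[i])  # lowest total first
    bottom, top = ranked[:k], ranked[-k:]
    t_p = 100.0 * sum(item_correct[i] for i in top) / k     # % correct, top group
    b_p = 100.0 * sum(item_correct[i] for i in bottom) / k  # % correct, bottom group
    return t_p - b_p


def keep_by_di(di: float) -> bool:
    """Nunnally's criterion: T_p must exceed B_p by at least 20 points."""
    return di >= 20.0


totals = [10, 20, 30, 40, 50, 60, 70, 80]
# Strong students get the item right and weak students get it wrong.
print(discrimination_index([False, False, True, False, True, True, True, True], totals))
# Reversed pattern, as with a wrongly keyed answer: the DI is negative.
print(discrimination_index([True, True, False, False, False, False, False, False], totals))
```
With realistic class sizes (the study used 1,317 students) each extreme group is large, so the arbitrary tie-breaking at the cut-off has little effect on the result.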

Item-to-total correlation (ITT)
In addition to the IDI and DI, the item-to-total correlation (ITT) was measured. In a test, each student has a score for each question (right or wrong) as well as a total score for the completed test (Nunnally 1972: 193); the ITT is the correlation between these two sets of scores across students, and so shows how closely performance on an item tracks performance on the test as a whole. Venter (2006: 14) states that only questions with a correlation of 0.20 or higher should be selected for future use.
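The ITT can be sketched as Pearson's correlation between the 0/1 item scores and the total scores (for a right/wrong item this is the point-biserial correlation). The function names are hypothetical, and the sketch assumes the uncorrected form, in which the item's own score remains part of the total; the source does not say whether the item was excluded from the total.
```python
import math


def item_total_correlation(item_scores: list[float], totals: list[float]) -> float:
    """Pearson correlation between item scores (1 = right, 0 = wrong) and total scores."""
    n = len(item_scores)
    if n != len(totals) or n < 2:
        raise ValueError("need two equal-length lists covering at least two students")
    mean_x = sum(item_scores) / n
    mean_y = sum(totals) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(item_scores, totals))
    sxx = sum((x - mean_x) ** 2 for x in item_scores)
    syy = sum((y - mean_y) ** 2 for y in totals)
    if sxx == 0 or syy == 0:
        # Everyone got the item right (or wrong), or all totals are equal.
        raise ValueError("correlation is undefined when either score list is constant")
    return sxy / math.sqrt(sxx * syy)


def keep_by_itt(r: float) -> bool:
    """Venter's criterion: keep items with a correlation of 0.20 or higher."""
    return r >= 0.20


r = item_total_correlation([0, 0, 1, 1], [10, 20, 30, 40])
print(round(r, 3), keep_by_itt(r))
```
From Python 3.10, `statistics.correlation` computes the same Pearson coefficient; the explicit version is kept here so that each step of the calculation is visible.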

Distribution of answers to alternatives
It is important to look at the answers students selected for each question and compare them with the alternatives available. According to Venter (2006: 14), the distribution of answers to alternatives is the percentage of students who selected each of the alternative answers (e.g. a, b, c, d or e). These percentages are used "to determine the distracting ability of the alternatives". According to Nunnally (1972: 190), the criterion for retaining an alternative answer is that at least 5% of students should have selected it in a test situation.
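The distribution of answers to alternatives, and Nunnally's 5% retention criterion, could be computed as below. This is a sketch with hypothetical function names and made-up example choices; it assumes each student's choice is stored as a letter, with None for a non-response, and it flags every option below the threshold, including the correct answer, since a key chosen by so few students also signals a problem.
```python
from collections import Counter
from typing import Optional


def answer_distribution(choices: list[Optional[str]],
                        alternatives: str = "abcde") -> dict[str, float]:
    """Percentage of students who selected each alternative (None = no response)."""
    n = len(choices)
    if n == 0:
        raise ValueError("no responses recorded for this item")
    counts = Counter(choices)
    dist = {alt: 100.0 * counts[alt] / n for alt in alternatives}
    dist["no response"] = 100.0 * counts[None] / n
    return dist


def rarely_chosen(dist: dict[str, float], alternatives: str = "abcde",
                  threshold: float = 5.0) -> list[str]:
    """Alternatives selected by fewer than `threshold` % of students."""
    return [alt for alt in alternatives if dist[alt] < threshold]


choices = ["a"] * 50 + ["b"] * 30 + ["c"] * 17 + ["d"] * 3
dist = answer_distribution(choices)
print(dist)
print(rarely_chosen(dist))  # ['d', 'e']: candidates for replacement or removal
```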

Results and discussion
The results of the item analysis for all 100 questions are listed in Appendix A. The areas highlighted in grey are the problem areas identified by applying the various selection or rejection criteria. Sixty-nine questions met the IDI, DI and ITT criteria, although some problems were highlighted in the distribution of answers to alternatives.

Item difficulty index (IDI)
Out of the 100 questions, twenty-three had up to three possible answers, seventy-four had four to five possible answers and three had true/false alternatives. Table 1 shows the results of applying the criteria identified for selection or rejection. In total, twenty-five of the 100 questions were rejected on the basis of the IDI. A percentage that was too low meant that very few students answered the question correctly, possibly because the question was too difficult, the wording of the question was unclear, or an incorrect answer was erroneously labelled as the correct one. A percentage that was too high meant that too many students answered the question correctly, either because the question was too easy or because the alternative answers provided were not plausible. The researcher subsequently revisited the problematic questions to decide whether each could be improved or should be discarded.

Discrimination index (DI)
The DI was calculated for all 100 questions, and Table 2 shows the results. Eleven questions were rejected on the basis of the DI. The researcher revisited these eleven questions to determine how to change and improve the questions and/or their answer choices.

Item-to-total correlation (ITT)
Pearson's correlation coefficient was used to determine the item-to-total correlation for all 100 questions, and nine questions had a correlation lower than 0.20. Only questions with a correlation of 0.20 or higher were retained; those below this threshold were revised or rejected.

Distribution of answers to alternatives
Out of the 100 questions, twenty-one had three possible answers to choose from, seventy-six had four to five, and three had true/false alternatives. For fifty-one questions, fewer than 5% of students selected one or more of the alternative answers, which clearly showed which answer choices needed to be replaced or removed. For each question, a small percentage of students did not respond.
After considering the IDI, DI, ITT and distribution of answers to alternatives for each of the 100 questions, it was clear that the biggest weakness of the test lay in the distribution of answers to alternatives, which in turn affected the IDI. Examples of problematic questions in the test follow.
The first example asked how an in-text citation should be done when the names of two authors are mentioned in a source, e.g. "According to Petzel and Riddle …", and offered five alternatives (a to e). The results of the analysis were: IDI = 3.03%; DI = 4.1%; ITT = 0.070. The correct answer is 'd', chosen by only 3.04% of students. The question was rejected under the IDI, DI and ITT criteria, and the correct answer 'd' was also highlighted as a concern in the distribution of answers to alternatives. An IDI of 3.03% is far below the acceptable range, which normally means that a question is too difficult. In this case, however, it was decided that the question had more likely been misunderstood: students seemed not to realise that they had to choose the correct in-text citation for a sentence beginning "According to Petzel and Riddle …". The question was reworded accordingly. The researcher also removed 'e' as an alternative answer even though 9.88% of students chose it, exceeding the 5% criterion, because four options were felt to be sufficient.
The second example, Question 75, asked about the anti-plagiarism software used by CPUT lecturers to detect plagiarism. The results of the analysis were: IDI = 29.20%; DI = 11.3%; ITT = 0.140. The question had four answers to choose from and was rejected because its IDI was below the acceptable range of 35% to 85%. When the original assessment was designed, Safe-Assign was the only anti-plagiarism software at CPUT; a licence for Turnitin has since been purchased, so two of the alternative answers are now correct. Question 75 was therefore changed to ask which anti-plagiarism software programs CPUT lecturers use, with 'Turnitin & Safe-Assign' as an answer option.
The third example asked about the best place to begin a search for information. The results of the analysis were: IDI = 19.61%; DI = 9.6%; ITT = 0.140. The correct answer is 'a'. The question was rejected under the IDI, DI and ITT criteria. Most students selected 'c', followed by 'b'. However, the best place to begin a search depends on the type of information required: in some instances the 'reference section' will be a good place to start, while in others going to the 'counter' and asking a library staff member would be best. In fact, all four options could be correct, depending on the information needed. Because the question is ambiguous and can be interpreted in various ways, it was rejected and discarded from the test.
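Taken together, the IDI, DI and ITT criteria act as a simple screen. The sketch below (hypothetical names; thresholds as stated in the methodology) applies the three rules to the statistics reported for the three example questions discussed above, assuming five alternatives for the citation question and four for the other two.
```python
from dataclasses import dataclass


@dataclass
class ItemStats:
    label: str
    idi: float            # % of students answering correctly
    di: float             # T_p - B_p, in percentage points
    itt: float            # item-to-total correlation
    n_alternatives: int
    true_false: bool = False


def failed_criteria(item: ItemStats) -> list[str]:
    """Return the criteria (IDI, DI, ITT) that the item fails."""
    if item.true_false:
        low, high = 60.0, 95.0
    elif item.n_alternatives <= 3:
        low, high = 20.0, 80.0
    else:
        low, high = 35.0, 85.0
    failures = []
    if not low <= item.idi <= high:
        failures.append("IDI")
    if item.di < 20.0:
        failures.append("DI")
    if item.itt < 0.20:
        failures.append("ITT")
    return failures


examples = [
    ItemStats("in-text citation question", idi=3.03, di=4.1, itt=0.070, n_alternatives=5),
    ItemStats("Question 75", idi=29.20, di=11.3, itt=0.140, n_alternatives=4),
    ItemStats("where to begin a search", idi=19.61, di=9.6, itt=0.140, n_alternatives=4),
]
for item in examples:
    print(item.label, failed_criteria(item))  # each fails IDI, DI and ITT
```
The distribution of answers to alternatives is left out of this screen because it identifies which answer choices to replace or remove rather than whether to reject the item as a whole.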

Conclusion
Conducting assessment after information literacy teaching interventions is essential, but it is equally essential that the assessment instrument be valid and reliable. This study was therefore an attempt to ensure that this is the case for the short course Certificate of Information Literacy at CPUT. Performing item analysis on the assessment data provided valuable information about each question and pointed out weaknesses and problematic areas in the test. Each question was re-evaluated and adjusted, or in some cases discarded, and new questions were designed to replace those that were discarded. The new and improved test instrument has been completed and will be used for all students who enrol for the short course during 2015. Because the validity and reliability of the test must be assessed continually, it is planned that the test data of students who complete the revised test in 2015 will be used for item analysis during 2016.
It is recognised that multiple-choice assessment instruments often do not test higher-level skills or how students apply in practice what they have learned. A further part of this study is therefore planned, to compare the marks students received in this multiple-choice test with the results of an actual subject-specific essay assignment. It is hoped that the two marks will correlate, that is, that a student who performs well in the multiple-choice test will also perform well in the subject-specific essay assignment. If not, the librarians might have to consider improving the multiple-choice test further and finding better ways to test information literacy skills.

Results per item analysis and per question
When the names of two authors are mentioned in a source, e.g. "According to Petzel and Riddle …", how should your in-text be done?
a. (Petzel and Riddle, 2006: 102)
b. (Petzel & Riddle, 2006: 102)
c. Petzel and Riddle (2006: 102)
d. (2006: 102)
e. None of these
The revised question was thus:
When the names of two authors are mentioned in a source and the sentence in your essay starts with "According to Petzel and Riddle …", how should your in-text be done?
a. (Petzel and Riddle, 2006: 102)
b. (Petzel & Riddle, 2006: 102)
c. Petzel and Riddle (2006: 102)
d. (2006: 102)
Question 75 (original):
What is the anti-plagiarism software used by CPUT lecturers to detect plagiarism?
a. Turnitin
b. Blackboard
c. Safe-Assign
d. None of the above
Question 75 (revised):
What are the anti-plagiarism software programs used by CPUT lecturers to detect plagiarism?
a. Turnitin & Safe-Assign
b