Evaluation of ASAP Project Automatic Java Marking Service via the Blackboard User Agent by De Montfort University
1. Introduction and background
2. Description of research work undertaken and methods used
3. Conducting the research and the results
4. Summary and conclusions
1. Introduction and background
The Automatic Java Marking Service (AJM) was developed by Kingston University as part of the ASAP project. The AJM provides, via Blackboard, an automatic marking and recording service for the assessment of Java programming code. Students are expected to write a complete Java class definition for one or more classes, based upon a class description which specifies required attributes, methods and functionality. The student code is submitted to the AJM via Blackboard. Using a test program, the AJM compares the output from the student's code with that of a pre-written solution. The results are recorded in the Blackboard grade book and the student is immediately informed of their result. A student may repeat a test; the recorded score is replaced only if the new score is higher.
The role of De Montfort University as a partner institution was to:
Produce original Java workshop code, test classes and documentation (or convert existing material) to a form that could be used with the AJM via the Blackboard User Agent
Carry out user trials of the above with DMU undergraduate students and evaluate the effectiveness of the AJM and the feasibility of using this service in the context of undergraduate teaching of Java at De Montfort University
2. Description of research work undertaken and methods used
Kingston University provided access to materials used for a closed book test that they had recently carried out with their students. This material was studied and a draft set of evaluation criteria was developed for discussion.
The evaluation criteria were discussed and a final copy agreed at a meeting with the Kingston University project team in January 2005. Emphasis was to be placed upon pedagogic concerns rather than the technical aspects of the software.
A fully implemented BankAccount class and its associated test program were provided by the Kingston University project team as an example. This was used to produce two new classes and associated test programs.
The first was based upon a class to model a simple StoreCard. Writing the StoreCard class is a simple exercise which is appropriate for a first year undergraduate Java programming module.
A student would be able to score a maximum of eight marks as follows:
class compiles 1 mark
getPoints() method produces correct results 1 mark
addPoints() method produces correct results 1 mark
numberOfVouchersAvailable() method produces correct results 1 mark
takeVouchers() method produces correct results 2 marks
toString() method produces correct results 2 marks
The StoreCard test program detects errors and provides some limited feedback to the students. For example, it will detect and report missing methods.
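The missing-method check mentioned above could be implemented with Java reflection. The following is a minimal sketch, not the actual StoreCard test program: the class names IncompleteStoreCard and MissingMethodCheck are hypothetical, and methods are matched by name only because the report does not give their parameter types.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, deliberately incomplete student submission used for the demo.
class IncompleteStoreCard {
    private int points;

    public int getPoints() { return points; }

    public void addPoints(int extra) { points += extra; }

    @Override
    public String toString() { return "StoreCard[points=" + points + "]"; }
}

class MissingMethodCheck {
    // Method names taken from the StoreCard mark scheme in the report.
    static final String[] REQUIRED = {
        "getPoints", "addPoints", "numberOfVouchersAvailable", "takeVouchers", "toString"
    };

    // Returns the required method names that the submitted class does not declare.
    // Only names are compared (assumption: parameter types are unknown here).
    static List<String> missingMethods(Class<?> submitted) {
        Set<String> declared = new HashSet<>();
        for (Method m : submitted.getDeclaredMethods()) {
            declared.add(m.getName());
        }
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!declared.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        for (String name : missingMethods(IncompleteStoreCard.class)) {
            System.out.println("Missing method: " + name + "()");
        }
    }
}
```

Reporting every missing method in one pass, rather than stopping at the first, gives the student a complete list to work from after a single submission.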
The second test was based upon a class to model a Stack data structure using a fixed-size array.
A student would be able to score a maximum of nine marks as follows:
class compiles 1 mark
size() method produces correct results 1 mark
pop() method throws correct exception when stack is empty 1 mark
pop() method produces correct results when stack is not empty 2 marks
push() method throws correct exception when stack is full 1 mark
push() method produces correct results when stack is not full 1 mark
peek() method throws correct exception when stack is empty 1 mark
peek() method produces correct results when stack is not empty 1 mark
The ArrayStack test program detects errors and provides more feedback than the StoreCard test program. For example, it will detect and report errors caused by missing methods, array index out of bounds exceptions and incorrect code in methods.
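For orientation, a class of the kind being tested might look like the sketch below. It is an illustration under assumptions, not the model solution: the name ArrayStackSketch, the int element type and the choice of IllegalStateException for both the full and the empty condition are assumed (the report names IllegalStateException only in the students' feedback).

```java
// Illustrative fixed-size array stack. Name, element type and exception
// type are assumptions made for this sketch.
class ArrayStackSketch {
    private final int[] items;
    private int count; // number of elements currently stored

    ArrayStackSketch(int capacity) {
        items = new int[capacity];
    }

    public int size() {
        return count;
    }

    public void push(int value) {
        if (count == items.length) {
            throw new IllegalStateException("stack is full");
        }
        items[count++] = value;
    }

    public int pop() {
        if (count == 0) {
            throw new IllegalStateException("stack is empty");
        }
        return items[--count];
    }

    public int peek() {
        if (count == 0) {
            throw new IllegalStateException("stack is empty");
        }
        return items[count - 1];
    }

    public static void main(String[] args) {
        ArrayStackSketch stack = new ArrayStackSketch(2);
        stack.push(10);
        stack.push(20);
        System.out.println("peek: " + stack.peek());
        System.out.println("pop: " + stack.pop());
        System.out.println("size: " + stack.size());
    }
}
```

Checking the stack state before touching the array is what keeps an ArrayIndexOutOfBoundsException from escaping to the caller.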
A first trial of the ArrayStack test was carried out with a small group of students. They suggested that it would be useful to know where they had scored or lost marks. This additional feedback was incorporated into the final version of the ArrayStack test program.
Classes are named JStoreCard and JArrayStack respectively.
The focus group chosen to test the AJM was a small group of second year undergraduates currently studying for BSc (Hons) Internet Computing. They had already completed an introductory Java programming module in their first year and were currently studying a further Java programming module. Originally, it was intended to use first year students as the trial group but there were difficulties in integrating the trials into their timetables. The focus group were asked for their views on how useful they would have found the AJM tool in the first year of their degree as well as currently.
As the trial had to be integrated into an existing delivery and assessment pattern, it was conducted as a formative, not summative, assessment. The students were asked for their views on how effective they felt the tool would be for summative as well as formative or self-assessments.
The trial had to be completed within the timetabled one hour computer laboratory and there was concern that many students would not be able to complete their classes, submit them to the AJM and receive feedback within the time available. A series of incremental skeleton solutions were written and given to the students so that they could all experience how the AJM worked with varying levels of completion, ranging from a class with all important methods missing to a completed, fully working class. The skeleton programs had the following errors:
Version 0 Constructor and toString() method only provided.
Version 1 All methods except peek() provided. No exceptions thrown.
Version 2 All methods provided. No exceptions thrown.
Version 3 All methods provided. Exceptions thrown. pop() method has faulty code.
Version 4 All methods provided. Exceptions thrown. push() method has faulty code.
Version 5 Copy of model solution.
Students were not told what errors the skeleton programs contained, so that they could assess how useful the AJM feedback was in helping them to find those errors. They were also asked to introduce at least one compile-time error into their code to test the response of the AJM to code that would not compile.
The AJM was also evaluated by some members of academic staff at De Montfort University. This evaluation was carried out through trial and discussion. All staff involved had first hand experience of teaching Java programming to first and second year students. Some had experience of using Blackboard.
3. Conducting the research and the results
(a) Evaluation by the student Focus Group
The trial had to be conducted in the students' timetabled computer laboratory time, which is one hour each week. A trial was carried out on 23 February 2005. As bad weather prevented most students from attending, another trial was held during the following week. The weather was worse, but seven students attended and completed the trial.
The trial programs had been installed on the Kingston University Blackboard server. Each student in the focus group was given a guest login name and password for the duration of the test.
Students worked individually. They were asked to submit each version of the skeleton programs from Version 0 to Version 5 in turn. After each submission they were asked to identify:
whether their recorded score was correct
whether their recorded score improved after an improved submission
what was wrong with the program that had been submitted
After the trial the students were asked for:
their general perception of the AJM
whether they would choose to use AJM if it was available for self-assessment
how they would view the use of AJM if it was used for summative assessment
specifically, the quality of the feedback
The students, being computing students, were initially focused on the speed of response from the server, the lack of graphical interface and how the testing code had been written. However, they also discussed the pedagogic issues seriously and offered some intelligent, original and useful feedback. This will be discussed in relation to the evaluation criteria.
Evaluation Criteria 1 Ease of Use
In general, students found accessing the AJM very easy. They followed simple login instructions and there were no difficulties. These students had used Blackboard before. A student with no such experience would probably need more support.
The response from the server was slow. The students were frustrated at the length of time they had to wait for feedback. This may have been exacerbated by the nature of the experiment, in which students were expected to submit six classes in about 45 minutes.
The AJM seemed unable to handle some compiler errors e.g. missing braces.
The students were pleased that they could view their grades and would be able to see the grades resulting from all tests taken. The grade entries were time stamped with the login time rather than the submission time.
The test programs rely on the students submitting files with specific names, which are hard-coded into the test program. The students found this confusing when working with different versions of a class.
Although their scores appeared to be correctly updated in their Blackboard grade book, the AJM output incorrectly informed them that their old score was better than their new score.
A BACK button should be provided to return to test submission.
The interface was considered inadequate. The students expected a GUI and suggested that error messages could be displayed in red. This, of course, is viewed in light of the points raised below concerning the use of AJM as a self-assessment tool.
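The repeat-attempt rule described in this report (the grade book keeps the higher score) and the contradictory message that the students observed both depend on comparing the old and new scores consistently. A minimal sketch follows, in which the class name BestScoreRecord, the message wording and the use of -1 for "no attempt yet" are all assumptions rather than AJM behaviour.

```java
// Hypothetical record of one student's best score for one test.
class BestScoreRecord {
    private int best = -1; // -1 means no attempt recorded yet (assumption)

    // Records an attempt and returns the message to show the student.
    // The message is built before the stored score changes, so the text
    // and the grade book cannot disagree.
    String submit(int newScore) {
        if (newScore > best) {
            String message = (best < 0)
                    ? "First attempt recorded: " + newScore
                    : "Score improved from " + best + " to " + newScore;
            best = newScore;
            return message;
        }
        return "Previous score of " + best + " kept (new score: " + newScore + ")";
    }

    int best() {
        return best;
    }

    public static void main(String[] args) {
        BestScoreRecord record = new BestScoreRecord();
        System.out.println(record.submit(5));
        System.out.println(record.submit(3));
        System.out.println(record.submit(7));
    }
}
```

Deriving the message and the stored value from the same comparison is one way to avoid the mismatch between feedback and grade book.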
Evaluation Criteria 2(a) Educational Objectives
The most interesting feedback related to the use of the AJM as a formative or summative assessment tool. All students said that they thought it was well worth using as a self-assessment tool to help them progress. They could see "little point" in using it as a summative tool which perhaps would provide little feedback. They did agree, however, that if it enabled lecturers to set fortnightly summative assessments, then attendance would probably improve greatly. The focus group (all good attendees) would not like such regular summative assessment and said that they thought that lecturers did not understand how stressful this kind of relentless assessment could be for a conscientious student.
All students said that they would use the AJM as a means of self-assessment after each week’s lab exercises. They thought it would be useful and motivating to be able to check their progress in this way. It would also provide an alternative way of getting feedback and help when lab tutors were busy. One student said that in the first year he had often felt “too silly” to ask his tutor for help with a simple problem and would have appreciated an alternative source of help.
Evaluation Criteria 3 Assessment Design
All students felt that the ArrayStack test provided a reasonable level of information and support regarding errors in the ArrayStack class they had submitted, but also that this could be improved. They found the feedback relating to exception handling confusing, as it was unclear why they gained a mark if an IllegalStateException was thrown but did not gain a mark if an ArrayIndexOutOfBoundsException was thrown.
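One way to make this distinction clearer to students is for the test to state why a mark was or was not awarded. The sketch below is an assumption about how such a check could be written, not the ArrayStack test program itself; ExceptionMarking, Result, guardedPop and unguardedPop are hypothetical names.

```java
class ExceptionMarking {
    // Outcome of one exception check: the mark awarded and the reason given.
    static final class Result {
        final int mark;
        final String feedback;

        Result(int mark, String feedback) {
            this.mark = mark;
            this.feedback = feedback;
        }
    }

    // Awards 1 mark only when the action throws IllegalStateException.
    static Result checkThrowsIllegalState(String what, Runnable action) {
        try {
            action.run();
            return new Result(0, what + ": no exception was thrown; expected IllegalStateException.");
        } catch (IllegalStateException expected) {
            return new Result(1, what + ": IllegalStateException thrown as required.");
        } catch (ArrayIndexOutOfBoundsException leaked) {
            return new Result(0, what + ": ArrayIndexOutOfBoundsException escaped from the array; "
                    + "check the stack state first and throw IllegalStateException instead.");
        } catch (RuntimeException other) {
            return new Result(0, what + ": unexpected " + other.getClass().getSimpleName() + ".");
        }
    }

    // Hypothetical pop() body that checks the stack state before using the array.
    static int guardedPop(int[] items, int count) {
        if (count == 0) {
            throw new IllegalStateException("stack is empty");
        }
        return items[count - 1];
    }

    // Hypothetical pop() body with no check: index -1 when the stack is empty.
    static int unguardedPop(int[] items, int count) {
        return items[count - 1];
    }

    public static void main(String[] args) {
        int[] items = new int[3];
        System.out.println(checkThrowsIllegalState("pop() on empty stack",
                () -> guardedPop(items, 0)).feedback);
        System.out.println(checkThrowsIllegalState("pop() on empty stack",
                () -> unguardedPop(items, 0)).feedback);
    }
}
```

The second case explains the marking rule: an ArrayIndexOutOfBoundsException shows that the class let an invalid index reach the array instead of detecting the empty stack itself.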
No information was provided for classes that did not compile other than the message “Your class did not compile”. The students would have liked, at least, to see the error messages produced by the compiler itself.
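The standard javax.tools API allows a program to compile source code and collect the compiler's own diagnostics, which is one way the requested messages could be shown. The sketch below is illustrative only and says nothing about how the AJM compiles submissions; CompileFeedback and StringSource are hypothetical names, and a JDK (not just a runtime) is assumed to be available.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class CompileFeedback {
    // Holds submitted source code in memory instead of in a file.
    static class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Compiles the source and returns one line of text per compiler error.
    static List<String> compileErrors(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler available (a JDK is required)");
        }
        Path outDir;
        try {
            // Class files go to a temporary directory; cleanup is omitted in this sketch.
            outDir = Files.createTempDirectory("ajm-sketch");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        List<String> options = Arrays.asList("-d", outDir.toString());
        compiler.getTask(null, null, diagnostics, options, null,
                Collections.singletonList(new StringSource(className, code))).call();

        List<String> errors = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                errors.add("line " + d.getLineNumber() + ": " + d.getMessage(Locale.ENGLISH));
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        String broken = "class Broken {\n  void m() {\n    int x = 1\n  }\n";
        for (String error : compileErrors("Broken", broken)) {
            System.out.println(error);
        }
    }
}
```

Because the diagnostics include line numbers, a student could be pointed to the location of a missing brace or semicolon rather than being told only that compilation failed.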
As a means of summative assessment, the students thought that it could only be used in closed book conditions. Due to the nature of the assessment, all submissions would be very similar and there would be opportunities for plagiarism.
(b) Evaluation by academic staff at De Montfort University
Evaluation Criteria 1 Ease of Use
(a) Using the system
Some of the points raised by academic staff were similar to those identified by the students. Navigation through the system would be improved by an OK or BACK button allowing return from viewing the grade book.
When a score had been improved, a contradictory message stating otherwise was sometimes displayed.
The grade book recorded an attempt at the ArrayStack exercise as an attempt at the StoreCard exercise.
Some errors appeared to cause an infinite loop which could present a serious problem in a time-constrained test.
At first, it appeared that the grade book was not being updated after an improved attempt. In fact, this was done but there was a time delay between submission of the test and the update of the grade book.
Staff liked the immediate feedback provided by the system.
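The infinite-loop problem noted above is commonly handled by running each test under a time limit. The following is a sketch of that general technique using java.util.concurrent, not a description of the AJM; the class name TimedTestRun and the policy of awarding zero marks on timeout are assumptions.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimedTestRun {
    // Runs one test and returns its marks, or 0 if it fails or exceeds the limit.
    static int runWithTimeout(Callable<Integer> test, long limitMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread worker = new Thread(runnable);
            worker.setDaemon(true); // a runaway test must not keep the JVM alive
            return worker;
        });
        Future<Integer> result = executor.submit(test);
        try {
            return result.get(limitMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; a loop that never checks for interruption
            // keeps running in the background until the JVM exits.
            result.cancel(true);
            return 0;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return 0;
        } catch (ExecutionException e) {
            return 0; // the test itself threw an exception
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        int quick = runWithTimeout(() -> 2, 1000);
        int stuck = runWithTimeout(() -> {
            while (true) {
                Thread.sleep(10); // simulated endless loop that responds to interruption
            }
        }, 200);
        System.out.println("quick test marks: " + quick);
        System.out.println("stuck test marks: " + stuck);
    }
}
```

A thread-based limit cannot forcibly stop code that ignores interruption, so a production service would more likely run each submission in a separate process that can be killed.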
(b) Preparing the content
Producing the tests proved to be quite time-consuming. If the testing were an integral part of the module assessment, rather than in addition to existing assessment, then this could still be considered an efficient use of lecturer time as the preparation time would be compensated for by the automatic marking. Some staff commented that to invest such an amount of time to produce a bank of tests for a particular module would require a guarantee that the module would run for more than one year.
The most effective investment would be to develop a set of test classes that are based on core knowledge requirements for first year modules. Classes such as StoreCard would fall into this category. However, the time overhead to produce more complex test classes such as the ArrayStack may be prohibitive. The ArrayStack test itself had serious limitations. Semantics were not detected correctly and some of the feedback was nonsensical or not correct. When developing the test classes, providing correct and useful feedback proved to be more time-consuming than assessment and scoring.
Evaluation Criteria 2 Educational Objectives
(a) Features that may improve learning e.g. feedback, help with errors, opportunities for self-assessment and measurement of progress
The AJM could have potential to provide more enjoyable self-assessment opportunities for students. This could be a motivating factor but may also prove frustrating if progress was not being made. As a learning tool, it lacks the personal contact of a traditional tutorial or lab session with a tutor. If being used for self-assessment, students should be encouraged to also seek help from their tutor. Perhaps the tutor could also be alerted that a student needed help.
(b) Providing a better assessment scheme that the lecturer currently lacks time to provide e.g. regular testing, opportunities to repeat assessments
There is clear potential for more regular testing in the context of core knowledge for first year modules but the time overhead to produce suitable test classes for second year modules, which contain programs with richer structure, may be prohibitive.
(c) Freeing up lab time for more practice by providing assessment out of class hours
Students may find this useful for self-assessment but, like the student focus group, staff felt that as a means of summative assessment, it could only be used in closed book conditions. Due to the nature of the assessment, all submissions would be very similar and there would be opportunities for plagiarism.
(d) Providing feedback to students in a way that fits in with the lecturer's or university's standards for appropriate feedback
It would be useful to identify and share good practice across the university but test developers should not be overly constrained.
Evaluation Criteria 3 Assessment design
(a) Relevance to module content
The relevance to module content has been discussed above.
(b) Relevance to module delivery pattern
In terms of module delivery pattern, the two-hour laboratory sessions currently provided for De Montfort University students should provide suitable opportunity to use AJM on a regular basis. Using the AJM at various points in the session to check progress could improve the pace and structure of the session.
Only programs that compile can be assessed. This may encourage students to make more effort to produce correct syntax. Too much laboratory time is currently wasted on simple syntax errors.
(c) Relevance to module assessment
As a summative assessment tool, it would be appropriate for about 20% of assessment of a first year module. It could provide good opportunities for repeat assessment.
(d) Provision for specific learning needs and styles
As a means of self-assessment it could be useful for students to work from home. There would appear to be little opportunity for group work.
4. Summary and conclusions
This section will also discuss Evaluation Criteria 4 – Opportunities for integrated use within the department.
The AJM is still in the process of development and problems identified with the AJM interface and the limitations of the test programs could be resolved.
Overall, the AJM received a positive reaction from both students and academic staff. The students considered it to be a potentially useful means of self-assessment and support. They would like a better (more graphical) user interface and more detailed feedback about faults and errors. However, to achieve this, much more time would need to be invested in the design and production of test classes.
Both students and staff would like to see compiler error messages displayed.
Staff felt that AJM could be useful for self-assessment and that scoring points may motivate students, although personal help from tutors should also be available. However, the time needed to prepare the test classes was a major concern.
Staff felt the AJM could also be useful for summative assessment. It could provide opportunities for regular, continuous assessment on a fortnightly or three-weekly basis. Both students and staff felt that this would improve attendance, which is currently poor for first year students. This would probably be most useful for first year modules with large numbers of students. It would have the added advantage of providing consistent and reliable marking.
Integration into first year modules should be based upon required knowledge of core subjects.
Integration into second year modules is more limited, particularly if an incremental approach (testing in stages) is taken to program development.
There appears to be little opportunity for summative assessment outside of a closed book environment, as the potential for plagiarism is obvious.
It may be useful to provide opportunities for repeating a test if a student fails but there are issues concerning how long the tests should be available, whether this would prevent model answers becoming available to students and the possibility of having to provide reassessment test classes in addition to the original.