Report of ASF Seminar 1

Cambridge, January 12-13 2004



Contents

Summary

1 Background to the Seminar

1.1 The work of the Assessment Reform Group

1.2 The Assessment Systems for the Future (ASF) Project

2 Overview of the seminar programme

3 Key points from group discussions

3.1 On the understanding of terms

3.2 The multiple purposes of summative assessment

3.3 Relevant features of the current system

3.4 The conditions needed to improve TA for summative purposes

3.5 The role of students in summative assessment

4 Issues raised throughout the seminar

4.1 Reasons for change from the current assessment systems

4.2 An analysis of what is needed and of the degree to which formative assessment and summative assessment by teachers can provide it.

4.3 The need for research and development to explore and refine proposals and provide exemplars

4.4 A synthesis of current views on what is needed

4.5 Problems facing a new system

4.6 How it looks!

 

References

Appendix A Programme

Appendix B Participants

Appendix C Introduction to the ARG and ASF project

Appendix D Presentation of research review findings

Appendix E Issues in assessment by teachers

Appendix F Discussant presentation

Summary

This is a report of the first in a series of seminars and conferences conducted by the Assessment Systems for the Future (ASF) project. The project, developed by the members of the Assessment Reform Group (ARG) and funded by the Nuffield Foundation, is studying the practices and issues relating to the role that assessment by teachers can take for summative purposes. Its overall goals are: to clarify thinking by educational professionals, by politicians and by others involved in education, about the nature, practice, potential and challenges of assessment by teachers; and to provide reports including recommendations for policy and practice about the role that assessment by teachers can take in assessment systems. The project is advised by a Core Group of practitioners and representatives of policy-making bodies. The members of this group (including all members of ARG) attend all the events; additional participants are invited according to the special focus of each seminar.

The first seminar began by exploring current understandings of assessment by teachers and the involvement of teachers in summative assessment. This revealed a not unexpected need for conceptual clarification about the terms being used in relation to assessment by teachers and of the different purposes it can serve. Several points were raised about the current practice of summative assessment as required by some assessment systems. A presentation of findings from a systematic review of research on the reliability and validity of assessment by teachers used for summative purposes, followed by group discussions, led to a number of suggestions for improving the dependability (that is, the combination of reliability and validity) of teachers' assessments for these purposes.

An input from Paul Black explored issues relating to the formative and summative uses of assessment, and how developing effective formative practice might help teachers develop their practice in summative assessment. Finally, participants attempted to bring together their views on why changes to current assessment systems are needed, what changes are desirable and how to bring them about. The points made also identified research and development that is needed and the problems facing any new system.

It was recognised that this seminar was an initial one, intended to uncover problems in current systems and in thinking about them, rather than to propose solutions. All the issues raised will be revisited in subsequent seminars with the benefit of inputs about differences in how assessment by teachers is used for summative assessment in various countries within and outside the UK and through the views of participants in and users of educational assessment.

1. Background to the seminar

1.1 The work of the Assessment Reform Group

Assessment Systems for the Future is a project of the Assessment Reform Group (ARG). This is a voluntary group of educational researchers who have, since 1989, been working to bring research findings to bear on policy and practice in assessment. In its early years, ARG was particularly concerned with the development of national assessment systems in the various countries of the UK and shared its findings from studying these innovations through conferences and various publications (including two books).

Although the ARG began as a policy task group of the British Educational Research Association, it has been supported since 1996 by small grants from the Nuffield Foundation. One of these grants enabled the group to commission Paul Black and Dylan Wiliam to undertake a review of research on classroom assessment. This resulted in the publication of Inside the Black Box (Black and Wiliam, 1998a), a summary of the main findings of the review, which was published in full as a special edition of Assessment in Education (Black and Wiliam, 1998b). The persuasive findings of the review, that formative assessment properly implemented can raise levels of pupil attainment, prompted development work through the King's-Medway-Oxfordshire Formative Assessment Project (KMOFAP), led by Professors Black and Wiliam. The outcomes of this project were published as Working Inside the Black Box (Black et al, 2002) and the book Assessment for Learning (Black et al, 2003).

Meanwhile ARG studied the implications of the findings of the Black and Wiliam review and published a pamphlet, Beyond the Black Box (ARG, 1999), and Assessment for Learning: 10 Principles (ARG, 2002a). It was evident from practice, however, that teachers' ability to take advantage of the benefits of using assessment formatively, to help learning, was being impeded by the emphasis in national assessment systems on external tests, particularly when used for high stakes purposes. The ARG then set up an EPPI-Centre review group in order to conduct a review of research on the impact of tests on students' motivation for learning. The EPPI-Centre grant was supplemented by Nuffield Foundation funding and the review was carried out in 2001/2. Again, a short pamphlet of the findings and recommendations for policy and practice – Testing, Motivation and Learning (ARG, 2002b) – was published as well as the full review (Harlen and Deakin Crick, 2002, 2003).

1.2 The Assessment Systems for the Future (ASF) Project

The ASF project is funded by the Nuffield Foundation, from September 2003 to December 2005. It was proposed by ARG in order to explore the role that assessment by teachers could take for summative purposes. Assessment by teachers has the potential for providing summative information about students' achievement across the full range of activities and goals without having the negative effects on teaching and the curriculum associated with tests. However, although assessment by teachers is used as the main source of information in some national and state assessment systems, in other countries, it has the image of being unreliable and subject to bias. The ASF was set up to address the issues, concerns and reservations that surround teachers' assessment used for summative purposes. It aims to do this by bringing together information from the analysis of current practice, the views of participants in and users of assessment and what is known from research about the benefits and challenges of using assessment by teachers for summative purposes.

The overall goals of the project are:

  • to clarify thinking by educational professionals, by politicians and by various users of education, about the nature, practice, potential and challenges of assessment by teachers, and
  • to provide reports, including recommendations for policy and practice, on the role that assessment by teachers can play in assessment systems.

The method chosen to achieve these goals is to hold a series of five seminars with invited experts and two consultation conferences with potential users. The event reported here was the first of the seminars. At the same time, the Review Group set up by ARG was overseeing a further review of research, conducted by Wynne Harlen, of the evidence of the reliability and validity of assessment by teachers used for summative purposes. The preliminary findings of this review were discussed at the seminar.

The ASF has a Core Group of practitioners and representatives of policy-making bodies, which provides advice on the programme and participants for the seminars and conferences. It also ensures that information about on-going developments in assessment is taken into account as the project works to achieve its goals. The members of this group (including all members of ARG) will attend all the events, with some additional participants invited in relation to the special focus of each seminar. Those attending the first seminar are listed in Appendix B.

2. Overview of the seminar programme

The programme is given in Appendix A. The background and aims of the project were described by Mary James (Appendix C). Richard Daugherty then provided a set of questions for group discussion in order to explore the ways in which assessment by teachers was understood. Notes were kept of the discussion but there was no general plenary feedback at that point.

The next session focused on the findings of the review of research on the reliability and validity of assessment by teachers for summative purposes. An extensive draft of the findings had been circulated in advance of the seminar and the presentation offered some ways of bringing the findings together (Appendix D). Discussion begun in the plenary session was continued in groups guided by the questions:

  • How do the findings resonate with your experience?
  • Should TA be judged by the same standards as tests?
  • How realistic is it to strive for the levels of dependability needed for TA to be used for purposes for which tests are currently used?
  • What are the implications for policy and practice?

Group reporters took notes, from which the points below were extracted.

On the second day, Paul Black spoke to a paper on ‘Issues in Assessment by Teachers’, starting from the experience of the KMOFAP work with teachers (Appendix E). Group discussions which followed were focused by three sets of questions:

1. How can we develop teachers' own (i.e. internal to the school) summative practices so that they

•  Achieve synergy with their new approaches to learning, and

•  Improve dependability?

2. To make answers to the above productive, how can we sustain and foster assessment for learning so that it achieves its full potential to radically transform teaching and learning?

3. On what grounds can we argue for any changes to current external test regimes? Do we know what we want to replace them with? What warrants do we (can we) have for any such proposals?

As a result of discussing these questions, small groups of four participants presented their thoughts on posters which were displayed and discussed in a ‘poster parade’.

In the final session, group reporters added points to those made in the posters from the earlier group discussions and then John Gardner provided a synoptic overview of the issues raised in the whole seminar (Appendix F).

3. Key points from group discussions

3.1 On the understanding of terms

•  There is a need for conceptual clarity and perhaps consensus about the different terms being used in relation to teacher assessment and its different purposes.

•  A deeper understanding of formative assessment will help in understanding summative assessment and the essential differences between them.

3.2 The multiple purposes of summative assessment

•  Summative assessment purposes are fundamentally different for parents, students, LEAs, teachers, schools and Governments. This raises the question of whether the same summative TA can serve all of them, with each group interpreting it for its own purposes, or whether those who report the assessment should translate it for each audience.

3.3 Relevant features of the current system

•  Prejudice and mistrust are at the heart of concern about TA; teachers don't trust each other.

•  Teachers' judgements are used throughout the system but this is not always recognised.

•  TA is used without question in higher education, but regarded with suspicion at the school level.

•  The assumption of reliability of external tests needs to be challenged. The Massey Report (Massey et al, 2003) reported that teachers' assessment showed less sign of drifting standards than national tests in England.

•  There is a difference between primary and secondary practices in relation to whether judgements are linked to particular tasks or based on criteria that can be applied to a range of activities. Specific activities predominate at KS4, whereas criteria, rather than activities or tasks, are defined at the primary level.

•  Under pressure of high stakes, teachers define activities so that they can collect the evidence, thus narrowing the curriculum at the secondary level. They may also favour tasks that give the highest grades, rather than constructing ones that enable children to show their capabilities.

•  There are considerable differences among schools in the extent to which they feel able to ‘go beyond the curriculum’. The more successful schools are often those confident enough to do this.

•  In England: the provision for disapplication from national tests does not appear to have worked; recent training in teacher assessment has been driven by a goal of congruence with tests, not of broader understanding of TA.

•  There is bound to be resistance to change, particularly from those favoured by the present system.

3.4 The conditions needed to improve TA for summative purposes

•  System reform will need to go beyond inserting data from TA into existing systems. It will entail working to a new understanding of the roles of different types and modes of assessment.

•  All involved with education need to understand that assessment is to do with the design of tasks and the quality of teaching, not just the application of criteria to products.

•  The nature of professional development required for supporting TA for summative purposes is not sufficiently defined or understood; attention needs to be given to new ways of providing effective PD.

•  It will be necessary to free up the curriculum (reducing the content, focusing on generic skills and a holistic overview of aims rather than constraining with fine detail) to enable teachers to assess effectively. (Such ideas are being explored in Northern Ireland).

•  There needs to be better evidence and understanding of the notion of progression.

•  Whilst giving tasks as exemplars is helpful, teachers need to understand the criteria so that they can be free to compose/customise their own tasks; exemplars serve to give helpful elaboration to criteria. Too fine a level of detail can be damaging in some areas, e.g. writing, where broad concepts of ‘level-ness’, with examples, give a helpful insight into the meaning of progression.

•  Moderation is best seen as a quality assurance process and as ‘light’ as possible whilst securing public confidence.

3.5 The role of students in summative assessment

•  Most students are subjects of, rather than participants in, their assessment. They lack understanding of the assessment system. They distrust procedures such as progress files when used for high stakes purposes.

•  Students could have a role in their summative assessment if they were given time to reflect on what is needed and opportunity to produce evidence of what they know or can do. They need help in understanding the assessment system.

4. Issues raised throughout the seminar

The discussion in which the above points were made raised many issues. These were brought together by participants in the posters created in the final session. They are reported here under six headings:

4.1 Reasons for change from the current assessment systems.

4.2 An analysis of what is needed and of the degree to which formative assessment and summative assessment by teachers can provide it.

4.3 The need for research and development to explore and refine proposals and provide exemplars

4.4 A synthesis of current views on what is needed.

4.5 Problems facing a new system

4.6 How it looks!

4.1 Reasons for change from the current assessment systems

These reasons include:

  • The over-tight assessment regulations/specification constitute a major cause of a narrow instrumental approach to learning
  • The political obsession evident in some countries with demonstrating improvement in system performance is distorting assessment approaches and impoverishing student learning
  • When high stakes are attached to test results this can create a strait-jacket stopping teachers and pupils from engaging with change in summative assessment and formative assessment practices
  • Some current systems of testing are not working; not only do they have a negative impact on teachers, learners and the curriculum but they do not provide valid and reliable information about attainment
  • Better quality summative information is needed for decision-making by students and teachers
  • Working towards the goals of life-long learning and learning to learn is frustrated by certain features of the current systems in the UK
  • There is evidence (from the 1970s and other countries within and outwith the UK) that a better system could be developed.

4.2 An analysis of what is needed in an assessment system and of the degree to which formative assessment and summative assessment by teachers can provide it

Greater clarity is needed about

  • The criteria that an assessment system should meet, as a starting point; including its relationship to a worthwhile curriculum.
  • The difference between the teachers' role in formative assessment and summative assessment and whether improved practice in using assessment for learning will lead to more rigorous summative assessment by teachers.
  • How to articulate the imperative for change and how this will benefit teachers.
  • The needs of different audiences who use the judgments teachers make, e.g. reporting to parents.
  • The different facets under the headings of summative assessment and formative assessment and their relationships.
  • How to liberate teachers to use their own judgments professionally – not replicate external test systems in their own summative assessment.
  • Alternative approaches to using tests, such as at the start of activities.
  • Differences between primary and secondary schools in their starting points and willingness to change and resist the ‘dependency culture’.
  • How systems in Wales, Northern Ireland and Scotland deal with the issues raised in the seminar.
  • How other parts of the education system (such as adult education and parts of FE) deal with the tensions between the role of the teacher in formative and summative assessment.

4.3 The need for research and development to explore and refine proposals and provide exemplars

Information is needed about:

  • Teachers' individual formative assessment practices in their classrooms, marking, etc and how to share strengths and weaknesses in formative assessment across the school.
  • Teachers' individual summative assessment practices in different subjects, levels, sectors, etc and how to share strengths and weaknesses in summative assessment across the school.
  • How to use external tests more progressively/educationally
  • How to use evidence from pupils about the impact of summative assessment and formative assessment on their learning.
  • How to use performance structures/pay progressively
  • The role that students can take in summative assessment.
  • Why tests and TA differ and how each can be done well, rather than research making pointless comparisons between different summative assessments.

Research linked with development should be used:

  • In creating a long-term, coherent strategy for change, based on evidence and examples of good practice.
  • In identifying moderation procedures to increase the consistency in the use of TA for summative purposes; recognising that this in itself should be an important vehicle for professional development.
  • To develop strategies for increasing the intrinsic motivation of both teachers and students.

4.4 A synthesis of current views on what is needed

It was agreed that it was too early in the project to arrive at conclusions about preferred aspects of systems, but it was thought important to record the following views at this stage:

•  We should see an assessment system as a ‘Learning Health check’, meaning that there is:

- Self-monitoring

- Monitoring and observation by sensitive others

- Selective use of quantitative measures as required

- A general quantitative check at longer intervals

and that each component works efficiently as part of a whole, in known and predictable ways and no part works in opposition to any other.

  • Proposals for summative use of formative assessment can support accountability in respect of students' engagement with education
  • An APU-style sampling model can provide state/government with far more reliable and valid information on underlying standards than the use of blanket testing
  • Change should be through local, situated, idiosyncratic activity not top-down
  • In relation to National Curriculum Assessment 5 – 14 in England, Wales and Northern Ireland, what is needed is a strategy

•  with an emphasis on teachers as expert assessors of all students

•  supported by very strong professional development,

•  involving teachers working in partnership to gain the expertise and confidence to combine assessment and high quality learning.

  • Some schools need exemption from current external pressures in order to create good examples of synergy (assessment practices and approaches to learning)
  • Teachers and schools need to be properly resourced to implement and maintain change in teachers' role in summative assessment

4.5 Problems facing a new system

  • How will school management be able to compare schools?
  • What about pupil information at points of transfer?
  • What about public accountability in terms of value-added measures for individual schools? What data can be used for this?
  • How can workload issues be accommodated?
  • How should the dependability of TA for summative purposes be judged?
  • How to communicate more effectively with parents and others and increase their confidence in teachers' judgements?
  • How to help teachers and students reconcile the teachers' dual role as teacher and assessor?
  • How to increase teachers' confidence in their own and other teachers' judgements and support pupils' confidence in these judgements?

 

4.6 How it looks!

 

References

ARG (1999) Assessment for Learning: Beyond the Black Box. Available on the ARG website

ARG (2002a) Assessment for Learning: 10 Principles. Available on the ARG website

ARG (2002b) Testing, Motivation and Learning. Available on the ARG website

Black, P. and Wiliam, D. (1998a) Inside the Black Box. Available from the Department of Education and Professional Studies, King’s College, University of London

Black, P. and Wiliam, D. (1998b) Assessment and Classroom Learning. Assessment in Education, 5 (1) 7 - 74

Black, P., Harrison, C., Lee, C., Marshall, B. and Wiliam, D. (2002) Working Inside the Black Box. Available from the Department of Education and Professional Studies, King’s College, University of London

Black, P., Harrison, C., Lee, C., Marshall, B. and Wiliam, D. (2003) Assessment for Learning: Putting it into Practice. Maidenhead: Open University Press

Harlen, W. and Deakin Crick, R. (2002) A systematic review of the impact of summative assessment and tests on pupils’ motivation for learning (EPPI-Centre Review). In Research Evidence in Education Library. Issue 1. London: EPPI-Centre, Social Science Research Unit, Institute of Education, University of London.

Harlen, W. and Deakin Crick, R. (2003) Testing and motivation for learning. Assessment in Education, 10 (2) 169 - 208

Massey, A., Green, S., Dexter, T. and Hamnett, L. (2003) Comparability of national tests over time: Key stage test standards between 1996 and 2001. London: QCA

 

Appendix A

Expert Seminar of the Assessment Systems for the Future Project, January 12th and 13th, 2004, Moller Centre, Cambridge

Programme
Monday, January 12th

12.30 – 1.30 Arrival and lunch

1.30 – 2.15
Plenary
Welcome and introduction
Chair: Gordon Stobart
The ASF project (Mary James)

2.15 – 3.45
Plenary and groups
How do we understand assessment by teachers?
Chair: Richard Daugherty

3.45 – 4.00 Tea

4.00 – 5.00
Plenary
Presentation of findings of a review of research evidence of the reliability and validity of assessment by teachers for summative purposes
Chair: Mary James
Speaker: Wynne Harlen

5.10 – 6.30
Groups
Responses to the review findings

7.30 Dinner

Tuesday, January 13th

9.00 – 10.45
Plenary and groups
Issues in assessment by teachers
Chair: Kathryn Ecclestone
Speaker: Paul Black

10.45 – 11.30 Coffee + Poster session

11.30 – 12.30
Reflections on conceptions of assessment by teachers
Chair: Mary James
Discussant: John Gardner

12.45 Lunch and depart

 

Appendix B

Seminar participants

*Mr David Bartlett, Co-ordinator for Assessment, Birmingham LEA
*Prof Paul Black, King’s College, University of London
*Ms Jacky Burnett, Programme Leader, Assessment for Learning, QCA
Ms Sheila Dainton, ATL
Ms Debora Dhillon, AQA
*Prof Richard Daugherty, University of Wales, Aberystwyth
*Dr Kathryn Ecclestone, University of Exeter
*Ms Janet English, Head teacher, Malvern Way Infant and Nursery School
*Prof John Gardner, Queen’s University, Belfast
*Prof Wynne Harlen, University of Bristol and University of Cambridge
*Mrs Carolyn Hutchinson, Head of Assessment Branch, Scottish Executive
*Dr Mary James, University of Cambridge
*Mr Martin Montgomery, Assessment Development Manager, CCEA
Prof Roger Murphy, University of Nottingham
Dr Tim Oates, QCA
*Dr Catrin Roberts, Nuffield Foundation
Ms Karen Robinson, NUT
*Mr Jon Ryder, Lord Williams’s School
*Prof Judy Sebba, University of Sussex
*Dr Gordon Stobart, Institute of Education, University of London
Ms Penny Todman, St Luke’s Primary School
Dr Mike Walker, Head Teacher, King Edward VI Grammar School
*Ms Anne Whipp, ACCAC
Dr John Wilmut, Independent Consultant

* Core Group members

Appendix C

Introduction to the Assessment Reform Group (ARG) and the Assessment Systems for the Future (ASF) project (Mary James)

The ARG
• established in 1989 as BERA APTG to bring insights from research to the attention of policy-makers
• in 1996 ARG became supported largely by Nuffield funding for small projects
• membership changes but numbers about 8 from across UK

Some Outputs
• Books: Policy Issues in National Assessment (1992); Enhancing Quality in Assessment (1994)
• Commissioned Black and Wiliam review of formative assessment: Inside the Black Box (1998a)
• Pamphlets: Assessment for Learning: beyond the black box (1999); Testing, Motivation and Learning (2002b)
• Leaflet: Assessment for Learning: 10 principles (2002a)
• EPPI review group on assessment: ALRSG (3 reviews to date and 4th proposed)
• Seminars and conference symposia

Assessment Systems for the Future Project
• Funding: Nuffield Foundation
• Focus: role of assessment by teachers in national assessment systems
• Form: 5 expert seminars and 2 wider consultation conferences

Background
1. Components of UK Assessment Systems
2. The Issue
• A knowledge society needs lifelong learners who are motivated to learn and who have learned how to learn.
• Assessment for learning can raise standards and improve learning.
• But practice is dominated by summative assessments that mimic external tests and exams because of target-setting based on test results.
• Negative impact on motivation and a narrowing effect on curriculum and pedagogy.
3. Prof J Osborne on science education (TES, 2nd January)
‘…the current system has progressively led to teachers’ disengagement from the process of assessment. Yes, there may be coursework assessment of investigative skills but any teacher of science will tell you … that the environment of high stakes assessment has reduced all assessment of investigation in science to the overwhelming dominance of three practicals – measuring the resistance of a wire, the rates of a chemical reaction, and the rate of osmosis in a potato. (…) Their use is driven simply by the need to maximise student chances of a high grade and, in the process, has reduced empirical enquiry in science to a set of recipe-like steps. (…) this approach to investigative work bears as much relation to science as painting by numbers does to art.’
4. Signs of Hope?
• England – Chief Inspector of Schools: ‘time to look at indicators other than test results of schools’ achievements’.
• Wales – teacher assessment only at KS1; DARG review of KS2 and KS3 tests.
• Scotland – AfL prominent in the recent consultation on ‘Assessment, Testing and Reporting’, 3-14.
• NI – ‘Pathway’ proposals for a new KS3 curriculum embrace principles of AfL as a platform for reform
5. Evidence of Validity & Reliability of Teachers’ Summative Assessments
To be reported (see Appendix D)

ASF Project Aim
To formulate guidelines for policy on assessment which would achieve the best balance between reliability, validity, cost and benefit to learners and teachers.

Goals
General: to clarify thinking about the nature, practice, potential and challenges of assessment by teachers used for summative purposes.

Specific:
• To find out how assessment by teachers is understood.
• To bring together what is known from studies of reliability and validity.
• To report on current roles of assessment by teachers in UK.
• To find out what can be learned from other countries.
• To learn from the perspectives of ‘users’: pupils, parents, teachers, employers.
• To propose ways of making assessments by teachers more dependable and efficient.

Process
• Expert seminar series (3+2) – face to face interaction enables contexts of views and practices to be explored, set against alternatives and more deeply understood.
• 1st consultation conference – to consider the evidence and implications and to carry out interim evaluation of the project.
• 2nd consultation conference – to consider the final outcomes and help to draw up recommendations.
• Core group seminar – to agree outcomes and plan publication.

Participation
• Core group (including all members of ARG).
• Invited seminar participants additional to core group.
• Invited consultation participants additional to core group.

Outcomes and Dissemination
• Monograph
• Pamphlets
• Dissemination events/meetings with policy groups

 

Appendix D

Presentation of research review findings (Wynne Harlen)

Review questions
- What is the research evidence of the reliability and validity of assessment by teachers for the purposes of summative assessment?
- What conditions affect the reliability and validity of teachers’ summative assessment?
- (What are the implications of the findings for policy and practice in summative assessment?)

Assessment by teachers
…the process by which teachers gather evidence in a planned and systematic way about their students’ learning to draw inferences, based on their professional judgement, to report achievement at a particular time.

Reliability
Refers to how accurate the assessment is (as a measurement). If repeated, how far the second result would agree with the first.

Validity
How well what is assessed matches what it is intended to assess. Different forms of validity derive from different ways of estimating it.
Construct validity as an overarching concept.

Dependability
Reliability and validity are not independent of each other; they cannot both be maximised. Dependability is a combination of the two. The approach to TA giving the most dependable assessment would protect construct validity whilst optimising reliability.

Main findings of the review
• Reliability of portfolio assessment where tasks were not closely specified was low.
• Tentative evidence that estimates of construct validity of portfolio assessment, derived from evidence of correlations of portfolios and tests, were low.
• Fine specification of criteria is capable of supporting reliable TA whilst allowing evidence to be used from the full range of classroom work.
• Conflicting evidence as to the relationship between teachers’ ratings of students’ achievement and standardised test scores of the same achievement when the ratings are not based on specific criteria.
• Teachers’ judgements guided by check-lists and other materials in the Work Sampling System have high concurrent validity for assessing young children.
• The clearer teachers are about the goals of students’ work, the more consistently they apply assessment criteria.
• Teachers’ judgments of students’ performance are likely to be more accurate in aspects more thoroughly covered in their teaching.
• Teachers who have participated in developing criteria are able to use them reliably in rating students’ work.

Studies of the NC tests for KS1:
• Considerable evidence of error and bias in the early 1990s.
• (But also variation in the administration of standard tasks)
• Introduction of TA had initial beneficial effects on planning, but later collaboration in assessment declined.
• Studies of NCA at KS2: Results of TA and standard tasks, ’96 – ’98, agree to an extent consistent with the recognition that they assess similar but not identical achievements, despite evidence of variation of practice among teachers in their approaches to TA, type of information used and application of national criteria.

The assessment by teachers of oral proficiency in foreign language
Teachers are consistently more lenient than moderators, but are able to place students in the same rank order as experienced examiners.

In assessing practical work in science…
• ‘A’ level projects assessed by teachers were more closely related to teacher assessed laboratory skills than when assessed externally.
• Teachers are able to score hands-on science investigations and projects with high reliability using detailed scoring criteria.

The conditions affecting dependability
• Widespread reporting of bias in TA relating to student characteristics, including behaviour (for young children), gender and special educational needs; overall academic achievement and verbal ability may also influence judgement when assessing specific skills.
• There are considerable differences among teachers in their approaches to TA.
• There is variation in the level of TA and in the difference between TA and standard tests related to the school.
• No consistent pattern suggesting that dependability of TA is greater in some subjects than others.
• Teachers can predict with some accuracy their students’ success on specific test items and on examinations (for 16 year-olds) given specimen questions. There is less accuracy in predicting A level grades (for 18 year-olds).
• Detailed criteria describing levels of progress in various aspects of achievement enable teachers to assess students reliably on the basis of regular classroom work.

Training…
• should involve teachers as far as possible in the process of identifying criteria so as to develop ownership of them and understanding of the language used.
• should also focus on the sources of potential bias that have been revealed by research.

Moderation
• Moderation through professional collaboration is of benefit to teaching and learning as well as to assessment.
• Dependable assessment needs protected time for teachers to meet and to take advantage of the support that others including assessment advisers can give.

Requirements for dependable assessment
• Decisions about the domain of knowledge, skills etc. to be assessed that are justified in terms of how learning takes place.
• A valid sample of student behaviour in the domain.
• Criteria for judging the sample well matched to the goals of the curriculum and of the domain.
• Procedures for the reliable and unbiased application of the criteria.
• Procedures for reporting and communicating with users of the assessment outcomes.

Problems of methodology
Of studies: How should TA be evaluated (if not against tests)? What exactly is the evidence used in TA?

Of TA used for summative assessment: What should be assessed – a snapshot? – data accumulated over time? How can criteria be matched to data?

Possible types of action
• the specification of the tasks
• the specification of the criteria
• training
• moderation
• the development of an ‘assessment community’ within schools, allied to increased confidence in the professional judgement of teachers.

Appendix E

Issues in Assessment by Teachers (Paul Black)

1. Introduction

This paper looks at the issues in relation to the two broad purposes of assessment, the formative and the summative. Results of recent work on formative assessments are first reviewed in the second section in relation to criteria for quality of such assessments. The next section looks at the prospects for the further and widespread development of formative practices, and this is followed by a discussion of evidence and of prospects for the development of teachers' summative assessments where they have to report results, both within school and externally. A final section considers briefly some important problems for the future.

2. The quality of formative assessment by teachers

The research evidence on formative assessment has shown both that it can raise standards of performance and that its present practice is generally weak (Black & Wiliam, 1998). Some of the main features of this evidence have been listed previously i.e. that teachers tend to encourage rote/superficial learning, use questions of poor quality, can predict test results whilst knowing little about learning needs, and over-emphasise the grading function rather than the learning function of assessment. I believe that these findings reflect deeper problems, principally that teachers' interaction with their pupils in classroom dialogue is often restricted, particularly because feedback, both in dialogue and in written work, is judgmental rather than about how to improve. Many teachers see themselves as having to ‘deliver' the curriculum rather than as having to help pupils to learn. Some of these problems have been exacerbated by the pressures of the national curriculum and national testing, but they have been around for far longer.

The work of the Assessment Reform Group (ARG) and of the King's group has developed ways out of this. The King's-Medway-Oxford Formative Assessment Project (KMOFAP – Black & Wiliam, 2003) has shown that, through work on classroom dialogue, on giving feedback on homework using comments without marks, on promoting peer- and self-assessment, and on the formative use of summative tests, teachers can make quite radical changes that raise pupils' test performance. One reason why such work has achieved this success must be that the practices developed reflect, and implement in practice, some established findings about effective learning, i.e. that teachers should:

• Start from a learner's existing understanding.

• Involve the learner actively in the learning process.

• Develop the learner's overview, i.e. meta-cognition, which requires a view of purpose and an understanding of criteria of quality of achievement so that self-assessment is possible.

• Encourage social learning, i.e. learning through active involvement in discussion.

Key concepts here are the related principles of regulation and feedback. Perrenoud (1998) stresses the primacy of the concept of regulation, which he explains as follows:

I would like to suggest several ways forward, based on distinguishing two levels of the management of situations which favour the interactive regulation of learning processes:

• the first relates to the setting up of such situations through much larger mechanisms and classroom management.

• the second relates to interactive regulation which takes place through didactic situations. (p.92)

It is helpful to see feedback as ‘interactive regulation', for it stresses that feedback requires ways both to evoke evidence from the learner as an indicator of learning need, and to provide feedback from the teacher to the learner. As the review of Kluger and DeNisi (1996) showed, feedback as such only has a benign effect if it helps the recipient to see how to improve. Perrenoud emphasises that the teacher's intervention has to involve:

. . . . an incursion into the representation and thought processes of the pupil to accelerate a breakthrough in understanding, a new point of view or the shaping of a notion which can immediately become operative. (p.97)

On their own, marks or grades do not do this. However, if the discussion is broadened beyond cognitive principles to introduce the dimension of the motivation of students, which depends on their beliefs about themselves as learners and their self-esteem, then a different aspect of feedback assumes importance.

Feedback given as rewards or grades enhances ego rather than task involvement and can damage the self-esteem of low attainers. Feedback which focuses on what needs to be done can encourage all to believe that they can improve. Many studies, notably those of Carol Dweck (2000), have established the apparently subtle yet powerful differences that attend the ways in which feedback is given. Put briefly, feedback which helps create or confirm belief that one is an intrinsically bright, or dumb, student – the ‘fixed IQ view' – is damaging, whilst feedback which focuses on the qualities, good and bad, of the work under consideration is helpful in confirming the belief that one can always do better. The ‘fixed IQ' view can be harmful even when it leads to praise for the smart pupils, for the evidence is that it leads such pupils to avoid any tasks in which they see a risk that they might not do very well, because their self-esteem rests on reputation for success, not on grasping opportunities to learn.

3. Prospects for development of formative assessment

It is easy to see, in the UK, many positive signs of the development of formative assessment. Recent publications, notably of the ARG and of the King's group, have achieved widespread impact, e.g. the booklet “Working Inside the Black Box” (Black et al. 2002) has sold over 35 000 copies, and the recent King's book “Assessment for Learning: Putting it into practice” (Black et al. 2003) has sold so well that the publishers have ordered a reprint only two months after it first appeared. The Scottish Education Department has conducted trial studies of formative assessment and, in the light of their success, has made it a priority in their current plans for development. The DfES is recognising the importance of this work in making Assessment for Learning a key component of the Key Stage 3 initiatives for England.

However, the optimism that such recognition provokes must be qualified. I identify three areas of concern:

Sustaining commitment and support

The evidence in the KMOFAP project was that the changes provoked are demanding, for they imply radical changes in the beliefs of both teachers and learners about the roles they should play in the learning work in school. In that project, the changes happened only slowly, took between one and two years before they yielded results, and depended on sustained commitment and continuing support. Thus, any anticipation of rapid results from a short training course is likely to be disappointed.

Understanding the concept

In the work of the King's team with LEAs and with individual schools, we frequently meet mis-understandings. Many say they are ‘doing it already', when classroom observation shows that they are not. Some say ‘that's just good teaching', which seems innocuously true until discussion reveals that their model of good teaching is one of clear ‘delivery' and of simply telling pupils whether or not they have met the targets. Some, interpreting in terms of old models of assessment, think that regular testing which produces rows of marks in their record books, i.e. frequent summative assessment, is what is being promoted. Much of this mis-interpretation is the genuine response of experienced professionals when they try to accommodate a new idea into their existing framework of beliefs and practices. This problem can be overcome with sustained programmes of development. However, it is not helped by the adoption of the phrase ‘assessment for learning', notably by the DfES and by ministers, as a label to describe a wide range of initiatives. Although in strict logic the broader application can be defended, the concept is thereby so broadened that it loses specific meaning. It may be necessary from now on to emphasise the term ‘formative assessment' rather than ‘assessment for learning' because the vague title may support the many misunderstandings that can undermine the prospects of change.

Summative pressures

Some of the teachers in KMOFAP found ways to make their own summative tests an integral part of the formative approaches. Peer- and self-assessments against statements of aims were used as the basis for improving pupils' revision in preparation for a test. Pupils spent time setting test questions, and worked after a test in groups to mark one another's papers, often being required to first invent a marking scheme. These practices helped pupils to develop a meta-cognitive overview so that they could link the aims of the learning to the aims of the tests. The main point established was that summative tests could be, and should be seen to be, a positive part of the learning process. By active involvement in the test process, students can see that they can be beneficiaries rather than victims of testing, and can also be helped to understand the testing process and to read questions and compose answers more carefully and critically.

It is widely recognised that the pressures of summative tests can inhibit good teaching and learning and that these pressures can also be harmful to pupils (Harlen & Deakin-Crick, 2003). The experience reported here shows that such negative interplay is not inevitable. However, there are very serious problems in linking formative and summative practices – and this leads to my next section.

4. Teachers' summative assessments

Lessons from the KMOFAP project

Some significant problems at the formative-summative interface became clear in the KMOFAP work, i.e. with teachers who had achieved strong development in their formative assessment practices. It seems that good formative practices may be used in relation to summative testing in three ways. The first, a narrow way, is to continue to teach to the test, but to make such narrowly focussed teaching more effective by helping pupils to prepare by revision, to ‘read' the intentions of test questions and to anticipate the criteria by which their answers might be judged. The second, a broad way, is to teach for understanding of the concepts and processes which lie at the heart of the subject and trust that such deeper understanding will inevitably produce better test performances. Whilst there was good evidence that the latter strategy would be rewarded, many teachers would rather use both approaches, using the broad approach most of the time, and switching to the test-focussed approach in the immediate run-up to high-stakes tests.

The third way, of particular relevance here, is that through formative assessment work teachers should be better equipped with methods and data to make summative judgements. The development of new ways to assemble and use such methods and data for summative purposes was not an aim of the project, and so the lessons learnt were of ways in which, without the support of the project, teachers accommodated their new formative practices with existing summative practices.

The use of teachers' assessments in summative testing can be examined in two stages: the first is to explore the link between teachers' formative assessments and their summative assessments, and the second is to explore their use of their summative assessments to serve purposes of public accountability. Because of the project's aim to make the findings applicable within the normal context of schooling, the second of these was taken as fixed. Partly because of the pressures of these external requirements, and partly because of the influence of the high-stakes regime on the methods and norms for schools' own summative work, the project seemed to have slight and uneven effects even on the formative to summative links for teachers' own assessments.

Some said they felt constrained, by in-school reporting requirements, to report students' achievement as scores in formal tests even though they would personally prefer to trust the improved informal knowledge of their students developed by the formative practices. As one teacher put it:

I know a lot more about this class because of the formative assessment. I mean we discuss things, they do presentations, they talk to me, I talk to them, they talk to each other – and I could tell you for every one of this class their strengths and weaknesses.

Unfortunately, ways in which such knowledge could be translated into defensible numerical scores were not developed.

For composing their formal written tests, most teachers used questions from the external national tests, despite doubts about the validity and quality of many of the questions. They did this because of pressures of time, and of doubts about the credibility that tests composed by themselves would command. The frequency of the summative reporting required within school was very variable, ranging from once a year to once every two or three weeks. What was not clear, in the case of the frequent formal testing and where records of marks for every piece of written work had to be kept, were the purposes that these numerical scores were required to serve. There seemed to be no developed rationale for aligning the evidence with the various purposes, of tracking, of study choice decisions, and of reporting to parents, which that evidence was meant to inform. The routine was to produce a number, either from aggregation of several test scores in some cases, or from a single terminal test score in others.

The overall outcome was of an uneasy fault line between the formative practices and the summative regimes. Another substantial project would be needed to explore this interface, but in the absence of any promise that policies for external accountability testing could be amended in the light of its findings, such a project might lead only to frustration. As one teacher in the King's project put it at the end of a discussion of these issues by a group of the project's teachers – “It's a bit depressing that isn't it”.

Approaches to teachers' summative assessment

The common model in the UK, for the school-leaving examinations, is to require teacher assessment of set-pieces of students' work which will cover aspects of the curriculum, notably practical work, which written tests cannot explore. This approach has had some deplorable effects: an early paper by Paechter (1995) exposed how, as UK teachers had to ‘administer' tasks constrained by rules set by the examining authority, teachers were uncertain of their role, some behaving as external examiners, others refusing to suspend the normal teaching role which they would play with such tasks. In science education, it is now widely recognised that the ‘investigations' component of the attainment target Sc1 is a disaster area (Duggan & Gott, 1996). The rules to which teachers' assessments must conform, and the further constraints imposed by external moderators, have reduced the work to a process of getting students to jump through clearly defined hoops. There is little variety in the tasks set. One in widespread use is an investigation of how the length of a piece of wire affects its resistance: this is popular because the results are reliable, repeatable, and always give a good straight line. Such results are no surprise to students – and they have no interest in them. The work has become a travesty of scientific enquiry (King's College, 2003). A similar picture of ambiguity and tension has been described by Baker and O'Neil (1994), in the context of US innovations under the broad umbrella title of ‘performance assessment'.

Studies of a different approach – portfolio assessment – have brought out a sharp contrast between the attractions of the freedom this approach gave to teachers and their students and the weak features which have all but de-railed some initiatives. One aspect was brought out by Stecher (1998), showing how teachers' practices were narrowed down to ‘rubric-driven instruction' as requirements of reliability and validity imposed constraints (see also Koretz, 1998). A more positive prospect for both enriching and under-pinning teachers' summative work can be envisaged through external provision of test instruments for teachers to use at their discretion. Both Gilbert (1996) in the UK and Rowe and Hill (1996) for Australia describe the provision of well researched resources, with Gilbert stressing that the development in Art has promoted valuable ‘assessment conversations' both between teachers, and between teachers and their students.

5. Looking ahead

Whilst there is evidence, as Wynne Harlen's (2004) review shows, to support the belief that teachers can produce summative assessments that can match or surpass external tests, both in reliability and validity, there is little evidence that such work can be so developed by teachers that they can constructively align their formative and summative practices. Much of the work to date has not drawn on contexts in which formative practices have been well developed, and so there is far more work to be done if the optimum synergy between these two, and so between assessment for learning and assessment for certification and accountability, is to be achieved.

Such alignment could only be productive if formative practices could be developed on a wide scale in ways that made them robust – i.e. so that teachers themselves were convinced and confident in incorporating them into their practice. Any initiative set up quickly to give teachers more responsibility for those summative assessments that are in the public domain could well create new pressures that could de-rail a fragile development.

What is missing in most of the discussions is attention to the pupils' perspective. The studies of Moni et al. (2002) and of Brookhart and Bronowicz (2003) showed that students may well interpret all assessments as summative, and devalue and/or resist their involvement in them. Furthermore, attempts to convince disenchanted young learners that assessments can be valid, in rewarding the things that they value and can do well, may have to explore modes of assessment and concepts of validity in unusual directions (see e.g. Johnson, 2003; Jewitt, 2003).

References

Baker, E.L. & O'Neil, H.F. (1994) Performance Assessment and Equity: a view from the USA, Assessment in Education, 1, 11-26.

Black, P. & Wiliam, D. (1998) Assessment and Classroom Learning, Assessment in Education, 5, 1-74.

Black, P. & Wiliam, D. (2003) ‘In Praise of Educational Research': formative assessment. British Educational Research Journal. 29 (5), 623-37.

Black, P.; Harrison, C.; Lee, C.; Marshall, B. & Wiliam, D. (2002). Working inside the black box: assessment for learning in the classroom. London, UK: King's College London Department of Education and Professional Studies.

Black, P.; Harrison, C.; Lee, C.; Marshall, B. & Wiliam, D. (2003). Assessment for learning: putting it into practice. Buckingham, UK: Open University Press.

Brookhart, S.M. & Bronowicz, D.L. (2003) ‘I don't like Writing. It Makes My Fingers Hurt': students talk about their classroom assessments, Assessment in Education, 10, 221-242.

Duggan, S. & Gott, G. (1996) Scientific evidence: the new emphasis in the practical science curriculum in England and Wales. The Curriculum Journal 7 (1), 17-33.

Dweck, C. S. (2000). Self-theories: Their role in motivation, personality and development. London: Taylor and Francis.

Gilbert, G. (1996) Developing an Assessment Stance in Primary Art Education in England, Assessment in Education, 3, 55-74.

Harlen, W. (2004) Private communication.

Harlen, W. & Deakin-Crick, R. (2003) Testing and Motivation for Learning, Assessment in Education, 10, 169-208.

Jewitt, C. (2003) Re-Thinking Assessment: multimodality, literacy and computer-mediated learning, Assessment in Education, 10, 83-102.

Johnson, D. (2003) Activity theory, Mediated Action and Literacy: assessing how children make meaning in multiple modes, Assessment in Education, 10, 103-129.

King's College (2003) Internal report on science teachers' contributions to seminar on the Tomlinson enquiry. London: King's College.

Kluger, A. N. & DeNisi, A. (1996) The Effects of Feedback Interventions on Performance: A Historical Review, a Meta-Analysis, and a Preliminary Feedback Intervention Theory. Psychological Bulletin, 119 (2), 254-284.

Koretz, D. (1998) Large-scale Portfolio Assessments in the US: evidence pertaining to the quality of measurement, Assessment in Education, 5, 309-334.

Moni, K.B., van Kraayenoord, C. & Baker, C.D. (2002) Students' Perceptions of Literacy Assessment, Assessment in Education, 9, 319-342.

Paechter, C. (1995) ‘Doing the Best for the Students': dilemmas and decisions in carrying out statutory assessment tasks, Assessment in Education, 2, 39-52.

Perrenoud, P. (1998). From Formative Evaluation to a Controlled Regulation of Learning Processes. Towards a wider conceptual field. Assessment in Education. 5 (1), 85-102.

Rowe, J.R. & Hill, P.W. (1996) Assessing, Recording and Reporting Students' Educational Progress: the case for ‘subject profiles', Assessment in Education, 3, 309-352.

Stecher, B. (1998) The Local Benefits and Burdens of Large-scale Portfolio Assessment, Assessment in Education, 5, 335-352.


Appendix F

Goals of Seminar 1 Discussant Presentation (John Gardner)

1. Understanding of TA
- Classroom-based
- Over period of time
- Articulation of teachers’ knowledge of their students
- Individualistic-ish
2. Review of Reliability and Validity of TA
- Perceptions:
    untrustworthy
    biased
    not as good as ‘tests’
    generous
- Evidence:
    mixed but TA performance improved when . . .
        fine specification of criteria
        uniformity of task specification
        thorough knowledge of curriculum and learning goals
        in-depth coverage of topic
        participation in developing the criteria
        bias toward student characteristics is addressed? – gender, SEN, behaviour
- Thoughts of Participants:
(a) teachers’ understanding of FA crucial – changes relationship and role of Summative Assessment
(b) must challenge teachers’ beliefs
trust
worthiness of TA and ‘tests’
irony of HE TA
irony of teacher role in GCSE etc
-training
-moderation
(c) are teachers ready for a strong role in Summative Assessment?
do we know how to prepare them? – Professional Development
(d) high stakes: tests ‘loom large’
closer specification of curriculum
more intrusive on/directive of pedagogy
more erosive of time
latest ‘definitions’ AFL?
(e) change process:
our warrant/authority for change?
what to do/how to do it?
how sustain?


 

© ARG 2004

Last update: 28 March 2004