Please note that updates to patient records are applied via a 'difference mask' strategy, which enables an audit trail to be left and mistakes to be contained. It is dangerous to edit the 'active' file, as errors are difficult to spot and undo and in any case the changes will be lost if the file is regenerated in the future. The correct way to apply updates is as follows:
With the exception of second neoplasm sites, ALLC second remission dates and w.b.c. measurements (below), '0' or 'blank' always means 'not (yet) available'.
Ideally, dates are represented as a 7- or 8-digit number representing DMMYYYY or DDMMYYYY. Sometimes the day is not available, in which case please just put MYYYY or MMYYYY (ranged right).
Sometimes, too, only the year is available, in which case please just put YYYY (ranged right).
It isn't necessary to guesstimate the missing digits, which can confuse trialists who didn't supply them (missing information is automatically 'best-guessed' by the software when doing analyses).
Because there is a great deal of 'heritage' (pre-2000) information currently on file awaiting update, a 2-digit year 'YY' would currently be interpreted as '19YY'.
There is a special convention for 'date first remission' in the ALLC overview: because there is no separate 'yes/no' variable, '-1' denotes 'definitely no CR' and '-2' denotes 'CR on unknown date'.
When placing dates in the text area, please include leading underscores where necessary to reduce the risk that an incomplete date might be misconstrued as something else, e.g. __MMYYYY or ____YYYY.
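The date conventions above can be illustrated with a short sketch. It is not the overview software's own routine: the function name `parse_partial_date` and the (day, month, year) tuple it returns are assumptions made purely for illustration. It does not handle the special ALLC codes '-1' and '-2', and does not check that days and months are in range.

```python
from typing import Optional, Tuple

# (day, month, year); a part that was not supplied is None
PartialDate = Tuple[Optional[int], Optional[int], Optional[int]]


def parse_partial_date(raw: str) -> Optional[PartialDate]:
    """Interpret a date written as DDMMYYYY, MMYYYY, YYYY or YY.

    Leading underscores (as used in the text area) are ignored.
    Returns None for '0' or blank, i.e. 'not (yet) available'.
    """
    digits = raw.strip().lstrip("_")
    if digits == "":
        return None
    if not digits.isdigit():
        raise ValueError(f"not a plain date code: {raw!r}")
    if int(digits) == 0:
        return None

    n = len(digits)
    if n == 2:
        # A 2-digit year 'YY' is currently read as '19YY'
        return (None, None, 1900 + int(digits))
    if n == 4:
        return (None, None, int(digits))
    if n in (5, 6):
        # MYYYY or MMYYYY: the year is always the last four digits
        return (None, int(digits[:-4]), int(digits[-4:]))
    if n in (7, 8):
        # DMMYYYY or DDMMYYYY
        return (int(digits[:-6]), int(digits[-6:-4]), int(digits[-4:]))
    raise ValueError(f"ambiguous date code: {raw!r}")
```

For example, `parse_partial_date("__031999")` gives `(None, 3, 1999)`: the missing day is left as missing rather than guessed, in keeping with the advice above.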
'0' means 'not (yet) available', usually because the trialist neglected to say what the patient died of. '12' means 'unknown' in the sense that the trialist has tried and failed to find out what the patient died of.
Unless the trialist says 'unknown', please don't put in code '12' as this tells us to abandon hope of pursuing it further. If the trialist says 'unknown but no evidence of (overview) disease at death', please put '10' (i.e. non-disease cause) and enter something like "NED (d. unknown)" in the text field (as described below).
'9' really means 'none of the other categories' whereas '10' just means 'not the overview disease', either because that is all that we know or sometimes because more than one death cause is supplied (in which case please enter the death cause text as described below, so that A.N. Expert can adjudicate later).
[Note added for BC 2000 overview: there is now a proliferation of additional codes for uncertain cases - please consider the coding scheme accordingly].
[Note added for BC 2005 overview: there is now a proliferation of additional codes for uncertain cases - please consider the coding scheme accordingly].
A site code '199' means 'not (yet) supplied' (same ethos as '0', which could be mistaken for 'no additional neoplasm' if no date is supplied). A site code '1999' means 'unknown' in the sense that the trialist has tried and failed to find out the site.
If the measurement is actually '0' please put '-13'! Otherwise, it would be mistaken for 'missing'. Sorry, this is for historical reasons.
If the measurement is actually '0' please put '-1'. Otherwise, it would be mistaken for 'missing'.
There is an infinitely-extensible space to put text, e.g. patient name, birth date, hospital number, N.H.S. number, life events, causes of death, multiple ICD codes, sundry comments etc etc. A convention is applied, mainly to make it easier to scan files by eye but also in some cases to enable software to recognise the information (e.g. birth dates and multiple ICD codes), that these items are entered in the following way:
Order of multiple items, where available (suggestion only):
1. | Patient name | Given Names Surname (Maiden Surname) |
2. | Birth date | (b. DDMMYYYY) or (b. __MMYYYY) or (b. ____YYYY) |
3. | Hospital number | (XYZ456789A) |
4. | N.H.S. number | NHS[ABC/123] |
5. | Additional ER measurement | ER[code] |
6. | Additional PR measurement | PR[code] |
7. | Sentinel nodes | SN[nodes] where nodes might be a count (n), a fraction (n/n) or a percentage (n%) |
8. | HER-2 assay | HER2[type of test,HER-2 measurement] |
9. | FISH assay (HER-2/CEP-17 ratio) | FISH[type of FISH,FISH measurement] |
10. | CISH assay | CISH[type of CISH,CISH measurement] |
11. | Life events, e.g. additional malignancies, serious diseases | (event, DDMMYYYY) or event (DDMMYYYY) or (event, __MMYYYY) etc |
12. | Multiple ICD codes for additional malignancies | 3MAL[n,NNN,DDMMYYYY,n,NNN,DDMMYYYY] or 3MAL[n,NNN.N,DDMMYYYY,n,NNN.N,DDMMYYYY] etc, where n is 7, 8, 9 or 10 |
13. | If died definitely 'without disease', i.e. recurrence-free | NED (or NSR, if the trialist prefers) |
14. | Cause of death | (d. cause, cause) or (d. 1a. cause, 1b. cause, 2. cause) |
15. | Multiple ICD codes for death | ICDn[NNN,NNN] or ICDn[NNN.N,NNN,NNN.N] etc, where n is 7, 8, 9 or 10 |
16. | Other remarks | emigrated to Australia; flagged |
17. | Text supplied by trialist, where strange | TEXT[strange illegibilia in tongues] |
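To show why the bracket conventions in the table matter to software, here is a minimal sketch that picks the birth date and the death ICD codes out of a text field. The function names and regular expressions are assumptions for illustration only, not the patterns used by the actual programs, and the sample record in the usage note is invented.

```python
import re
from typing import List, Optional, Tuple

# (b. DDMMYYYY), (b. __MMYYYY) or (b. ____YYYY): always eight characters
BIRTH_RE = re.compile(r"\(b\.\s*([_\d]{8})\)")

# ICDn[code,code,...] where n is 7, 8, 9 or 10
ICD_RE = re.compile(r"ICD(10|7|8|9)\[([^\]]*)\]")


def find_birth_date(text: str) -> Optional[str]:
    """Return the raw birth date string, underscores included, or None."""
    match = BIRTH_RE.search(text)
    return match.group(1) if match else None


def find_death_icd_codes(text: str) -> List[Tuple[int, List[str]]]:
    """Return a list of (ICD revision, [codes]) for each ICDn[...] item."""
    results = []
    for match in ICD_RE.finditer(text):
        revision = int(match.group(1))
        codes = [c.strip() for c in match.group(2).split(",") if c.strip()]
        results.append((revision, codes))
    return results
```

With the invented text `"A. Patient (b. __031945) NHS[ABC/123] NED (d. pneumonia) ICD9[486,428.0]"`, `find_birth_date` returns `"__031945"` and `find_death_icd_codes` returns `[(9, ["486", "428.0"])]`. A `3MAL[...]` item is deliberately not picked up by the death-code pattern.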
There is a program ICDADD which will append the text associated with ICD codes to the end of the record, where required, e.g. before printing out raw data for inspection. However, this can make the record too wide for some text editors and needs to be applied with care. It is also a pain if the ICD code is subsequently changed, as the text has to be stripped out again; for these reasons, ICDADD is not applied to the central data set.
To add a new centre/group
- Select the next free serial number in LIST
- Allocate a slot in the relevant filing cabinet in the Overviews Office
- Create a project file LBCnn, where nn is the centre/group number (the file template is LBC) and add in whatever details are available, including main personages and their contact details, general information and relevant publications; begin the calendar of events and action items (e.g. 'get data')
- Add the main personages' contact details to ZAA
- Add a banner line in LIST, including the centre/group name, country, main personages and date
- Add details for at least one trial/stratum, as described below
- Add a generic entry to BRIEF
- (Optionally) run TERSE and RGBRIEF to update the YLIST and RGBRIEF files
- Run RUGRAT and put the print-out into the filing cabinet together with any other related bumf
To add a new trial/stratum
- Select the next free serial number for the relevant centre/group in LIST
- Add a banner line in LIST, together with at least two treatment groups - (Unknown) suffices as a place-holder
- Add a corresponding entry to BC:PLANET, if relevant, and check that the treatment agent descriptions in LIST are all 'generic', follow the usual machine-readable precepts and are represented in JG$DK:PLANET
- Add entries to BRIEF, FINISH, GUESS, LURCHE, SERIAL and UPDATE, insofar as possible and where relevant
- If data are available, process and add to the database (as described below)
- Add appropriate entries to EXTRAS - at the very least, 'trials with no data' or 'rejects'
- (Optionally) run TERSE and RGBRIEF to update the YLIST and RGBRIEF files
- Run RUGRAT and put the print-out into the filing cabinet together with any other related bumf
- Make sure that all of the treatment comparisons are added to the relevant steering files for subsequent analysis
To add new data
- Normally, add a data conditioning module to TRAUMA and iterate with STARE, INQUIST and LIFE until the new data set is as close to perfect as can be achieved without pestering the trialist. Sometimes it is more convenient to write an ad hoc program than to use TRAUMA. In desperate cases, some help might be found by using external tabulation packages. If no individual patient data are available, although some tabular information has been obtained - typically from publications or reports - the module or program would instead generate a set of synthetic patient records to enable the analysis to do its best with whatever information is available
- Should any questions arise, or any data errors be spotted, make a carefully-considered list of key points and edited highlights from the STARE and perhaps other print-outs to refer back to the trialist. It is paramount to 'measure twice and cut once' because most trialists and data managers tend to respond best to a few very clearly-framed and minimalist questions - if at all!
- Amend all the relevant details in BRIEF, EXTRAS, FINISH, GUESS, LBCnn, LIST, LURCHE, SERIAL, UPDATE and other files mentioned in the 'to add a new trial/stratum' section (above)
- (Optionally) run TERSE and RGBRIEF to update the YLIST and RGBRIEF files
- Run INQUIST, LBASIC, LIFE, RUGRAT and STARE and put the print-outs into the filing cabinet together with any other related bumf. Send carefully-selected highlights to the relevant trialist(s) and/or data manager(s) unless questions have arisen which need to be dealt with first
To update existing data
- The procedure follows the same general course as described in the 'to add new data' section (above), except that the new and existing records for each patient are carefully compared. This enables the quality of all baseline variables to be maintained and helps to prevent errors in the new data from degrading existing, usually well-verified results. Please see the Introduction (above)
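The comparison of new and existing records might be sketched as follows. This is only an illustration of the general idea, not the actual update software: the function name, the field names and the dictionary layout are all hypothetical. It assumes that a new value of '0' or blank means 'not (yet) available' and so should never overwrite an existing value; it ignores the exceptions to that rule (second neoplasm sites, ALLC second remission dates and w.b.c. measurements).

```python
from typing import Dict, Tuple

Record = Dict[str, str]

# Values that mean 'not (yet) available' for most variables
MISSING = {"", "0"}


def diff_records(existing: Record, incoming: Record) -> Dict[str, Tuple[str, str]]:
    """Return {field: (old value, new value)} for fields that would change.

    Fields where the incoming value is missing are skipped, so that an
    incomplete resupply cannot blank out well-verified existing data.
    """
    changes = {}
    for field, new_value in incoming.items():
        new_value = new_value.strip()
        old_value = existing.get(field, "").strip()
        if new_value in MISSING:
            continue  # nothing supplied; keep whatever is already on file
        if new_value != old_value:
            changes[field] = (old_value, new_value)
    return changes
```

The resulting list of changes can then be inspected by eye before anything is applied, which is the point of comparing rather than overwriting.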
Deleting items
The main watchword is 'DON'T'. The nuisance factor of past deletions lives on as a perpetual reminder and can entail the unwelcome tedium of groping through dusty old boxes of papers and near-unintelligible old computer files and media whilst trying to ascertain the provenance of various latter-day data sets and the associated decisions and actions taken down the ages. If the decision to remove a centre/group, trial/stratum or data set is imminent, please consider the alternatives first:
- Centre/group: sometimes it is necessary to rationalise a collection of trials/strata which have been allocated to different centres/groups, or a centre/group might be subsumed by another or even cease to exist. Even if there is no possibility of data becoming 'orphaned' when doing so, please consider instead 'copying' all the relevant data and associated information into the new centre/group (don't forget to make certain that the trial/stratum identifiers on the patient records are all changed correctly), then marking the 'dead' trials/strata as 'rejects' (with appropriate pointers to where the 'live' versions can now be located) and leaving them intact, together with the old project file (LBCnn) and the other associated materials. If the bumf in the filing cabinet is moved and renumbered, please leave a note in the old slot too.
- Trial/stratum: sometimes it is necessary to delete a duff stratum during the course of improving the description of a trial, often whilst its data set is being rebuilt. This isn't too bad, unless the serial number is subsequently recycled to refer to a different trial. Normally the number of strata increases during this process, so the problem can easily be avoided. Conversely, if a complete trial is to be removed for any reason, however 'nonexistent', 'ineligible' or 'duplicated' it might be, STOP! There is a real danger that someone in the future will 'rediscover' it and waste a great deal of time putting it back into the system before finally arriving at the same conclusions. Please just leave it in situ and mark as 'reject' with an explanatory note (as described below).
- Patient record: of course, the numbers of patients in data sets can go down as well as up - for example, some might prove not to have been properly randomised. Beware when making subsequent updates that such patients are not inadvertently reintroduced just because their information is resupplied by the trialist!
Excluding or 'hiding' items
When a trial/stratum is to be excluded, please note that there are three options in EXTRAS:
- Reject studies: these are truly rubbish (e.g. empty trials, duplications, forgeries etc etc).
- Studies with dubious randomisation: 'nonstandard' is a polite but meaningless word used in LIST to avoid possible offence should a trialist see it. These include trials with the possibility of foreknowledge of the treatment allocation, including 'date-of-birth' or 'odd-even' allocations, and trials with significant numbers of missing patients or other serious imbalances between treatment allocations or prognostic variables (whether as a result of deliberate 'exclusions' by the trialists or by mistake or chance, unless exhaustively investigated). Some of these trials are otherwise well-conducted and the data might be useful for incidence and other epidemiological analyses where the allocations are not relevant.
- Studies not eligible for overview: these are otherwise properly-conducted randomised trials addressing questions not currently of interest to the politburo, e.g. DCIS, advanced disease, some primary treatments and various non-fatal endpoints. Trials which would be ignored only on account of their arcane treatment comparisons are kept as 'eligible', but they are relegated to appropriately arcane steering files such as BAG.
A trial/stratum may, of course, be marked as a 'reject' in more than one of the above categories. It is extremely helpful to append a brief explanatory phrase to each 'reject' item in the comments field of EXTRAS.
Updating LIST
As you will see, there are various conventions in LIST to enable software to read it:
- The centre/group entry line consists of the centre/group's name together with, optionally, the city and, mandatorily, the country, with a section at the end enclosed in {curly brackets} indicating the principal personages and the date any records were last modified, e.g.:
- 11 Texas University, U.S.A. {Buzdar, Gutterman, Vassilopolou-Sellin: DEC-1994 (SECOND REVISED VERSION)}
- The trial/stratum entry line consists of the trial/stratum's name with, optionally, the principal personages enclosed in (normal brackets) and followed by a colon. After the colon are listed entry criteria, prognostic factors and other remarks including the range of entry dates in MMM-YYYY format, each item separated by semicolons. At the end is a section enclosed in (normal brackets) indicating the number of patients and where we don't have data. Further optional comments in {curly brackets} can be selectively hidden when viewing or printing the file using CUTTER and other programs, e.g.:
- 1101 Study 77-30A (Buzdar): Entry JUN-1977 to JUL-1980; N+; 2-Way (141 patients were entered)
- The treatment entry lines consist of the "full names" of all agents together with doses [in square brackets] and other details (in normal brackets). The "full names", known proprietary names and "official abbreviations" are listed in JG$AK:PLANET. The point of this is to avoid any possibility of ambiguity (if required, a "compact" version of LIST with abbreviations can be generated using TERSE). Where we really don't know whether an agent was Prednisone or Prednisolone, or Vincristine or Vinblastine, or 5-Fluoro-Uracil or Ftorafur for example, it is necessary to fall back on the "official abbreviation" (Pr, V and F respectively) until the ambiguity can be resolved, e.g.:
- 1 (5-Fluoro-Uracil + Doxorubicin + Cyclophosphamide) x 2yr
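As a rough illustration of how the bracket conventions make LIST machine-readable, the following sketch parses the centre/group and trial/stratum entry lines shown above. The regular expressions, function names and dictionary keys are assumptions made for this example; they are not taken from CUTTER, TERSE or any other program mentioned here, and real LIST entries may well contain cases that this sketch does not cover.

```python
import re
from typing import Dict, Optional

# serial, name (with city/country), then {personages: date modified}
CENTRE_RE = re.compile(
    r"^\s*(?P<serial>\d+)\s+(?P<name>.*?)\s*"
    r"\{(?P<people>[^:}]*):\s*(?P<modified>[^}]*)\}\s*$"
)

# serial, name, optional (personages), colon, items separated by ';',
# optional final (patient numbers) and optional {hidden comments}
TRIAL_RE = re.compile(
    r"^\s*(?P<serial>\d+)\s+(?P<name>[^(:]+?)\s*"
    r"(?:\((?P<people>[^)]*)\))?\s*:\s*"
    r"(?P<body>.*?)\s*"
    r"(?:\((?P<patients>[^)]*)\))?\s*"
    r"(?:\{(?P<hidden>[^}]*)\})?\s*$"
)


def parse_centre_line(line: str) -> Optional[Dict[str, object]]:
    """Parse a centre/group entry line; return None if it does not fit."""
    m = CENTRE_RE.match(line)
    if m is None:
        return None
    return {
        "serial": int(m.group("serial")),
        "name": m.group("name"),
        "personages": [p.strip() for p in m.group("people").split(",")],
        "modified": m.group("modified").strip(),
    }


def parse_trial_line(line: str) -> Optional[Dict[str, object]]:
    """Parse a trial/stratum entry line; return None if it does not fit."""
    m = TRIAL_RE.match(line)
    if m is None:
        return None
    return {
        "serial": int(m.group("serial")),
        "name": m.group("name"),
        "personages": m.group("people"),
        "items": [s.strip() for s in m.group("body").split(";") if s.strip()],
        "patients": m.group("patients"),
        "hidden": m.group("hidden"),
    }
```

Applied to the trial/stratum example above, `parse_trial_line` yields the serial number 1101, the name "Study 77-30A", the personage "Buzdar" and the three items "Entry JUN-1977 to JUL-1980", "N+" and "2-Way", with the patient note kept separately.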
Often a centre/group has sent us a dataset from their 'local' part of a multi-centre trial, or at least we have made a placeholder in the hope of obtaining data from them. Similarly, we might have received a dataset from the coördinating centre/group for the multi-centre trial, consisting typically of records collected from various participating centres/groups. What to do when we have both? The safest answer is probably to operate 'parallel universes', as follows. Datasets supplied directly from participating centres/groups are catalogued and stored under serial numbers appropriate to each centre/group, as would normally be done with single trials. Datasets supplied from the coördinating centre/group are catalogued and stored under serial numbers appropriate to the coördinating centre/group with (if possible) a separate stratum for each participating centre/group.
The advantages of holding datasets of different provenance in parallel include the ease of tracking and updating (one deals directly with whoever supplied each set) and the possibility of comparing the results and using the 'better' data in analyses. Of course, if it is possible to compare paired patient records between the datasets a 'best of both' can be constructed.
By harmonising the 'year numbers' (in SERIAL) and the 'brief names' (in BRIEF) one can represent the various parts of a multi-centre trial as a single entity in forest plots, if required, selecting which datasets to use via the 'reject' option (in EXTRAS).
[End of document, updated to 27 July 2006]