The anatomy of an ADNI table
What exactly does an ADNI data table look like? Some fields are universal to almost all tables, and every table is going to have some number of unique fields. In general, we can subdivide the fields in any given table into three categories:
Administrative fields contain information that is important for internal administrative purposes.
The second category of fields are those that represent the content of a table. These are the fields that are specific to each table and contain the bulk of useful information.
Finally, we have structural identifiers. These fields contain important information on the longitudinal and cross-sectional structure of the observations contained in the tables.
These include participant identifiers - fields that indicate which participant an observation corresponds to, and longitudinal identifiers - fields that indicate exactly when an observation took place.
To demonstrate this breakdown, here is an example of a (simulated) ADNI data table, using a subset of fields from the participant demographics table.
As we will see shortly: PHASE, VISCODE, VISCODE2, and VISDATE are all longitudinal identifiers (structural identifiers), and the RID and PTID fields represent two different sets of participant identifiers (structural identifiers).
The fields here that represent actual demographic information (content) are PTGENDER, PTEDUCAT, PTETHCAT, and PTRACCAT – which give us participant gender, educational attainment in years, self-identified ethnic identity, and self-identified racial identity, respectively.
The final two fields on the left, ID and USERDATE, are administrative fields. The table can be broken down as follows:
Before moving on from this example, there is one detail that we should be mindful of. If we consult the data dictionary for the PTRACCAT field in the ADNI1 phase, we see that the field shows us a participant’s (self-identified) racial category, and is coded using the following scheme:
Code | Definition |
---|---|
1 | American Indian or Alaskan Native |
2 | Asian |
3 | Native Hawaiian or Other Pacific Islander |
4 | Black or African American |
5 | White |
6 | More than one Race |
7 | Unknown |
Taking our example table above, one of the entries does not match up with this table.
In this case, the -4 indicates missing data. There are other similar cases where missing data may be denoted with a -1, or with an empty string for fields that are coded as characters. In all such cases, analysts should be careful to ensure that there are no indicators of missing data mixed up with valid measurements.
A preliminary frequency table is good practice for categorical variables, to identify any recorded outcome that does not match the possible cases in the codebook.
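As a minimal sketch of that check, the snippet below tabulates a simulated PTRACCAT column using only Python's standard library and flags any recorded value that is absent from the codebook above. The values in `ptraccat` are made up for illustration, and treating -4, -1, and the empty string as the only missing-data indicators is an assumption based on the cases described above, not an exhaustive rule.

```python
from collections import Counter

# Valid PTRACCAT codes, taken from the ADNI1 coding scheme shown above.
VALID_PTRACCAT = {1, 2, 3, 4, 5, 6, 7}

# Missing-data indicators mentioned above (assumed, not exhaustive).
MISSING_CODES = {-4, -1, ""}

def frequency_table(values):
    """Count how often each recorded value occurs."""
    return Counter(values)

def unexpected_codes(values, valid_codes):
    """Return the recorded values that do not appear in the codebook."""
    return sorted(set(values) - set(valid_codes), key=str)

def recode_missing(values, missing_codes=MISSING_CODES):
    """Replace missing-data indicators with None."""
    return [None if v in missing_codes else v for v in values]

# Simulated column, for illustration only.
ptraccat = [5, 5, 4, -4, 2, 5, 7]

print(frequency_table(ptraccat))
print(unexpected_codes(ptraccat, VALID_PTRACCAT))  # [-4]
print(recode_missing(ptraccat))
```

Running the frequency check before any recoding keeps unexpected values visible, so that a code which is neither valid nor a known missing-data indicator is not silently dropped.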
With that important usage note out of the way, let us review the more common examples of administrative and structural fields that recur throughout the data set.
Administrative fields
These fields have only niche uses for investigators working with the data set, such as identifying data quality issues and sorting out particularly unexpected observations.
These fields are present in almost all of the tabular ADNI data.
USERDATE and USERDATE2
These fields appear in records that are generated from the clinical electronic data capture (EDC) system at sites. USERDATE records the date on which the record was generated (physically entered in the EDC by site staff), and USERDATE2 records the date on which the record was updated.
These dates have NO consistent relationship to the dates on which the clinical observation recorded in the row took place; those are captured in different fields, which are expounded on in the section on structural fields.
ID and record_ID
These columns of numbers are used internally to match records in the data released on LONI with records in the EDC system. They are not participant identifiers and should not be used as such.
update_stamp
This date is automatically generated by the record-keeping system and represents the most recent build date of the table. It should not be used in analysis.
Structural Fields
These are the fields that are ubiquitous throughout the data set and are very useful for keeping everything lined up neatly. In many ways, they are the skeleton of the data set - providing the scaffolding along which all of the other information is arranged.
Participant identifiers
One ubiquitous aspect of ADNI tables is the presence of fields that uniquely identify every study participant, allowing their data to be collated across multiple domains. There are two common and closely related fields.
The first is RID, or “roster ID”. This is the most common participant identifier across different tables and should be the go-to participant identifier for most analyses. It is formatted as a 1-5 digit number, with different blocks of numbers corresponding to the first phase in which the participant enrolled.
Phase | RID block |
---|---|
ADNI1/GO | [1, 2500] |
ADNI2 | [4000, 5500] |
ADNI3 | [6000, 7500] |
ADNI4 | [10000, 11500] |
RIDs are assigned to participants on screening and follow them throughout their entire time in the study – including rolling over between different phases.
If a participant with RID 125 has data available for the ADNI4 phase, then we know that they have been continuously enrolled since ADNI1/GO. Similarly, a participant with RID 10342 must have initially enrolled in the ADNI4 phase.
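The block lookup described above can be sketched as a small helper. The function name `first_phase_from_rid` is hypothetical, and returning `None` for an RID that falls outside every listed block is an assumption about how to handle values the table does not cover.

```python
# RID blocks by first enrollment phase, copied from the table above.
RID_BLOCKS = [
    ("ADNI1/GO", 1, 2500),
    ("ADNI2", 4000, 5500),
    ("ADNI3", 6000, 7500),
    ("ADNI4", 10000, 11500),
]

def first_phase_from_rid(rid):
    """Return the phase whose RID block contains `rid`, or None if no block matches."""
    for phase, low, high in RID_BLOCKS:
        if low <= rid <= high:
            return phase
    return None

print(first_phase_from_rid(125))    # ADNI1/GO
print(first_phase_from_rid(10342))  # ADNI4
print(first_phase_from_rid(3000))   # None (not in any listed block)
```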
The other commonly used identifier is PTID, which is the identifier that is widely used in imaging data.
It is formatted as ###_S_####, where the first 3 digits correspond to the site at which the participant was first enrolled, and the final digits are the participant’s RID.
For example, a PTID of 067_S_4750 indicates that the participant initially enrolled at site 67, and that their RID is 4750. From this information, we also know that they initially enrolled in the ADNI2 phase as that RID belongs to the block assigned to new ADNI2 participants.
In some tables, the field containing PTID information may be titled as something else - for instance, subject.ID or SID – but the formatting should make it perfectly clear that it is equivalent to a PTID.
Note also that the leading digits of the PTID, which give a participant’s initial site, do not follow the same coding system as the SITEID field.
The ROSTER.csv file explicitly links the two identification systems; however, many tables, including the registry, have both PTID and RID columns present and can be used as a crosswalk between the two schemas. PTIDs can also be quickly translated into RIDs by extracting the trailing digits.
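Extracting the site and RID from a PTID could look something like the sketch below. The helper name `parse_ptid` is hypothetical, and the pattern accepts any number of trailing digits, which is an assumption made so that five-digit RIDs are also handled.

```python
import re

# Three site digits, the literal "_S_", then the RID digits.
# Accepting one or more trailing digits is an assumption (see the lead-in).
PTID_PATTERN = re.compile(r"^(\d{3})_S_(\d+)$")

def parse_ptid(ptid):
    """Split a PTID into (initial site number, RID); return None if the format does not match."""
    match = PTID_PATTERN.match(ptid.strip())
    if match is None:
        return None
    site, rid = match.groups()
    return int(site), int(rid)

print(parse_ptid("067_S_4750"))  # (67, 4750)
print(parse_ptid("not a ptid"))  # None
```

Returning `None` rather than raising makes it easy to scan a column titled something else, such as subject.ID, and count how many entries actually follow the PTID format.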
There are other identifiers that appear throughout the data - for example, some biofluids analysis results from external labs may include sample identifiers - however, at least one of these two fields should be present in every table.
Longitudinal identifiers
ADNI is a very large and complex longitudinal study that has, at times, straddled the line between a designed experiment and an observational study.
There are scheduled follow-up visits; however, many factors can lead to individual “visits” being broken up over longer time spans or occurring out of step with the idealized schedule. These factors include scheduling concerns for both sites and participants, issues that require the clinical or imaging components to be separated, and logistic constraints that arise from the scale of the study and the variety of data collected from participants.
In ADNI, there are several fields used in different tables to record information about time. A summary of the most common longitudinal identifiers is presented in the table below:
Field | Description |
---|---|
Phase | The clinical protocol under which the observation was collected. Occasionally rendered in all caps as PHASE |
VISCODE | The scheduled visit that corresponds to the actual visit. This field has no relation to the actual time of the visit relative to baseline. These codes are specific to each phase of the study, and that needs to be taken into account when working with them. |
VISCODE2 | An alternative to the VISCODE system that attempts to maintain continuity across phases. For ADNI4 and going forward, this is the time of the in-clinic visit, in months since baseline, rounded to the nearest 6 months. |
EXAMDATE | The date of the in-clinic procedure. Note that different EXAMDATEs could correspond to the same VISCODE as a result of scheduled visits being broken up into multiple actual visits. |
VISDATE | Synonymous with EXAMDATE |
COLDATE | Synonymous with EXAMDATE |
COLPROT | “Collection protocol”; this is analogous to the Phase field, but tends to show up more in imaging data where imaging protocols are occasionally uncoupled from clinical protocols. |
ORIGPROT | “Original protocol”; Some tables keep track of the initial phase in which participants were enrolled using this field. It should not be confused with Phase or COLPROT. |
USERDATE | The date at which the clinical record was physically entered into the EDC system by site personnel. |
USERDATE2 | If the record was updated at any point after its initial upload, this records the date of the last update. |
An eagle-eyed reader may have noticed that the table was presented as ‘a summary of the most common longitudinal identifiers’ used in the ADNI data set. This is not an exhaustive list.
A particularly handsome and sharp reader may also note that these myriad fields fall into one of two categories - dates, and things that aren’t dates.
As is the nature of most things that are different - cats and umbrellas, steamed rice and the platonic ideal of the tetrahedron, children’s clothing stores and the nation-state of Honduras – dates and things that are not dates offer different sorts of utility:
Dates are useful for statistical analysis.
Longitudinal identifiers that are not dates are useful for merging disparate tables, identifying missing data, and other niche tasks.
Dates
Dates, or ‘continuous time’ in statistical parlance, are everywhere in the ADNI data set. Some of them are useful, others are vestigial, and some can be quite deceptive.
There are more than 95 distinct fields in use across the ADNI data set that include the word “DATE” in their name. Some of these are unique to their particular tables, and others are shared.
Since there are so many different dates, it is important for us to understand what they are and how they relate to each other. In designing any analysis using ADNI data, whether cross-sectional, longitudinal, or something else entirely, it is critically important to understand what the various date fields reference, how they relate to each other, and how to use them.
EXAMDATE and VISDATE
These are the most common date fields that appear across ADNI tables. They are also among the trickiest, as different tables use them differently.
As it appears in the REGISTRY table, EXAMDATE refers to the date of a participant’s clinical examination - that is, the date on which the ADNI participant physically visited the ADNI site and completed their clinical assessment battery, or did so over the phone for phone follow-ups.
Note that not all of a participant’s clinical assessments are necessarily conducted on the same date. This is a particular concern when considering data from across multiple clinical tables.
For non-clinical assessments, the usage of this field becomes very vague. For example, in certain biofluid analysis results tables the EXAMDATE field corresponds to the date at which the sample was collected. Due to a variety of factors, it is not always possible for a participant to complete a lumbar puncture on the same date as their clinical assessments, and so this can differ significantly from the EXAMDATE that appears in clinical tables. Similarly, in some imaging tables, EXAMDATE refers to the date on which a PET scan was completed.
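One way to see how far apart these dates fall is to compare the EXAMDATE recorded for the same participant and visit in two tables. The sketch below uses simulated rows; the RIDs, dates, and the helper name `examdate_gaps` are all made up for illustration, and real tables would need to be loaded and keyed according to their own data dictionaries.

```python
from datetime import date

# Simulated rows, for illustration only: (RID, Phase, VISCODE, EXAMDATE).
clinical = [
    (9991, "ADNI3", "bl", date(2018, 3, 5)),
    (9992, "ADNI3", "bl", date(2018, 6, 11)),
]
biofluid = [
    (9991, "ADNI3", "bl", date(2018, 3, 5)),
    (9992, "ADNI3", "bl", date(2018, 7, 23)),
]

def examdate_gaps(table_a, table_b):
    """Days between EXAMDATEs for rows sharing the same (RID, Phase, VISCODE)."""
    dates_a = {(rid, phase, vis): d for rid, phase, vis, d in table_a}
    gaps = {}
    for rid, phase, vis, d in table_b:
        key = (rid, phase, vis)
        if key in dates_a:
            gaps[key] = (d - dates_a[key]).days
    return gaps

for key, days in examdate_gaps(clinical, biofluid).items():
    print(key, days)
```

In this simulated case the second participant's sample was collected 42 days after the clinical visit, which is the kind of gap that is easy to overlook if EXAMDATE is assumed to mean the same thing in every table.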
SCANDATE, Study.Date, and other imaging dates
These fields appear across ADNI tables and are also associated with imaging data downloaded from the IDA. These dates correspond to the date on which a particular imaging procedure was carried out. The use of “Study.Date” in place of SCANDATE is in reference to the DICOM hierarchy for imaging data.
Similarly to clinical assessments, imaging regimens are often broken up across several days - particularly in the case of PET where imaging with multiple radiotracers is sometimes required. For MR imaging, this most often occurs when different MR modalities are required across a span of days.
For researchers who are interested in working with both PET and MR data, it is important to understand that PET and MR scans are not always acquired during the same visit and may be separated by significant spans of time.
Specific imaging schedules can vary across phases of ADNI, participant diagnostic categories, and the phase at which the participant initially entered the study. For more information, we recommend that prospective investigators consult the participant schedules of events that can be found in the ADNI procedures manuals.
RUNDATE
This field occurs in two very different contexts, imaging and biofluids, and it’s important not to conflate the two when merging tables.
In the context of biofluids, this generally refers to the date on which the analytical run of the sample was performed, but this can vary - it’s important to consult the data dictionary and documentation for any specific instances.
In the context of imaging this is generally in reference to the date on which an analysis of an image was conducted - for example, the date on which a computational segmentation routine was run.
Longitudinal Identifiers that aren’t Dates
Long-form Visit ID
This is an identifier that goes by many names and can be found in image data and some imaging-derived tables. Examples of these identifiers include “ADNI2 month 12 follow-up” and “ADNI3 screening MRI”, in all their fully spelled-out glory.
These identifiers are few and far between, and in general are equivalent to certain VISCODE entries. An exhaustive list of equivalences can be found in the Visits table.
VISCODE2
This is a ubiquitous identifier in much of the ADNI data set. The idea behind VISCODE2 is that it offers an alternative to the VISCODE system that is not phase-specific. The first version of the ADNIMERGE package used VISCODE2 exclusively.
VISCODE2s are formatted as follows: “sc” and “bl” denote screening and baseline visits respectively, and all subsequent VISCODE2s consist of the letter “m”, followed by a number. For example: ‘m06’, ‘m48’, and ‘m124’. In all of these cases, this is meant to reflect the number of months that have elapsed between enrollment and the visit being recorded.
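The format described above can be converted to a number of months with a few lines of code. This is a rough sketch: the helper name `viscode2_to_months` is hypothetical, mapping “bl” to 0 is an assumption made for convenience, and screening (“sc”) is left unmapped because it has no month value.

```python
import re

def viscode2_to_months(viscode2):
    """Map a VISCODE2 to months: 'bl' -> 0 (assumed), 'mNN' -> NN, anything else -> None."""
    code = viscode2.strip().lower()
    if code == "bl":
        return 0
    match = re.fullmatch(r"m(\d+)", code)
    if match:
        return int(match.group(1))
    # Screening has no month value; unrecognized codes are left unmapped.
    return None

print(viscode2_to_months("m06"))   # 6
print(viscode2_to_months("m124"))  # 124
print(viscode2_to_months("sc"))    # None
```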
Historically, VISCODE2 was assigned according to a matching system, and there were occasional foibles in this assignment. For ADNI4, VISCODE2 was changed to reflect the number of months elapsed since baseline, rounded to the nearest 6 months, a change which was applied retroactively in most tables.
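To illustrate what “rounded to the nearest 6 months” means in practice, here is a rough sketch of the arithmetic. It is not the procedure ADNI uses to assign VISCODE2: the average month length, the round-half-up rule, and the function name are all assumptions made for illustration.

```python
import math
from datetime import date

# Assumption: convert days to months using the mean month length.
AVERAGE_DAYS_PER_MONTH = 365.25 / 12

def months_since_baseline_rounded(baseline, visit):
    """Months elapsed between two dates, rounded to the nearest multiple of 6."""
    months = (visit - baseline).days / AVERAGE_DAYS_PER_MONTH
    # Round half up to the nearest multiple of 6 (assumed tie-breaking rule).
    return 6 * math.floor(months / 6 + 0.5)

baseline = date(2023, 1, 10)  # simulated dates, for illustration only
print(months_since_baseline_rounded(baseline, date(2023, 7, 20)))  # 6
print(months_since_baseline_rounded(baseline, date(2024, 1, 25)))  # 12
```

A visit that lands a few weeks off schedule still rounds to the same multiple of 6, which is what lets the resulting code stay comparable across participants.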
For more information on how VISCODE2s were assigned historically, please see the following documentation from the ADNI2 phase:
VISCODE
Before moving on, it is good to note that VISCODE2 offers a number of advantages over VISCODE in terms of usability.
VISCODE2 is self-consistent, independent of phase, usable as a standalone field, and includes an explicit reference to the timing of a visit in relation to the participant’s baseline visit in the study.
With that said:
VISCODE is the most common longitudinal identifier across all ADNI data. In the next section, we will endeavor to provide significantly more information than anybody has ever wanted to know about the VISCODE field.
What is a VISCODE?
The VISCODE is a direct reference to a participant’s original schedule of events, as laid out in the ADNI procedures manual.
These schedules have varied by phase, diagnostic category, and rollover status; however, the VISCODE itself has only varied by phase, although some VISCODEs may not exist for certain patients, depending on their individual schedule.
Again, bear in mind that these codes are phase-specific. Due to the use of overlapping VISCODEs in every stage outside of ADNI2, reconciling tables based on VISCODE alone may lead to erroneous merging - for example, ADNI1 and ADNI3 both use the ‘sc’ code to denote screening visits.
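The effect of omitting the phase can be seen in a small join. The sketch below attaches visit descriptions to simulated participant rows, first by VISCODE alone and then by Phase and VISCODE together. The RIDs and the `inner_join` helper are made up for illustration; the two visit descriptions are taken from the VISCODE tables later in this section.

```python
# Simulated participant rows, for illustration only.
records = [
    {"RID": 125, "Phase": "ADNI1", "VISCODE": "sc"},
    {"RID": 6010, "Phase": "ADNI3", "VISCODE": "sc"},
]

# Visit descriptions for the shared 'sc' code in two phases.
visits = [
    {"Phase": "ADNI1", "VISCODE": "sc", "Visit": "ADNI1 Screening"},
    {"Phase": "ADNI3", "VISCODE": "sc", "Visit": "ADNI Screening"},
]

def inner_join(left, right, keys):
    """Pair every left row with every right row that agrees on all `keys`."""
    merged = []
    for l_row in left:
        for r_row in right:
            if all(l_row[k] == r_row[k] for k in keys):
                # Left-hand values win when both rows share a column name.
                merged.append({**r_row, **l_row})
    return merged

by_viscode = inner_join(records, visits, ["VISCODE"])
by_phase_and_viscode = inner_join(records, visits, ["Phase", "VISCODE"])

print(len(by_viscode))            # 4: each record matched both phases' 'sc' rows
print(len(by_phase_and_viscode))  # 2: one correct description per record
```

Joining on VISCODE alone doubles the row count and attaches an ADNI3 description to an ADNI1 record, and the reverse, without raising any error.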
The interpretation of every possible phase/VISCODE combination is summarized in the tables below, along with long-form visit descriptions – specifically, those that appear in the ‘Visit’ field that is utilized in IDA imaging data.
ADNI1 and ADNIGO
There are two primary considerations with VISCODE in these two phases of the study:
1. Some of these VISCODEs are shared with ADNI3, so phase information is required to merge by VISCODE.
2. VISCODEs are shared between ADNI1 and ADNIGO. This is not something that occurs in other phases of the study.
To elaborate further on point number 2:
In the transition between ADNIGO and ADNI2, and also in the transition between ADNI2 and ADNI3, participant VISCODEs were “reset”. To give a concrete example:
If participant 9991 was enrolled in ADNI2 from the start of the phase to the very end, and rolled over into ADNI3, then the transition between their last ADNI2 visit (the Year 5 visit) and their first ADNI3 visit would be recorded like this:
RID | Phase | VISCODE |
---|---|---|
9991 | ADNI2 | v51 |
9991 | ADNI3 | init |
If participant 9992 had just completed their first screening visit prior to the end of ADNI2, then their ADNI2/ADNI3 transition would look like this: (note that v02 was a screening code used for new participants in ADNI2)
RID | Phase | VISCODE |
---|---|---|
9992 | ADNI2 | v02 |
9992 | ADNI3 | init |
In both cases, the first visit of ADNI3 is always coded as ‘init’, or ‘initial visit’. This is not true for ADNI1/ADNIGO VISCODEs. Consider the same example: participant 9991, who is enrolled in ADNI1 for two years and rolls over into ADNIGO:
RID | Phase | VISCODE |
---|---|---|
9991 | ADNI1 | m24 |
9991 | ADNIGO | m36 |
Now again, consider participant 9992, who completed their initial screening just prior to the start of ADNIGO, and then rolled over:
RID | Phase | VISCODE |
---|---|---|
9992 | ADNI1 | sc |
9992 | ADNIGO | bl |
With that in mind, the following table contains all VISCODEs and descriptions that were used in ADNI1/GO. Note that the ‘f’ code, used to denote screen fails, only occurs in ADNI1. Note also that a screening visit recorded under the ADNIGO phase must have occurred in ADNIGO; for any other ADNIGO VISCODE, it is ambiguous how much time in ADNI1 is accounted for unless the participant’s schedule is examined across both phases.
Phase | VISCODE | Visit |
---|---|---|
ADNI1 | f | ADNI1 Screening (Screen fail) |
ADNI1 | sc | ADNI1 Screening |
ADNI1 | bl | ADNI1 Baseline |
ADNI1 | m06 | ADNI1 Month 6 |
ADNI1 | m12 | ADNI1 Month 12 |
ADNI1 | m18 | ADNI1 Month 18 |
ADNI1 | m24 | ADNI1 Month 24 |
ADNI1 | m30 | ADNI1 Month 30 |
ADNI1 | m36 | ADNI1 Month 36 |
ADNI1 | m42 | ADNI1 Month 42 |
ADNI1 | m48 | ADNI1 Month 48 |
ADNI1 | m54 | ADNI1 Month 54 |
ADNI1 | uns1 | Unscheduled Visit |
ADNIGO | sc | ADNIGO Screening Visit |
ADNIGO | scmri | ADNIGO Screening MRI |
ADNIGO | bl | ADNI1/GO Baseline Visit |
ADNIGO | m03 | ADNI1/GO Month 3 MRI |
ADNIGO | m06 | ADNI1/GO Month 6 |
ADNIGO | m12 | ADNI1/GO Month 12 |
ADNIGO | m18 | ADNI1/GO Month 18 |
ADNIGO | m36 | ADNI1/GO Month 36 |
ADNIGO | m42 | ADNI1/GO Month 42 |
ADNIGO | m48 | ADNI1/GO Month 48 |
ADNIGO | m54 | ADNI1/GO Month 54 |
ADNIGO | m60 | ADNI1/GO Month 60 |
ADNIGO | m66 | ADNI1/GO Month 66 |
ADNIGO | m72 | ADNI1/GO Month 72 |
ADNIGO | m78 | ADNI1/GO Month 78 |
The schedules of events are included here:
ADNI2
ADNI2 utilized a very different system of VISCODEs compared to the other phases of ADNI.
One advantage of this scheme is that ADNI2 visits are immediately identifiable by VISCODE, as they all begin with ‘v’, while some shared codes are ambiguous between ADNI1/GO and ADNI3.
One disadvantage of this scheme is that it is not at all immediately obvious which scheduled visits are referenced by these codes.
There are several unique aspects to this system that are worth spending some time on:
New participants and ADNI1/GO rollovers were assigned a different set of VISCODEs for their first ADNI2 visits, so there are two different systems for referencing the same visits depending on how the participant enrolled.
This separation of rollovers and new participants ends after the first year, at which point the VISCODEs align.
V01, V02, V03, V04, and V05 are unique to new participants, as are any codes with a ‘2’ as the second digit
V06 and V07 are exclusive to rollovers
For every visit after the participant’s first year in the study (excepting tau-PET-only visits, which have their own ‘tau’ VISCODE), the codes are formatted as V##, where the first digit is the year of the study in which the visit takes place and the second digit is the order of the visit within that year. So, for instance:
V51 is the first visit of year 5
V22 is the second visit of year 2
Codes that end in ‘2’ are phone follow-up visits that are exclusive to new participants.
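The rules above can be collected into a small decoder. This is a sketch rather than an official mapping: the function name and the returned dictionary layout are hypothetical, and the handling of first-year codes follows the ADNI2 VISCODE table in this section, where v01 through v05 are new-participant codes and v06 and v07 are rollover codes.

```python
import re

def decode_adni2_viscode(viscode):
    """Break an ADNI2 VISCODE into its parts; return None for anything unrecognized."""
    code = viscode.strip().lower()
    if code == "tau":
        return {"kind": "tau-only"}
    match = re.fullmatch(r"v(\d)(\d)", code)
    if match is None:
        return None  # includes the -4 missing-data code
    first, second = int(match.group(1)), int(match.group(2))
    if first == 0:
        # First-year codes are split by how the participant enrolled.
        if 1 <= second <= 5:
            return {"kind": "first-year", "cohort": "new participant"}
        if second in (6, 7):
            return {"kind": "first-year", "cohort": "rollover"}
        return None
    # Later codes: first digit is the year, second is the visit order in that year.
    return {"kind": "annual", "year": first, "visit_in_year": second}

print(decode_adni2_viscode("V51"))  # {'kind': 'annual', 'year': 5, 'visit_in_year': 1}
print(decode_adni2_viscode("v06"))  # {'kind': 'first-year', 'cohort': 'rollover'}
print(decode_adni2_viscode("-4"))   # None
```

The decoder lowercases its input because the codes appear in both cases in this section (V51 in the text, v51 in the table).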
Phase | VISCODE | Visit |
---|---|---|
ADNI2 | -4 | This should be recoded as NA. |
ADNI2 | v01 | ADNI2 Screening-New Pt |
ADNI2 | v02 | ADNI2 Screening MRI-New Pt |
ADNI2 | v03 | ADNI2 Baseline-New Pt |
ADNI2 | v04 | ADNI2 Month 3 MRI-New Pt |
ADNI2 | v05 | ADNI2 Month 6-New Pt |
ADNI2 | v06 | ADNI2 Initial Visit-Cont Pt |
ADNI2 | v07 | ADNI2 Month 6-Cont Pt |
ADNI2 | v11 | ADNI2 Year 1 Visit |
ADNI2 | v12 | ADNI2 Year 1 Phone Followup |
ADNI2 | v21 | ADNI2 Year 2 Visit |
ADNI2 | v22 | ADNI2 Year 2 Phone Followup |
ADNI2 | v31 | ADNI2 Year 3 Visit |
ADNI2 | v32 | ADNI2 Year 3 Phone Followup |
ADNI2 | v41 | ADNI2 Year 4 Visit |
ADNI2 | v42 | ADNI2 Year 4 Phone Followup |
ADNI2 | v51 | ADNI2 Year 5 Visit |
ADNI2 | v52 | ADNI2 Year 5 Phone Followup |
ADNI2 | tau | ADNI2 Tau-only visit |
The schedule of events is included here for convenience:
ADNI3
ADNI3 shares the sc and bl codes with ADNI1/GO but is otherwise distinct in the use of codes beginning with ‘y’, to denote years.
Please note the footnotes on the schedules of events below, as there is some complexity in how visits were handled at the onset of this phase.
Phase | VISCODE | Visit |
---|---|---|
ADNI3 | sc | ADNI Screening |
ADNI3 | init | ADNI3 Initial Visit-Cont Pt |
ADNI3 | bl | ADNI Baseline |
ADNI3 | y1 | ADNI3 Year 1 Visit |
ADNI3 | y2 | ADNI3 Year 2 Visit |
ADNI3 | y3 | ADNI3 Year 3 Visit |
ADNI3 | y4 | ADNI3 Year 4 Visit |
ADNI3 | y5 | ADNI3 Year 5 Visit |
ADNI3 | y6 | ADNI3 Year 6 Visit |
ADNI4
The system used in ADNI4 is similar to the systems used in ADNI1/GO and ADNI3. The codes are all prefaced with ‘4_’ and otherwise follow the general pattern of sc for screening, bl for new participant baseline, and init for rollover initial. An additional code indicating final disposition for participants who drop out of the study is also included.
The schedules of events in ADNI4 are the most varied of any stage. This is partially due to a new conditional wherein schedules for rollover participants vary depending on whether or not they received a PET scan within their last 12 months of ADNI3 participation.
Phase | VISCODE | Visit |
---|---|---|
ADNI4 | 4_sc | ADNI4 Screening |
ADNI4 | 4_init | ADNI4 Initial Visit-Cont Pt |
ADNI4 | 4_bl | ADNI4 Baseline-New Pt |
ADNI4 | 4_m12 | ADNI4 Month 12 Visit |
ADNI4 | 4_m24 | ADNI4 Month 24 Visit |
ADNI4 | 4_m36 | ADNI4 Month 36 Visit |
ADNI4 | 4_m48 | ADNI4 Month 48 Visit |
ADNI4 | 4_disp | ADNI4 Final Disposition |
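With all of the phase tables in hand, the ambiguity of a bare VISCODE can be checked directly. The sketch below transcribes the codes from the tables in this section into a lookup (omitting the -4 missing-data code) and reports every phase in which a given VISCODE appears; the names `VISCODES_BY_PHASE` and `candidate_phases` are hypothetical.

```python
# VISCODEs per phase, transcribed from the tables in this section.
VISCODES_BY_PHASE = {
    "ADNI1": {"f", "sc", "bl", "m06", "m12", "m18", "m24", "m30",
              "m36", "m42", "m48", "m54", "uns1"},
    "ADNIGO": {"sc", "scmri", "bl", "m03", "m06", "m12", "m18", "m36",
               "m42", "m48", "m54", "m60", "m66", "m72", "m78"},
    "ADNI2": {"v01", "v02", "v03", "v04", "v05", "v06", "v07",
              "v11", "v12", "v21", "v22", "v31", "v32",
              "v41", "v42", "v51", "v52", "tau"},
    "ADNI3": {"sc", "init", "bl", "y1", "y2", "y3", "y4", "y5", "y6"},
    "ADNI4": {"4_sc", "4_init", "4_bl", "4_m12", "4_m24", "4_m36",
              "4_m48", "4_disp"},
}

def candidate_phases(viscode):
    """List every phase whose table contains this VISCODE."""
    code = viscode.strip().lower()
    return sorted(p for p, codes in VISCODES_BY_PHASE.items() if code in codes)

print(candidate_phases("sc"))    # ['ADNI1', 'ADNI3', 'ADNIGO']
print(candidate_phases("v11"))   # ['ADNI2']
print(candidate_phases("4_bl"))  # ['ADNI4']
```

Any code that returns more than one phase cannot be interpreted, or safely merged on, without the Phase field alongside it.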
The myriad of schedules is included here, along with their prodigious collections of footnotes.
SITEID
This field is shared across most of the ADNI tables. On the surface, it is a numerical code used to identify the clinical site at which a record was made.
**The SITEID codes used in ADNI1 are inconsistent with those of the later phases.**
Now that we’ve addressed that major sticking point using the transformative power of bold text, we can move on to some other intricacies:
The identification numbers assigned to sites are not linked back to the name/location of the sites. However, there is still some utility in this field - for instance, it enables the specification of site-level random effects within mixed effects models.
This is not the only site identifier present in the data. The leading digits of the PTID correspond to the site at which a participant initially enrolled.
This may serve as a better identifier for certain applications, as it is consistent between all phases and only captures the initial enrollment site. However, it does not account for the possibility of a participant changing sites, while the SITEID field records the location of each procedure.