The anatomy of an ADNI table


What exactly does an ADNI data table look like? Some fields are universal to almost all tables, and every table is going to have some number of unique fields. In general, we can subdivide the fields in any given table into three categories:

Administrative fields contain information that is important for internal administrative purposes.

The second category of fields are those that represent the content of a table. These are the fields that are specific to each table and contain the bulk of useful information.

Finally, we have structural identifiers. These fields contain important information on the longitudinal and cross-sectional structure of the observations contained in the tables.

These include participant identifiers - fields that indicate which participant an observation corresponds to, and longitudinal identifiers - fields that indicate exactly when an observation took place.

To demonstrate this breakdown, here is an example of a (simulated) ADNI data table, using a subset of fields from the participant demographics table.

As we will see shortly: PHASE, VISCODE, VISCODE2, and VISDATE are all longitudinal identifiers (structural identifiers), and the RID and PTID fields represent two different sets of participant identifiers (structural identifiers).

The fields here that represent actual demographic information (content) are PTGENDER, PTEDUCAT, PTETHCAT, and PTRACCAT – which give us participant gender, educational attainment in years, self-identified ethnic identity, and self-identified racial identity, respectively.

The final two fields on the left, ID and USERDATE, are administrative fields. The table can be broken down as follows:

Before moving on from this example, there is one detail that we should be mindful of. If we consult the data dictionary for the PTRACCAT field in the ADNI1 phase, we see that the field shows us a participant’s (self-identified) racial category, and is coded using the following scheme:

Code Definition
1 American Indian or Alaskan Native
2 Asian
3 Native Hawaiian or Other Pacific Islander
4 Black or African American
5 White
6 More than one Race
7 Unknown

Taking our example table above, one of the entries does not match up with this table.

In this case, the -4 indicates missing data. There are other similar cases where missing data may be denoted with a -1, or with an empty string for fields that are coded as characters. In all such cases, analysts should be careful to ensure that there are no indicators of missing data mixed up with valid measurements.

A preliminary frequency table is good practice for categorical variables, to identify any recorded outcome that does not match the possible cases in the codebook.
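As a minimal sketch of that check, the snippet below tabulates a simulated PTRACCAT column and flags anything outside the codebook. The helper names and the sample values are hypothetical, and the list of sentinels is only the set mentioned above; other tables may use different ones.

```python
from collections import Counter

# Codes from the ADNI1 data dictionary entry for PTRACCAT quoted above.
VALID_PTRACCAT = {1, 2, 3, 4, 5, 6, 7}

# Missing-data sentinels mentioned in the text; this set is an assumption
# and may not cover every table.
MISSING_SENTINELS = {"-4", "-1", ""}


def frequency_table(values):
    """Count each distinct raw value, compared as stripped strings."""
    return Counter(str(v).strip() for v in values)


def unexpected_values(values, valid_codes):
    """Return raw values that are absent from the codebook, with counts."""
    valid = {str(code) for code in valid_codes}
    return {v: n for v, n in frequency_table(values).items() if v not in valid}


def recode_missing(values, sentinels=MISSING_SENTINELS):
    """Replace missing-data sentinels with None; leave other values alone."""
    return [None if str(v).strip() in sentinels else v for v in values]


# Simulated column, not real ADNI data.
ptraccat = [5, 5, 4, -4, 2, 5]
print(frequency_table(ptraccat))
print(unexpected_values(ptraccat, VALID_PTRACCAT))
print(recode_missing(ptraccat))
```

The same idea carries over to a data-frame library; the point is simply to look at the raw value counts before treating the column as a clean categorical variable.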

With that important usage note out of the way, let us review the more common examples of administrative and structural fields that recur throughout the data set.

Administrative fields

These fields have only niche uses for investigators who are working with the data set, such as identifying data quality issues and sorting out particularly unexpected observations.

These fields are present in almost all of the tabular ADNI data.

USERDATE and USERDATE2

These fields appear in records that are generated from the clinical electronic data capture (EDC) system at sites. USERDATE records the date on which the record was generated (physically entered in the EDC by site staff), and USERDATE2 records the date on which the record was updated.

These dates have NO consistent relationship to the dates on which the clinical observation recorded in the row took place; those are captured in different fields, which are expounded on in the section on structural fields.

ID and record_ID

These columns of numbers are used internally to match records in the data released on LONI with records in the EDC system. They are not participant identifiers and should not be used as such.

update_stamp

This date is automatically generated by the record-keeping system and represents the most recent build date of the table. It should not be used in analysis.

Structural Fields


These are the fields that are ubiquitous throughout the data set and are very useful for keeping everything lined up neatly. In many ways, they are the skeleton of the data set - providing the scaffolding along which all of the other information is arranged.

Participant identifiers


One ubiquitous aspect of ADNI tables is the presence of fields that uniquely identify every study participant, allowing their data to be collated across multiple domains. There are two common and closely related fields.

The first is RID, or “roster ID”. This is the most common participant identifier across different tables and should be the go-to participant identifier for most analyses. It is formatted as a number of 1 to 5 digits, with different blocks of numbers corresponding to the first phase in which the participant enrolled.

Phase RID block
ADNI1/GO [1, 2500]
ADNI2 [4000, 5500]
ADNI3 [6000, 7500]
ADNI4 [10000, 11500]

RIDs are assigned to participants on screening and follow them throughout their entire time in the study – including rolling over between different phases.

If a participant with RID 125 has data available for the ADNI4 phase, then we know that they have been continuously enrolled since ADNI1/GO. Similarly, a participant with RID 10342 must have initially enrolled in the ADNI4 phase.
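The lookup described here can be sketched as a small function. The block boundaries are copied from the table above and treated as inclusive; the function name is hypothetical, and RIDs that fall outside every listed block simply return None rather than a guess.

```python
# RID blocks copied from the table above; bounds are treated as inclusive.
RID_BLOCKS = [
    ("ADNI1/GO", 1, 2500),
    ("ADNI2", 4000, 5500),
    ("ADNI3", 6000, 7500),
    ("ADNI4", 10000, 11500),
]


def enrollment_phase(rid):
    """Return the phase whose RID block contains `rid`, or None if no block does."""
    rid = int(rid)
    for phase, low, high in RID_BLOCKS:
        if low <= rid <= high:
            return phase
    return None


print(enrollment_phase(125))    # ADNI1/GO
print(enrollment_phase(10342))  # ADNI4
```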

The other commonly used identifier is PTID, which is the identifier that is widely used in imaging data.

It is formatted as ###_S_####, where the first 3 digits correspond to the site at which the participant was first enrolled, and the final digits are the participant’s RID.

For example, a PTID of 067_S_4750 indicates that the participant initially enrolled at site 67, and that their RID is 4750. From this information, we also know that they initially enrolled in the ADNI2 phase as that RID belongs to the block assigned to new ADNI2 participants.

In some tables, the field containing PTID information may be titled as something else - for instance, subject.ID or SID – but the formatting should make it perfectly clear that it is equivalent to a PTID.

Note also that the leading digits of the PTID, giving a participant’s initial site, do not correspond to the same coding system for sites used in the SITEID field.

The ROSTER.csv file explicitly links the two identification systems. However, many tables (including the registry) have both PTID and RID columns present and can be used as a crosswalk between the two schemas. PTIDs can also be quickly translated into RIDs by extracting the trailing digits.
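Extracting the trailing digits can be sketched with a regular expression. This assumes the ###_S_#### layout described above, with any leading zeros in the trailing block being padding; the function name is hypothetical. The site prefix is kept as a string so that its leading zeros survive.

```python
import re

# Three site digits, the literal "_S_", then the digits that carry the RID.
PTID_PATTERN = re.compile(r"^(\d{3})_S_(\d+)$")


def parse_ptid(ptid):
    """Split a PTID into (site prefix as text, RID as an integer)."""
    match = PTID_PATTERN.match(ptid.strip())
    if match is None:
        raise ValueError(f"not a PTID-formatted value: {ptid!r}")
    site_prefix, rid_digits = match.groups()
    return site_prefix, int(rid_digits)


print(parse_ptid("067_S_4750"))  # ('067', 4750)
```

Raising on a non-matching value, rather than returning a partial result, makes it harder to silently treat some other identifier as a PTID.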

There are other identifiers that appear throughout the data - for example, some biofluids analysis results from external labs may include sample identifiers - however, at least one of these two fields should be present in every table.

Longitudinal identifiers


ADNI is a very large and complex longitudinal study that has, at times, straddled the line between a designed experiment and an observational study.

There are scheduled follow-up visits; however, many factors can lead to individual “visits” being broken up over longer time spans or occurring out of step with the idealized schedule. These factors include scheduling concerns for both sites and participants, issues that require the clinical or imaging components to be separated, and logistic constraints that arise from the scale of the study and the variety of data collected from participants.

In ADNI, there are several fields used in different tables to record information about time. A summary of the most common longitudinal identifiers is presented in the table below:

Phase The clinical protocol under which the observation was collected. Occasionally rendered in all caps as PHASE
VISCODE The scheduled visit that corresponds to the actual visit. This field has no relation to the actual time of the visit relative to baseline. These codes are specific to each phase of the study, and that needs to be taken into account when working with them.
VISCODE2 An alternative to the VISCODE system that attempts to maintain continuity across phases. For ADNI4 and going forward, this is the number of months elapsed between baseline and the in-clinic visit, rounded to the nearest 6 months.
EXAMDATE The date of the in-clinic procedure. Note that different EXAMDATEs could correspond to the same VISCODE as a result of scheduled visits being broken up into multiple actual visits.
VISDATE Synonymous with EXAMDATE
COLDATE Synonymous with EXAMDATE
COLPROT “Collection protocol”; this is analogous to the Phase field, but tends to show up more in imaging data where imaging protocols are occasionally uncoupled from clinical protocols.
ORIGPROT “Original protocol”; Some tables keep track of the initial phase in which participants were enrolled using this field. It should not be confused with Phase or COLPROT.
USERDATE The date at which the clinical record was physically entered into the EDC system by site personnel.
USERDATE2 If the record was updated at any point after its initial upload, this records the date of the last update.

An eagle-eyed reader may have noticed that the table was presented as ‘a summary of the most common longitudinal identifiers’ used in the ADNI data set. This is not an exhaustive list.

A particularly handsome and sharp reader may also note that these myriad fields fall into one of two categories - dates, and things that aren’t dates.

As is the nature of most things that are different - cats and umbrellas, steamed rice and the platonic ideal of the tetrahedron, children’s clothing stores and the nation-state of Honduras – dates and things that are not dates offer different sorts of utility:

  • Dates are useful for statistical analysis.

  • Longitudinal identifiers that are not dates are useful for merging disparate tables, identifying missing data, and other niche tasks.

Dates


Dates, or ‘continuous time’ in statistical parlance, are everywhere in the ADNI data set. Some of them are useful, others are vestigial, and some can be quite deceptive.

There are more than 95 distinct fields in use across the ADNI data set that include the word “DATE” in their name. Some of these are unique to their particular tables, and others are shared.

Since there are so many different dates, it is critically important when designing any analysis using ADNI data, whether cross-sectional, longitudinal, or something else entirely, to understand what the various date fields reference, how they relate to each other, and how to use them.

EXAMDATE and VISDATE

These are the most common date fields that appear across ADNI tables. They are also among the trickiest, as different tables use them differently.

As it appears in the REGISTRY table, EXAMDATE refers to the date of a participant’s clinical examination - that is, the date on which the ADNI participant physically visited the ADNI site and completed their clinical assessment battery, or did so over the phone for phone follow-ups.

Note that not all of a participant’s clinical assessments are necessarily conducted on the same date. This is a particular concern when considering data from across multiple clinical tables.

For non-clinical assessments, the usage of this field becomes very vague. For example, in certain biofluid analysis results tables the EXAMDATE field corresponds to the date at which the sample was collected. Due to a variety of factors, it is not always possible for a participant to complete a lumbar puncture on the same date as their clinical assessments, and so this can differ significantly from the EXAMDATE that appears in clinical tables. Similarly, in some imaging tables, EXAMDATE refers to the date on which a PET scan was completed.
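One way to see where a scheduled visit was spread over several dates is to pool rows from the tables of interest and count the distinct EXAMDATEs per participant, phase, and VISCODE. The sketch below does this on simulated rows held as dictionaries; the function name and the data are hypothetical, and dates are assumed to be ISO-formatted strings so that sorting them is chronological.

```python
from collections import defaultdict


def visits_with_multiple_dates(rows):
    """Map (RID, Phase, VISCODE) to its sorted EXAMDATEs when there is more than one."""
    dates = defaultdict(set)
    for row in rows:
        key = (row["RID"], row["Phase"], row["VISCODE"])
        dates[key].add(row["EXAMDATE"])
    return {key: sorted(found) for key, found in dates.items() if len(found) > 1}


# Simulated rows pooled from two tables, not real ADNI data.
rows = [
    {"RID": 9991, "Phase": "ADNI3", "VISCODE": "y1", "EXAMDATE": "2019-03-04"},
    {"RID": 9991, "Phase": "ADNI3", "VISCODE": "y1", "EXAMDATE": "2019-04-17"},
    {"RID": 9992, "Phase": "ADNI3", "VISCODE": "bl", "EXAMDATE": "2018-06-01"},
]
print(visits_with_multiple_dates(rows))
```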

SCANDATE, Study.Date, and other imaging dates

These fields appear across ADNI tables and are also associated with imaging data downloaded from the IDA. They correspond to the date on which a particular imaging procedure was carried out. The use of “Study.Date” in place of SCANDATE is a reference to the DICOM hierarchy for imaging data.

Similarly to clinical assessments, imaging regimens are often broken up across several days - particularly in the case of PET, where imaging with multiple radiotracers is sometimes required. For MR imaging, this most often occurs when different MR modalities are required across a span of days.

For researchers who are interested in working with both PET and MR data, it is important to understand that PET and MR scans are not always acquired during the same visit and may be separated by significant spans of time.

Specific imaging schedules can vary across phases of ADNI, participant diagnostic categories, and the phase at which the participant initially entered the study. For more information, we recommend that prospective investigators consult the participant schedules of events that can be found in the ADNI procedures manuals.

RUNDATE

This field occurs in two very different contexts, imaging and biofluids, and it’s important not to conflate the two when merging tables.

In the context of biofluids, this generally refers to the date on which the analytical run of the sample was performed, but this can vary - it’s important to consult the data dictionary and documentation for any specific instances.

In the context of imaging this is generally in reference to the date on which an analysis of an image was conducted - for example, the date on which a computational segmentation routine was run.

Longitudinal Identifiers that aren’t Dates


Long-form Visit ID

This is an identifier that goes by many names and can be found in image data and some imaging-derived tables. Examples of these identifiers include “ADNI2 month 12 follow-up” and “ADNI3 screening MRI”, in all their fully spelled-out glory.

These identifiers are few and far between, and in general are equivalent to certain VISCODE entries. An exhaustive list of equivalences can be found in the Visits table.

VISCODE2

This is a ubiquitous identifier in much of the ADNI data set. The idea behind VISCODE2 is that it offers an alternative to the VISCODE system that is not phase-specific. The first version of the ADNIMERGE package used VISCODE2 exclusively.

VISCODE2s are formatted as follows: “sc” and “bl” denote screening and baseline visits respectively, and all subsequent VISCODE2s consist of the letter “m”, followed by a number. For example: ‘m06’, ‘m48’, and ‘m124’. In all of these cases, this is meant to reflect the number of months that have elapsed between enrollment and the visit being recorded.

Historically, VISCODE2 was assigned according to a matching system, and there were occasional foibles in this assignment. For ADNI4, VISCODE2 was changed to reflect the number of months elapsed since baseline, rounded to the nearest 6 months, a change which was applied retroactively in most tables.
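To make the ADNI4-style rule concrete, here is a rough sketch that derives a VISCODE2-like label from a baseline date and a visit date. It is an illustration under stated assumptions, not the procedure ADNI uses: the average month length, the half-up rounding, and the choice to label anything that rounds to zero as ‘bl’ are all assumptions, and screening visits (‘sc’) cannot be derived this way.

```python
from datetime import date

# Assumption: convert days to months using the average month length.
DAYS_PER_MONTH = 365.25 / 12


def viscode2_from_dates(baseline, visit):
    """Label a visit by months since baseline, rounded to the nearest 6 months."""
    if visit < baseline:
        raise ValueError("visit precedes baseline; screening is not handled here")
    months = (visit - baseline).days / DAYS_PER_MONTH
    # Round half up to the nearest multiple of 6 (months is never negative here).
    rounded = int(months / 6 + 0.5) * 6
    if rounded == 0:
        return "bl"
    return f"m{rounded:02d}"


baseline = date(2020, 1, 15)
print(viscode2_from_dates(baseline, date(2020, 7, 20)))  # m06
print(viscode2_from_dates(baseline, date(2024, 1, 10)))  # m48
```

For actual analyses, the VISCODE2 values released in the tables should be used as given; a sketch like this is only useful for sanity-checking them against the date fields.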

For more information on how VISCODE2s were assigned historically, please see the following documentation from the ADNI2 phase:

VISCODE

Before moving on, it is good to note that VISCODE2 offers a number of advantages over VISCODE in terms of usability.

VISCODE2 is self-consistent, independent of phase, usable as a standalone field, and includes an explicit reference to the timing of a visit in relation to the participant’s baseline visit in the study.

With that said:

VISCODE is the most common longitudinal identifier across all ADNI data. In the next section, we will endeavor to provide significantly more information than anybody has ever wanted to know about the VISCODE field.

What is a VISCODE?


The VISCODE is a direct reference to a participant’s original schedule of events, as laid out in the ADNI procedures manual.

These schedules have varied by phase, diagnostic category, and rollover status; however, the VISCODE itself has only varied by phase, although some VISCODEs may not exist for certain participants, depending on their individual schedule.

Again, bear in mind that these codes are phase-specific. Because some VISCODEs are shared between phases, reconciling tables based on VISCODE alone may lead to erroneous merging - for example, ADNI1 and ADNI3 both use the ‘sc’ code to denote screening visits.
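The difference between merging on VISCODE alone and merging on phase plus VISCODE can be sketched with a small join over dictionary rows. The rows below are simulated and deliberately contrived so that one RID carries an ‘sc’ record in two phases; SCORE is a hypothetical content field and merge_on is a hypothetical helper, not part of any ADNI tooling.

```python
def merge_on(left, right, keys):
    """Inner-join two lists of dict rows on the given key fields."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    merged = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            # Fields from the left row win when both rows share a name.
            merged.append({**match, **row})
    return merged


# Simulated rows, not real ADNI data.
registry = [
    {"RID": 9993, "Phase": "ADNI1", "VISCODE": "sc", "EXAMDATE": "2006-03-01"},
    {"RID": 9993, "Phase": "ADNI3", "VISCODE": "sc", "EXAMDATE": "2017-05-10"},
]
scores = [
    {"RID": 9993, "Phase": "ADNI3", "VISCODE": "sc", "SCORE": 28},
]

naive = merge_on(registry, scores, ["RID", "VISCODE"])
safe = merge_on(registry, scores, ["RID", "Phase", "VISCODE"])
print(len(naive), len(safe))  # 2 1
```

In the naive merge, the ADNI3 score is also attached to the ADNI1 screening row; including the phase in the key removes that spurious match.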

The interpretation of every possible phase/VISCODE combination is summarized in the tables below, along with long-form visit descriptions – specifically, those that appear in the ‘Visit’ field that is utilized in IDA imaging data.

ADNI1 and ADNIGO

There are two primary considerations with VISCODE in these two phases of the study:

  1. Some of these are shared with ADNI3. Phase information is required to merge by VISCODE

  2. VISCODEs are shared between ADNI1 and ADNIGO. This is not something that occurs in other phases of the study.

To elaborate further on point number 2:

In the transition between ADNIGO and ADNI2, and also in the transition between ADNI2 and ADNI3, participant VISCODEs were “reset”. To give a concrete example:

If participant 9991 was enrolled in ADNI2 from the start of the phase to the very end, and rolled over into ADNI3, then the transition between their last ADNI2 visit (the ADNI2 Year 5 visit) and their first ADNI3 visit would be recorded like this:

RID Phase VISCODE
9991 ADNI2 V51
9991 ADNI3 init

If participant 9992 had just completed their first screening visit prior to the end of ADNI2, then their ADNI2/ADNI3 transition would look like this (note that v02 is one of the new-participant screening codes used in ADNI2):

RID Phase VISCODE
9992 ADNI2 v02
9992 ADNI3 init

In both cases, the first visit of ADNI3 is always coded as ‘init’, or ‘initial visit’. This is not true for ADNI1/ADNIGO VISCODEs. Consider the same example: participant 9991, who is enrolled in ADNI1 for two years and rolls over into ADNIGO:

RID Phase VISCODE
9991 ADNI1 m24
9991 ADNIGO m36

Now again, consider participant 9992, who completed their initial screening just prior to the start of ADNIGO, and then rolled over:

RID Phase VISCODE
9992 ADNI1 sc
9992 ADNIGO bl

With that in mind, the following table contains all VISCODEs and descriptions that were used in ADNI1/GO. Note that the ‘f’ code, used to denote screen fails, only occurs in ADNI1, and that a screening visit recorded under the ADNIGO phase must have occurred in ADNIGO. Otherwise, it is ambiguous how much time in ADNI1 is accounted for by a participant’s ADNIGO VISCODEs without examining their schedule across both phases.

Phase VISCODE Visit
ADNI1 f ADNI1 Screening (Screen fail)
ADNI1 sc ADNI1 Screening
ADNI1 bl ADNI1 Baseline
ADNI1 m06 ADNI1 Month 6
ADNI1 m12 ADNI1 Month 12
ADNI1 m18 ADNI1 Month 18
ADNI1 m24 ADNI1 Month 24
ADNI1 m30 ADNI1 Month 30
ADNI1 m36 ADNI1 Month 36
ADNI1 m42 ADNI1 Month 42
ADNI1 m48 ADNI1 Month 48
ADNI1 m54 ADNI1 Month 54
ADNI1 uns1 Unscheduled Visit
ADNIGO sc ADNIGO Screening Visit
ADNIGO scmri ADNIGO Screening MRI
ADNIGO bl ADNI1/GO Baseline Visit
ADNIGO m03 ADNI1/GO Month 3 MRI
ADNIGO m06 ADNI1/GO Month 6
ADNIGO m12 ADNI1/GO Month 12
ADNIGO m18 ADNI1/GO Month 18
ADNIGO m36 ADNI1/GO Month 36
ADNIGO m42 ADNI1/GO Month 42
ADNIGO m48 ADNI1/GO Month 48
ADNIGO m54 ADNI1/GO Month 54
ADNIGO m60 ADNI1/GO Month 60
ADNIGO m66 ADNI1/GO Month 66
ADNIGO m72 ADNI1/GO Month 72
ADNIGO m78 ADNI1/GO Month 78

The schedules of events are included here:

ADNI2

ADNI2 utilized a very different system of VISCODEs compared to the other phases of ADNI.

One advantage of this scheme is that ADNI2 visits are immediately identifiable by VISCODE, as they all begin with ‘v’, while some shared codes are ambiguous between ADNI1/GO and ADNI3.

One disadvantage of this scheme is that it is not at all immediately obvious which scheduled visits are referenced by these codes.

There are several unique aspects to this system that are worth spending some time explaining:

  • New Participants and ADNI1/GO rollovers were assigned a different set of VISCODEs for their first ADNI2 visit, so there are two different systems for referencing the same visits depending on how the participant enrolled

  • This separation of rollovers and new participants ends after the first year, at which point the VISCODEs align

  • V01, V02, V03, V04, and V05 are unique to new participants, as are any codes with a ‘2’ as the second digit

  • V06 and V07 are exclusive to rollovers

  • For every visit after the participant’s first year in the study (excepting tau PET-only visits, which have their own ‘tau’ VISCODE), the codes are formatted as V##, where the first digit is the year of the study in which the visit takes place and the second digit corresponds to the order of visits within that year. So, for instance:

    • V51 is the first visit of year 5

    • V22 is the second visit of year 2

  • Codes that end in ‘2’ are phone follow-up visits that are exclusive to new participants.

Phase VISCODE Visit
ADNI2 -4 This should be recoded as NA.
ADNI2 v01 ADNI2 Screening-New Pt
ADNI2 v02 ADNI2 Screening MRI-New Pt
ADNI2 v03 ADNI2 Baseline-New Pt
ADNI2 v04 ADNI2 Month 3 MRI-New Pt
ADNI2 v05 ADNI2 Month 6-New Pt
ADNI2 v06 ADNI2 Initial Visit-Cont Pt
ADNI2 v07 ADNI2 Month 6-Cont Pt
ADNI2 v11 ADNI2 Year 1 Visit
ADNI2 v12 ADNI2 Year 1 Phone Followup
ADNI2 v21 ADNI2 Year 2 Visit
ADNI2 v22 ADNI2 Year 2 Phone Followup
ADNI2 v31 ADNI2 Year 3 Visit
ADNI2 v32 ADNI2 Year 3 Phone Followup
ADNI2 v41 ADNI2 Year 4 Visit
ADNI2 v42 ADNI2 Year 4 Phone Followup
ADNI2 v51 ADNI2 Year 5 Visit
ADNI2 v52 ADNI2 Year 5 Phone Followup
ADNI2 tau ADNI2 Tau-only visit
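The year-and-order pattern can be sketched as a small decoder. The first-year codes are looked up directly from the table above, and only the V## codes for years 1 through 5 that appear in that table are accepted; the function name is hypothetical, and codes are lower-cased first on the assumption that ‘V51’ and ‘v51’ denote the same visit.

```python
import re

# First-year codes copied from the table above.
ADNI2_FIRST_YEAR = {
    "v01": "ADNI2 Screening-New Pt",
    "v02": "ADNI2 Screening MRI-New Pt",
    "v03": "ADNI2 Baseline-New Pt",
    "v04": "ADNI2 Month 3 MRI-New Pt",
    "v05": "ADNI2 Month 6-New Pt",
    "v06": "ADNI2 Initial Visit-Cont Pt",
    "v07": "ADNI2 Month 6-Cont Pt",
}


def describe_adni2_viscode(code):
    """Return the visit description for an ADNI2 VISCODE, or None for -4."""
    code = str(code).strip().lower()
    if code == "-4":
        return None  # missing-data sentinel, recoded as NA
    if code == "tau":
        return "ADNI2 Tau-only visit"
    if code in ADNI2_FIRST_YEAR:
        return ADNI2_FIRST_YEAR[code]
    match = re.fullmatch(r"v([1-5])([12])", code)
    if match is None:
        raise ValueError(f"not a recognised ADNI2 VISCODE: {code!r}")
    year, order = match.groups()
    kind = "Visit" if order == "1" else "Phone Followup"
    return f"ADNI2 Year {year} {kind}"


print(describe_adni2_viscode("V51"))  # ADNI2 Year 5 Visit
print(describe_adni2_viscode("v22"))  # ADNI2 Year 2 Phone Followup
```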

The schedule of events is included here for convenience:

ADNI3

ADNI3 shares the sc and bl codes with ADNI1/GO but is otherwise distinct in the use of codes beginning with ‘y’, to denote years.

Please note the footnotes on the schedules of events below, as there is some complexity in how visits were handled at the onset of this phase.

Phase VISCODE Visit
ADNI3 sc ADNI Screening
ADNI3 init ADNI3 Initial Visit-Cont Pt
ADNI3 bl ADNI Baseline
ADNI3 y1 ADNI3 Year 1 Visit
ADNI3 y2 ADNI3 Year 2 Visit
ADNI3 y3 ADNI3 Year 3 Visit
ADNI3 y4 ADNI3 Year 4 Visit
ADNI3 y5 ADNI3 Year 5 Visit
ADNI3 y6 ADNI3 Year 6 Visit

ADNI4

The system used in ADNI4 is similar to the systems used in ADNI1/GO and ADNI3. The codes are all prefaced with ‘4_’ and otherwise follow the general pattern of sc for screening, bl for new participant baseline, and init for rollover initial. An additional code indicating final disposition for participants who drop out of the study is also included.

The schedules of events in ADNI4 are the most varied of any phase. This is partially due to a new condition wherein schedules for rollover participants vary depending on whether or not they received a PET scan within their last 12 months of ADNI3 participation.

Phase VISCODE Visit
ADNI4 4_sc ADNI4 Screening
ADNI4 4_init ADNI4 Initial Visit-Cont Pt
ADNI4 4_bl ADNI4 Baseline-New Pt
ADNI4 4_m12 ADNI4 Month 12 Visit
ADNI4 4_m24 ADNI4 Month 24 Visit
ADNI4 4_m36 ADNI4 Month 36 Visit
ADNI4 4_m48 ADNI4 Month 48 Visit
ADNI4 4_disp ADNI4 Final Disposition

The myriad schedules are included here, along with their prodigious collections of footnotes.

SITEID


This field is shared across most of the ADNI tables. On the surface, it is a numerical code used to identify the clinical site at which a record was made.

The SITEID codes used in ADNI1 are inconsistent with those of the later phases.

Now that we’ve addressed that major sticking point using the transformative power of bold text, we can move on to some other intricacies:

The identification numbers assigned to sites are not linked back to the name/location of the sites. However, there is still some utility in this field - for instance, it enables the specification of site-level random effects within mixed effects models.

This is not the only site identifier present in the data. The leading digits of the PTID correspond to the site at which a participant initially enrolled.

This may serve as a better identifier for certain applications, as it is consistent between all phases and only captures the initial enrollment site. However, it does not account for the possibility of a participant changing sites, while the SITEID field records the location of each procedure.