Validating ASHS-T1 automated entorhinal and transentorhinal cortical segmentation in Alzheimer's disease

The current study aimed to validate entorhinal and transentorhinal cortical volumes measured by the automated segmentation tool Automatic Segmentation of Hippocampal Subfields (ASHS-T1). The study sample comprised 34 healthy controls (HCs), 37 individuals with amnestic mild cognitive impairment (aMCI), and 29 individuals with Alzheimer's disease (AD) dementia from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Entorhinal and transentorhinal cortical volumes were assessed using ASHS-T1, manual segmentation, and a widely used automated segmentation tool, FreeSurfer v6.0.1. Mean differences, intraclass correlation coefficients, and Bland-Altman plots were computed. ASHS-T1 tended to underestimate entorhinal and transentorhinal cortical volumes relative to manual segmentation and FreeSurfer. There was variable consistency and low agreement between ASHS-T1 and manual segmentation volumes. There was low-to-moderate consistency and low agreement between ASHS-T1 and FreeSurfer volumes. There was a trend toward higher consistency and agreement for the entorhinal cortex in the aMCI and AD groups compared to the HC group. Despite the differences in volume measurements, ASHS-T1 was sensitive to entorhinal and transentorhinal cortical atrophy in both early and late disease stages. Based on the current study, ASHS-T1 appears to be a promising tool for automated entorhinal and transentorhinal cortical volume measurement in individuals with likely underlying AD.


Introduction
In Alzheimer's disease (AD), the entorhinal and transentorhinal cortices are typically among the earliest brain regions to show pathological changes. Tau pathology staging in postmortem AD brain tissue has shown that pathological tau deposits first appear in the transentorhinal and entorhinal regions (Braak and Braak, 1991). Pathological tau accumulation in these regions has also been observed to correlate with atrophy measured on structural magnetic resonance imaging (MRI) (Xie et al., 2018). Other MRI studies have shown that the volumes of the entorhinal and transentorhinal cortices are reduced in patients with amnestic mild cognitive impairment (aMCI) relative to healthy controls (HCs), and that these volumes predict cognitive decline and conversion from aMCI to AD dementia (Kulason et al., 2019; Pennanen et al., 2004; Stoub et al., 2010; Tapiola et al., 2008; Tward et al., 2017; Venneri et al., 2011). These results suggest that volume loss in the entorhinal and transentorhinal cortices may be a useful biomarker for early, objective detection of AD.
There are two primary methods to assess the volumes of the entorhinal and transentorhinal cortices on MRI scans. Manual segmentation by an expert observer is widely considered to be the "gold standard", but it is time-consuming and labor-intensive and is thus impractical for routine use in the clinical setting (Bobinski et al., 1999). Automated segmentation requires minimal supervision and allows rapid volumetric calculation, and it may thus be a promising tool for routine clinical use. Few studies have validated automated segmentation to measure entorhinal cortex volumes, and no study to date has validated automated segmentation to measure transentorhinal cortex volumes. The studies that have evaluated automated segmentation of the entorhinal cortex have reported very weak concordance with manual segmentation (Fung et al., 2019; Lehmann et al., 2010). These studies highlight that the entorhinal cortex is a challenging region for automated volumetric analysis.
A major challenge for the automated segmentation of the entorhinal cortex, and to some extent the transentorhinal cortex, is the proximity of the mesial temporal lobe (MTL) to the tentorium cerebelli. A large section of the entorhinal cortex and some parts of the transentorhinal cortex lie directly adjacent to the tentorium cerebelli. On T1-weighted MRI scans, the tentorium cerebelli has a similar intensity to that of gray matter (Penumetcha et al., 2011). Consequently, automated segmentation methods such as FreeSurfer and Advanced Normalization Tools (ANTs) that rely on intensity variations to distinguish between tissue types often mislabel portions of the tentorium cerebelli as entorhinal or transentorhinal cortex (Xie et al., 2016). Such over-segmentation would inflate the estimated volumes of the entorhinal and transentorhinal cortices, which may confound research findings based on these measurements.
Recently, a new automated segmentation tool, Automatic Segmentation of Hippocampal Subfields (ASHS-T1), was developed to address the over-segmentation of the entorhinal and transentorhinal cortices (Xie et al., 2019; Yushkevich et al., 2015). ASHS-T1 uses a multi-atlas approach to label the MTL subregions and attempts to reduce mislabeling of the tentorium cerebelli as entorhinal or transentorhinal cortex by explicitly labeling the tentorium cerebelli (Xie et al., 2019). Xie et al. (2019) compared ASHS-T1 and FreeSurfer segmentations against manually labeled tentorium cerebelli and showed that ASHS-T1 mislabeled only 6.5% of tentorium cerebelli voxels as gray matter, whereas FreeSurfer mislabeled 62.4%. This substantially reduced mislabeling suggests that ASHS-T1 should provide more accurate estimates of the volumes of the entorhinal and transentorhinal cortices.
Whilst ASHS-T1 appears to be a more promising method to automatically estimate volumes of the entorhinal and transentorhinal cortices, it has yet to be comprehensively evaluated against the "gold standard" manual segmentation. The only study comparing ASHS-T1 against manual segmentation tested ASHS-T1 in the same sample that was used to construct the ASHS-T1 multi-atlas (Xie et al., 2019). Evaluating ASHS-T1 on the same sample used to construct its multi-atlas may produce more optimistic results than would be expected when evaluated in an independent sample (Baumann, 2003). Moreover, the sample used by Xie et al. (2019) was small, comprising only 15 HCs and 14 patients with aMCI, and did not include patients with AD dementia. Previous studies have suggested that automated methods may perform more poorly in brains with severe atrophy, such as that seen in patients with AD (Lehmann et al., 2010; Sánchez-Benavides et al., 2010). It remains uncertain whether the performance of ASHS-T1 is likewise affected by the severe atrophy present in the brains of patients with AD. Considering these limitations of the study by Xie et al. (2019), the aims of the current study were (1) to compare entorhinal and transentorhinal cortical volumes measured by ASHS-T1 against those measured by manual segmentation and FreeSurfer in HCs, individuals with aMCI, and individuals with AD dementia, and (2) to evaluate the clinical utility of ASHS-T1 entorhinal and transentorhinal cortical volumes in detecting atrophy in individuals with aMCI and individuals with AD dementia.

Participants
Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD). For up-to-date information, see www.adni-info.org.
The current study sample comprised 100 participants (34 HCs, 37 with aMCI, and 29 with AD dementia). Only participants with MRI scans acquired at 3.0 T were included. The ADNI diagnostic criteria are described in Petersen et al. (2010). The sample demographic characteristics are presented in Table 1.

MRI acquisition and preprocessing
The MRI scans were acquired across various scanners (General Electric, Philips, and Siemens) at multiple sites. The ADNI MRI protocols have been described previously (Jack et al., 2008; see also http://adni.loni.usc.edu/methods/documents/mri-protocols/).
Variations in MRI scan orientation can introduce substantial variability in MRI-based measurements (Bartzokis et al., 1998). Hence, to improve measurement reliability, all MRI scans were first segmented using FreeSurfer v6.0.1 and then aligned to a common orientation, perpendicular to the long axis of the FreeSurfer hippocampal segmentation, using a Python script (https://doi.org/10.25919/8kjn-d006). All MRI scans were then resampled to 0.3 × 0.3 × 1.0 mm³ voxels by cubic spline interpolation to increase the in-plane sampling resolution.
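The orientation and resampling steps can be illustrated with a minimal plain-Python sketch. It is not the script referenced above: it assumes, for illustration only, that the hippocampal long axis can be approximated by the first principal axis of the voxel coordinates of the FreeSurfer hippocampal label, and it computes only the per-axis zoom factors that a cubic spline resampler (for example, `scipy.ndimage.zoom` with `order=3`) would take. The helpers `principal_axis` and `zoom_factors` are hypothetical names, and the target grid is assumed to be ordered with the 1.0 mm slice axis last.

```python
from math import sqrt


def principal_axis(points, iterations=200):
    """Approximate the long axis of a labelled structure (assumed approach).

    Returns the dominant eigenvector of the covariance matrix of the voxel
    coordinates, found by power iteration. `points` is a list of (x, y, z)
    coordinates, e.g. the voxels of a hippocampus label.
    """
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    centred = [[p[i] - centroid[i] for i in range(3)] for p in points]
    # 3 x 3 covariance matrix of the coordinates
    cov = [[sum(c[i] * c[j] for c in centred) / n for j in range(3)]
           for i in range(3)]
    v = [1.0, 1.0, 1.0]  # starting vector; assumed not orthogonal to the axis
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def zoom_factors(voxel_size_mm, target_mm=(0.3, 0.3, 1.0)):
    """Per-axis scale factors from the acquired voxel size to the target grid."""
    return tuple(src / dst for src, dst in zip(voxel_size_mm, target_mm))


# Toy "structure" elongated along the direction (1, 2, 0)
pts = [(t, 2 * t, s) for t in range(-5, 6) for s in (-1, 0, 1)]
axis = principal_axis(pts)
factors = zoom_factors((1.0, 1.0, 1.0))  # assumed 1.0 mm isotropic input
```

The 1.0 mm isotropic input is an assumption for the example; ADNI voxel sizes vary by scanner and protocol, so the factors would be computed from each scan's header.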

Manual segmentation
Entorhinal and transentorhinal cortices were manually segmented using ITK-SNAP Version 3.8.0 (Yushkevich et al., 2006). The segmentations were conducted according to the Berron et al. (2017) MTL subregions manual segmentation protocol. According to this protocol, segmentation of the entorhinal and transentorhinal cortices commences 4.4 mm anterior to the first slice in which the hippocampal head is visible and concludes 2.2 mm and 4.4 mm, respectively, posterior to the last slice in which the hippocampal head is visible. Because the current study used MRI scans with a 1.0 mm distance between slices, segmentation of the entorhinal and transentorhinal cortices was commenced 5 slices anterior to the first slice in which the hippocampal head is visible, to avoid volume underestimation, and was concluded 2 and 4 slices, respectively, posterior to the last slice in which the hippocampal head is visible. An example manual segmentation of the entorhinal and transentorhinal cortices is provided in the Supplementary Material.
All manual segmentations were completed by a single rater (YEQ) blind to participants' diagnosis. Manual segmentations of 10 randomly selected MRI scans were repeated by the same rater to assess intra-rater reliability and conducted by an independent rater (YLF) to assess inter-rater reliability. The intra-class correlation coefficient (ICC) was used to evaluate intra- and inter-rater reliability. For the entorhinal cortex volume measurements, intra-rater reliability was ICC agreement = 0.89 (95% CI 0.54 to 0.96), and inter-rater reliability was ICC agreement = 0.93 (95% CI 0.91 to 0.99). For the transentorhinal cortex volume measurements, intra-rater reliability was ICC agreement = 0.87 (95% CI 0.43 to 0.96), and inter-rater reliability was ICC agreement = 0.72 (95% CI 0.42 to 0.88).
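The conversion from the protocol's millimetre offsets to whole slices at 1.0 mm spacing can be made explicit. The sketch below reproduces the slice counts described above under an assumption inferred from those counts rather than stated as a rule: the anterior offset is rounded up (so the slab is not truncated), and the posterior offsets are rounded down. `offset_in_slices` is a hypothetical helper.

```python
from math import ceil, floor


def offset_in_slices(offset_mm, slice_spacing_mm=1.0, round_up=False):
    """Convert a protocol offset in mm into a whole number of slices.

    round_up=True uses ceil (never shorter than the protocol distance);
    otherwise floor is used.
    """
    n = offset_mm / slice_spacing_mm
    return ceil(n) if round_up else floor(n)


# Berron et al. (2017) offsets relative to the hippocampal head, in mm
start_anterior = offset_in_slices(4.4, round_up=True)  # start, both regions
end_posterior_ec = offset_in_slices(2.2)                # end, entorhinal
end_posterior_tec = offset_in_slices(4.4)               # end, transentorhinal
print(start_anterior, end_posterior_ec, end_posterior_tec)  # 5 2 4
```

With a different slice spacing, the same helper would give different counts, which is one reason protocol offsets stated in millimetres do not always map cleanly onto a given acquisition.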
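For readers reproducing these statistics, the single-measure ICCs for consistency and absolute agreement, often written ICC(C,1) and ICC(A,1), can be computed from two-way ANOVA mean squares; the point-estimate formulas are the same for the two-way mixed and two-way random models. The plain-Python sketch below (function name hypothetical) computes point estimates only, not the confidence intervals reported above.

```python
from statistics import fmean


def icc_single(ratings):
    """Single-measure ICCs from an n-subjects x k-methods table.

    Returns (consistency, absolute_agreement), i.e. ICC(C,1) and ICC(A,1).
    """
    n, k = len(ratings), len(ratings[0])
    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # methods
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    consistency = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    agreement = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    return consistency, agreement


# Toy data: method 2 = method 1 + 10 for every subject (a constant offset)
consistency, agreement = icc_single([[1, 11], [2, 12], [3, 13], [4, 14]])
# consistency = 1.0, agreement is about 0.03
```

A constant offset between methods, like the systematic underestimation by ASHS-T1 reported in the results, leaves consistency intact but depresses absolute agreement.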

Statistical analysis
To examine differences in demographic characteristics among the diagnostic groups, one-way ANOVAs (with age, education, and MMSE as the dependent variables) and Pearson's chi-squared test (with sex as the dependent variable) were conducted. Where a significant difference across groups was identified, post hoc pairwise comparisons with the Bonferroni adjustment were undertaken.
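The Bonferroni adjustment used for the post hoc comparisons multiplies each p value by the number of comparisons (capping the result at 1), which is equivalent to testing each unadjusted p value against 0.05 divided by that number. The p values in the sketch are made up for illustration.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p values (p * m, capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical p values for three pairwise comparisons (HC-aMCI, HC-AD, aMCI-AD)
adjusted = bonferroni([0.004, 0.020, 0.400])
significant = [p < 0.05 for p in adjusted]
```

Here the second comparison (unadjusted p = 0.020) no longer reaches significance after adjustment, illustrating how the correction trades sensitivity for control of the family-wise error rate.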
To compare the entorhinal cortex and transentorhinal cortex volumes estimated by ASHS-T1 with those estimated by manual segmentation and FreeSurfer in HCs, individuals with aMCI, and individuals with AD dementia, the dependent-samples t-test, ICC statistic, partial Pearson correlation analysis, and Bland-Altman method were used. The dependent-samples t-test was used to test the significance of the differences between the segmentation methods, and the magnitude of the differences was expressed using Cohen's d. The ICC statistic was used to evaluate consistency and agreement between the segmentation methods. The ICC statistics were based on a single-measure, two-way mixed model; both consistency and absolute agreement were reported, to assess, respectively, whether the volumes from the two methods varied together across participants (ignoring any systematic offset) and whether the methods produced the same absolute volumes. The partial Pearson correlation analysis was used to examine the strength of the linear relationship between the segmentation methods, controlling for intracranial volume (ICV). To determine whether Cohen's d and ICC estimates differed significantly between comparisons, the overlap of their 95% confidence intervals was examined: two confidence intervals that overlapped by no more than half the average margin of error were considered statistically significantly different (i.e., p < 0.05; Cumming and Finch, 2005). The Bland-Altman method was used to assess the agreement between the segmentation methods, and regression lines were included to assess proportional bias. These analyses were also repeated to compare the entorhinal cortex and transentorhinal cortex volumes estimated by FreeSurfer with those estimated by manual segmentation and are presented in the Supplementary Material.
To evaluate the utility of ASHS-T1 in detecting entorhinal and transentorhinal atrophy in individuals with aMCI and individuals with AD dementia, the volumes of the entorhinal and transentorhinal cortices estimated by ASHS-T1 were compared across the diagnostic groups. For comparison against ASHS-T1, the volumes of the entorhinal and transentorhinal cortices obtained by manual segmentation and by FreeSurfer were also compared across the diagnostic groups. For each volume measurement, a one-way analysis of covariance (ANCOVA) with group as the independent variable and ICV as a covariate was used. Post hoc pairwise comparisons between groups were corrected for multiple testing using the Bonferroni adjustment. The magnitude of the differences in the volume measurements between the groups was expressed using Cohen's d.
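The text does not state which variant of Cohen's d was used for the paired comparisons. The sketch below assumes d_z, the mean within-participant difference divided by the standard deviation of the differences, which is one common choice for dependent samples; dividing by the pooled SD of the two methods instead would give different values. A p value would additionally require the t distribution with n − 1 degrees of freedom (available, for example, in SciPy), which the Python standard library does not provide. The function name and volumes are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t_and_dz(method_a, method_b):
    """Paired t statistic and Cohen's d_z for per-participant volume pairs.

    d_z = mean(a - b) / SD(a - b), and t = d_z * sqrt(n).
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    mean_diff = fmean(diffs)
    sd_diff = stdev(diffs)  # sample SD with n - 1 denominator
    d_z = mean_diff / sd_diff
    t = mean_diff / (sd_diff / sqrt(len(diffs)))
    return t, d_z


# Hypothetical entorhinal volumes (mm^3) for four participants
ashs = [1000, 1050, 980, 1100]
manual = [1300, 1200, 1250, 1350]
t, d_z = paired_t_and_dz(ashs, manual)
```

Because d_z scales the mean difference by the spread of the differences, a method that is consistently lower by a similar amount in every participant produces a very large |d_z|, as in the large negative values reported here.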
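The core of the ANCOVA adjustment is that each group's mean volume is shifted to what it would be at the grand mean ICV, using the pooled within-group regression slope of volume on ICV. The sketch shows only this adjustment step (the F test and the Bonferroni-corrected pairwise tests are omitted), assumes equal regression slopes across groups (the usual ANCOVA assumption), and uses hypothetical helper names and toy data in arbitrary units.

```python
from statistics import fmean


def pooled_within_slope(groups):
    """Common (pooled within-group) regression slope of volume on ICV.

    groups: dict mapping group name -> list of (icv, volume) pairs.
    """
    sxy = sxx = 0.0
    for pairs in groups.values():
        mean_icv = fmean(icv for icv, _ in pairs)
        mean_vol = fmean(vol for _, vol in pairs)
        sxy += sum((icv - mean_icv) * (vol - mean_vol) for icv, vol in pairs)
        sxx += sum((icv - mean_icv) ** 2 for icv, _ in pairs)
    return sxy / sxx


def icv_adjusted_means(groups):
    """Group mean volumes adjusted to the grand mean ICV (ANCOVA adjusted means)."""
    slope = pooled_within_slope(groups)
    grand_icv = fmean(icv for pairs in groups.values() for icv, _ in pairs)
    return {
        name: fmean(vol for _, vol in pairs)
        - slope * (fmean(icv for icv, _ in pairs) - grand_icv)
        for name, pairs in groups.items()
    }


# Hypothetical data in which volume = 0.5 * ICV + group offset (arbitrary units)
groups = {
    "HC": [(1400, 800), (1500, 850), (1600, 900)],
    "AD": [(1300, 650), (1400, 700), (1500, 750)],
}
adjusted = icv_adjusted_means(groups)
```

In this toy example, the raw means differ by 150 units, but part of that gap reflects the HC group's larger ICV; after adjustment the difference is 100, the group offset built into the data.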

Demographic characteristics
As shown in Table 1, there were statistically significant differences among the HC, aMCI, and AD groups in years of education and MMSE score. In particular, the HC group had significantly more years of education than the AD group (p = 0.020). The AD group scored significantly lower on the MMSE than both the aMCI and HC groups (both ps < 0.001), and the aMCI group scored significantly lower on the MMSE than the HC group (p = 0.011). The groups did not significantly differ in age or sex.

Comparing ASHS-T1 and manual segmentation
Mean differences, standard deviation of the differences, p values, and Cohen's d estimates between ASHS-T1 and manual segmentation volumes are shown in Table 2. ICC consistency and absolute agreement statistics between ASHS-T1 and manual segmentation volumes are shown in Table 3. Partial Pearson correlation coefficients between ASHS-T1 and manual segmentation volumes, controlling for ICV, are shown in Table 4. Scatter plots of ASHS-T1 and manual segmentation volumes are presented in Fig. 1. Bland-Altman plots of ASHS-T1 and manual segmentation volumes are presented in Fig. 2.

Entorhinal cortex
The dependent-samples t-tests indicated that ASHS-T1 significantly underestimated entorhinal cortex volumes compared to manual segmentation for all three diagnostic groups, with the differences large in magnitude (d range = −2.36 to −3.43). There was a trend toward less underestimation of both left and right entorhinal cortex volumes with increasing disease severity (HC d = −3.43 and −3.40 vs. aMCI d = −3.00 and −2.81 vs. AD d = −2.36 and −2.73). Overall, ICC consistency (range = 0.43 to 0.79) was higher, but not significantly so, than ICC agreement (range = 0.06 to 0.27). There was a trend toward higher ICC consistency and agreement for both left and right entorhinal cortex volumes in the aMCI group (ICC consistency = 0.79 and 0.67, ICC agreement = 0.27 and 0.19) and AD group (ICC consistency = 0.69 and 0.67, ICC agreement = 0.25 and 0.19) compared to the HC group (ICC consistency = 0.43 and 0.56, ICC agreement = 0.06 and 0.09). Likewise, there was a trend toward higher partial correlation coefficients for both left and right entorhinal cortex volumes in the aMCI group (r_p = 0.83 and 0.75) and AD group (r_p = 0.79 and 0.75) compared to the HC group (r_p = 0.55 and 0.63). The Bland-Altman plots showed very wide 95% limits of agreement for both the left entorhinal cortex (mean ± 1.96 SD = −565 to −101 mm³) and right entorhinal cortex (mean ± 1.96 SD = −614 to −115 mm³). Regression analyses revealed a significant proportional bias for all three diagnostic groups (all ps < 0.01), whereby the magnitude of the difference in the volumes increased as the mean of the volumes increased.
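The Bland-Altman quantities reported above (the mean difference, the 95% limits of agreement at mean ± 1.96 SD, and the proportional-bias regression of the differences on the pairwise means) can be sketched in plain Python. This is an illustration with hypothetical volumes and a hypothetical function name; it returns the regression slope but omits the significance test of that slope, which would need the t distribution.

```python
from statistics import fmean, stdev


def bland_altman(a, b):
    """Bias, 95% limits of agreement, and proportional-bias slope for a vs b.

    The slope comes from an ordinary least-squares regression of the
    differences (a - b) on the pairwise means ((a + b) / 2).
    """
    diffs = [x - y for x, y in zip(a, b)]
    means = [(x + y) / 2 for x, y in zip(a, b)]
    bias = fmean(diffs)
    sd = stdev(diffs)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    mean_of_means = fmean(means)
    slope = sum((m - mean_of_means) * (d - bias) for m, d in zip(means, diffs)) / sum(
        (m - mean_of_means) ** 2 for m in means
    )
    return bias, limits, slope


# Hypothetical volumes (mm^3): method a is always 60% of method b
a = [600, 720, 840, 960]
b = [1000, 1200, 1400, 1600]
bias, limits, slope = bland_altman(a, b)
```

A negative slope means the difference becomes more negative as the average volume grows, i.e. underestimation that scales with size, which is the pattern described as proportional bias in the text.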

Transentorhinal cortex
The dependent-samples t-tests indicated that, compared to manual segmentation, ASHS-T1 significantly underestimated transentorhinal cortex volumes for all three diagnostic groups, though to a lesser extent than entorhinal cortex volumes, with the differences medium to large in magnitude (d range = −0.61 to −1.46). The degree of underestimation was similar across the diagnostic groups. Overall, ICC consistency (range = 0.33 to 0.60) was higher, but not significantly so, than ICC agreement (range = 0.14 to 0.37). ICC consistency and agreement, as well as the partial correlation coefficients, were similar across the diagnostic groups. The Bland-Altman plots showed very wide 95% limits of agreement for both the left transentorhinal cortex (mean ± 1.96 SD = −294 to 75.9 mm³) and right transentorhinal cortex (mean ± 1.96 SD = −347 to 82 mm³), although these limits were narrower than those for the entorhinal cortex. Regression analyses indicated no proportional bias for any of the diagnostic groups (all ps > 0.05).
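Statements such as "higher, but not significantly so" rest on the Cumming and Finch (2005) confidence-interval overlap rule described in the methods. The predicate below is a sketch of that rule as stated in the text (overlap no greater than half the average margin of error), with made-up intervals and a hypothetical function name. The rule was formulated for independent estimates, so applying it to estimates from the same participants is an approximation.

```python
def cis_differ(ci1, ci2):
    """Rule-of-thumb test for two 95% CIs (Cumming and Finch, 2005).

    Returns True when the intervals overlap by no more than half the
    average margin of error, taken as roughly p < 0.05.
    """
    (lo1, hi1), (lo2, hi2) = ci1, ci2
    moe1, moe2 = (hi1 - lo1) / 2, (hi2 - lo2) / 2
    overlap = min(hi1, hi2) - max(lo1, lo2)  # negative when there is a gap
    return overlap <= 0.5 * (moe1 + moe2) / 2


# Made-up intervals, each with a margin of error of 1.0
small_overlap = cis_differ((0.0, 2.0), (1.6, 3.6))  # overlap 0.4
large_overlap = cis_differ((0.0, 2.0), (1.0, 3.0))  # overlap 1.0
```

With margins of error of 1.0, the threshold is an overlap of 0.5, so the first pair counts as different and the second does not.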

Comparing ASHS-T1 and FreeSurfer
Mean differences, standard deviation of the differences, p values, and Cohen's d estimates between ASHS-T1 and FreeSurfer volumes are shown in Table 2. ICC consistency and absolute agreement statistics between ASHS-T1 and FreeSurfer volumes are shown in Table 3. Partial Pearson correlation coefficients between ASHS-T1 and FreeSurfer volumes, controlling for ICV, are shown in Table 4. Scatter plots of ASHS-T1 and FreeSurfer volumes are presented in Fig. 3. Bland-Altman plots of ASHS-T1 and FreeSurfer volumes are presented in

Entorhinal cortex
The dependent-samples t-tests indicated that ASHS-T1 significantly underestimated entorhinal cortex volumes compared to FreeSurfer for all three diagnostic groups, with the differences large in magnitude (d range = −1.76 to −3.78). There was a trend toward less underestimation of both left and right entorhinal cortex volumes with increasing disease severity (HC d = −3.08 and −3.78 vs. aMCI d = −2.37 and −2.86 vs. AD d = −1.76 and −2.36). Overall, ICC consistency (range = 0.31 to 0.72) was higher, but not significantly so, than ICC agreement (range = 0.03 to 0.28). There was a trend toward higher ICC consistency and agreement for both left and right entorhinal cortex volumes in the aMCI group (ICC consistency = 0.72 and 0.54, ICC agreement = 0.28 and 0.11) and AD group (ICC consistency = 0.59 and 0.51, ICC agreement = 0.26 and 0.14) than in the HC group (ICC consistency = 0.43 and 0.31, ICC agreement = 0.07 and 0.03). Likewise, there was a trend toward higher partial correlation coefficients for both left and right entorhinal cortex volumes in the aMCI group (r_p = 0.80 and 0.62) and AD group (r_p = 0.76 and 0.71) compared to the HC group (r_p = 0.41 and 0.33). The Bland-Altman plots showed extremely wide 95% limits of agreement for both the left entorhinal cortex (mean ± 1.96 SD = −613 to −49.2 mm³) and right entorhinal cortex (mean ± 1.96 SD = −930 to −158 mm³). Regression analyses revealed a significant proportional bias for all three diagnostic groups (all ps < 0.001), whereby the magnitude of the difference in the volumes increased as the mean of the volumes increased.
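The partial correlations (r_p) reported here control for ICV. With a single covariate, the partial correlation follows from the three pairwise Pearson correlations, and it equals the correlation between the residuals of each method's volumes after regressing them on ICV. The plain-Python sketch below uses hypothetical helper names and toy data constructed so that the answers are known in advance.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(x, y, z):
    """First-order partial correlation of x and y controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Toy data: z plays the role of ICV; e1 and e2 are uncorrelated with z and
# with each other, so x and y are related only through z.
z = [1, 2, 3, 4]
e1 = [1, -1, -1, 1]
e2 = [-1, 3, -3, 1]
x = [zi + a for zi, a in zip(z, e1)]
y_unrelated = [zi + b for zi, b in zip(z, e2)]
y_related = [zi + 2 * a for zi, a in zip(z, e1)]
r_unrelated = partial_r(x, y_unrelated, z)  # near 0
r_related = partial_r(x, y_related, z)      # near 1
```

In the first case x and y are correlated only because both track z, so controlling for z removes the association; in the second, their ICV-independent parts are proportional.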

Transentorhinal cortex
The dependent-samples t-tests indicated that ASHS-T1 significantly underestimated left transentorhinal cortex volume compared to FreeSurfer for all three diagnostic groups, with the differences large in magnitude (d range = −0.98 to −1.11). There were no significant differences in right transentorhinal cortex volume between ASHS-T1 and FreeSurfer. However, for both the left and right transentorhinal cortices, the standard deviations of the differences in volumes were very large in all diagnostic groups (SD range = 177.85 to 346.38 mm³). ICC consistency (range = −0.11 to 0.17) was similar to ICC agreement (range = −0.05 to 0.17). ICC consistency and agreement, as well as the partial correlation coefficients, were similar across the diagnostic groups. The Bland-Altman plots showed extremely wide 95% limits of agreement for both the left transentorhinal cortex (mean ± 1.96 SD = −930 to 295 mm³) and right transentorhinal cortex (mean ± 1.96 SD = −433 to 357 mm³). Regression analyses revealed a significant proportional bias (ps < 0.001) in all three diagnostic groups, except for the right hemisphere in the aMCI group, which showed no proportional bias (p = 0.270); where present, the magnitude of the difference in the volumes increased as the mean of the volumes increased.

Evaluating volume differences across HC, aMCI, and AD groups
The means and standard deviations of the entorhinal and transentorhinal cortical volumes estimated by ASHS-T1, manual segmentation, and FreeSurfer for each diagnostic group as well as the p values and Cohen's d estimates between the diagnostic groups are presented in Table 5.
Post hoc pairwise comparisons of the ASHS-T1 entorhinal cortex volumes showed that the left and right entorhinal cortex volumes were significantly smaller in the AD and aMCI groups compared to the HC group and in the AD group compared to the aMCI group.
Post hoc pairwise comparisons of the manual segmentation and FreeSurfer entorhinal cortex volumes showed that the left and right entorhinal cortex volumes were significantly smaller in the AD and aMCI groups compared to the HC group but only the left entorhinal cortex volume was significantly smaller in the AD group compared to the aMCI group.
Post hoc pairwise comparisons of the ASHS-T1 transentorhinal cortex volumes showed that the left and right transentorhinal cortex volumes were significantly smaller in the AD and aMCI groups compared to the HC group but were not significantly different between the AD and aMCI groups.
Post hoc pairwise comparisons of the manual segmentation transentorhinal cortex volumes showed that only the left transentorhinal cortex volume was significantly smaller in the aMCI group compared to the HC group but both the left and right transentorhinal cortex volumes were significantly smaller in the AD group compared to the aMCI and HC groups.
Post hoc pairwise comparisons of the FreeSurfer transentorhinal cortex volumes showed that only the right transentorhinal cortex volume was significantly smaller in the aMCI group compared to the HC group. Furthermore, the right transentorhinal cortex measurements appeared to be inaccurate: the mean right transentorhinal cortex volume for the AD group was slightly, though not significantly, larger than that for the aMCI group.

Discussion
The current study has three key, novel findings. First, there was substantial discrepancy between the volumes measured by ASHS-T1 and those measured by manual segmentation and FreeSurfer. Second, there was a trend toward less discrepancy and higher consistency and agreement between ASHS-T1 and manual segmentation entorhinal cortex volumes for individuals with aMCI and individuals with AD dementia than for HCs, suggesting that segmentation performance may be influenced by the presence of AD-related atrophy. Third, both the entorhinal and transentorhinal cortical volumes measured by ASHS-T1 were sensitive to atrophy in individuals with aMCI and individuals with AD dementia, whereas for FreeSurfer only the entorhinal cortex volumes reliably detected disease-related atrophy in these groups.

Comparison between ASHS-T1 and manual segmentation
There was overall variable consistency and low agreement between the ASHS-T1 and manual segmentation entorhinal and transentorhinal cortical volumes. Notably, ASHS-T1 tended to underestimate entorhinal and transentorhinal cortical volumes relative to manual segmentation. The consistently smaller volumes produced by ASHS-T1 may have resulted from differences between the segmentation protocol used to construct the ASHS-T1 atlas set and the segmentation protocol used in the current study. In particular, the protocol used to construct the ASHS-T1 atlas set defines the anterior boundary of the entorhinal and transentorhinal cortices as 1.3 mm anterior to the hippocampal head (Xie et al., 2019), whereas the current study commenced segmentation of these regions 5 mm anterior to the hippocampal head. The smaller volumes produced by ASHS-T1 likely contributed to the low ICC absolute agreement values between ASHS-T1 and manual segmentation observed in the current study (entorhinal cortex = 0.06-0.27, transentorhinal cortex = 0.14-0.37). The ICC consistency values, which ignore systematic differences between the two sets of measurements, were higher but still suboptimal (entorhinal cortex = 0.43-0.79, transentorhinal cortex = 0.33-0.60). The initial validation study by Xie et al. (2019), which validated ASHS-T1 using the same manual segmentation data with which the ASHS-T1 multi-atlas was constructed, reported moderate-to-high ICC values between ASHS-T1 and manual segmentation (viz., entorhinal cortex = 0.69, transentorhinal cortex = 0.77). Whilst validating ASHS-T1 against its own atlas set would likely have produced over-optimistic results, the ICC values reported by Xie et al. (2019) were nonetheless below the acceptable reliability threshold of 0.80 (Hopkins, 2000). Overall, the findings of both the current study and that of Xie et al. (2019) highlight less than optimal agreement between ASHS-T1 and manual segmentation entorhinal and transentorhinal cortical volumes.

Fig. 3. Scatter plots of ASHS-T1 and FreeSurfer entorhinal and transentorhinal cortical volumes.

Y.-E. Quek et al.

Comparison between ASHS-T1 and FreeSurfer
Comparisons between the ASHS-T1 and FreeSurfer entorhinal and transentorhinal cortical volumes showed overall low-to-moderate consistency and low agreement for the entorhinal cortex and low consistency and agreement for the transentorhinal cortex. Moreover, the Bland-Altman plots showed extremely wide 95% limits of agreement for both regions. Xie et al. (2019) compared ASHS-T1 and FreeSurfer volumes in individuals at different stages of AD and showed that FreeSurfer produced on average 37-76% larger entorhinal and transentorhinal cortical volumes than ASHS-T1. The current study replicates this finding of large differences between ASHS-T1 and FreeSurfer volume measurements, with FreeSurfer producing on average 2-112% larger entorhinal and transentorhinal cortical volumes than ASHS-T1. The discrepancy is likely due to FreeSurfer mislabeling the tentorium cerebelli as entorhinal or transentorhinal cortex, which inflates the volumes of these regions (Xie et al., 2019). Overall, these results highlight the substantial variability in volume measurements between different automated segmentation methods. A study comparing three widely used automated segmentation methods, SPM5, FSL, and FreeSurfer, also showed large differences in volume measurements between the methods (Klauschen et al., 2009). Taken together, these findings suggest that different automated segmentation methods may produce very discrepant volume measurements, and they support a strong caution against comparing volume measurements obtained with different automated methods.

Influence of atrophy on segmentation performance
There was some evidence that the segmentation performance of ASHS-T1, particularly for the entorhinal cortex, was influenced by the presence of atrophy. The results showed a trend toward less discrepancy and greater consistency and agreement between ASHS-T1 and manual segmentation entorhinal cortex volumes for individuals with aMCI and individuals with AD dementia than for HCs (see Tables 2 and 3). Furthermore, the Bland-Altman plots of ASHS-T1 and manual segmentation entorhinal cortex volumes showed significant proportional bias, whereby the degree of underestimation by ASHS-T1 increased as entorhinal cortex volume increased (see Fig. 2). Xie et al. (2019) also observed significant proportional bias in the ASHS-T1 entorhinal cortex segmentations. Visual inspection of the ASHS-T1 segmentations suggested that ASHS-T1 tended to under-segment the entorhinal cortex, particularly at its superior border and at the gray matter-white matter border. It is possible that this under-segmentation is exacerbated in larger entorhinal cortices, resulting in the observed volume-dependent bias. Other studies have also observed volume- or atrophy-dependent performance of other automated segmentation methods, relative to manual segmentation, in other brain regions, including the hippocampus, amygdala, superior temporal gyrus, and temporal lobe (Lehmann et al., 2010; Sánchez-Benavides et al., 2010). These findings suggest that automated segmentation methods may perform differently in healthy brains than in diseased brains, highlighting the importance of validating these methods in the populations in which they are intended to be used.

Clinical utility of entorhinal and transentorhinal cortical volumes
The current study additionally assessed the clinical utility of ASHS-T1 by comparing the entorhinal and transentorhinal cortical volumes measured by ASHS-T1 between HCs, individuals with aMCI, and individuals with AD dementia. The entorhinal and transentorhinal cortical volumes were significantly smaller in the aMCI and AD groups compared to the HC group. Only the entorhinal cortex volumes, however, were significantly smaller in individuals with AD dementia compared to individuals with aMCI; the transentorhinal cortex volumes were not significantly different between these two groups. The finding of volume loss in the entorhinal and transentorhinal cortices in individuals with aMCI and individuals with AD dementia is consistent with the findings of Xie et al. (2019), who showed significantly reduced ASHS-T1 entorhinal and transentorhinal cortical volumes in individuals with early prodromal AD, individuals with late prodromal AD, and individuals with AD dementia compared to HCs, and is also in line with previous characterizations of entorhinal and transentorhinal cortical atrophy in individuals with aMCI and individuals with AD dementia (Jessen et al., 2006; Juottonen et al., 1998; Kulason et al., 2019; Pennanen et al., 2004; Tward et al., 2017). However, the finding of no difference in transentorhinal cortex volume between individuals with aMCI and individuals with AD dementia is surprising, given that AD-related neuropathological changes within the transentorhinal cortex continue to accumulate as the disease progresses (Braak and Braak, 1991). One possible explanation is that the current study was not sufficiently powered to detect a difference in transentorhinal cortex volume between the two groups, as suggested by the medium effect sizes (i.e., d > 0.50) for these comparisons despite their statistical non-significance. Taken together, these findings suggest that ASHS-T1 is sufficiently sensitive to entorhinal and transentorhinal cortical volume changes to detect AD-related atrophic changes in both early and late disease stages. Further evaluation of ASHS-T1 in a larger sample is required to clarify its utility in monitoring disease progression in the transentorhinal cortex.
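The power argument above can be made concrete with a rough calculation. The sketch below is not an analysis reported in the study: it uses a normal approximation to a two-sided, two-sample comparison, ignores the ICV covariate in the ANCOVA, and plugs in d = 0.5 with the aMCI (n = 37) and AD (n = 29) group sizes, with and without a Bonferroni-adjusted alpha for three comparisons. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_sample(d, n1, n2, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test for effect size d."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n1 * n2 / (n1 + n2))  # expected z statistic under the effect
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


power_unadjusted = approx_power_two_sample(0.5, 37, 29)            # about 0.52
power_bonferroni = approx_power_two_sample(0.5, 37, 29, 0.05 / 3)  # about 0.35
```

Under these simplifying assumptions, a medium effect would be detected only about half the time, or roughly a third of the time after Bonferroni adjustment, which is consistent with the suggestion that the aMCI versus AD transentorhinal comparison was underpowered.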
The FreeSurfer entorhinal and transentorhinal cortical volumes showed inconsistent results in detecting differences across the diagnostic groups. The FreeSurfer entorhinal cortex volumes followed a similar pattern of group differences to the ASHS-T1 entorhinal cortex volumes, except that they did not differ between the aMCI and AD groups. The FreeSurfer transentorhinal cortex volumes, by contrast, were less consistent. The left transentorhinal cortex volume did not differ significantly between any of the diagnostic groups, whereas the right transentorhinal cortex volume differed significantly only between the HC and aMCI groups. Moreover, the mean right transentorhinal cortex volume was marginally larger in individuals with AD dementia than in individuals with aMCI. Xie et al. (2019) found a similar fluctuation in FreeSurfer entorhinal cortex volumes, whereby the mean entorhinal cortex volume was smaller in individuals with preclinical AD than in HCs but larger in individuals with early prodromal AD than in individuals with preclinical AD. These findings may result from the erroneous inclusion of portions of the tentorium cerebelli in the FreeSurfer entorhinal or transentorhinal cortex labels (Xie et al., 2019). The fluctuation in entorhinal and transentorhinal cortical volumes across the diagnostic groups suggests that this segmentation error occurs inconsistently, which may obscure true group differences in the volumes. Overall, these findings raise questions about the validity of FreeSurfer entorhinal and transentorhinal cortical volumes, especially when they are used to detect clinically meaningful differences across the spectrum of AD severity.

Limitations
Some potential limitations of the current study should be mentioned. First, there were several differences between the segmentation protocol used in the ASHS-T1 atlas set and the segmentation protocol used for manual segmentation in the current study. These protocol differences likely contributed to the differences in entorhinal and transentorhinal cortical volumes, thereby affecting the comparisons between the two segmentation methods. Accordingly, although it is difficult to draw strong conclusions about the segmentation accuracy of ASHS-T1 relative to manual segmentation, the current study nonetheless showed that the entorhinal and transentorhinal cortical volumes measured by ASHS-T1 were sensitive to AD-related atrophy in both early and late disease stages. Second, the results of the current study are based on MRI scans obtained in a research setting and thus may not generalize to routinely acquired clinical MRI scans. For example, ADNI MRI scans undergo rigorous quality control to ensure high-quality data (Jack et al., 2008). Scans acquired in the clinical setting, however, typically do not receive the same standard of quality control because of capacity constraints and are therefore more likely to contain image artifacts, which may adversely affect the performance of automated segmentation methods (Reuter et al., 2015; Tisdall et al., 2016). Consequently, the findings of the current study should be replicated on MRI scans acquired in routine clinical practice. Third, the current study did not examine cortical thickness measurements of the entorhinal and transentorhinal cortices. It has previously been argued that thickness, rather than volume, measurements of the entorhinal and transentorhinal cortices may be more sensitive to AD-related atrophy in these regions because thickness measurements are more robust to the anatomical variability of these regions (Feczko et al., 2009; Xie et al., 2019). However, several previous studies, including the initial ASHS-T1 validation study by Xie et al. (2019), did not find significantly superior performance of thickness over volume measurements of the entorhinal and transentorhinal cortices in detecting early AD-related atrophy (Li et al., 2021; Schwarz et al., 2016). On the contrary, Xie et al. (2019) found a significant difference in entorhinal cortex volume, but not thickness, between HCs and individuals with early prodromal AD. Therefore, there is still considerable uncertainty about the value of thickness over volume measurements of the entorhinal and transentorhinal cortices for the early detection of AD-related atrophy.

Conclusion
In conclusion, despite significant differences between the entorhinal and transentorhinal cortical volumes obtained with ASHS-T1 and with manual segmentation, ASHS-T1 demonstrated potential clinical utility in detecting AD-related atrophy by significantly discriminating individuals with aMCI and individuals with AD dementia from HCs. Moreover, ASHS-T1 performed more consistently than FreeSurfer in detecting volume differences across the diagnostic groups. Overall, the findings of the current study highlight ASHS-T1 as a promising tool for the automated segmentation of the entorhinal and transentorhinal cortices. Further research should examine the performance of ASHS-T1 on routinely acquired clinical MRI scans to clarify its potential for translation to the clinical setting.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Table 1
Sample demographic characteristics.
b Pairwise comparisons are Bonferroni-adjusted post hoc analyses.
Y.-E. Quek et al.

Table 5
Comparisons of entorhinal and transentorhinal cortical volumes (mm³) estimated by ASHS-T1, manual segmentation, and FreeSurfer, adjusted for intracranial volume, across diagnostic groups.