Author: Hiroko H. Dodge and Deborah Estrin
All people are uniquely endowed at birth by genetic and environmental conditions; by the time they enter their last decades, they have a lifetime of differentiation that determines their state of health and response to new events and conditions. This cumulative differentiation creates substantial intraindividual variability in the rate of aging as well as the extent of resistance and resilience to pathological insults. Therefore, applying normative group data such as group means or median thresholds often fails to accurately identify and predict an individual’s clinical state and prognosis.
There are two ways to cope with this high intraindividual variability. One is to use “big data,” which consists of a large number of subjects to improve the prediction algorithm. Another is to use each subject as their own universe to identify subtle changes or deviations from their premorbid stage.
Rich temporal data from a single person—what we call “small data”—can be used for the individual’s tailored diagnosis, disease management, and health behavior. Using such data for patient care, self-care, sustained independence, and research involves access to, processing, and interpretive use of an individual’s combined data streams over time.
The promise of big data is that analysis of millions of people’s genetic data, life events, and environment will reveal complex interactions of the key factors that create human variability. Such information is already used to build diagnostic and predictive models for health care, from radiology to oncology and public health (e.g., see Dishman in this issue). It will eventually be possible to identify a broader range of disease prognoses, responses to drugs, and aging processes by sorting out the causes of variability (Li et al. 2006; MacDonald et al. 2006).
That said, rich temporal data from a single person can be especially relevant to understanding and caring for individuals. Digital data streams now capture health-related aspects of an individual’s daily life, from the number of steps taken to sleep duration, heart rate, language and speech patterns, driving patterns, and social interactions, resulting in a large volume of high-dimensional data spanning a person’s lifetime, including premorbid and treatment periods.
To distinguish this type of data (high-frequency, time-series digital data from a single subject) from big data (large N across populations), we call the former data and related processing and analytical approaches “small data” (Estrin 2014; Estrin and Juels 2016). Atul Gawande (2017) makes the case for using such data:
the more capacity we develop to monitor the body and the brain for signs of future breakdown and to correct course along the way—to deliver ‘precision medicine,’ as the lingo goes—the greater the difference health care can make in people’s lives, as well as in reducing future costs…. We can…shift our focus from rescue medicine to lifelong incremental care. Or we can leave millions of people to suffer and die from conditions that, increasingly, can be predicted and managed.
Thus small data—defined here as the “big data” of an individual—are as important as big-N data of a population for the development and implementation of precision medicine, prediction modeling, and multitargeted integrative medicine interventions, particularly when it comes to aging.
In this article we describe (1) how small data are becoming increasingly essential to aging research, (2) sources of small data, (3) applications of small data in dementia research, and (4) challenges for research.
Small Data as Digital Biomarkers for Aging and Disease Management
Small data sources are high dimensional and noisy and require processing before they can be used to inform care. Once processed, the data can be used as digital biomarkers, which can provide high-resolution, accurate, and rapid feedback to practitioners and patients in support of disease prevention, treatment, and management, including interventions around the behavioral drivers of chronic disease.
As individuals age they are increasingly susceptible to chronic conditions (e.g., elevated blood pressure, glucose intolerance) that are often accompanied by cognitive, neurological, sensory, and musculo-skeletal effects. Moreover, a person’s state and function may change because of disease progression, medication interactions, metabolism changes, and lifestyle adaptations.
In addition to point-of-care measures of vital signs and laboratory data, clinical researchers and practitioners rely on self-reported and clinically observed measures of function, mood, fatigue, and pain as indicators of health status. These clinical measures are essential to observe changes in a patient’s state as she recovers, declines, or responds to a change in treatment or lifestyle. But these sparse and often subjective measures can be used diagnostically and prognostically only as gross estimates of status and outcomes.
Small data as digital biomarkers offer timely new ways of observing day-to-day changes in health measures such as activities of daily living, fatigue, sleep disruption, pain level, social interaction, emotional state, and cognitive impairment. For example, although an aging relative might reduce the range and frequency of daily walks, grocery shopping, and email responses, occasional visits to clinicians and even frequent visits by family may not detect these differences. But they would be evident in the individual’s digital biomarkers.
Sources of Small Data
While many more devices and data streams will emerge, there is already a tremendous amount that can be done with available data from the following sources.
Self-reported information is collected as part of a patient history, whether verbally by an intake nurse, on paper forms, or via a kiosk or tablet in the clinician’s waiting room. There is a desire to move to more objective and low-user-overhead techniques for more frequent feedback, but some things require direct questions about the person’s perception or feeling. Such questions can (and should) be asked in new and personalized ways rather than via generic text-based questionnaires.
High-quality touchscreens (available on smartphones and tablets) and many Internet of Things (IoT) devices can be used to solicit patient self-reports in more effective, personalized, and lower-burden ways, as is done with visual pain scales, the photographic affect meter, and adaptive visual assessment (Pollak et al. 2011; Selter et al. 2018). In addition, online patient fora (e.g., PatientsLikeMe, Reddit) are a rich source of self-reported information about the patient experience and disease progression. Several research groups have made extensive use of these data sources (e.g., Ernala et al. 2017; Ryen et al. 2013).
Passive Recording of Mobility and Location
Information about changes in a person’s patterns of movement and location is readily available (and affordable) and can reveal health status changes. In the home, sensors for motion, pressure, and occupancy can capture activity patterns related to mobility and sleep (Uddin et al. 2018). Personal devices for safety and personal assistants may provide even richer (although more intrusive) audio and video data. Devices developed for home security may similarly become an important source of information.
Location patterns can be collected through readily available smartphone applications. Once authorized by the user, these applications run in the background and do not require further interaction. They can record a continuous location time series or be implemented to capture only the times at which the individual enters and exits predefined locations (e.g., home, work, community center, doctor’s office, relative’s home). This approach takes advantage of geofencing, in which a mobile device is programmed to record data when in the vicinity of particular locations.
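The enter/exit recording described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the implementation used by any particular smartphone application: the `Geofence` class, the `geofence_events` function, the circular fence shape, and the coordinates are all hypothetical.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


@dataclass(frozen=True)
class Geofence:
    """A hypothetical circular region around a predefined location."""
    name: str
    lat: float
    lon: float
    radius_m: float


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def geofence_events(fixes, fences):
    """Turn time-ordered (timestamp, lat, lon) fixes into enter/exit events.

    Only the transitions are kept, so the continuous location trace
    does not need to be stored.
    """
    inside = set()  # names of fences the person is currently inside
    events = []
    for ts, lat, lon in fixes:
        for fence in fences:
            is_in = haversine_m(lat, lon, fence.lat, fence.lon) <= fence.radius_m
            if is_in and fence.name not in inside:
                inside.add(fence.name)
                events.append((ts, fence.name, "enter"))
            elif not is_in and fence.name in inside:
                inside.discard(fence.name)
                events.append((ts, fence.name, "exit"))
    return events


# Made-up example: one fence around "home" and four location fixes.
home = Geofence("home", lat=45.0, lon=-122.0, radius_m=100.0)
fixes = [
    (0, 45.0, -122.0),     # at home
    (1, 45.0005, -122.0),  # about 56 m away, still inside the fence
    (2, 45.01, -122.0),    # about 1.1 km away, outside
    (3, 45.0, -122.0),     # back home
]
events = geofence_events(fixes, [home])
```

Storing only the transitions, rather than every fix, is one way such an application could limit how much location detail it retains.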
For individuals who engage in regular activities outside the home, location traces (e.g., emanating from their smartphone) can be used to observe features of social rhythms (Abdullah and Choudhury 2018; Aung et al. 2017; Dagum 2018; Insel 2017). Such data can show shifts in location patterns (e.g., time of departure for work, hours spent out of the house on weekends) that may provide early signs of relapse or, conversely, improvement after an intervention or treatment. And the measurements can be automatically personalized relative to an individual’s baseline data (e.g., work hours per day, workday/weekend movement patterns).
Some research groups have begun to use retrospective location data available through the Google Takeout service, through which an individual can access both location and other online history and make it available to a research study. The data can be used to create richer person-specific models.
Consumer and medical-grade wearables can monitor vital signs such as sleep quality, heart rate variability, glucose, seizure onset, and stress indicators (e.g., Dolgin 2014). Devices such as Fitbit and Actigraph continuously measure and characterize participant mobility (e.g., sitting, standing, walking, running). Garmin, Polar, and other vendors have marketed wearables for athletes and sport enthusiasts to monitor heart rate and distance. Other vendors have provided functionalities in a smartwatch.
Sleep monitoring is one of the most active areas addressed by multipurpose wearables as well as sleep-specific devices. Wearables such as Empatica monitor disease-specific concerns such as seizures (Dolgin 2014), and their techniques may lead to sensing capabilities for more common patient concerns such as stress. Continuous glucose monitoring has improved tremendously over the last decade and, to the extent that nutrition is a critical driver in many conditions, will have a major role to play beyond diabetes (Bolinder et al. 2016). Other wearables of interest for the aging population are smart shoes, hearing aids that incorporate actigraphy, and smart canes, all of which may provide more detailed measures of ambulation and balance.
Many wearables have interfaces that enable individuals to obtain and share access to their device-generated data, both semi-real-time and historical. The data are of interest to the consumer but of variable accuracy for research assessments (Feehan et al. 2018), although Fitbit has been incorporated in many research studies and interventions, and Actigraph has been used extensively as a research tool in aging and other studies (e.g., Law et al. 2018).
Active Tasks or Testing
Active task-based observations could be used to assess neurological or cognitive conditions of aging (e.g., language and speech assessments during stroke recovery), particularly as they were developed for administration by another adult alongside the patient rather than by the patient alone.
ResearchKit popularized the use of passive and wearable sensing capabilities in the context of specific, prompted physical activities, in some cases based on clinical analogues such as a tapping test or an interactive assessment of vocalization, language, or balance. For example, participants in a Parkinson’s study were prompted to tap repeatedly on the touchscreen of their smartphone as a measure of fine motor capability (Bot et al. 2016).
A ResearchKit autism study illustrates the smartphone or tablet adaptation of an established observational method for autism screening (Egger et al. 2018). The user-facing camera enables researchers to assess detailed characteristics and timing of facial expression in response to a video on the device’s display.
IoT-Based Personal Assistants
For home-based populations such as those aging with significant visual or physical limitations, voice agents (such as Amazon Echo and Google Home) or easily interpreted touchscreens/kiosks/tablets may be more functional than smartphones. There have not, however, been extensive studies of the use of these devices as tools for measurement or intervention. There is interest in analyzing speech characteristics and utterance as tools for early identification of cognitive changes (e.g., Asgari et al. 2017; Roark et al. 2011), so conversational and other voice interactions could be used as a digital biomarker in the future.
Online interactions such as web browsing, shopping, email, and social media are a rich source of digital biomarkers. Major internet platforms (e.g., Google, Twitter, Facebook) allow individuals to access their personal usage data. While one of the challenges of developing and validating digital biomarkers is the slow pace and overhead cost of prospective data collection, these services allow individuals to download and share retrospective data for researchers to use in the development of biomarkers, and ultimately for clinicians to establish personalized models of health progression.
Putting Small Data to Use
The diverse small data streams described above may be most effective when combined to increase predictive capabilities for disease identification, progression, and trial outcomes. However, most research has focused on single digital biomarkers to examine their association with clinical conditions. This limitation is largely due to the fact that integrated analyses require a platform that synthesizes data from different sources (e.g., sleep data from wearables, motion data from sensors, highly frequent self-report data from an iPad, location and cognitive testing data from an Android phone) into a single analyzable data source.
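As a rough illustration of what synthesizing streams "into a single analyzable data source" could involve, the sketch below aligns several per-person streams on calendar day. The function name `merge_daily_streams`, the stream names, and the choice of a daily mean are assumptions made for the example. A real platform would also have to handle time zones, device clock drift, and data quality checks.

```python
from collections import defaultdict
from datetime import date


def merge_daily_streams(streams):
    """Combine several data streams into one table with a row per day.

    streams maps a stream name to a list of (day, value) observations.
    Multiple observations on the same day are averaged, and days on
    which a stream reported nothing are filled with None.
    """
    by_day = defaultdict(dict)
    for name, observations in streams.items():
        grouped = defaultdict(list)
        for day, value in observations:
            grouped[day].append(value)
        for day, values in grouped.items():
            by_day[day][name] = sum(values) / len(values)

    columns = sorted(streams)
    return [
        {"date": day, **{col: by_day[day].get(col) for col in columns}}
        for day in sorted(by_day)
    ]


# Made-up example: sleep from a wearable, steps from a phone.
rows = merge_daily_streams({
    "sleep_hours": [(date(2019, 1, 1), 7.0), (date(2019, 1, 2), 6.0)],
    "steps": [(date(2019, 1, 1), 3000), (date(2019, 1, 1), 5000)],
})
```

Keeping missing values explicit, rather than dropping incomplete days, leaves the decision about imputation to the later analysis.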
The use of integrated analyses to improve predictive ability is especially useful in prevention research. For example, early prediction of fall risks, emergency room visits, or institutionalization could reduce or even prevent them and thus decrease health-related expenditures. Similarly, identification of a disease at the earliest stages might delay or prevent further progression.
Potential Contribution of Digital Biomarkers in Dementia Research
In dementia research it is important to try to identify presymptomatic subjects who will develop dementia but for whom conventional diagnostic approaches (e.g., annual neuropsychological tests, subjective input from family members) often fail. Fluid and imaging biomarkers have been extensively evaluated as early indicators of pathological processes in clinical Alzheimer’s disease (AD), but assessing these biomarkers is expensive and often challenging to apply widely among presymptomatic individuals. Efforts are being made to use noninvasive clinical variables to identify those at high risk of developing AD (e.g., Ewers et al. 2016; Lin et al. 2018). Small data in the form of digital biomarkers may provide significant advances in this area.
Applying group norms or population averages does not work well for identifying presymptomatic dementia subjects or for predicting disease prognosis. Currently the gold standard for confirming a diagnosis of Alzheimer’s disease is an autopsy examination of the brain. Cognitive tests and biomarkers have high variability and are not necessarily well correlated with in vivo pathological burdens. Often, intraindividual fluctuations within a short time (i.e., in the morning vs. at night, or on a “good” day vs. a “bad” day) in abilities measured by cognitive or functional tests can obscure long-term changes over a year or even several years.
Besides this intraindividual variability, there is considerable interindividual variability. Even with the same amount of pathological burden in the brain, some people live without any significant symptoms, whereas others become quite disabled or show functional declines in daily living. Studies suggest that between a third and half of participants may die with no clinical diagnosis of dementia despite autopsy findings of moderate or high pathology (MRC CFAS 2001; Sonnen et al. 2011). It may eventually be possible to quantify precisely what constitutes this resilience through biomarker and brain connectivity discoveries.
Intra- and interindividual variability makes it difficult to identify those at risk of cognitive decline or to measure treatment effects of new drugs and interventions. It is especially problematic as clinical trials in Alzheimer’s disease focus on prevention in asymptomatic individuals for whom current approaches often cannot detect cognitive and functional changes.
One way to cope with this variability is to use each subject as their own universe to identify subtle changes or deviations from their premorbid stage (i.e., to use subject-specifically defined normative stages). For example, using individual-specific distributions (as opposed to group norms) of continuously monitored activity data through an unobtrusive in-home sensor system, a study showed that the use of frequently monitored intraindividual changes improves signal-to-noise ratios and provides more powerful metrics for identification of the early onset of diseases (Dodge et al. 2015).
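A toy example may help show why a person’s own distribution can be more sensitive than a group norm. The sketch below is not the method of Dodge et al. (2015). It is a plain z-score against a reference sample, and the function names, the threshold of 2, and all walking-speed values are invented for illustration.

```python
from statistics import mean, stdev


def z_scores(reference, recent):
    """Standardize recent values against a reference sample.

    The reference can be the person's own baseline (an
    individual-specific norm) or a sample drawn from a group.
    """
    mu = mean(reference)
    sigma = stdev(reference)
    if sigma == 0:
        raise ValueError("reference sample has no variability")
    return [(x - mu) / sigma for x in recent]


def flag_deviations(reference, recent, z_threshold=2.0):
    """Mark recent values that lie at least z_threshold SDs from the reference mean."""
    return [abs(z) >= z_threshold for z in z_scores(reference, recent)]


# Invented walking speeds in cm/s.
own_baseline = [60, 62, 58, 61, 59]  # one person's earlier measurements
group_sample = [40, 60, 80, 100]     # measurements from different people
recent = [60, 55]                    # the same person's latest measurements

own_flags = flag_deviations(own_baseline, recent)
group_flags = flag_deviations(group_sample, recent)
```

With these invented numbers, the drop to 55 cm/s is more than three standard deviations below the person’s own baseline and is flagged, but it sits well inside the much wider group distribution and is not.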
The data used for this simulation study came from a platform developed by the Oregon Center for Aging and Technology (ORCATECH; www.orcatech.org) (Kaye et al. 2011; Lyons et al. 2015). A series of studies showed that continuously monitored diverse activities such as walking speed (Dodge et al. 2012; Kaye et al. 2012), sleep quality and duration (Hayes et al. 2014; Seelye et al. 2015b), driving patterns (Seelye et al. 2017), computer use (Kaye et al. 2014), and computer mouse movements (Seelye et al. 2015a) can help identify early signs of dementia long before more obvious clinical symptoms emerge. Such monitoring has been further refined through an NIH-VA–supported initiative (www.carthome.org) to make it readily deployable to more households for research use and to include more diverse (low-income, rural, ethnic minority) populations.
While the data are noisy and variable over time and across individuals, the above illustrates that there is great promise for creating and evaluating initial digital biomarker models for individuals with dense traces and clinical history.
Other Healthcare Applications of Digital Biomarkers
Integration of multiple small data streams has been used for self-care and in other research fields. For example, one pilot study explored the multifaceted use of such data to help patients manage lower back pain by integrating rehabilitation tutorial videos, frequent symptom self-report using a visual assessment app, remote health coaching via a chat channel, and daily activity tracking via a suite of smartphone apps (Selter et al. 2018). And for patients at risk of heart failure, an in-home toilet seat–based cardiovascular monitoring system of blood pressure, stroke volume, and blood oxygenation has demonstrated accuracy consistent with gold standard measures. The resulting data could reduce the hospitalization rate of such patients through early intervention (Conn et al. 2019).
As techniques are refined and deployed, digital biomarkers will be useful to many stakeholders: (1) researchers, to evaluate alternative treatment regimens; (2) clinicians, to identify disease onset or determine a patient’s readiness for or response to a new form or dose of treatment; (3) family and other informal members of a care team, to support care coordination; and (4) patients themselves, to support treatment adherence and desirable behavior change. But challenges remain:
We have considered how the confluence of novel data types generated by ubiquitous computing and sensing technologies may transform the aging experience. Both big and small data are useful in overcoming inter- and intraindividual variability in late life, the former through a large number of subjects and the latter through individual-specific trajectories. Small data in the form of digital biomarkers provide opportunities to identify aspects of a person’s life that may enable real-time predictions of disease and treatment outcomes. We have also presented examples of the use of digital biomarkers in health care to maintain health and independence.
Much research is needed to ensure the continued development of meaningful and actionable algorithms based on data that are representative of the diversity of the general population. Addressing inherent challenges in working with these data, such as ensuring the validity of the synthesized data, its usability, and its security, is a key part of research and development.
We thank Cameron Fletcher for her superb editing.
Abdullah S, Choudhury T. 2018. Sensing technologies for monitoring serious mental illnesses. IEEE MultiMedia 25:61–75.
Asgari M, Kaye J, Dodge H. 2017. Predicting mild cognitive impairment from spontaneous spoken utterances. Alzheimer’s & Dementia (NY) 3(2):219–228.
Aung M, Matthews M, Choudhury T. 2017. Sensing behavioral symptoms of mental health and delivering personalized interventions using mobile technologies. Depression and Anxiety 34(7):603–609.
Bolinder J, Antuna R, Geelhoed-Duijvestijn P, Kröger J, Weitgasser R. 2016. Novel glucose-sensing technology and hypoglycaemia in type 1 diabetes: A multicentre, non-masked, randomised controlled trial. Lancet 388:2254–2263.
Bot BM, Suver C, Neto EC, Kellen M, Klein A, Bare C, Doerr M, Pratap A, Wilbanks J, Dorsey ER, and 2 others. 2016. The mPower study, Parkinson disease mobile data collected using ResearchKit. Scientific Data 3:160011.
Conn NJ, Schwarz KQ, Borkholder DA. 2019. In-home cardiovascular monitoring system for heart failure: Comparative study. JMIR mHealth and uHealth 7(1):e12419.
Dagum P. 2018. Digital biomarkers of cognitive function. npj Digital Medicine 1:10.
Dishman E. 2019. Toward precision aging: Engineering health and lifespan planning for all of us. The Bridge 49(1):47–56.
Dodge HH, Mattek NC, Austin D, Hayes TL, Kaye JA. 2012. In-home walking speeds and variability trajectories associated with mild cognitive impairment. Neurology 78:1946–1952.
Dodge HH, Zhu J, Mattek NC, Austin D, Kornfeld J, Kaye JA. 2015. Use of high-frequency in-home monitoring data may reduce sample sizes needed in clinical trials. PLoS One 10:e0138095.
Dolgin E. 2014. Technology: Dressed to detect. Nature 511:S16–S17.
Egger H, Dawson G, Hashemi J, Carpenter K, Espinosa S, Campbell K, Brotkin S, Schaich-Borg J, Qiu Q, Tepper M, Baker J. 2018. Automatic emotion and attention analysis of young children at home: A ResearchKit autism feasibility study. npj Digital Medicine 1:20.
Estrin D. 2014. Small data, where n = me. Communications of the ACM 57:32–34.
Estrin D, Juels A. 2016. Reassembling our digital selves. Dædalus 145:43–53.
Ewers M, Walsh C, Trojanowski JQ, Shaw LM, Petersen RC, Jack CR Jr, Feldman HH, Bokde AL, Alexander GE, Scheltens P, and 4 others. 2016. Prediction of conversion from mild cognitive impairment to Alzheimer’s disease dementia based upon biomarkers and neuropsychological test performance. Neurobiology of Aging 33(7):1203–1214.
Feehan L, Geldman J, Sayre E. 2018. Accuracy of Fitbit devices: Systematic review and narrative syntheses of quantitative data. JMIR mHealth and uHealth 6:e10527.
Gawande A. 2017. The heroism of incremental care. The New Yorker, January 23.
Hayes TL, Riley T, Mattek N, Pavel M, Kaye JA. 2014. Sleep habits in mild cognitive impairment. Alzheimer Disease and Associated Disorders 28:145–150.
Insel T. 2017. Digital phenotyping technology for a new science of behavior. JAMA 318:1215–1216.
Kaye JA, Maxwell SA, Mattek N, Hayes TL, Dodge H, Pavel M, Jimison HB, Wild K, Boise L, Zitzelberger TA. 2011. Intelligent systems for assessing aging changes: Home-based, unobtrusive, and continuous assessment of aging. Journals of Gerontology Series B: Psychological Sciences and Social Sciences 66(Suppl 1):i180–i190.
Kaye JA, Mattek N, Dodge H, Buracchio T, Austin D, Hagler S, Pavel M, Hayes TL. 2012. One walk a year to 1000 within a year: Continuous in-home unobtrusive gait assessment of older adults. Gait & Posture 35(2):197–202.
Kaye JA, Mattek N, Dodge HH, Campbell I, Hayes T, Austin D, Hatt W, Wild K, Jimison H, Pavel M. 2014. Unobtrusive measurement of daily computer use to detect mild cognitive impairment. Alzheimer’s & Dementia 10(1):10–17.
Law L, Rol R, Schultz S, Dougherty R, Edwards D, Koscik R, Gallagher C, Carlsson C, Bendlin B, Zetterberg H, and 7 others. 2018. Moderate intensity physical activity associates with CSF biomarkers in a cohort at risk for Alzheimer’s disease. Alzheimer’s & Dementia (Amst) 10:188–195.
Li S, Oertzen T, Lindenberger U. 2006. A neurocomputational model of stochastic resonance and aging. Neurocomputing 69:1553–1560.
Lin M, Gong P, Yang T, Ye J, Albin RL, Dodge HH. 2018. Big data analytical approaches to the NACC dataset: Aiding preclinical trial enrichment. Alzheimer Disease and Associated Disorders 32(1):18–27.
Lyons BE, Austin D, Seelye A, Petersen J, Yeargers J, Riley T, Sharma N, Mattek N, Wild K, Dodge H, Kaye JA. 2015. Pervasive computing technologies to continuously assess Alzheimer’s disease progression and intervention efficacy. Frontiers in Aging Neuroscience 7:102.
MacDonald SW, Nyberg L, Backman L. 2006. Intra-individual variability in behavior: Links to brain structure, neurotransmission and neuronal activity. Trends in Neurosciences 29(8):474–480.
MRC CFAS [Neuropathology Group of the Medical Research Council Cognitive Function and Aging Study]. 2001. Pathological correlates of late-onset dementia in a multi-centre, community-based population in England and Wales. Lancet 357(9251):169–175.
Perrin A. 2015. Social networking usage: 2005–2015. Pew Research Center, October 8.
Pollak J, Adams P, Gay G. 2011. PAM: A photographic affect meter for frequent, in situ measurement of affect. Proceedings of the ACM Conference on Human Factors in Computing Systems, pp 725–734.
Roark B, Mitchell M, Hosom JP, Hollingshead K, Kaye J. 2011. Spoken language derived measures for detecting mild cognitive impairment. IEEE Transactions on Audio, Speech, and Language Processing 19(7):2081–2090.
Ryen W, Tatonetti N, Shah N, Altman R, Horvitz E. 2013. Web-scale pharmacovigilance: Listening to signals from the crowd. Journal of the American Medical Informatics Association 20(3):404–408.
Shebib R, Bailey JF, Smittenaar P, Perez DA, Mecklenburg G, Hunter S. 2019. Randomized controlled trial of a 12-week digital care program in improving low back pain. npj Digital Medicine 2:1.
Seelye A, Hagler S, Mattek N, Howieson DB, Wild K, Dodge HH, Kaye JA. 2015a. Computer mouse movement patterns: A potential marker of mild cognitive impairment. Alzheimer’s & Dementia (Amst) 1:472–480.
Seelye A, Mattek N, Howieson D, Riley T, Wild K, Kaye J. 2015b. The impact of sleep on neuropsychological performance in cognitively intact older adults using a novel in-home sensor-based sleep assessment approach. Clinical Neuropsychologist 29:53–66.
Seelye A, Mattek N, Sharma N, Witter P, Brenner A, Wild K, Dodge H, Kaye J. 2017. Passive assessment of routine driving with unobtrusive sensors: A new approach for identifying and monitoring functional level in normal aging and mild cognitive impairment. Journal of Alzheimer’s Disease 59:1427–1437.
Selter A, Tsangouri C, Ali SB, Freed D, Vatchinsky A, Kizer J, Sahuguet A, Vojta D, Vad V, Pollak JP, Estrin D. 2018. An mHealth app for self-management of chronic lower back pain (Limbr): Pilot study. JMIR mHealth and uHealth 6(9):e179.
Sonnen JA, Santa Cruz K, Hemmy LS, Woltjer R, Leverenz JB, Montine KS, Jack CR, Kaye J, Lim K, Larson EB, and 2 others. 2011. Ecology of the aging human brain. Archives of Neurology 68:1049–1056.
Uddin MB, Chow CM, Su SW. 2018. Classification methods to detect sleep apnea in adults based on respiratory and oximetry signals: A systematic review. Physiological Measurement 39(3):03TR01.