THE FUTURE OF DIAGNOSTICS: SPEECH AND AI
Publication type: Article Summary
Original title: Speech as biomarker for multidisease screening
Article publication date: November 2024
Source: Repositório Institucional do Instituto Superior Técnico (Scholar)
Author: Catarina Botelho
Supervisors: Isabel Trancoso, Alberto Abad & Tanja Schultz
What is the goal, target audience, and areas of digital health it addresses?
This study aims to explore and validate the use of speech as a non-invasive and low-cost digital biomarker for remote screening of multiple diseases, particularly those affecting the respiratory, nervous, and muscular systems. It is directed at the medical community, as well as researchers and professionals in the fields of Artificial Intelligence (AI) and signal processing. Within the field of digital health, the study contributes to key areas such as remote telemonitoring and surveillance, digital biomarkers, AI-powered diagnostics, predictive and personalized health, and silent computational paralinguistics.
What is the context?
Speech is a biomarker that reflects, in a sensitive way, the integrated functioning of several physiological systems, namely the nervous, respiratory, and muscular systems. This complexity makes it a promising resource for detecting changes associated with health status. Although speech is not in itself a digital biomarker, it can acquire this status when it is captured, digitized and analyzed by computational methods, namely those supported by AI. In these circumstances, it becomes possible to extract relevant vocal patterns for screening, early diagnosis and monitoring of different clinical conditions.
From the formulation of communicative intention in the cortical areas of the brain, including Broca’s area (associated with motor control of speech) and Wernicke’s area (related to language comprehension), to the final sound emission, the speech process requires precise motor control, integrated cognitive function and continuous regulation by auditory and proprioceptive feedback mechanisms. Hearing regulates characteristics such as intonation, volume, and articulation, while proprioception ensures the muscle coordination necessary to produce clear and fluent speech. Any dysfunction along this circuit, whether it results from neurodegenerative disease, respiratory disorder, psychiatric condition, or changes associated with aging, can lead to anomalous and detectable acoustic patterns.
Several conditions are associated with characteristic vocal profiles, showing changes in intensity, pronunciation, rhythm, articulation or linguistic content. These include obstructive sleep apnea, in which recurrent obstruction of the upper airways compromises vocal quality; Alzheimer’s disease, which affects language coherence, resulting in shorter and less precise sentences, reduced vocabulary, and more frequent pauses; Parkinson’s disease, which affects motor control and causes weak, monotonous speech and imprecise articulation; and psychiatric disorders such as depression. Aging, although not a disease, can also induce changes in speech, such as reduced control of tone and vocal strength, which can mimic changes associated with certain diseases, making differential diagnosis difficult.
What are the current approaches?
Traditional medical procedures used for the diagnosis of these diseases are often neither scalable nor easily accessible, especially for early or large-scale screenings. For example, the go-to method for diagnosing obstructive sleep apnea is polysomnography, an overnight sleep study conducted in a clinic that monitors breathing, heart rate, brain activity, and body movements. However, this method is expensive, time-consuming, and uncomfortable for patients. In the case of Alzheimer’s disease and depression, the diagnosis remains largely subjective: Alzheimer’s symptoms are often mistaken for aging, and the diagnosis of depression depends on self-reports and clinical judgment, which leads to great variability and sometimes delays in recognizing the disease.
Speech-based detection therefore represents a promising, non-invasive, and potentially more affordable alternative. However, most current AI models face several limitations: many are designed to detect only one condition at a time, and they use complex “black-box” algorithms (systems trained on small, homogeneous datasets whose decision-making process is difficult to interpret), which limits their clinical adoption. Other obstacles include the scarcity of diverse speech datasets, ethical and legal concerns related to privacy, the difficulty of generalizing models to different languages, speech tasks, or acoustic environments, and the tendency to pick up irrelevant patterns, such as background noise, rather than signals associated with actual symptoms. These challenges point to the need to develop more robust, interpretable, and reliable AI approaches with greater applicability in real-world clinical settings.
At the same time, other approaches explore the analysis of non-verbal signals produced during speech, such as the muscle activity of the face and neck. This field, known as silent computational paralinguistics, focuses on the study of aspects such as pauses, facial expressions, and related physiological cues. Muscle activity is usually measured using surface electromyography (EMG), which uses small sensors placed on the face and neck to record electrical signals of the muscles. These signals can be used to reconstruct speech in individuals who are unable to speak. However, EMG remains an invasive, expensive technique limited to laboratory settings, which restricts its large-scale applicability.
What does the innovation consist of? How is the impact of this study assessed?
This study explored new non-invasive approaches to detecting diseases through speech. The central innovation consisted of the development of a system that, from a voice recording made with a microphone, generates artificial signals that replicate muscle activity during speech production. For this, parallel voice recordings and EMG signals from 8 individuals were used. Initially, acoustic characteristics, such as pitch and rhythm, were extracted from the recordings and fed to hourglass-shaped neural networks to recreate simplified muscle signals, which were later compared with real EMG signals. In a second step, the simplified real EMG signals were processed by convolutional neural networks and bidirectional long short-term memory networks, allowing the generation of artificial EMG signals, which were also compared with real EMG to assess accuracy.
In the analysis of obstructive sleep apnea, 40 YouTube videos were used, from which three modalities were extracted: voice recordings, facial images, and lip movements. These were processed by convolutional neural networks to identify patterns associated with the disease. The voice recordings were analyzed for pitch and harshness, using noise filtering techniques and identifying unique vocal patterns. Facial images were evaluated based on shape and texture, while the lip movements allowed articulation to be analyzed. The three modalities were integrated through two strategies: early fusion, with joint analysis from the beginning, and late fusion, with individual analysis followed by combination of the predictions. The performance of the models was evaluated for their ability to distinguish patients with obstructive sleep apnea from healthy individuals.
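The difference between the two fusion strategies can be illustrated with a minimal Python sketch. It is not the study's implementation: the function names, the feature values, and the choice of averaging probabilities with a 0.5 threshold for late fusion are assumptions made for illustration, since the summary does not state how the predictions were combined.

```python
from statistics import mean


def early_fusion(voice, face, lips):
    """Early fusion: concatenate per-modality features into one joint input.

    A single model would then be trained on this joint vector.
    """
    return list(voice) + list(face) + list(lips)


def late_fusion(probabilities, threshold=0.5):
    """Late fusion: each modality is scored separately, then scores are combined.

    Averaging and the 0.5 threshold are illustrative assumptions.
    """
    fused = mean(probabilities.values())
    return fused, fused >= threshold


# Hypothetical per-modality apnea probabilities for one speaker.
scores = {"voice": 0.40, "face": 0.70, "lips": 0.85}
fused_score, is_positive = late_fusion(scores)

# Hypothetical feature vectors for the same speaker.
joint_features = early_fusion([0.1, 0.2], [0.3], [0.4, 0.5])
```

In this toy case the voice model alone would vote against apnea, but the fused score is pulled above the threshold by the two visual modalities, which is the kind of compensation late fusion is meant to provide when one modality is noisy.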
In the case of Alzheimer’s disease, two datasets were used: the Interdisciplinary Longitudinal Study on Adult Development and Aging (ILSE), which consists of long interviews in German, and Alzheimer’s Dementia Recognition through Spontaneous Speech (ADReSS). Models such as Gaussian Mixture Models, Linear Discriminant Analysis, and Support Vector Machines were applied to analyze linguistic features (such as lexical richness, grammatical structure, and pauses) and acoustic features (such as pitch, rhythm, and voice quality). The study tested which features and models performed best when evaluated on the same dataset they were trained on, and how models trained with German data performed when applied to English data and vice versa, to distinguish between Alzheimer’s patients and healthy individuals.
Additionally, reference parameters for healthy speech were defined based on the Crowdsourced Language Assessment Corpus dataset, composed of recordings of individuals describing images and producing vowel sounds. From these recordings, representative vocal features were extracted, such as tone, speech rate, and lexical diversity. Then, machine learning algorithms, including Support Vector Machines, Logistic Regression, and Neural Additive Models, were trained to classify individuals as healthy or diseased based on deviations from these parameters. The models were then evaluated with the ADReSS (Alzheimer’s) set and with the Spanish Parkinson’s dataset (PC-GITA), in which participants produced sustained vowel sounds, to test their ability to detect speech alterations associated with these diseases.
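The idea of measuring deviations from healthy reference parameters can be sketched as follows. This is a simplified illustration, not the study's pipeline: it assumes the deviation is expressed as a z-score per feature, and the feature names and values are hypothetical.

```python
from statistics import mean, stdev


def fit_reference(healthy_samples):
    """Compute {feature: (mean, std)} from a list of healthy-speaker feature dicts."""
    reference = {}
    for name in healthy_samples[0]:
        values = [sample[name] for sample in healthy_samples]
        reference[name] = (mean(values), stdev(values))
    return reference


def deviations(sample, reference):
    """Express each feature as a z-score relative to the healthy reference."""
    return {
        name: (sample[name] - mu) / sd
        for name, (mu, sd) in reference.items()
    }


# Hypothetical healthy speakers (illustrative values only).
healthy = [
    {"speech_rate": 4.0, "pitch_hz": 190.0},
    {"speech_rate": 5.0, "pitch_hz": 200.0},
    {"speech_rate": 6.0, "pitch_hz": 210.0},
]
reference = fit_reference(healthy)

# A new speaker whose speech is slower than the healthy reference.
z = deviations({"speech_rate": 3.0, "pitch_hz": 205.0}, reference)
```

The resulting deviation vector (here a speech rate two standard deviations below the healthy mean) is the kind of input a downstream classifier could use, and because each value refers to a named feature it remains easy to inspect.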
Finally, Large Language Models (LLMs), including GPT-4-Turbo, Llama-2-13B, Mistral-7B, and Mixtral-8x7B, were tested to assess their ability to identify Alzheimer’s disease through the analysis of textual transcripts of speech. Based on ADReSS, two approaches were explored: a direct one, in which the models were asked about the speaker’s condition (with or without examples of patient speech), and another in which the models first evaluated linguistic features such as textual coherence, lexical diversity, sentence length, and lexical retrieval difficulty. The LLMs’ predictions were combined with machine learning models such as Support Vector Machines, Linear Discriminant Analysis, One-Nearest Neighbour, Decision Trees, and Random Forests. Speech rate (syllables per second) and explicit annotations of pauses (short, medium, long) were also considered in the accuracy analysis. The models were evaluated for their ability to distinguish between Alzheimer’s patients and healthy individuals, with the LLMs providing a step-by-step explanation, a YES/NO prediction, and a confidence level.
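As a rough illustration of the two extra cues mentioned above, the sketch below computes speech rate in syllables per second and inserts pause labels into a transcript. The duration thresholds, the label format, and the input structure are hypothetical; the summary does not specify how the study defined short, medium, and long pauses.

```python
def speech_rate(n_syllables, duration_s):
    """Speech rate in syllables per second."""
    return n_syllables / duration_s


def annotate_pause(duration_s, short_max=0.5, medium_max=2.0):
    """Map a pause duration to a label (thresholds are assumed, not from the study)."""
    if duration_s < short_max:
        return "short"
    if duration_s < medium_max:
        return "medium"
    return "long"


def annotate_transcript(tokens, min_pause=0.2):
    """Insert pause labels into a transcript.

    `tokens` is a list of (word, pause_after_in_seconds) pairs; silences
    shorter than `min_pause` are ignored.
    """
    out = []
    for word, pause_after in tokens:
        out.append(word)
        if pause_after >= min_pause:
            out.append(f"[{annotate_pause(pause_after)} pause]")
    return " ".join(out)


# Hypothetical word timings for a picture-description fragment.
tokens = [("the", 0.0), ("boy", 0.3), ("is", 1.0), ("reaching", 2.5)]
annotated = annotate_transcript(tokens)
rate = speech_rate(n_syllables=12, duration_s=4.0)
```

A transcript annotated in this way could be passed to a text-only model so that timing information, which is otherwise lost in transcription, remains visible.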
What are the main results? What is the future of this approach?
The results obtained demonstrated that the AI models developed can predict simplified muscle signals with an accuracy of approximately 75% when trained and tested with data from the same person and recording. In more challenging scenarios, such as generalization to different recording sessions or different individuals, the accuracies obtained were 57% and 46%, respectively. The artificial EMG signals achieved a reasonable match with the real EMG signals, with an average similarity of 66.3%.
For the detection of obstructive sleep apnea, lip movements were the single modality with the best performance (80%), followed by facial images (77.5%) and voice recordings (67.5%). The late fusion of the three modalities, which combines the predictions after the individual analysis, obtained the highest overall accuracy (82.5%), highlighting the benefit of multimodal integration, especially in contexts with noisy or incomplete data.
Regarding Alzheimer’s disease, the Support Vector Machine models outperformed the other classifiers in the ADReSS and ILSE datasets. In ADReSS, linguistic features were more informative (77.1% accuracy) than acoustic features (66.7%). In ILSE, both feature types showed high results, with 86% accuracy for acoustic features and 83.8% for linguistic features. However, when the models trained in German were tested in English, and vice versa, there was a marked loss of performance, with accuracies dropping to values close to chance, underlining the difficulty of transferring AI models between different languages and different recording conditions.
The analysis of healthy speech established benchmarks for the detection of deviations, allowing the models to identify subtle changes in Parkinson’s and Alzheimer’s patients, such as altered pitch, slower speech, and reduced vocabulary. In the case of Parkinson’s disease, the Neural Additive Model correctly identified patients in 75% of cases during training and in 69% of cases with new data. For Alzheimer’s disease, performance was even better, with an accuracy of 84% in training and 75% in new data. Although the Neural Additive Model was slightly less accurate than Logistic Regression and Support Vector Machines in detecting Parkinson’s, it outperformed both in detecting Alzheimer’s and offered the added advantage of interpretability by showing how each speech feature contributed to its predictions.
The study also revealed that GPT-4-Turbo was the best-performing LLM, with 77% accuracy in detecting Alzheimer’s disease in data not used during training. Approaches based on the classification of linguistic characteristics, such as textual coherence and lexical diversity, outperformed the strategy of asking LLMs directly to predict the diagnosis. The inclusion of speech rate slightly improved detection, while pause annotations did not bring significant gains. Among the classifiers, the Support Vector Machines achieved the highest accuracy, with 81.3%.
Overall, the study demonstrated the potential of speech as a remote and scalable biomarker for screening for multiple diseases. Integrating multimodal data, such as facial images, lip movements, and artificial EMG signals, with interpretable machine learning models strengthens the ability to detect changes related to neurological, respiratory, and psychiatric diseases. The results highlighted the benefits of late fusion in the management of noisy real-world data and revealed practical challenges, such as variations in language, context, and recording devices (e.g., differences between recordings made with a home phone vs clinical microphones). These findings underline the importance of establishing normative parameters of healthy speech to accurately detect subtle deviations indicative of pathology.
Future work should focus on the development of large-scale and diverse datasets to improve the generalization of models across diseases, languages, contexts (including uncontrolled environments), and recording conditions. The priority will be to integrate additional non-invasive biosignals, such as cough, and adopt multimodal approaches to capture overlapping effects of comorbidities. Advances in machine learning could also increase interpretability and reliability. Collaboration with clinicians and speech therapists will allow multicenter studies to be conducted, ensuring relevance and clinical applicability. User-centric mobile prototypes, in compliance with the General Data Protection Regulation (GDPR) and medical regulations, will facilitate integration into clinical workflows. The goal is to enable this technology for mobile and telehealth platforms, allowing continuous and passive monitoring for preventive and personalized healthcare while safeguarding ethics and privacy.
Home / Publications / Publication

THE FUTURE OF DIAGNOSTICS: SPEECH AND AI
Publication type: Article Summary
Original title: Speech as biomarker for multidisease screening
Article publication date: November 2024
Source: Repositório Institucional do Instituto Superior Técnico (Scholar)
Author: Catarina Botelho
Supervisors: Isabel Trancoso, Alberto Abad & Tanja Schultz
What is the goal, target audience, and areas of digital health it addresses?
This study aims to explore and validate the use of speech as a non-invasive and low-cost digital biomarker for remote screening of multiple diseases, particularly those affecting the respiratory, neuvosus, and muscular systems. It is directed at the medical community, as well as researchers and professionals in the fields of Artificial Intelligence (AI) and signal processing. Within the field of digital health, the study contributes to key areas such as remote telemonitoring and surveillance, digital biomarkers, AI-powered diagnostics, predictive and personalized health, and silent computational paralinguistics.
What is the context?
Speech is a biomarker that reflects, in a sensitive way, the integrated functioning of several physiological systems, namely the nervous, respiratory, and muscular systems. This complexity makes it a promising resource for detecting changes associated with health status. Although speech is not in itself a digital biomarker, it can acquire this status when it is captured, digitized and analyzed by computational methods, namely those supported by AI. In these circumstances, it becomes possible to extract relevant vocal patterns for screening, early diagnosis and monitoring of different clinical conditions.
From the formulation of communicative intention in the cortical areas of the brain — including Broca’s area (associated with motor control of speech) and Wernicke’s area (related to language comprehension) to the final sound emission, the speech process requires precise motor control, integrated cognitive function and continuous regulation by auditory and proprioceptive feedback mechanisms. Hearing regulates characteristics such as intonation, volume, and articulation, while proprioception ensures the muscle coordination necessary to produce clear and fluent speech. Any dysfunction along this circuit — resulting from neurodegenerative disease, respiratory disorder, psychiatric condition, or changes associated with aging — can lead to anomalous and detectable acoustic patterns.
Conditions such as obstructive sleep apnea — in which recurrent obstruction of the upper airways compromises vocal quality —, Alzheimer’s disease — which affects language coherence, resulting in shorter and less precise sentences, reduced vocabulary, and more frequent pauses —, Parkinson’s disease — which affects motor control and causes weak, monotonous speech and imprecise articulation — or psychiatric disorders such as depression, are associated with characteristic vocal profiles, showing changes in intensity, pronunciation, rhythm, articulation or linguistic content. Aging, although not a disease, can also induce changes in speech, such as reduced control of tone and vocal strength, which can mimic changes associated with certain diseases, making differential diagnosis difficult.
What are the current approaches?
Traditional medical procedures used for the diagnosis of these diseases are often poorly scalable and accessible, especially for early or large-scale screenings. For example, the go-to method for diagnosing obstructive sleep apnea is polysomnography — an overnight sleep study conducted in a clinic that monitors breathing, heart rate, brain activity, and body movements. However, this method is expensive, time-consuming, and uncomfortable for patients. In the case of Alzheimer’s disease and depression, the diagnosis remains largely subjective: Alzheimer’s symptoms are often mistaken for aging, and depression depends on self-reports and clinical judgment, which leads to great variability and sometimes delays in recognizing the disease.
Speech-based detection therefore represents a promising, non-invasive, and potentially more affordable alternative. However, most current AI models face several limitations: many are designed to detect only one condition at a time, are trained on small, homogeneous datasets, and rely on complex “black-box” algorithms whose decision-making process is difficult to interpret, which limits their clinical adoption. Other obstacles include the scarcity of diverse speech datasets, ethical and legal concerns related to privacy, the difficulty of generalizing models to different languages, speech tasks, or acoustic environments, and the tendency to pick up irrelevant patterns, such as background noise, rather than signals associated with actual symptoms. These challenges point to the need to develop more robust, interpretable, and reliable AI approaches with greater applicability in real-world clinical settings.
At the same time, other approaches explore the analysis of non-verbal signals produced during speech, such as the muscle activity of the face and neck. This field, known as silent computational paralinguistics, focuses on the study of aspects such as pauses, facial expressions, and related physiological cues. Muscle activity is usually measured using surface electromyography (EMG), which uses small sensors placed on the face and neck to record the electrical signals of the muscles. These signals can be used to reconstruct speech in individuals who are unable to speak. However, EMG remains an invasive, expensive technique limited to laboratory settings, which restricts its large-scale applicability.
What does innovation consist of? How is the impact of this study assessed?
This study explored new non-invasive approaches to detecting diseases through speech. The central innovation was the development of a system that, from a voice recording made with a microphone, generates artificial signals that replicate muscle activity during speech production. For this purpose, parallel voice recordings and EMG signals from 8 individuals were used. Initially, acoustic characteristics (such as pitch and rhythm) were extracted from the recordings and passed to hourglass-shaped neural networks to recreate simplified muscle signals, which were then compared with real EMG signals. In a second step, the simplified EMG signals were processed by convolutional neural networks and bidirectional long short-term memory networks, allowing the generation of artificial EMG signals, which were also compared with real EMG to assess accuracy.
In the analysis of obstructive sleep apnea, 40 YouTube videos were used, from which three modalities were extracted: voice recordings, facial images, and lip movements. These were processed by convolutional neural networks to identify patterns associated with the disease. The voice recordings were analyzed for pitch and harshness, using noise filtering techniques to identify distinctive vocal patterns. Facial images were evaluated based on shape and texture, while the lip movements allowed articulation to be analyzed. The three modalities were integrated through two strategies: early fusion, with joint analysis from the beginning, and late fusion, with individual analysis followed by combination of the predictions. The performance of the models was evaluated for their ability to distinguish patients with obstructive sleep apnea from healthy individuals.
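The difference between the two fusion strategies can be illustrated with a minimal Python sketch. It is not the study's implementation: the function names, the toy feature values, and the equal-weight averaging rule are assumptions made purely for illustration.

```python
from typing import Dict, List, Optional


def early_fusion(features: Dict[str, List[float]]) -> List[float]:
    """Concatenate per-modality feature vectors into one joint vector.

    A single classifier would then be trained on this joint vector.
    """
    joint: List[float] = []
    for modality in sorted(features):  # fixed order keeps the layout stable
        joint.extend(features[modality])
    return joint


def late_fusion(probabilities: Dict[str, float],
                weights: Optional[Dict[str, float]] = None) -> float:
    """Combine per-modality probabilities with a weighted average.

    Each modality is assumed to have its own classifier that outputs the
    probability of the positive class (here: obstructive sleep apnea).
    """
    if not probabilities:
        raise ValueError("at least one modality is required")
    if weights is None:
        weights = {name: 1.0 for name in probabilities}
    total = sum(weights[name] for name in probabilities)
    return sum(probabilities[name] * weights[name]
               for name in probabilities) / total


# Toy values, invented for illustration only.
toy_features = {"voice": [0.2, 0.7], "face": [0.5], "lips": [0.9, 0.1]}
toy_probabilities = {"voice": 0.40, "face": 0.70, "lips": 0.85}

joint_vector = early_fusion(toy_features)
fused_probability = late_fusion(toy_probabilities)
predicted_positive = fused_probability >= 0.5  # assumed decision threshold
```

In this sketch, a missing or unusable modality can simply be left out of the dictionary passed to `late_fusion`, whereas `early_fusion` needs every modality to build the joint vector.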
In the case of Alzheimer’s disease, two datasets were used: the Interdisciplinary Longitudinal Study on Adult Development and Aging (ILSE, long interviews in German) and Alzheimer’s Dementia Recognition through Spontaneous Speech (ADReSS, in English). Models such as Gaussian Mixture Models, Linear Discriminant Analysis, and Support Vector Machines were applied to analyze linguistic features (such as lexical richness, grammatical structure, and pauses) and acoustic features (such as pitch, rhythm, and voice quality). The study tested which features and models performed best when evaluated on the same dataset they were trained on, and how models trained with German data performed when applied to English data and vice versa, in each case to distinguish between Alzheimer’s patients and healthy individuals.
Additionally, reference parameters for healthy speech were defined based on the Crowdsourced Language Assessment Corpus dataset, composed of recordings of individuals describing images and producing vowel sounds. From these recordings, representative vocal features were extracted, such as tone, speech rate, and lexical diversity. Then, machine learning algorithms, including Support Vector Machines, Logistic Regression, and Neural Additive Models, were trained to classify individuals as healthy or diseased based on deviations from these parameters. The models were then evaluated with the ADReSS (Alzheimer’s) set and with the Spanish Parkinson’s dataset (PC-GITA), in which participants produced sustained vowel sounds, to test their ability to detect speech alterations associated with these diseases.
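The idea of measuring deviations from healthy reference parameters can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature names, the toy numbers, the use of z-scores, and the threshold of 2 are all hypothetical, and the summary does not specify how the study computed its deviations.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def fit_reference(healthy: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Estimate the mean and sample standard deviation of each feature
    over a group of healthy speakers."""
    return {name: (mean(values), stdev(values))
            for name, values in healthy.items()}


def deviations(speaker: Dict[str, float],
               reference: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Express each feature of one speaker as a z-score relative to the
    healthy reference."""
    return {name: (value - reference[name][0]) / reference[name][1]
            for name, value in speaker.items()}


# Toy data, invented for illustration only.
healthy_speakers = {
    "speech_rate_syll_per_s": [4.0, 4.5, 5.0, 4.5, 4.5],
    "lexical_diversity": [0.60, 0.65, 0.70, 0.65, 0.65],
}
new_speaker = {"speech_rate_syll_per_s": 3.0, "lexical_diversity": 0.65}

reference = fit_reference(healthy_speakers)
z_scores = deviations(new_speaker, reference)

# Assumed rule: flag features more than 2 standard deviations from the norm.
flagged = [name for name, z in z_scores.items() if abs(z) > 2.0]
```

In a pipeline like the one described, such deviation values would be the inputs to a classifier rather than a final decision in themselves.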
Finally, Large Language Models (LLMs), including GPT-4-Turbo, Llama-2-13B, Mistral-7B, and Mixtral-8x7B, were tested to assess their ability to identify Alzheimer’s disease through the analysis of textual transcripts of speech. Based on ADReSS, two approaches were explored: a direct one, questioning the models about the speaker’s condition (with or without examples of patient speech), and another based on a prior evaluation of linguistic features such as textual coherence, lexical diversity, sentence length, and lexical retrieval difficulty. The LLMs’ predictions were combined with machine learning models such as Support Vector Machines, Linear Discriminant Analysis, One-Nearest Neighbour, Decision Trees, and Random Forests. Speech rate (syllables per second) and explicit annotations of pauses (short, medium, long) were also considered in the accuracy analysis. The models were evaluated for their ability to distinguish between Alzheimer’s patients and healthy individuals, with the LLMs providing a step-by-step explanation, a YES/NO prediction, and a confidence level.
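As a rough illustration of the direct approach, the sketch below builds a prompt and parses a reply into the three elements mentioned above (explanation, YES/NO prediction, confidence). The prompt wording, the `PREDICTION:` and `CONFIDENCE:` markers, and the example reply are hypothetical and are not taken from the study. No real model is called.

```python
import re
from typing import NamedTuple, Optional


class Verdict(NamedTuple):
    explanation: str
    prediction: bool             # True means the model answered YES
    confidence: Optional[float]  # scaled to the range 0..1, if present


def build_prompt(transcript: str) -> str:
    """Assemble a hypothetical zero-shot prompt for one transcript."""
    return (
        "Read the following speech transcript and decide whether the "
        "speaker shows signs of Alzheimer's disease.\n"
        "Explain your reasoning step by step, then finish with two lines:\n"
        "PREDICTION: YES or NO\n"
        "CONFIDENCE: a number between 0 and 100\n\n"
        f"Transcript: {transcript}"
    )


def parse_reply(reply: str) -> Verdict:
    """Extract explanation, prediction and confidence from a model reply
    that follows the assumed output format."""
    prediction = re.search(r"PREDICTION:\s*(YES|NO)\b", reply, re.IGNORECASE)
    if prediction is None:
        raise ValueError("reply contains no YES/NO prediction")
    confidence = re.search(r"CONFIDENCE:\s*(\d+(?:\.\d+)?)", reply,
                           re.IGNORECASE)
    return Verdict(
        explanation=reply[:prediction.start()].strip(),
        prediction=prediction.group(1).upper() == "YES",
        confidence=float(confidence.group(1)) / 100 if confidence else None,
    )


# Invented reply, used only to exercise the parser.
example_reply = (
    "The speaker pauses often and repeats words.\n"
    "PREDICTION: YES\n"
    "CONFIDENCE: 70"
)
verdict = parse_reply(example_reply)
```

Parsed fields of this kind could then serve as input features for a conventional classifier, which is one plausible way to combine LLM outputs with machine learning models.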
What are the main results? What is the future of this approach?
The results obtained demonstrated that the AI models developed can predict simplified muscle signals with an accuracy of approximately 75% when trained and tested with data from the same person and recording. In more challenging scenarios, such as generalization to different recording sessions or different individuals, the accuracies obtained were 57% and 46%, respectively. The artificial EMG signals achieved a reasonable match with the real EMG signals, with an average similarity of 66.3%.
For the detection of obstructive sleep apnea, lip movements were the single modality with the best performance (80%), followed by facial images (77.5%) and voice recordings (67.5%). The late fusion of the three modalities, which combines the predictions after the individual analysis, obtained the highest overall accuracy (82.5%), highlighting the benefit of multimodal integration, especially in contexts with noisy or incomplete data.
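To put these percentages in perspective, the short calculation below converts them into counts of correctly classified videos. It assumes that each accuracy was computed over all 40 videos (for example through cross-validation), which this summary does not state, so the counts are indicative only.

```python
from fractions import Fraction

N_VIDEOS = 40  # number of YouTube videos stated in the text

# Accuracies in percent, as reported in the text. They are kept as strings
# so that Fraction parses them exactly instead of via binary floats.
reported_accuracy = {
    "lip movements": "80",
    "facial images": "77.5",
    "voice recordings": "67.5",
    "late fusion": "82.5",
}

# Number of correctly classified videos under the stated assumption.
correct_counts = {
    name: Fraction(percent) / 100 * N_VIDEOS
    for name, percent in reported_accuracy.items()
}

for name, count in correct_counts.items():
    print(f"{name}: {count} of {N_VIDEOS} videos")
```

Under that assumption the counts are 32, 31, 27 and 33 videos respectively, so the gain of late fusion over the best single modality would correspond to one additional video, a reminder of how small the dataset is.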
Regarding Alzheimer’s disease, the Support Vector Machine models outperformed the other classifiers on the ADReSS and ILSE datasets. In ADReSS, linguistic features were more informative (77.1% accuracy) than acoustic features (66.7%). In ILSE, both feature types showed high results, with 86% accuracy for acoustic features and 83.8% for linguistic features. However, when the models trained in German were tested in English, and vice versa, a marked loss of performance was observed, with accuracies dropping to values close to chance, underlining the difficulty of transferring AI models between different languages and different recording conditions.
The analysis of healthy speech established benchmarks for the detection of deviations, allowing the models to identify subtle changes in Parkinson’s and Alzheimer’s patients, such as altered pitch, slower speech, and reduced vocabulary. In the case of Parkinson’s disease, the Neural Additive Model correctly identified patients in 75% of cases during training and in 69% of cases with new data. For Alzheimer’s disease, performance was even better, with an accuracy of 84% in training and 75% in new data. Although the Neural Additive Model was slightly less accurate than Logistic Regression and Support Vector Machines in detecting Parkinson’s, it outperformed both in detecting Alzheimer’s and offered the added advantage of interpretability by showing how each speech feature contributed to its predictions.
The study also revealed that GPT-4-Turbo was the best-performing LLM, with 77% accuracy in detecting Alzheimer’s disease in data not used during training. Approaches based on the classification of linguistic characteristics, such as textual coherence and lexical diversity, outperformed the strategy of asking LLMs directly to predict the diagnosis. The inclusion of speech rate slightly improved detection, while pause annotations did not bring significant gains. Among the classifiers, the Support Vector Machines achieved the highest accuracy, with 81.3%.
Overall, the study demonstrated the potential of speech as a remote and scalable biomarker for screening for multiple diseases. Integrating multimodal data (such as facial images, lip movements, and artificial EMG signals) with interpretable machine learning models strengthens the ability to detect changes related to neurological, respiratory, and psychiatric diseases. The results highlighted the benefits of late fusion in the management of noisy real-world data and revealed practical challenges, such as variations in language, context, and recording devices (e.g., differences between recordings made with a home phone and those made with clinical microphones). These findings underline the importance of establishing normative parameters of healthy speech to accurately detect subtle deviations indicative of pathology.
Future work should focus on the development of large-scale and diverse datasets to improve the generalization of models across diseases, languages, contexts (including uncontrolled environments), and recording conditions. A priority will be to integrate additional non-invasive biosignals, such as cough, and to adopt multimodal approaches that capture the overlapping effects of comorbidities. Advances in machine learning could also increase interpretability and reliability. Collaboration with clinicians and speech therapists will allow multicenter studies to be conducted, ensuring relevance and clinical applicability. User-centric mobile prototypes, compliant with the General Data Protection Regulation (GDPR) and with medical regulations, will facilitate integration into clinical workflows. The goal is to bring this technology to mobile and telehealth platforms, allowing continuous and passive monitoring for preventive and personalized healthcare while safeguarding ethics and privacy.