Publication type: Article Summary
Original title: Validação de algoritmos de text mining em contexto de oncologia (Validation of text mining algorithms in an oncology context)
Article publication date: October 2020
Source: Repositório Aberto da Universidade do Porto
Author: Renato Ferreira Magalhães
Supervisors: Mário Amorim Lopes & Lúcio Lara Santos

What is the goal, target audience, and areas of digital health it addresses?
     This study aims to provide structured clinical information that supports decision-making and gives healthcare professionals easier access to clinical data in their daily work. The main area of digital health covered is Artificial Intelligence (AI), with an emphasis on machine learning.

What is the context?
     This study was conducted at a reference oncology hospital that generates large volumes of clinical data from medical procedures, diagnoses, treatments, and test results. Consulting and properly recording each patient’s data is time-consuming. It prolongs consultations and takes time away from health professionals, which ultimately affects the quality of clinical decisions. The COVID-19 pandemic, during which the study was carried out, aggravated these difficulties and made it harder to assign professionals to data annotation tasks.

What are the current approaches?
     Currently, recording clinical information as “free text” offers flexibility but hinders automatic data extraction, which limits the use of analytical models. Most records are unstructured, and non-standard abbreviations add ambiguity that makes them harder to interpret. Manual review is slow and error-prone. Clinical decision support systems rely on structured data, so they cannot make effective use of “free text.” Text mining and natural language processing in healthcare are still at an early stage and are not yet widely applied in clinical settings.

What does the innovation consist of? How is the impact of this study assessed?
     The innovation of this study is the creation of a structured data repository that feeds clinical decision support systems. The repository is built by transforming patient records written in “free text” (clinical diaries) into structured data through text mining and natural language processing.

     The process starts by extracting text from PDF documents. The text is then normalized, which includes correcting sentences and replacing abbreviations with their full terms. Next, text mining algorithms are applied to the normalized text to identify which of three categories (entities) each piece of data refers to: ‘Procedures,’ ‘Disorders,’ or ‘Drugs.’ Natural language processing is used to analyze and interpret the texts.
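
     The normalization step can be illustrated with a short Python sketch. It assumes the text has already been extracted from the PDF. It also uses a small, hypothetical English abbreviation dictionary (`ABBREVIATIONS`), because the study's actual abbreviation list and correction rules are not described here. The function `normalize` is illustrative, not the author's implementation.

```python
import re

# Hypothetical abbreviation map, for illustration only; a real system would use
# a curated, institution-specific list.
ABBREVIATIONS: dict[str, str] = {
    "pt": "patient",
    "hx": "history",
    "bx": "biopsy",
    "chemo": "chemotherapy",
}

# One regex matching any known abbreviation as a whole word, case-insensitively,
# so "chemo" is expanded but "chemotherapy" is left untouched.
_ABBREV_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, ABBREVIATIONS)) + r")\b",
    re.IGNORECASE,
)


def normalize(text: str) -> str:
    """Collapse runs of whitespace, then expand known abbreviations."""
    text = re.sub(r"\s+", " ", text).strip()
    return _ABBREV_RE.sub(lambda m: ABBREVIATIONS[m.group(0).lower()], text)


if __name__ == "__main__":
    print(normalize("Pt  with hx of breast cancer;\nbx done, chemo started."))
```

A plain dictionary lookup cannot resolve abbreviations whose meaning depends on context. This is one reason non-standard abbreviations increase ambiguity, and a production system would need some form of context-aware disambiguation.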

     The study used 2 patient clinical records and compared 2 approaches to training the algorithms. In the semi-supervised model, training was performed by automatically associating terms from the biomedical vocabulary of the UMLS (Unified Medical Language System) with the 3 categories. In the supervised model, training was based on categories identified manually by 2 human annotators (A and B). The study also included clinical validation of the text mining algorithms in the oncology context, to assess the practical relevance and applicability of the developed models. The models were evaluated with performance metrics such as precision and through a qualitative analysis performed by a senior physician.
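
     One way to picture the semi-supervised approach is dictionary-based weak labeling. Vocabulary terms are matched in the text, and the category attached to each term becomes a provisional training label. The Python sketch below is not the study's method. It assumes a tiny hand-made lexicon (`LEXICON`) standing in for UMLS, whose concepts are organized into semantic groups that include Disorders, Procedures, and Chemicals & Drugs. It uses greedy longest-match lookup, and `weak_label` and `Label` are hypothetical names.

```python
import re
from typing import NamedTuple

# Tiny hand-made lexicon standing in for UMLS terms (illustrative entries only).
LEXICON: dict[str, str] = {
    "breast cancer": "Disorders",
    "neutropenia": "Disorders",
    "biopsy": "Procedures",
    "mastectomy": "Procedures",
    "tamoxifen": "Drugs",
    "paclitaxel": "Drugs",
}

# Longest lexicon entry, in words; bounds the phrase lengths we try.
MAX_TERM_TOKENS = max(len(term.split()) for term in LEXICON)


class Label(NamedTuple):
    start: int      # character offset where the entity starts
    end: int        # character offset just past the entity
    text: str       # entity text as it appears in the document
    category: str   # "Procedures", "Disorders" or "Drugs"


def weak_label(text: str) -> list[Label]:
    """Greedy longest-match dictionary lookup over word tokens."""
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    labels: list[Label] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at token i first.
        for n in range(min(MAX_TERM_TOKENS, len(tokens) - i), 0, -1):
            phrase = " ".join(text[s:e] for s, e in tokens[i:i + n]).lower()
            category = LEXICON.get(phrase)
            if category is not None:
                start, end = tokens[i][0], tokens[i + n - 1][1]
                labels.append(Label(start, end, text[start:end], category))
                i += n
                break
        else:
            i += 1  # no lexicon term starts at this token
    return labels


if __name__ == "__main__":
    found = weak_label("Biopsy confirmed breast cancer; started paclitaxel.")
    print([(label.text, label.category) for label in found])
    # → [('Biopsy', 'Procedures'), ('breast cancer', 'Disorders'), ('paclitaxel', 'Drugs')]
```

In the supervised approach, labels of the same kind would come from human annotators instead of from the vocabulary.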

What are the main results? What is the impact of these results? What is the future of this technology?
     The semi-supervised model achieved a precision of 86.6%. The supervised model reached 63.8% precision with annotator A and 65.7% with annotator B. The qualitative analysis showed that both models left some entities unidentified in certain sentences. Although the results are promising, the annotators need more training and the algorithms need to be trained on more validated data.
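
     Precision is the fraction of predicted entities that are correct, TP / (TP + FP). A precision of 86.6% therefore means that about 87 of every 100 entities the model returned were correct. Precision does not count entities the model misses, which is the problem the qualitative analysis found. Recall, TP / (TP + FN), captures those misses. The Python sketch below computes both, assuming exact matching of (start, end, category) tuples against a gold standard. The study's matching criterion is not stated here, and the data are toy values.

```python
# Each entity is (start offset, end offset, category); exact matching is an assumption.
Entity = tuple[int, int, str]


def precision_recall(predicted: set[Entity], gold: set[Entity]) -> tuple[float, float]:
    """Return (precision, recall) for exact-match entity comparison."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy gold standard with three entities; the model finds two and makes no errors.
    gold = {(0, 6, "Procedures"), (17, 30, "Disorders"), (40, 50, "Drugs")}
    predicted = {(0, 6, "Procedures"), (17, 30, "Disorders")}
    p, r = precision_recall(predicted, gold)
    print(f"precision={p:.3f} recall={r:.3f}")  # precision=1.000 recall=0.667
```

The toy case shows how a high precision alone can hide missed entities.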

     These results are significant because they show that tools to support clinical decision-making can be built from free-text records. Structured data will make it possible to create intuitive dashboards that present critical information, such as diagnoses and drugs, and even a detailed timeline of the patient’s clinical history. This organization will give a clear and quick view of the patient’s condition. It will help healthcare professionals make more informed decisions and reduce the time they spend analyzing large volumes of “free text.”
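
     Once entities are stored as structured data, a patient timeline is essentially a grouping by date. The Python sketch below assumes that each extracted entity keeps the date of the clinical diary entry it came from. This is a hypothetical `entry_date` field that the study does not describe. `ClinicalEntity` and `build_timeline` are illustrative names, and the example records are invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ClinicalEntity:
    entry_date: date   # hypothetical: date of the diary entry the entity came from
    text: str
    category: str      # "Procedures", "Disorders" or "Drugs"


def build_timeline(entities: list[ClinicalEntity]) -> list[tuple[date, dict[str, list[str]]]]:
    """Group entities by entry date (oldest first) and, within each date, by category."""
    by_date: dict[date, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for entity in entities:
        by_date[entity.entry_date][entity.category].append(entity.text)
    # Convert inner defaultdicts to plain dicts for a clean, read-only-style result.
    return [(day, dict(by_date[day])) for day in sorted(by_date)]


if __name__ == "__main__":
    # Invented example records, for illustration only.
    sample = [
        ClinicalEntity(date(2020, 3, 2), "paclitaxel", "Drugs"),
        ClinicalEntity(date(2020, 1, 15), "biopsy", "Procedures"),
        ClinicalEntity(date(2020, 1, 15), "breast cancer", "Disorders"),
    ]
    for day, groups in build_timeline(sample):
        print(day.isoformat(), groups)
```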

     In the future, the tool could be extended with AI to compare treatments and predict outcomes based on data from similar patients. It could cross-reference this information against international guidelines and scientific evidence. This would enable more personalized treatments and better use of resources. By reducing redundancies, such as repeated tests, and improving the efficiency of clinical processes, the technology could directly improve the quality of health care and reduce costs.

Do you have an innovative idea in the healthcare field?

Share it with us and see it come to life.
We will help bring your projects to life!

Newsletter

Receive the latest updates from the InovarSaúde portal.

Support

República Portuguesa logo
SNS logo
SPMS logo

Follow Us

Co-funded by

PRR logo
República Portuguesa logo
European Union logo
