2
Model Validation Techniques for AI in Cancer Research Based on Physiotherapy and Oncology

The application of artificial intelligence (AI) in cancer research has transformed the landscape concerning the diagnosis, prognosis and personalized therapy of cancer patients. AI analytics have recently emerged in the fields of oncology and physiotherapy to augment clinical decision-making, improve rehabilitation schedule designs and advance patient supervision. However, the clinical efficacy and precision of these models are highly reliant on dependable validation techniques. To guarantee adequate recovery and improvement in a patient’s quality of life, validated AI systems used in real-world clinical settings need to be generalizable to multiple patient cohorts, particularly when applied for guiding physiotherapy interventions. This chapter presents a thorough review of model validation methodologies relating to AI in cancer research, particularly focusing on physiotherapy and oncology. We classify validation techniques into internal approaches (e.g. split-sample, k-fold cross-validation, bootstrapping, etc.) and external ones (e.g. geographic, temporal and demographic validation) and discuss their advantages and drawbacks for clinical research.

Furthermore, we apply advanced analytic methods for evaluating efficacy within clinical workflows, such as prospective real-world validation, decision curve analysis and calibration, to measure and strengthen the value of AI models. Concerns specific to physiotherapy are also considered in this chapter, such as the small size of datasets, the changing course of recovery and data diversity.

Finally, the chapter stresses the need for trustworthy AI with respect to ethics, governance and impact on patients, including patient advocacy. AI systems purposely designed around ethical principles will be fundamental to maintaining safety and equity and to fostering optimal health outcomes as they further advance cancer treatment.

2.1. Introduction

The use of artificial intelligence (AI) in cancer care is transforming how decisions are made in diagnosis, treatment planning and rehabilitation. Models in both physiotherapy and oncology are increasingly sophisticated and AI-driven. They can interpret data from electronic health records, medical imaging, genomics, sensors and patient-provided sources (Birla et al. 2025). These AI models can assist in the early detection of various types of cancer, in anticipating how each patient will respond to treatment, in planning clinical actions and in tailoring rehabilitation to the requirements of each individual patient.

In physiotherapy, specifically in oncological contexts, AI technologies are used to monitor motor function, fatigue levels and pain and to optimize recovery pathways (Vaniya et al. 2024). AI-powered platforms can analyze data collected from wearable sensors, video footage and gait-tracking systems to give feedback on rehabilitation progress so that therapeutic exercises can be modified in real time. These tailored approaches are essential for cancer survivors who experience long-term functional limitations due to treatments including chemotherapy, radiotherapy and surgery (Lippi et al. 2024).

However promising AI may be, its effective application in clinical oncology and physiotherapy depends on a rigorous and transparent model validation process. Model validation assesses whether a model’s estimated outcomes and predictions are accurate, reproducible and generalizable to cases outside the sample used to build the model, including new, unseen patients (Van Calster et al. 2023). Without stringent validation, AI models can mislead patients and clinicians, endangering patient safety and sound clinical judgment.

This chapter critically analyzes and evaluates the model validation techniques used in AI cancer research, with a focus on best practices in physiotherapy and oncology. It highlights previously overlooked validation gaps and methodologies, and outlines best practices, governance and future directions for AI that can be deemed clinically safe and reliable (Hafeez et al. 2024).

2.2. Role of AI in oncology and physiotherapy

Integrating AI into healthcare, especially in fields such as oncology and physiotherapy, transforms clinical practice in terms of operational efficiency, precision and individualized attention. Through large-scale data collection and advanced computation, AI systems are changing the traditional approaches to cancer detection, treatment and rehabilitation (Rasool et al. 2024). Such advancements streamline clinical processes and bolster patient satisfaction and outcomes through automated, up-to-the-minute decision-making based on relevant data.

2.2.1. AI in oncology

AI has a pivotal role in oncology across the entire continuum of care: from cancer screening, detection and diagnosis, all the way to treatment planning and longitudinal follow-up (Papachristou et al. 2023). Machine learning (ML) and deep learning (DL) algorithms can learn and make predictions from enormous datasets that include medical imaging data (CT, MRI, PET scans, etc.), genomics, histopathology and electronic health record (EHR) data, which may be structured or unstructured. These models can enable earlier and more accurate detection of malignancies because they can uncover patterns and biomarkers that may be too subtle for human observers (Aftab et al. 2025).

AI aids in risk stratification by estimating the probability of a person developing cancer based on genetics, lifestyle and environment. Predictive models that estimate the likely response to chemotherapy, immunotherapy or radiation enhance personalized treatment planning, giving oncologists the ability to tailor regimens at the patient level (Sherani et al. 2024). In addition, AI tools support clinical decisions by simulating tumor progression, selecting optimal treatment sequences and forecasting side effects.

Beyond clinical decision support, AI is being used in cancer screening programs (e.g. mammogram interpretation), clinical trial matching and real-time monitoring of disease progression (Khalifa et al. 2024). These capabilities contribute to reducing diagnostic errors and treatment delays and to increasing overall survival.

2.2.2. AI in physiotherapy

As part of cancer rehabilitation, physiotherapy addresses a wide variety of functional and physical impairments resulting from the malignant disease as well as from its therapy. Efforts to personalize and optimize these rehabilitation strategies have benefitted substantially from AI. Predictive modeling can forecast individual patient recovery trajectories by taking into account baseline functional status, treatment history, age and comorbidities (Terranova and Venkatakrishnan 2024).

Wearable sensors, accelerometers, inertial measurement units (IMUs) and video-based motion capture systems are increasingly being integrated into AI-powered platforms to evaluate patients’ movements, balance, range of motion and gait abnormalities (Tsiara et al. 2025). These parameters enable healthcare professionals to quantify and assess a patient’s motor function in relation to recovery outcomes. Natural language processing (NLP) methods are also used to extract information from patient-reported outcome measures and clinical notes, thus improving interfacing and automating follow-up planning (Upadhyaya et al. 2025).

In addition, AI can recommend and alter physiotherapy treatments as needed in real time. Smart rehabilitation platforms, for example, can provide virtual exercise sessions which increase or ease demand based on patients’ real-time performance and feedback. Such flexibility is especially helpful for cancer patients, who may have fluctuating states of energy, pain or post-treatment fatigue (Hussey et al. 2024).

With AI, remote rehabilitation monitoring fosters access to physiotherapy services for patients residing in remote or underserved regions, further reducing existing healthcare gaps within these regions.

Moreover, alerts powered by AI can inform healthcare professionals about potential risks such as the probability of falling or nonadherence to therapy, initiating timely action (Bhambri and Khang 2024).

Combining AI in physiotherapy and oncology signifies a comprehensive advance towards proactive, personalized and data-driven medicine for patients and practitioners. Nevertheless, the effectiveness and safety of these technologies must be established through strong validation processes, which are discussed in the next sections.

2.3. AI model development pipeline in cancer research

Creating an AI model for cancer research, in both oncology and physiotherapy, follows a systematic development pipeline designed to ensure clinical value, precision, validity and generalizability (Perez-Lopez et al. 2024).

Every stage is ordered (data collection, data processing, storage and implementation) because decisions in AI-supported healthcare are time-sensitive and high-stakes. This section outlines the skeleton of the AI model pipeline and the significance of model validation within the overall workflow.

2.3.1. Data collection

As with any pipeline, the first stage of the AI workflow is data acquisition. This phase involves collecting various forms of clinical, biological and behavioral data, including but not limited to:

Electronic health records (EHRs): a comprehensive database that stores demographic information, clinical history, medication records and outcomes of the patient’s treatment (Nordo et al. 2019).
Medical imaging: the use of MRI, CT, PET and ultrasound processes for tumor detection, staging and monitoring.
Genomic and molecular data: sequencing data alongside biomarkers and multi-omics profiles that pertain to both identifying and decoding the tumor.
Pathology reports: histology and cytology of biopsies.
Sensor and wearable data: motion, heart rate, gait or physical activity measuring devices, which are very useful in physiotherapy (Nascimento et al. 2020).
Patient-reported outcomes (PROs): results derived from surveys and questionnaires that pertain to pain, fatigue, mobility and the patient’s perception of life quality.

The dataset’s relevance and completeness are cornerstones of a successful AI model. An unbalanced, incomplete or biased dataset can lead to unfair conclusions and undermine trust in the model (Zhu and Salimi 2024). As a result, ethical clearances, data governance policies and patient consent become critical at this stage.

2.3.2. Data preprocessing

Healthcare data is often unstructured, inconsistent and noisy. Data preprocessing is aimed at cleaning noisy and unstructured data to transform it into a usable format for ML algorithms (Nandan Prasad 2024). Typical data preprocessing procedures include the following:

Data cleaning entails managing erroneous data, attending to missing values and handling duplicates or inconsistent records.
Normalization and standardization harmonize values originating from different sources by rescaling them to a common range or by adjusting them relative to a reference point such as the mean.
Segmentation and labeling involve recognizing regions of interest or boundaries of lesions in order to create accurate models.
Encoding categorical variables may entail changing words or categories into numbers so that ML algorithms can accept them.
Data augmentation is common in imaging where the dataset is artificially enlarged by techniques such as flipping, rotating or cropping images (Alomar et al. 2023).

These procedures are critical in minimizing biases, optimizing the model’s learning and increasing its general applicability across diverse institutions and patient populations.
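As a minimal sketch of the cleaning, standardization and encoding steps listed above, the following Python example uses only the standard library. The toy values and the helper functions (`impute_median`, `standardize`, `one_hot`) are hypothetical illustrations and are not taken from any pipeline described in this chapter.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale values to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(categories):
    """Encode a categorical column as 0/1 indicator columns, one per level."""
    levels = sorted(set(categories))
    rows = [[1 if c == level else 0 for level in levels] for c in categories]
    return levels, rows


# Hypothetical toy records: age in years (one value missing) and tumor stage.
ages = [54, None, 61, 70]
stages = ["II", "I", "III", "II"]

ages_filled = impute_median(ages)
ages_scaled = standardize(ages_filled)
levels, stage_encoded = one_hot(stages)
```

In a validation setting, the median, mean and standard deviation would normally be estimated on the training split only and then reused on the test split, so that no information leaks from the data used to assess the model.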

2.3.3. Creating and preparing new features

Feature engineering entails selecting, extracting or creating features that enhance the accuracy or predictive strength of the model (Katya 2023). This can be done manually using domain knowledge or through automated feature selection methods.

Manual feature engineering: features chosen using clinical knowledge, such as patient age, tumor stage or ECOG performance status.
Statistical techniques: univariate analysis, correlation analysis or dimensionality reduction techniques such as principal component analysis (PCA) to identify the most informative features.
Embedded selection: methods such as Lasso regression or tree-based models that select or rank features during the training phase. Ridge regression shrinks coefficients but does not by itself remove features.
Automated feature learning: convolutional neural networks (CNNs) and recurrent neural networks (RNNs) can learn hierarchical features directly from image or sequential data (Dhruv and Naskar 2020).

Well-designed feature engineering improves not only the interpretability of the models, but also their performance. It can also reduce computational cost and the risk of overfitting.
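The univariate, correlation-based approach mentioned above can be sketched as a simple filter that ranks candidate features by the absolute Pearson correlation with the outcome. The feature names, values and the outcome (a fatigue score) are hypothetical, and this is only one of several possible selection strategies.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / sqrt(var_x * var_y)


def rank_features(features, target):
    """Rank features (dict: name -> values) by |correlation| with the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked, scores


# Hypothetical toy data for four patients.
features = {
    "age": [50, 60, 70, 80],
    "baseline_steps": [8000, 6500, 4000, 2500],
    "noise": [1, -1, -1, 1],
}
fatigue_score = [10, 20, 30, 40]

ranked, scores = rank_features(features, fatigue_score)
```

A filter of this kind looks at each feature in isolation, so it can miss interactions. To avoid optimistic bias, the ranking would usually be computed inside each training fold rather than on the full dataset.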

2.3.4. Computer-aided diagnosis system

Once features are defined, the next step is to choose a suitable algorithm and train the model. The choice depends on the task, such as classification, regression or segmentation, which is determined by the goal of the model and the available data (Asgari Taghanaki et al. 2021). Algorithms commonly used in cancer research include the following:

Supervised learning uses labeled data. Common methods include logistic regression, decision trees, random forests, support vector machines, gradient boosting and neural networks.
Unsupervised learning is applied when tasks require finding patterns in data without preset labels, using techniques such as k-means clustering, hierarchical clustering and PCA.
Deep learning architectures such as CNNs, RNNs, long short-term memory networks (LSTMs) and transformers are particularly useful for image-related tasks, genomics and sequential physiotherapy data analysis.
Reinforcement learning addresses sequential decision-making, such as the optimization of treatments over time, by learning from the outcomes of successive decisions.

In supervised learning, a model learns the relationship between the features and the label by minimizing a loss function. Effective training may require hyperparameter tuning, which can be done through grid search, Bayesian optimization and other techniques (Kornblith et al. 2021).

Overfitting occurs when a model performs well on training data but poorly on new, real-world data. To counter this problem, regularization techniques such as dropout, early stopping and L1/L2 penalties are used, together with a held-out validation set that reveals the gap between training performance and performance on unseen data.
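Early stopping, one of the techniques named above, can be illustrated with a short sketch. It assumes that a validation loss is recorded after every training epoch. The loss values and the `patience` parameter are hypothetical, and real training frameworks implement this with their own callbacks.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch whose weights would be kept under early stopping.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; the best epoch seen so far is returned.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # stop training; later epochs are never run
    return best_epoch


# Hypothetical validation losses recorded after each epoch.
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.50]

kept_short = early_stopping_epoch(val_losses, patience=2)
kept_long = early_stopping_epoch(val_losses, patience=3)
```

With `patience=2` the run stops after two epochs without improvement and keeps epoch 2. With `patience=3` it continues long enough to reach the lower loss at epoch 5. The choice of patience is itself a hyperparameter that trades training time against the risk of stopping too early.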

2.3.5. Model validation

Checking the trained model against data it has not previously encountered is crucial for assessing its performance. This validation step determines its accuracy, its ability to generalize and its stability under different conditions. The procedure can be subdivided into the following parts:

Internal validation divides the original dataset into subsets so that the model is tested on data not used for training, approximating performance on unseen data. This can be done with split-sample validation, k-fold cross-validation or bootstrapping.
External validation uses independent datasets from different hospitals, patient populations or geographic locations to assess how well the model generalizes.
Temporal validation evaluates a model’s performance using data from a different temporal window, which is particularly important in fast-changing scenarios such as cancer recovery or progression.
Prospective validation evaluates the model within clinical settings under actual conditions of use.
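The data-splitting schemes behind internal and temporal validation can be sketched in a few lines of standard-library Python. The function names, the fixed random seed and the use of ISO date strings are assumptions made for illustration. External and prospective validation depend on new data collection and cannot be reduced to a split of an existing dataset.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def bootstrap_indices(n_samples, seed=0):
    """Sample with replacement for training; unsampled rows form the test set."""
    rng = random.Random(seed)
    train = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(train)
    out_of_bag = [i for i in range(n_samples) if i not in chosen]
    return train, out_of_bag


def temporal_split(records, cutoff):
    """Train on records dated before `cutoff`, test on the rest.

    Each record is a (date, payload) tuple with an ISO 8601 date string
    (YYYY-MM-DD), which sorts correctly as plain text.
    """
    train = [r for r in records if r[0] < cutoff]
    test = [r for r in records if r[0] >= cutoff]
    return train, test
```

When a dataset contains repeated measurements from the same patient, as is common with wearable sensor data in physiotherapy, the split would normally be made at the patient level so that one person’s sessions never appear in both the training and the test set.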

Metrics used to validate models include accuracy, precision, recall, F1 score and ROC–AUC, alongside calibration curves and decision curve analysis. In physiotherapy, primary patient outcome measures may also include improvement in mobility, decrease in fatigue or increase in perceived independence (Naidu et al. 2023).
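Several of these metrics can be computed directly from labels and predicted probabilities. The sketch below uses hypothetical toy data and a classification threshold of 0.5. ROC–AUC is computed as the proportion of positive-negative pairs that are ranked correctly, and net benefit follows the standard decision curve analysis definition, TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 score for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)  # ties count as half
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, scores, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
    fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical outcomes and predicted probabilities for six patients.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
nb = net_benefit(y_true, scores, threshold=0.5)
```

Discrimination metrics such as ROC–AUC do not show whether predicted probabilities are well calibrated or clinically useful, which is why calibration curves and net benefit are usually reported alongside them.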

The validation step is particularly vital in clinical AI where incorrect predictions could result in delays for a diagnosis, unnecessary treatments or insufficient rehabilitation.

2.3.6. Model interpretation and explainability

Interpretability can be defined as the extent to which we can fathom the underlying reasons for a given decision of a model. In practice, this is critical in winning the confidence of clinicians and patients. Attention maps in neural networks, or SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations), all help to explain complex models (Band et al. 2023).

Clinicians need explanations not only to judge accuracy but also to fulfill ethical, legal and safety obligations. For example, if an AI system suggests that a patient could be discharged early from physiotherapy, the model must provide the reasoning behind that suggestion.
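To give a sense of what such reasoning can look like, the sketch below computes additive feature attributions for a linear scoring model. It does not use the SHAP or LIME libraries. It relies on the fact that, for a linear model with independent features, the Shapley value of each feature reduces to its weight multiplied by the difference between the patient’s value and the cohort mean. The weights, cohort means and feature names are hypothetical.

```python
def linear_attributions(weights, bias, patient, cohort_means):
    """Split a linear model's score into a baseline plus per-feature parts.

    baseline      = score of an "average" patient (all features at the mean)
    contributions = weight * (patient value - cohort mean) for each feature
    """
    baseline = bias + sum(weights[f] * cohort_means[f] for f in weights)
    contributions = {
        f: weights[f] * (patient[f] - cohort_means[f]) for f in weights
    }
    return baseline, contributions


# Hypothetical discharge-readiness model (higher score = more ready).
weights = {"pain_score": -0.8, "six_min_walk_m": 0.01, "weeks_since_surgery": 0.3}
bias = -2.0
cohort_means = {"pain_score": 4.0, "six_min_walk_m": 400.0, "weeks_since_surgery": 6.0}
patient = {"pain_score": 2.0, "six_min_walk_m": 450.0, "weeks_since_surgery": 8.0}

baseline, contributions = linear_attributions(weights, bias, patient, cohort_means)
score = bias + sum(weights[f] * patient[f] for f in weights)
```

The contributions add up to the difference between this patient’s score and the baseline. A clinician can therefore see that, in this toy case, the lower-than-average pain score accounts for the largest share of the early-discharge suggestion.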

2.3.7. Model deployment

After the model has been validated, it should be incorporated into the workflow of the clinic. Incorporation entails creating appropriate user interfaces, integrating with hospital information systems (e.g. PACS, EHR, etc.) and continuously monitoring model performance. Deployment hurdles include:

Data privacy and security: protecting patient information under privacy laws such as HIPAA or GDPR.
Scalability and maintenance: adapting models to various hospital setups, incorporating new data and updating to the latest standards over time.
Clinician training and buy-in: earning clinicians’ trust and confidence in the AI system.

Model surveillance after deployment – sometimes called “post-launch monitoring” – is essential for identifying model drift, which is a deterioration in performance resulting from changes in clinical practice, patient characteristics or data collection methodologies over time (Rajagopal et al. 2024).
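One simple way to operationalize such surveillance is to compare a rolling accuracy over the most recent cases against the accuracy measured during validation. The class below is a minimal sketch under that assumption. The window size, tolerance and class name are hypothetical, and production systems typically also track calibration and shifts in the input data.

```python
from collections import deque


class DriftMonitor:
    """Flag possible model drift from a rolling window of recent outcomes."""

    def __init__(self, baseline_accuracy, window=50, tolerance=0.10):
        self.baseline = baseline_accuracy   # accuracy measured at validation
        self.tolerance = tolerance          # acceptable drop before alerting
        self.outcomes = deque(maxlen=window)

    def rolling_accuracy(self):
        return sum(self.outcomes) / len(self.outcomes)

    def update(self, y_true, y_pred):
        """Record one confirmed case; return True if an alert should fire."""
        self.outcomes.append(1 if y_true == y_pred else 0)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough recent cases to judge yet
        return self.rolling_accuracy() < self.baseline - self.tolerance


# Usage with a deliberately tiny window so the behavior is easy to follow.
monitor = DriftMonitor(baseline_accuracy=0.90, window=4, tolerance=0.10)
alerts = [monitor.update(1, 1) for _ in range(4)]   # four correct predictions
alert_after_error = monitor.update(1, 0)            # one error enters the window
```

An alert of this kind is a prompt for human review, not proof of drift. With small windows a single error can trigger it, so the window and tolerance need to be chosen with the expected case volume in mind.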

2.4. Validation techniques

Model validation underpins the reliability of AI in healthcare. It is crucial for evaluating a model’s accuracy, generalizability and clinical usefulness. In physiotherapy and oncology, validation aims to ascertain that the model works well across varying patients and clinical situations in cancer care. This section describes validation methodologies with a focus on oncology physiotherapy, including internal, external and real-world validation, along with their value and shortcomings.

2.4.1. Internal validation

Tags: AI driven Innovations in Physiotherapy and Oncology 2

Mar 15, 2026 | Posted by admin in ONCOLOGY | Comments Off
