Open access peer-reviewed chapter - ONLINE FIRST

Integrating Health Behaviour and AI/ML Theories: A Case for Pre-Screening Prediction in Industry-Sponsored Clinical Trials

Written By

Janine Zitianellis

Submitted: 15 December 2025 Reviewed: 21 January 2026 Published: 08 April 2026

DOI: 10.5772/intechopen.1014718

From the Annual Volume

Machine Learning and Data Mining [Working Title]

Authored by Marco Antonio Aceves-Fernández


Abstract

Recruitment inefficiencies remain a critical barrier in clinical research, often leading to delayed timelines, underpowered studies, and reduced generalisability. While algorithmic solutions have improved enrolment logistics, many models overlook behavioural and time-sensitive factors that influence patient progression through the recruitment process. This study addresses these limitations by integrating behavioural theory, artificial intelligence (AI), machine learning (ML), and survival analysis into a unified recruitment optimisation framework. Patient pre-screening interactions (screener question and response pairs) were mapped to Health Belief Model (HBM) constructs (perceived susceptibility, severity, benefits, and barriers) using Named Entity Recognition (NER) within large language models (LLMs). These behavioural features were incorporated into an XGBoost classifier to predict phone screening progression, yielding strong performance (AUC = 0.7398; accuracy = 67.04%). SHAP analysis revealed perceived barriers and composite social vulnerability indicators as dominant predictors, while severity and benefits contributed marginally, highlighting the value of embedding behavioural variables early in screening design. Kaplan-Meier survival analysis showed that site contact probability declined sharply after referral, falling below 50% by day 11. This underscores the temporal fragility of patient engagement and the inadequacy of static models. Incorporating survival-derived probabilities into dynamic scoring enables timely prioritisation of referrals during peak responsiveness. This behaviourally informed, temporally adaptive screening framework enhances recruitment precision while supporting site-level efficiency. It advances methodological practice by aligning AI/ML capabilities with behavioural theory and offers practical, ethically grounded implications for improving patient recruitment in clinical trials.
The findings contribute to both scholarly understanding and the evolving architecture of AI/ML within clinical trial recruitment.

Keywords

  • randomised controlled trials (RCTs)
  • artificial intelligence (AI)
  • natural language processing (NLP)
  • health belief model (HBM)
  • clinical trial recruitment
  • survival analysis (Kaplan-Meier)
  • machine learning (XGBoost)

1. Introduction

Randomised controlled trials (RCTs) represent a cornerstone of clinical and translational research, offering the most direct pathway to generating evidence-based healthcare interventions. Their capacity to influence practice and policy depends on achieving sufficient statistical power, which, in turn, relies on enrolling an adequate and representative participant population [1]. Despite advancements in eligibility algorithms and recruitment logistics, persistent inequities in participant demographics, particularly related to race, gender, age, and socioeconomic status, continue to undermine the generalisability and ethical robustness of trial findings [2, 3, 4].

To mitigate these shortcomings, recent research has explored the integration of Artificial Intelligence (AI) and Machine Learning (ML) into trial design and operations. These technologies offer the potential to enhance recruitment precision, reduce site burden, and improve trial timelines [5, 6, 7, 8, 9]. However, their deployment raises significant ethical concerns, especially when algorithms are trained on non-representative datasets or implemented without attention to social context. Without thoughtful design and inclusive frameworks, AI/ML-driven recruitment strategies may unintentionally widen existing disparities and erode trust in research processes [6, 10, 11, 12, 13, 14].

This research adopts a behaviourally informed approach to AI/ML-enhanced recruitment by integrating the Health Belief Model (HBM) into predictive modelling pipelines. The HBM is a well-established theoretical framework used to explain health-related decision-making based on perceived susceptibility, severity, benefits, and barriers [15, 16, 17, 18, 19]. By mapping pre-screening patient responses to these constructs using Natural Language Processing (NLP) and Named Entity Recognition (NER), the study develops a referral scoring system designed to capture both motivational and behavioural dimensions of patient decision-making.

In addition to psychological and attitudinal factors, operational inefficiencies also limit recruitment effectiveness. Delays in initiating contact with referred patients, often due to resource constraints at clinical trial sites, can significantly reduce the probability of successful enrolment [20, 21]. Internal site-level observations and industry metrics increasingly point to “time-to-contact” as a critical determinant of referral progression. However, few studies have incorporated temporal dynamics into predictive recruitment models, representing a notable gap in both practice and research.

By addressing both behavioural and temporal drivers of patient enrolment, this research contributes to a multi-dimensional, ethically grounded framework for enhancing recruitment effectiveness in industry-sponsored RCTs. The framework is designed to be scalable, data-driven, and inclusive, aligning with Good Clinical Practice (GCP) guidelines and the principles of equitable trial access [22, 23, 24, 25, 26]. In doing so, it responds to global calls—from the National Institutes of Health (NIH) and the Food and Drug Administration (FDA), among others—for more representative and ethically sound clinical research [24, 27, 28].

At a practical level, this approach provides a mechanism to reduce site burden, enhance the targeting of recruitment efforts, and promote fairness in patient engagement. It also holds promise for broader applications in digital health, particularly in the development of adaptive screening tools that account for both motivational and logistical barriers. By integrating behavioural theory and temporal modelling with AI/ML, the research proposes a novel strategy to support real-time, responsive, and equitable recruitment.

Informed by the principle that medical progress must align with technological capability and ethical responsibility [29], this research aims to address a critical operational and equity gap in patient recruitment. Specifically, it aims to develop a scalable, behaviourally informed AI/ML framework to enhance the quality and fairness of patient referrals in clinical trials.

The research draws on the HBM to model motivational influences on enrolment while accounting for the limitations of the model’s assumption of rationality and linear decision-making in complex trial settings [17, 30, 31, 32, 33]. It also examines how incorporating time-to-contact as a dynamic feature can improve recruitment prediction models and operational efficiency. The central research question guiding this study is as follows:

“How can integrating Health Belief Model constructs into AI/ML models enhance the prediction of successful clinical trial pre-screening?”

Two sub-questions support this overarching question:

RQ1.1: Which behavioural and contextual factors most strongly predict successful progression through clinical trial recruitment?

RQ1.2: How does the timing of site contact affect patient referral outcomes during the initial phone screening stage?

The following section critically reviews the theoretical foundations, methodological gaps, and empirical findings that inform this research, including applications of behavioural theory in recruitment, the evolving role of AI/ML in trial operations, and the ethical challenges of algorithmic decision-making in participant engagement.

2. Integrating health behaviour and artificial intelligence and machine learning theories within industry-sponsored clinical trials

Despite advancements in clinical research, a persistent gap remains in understanding how patients perceive clinical trial participation and the factors influencing their willingness to engage [34, 35, 36]. Fragmented evidence and inconsistent methodologies have limited the development of recruitment strategies that are both effective and equitable [1, 34]. To address this, the current study employs a triangulated literature review guided by the Monarch Standard Research Method (MSRM), integrating three theoretical domains: clinical trial design, health behaviour theory (particularly the Health Belief Model, HBM), and Artificial Intelligence/Machine Learning (AI/ML) methodologies (Figure 1). This approach supports the development of an adaptive recruitment model grounded in both operational rigour and health equity goals.

Figure 1.

Distribution of coded literature by theme and source class.

2.1 Clinical trial theories

RCTs are widely regarded as the gold standard in clinical research methodology, offering the most rigorous framework for evaluating the safety and efficacy of medical interventions. According to seminal works by Friedman et al. [3], RCTs are not only essential for medical advancement but also represent “the most definitive tool for evaluation of the applicability of clinical research.” In addition to randomisation, these trials are designed with strict methodological safeguards, including blinding and intention-to-treat analysis, to protect against bias and ensure internal validity. When paired with adequate, representative enrolment, these safeguards also provide the statistical power needed to support external validity.

Success in clinical trials relies significantly on effective recruitment strategies, as engaging participants is a vital component of the process. A notable increase in the number of preclinical and early phase (Phase I and II) drugs suggests a strong innovation pipeline. Still, it also draws attention to stagnating Phase III numbers [1]. This raises concerns about the progression of drug development, particularly around recruitment, retention, and the overall trial feasibility.

Ethical trial design theory emphasises the importance of fairness, representativeness and transparency in participant selection [2]. Fundamental principles such as equipoise underscore the ethical imperative of genuine uncertainty in treatment allocation [3]. Methodological frameworks, such as the CONSORT guidelines [4], ensure transparency in reporting, while adaptive trial designs provide flexibility in addressing emerging evidence as the trial progresses [5]. Therefore, recruitment failures that result in demographic imbalances have implications for the generalisability of trials and for healthcare equity. The call for methodologically rigorous and ethically sound recruitment models is not new; rather, it is foundational to the discipline [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]. Despite decades of advancement, many trials still suffer from poor external validity due to homogenous recruitment pools, suggesting that methodological rigour alone does not guarantee representativeness or public health impact [17, 18, 19, 20, 21, 37].

The tangible consequences of weak recruitment infrastructures necessitate a shift toward scalable, adaptive recruitment models, particularly in industry-sponsored trials, where operational efficiency and participant diversity often conflict. This emphasises the urgent need for innovative, data-driven recruitment methodologies that align with established theoretical frameworks.

Resources such as ClinicalTrials.gov [38] provide researchers and sponsors with real-world data for benchmarking and planning [34, 39, 40]. In parallel, initiatives such as the Clinical Trials Transformation Initiative (CTTI) emphasise the importance of patient engagement in ethical trial design. Together, they play a pivotal role in shaping the field of patient engagement by addressing barriers and developing solutions to improve clinical trials, aiming for greater collaboration between patient groups and trial sponsors [41].

However, structural assumptions embedded within clinical protocols frequently overlook psychosocial variables, such as perceived burden, trust, and motivational alignment, that influence enrolment [35, 36, 42, 43, 44, 45, 46, 47]. Simultaneously, AI/ML-driven modelling approaches are being proposed to predict trial success and improve recruitment outcomes [27, 30, 48, 49, 50]. Thus, the integration of behavioural theory and AI/ML techniques, guided by established trial design principles and governance, is a logical evolution in patient recruitment methodology.

This research focuses on clinical trial theories and challenges in patient recruitment, emphasising the need for an interdisciplinary strategy that merges behavioural insights with predictive modelling. This concept will be further explored in the next section.

2.2 Health behaviour theories

Effective patient engagement hinges on understanding how individuals perceive the risks, benefits, and burdens associated with participating in clinical trials. Studies have identified multiple barriers, including social behaviours, cultural norms, and logistical constraints [9, 10, 12, 28, 51]. Despite this, the theoretical frameworks that underpin patient decision-making remain underutilised in patient recruitment models. Signorell et al. [22] emphasise the importance of consistent and detailed methodologies for gathering participant perspectives, highlighting that neglecting participant insights compromises the internal validity and reliability of clinical trials.

To address these gaps, seminal health behaviour theories provide foundational insights into individual decision-making regarding clinical trial participation. The Health Belief Model (HBM) explains individual health behaviours through subjective perceptions: it posits that behaviour is shaped by personal beliefs regarding health threats and the benefits of, and barriers to, action [23]. As such, individuals are more likely to engage in health-promoting behaviours when they perceive a significant health threat, recognise substantial benefits of taking action, and encounter fewer barriers [47].

The HBM also offers a structured approach to identifying individual-level barriers and motivators, aligning closely with motivational factors critical when designing trial pre-screening questionnaires and during early patient recruitment phases [24, 36, 43]. The HBM therefore provides valuable guidance for crafting recruitment materials that emphasise perceived benefits, such as access to novel treatments, and reduce perceived barriers, including safety concerns and logistical inconvenience [25]. However, the HBM also has notable limitations, particularly its abstract nature, limited predictive power, and the omission of constructs related to social influence. Critics argue that the model overemphasises rational decision-making, neglecting the often irrational and emotional factors that influence health decisions [24, 26, 36, 52, 53].

In response, alternative models such as the Theory of Planned Behaviour (TPB), the Theory of Reasoned Action (TRA), the Integrated Behavioural Model (IBM), and the Transtheoretical Model (TTM) explicitly address social norms, behavioural intentions, and perceived behavioural control [29, 36]. The TPB and TRA both emphasise intention as a robust predictor of behaviour, significantly influenced by attitudes and subjective norms, providing comprehensive frameworks for addressing social contexts and individual capabilities. The TTM offers additional utility by conceptualising behaviour change as a progression through distinct stages, enabling interventions tailored to participant readiness [31]. However, the TTM’s assumption that behaviour change, and by extension patient referral, proceeds through a strict sequence of stages may not reflect real-world decision-making, which could result in inefficiencies when implementing interventions. While subjective norms may be less important during the initial stages of patient recruitment, they become vital in later phases, particularly when implementing targeted interventions and securing informed consent.

Given this limitation, the HBM proves particularly effective during the initial recruitment and pre-screening stages, as it emphasises individuals’ perceptions of risk and benefit. Recognising these advantages, the HBM was adopted as the primary behavioural framework for guiding feature engineering and model development. The HBM constructs presented in Table 1 can be successfully translated into structured, patient-reported features through Named Entity Recognition (NER) tasks. This approach enables the development of scalable and dynamic AI/ML applications. The structured domains of the HBM enable a direct mapping of patient-generated language into features, allowing ML techniques, such as XGBoost, to leverage nuanced behavioural indicators in predicting referral quality.

HBM constructs underpinning health beliefs

  • Severity: Perceived severity refers to a person’s belief about the seriousness or severity of a disease.
  • Susceptibility: Perceived susceptibility refers to a person’s belief about their chances of getting a specific condition.
  • Barriers: Perceived barriers are obstacles that hinder behaviour change. They can be tangible (e.g., money, transportation, and childcare) or intangible (e.g., fear and embarrassment).
  • Benefits: Perceived benefits refer to a person’s opinion of the value or usefulness of a new behaviour (treatment) in lowering the risk of disease or improving their quality of life.

Table 1.

HBM constructs underpinning health beliefs.
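The construct mapping in Table 1 can be illustrated with a deliberately simplified sketch. Where the study used medical NER within LLMs, the toy lexicon below (its terms and function name are illustrative assumptions, not the study’s vocabulary) shows how extracted terms might be bucketed into HBM construct counts for downstream modelling:

```python
# Illustrative only: the study used medical NER within LLMs; this keyword
# lexicon is a hypothetical stand-in for entity extraction and HBM mapping.
HBM_LEXICON = {
    "susceptibility": {"symptom", "fatigue", "pain", "history"},
    "severity": {"diagnosis", "chronic", "stage", "hospitalised"},
    "benefits": {"treatment", "therapy", "relief", "improve"},
    "barriers": {"transport", "cost", "afraid", "time", "childcare"},
}

def map_to_hbm(text: str) -> dict:
    """Count lexicon hits per HBM construct in a pre-screening response."""
    tokens = {t.strip(".,!?").lower() for t in text.split()}
    return {construct: len(tokens & terms)
            for construct, terms in HBM_LEXICON.items()}

features = map_to_hbm(
    "I worry about the cost and transport, but the therapy could improve my pain"
)
# 'barriers' and 'benefits' each receive two hits; 'susceptibility' one.
```

In the full pipeline, such construct-level counts (or their LLM-derived equivalents) become structured features alongside demographic and socioeconomic variables.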

The following section examines how AI/ML principles facilitate the translation of behavioural insights into predictive models that enhance patient recruitment, responsiveness, and scalability.

2.3 Artificial intelligence and machine learning theories

Despite rapid advancements, AI/ML have yet to achieve widespread clinical impact, particularly in patient recruitment for clinical trials [32, 48, 50]. As noted by Balagopalan et al. [54], this limited success is primarily attributed to a persistent focus on technical performance rather than on outcomes that tangibly improve patient engagement or trial efficiency. The authors argue for a realignment of ML efforts toward patient-centred objectives, calling for collaborative approaches that integrate the expertise of AI/ML researchers, clinicians and regulatory stakeholders to advance both health equity and operational relevance.

Within the clinical trial context, AI/ML offers significant potential to address recruitment inefficiencies. Predictive models, particularly ensemble methods such as Bootstrap Aggregation, Random Forest, and XGBoost, have consistently demonstrated robust performance across various domains [30, 33, 55]. However, these methods, despite their efficiency, must be carefully managed to avoid perpetuating existing biases, as highlighted by Obermeyer et al. [56, 57]. Mitigation strategies, such as the Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN), are commonly used to address class imbalance, though they can reinforce existing structural inequalities in healthcare data and introduce artefacts that lead to overfitting and distorted representativeness [58].
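For intuition, the core idea behind SMOTE can be sketched in a few lines: each synthetic minority sample is an interpolation between a minority point and one of its nearest minority neighbours. This is a minimal stand-in under simplified assumptions, not a reference implementation, and the sample points are invented:

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate synthetic minority samples by interpolating between a
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours by squared Euclidean distance, excluding x itself
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, nb)))
    return synthetic

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new = smote_sketch(pts, 4)  # four synthetic points on segments between pts
```

Because synthetic points lie on segments between existing minority samples, they can only densify regions the data already covers, which is precisely why SMOTE cannot correct structural under-representation.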

Ensuring the ethical deployment of AI/ML systems requires more than algorithmic accuracy; it also necessitates models that are grounded in patient diversity and regulatory compliance. Integrating AI/ML into patient recruitment must be accompanied by deliberate efforts to train models on diverse and representative datasets, supported by rigorous bias detection and correction strategies [54]. Moreover, the ethical deployment of such systems requires strict ethical compliance with regulatory frameworks, for example, the Helsinki Declaration [2] and the General Data Protection Regulation (GDPR) [59], which ensures the protection of patient autonomy and public trust.

A key challenge in operationalising these models lies in the extraction and interpretation of structured and unstructured clinical text, specifically trial pre-screening questions and responses. Recent advancements in Natural Language Processing (NLP), particularly Medical Named Entity Recognition (NER) within Large Language Models (LLMs), have enhanced the classification of clinical texts. LLMs such as BERT and BioBERT utilise transformer architectures to extract medical entities with contextual sensitivity [60, 61, 62, 63]. While these models accurately extract much of the relevant information, several limitations persist: variability in clinical language, including colloquialisms and non-standard terminology, poses a challenge to model accuracy. Although models trained on domain-specific corpora address some of these issues, as noted by Alsentzer et al. [60] and Peng et al. [62], performance disparities across demographic and linguistic subgroups remain a concern [64].

Thus, AI/ML can operationalise these constructs by detecting behavioural patterns across large datasets. For example, AI/ML models can predict which individuals view themselves as being at higher risk and therefore more inclined to participate in trial initiatives. By utilising cues to action, such as reminders or customised messaging, AI/ML algorithms can refine outreach strategies and improve referral quality. Leveraging these insights, clinical recruiters can tailor their communication, address concerns, and enhance the likelihood of enrolment.

However, mapping extracted NLP entities onto abstract HBM constructs introduces challenges, necessitating consideration of various theoretical and practical dimensions, as well as clearly defined operational rules [23, 53]. Interpretability is equally crucial: stakeholders must understand how predictions are made, and how they relate to patient behaviour and decision-making, before trusting the model’s outputs to inform patient referrals. Techniques such as SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) are therefore essential for elucidating predictions [65].
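SHAP approximates Shapley values efficiently for large models; with only a handful of features they can be computed exactly by brute force, which makes the attribution logic concrete. The toy model and baseline below are assumptions for illustration, not the study’s classifier:

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley attributions: 'absent' features are set to the baseline."""
    n = len(x)

    def v(S):  # value of coalition S: predict with features in S present
        z = [x[i] if i in S else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for r in range(n):
            for S in combinations(others, r):
                # Shapley weight for a coalition of size r
                w = factorial(r) * factorial(n - r - 1) / factorial(n)
                total += w * (v(set(S) | {i}) - v(set(S)))
        phi.append(total)
    return phi

# Toy additive model: attributions recover each feature's own contribution,
# and efficiency holds: sum(phi) == model(x) - model(baseline).
model = lambda z: 2 * z[0] + 3 * z[1]
phi = shapley_values(model, x=[1.0, 1.0], baseline=[0.0, 0.0])  # ~[2.0, 3.0]
```

The brute force is exponential in the number of features; SHAP’s contribution is making this attribution tractable for real models such as the XGBoost classifier used here.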

In addition, time-to-event (TTE) or survival analysis provides insights into patient recruitment dynamics that complement behavioural and ensemble models by quantifying the temporal progression from initial contact to consent or randomisation.

The Kaplan-Meier (KM) estimator is a non-parametric method for estimating survival probabilities in clinical settings, particularly where censoring is present [66, 67, 68]. Its transparency and robustness render it highly suitable for assessing the effectiveness of patient referrals to clinical sites. However, KM estimates rely on assumptions such as independent censoring and population homogeneity, which may not hold in real-world patient recruitment scenarios. For instance, KM estimates can become biased if dropouts or delayed responses are related to specific patient characteristics. Additionally, the stepwise nature of KM curves may not support the smooth, dynamic predictions needed for real-time decision-making.
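A compact sketch of the product-limit computation, with right-censoring, may clarify the estimator; the referral times below are invented for illustration, not the study’s data:

```python
def kaplan_meier(times, events):
    """Product-limit estimator. events[i] = 1 for an observed event (e.g. site
    contact), 0 for right-censored. Returns [(t, S(t))] at each event time."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    surv, curve = 1.0, []
    i = 0
    while i < len(order):
        t = times[order[i]]
        d = n_t = 0
        while i < len(order) and times[order[i]] == t:  # group tied times
            n_t += 1
            d += events[order[i]]
            i += 1
        if d:  # survival only drops at observed event times
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= n_t  # censored and event cases both leave the risk set
    return curve

# 5 referrals: site contact on days 2, 5, 5; censored on days 3 and 7.
curve = kaplan_meier([2, 3, 5, 5, 7], [1, 0, 1, 1, 0])
# curve == [(2, 0.8), (5, 0.8 * (1 - 2/3))]
```

Note how the censored day-3 referral still contributes to the risk set at day 2 but not at day 5; this is the mechanism by which KM uses partial information from incomplete follow-up.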

Alternatives such as Cox proportional hazards models or survival forests offer more nuanced predictions by incorporating covariates and handling heterogeneity [69]. As such, while KM estimators provide a valuable foundation, they should be paired with advanced methodologies, such as accelerated failure time models or ML-based survival models, to accommodate complex relationships and improve prediction accuracy.

Therefore, integrating AI/ML with behavioural theory and temporal modelling offers a robust, interdisciplinary framework for optimising patient recruitment. When aligned with established behavioural frameworks and ethical safeguards, AI/ML can meaningfully enhance both the quality and equity of clinical trial recruitment strategies. Success, however, hinges on transparent and interpretable models, equitable data representation, and rigorous methodological validation, as detailed in the following section.

3. Research methodology

3.1 Design

This study adopts a pragmatic research paradigm to address the complex operational and behavioural challenges associated with patient recruitment in industry-sponsored clinical trials. Grounded in a deductive, quantitative framework, the research employs a sequential case study design that enables granular, context-specific insight while preserving methodological robustness.

Although the study is fundamentally quantitative, its phased structure supports a stepwise examination of associations between behavioural constructs and recruitment outcomes, as well as the development and validation of predictive models. This alignment across data integration, sampling, and modelling enhances both coherence and interpretability.

3.2 Techniques and procedures

3.2.1 Data collection process

An analytics-ready dataset was constructed by integrating patient referral records collected from SubjectWell’s internal recruitment platform over a five-year period (April 1, 2020 - April 19, 2025), augmented with county-level socioeconomic indicators from the CDC’s 2025 Social Vulnerability Index (SVI) [70]. Variable definitions and field descriptions are documented in Appendix A (Table A1).

Additionally, a semantic information score was generated using transformer-based language models, specifically BERT and BioBERT, by evaluating the trial’s paired pre-screening criteria and corresponding responses [60, 62, 71]. The objective was to quantify the overall informativeness of each interaction by measuring the extent to which the response semantically aligned with the question. Low-information responses are typically characterised by ambiguity, irrelevance, or syntactic incoherence, and were algorithmically down-weighted.

The final score reflects the clarity, coherence and potential decision-making value for each patient referral instance and is included as a structured feature in the analytics-ready dataset. Concurrently, embedded within the same LLM framework, medically relevant terms were extracted using Medical Named Entity Recognition (NER). These terms, derived from the pre-screening criteria, were categorised according to the HBM constructs using a multi-class classification label (ensemble label), including diagnosis and treatment (perceived severity) and symptoms (perceived susceptibility).

The extracted features were aligned with clinical eligibility criteria and behavioural constructs derived from the HBM, facilitating the identification of relevant indicators such as perceived severity, susceptibility and barriers to participation. These procedures yielded an analytics-ready dataset suitable for stratified sampling, statistical modelling and predictive evaluation.
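As a simplified stand-in for the embedding-based semantic information score described above (the study scored alignment using BERT/BioBERT embeddings), a bag-of-words cosine similarity illustrates the question-response alignment idea; the scoring function and example strings are illustrative assumptions:

```python
from collections import Counter
from math import sqrt

def informativeness(question: str, response: str) -> float:
    """Stand-in for the embedding-based score: cosine similarity between
    bag-of-words vectors of the question and the response (0 = no overlap)."""
    q, r = (Counter(s.lower().split()) for s in (question, response))
    dot = sum(q[w] * r[w] for w in q)
    norm = (sqrt(sum(v * v for v in q.values()))
            * sqrt(sum(v * v for v in r.values())))
    return dot / norm if norm else 0.0

hi = informativeness("do you take insulin daily", "yes i take insulin every day")
lo = informativeness("do you take insulin daily", "maybe")  # no overlap -> 0.0
```

A transformer embedding replaces the word-count vectors with contextual vectors, so that semantically aligned but lexically different responses still score highly; the down-weighting of ambiguous responses then follows the same thresholding logic.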

3.2.2 Sampling techniques

The research follows a retrospective cross-sectional design, with each observation representing a single pre-screening interaction. Although the data is not tracked longitudinally at the individual level, the temporal depth of the five-year dataset captures operational variability across recruitment waves and trial lifecycle stages [72, 73].

At the raw-data level, each record represents a single pre-screening interaction (screener question and response pair) within a referral instance; for modelling, interaction-level data were reshaped (transposed) into a unified, feature-aligned representation at the referral-instance level (unique patient and trial referral by disease classification).

To ensure representativeness, a stratified sampling approach was employed. Stratification was based on disease classification and self-reported ethnicity to preserve proportional subgroup representation and minimise sampling bias. This strategy was critical given the influence of sample characteristics on model accuracy and fairness, particularly the dependent variable’s class distribution and the number of independent variables considered.

Stratified sampling also supports reliable comparative subgroup analysis, which is essential for testing the equity implications of AI/ML-based recruitment models. Furthermore, this approach enhances the generalisability of the findings and improves model stability during train-test partitioning and holdout validation.
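A minimal sketch of stratified partitioning (hypothetical records and field names; the study stratified by disease classification and ethnicity, and by minority indicator for the train-test split) shows how each stratum’s proportion is preserved across the split:

```python
import random

def stratified_split(records, stratum_key, train_frac=0.7, seed=123):
    """Split records while preserving each stratum's proportion in both parts."""
    by_stratum = {}
    for rec in records:
        by_stratum.setdefault(stratum_key(rec), []).append(rec)
    rng = random.Random(seed)
    train, test = [], []
    for members in by_stratum.values():
        rng.shuffle(members)                    # randomise within the stratum
        cut = round(train_frac * len(members))  # 70% of this stratum to train
        train += members[:cut]
        test += members[cut:]
    return train, test

# Hypothetical referrals: 4 strata of 10 records each (disease x minority).
records = [{"disease": d, "minority": m}
           for d in ("onc", "cns") for m in (0, 1) for _ in range(10)]
train, test = stratified_split(records, lambda r: (r["disease"], r["minority"]))
# Each stratum contributes exactly 7 records to train and 3 to test.
```

Because every stratum is split at the same fraction, subgroup prevalence in the training and test sets matches the full dataset, which is what makes the later subgroup fairness comparisons meaningful.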

3.3 Ethical considerations

Ethical integrity in this research is governed by established frameworks, including Good Clinical Practice (GCP) [22], the Health Insurance Portability and Accountability Act (HIPAA), and the General Data Protection Regulation (GDPR) [59, 74, 75, 76, 77]. All data used in the study were fully anonymised prior to analysis. Common identifiers—such as names, residential addresses, and dates of birth—were excluded to ensure that no individual patient could be re-identified, even when linked with trial-specific data.

Data handling protocols adhered to data minimisation and role-based access principles, ensuring that only the necessary data was used for analysis and that access was restricted to authorised personnel. Moreover, ethical considerations were central to the integration of AI/ML methods, particularly regarding model fairness, interpretability, and the responsible handling of features related to race, ethnicity, and socioeconomic vulnerability.

3.4 Analytical workflow overview

The primary goal was to create a “Patient Referral Scoring” (PRS) model to predict the likelihood that a patient will complete the site phone screening, as illustrated in Figure 2.

Figure 2.

Patient referral recruitment process.

The analysis involved the following key stages:

  • descriptive analysis for feature engineering using large language models and health behaviour theory;
  • univariate analysis for missing values and outliers;
  • bivariate analyses to explore relationships between the unit of analysis and categorical variables;
  • multivariate logistic regression to understand interactions among factors;
  • feature engineering specific to the HBM;
  • predictive analysis using a Gradient-Boosted Ensemble (XGBoost) model; and
  • dynamic risk adjustment via Kaplan-Meier survival analysis.

Insights from each stage fed back into earlier phases, enabling continuous refinement of features and model structure.
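The dynamic risk adjustment stage can be sketched as discounting the classifier’s predicted probability by a survival-derived contact probability. The step-function curve and scores below are invented for illustration (chosen to mirror the reported drop below 50% after day 11), not the study’s estimates:

```python
def survival_at(curve, t):
    """S(t) from a step-function KM curve given as [(time, S)] sorted by time."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s

def dynamic_score(p_model, km_curve, days_since_referral):
    """Illustrative dynamic prioritisation: discount the classifier's
    screening-completion probability by the remaining contact probability."""
    return p_model * survival_at(km_curve, days_since_referral)

# Hypothetical KM curve: contact probability falls below 50% after day 11.
km = [(3, 0.85), (7, 0.62), (11, 0.48), (20, 0.30)]
fresh = dynamic_score(0.80, km, days_since_referral=2)   # 0.80 * 1.00
stale = dynamic_score(0.80, km, days_since_referral=12)  # 0.80 * 0.48
```

Under this scheme two referrals with identical model probabilities are ranked differently as they age, operationalising the “time-to-contact” insight for site work queues.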

3.4.1 Feature engineering using large language models and health behaviour theory

Medically relevant terms, including symptoms, diagnoses, treatments, and behavioural concerns, were extracted with contextual sensitivity by operationalising the HBM through Natural Language Processing (NLP), leveraging Named Entity Recognition (NER) within large language models (LLMs) [60, 62]. Terms derived from the pre-screening criteria were categorised by applying the HBM constructs via a multi-class classification label (ensemble label), accommodating diagnoses and symptoms (perceived susceptibility and severity) and treatment (perceived benefits), as illustrated in Figure 3.

Figure 3.

Feature engineering using large language models and health behaviour theory.

Diversity and equity are critical considerations in trial participation [56], yet the HBM alone does not reliably capture them and may introduce bias if used in isolation. This concern is addressed in the AI/ML modelling layer, which incorporates the Social Vulnerability Index (SVI) and minority group indicators, ensuring these factors are accounted for without compromising theoretical integrity.

3.4.2 Reproducibility and validation protocol

The final dataset was partitioned into stratified training and test sets using a 70/30 split to support internal validation. Stratification preserved outcome prevalence and subgroup composition across partitions, and partition comparability was assessed by confirming consistent label rates and subgroup distributions between the training and test sets, indicating minimal covariate or label shift [78]. This supports evaluation under realistic operational conditions and reduces the risk of performance inflation due to distributional artefacts.

Model development used the stratified training split and was evaluated using 5-fold cross-validation implemented via the XGBoost cross-validation routine (xgb.cv). AUC was used as the optimisation metric, with early stopping (10 rounds) applied under a maximum cap of 500 boosting rounds; the final model was fit using the cross-validated best iteration (best number of boosting rounds). Analyses were conducted using RStudio (v2026.01.0-392) with a fixed random seed (set.seed = 123). The unit of analysis was the referral instance (unique patient-trial pair), and stratification for the train-test partition was performed by disease classification and minority indicator. Key packages included xgboost (v1.7.5.1), caret (v6.0.94), mice (v3.16.0), survival (v3.8.3), SHAPforxgboost (v0.1.3), dplyr (v1.1.3), and ggplot2 (v3.5.1).
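As an illustration of this partitioning step, the stratified 70/30 split can be sketched in a few lines. The analyses themselves were conducted in R, so the Python below is a minimal, hypothetical sketch: the record fields and the stratum key (disease classification and minority indicator) are illustrative, not the study's actual data structures.

```python
import random
from collections import defaultdict

def stratified_split(records, strata_key, train_frac=0.70, seed=123):
    """Split records 70/30 while preserving the composition of each stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for rec in records:
        by_stratum[strata_key(rec)].append(rec)
    train, test = [], []
    for recs in by_stratum.values():
        rng.shuffle(recs)
        cut = int(round(len(recs) * train_frac))
        train.extend(recs[:cut])
        test.extend(recs[cut:])
    return train, test

# Toy referral instances, stratified by (disease classification, minority indicator)
referrals = [{"disease": d, "minority": m, "id": i}
             for i, (d, m) in enumerate([("cardio", 0)] * 50
                                        + [("cardio", 1)] * 30
                                        + [("neuro", 0)] * 20)]
train, test = stratified_split(referrals, lambda r: (r["disease"], r["minority"]))
```

Because the split is performed within each stratum, the label and subgroup proportions of the full dataset are preserved in both partitions, which is the comparability property checked in the text.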

4. A case for integrating health behaviour and AI/ML for pre-screening prediction in industry-sponsored clinical trials

4.1 Data description

The final dataset comprises 5,436,136 observations and 69 variables, spanning demographic, clinical, socioeconomic, and site-level attributes across 543 distinct counties in the United States of America (US). Each patient referral record was categorised by trial identifier, disease classification, self-reported ethnicity, and structured responses to trial-specific pre-screening criteria, representing 398,879 unique patient trial referral IDs. This indicates that some patients were eligible for, and referred to, multiple studies over time.

Of the referrals, 23.3% successfully passed the site phone screening. The remaining 76.7% comprised unprocessed referrals, identified by missing finalisation dates; unsuccessful contact attempts, identified by missing contact dates but populated finalisation dates; and referrals that failed the phone screening. These distinctions were preserved in the dataset to support subsequent analyses of referral screening progression and attrition.

4.2 Descriptive analysis and discussion

4.2.1 Univariate analysis: Distributions, missing and outlier value imputation

Univariate analysis examines and summarises a single variable in a dataset, focusing on central tendency and dispersion for continuous variables and frequencies for categorical and ordinal variables [79]. These descriptive statistics (Appendix B.1) provided critical insight into missingness patterns, outlier detection, and the need for encoding or transformation procedures to inform model development.

Prior to sampling, the integrated dataset underwent a rigorous data preparation pipeline. Observations missing trial protocol information (n = 65,574) were excluded, resulting in the removal of 5781 unique patient referrals across 126 studies. Implausible age values (≤0 or >95) were corrected by imputing the mean age within each trial. Missing values for patient Body Mass Index (BMI) (n = 135,994), patient p_connect scores (n = 21,544), and distance to site (n = 18) were imputed using trial strata to minimise distortion. Outlier values in patient age (>75 years), BMI (>42 kg/m²) and distance were normalised using the interquartile range (IQR) method within trial strata [80]. The distribution of semantic information scores exhibited a mean of 0.660 (SD = 0.069), with the central 50% of values lying between the 25th percentile (0.635) and the 75th percentile (0.703), and the 90th percentile at 0.724.
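The IQR-based normalisation applied within trial strata can be illustrated with a small, self-contained sketch. The capping rule (Q1 − 1.5·IQR, Q3 + 1.5·IQR) and the toy age values below are assumptions for illustration, not the authors' exact implementation (which was carried out in R):

```python
def iqr_cap(values, k=1.5):
    """Winsorise values outside [Q1 - k*IQR, Q3 + k*IQR] within one stratum."""
    xs = sorted(values)

    def quantile(p):
        # Linear interpolation between closest order statistics
        idx = p * (len(xs) - 1)
        lo, hi = int(idx), min(int(idx) + 1, len(xs) - 1)
        return xs[lo] + (xs[hi] - xs[lo]) * (idx - lo)

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    lo_b, hi_b = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lo_b), hi_b) for v in values]

# Toy patient ages within one trial stratum; 120 is an implausible outlier
ages = [34, 36, 38, 40, 42, 44, 120]
capped = iqr_cap(ages)
```

Capping rather than deleting outliers preserves the observation (and its other features) while limiting the leverage of extreme values on downstream models.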

Univariate analysis of key categorical variables revealed meaningful distributional characteristics relevant to modelling patient referral outcomes. Among the total pre-screening criteria categorised with medically relevant terms, 32.6% (n = 1,772,327) were labelled “Diagnosis,” 26.5% (n = 1,438,875) “Treatment,” 22.2% (n = 1,208,987) “Other,” 18.3% (n = 993,929) “Symptoms,” and 0.4% (n = 22,018) “Behaviour.”

Patient-reported ethnicity values were standardised according to the OMB’s SPD15 Standards for Maintaining, Collecting, and Presenting Federal Data on Race and Ethnicity [74, 81]. The majority of observations, 48.5% (2,636,261), were associated with White, followed by 27.9% (1,518,441) Black or African American, 14.5% (787,950) Hispanic or Latino, 2.6% (140,695) Asian, 2.2% (120,486) American Indian or Alaska Native, and 0.6% (33,028) Native Hawaiian or Pacific Islander. A small proportion, 0.1% (5726), were of Middle Eastern or North African origin, and 3.6% (193,549) of observations had an unknown ethnicity. The protocol burden variable was skewed toward lower screening demands, with 51.9% classified as “Low,” 38.8% as “Medium,” and only 9.3% as “High.”

The source of patient information was primarily current medical data (72.2%), with the remainder derived from historical records (27.8%) (Infosource). At the time of analysis, 66.0% of referrals were linked to inactive studies, 18.1% were associated with active studies open for recruitment, and 15.9% corresponded to studies temporarily on hold due to site constraints.

4.3 Exploratory analysis and discussion

4.3.1 Bivariate analyses: Explore the relationship between the unit of analysis and categorical variables, and multicollinearity

The subsequent analysis focused on the subset of patients who advanced to the site phone screening stage. To validate the observed relationships and facilitate the modelling of relevant outcomes, the dataset was refined to include only those with a recorded phone screening date. This adjustment yielded 2,528,230 observations, representing 173,632 unique patient trial IDs, and reduced the geographic reach from 544 to 517 counties, the number of studies from 267 to 260, and, most markedly, the number of sites from 3418 to 3141. The ethnic composition shifted marginally, with an increased proportion of Black or African American patients to 29.9% (up from 27.9%), a reduced proportion of White patients to 47.0% (down from 48.5%), and a reduced proportion of patients of unknown ethnicity to 2.9% (down from 3.6%).

Following this, bivariate analyses examined relationships between the unit of analysis and key categorical variables, assessed potential multicollinearity, and laid the foundation for subsequent multivariate modelling.

4.3.2 Pearson’s chi-square test of independence, and Cramér’s V strength of association

To examine the bivariate associations between categorical predictor variables and the unit of analysis, a series of Pearson’s Chi-square tests of independence were conducted [82]. Under the assumption of independence, these tests assessed whether observed distributions across levels of key categorical variables, such as ethnicity, disease classification (as defined by the disease category (MeSH) heading), and protocol burden, differed significantly from expected distributions. Given the large sample size (n = 2,528,230), all tests yielded a p-value <2.2e−16, indicating statistical significance.

However, due to the Chi-square statistic’s sensitivity to sample size, Cramér’s V was calculated by adjusting for both sample size and degrees of freedom, providing a standardised metric for interpreting the magnitude of association [83]. The results reported in Table 2 revealed that variables such as trial burden (V = 0.1262) and compensation (V = 0.0587) demonstrate moderate and weak practical effects, respectively. In addition, variables, including ethnicity (V = 0.0128) and patient minority identification (V = 0.0128), displayed negligible effect sizes.

Bivariate analysis, chi-square and Cramér’s V statistics
| Variable | Chi-square (χ²) | df | Cramér’s V | Effect size interpretation |
| Ethnicity | 2721.6 | 7 | 0.0128 | Negligible |
| Disease classification (mesh heading) | 120,327 | 26 | 0.0435 | Negligible to weak |
| HBM constructs (ensemble label) | 3994.9 | 4 | 0.0251 | Negligible to weak |
| Info source (binary) | 836.3 | 1 | 0.0182 | Negligible |
| Trial burden (protocol screening burden) | 80,303 | 2 | 0.1262 | Moderate |
| Minority indicator (binary) | 414.5 | 1 | 0.0128 | Negligible |
| Trial compensation (binary) | 8721.9 | 1 | 0.0587 | Weak |

Table 2.

Bivariate analysis, chi-square and Cramér’s V statistics.

These findings underscore the importance of supplementing statistical significance with effect size metrics to avoid overinterpretation.

While chi-square tests revealed statistically significant associations between various categorical predictors and referral outcomes, the accompanying Cramér’s V coefficients indicate that many of these relationships are weak. This highlights a common challenge in large-scale observational studies where statistical significance may stem from sample size rather than meaningful effects.

For instance, although the chi-square results for ethnicity and minority identification were significant, their negligible effect sizes (Cramér’s V = 0.013) suggest limited practical influence on referral progression. Conversely, the moderate effect size for protocol screening burden (Cramér’s V = 0.1262) aligns with the expectation that structural barriers have a significant impact on patient engagement.

These findings emphasise the need to incorporate both significance testing and effect size estimation to assess variable relevance, ensuring that variables selected for predictive modelling reflect both statistical associations and meaningful impacts.
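The rationale for pairing chi-square with Cramér's V can be made concrete with a small worked example. The 2 × 2 contingency table below is invented purely to show how a very large n produces a "significant" χ² even when the association, and hence V, is negligible:

```python
import math

def cramers_v(table):
    """Chi-square statistic and Cramér's V for an r x c contingency table."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n  # expected count under independence
            chi2 += (obs - exp) ** 2 / exp
    k = min(len(table), len(table[0])) - 1     # min(r-1, c-1)
    return chi2, math.sqrt(chi2 / (n * k))

# Invented counts: n = 200,000 with only a tiny departure from independence
table = [[50500, 49500],
         [49500, 50500]]
chi2, v = cramers_v(table)
```

Here χ² = 20 (comfortably "significant" at 1 df), yet V = 0.01, a negligible effect, mirroring the ethnicity and minority-indicator results in Table 2.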

4.3.3 Pearson’s and Spearman’s rank (rho) correlation

The Pearson correlation method was used to assess the strength and significance of multicollinearity among the independent variables [84]. A drawback of this parametric method is the requirement for normally distributed data. Thus, to further support the initial correlation, non-parametric methods such as Spearman’s rank correlation (rho) were employed [82, 85].

Results from the Pearson correlation matrix indicated that most variable pairs exhibited weak to negligible linear associations (r < |0.30|), suggesting a low risk of multicollinearity. However, stronger linear dependencies were identified among a subset of variables from the Social Vulnerability Index (SVI). Notably, the estimated proportion of adults without a high school diploma (EP_NOHSDP) and individuals with limited English proficiency (EP_LIMENG) demonstrated a strong positive correlation, with a coefficient of r ≈ 0.80. Similarly, adults without a high school diploma and the estimated proportion of individuals in minority groups (EP_MINRTY) exhibited a positive correlation of r ≈ 0.77, indicating the presence of latent demographic clusters.

In the context of trial design variables, protocol duration and protocol site visits were moderately positively correlated with a coefficient of r ≈ 0.44, suggesting interdependence in structure. The Spearman’s rho correlation matrix, which evaluates monotonic relationships regardless of data distribution, confirmed these patterns and revealed slightly stronger associations among the variables from SVI. For instance, individuals with limited English proficiency and individuals in minority groups displayed a high rank correlation (ρ ≈ 0.82). Additionally, the estimated proportion of individuals burdened by housing costs (EP_HBURD) was positively correlated with both uninsured individuals (EP_UNINSUR) and the unemployed (EP_UNEMP) (ρ > 0.50), indicating overlapping social vulnerability profiles.

Moreover, interaction terms such as inter_protocol_distance and distance from site consistently showed associations across both methods (Pearson r = 0.50; Spearman ρ = 0.64), warranting caution in their concurrent inclusion during multivariate modelling due to potential redundancy. In summary, the consistency between the Pearson and Spearman matrices (Figure C1) suggests a stable correlation structure, with only a limited number of variable clusters posing a risk of multicollinearity.

4.3.4 Multivariate analysis: Explore interaction terms and the combined effects of multiple factors on the unit of analysis

The multivariate analysis was conducted employing a logistic regression model, estimated via a Generalised Linear Model (GLM) with a binomial distribution and logit link [84, 86, 87]. The GLM was designed to investigate multivariate associations with patients progressing to the phone screening stage, rather than for predictive deployment. Its discriminatory capacity, evaluated using the area under the curve (AUC), was 0.662, indicating that the GLM distinguishes between those who passed the phone screening and those who did not with fair accuracy.

Further evaluation of the confusion matrix reveals a balanced accuracy of 61.6%, with a sensitivity of 60.7% and a specificity of 62.4%. The model correctly classified 65.3% of positive predictions (positive predictive value) and 57.7% of negative predictions (negative predictive value). Notably, the overall accuracy of 61.5% was significantly greater than the no information rate (53.8%), as confirmed by a one-sided exact binomial test (p < 2.2e−16). This suggests that the model adds meaningful value relative to a naïve classifier.

In addition, several statistically significant predictors (p < 0.05) of patient progression to the phone screening were also identified employing the odds ratios (ORs), computed by exponentiating the model coefficients, which quantify the change in odds of passing the phone screening associated with a one-unit increase in each predictor, holding all else constant. Where OR > 1, the predictor is associated with increased odds of progression; where OR < 1, the predictor is associated with decreased odds. A complete summary of the odds ratios and their confidence intervals is presented in Appendix E.
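The odds ratio computation described here, exponentiating a logistic coefficient and its Wald confidence bounds, can be sketched as follows. The coefficient and standard error below are illustrative values chosen to reproduce an OR of 2.75, not figures taken from the fitted GLM:

```python
import math

def odds_ratio(coef, ci_halfwidth=None):
    """Exponentiate a logistic regression coefficient (and optional Wald CI).

    coef: log-odds coefficient; ci_halfwidth: z * standard error.
    """
    or_ = math.exp(coef)
    if ci_halfwidth is None:
        return or_
    return or_, math.exp(coef - ci_halfwidth), math.exp(coef + ci_halfwidth)

# Hypothetical coefficient chosen so exp(beta) = 2.75, with an invented SE
beta = math.log(2.75)
se = 0.0075
or_, lo, hi = odds_ratio(beta, 1.96 * se)
```

Because exponentiation is monotone, a coefficient of 0 maps to OR = 1 (no effect), positive coefficients to OR > 1, and negative coefficients to OR < 1, which is exactly the interpretation rule stated above.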

4.3.5 Key predictors of a successful phone screening

Several variables emerged as strong and statistically significant predictors (p < 0.05) of favourable referral outcomes. Patients linked to clinical trials with low (OR = 2.75, 95% CI: 2.71–2.79) or medium (OR = 2.63, 95% CI: 2.60–2.66) screening burden had substantially increased odds of passing the phone screening compared to those with high-burden protocols. Similarly, the overall social vulnerability (RPL themes) had the most pronounced effect (OR = 5.38, 95% CI: 4.55–6.36), suggesting that patients from highly vulnerable communities were over five times more likely to pass the site phone screening.

Disease classification also showed differential effects, with elevated odds associated with chemically induced disorders (OR = 5.33), behaviour and behaviour mechanisms (OR = 3.49), physiological phenomena and diagnosis (each OR = 3.27), as well as congenital, hereditary, and neonatal disorders (OR = 2.94).

These findings suggest a strong correlation between disease category and screening eligibility, which may be attributed to either protocol-specific inclusion criteria or patient motivation for participation in the screening process and thus, an RCT.

4.3.6 Construct-level predictors based on health belief model alignment

Each of the four HBM-aligned categories, i.e., symptoms (OR = 1.60), treatment (OR = 1.53), diagnosis (OR = 1.50), and other (OR = 1.47), was associated with increased odds of progressing to the phone screening. For patient referral scoring, these results affirm the utility of semantic classification approaches that map responses to HBM constructs.

4.3.6.1 Sociodemographic and site connectivity variables

Several sociodemographic indicators from the CDC SVI were statistically significant, albeit with more modest effect sizes. For example, increased local poverty (EP_POV150, OR = 1.06) and crowding (EP_CROWD, OR = 1.07) were positively associated with progression. In contrast, higher proportions of youth (EP_AGE17, OR = 0.92), individuals with disabilities (EP_DISABL, OR = 0.95), or those from single-parent households (EP_SNGPNT, OR = 0.94) were negatively associated with phone screening outcomes.

Significant disparities were observed across racial and ethnic categories. Compared to the reference group (American Indian or Alaska Native), patients identifying as Asian (OR = 0.84), Black or African American (OR = 0.87), Hispanic or Latino (OR = 0.88), or Middle Eastern or North African (OR = 0.83) were significantly less likely to pass the phone screening, indicating potential inequities in referral quality or eligibility alignment that merit further investigation.

Patient connectivity, operationalised through the patient p_connect score, was also a notable predictor (OR = 0.59, 95% CI: 0.58–0.61), confirming that weaker initial patient contact significantly decreases the likelihood of site screening success.

4.3.6.2 Variables with limited or null effect

Variables such as total population (E_TOTPOP), protocol screening visits, protocol duration, distance from the site, and specific educational indicators, e.g., the estimated proportion of adults without a high school diploma (OR = 1.01) and the estimated proportion of individuals with limited English proficiency (OR = 0.96), exhibited odds ratios close to 1.00, suggesting low predictive influence when adjusted for other variables. Additionally, patients classified as having an unknown ethnicity (OR = 1.00, p = 0.94) and the number of trial screening visits (OR = 1.00, p = 0.46) were not statistically significant.

These findings appear contradictory, as extensive literature has established that both lower educational attainment and language barriers are significant impediments to clinical trial participation, particularly in underrepresented populations [88, 89]. Their apparent lack of predictive strength may be attributed to multicollinearity or overlapping variance shared with other socio-contextual predictors, as evident in the bivariate correlation analysis findings.

To investigate the dimensional structure of these overlapping socioeconomic indicators and gain a deeper understanding of their latent relationships, an exploratory factor analysis (EFA) was subsequently conducted and discussed in the following section.

4.3.7 Health belief model specific feature engineering and refinement

Underpinned by insights from the preceding analyses and to support predictive modelling, patient-level features were engineered to align with the HBM, ensuring the inclusion of relevant and discriminative variables that capture perceived susceptibility, severity, benefits, and barriers (Table F1). An exploratory factor analysis (EFA) [90] was conducted to construct a latent variable that captures perceived trial burden, a core component of the HBM barriers-to-action domain.

The final model retained three objective indicators: trial duration, visit frequency and total number of visits. The analysis yielded a strong unidimensional solution with a cumulative variance of 55.3%. The retained items loaded meaningfully (≥0.45), with duration (loading = 0.998) emerging as the dominant factor. This latent perceived barrier (BARR) construct was operationalised as a continuous variable and integrated into subsequent predictive modelling. Appendix D provides a detailed overview of all feature transformations, interaction terms and theoretical mappings.

To represent each HBM construct independently and enable patient-level analysis where multiple entries were recorded per patient, the data were restructured so that all HBM domains were consolidated into a single observation per patient. In this transformed format, each HBM construct was represented as a distinct variable within an integrated record.

However, due to inconsistencies in pre-screening criteria across studies, several HBM construct values were missing. Missing data were handled using Multiple Imputation by Chained Equations (MICE) with predictive mean matching (method = ‘pmm’) and five iterations (maxit = 5) to preserve the underlying data distribution while minimising imputation bias [91]. To support imputation stability and reduce sparse-cell artefacts across clinical domains, imputation was conducted within strata defined by ethnicity and disease classification. This stratification was used for missing-value handling and is distinct from the train-test partitioning stratification used for model evaluation. In addition, imputation was performed prior to partitioning to improve construct completeness within clinical-domain strata; model evaluation, therefore, reflects internal validation and should be complemented by prospective/temporal validation.

To validate the imputation method, the standard deviations before and after imputation were compared across key interaction variables (Appendix F.2). Minimal differences were observed, indicating that the imputation process preserved the original dataset’s variance structure.

This restructuring preserved the conceptual distinctions between HBM constructs while ensuring compatibility with statistical modelling procedures that require a unified, feature-aligned representation for patient referral observation.

4.4 Predictive analysis and discussion

Following feature engineering, a gradient-boosted ensemble (XGBoost) model with cross-validation was applied to estimate the probability that a patient will successfully pass the phone screening phase. This XGBoost model capitalises on the nuanced feature space created by the LLM to provide a robust, non-linear prediction of screening success [92]. Model performance was evaluated using standard classification metrics, including AUC and a confusion matrix. Feature importance was assessed using SHAP values [93] to enhance model interpretability and support stakeholder confidence in model decisions.

4.4.1 Model training and cross-validation

The final XGBoost model was developed using a stratified 70/30 train–test split (disease classification and minority indicator). Model selection was conducted using five-fold cross-validation on the training data, with AUC as the optimisation metric and early stopping (10 rounds). Cross-validation yielded a mean training AUC of 0.7403 (SD = 0.0008) and a mean validation AUC of 0.7348 (SD = 0.0022), indicating stable performance across folds and minimal evidence of overfitting. Early stopping selected the cross-validated best iteration (optimal number of boosting rounds), subject to a maximum of 500 boosting rounds.

4.4.2 Model evaluation and interpretability

Evaluation on the test set (n = 96,281) confirmed the model’s moderate discriminative ability, achieving an AUC of 0.7398 and an overall accuracy of 67.04% (95% CI: 66.74–67.33%). The model significantly outperformed the no information rate (51.16%) with a p-value <2e–16. Sensitivity (66.19%) and specificity (67.84%) were well balanced, indicating consistent classification across positive and negative outcomes. The balanced accuracy of 67.02% accounts for class distribution imbalances, while the Kappa statistic of 0.3403 reflects fair agreement beyond chance. McNemar’s test yielded a p-value of 0.7448, suggesting no significant prediction bias across the two outcome classes.
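All of the metrics reported here can be derived from the four confusion-matrix counts. The sketch below shows the arithmetic for sensitivity, specificity, balanced accuracy, and Cohen's kappa; the counts are invented for illustration and are not the study's actual confusion matrix:

```python
def classification_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, balanced accuracy and Cohen's kappa
    from confusion-matrix counts."""
    n = tp + fp + fn + tn
    sens = tp / (tp + fn)                 # true positive rate
    spec = tn / (tn + fp)                 # true negative rate
    acc = (tp + tn) / n
    # Expected agreement under chance, from the marginal totals
    p_yes = ((tp + fn) / n) * ((tp + fp) / n)
    p_no = ((fp + tn) / n) * ((fn + tn) / n)
    pe = p_yes + p_no
    kappa = (acc - pe) / (1 - pe)
    return sens, spec, (sens + spec) / 2, kappa

# Invented counts for a 200-instance toy evaluation
sens, spec, bal_acc, kappa = classification_metrics(tp=60, fp=20, fn=40, tn=80)
```

Balanced accuracy averages sensitivity and specificity, so it is robust to class imbalance, and kappa discounts the agreement expected by chance, which is why it is smaller than raw accuracy.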

The model’s performance, disaggregated by disease classification, reveals substantial heterogeneity in predictive performance across disease areas, underscoring the interaction between behavioural constructs and disease typology as observed in Figure 4.

Figure 4.

Patient referral scoring model by disease classification.

An evaluation of the confusion matrix by disease classification (Appendix G) reveals an overall accuracy ranging from as low as 58% for musculoskeletal diseases to as high as 90% for chemically induced disorders, with corresponding variability in sensitivity (1–100%) and specificity (2–100%).

Notably, high sensitivity was observed in domains such as behaviour and behaviour mechanisms (100%), diagnosis (99%), and congenital, hereditary, and neonatal disorders (95%).

In contrast, stomatognathic diseases and virus diseases yielded critically low sensitivity (≤3%) and exhibited near-perfect specificity. Domains such as digestive system diseases and skin and connective tissue diseases exhibited strong specificity (≥92%) but poor sensitivity (≤20%), reflecting a conservative model bias that favours minimising false positives at the expense of false negatives.

4.4.3 Feature importance

To further interrogate the model’s internal decision-making logic, a feature importance analysis was conducted using XGBoost’s gain metric. The results, summarised in Table 3, revealed that the HBM perceived barriers construct was the most influential predictor, contributing over 55% of the total gain. Disease classification (MeSH Headings) and overall social vulnerability (RPL Themes), representing clinical and socioeconomic indicators, were the following most predictive features, contributing 20.6 and 18.5%, respectively.

Patient referral scoring model feature importance
| Feature | Gain | Cover | Frequency |
| Barriers (BARR) | 0.55267477 | 0.42982333 | 0.31647157 |
| Disease classification (mesh heading) | 0.20617877 | 0.14167367 | 0.12249164 |
| Overall social vulnerability (RPL themes) | 0.18524321 | 0.27240838 | 0.27689521 |
| Susceptibility (SUSC) | 0.02012269 | 0.07388978 | 0.09921962 |
| Benefits (BEN) | 0.01662685 | 0.02555904 | 0.08584169 |
| Severity (SEV) | 0.01563841 | 0.03663773 | 0.08430881 |
| Minority indicator (MINRTY_IND) | 0.00351530 | 0.02000807 | 0.01477146 |

Table 3.

Patient referral scoring model feature importance.

The remaining HBM constructs made modest contributions: perceived susceptibility (SUSC) at 2.0%, perceived benefits (BEN) at 1.7%, and perceived severity (SEV) at 1.6%. The minority indicator exhibited a minimal gain (0.35%); although conceptually significant for equity considerations, it did not independently influence model decisions.

Overall, the importance matrix validates the conceptual alignment between the PRS model and HBM constructs, providing valuable insights for targeted intervention design, and demonstrates that barriers and indication-specific attributes predominate in the referral progression process.

4.4.4 SHAP-based feature evaluation and clustering

SHAP values were computed for all features using the final XGBoost model to quantify the contribution of each variable to the predicted probability of passing the phone screening. For each patient referral, local SHAP values were calculated and summarised across features using mean absolute values to indicate global importance. To interpret the contribution of individual predictors to the model’s classification output, SHAP values were computed and summarised in a global SHAP beeswarm plot (Figure 5). Feature importance was ranked by the mean absolute SHAP value across all instances, and the plot displays both the magnitude and direction of features’ effects on the predicted probability that a patient will pass the phone screening stage.

Figure 5.

Patient referral scoring model SHAP Beeswarm plot.

Figure 5 shows that perceived barriers outperformed all other features, with an average SHAP contribution of 0.398, followed by disease classification and overall social vulnerability, with mean SHAP values of 0.215 and 0.155, respectively. The remaining HBM construct variables, e.g., perceived susceptibility, severity, and perceived benefits, contributed lower average impact values, ranging from 0.022 to 0.046. Notably, the minority indicator had minimal standalone predictive contribution, confirming its limited direct influence on the trained model.

In addition, the distribution of SHAP values across individual patients highlighted heterogeneity in feature effects: perceived barriers and overall social vulnerability exhibited wide SHAP ranges (up to ±2.5), whereas features such as perceived severity and benefits showed tightly bound distributions near zero.
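The aggregation from local to global SHAP importance used here, ranking features by the mean absolute SHAP value across instances, reduces to a simple computation. The toy SHAP matrix below is illustrative, with columns standing in for the BARR, SEV, and BEN features:

```python
def global_importance(shap_matrix, feature_names):
    """Rank features by mean absolute SHAP value across all instances."""
    n = len(shap_matrix)
    means = [sum(abs(row[j]) for row in shap_matrix) / n
             for j in range(len(feature_names))]
    return sorted(zip(feature_names, means), key=lambda t: -t[1])

# Invented local SHAP values for three referrals (columns: BARR, SEV, BEN)
shap_matrix = [[ 0.50, -0.02, 0.01],
               [-0.40,  0.03, 0.02],
               [ 0.30, -0.01, 0.00]]
ranking = global_importance(shap_matrix, ["BARR", "SEV", "BEN"])
```

Taking absolute values before averaging matters: a feature whose positive and negative contributions cancel across patients (like BARR here) would otherwise look unimportant despite driving individual predictions strongly.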

To investigate the interaction between disease classification, overall social vulnerability and HBM constructs, a two-way matrix of mean absolute SHAP values was constructed. The resulting matrix was scaled column-wise and visualised using a dendrogram heatmap (Figure 6), with hierarchical clustering applied to both rows (disease classifications) and columns (features). This clustering grouped disease classifications with similar referral behaviour patterns, and the columns reflected shared SHAP value profiles among features. Clustering was based on the Euclidean distance and the complete linkage method.

Figure 6.

Patient referral scoring model dendrogram heatmap.

Figure 6 also shows that disease classifications, such as musculoskeletal and neural physiological phenomena, psychological phenomena, and bacterial infections, cluster together. Similarly, chemically induced disorders, hemic and lymphatic, as well as eye diseases, clustered together. Some leaf nodes are distantly clustered, as is the case with female urogenital diseases and congenital disorders, suggesting unique SHAP patterns.

4.4.5 Dynamic risk adjustment, Kaplan-Meier survival analysis

To incorporate temporal dynamics into the PRS framework and prevent information contamination across modelling stages, a reserved independent hold-out cohort of active in-pipeline referral instances (n = 14,146) was used for survival-based calibration. This cohort was excluded from all PRS training, tuning, and internal testing, which were conducted exclusively on finalised referral instances. Because referral lifecycle states are mutually exclusive at the referral-instance level, the PRS development dataset and the Kaplan-Meier (KM) calibration cohort do not overlap.

Kaplan-Meier survival analysis was then applied to estimate the conditional probability of referral contact as a function of time since the original referral date. As a non-parametric approach, KM modelling captures time-to-contact degradation without imposing distributional assumptions, making it well suited to operational recruitment data where dropout patterns are irregular and right-censoring is common [67, 94, 95].
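A minimal sketch of the Kaplan-Meier product-limit estimator with right-censoring may help make this calibration step concrete. The referral times below are invented, not drawn from the hold-out cohort; the analyses themselves used the R survival package:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survivor curve S(t) with right-censoring.

    times: days from referral to site contact (or to censoring);
    events: 1 = contact made, 0 = censored (e.g., still in pipeline).
    Returns a list of (t, S(t)) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, s = [], 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        d = sum(e for tt, e in data if tt == t)        # contacts at time t
        removed = sum(1 for tt, _ in data if tt == t)  # contacts + censored at t
        if d > 0:
            s *= (1 - d / n_at_risk)                   # product-limit update
            surv.append((t, s))
        n_at_risk -= removed
        i += removed
    return surv

# Invented cohort: days to site contact, with two referrals censored
times = [2, 3, 3, 5, 8, 8, 11, 12]
events = [1, 1, 0, 1, 1, 1, 1, 0]
curve = kaplan_meier(times, events)
```

Each step multiplies the running survival probability by the fraction of the at-risk set that remains uncontacted, so censored referrals shrink the denominator without forcing an event, which is exactly why the method tolerates the irregular dropout patterns described above.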

5. Validity and reliability

Multiple layers of statistical and procedural rigour addressed model validity and reliability.

5.1 Construct validity: Generalisability across conditions

Construct validity was assessed by testing whether HBM construct distributions differed meaningfully across disease classifications (known-groups differentiation). The nonparametric Kruskal-Wallis rank-sum test [96] was used to evaluate differences in HBM construct distributions across 27 disease classifications. The test yielded highly significant results across all constructs (p < 2.2e–16), demonstrating that perceptions of barriers, susceptibility, severity, and benefits varied meaningfully by condition. These findings (Table 4) support the construct differentiation and contextual sensitivity of the underlying behavioural features, bolstering both the ecological and theoretical validity of the model.

Kruskal-Wallis rank test statistics
| HBM construct | χ² | df | p-value |
| Barriers (BARR) | 76,622 | 26 | < 2.2e-16 |
| Susceptibility (SUSC) | 35,482 | 26 | < 2.2e-16 |
| Severity (SEV) | 38,674 | 26 | < 2.2e-16 |
| Benefit (BEN) | 38,507 | 26 | < 2.2e-16 |

Table 4.

Kruskal-Wallis rank test statistics.

5.2 Reliability: Cross-validation (stability and reproducibility)

Reliability was assessed through five-fold cross-validation, yielding a mean training AUC of 0.7403 (SD = 0.0008) and a mean test AUC of 0.7348 (SD = 0.0022). The narrow standard deviations indicate high performance consistency and negligible variance across folds, suggesting robust reproducibility. The optimal number of boosting rounds was identified as 500, reflecting the model’s stable learning trajectory and convergence.

5.3 Structural validity: Robustness across data representations

To assess model robustness across feature architectures, predictions were compared between wide- and long-format representations of HBM domains. While overall accuracy remained comparable, key differences in sensitivity and specificity were observed, reflecting trade-offs between parsimony and granularity. The wide-format favoured specificity, while the long-format improved sensitivity, particularly in low-prevalence conditions. These results underscore the structural validity of the behavioural feature set and support context-sensitive deployment. Full benchmarking results are provided in Appendix I.

5.4 Bias and fairness: Class imbalance and interpretability checks

To ensure equitable performance across outcome classes, McNemar’s test was applied to assess symmetry in misclassifications. The resulting p-value (0.7448) indicated no statistically significant prediction bias, affirming parity in error rates between positive and negative cases. Further fairness validation was conducted via SHAP value decomposition, which revealed alignment between high-impact features and theoretically grounded HBM domains, rather than spurious or proxy variables. The model was also trained using a stratified sampling strategy based on disease classification and minority group status to preserve equity in representational learning.
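
McNemar's test reduces to the two discordant cell counts of the paired table of misclassifications. A minimal sketch with the standard continuity correction follows (illustrative only, not the study's analysis code; a library implementation such as statsmodels would normally be used). For a chi-squared variable with 1 df, the upper-tail probability can be written with the complementary error function:

```python
import math

def mcnemar(b, c):
    """McNemar's chi-squared test with continuity correction.

    b, c: the two off-diagonal (discordant) counts of the paired
    2x2 table; concordant counts do not enter the statistic.
    For chi-squared with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    Returns (chi2, p_value).
    """
    chi2 = (abs(b - c) - 1.0) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p
```

A large p-value, as reported here (0.7448), indicates that the two discordant error types occur at statistically indistinguishable rates, i.e. no asymmetry in misclassification between the outcome classes.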

Overall, the model demonstrates strong internal validity, domain-general external validity, construct sensitivity, and high reliability, underpinned by consistent statistical performance and interpretable feature contributions. Importantly, these findings reflect the conceptual integration of behavioural theory into an ML pipeline capable of supporting operational improvements in clinical recruitment. However, to ensure clinical utility, the model must undergo continuous validation against site-level outcomes, periodic retraining with real-world data and sensitivity analyses to balance referral volume with predictive precision [97].

5.5 Time-to-event validity: Survival model discrimination

To evaluate the discriminative performance of the time-to-event component, the Concordance Index (C-index) was used. The C-index assesses the model’s ability to correctly order pairs of patients by risk, while accounting for censored observations [98]. Theoretically, this measures the likelihood that, for any randomly selected pair of people, the individual with the higher predicted risk will experience the event (such as loss of contact eligibility) before the other patient. A C-index of 0.5 indicates random chance, whereas a value of 1.0 denotes perfect discrimination [99].

The KM model achieved a C-index of 0.987, indicating high concordance between predicted and observed referral outcomes over time. This result supports the model’s validity in capturing time-sensitive degradation in referral quality.
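
Harrell's C-index can be sketched as a count over comparable pairs. The following illustrative implementation (real analyses would use a library routine such as lifelines' concordance_index) treats a pair as comparable when the subject with the shorter observed time experienced the event, which is how censored observations are handled:

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored time-to-event data.

    A pair (i, j) is comparable when the subject with the shorter
    observed time experienced the event; it is concordant when that
    subject also has the higher predicted risk. Tied risks count 0.5.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect discrimination, so the reported 0.987 indicates that patients predicted to lose contact eligibility sooner almost always did so before lower-risk patients.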

6. Research findings and interpretation

This section presents the results of the predictive modelling and survival analyses conducted in answer to the primary research question and its two sub-questions. Findings are interpreted within the theoretical framework, emphasising the integration of the HBM with AI/ML techniques.

6.1 The primary research question

This research aimed to explore the primary research question: How can integrating HBM constructs into AI/ML models enhance the prediction of successful clinical trial site screening?

The results show that integrating HBM constructs with AI/ML techniques meaningfully improved both the quality and equity of patients recruited for clinical trials. From a quality perspective, the inclusion of behavioural features enabled nuanced segmentation of patients based on motivational, perceptual, and structural influences, moving beyond static demographic or clinical eligibility filters. SHAP value analysis revealed that contextual variables, particularly perceived barriers, play a pivotal role in shaping referral predictions, providing actionable insights for refining recruitment strategies.

From an equity perspective, the model demonstrated consistent performance across underserved subgroups, thereby reducing the risk of algorithmic exclusion. This suggests that embedding behavioural theory into ML pipelines can counterbalance the systemic biases often inherent in historical clinical trial datasets.

Finally, the differential performance across disease categories underscores the need for condition-specific behavioural profiling. This opens new avenues for ethical, adaptive recruitment practices that are both data-driven and person-centred, supporting broader clinical trial participation and better alignment with public health objectives.

6.2 The predictive value of HBM pre-screening features

Which behavioural and contextual factors most strongly predict successful progression to the site screening phase within a clinical trial? Hierarchical clustering of SHAP values reveals two dominant behavioural feature groupings, directly addressing which factors derived from patient responses best predict successful progression through the recruitment process (research sub-question RQ1.1).

First, perceived barriers, overall social vulnerability, and perceived severity cluster tightly, suggesting a shared influence across clinical domains that is likely tied to protocol burden or screening logistics. In contrast, perceived susceptibility and benefits cluster separately, indicating that they operate through distinct perceptual pathways and may capture more subjective, belief-oriented variance.

The interpretability of the model’s behaviour was explored using both global SHAP summary plots and a condition-stratified heatmap of mean SHAP values by disease classification. Together, these maps offer insights into both the overall structure of feature influence and how feature importance shifts across different clinical domains. This dual analysis reinforces the behavioural and contextual heterogeneity in trial recruitment and provides empirical support for applying the HBM framework.

The global SHAP summary plot confirms the theoretical and operational dominance of perceived barriers, which exhibited the highest average contribution to model predictions (mean SHAP: 0.398). Barriers with high values consistently suppressed the likelihood of phone screening success, underscoring the primacy of logistical, emotional or procedural obstacles in shaping patient recruitment.

The second and third-ranked features, that is, disease classification and overall social vulnerability, highlight the complementary importance of clinical context and linguistic framing in the model. Overall social vulnerability suggests that how patients articulate their interest or concerns, as captured through NER and thematic analysis, meaningfully shapes algorithmic assessments of patient referral quality.

In contrast, features representing internal HBM belief constructs, namely perceived susceptibility, severity, and benefits, offered only limited predictive power. This may indicate either a restricted expression of these constructs in the pre-screening context or a more general trend in which structural and procedural factors outweigh internal health beliefs during the referral stage. The minority indicator showed negligible predictive contribution, which reduces concerns of disparate model impact by ethnicity and affirms the model's focus on behavioural rather than demographic signals.

To further explore these dynamics across medical domains, a SHAP heatmap was constructed, mapping the mean absolute SHAP values for each feature against corresponding disease classification. Hierarchical clustering applied to both rows (disease classification) and columns (features) revealed two major patterns. First, perceived barriers, overall social vulnerability, and perceived severity formed a tightly clustered feature group, suggesting that perceived barriers, thematic content of responses, and perceived severity often co-occur as dominant signals across various conditions. In contrast, perceived susceptibility and benefits clustered separately, reflecting less consistent influence across clinical contexts and reinforcing their subordinate role in model behaviour.

The disease classification row identified clinically meaningful groupings of disease categories with similar SHAP value profiles through clustering. For instance, the neurological, psychological, and infectious disease categories formed a coherent cluster characterised by a more substantial influence from perceived barriers and severity, suggesting a heightened concern over procedural burden or clinical seriousness. Conversely, female urogenital diseases and congenital disorders exhibited more idiosyncratic SHAP patterns, pointing to unique contextual or demographic influences not captured by other domains. These findings suggest that trial recruitment dynamics vary substantially across indications, reinforcing the need for disease-specific referral strategies.

In summary, the combined SHAP visual analyses offer a multidimensional understanding of model decision-making. They confirm that barrier-related perceptions and language-based response patterns dominate predictive behaviour, while traditional health beliefs play a secondary role. Moreover, the variability of SHAP influence across clinical conditions highlights the importance of stratified recruitment strategies.

These findings validate the integration of the HBM into the XGBoost model and suggest that perceived burden and disease classification framing are central levers for improving patient referral conversion and equity in industry-sponsored trials.

6.2.1 Correlation analysis of HBM constructs across disease domains

Spearman’s rank-order correlation was employed to examine monotonic relationships among HBM constructs across disease classifications (Figure 7). This analysis revealed consistently strong and statistically robust inverse correlations between perceived benefits and barriers, as well as severity and susceptibility, with coefficients often exceeding −0.95. This suggests a cognitive trade-off when patients evaluate risks versus benefits during pre-screening.

Figure 7.

HBM constructs Spearman’s rho correlation by disease classification.

Conversely, perceived severity and susceptibility were highly positively correlated (often >0.98), indicating strong convergence in patients’ perceived disease threat. The perceived barriers dimension exhibited weaker and more variable associations, notably when correlated with perceived benefits and severity, where coefficients rarely exceeded ±0.25 in specific disease categories (e.g., mental and musculoskeletal disorders).

These findings indicate construct-specific differentiation and support the theoretical distinction between threat appraisal (severity and susceptibility) and action-modulating factors (benefit and barriers), with potential implications for feature independence in behavioural modelling pipelines.
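
Spearman's ρ is simply Pearson's correlation computed on average ranks, which is easy to verify with a short sketch (illustrative only; scipy.stats.spearmanr is the standard routine and was not necessarily what the study used):

```python
def _ranks(values):
    """Average ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        avg = (i + 1 + j) / 2.0
        for k in range(i, j):
            ranks[order[k]] = avg
        i = j
    return ranks

def spearman_rho(x, y):
    """Spearman's rank-order correlation: Pearson's r on the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Because it is rank-based, ρ captures monotonic rather than strictly linear association, which is why near-unity values (e.g., severity vs. susceptibility above) indicate an almost perfectly consistent ordering of patients on both constructs.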

A closer examination of the Spearman correlation coefficients between barriers and the other HBM constructs revealed a heterogeneous but theoretically meaningful pattern across disease domains. In the context of bacterial infections and mycoses, barriers demonstrated a strong inverse correlation with perceived benefits (ρ = −0.71), suggesting that patients who identified more barriers to participation were markedly less likely to perceive the potential benefits of clinical trial enrolment. This reflects a cognitive suppression effect, wherein a high logistical or emotional burden attenuates the perceived value of engagement. This dynamic is likely intensified by the acute and transmissible nature of bacterial diseases.

This finding is consistent with results from a trial using the HBM to assess the behaviour of mammography patients [24]. Notably, barriers in this domain also showed moderate positive associations with perceived severity (ρ = 0.51) and susceptibility (ρ = 0.64), indicating that those who perceived the disease as severe and themselves as susceptible still felt encumbered by participation barriers.

By contrast, in categories such as mental disorders, otorhinolaryngologic diseases, and female urogenital conditions, barrier-benefit correlations were weaker (ρ ranging from −0.22 to −0.25), and associations with perceived severity and susceptibility remained low to moderate. This attenuated inter-construct linkage suggests that in these contexts, perceived barriers function more independently of benefit appraisal and threat perception. The strong negative barrier-benefit coupling in bacterial infections may thus represent a condition-specific motivational bottleneck, underscoring the need for targeted strategies during trial recruitment in infectious disease settings.

Correlations between perceived severity and susceptibility reveal a consistently strong and positive association across all disease categories. Correlation coefficients frequently exceeded 0.95, with several nearing unity (e.g., stomatognathic diseases: ρ = 0.999, behaviour and behaviour mechanisms: ρ = 0.999, congenital diseases: ρ = 0.999), indicating an almost linear monotonic relationship. This suggests that individuals who regard a condition as severe also tend to perceive themselves as highly susceptible, confirming the HBM theory that threat appraisal emerges from the joint perception of severity and personal risk (detailed in Appendix H).

These findings indicate that structured behavioural constructs derived through LLMs can effectively inform AI/ML models, and that perceived health threats with access concerns serve as crucial motivators in early trial engagement. This emphasises the theoretical consistency of HBM threat components while also highlighting the conditional independence of factors such as barriers, especially in behavioural models for participation in RCTs.

6.3 The effect of timing on pre-screening outcomes

Kaplan-Meier survival estimates revealed a consistent and progressive decline in the likelihood of successful patient contact as time elapsed from the initial referral. This speaks directly to the influence of the time lag between initial screening and site contact on the likelihood of a patient potentially moving beyond the initial site phone screening stage (research sub-question RQ1.2).

Figure 8 represents the mean site-level probabilities derived from an average cohort of 1891 clinical trial sites, offering robust insight into system-wide patterns of temporal degradation in patient engagement. On day 1, the average contact probability was 0.80 (95% CI: 0.67–0.88), suggesting a high initial likelihood of successful recruitment. However, this probability declined steadily, falling to 0.60 (CI: 0.50–0.69) by day 11. Notably, the lower confidence limit (LCL) dropped below the critical 0.50 threshold on day 12, signalling that in at least half of all observed scenarios across sites, the probability of successful contact fell below chance if the patient was not engaged within this window.

Figure 8.

Kaplan-Meier survival analysis.

This downward trend continued, with the mean contact probability declining to 0.46 (CI: 0.37–0.54) by day 31, representing a relative reduction of approximately 42.5% from the baseline. These findings underscore the narrow and operationally significant window of opportunity for referral conversion. The most rapid decline occurred during the first 7 to 10 days, during which the contact probability dropped by 14 percentage points, from 0.80 to 0.66, reflecting a nonlinear risk function. After day 10, the rate of decay attenuated, with probabilities declining more gradually, indicative of a saturation point in outreach efficacy.

Following dynamic adjustment using Kaplan-Meier (KM) survival probabilities, substantial shifts in the predicted likelihood of successful patient contact were observed across clinical disease classification domains. The unadjusted scores, generated via an XGBoost classifier, represented the baseline probability of referral conversion independent of elapsed time. However, when scaled by time-sensitive survival probabilities, the recalibrated cumulative probabilities, as shown in Figure 9, more accurately reflect the real-world engagement potential within each disease area.

Figure 9.

Patient referral time adjusted probability.

Across the full sample, the mean static probability was 0.59. In contrast, the KM-adjusted cumulative mean dropped to 0.26, a 55.9% reduction, which highlights the systemic overestimation of referral viability when time decay is not considered. This effect was not uniform across disease classifications. Under the static model, high-performing categories, such as male urogenital diseases (0.82), cardiovascular diseases (0.80), and female urogenital diseases (0.73), retained relatively higher adjusted probabilities (0.42, 0.38, and 0.37, respectively), indicating more durable referral quality over time. In contrast, lower-performing categories such as digestive system diseases (0.33) and musculoskeletal diseases (0.39) exhibited substantial post-adjustment attenuation, with cumulative probabilities falling to 0.11 and 0.12, respectively.
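
A minimal sketch of this recalibration, under the simplifying assumption that the adjusted score is the static classifier probability multiplied by the KM contact probability at the elapsed delay (the study's exact scaling may differ), looks as follows; the curve values are the mean site-level points reported above (day 1: 0.80, day 11: 0.60, day 31: 0.46):

```python
def time_adjusted_score(static_prob, days_elapsed, km_curve):
    """Scale a static referral probability by the KM contact probability.

    km_curve: [(day, survival)] step function sorted by day; survival
    before the first step is taken as 1.0. Assumes adjusted score =
    static probability x survival at the elapsed delay (illustrative).
    """
    survival = 1.0
    for day, s in km_curve:
        if day <= days_elapsed:
            survival = s
        else:
            break
    return static_prob * survival

curve = [(1, 0.80), (11, 0.60), (31, 0.46)]     # mean KM points reported above
fresh = time_adjusted_score(0.59, 1, curve)     # sample-mean static score, day 1
stale = time_adjusted_score(0.59, 31, curve)    # same referral left until day 31
```

Ranking referrals by the adjusted rather than the static score is what lets triage prioritise recently referred patients during the peak contact window.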

This suggests that referral decay curves differ meaningfully by disease classification, potentially due to underlying differences in trial complexity, patient burden, symptom acuity, or pre-screening ambiguity. For instance, the steep decline in domains such as neoplasms (from 0.40 to 0.13) and endocrine disorders (from 0.49 to 0.17) may reflect delayed outreach combined with clinical complexity or informational overload during screening. Notably, certain domains with moderate static scores, such as diagnosis (0.54) and mental disorders (0.55), maintained intermediate adjusted probabilities (0.30 and 0.25, respectively), suggesting some resilience to time decay.

Static classification models alone mask these temporal vulnerabilities, leading to inefficient prioritisation and potential patient loss. Incorporating time-aware adjustments enables precision targeting and aligns referral efforts with empirically observed conversion windows. Consequently, these stratified findings reinforce the clinical and operational relevance of survival-adjusted scoring. They demonstrate that domain-specific referral decay dynamics must be considered in triage logic, resource allocation, and follow-up protocols.

7. Data synthesis and integration

7.1 The integration of artificial intelligence and machine learning techniques

In response to calls for process-oriented HBM research [52, 53], this research demonstrates that the HBM can serve as both a conceptual and operational scaffold for structuring ML inputs. LLM-derived features mapped to HBM constructs enabled patient responses to be transformed into predictive variables, aligning health behaviour with computational modelling.

The integration of behavioural constructs such as perceived barriers, benefits and severity into the pre-screening dataset revealed both their analytical potential and their limitations when treated as post hoc explanatory variables. A key insight emerging from the synthesis of SHAP value distributions and clustering patterns is the opportunity to reverse this logic: rather than applying behavioural theories retrospectively to interpret patient decisions, they can shape data collection from the outset. The minimal contribution of perceived severity and benefits may not indicate that these constructs lack predictive relevance, but rather that the structure and content of pre-screening interactions fail to surface these dimensions, a gap that warrants deliberate attention in screening design.

Pre-screening criteria are often operational, focusing on inclusion and exclusion thresholds that may not prompt participants to reflect on, or disclose, deeper motivational factors. As such, the low SHAP influence of perceived severity and benefits may reflect an information asymmetry between what the constructs represent and what is practically expressed in the data. This suggests a methodological gap: current screening protocols may under-capture rich behavioural indicators critical to understanding patient motivation for engaging in RCTs.

Future recruitment frameworks could proactively embed these constructs into pre-screening design. Specifically, operationalising behavioural readiness, motivational intent, and perceived trial burden at the point of initial screening may enable more nuanced and accurate stratification of patient referrals. This approach represents a shift from eligibility-based screening to behavioural risk stratification, with the potential to improve recruitment efficiency and enhance patient engagement.

7.2 Temporal modelling and operational relevance

The observed shifts in referral probability across disease classifications following the Kaplan-Meier (KM) adjustment confirm the critical need to incorporate temporal survival modelling into patient scoring pipelines. While static classifiers, such as XGBoost, effectively capture behavioural, clinical, and demographic predictors, they systematically overestimate referral viability when time-dependent engagement decay is not considered. This underscores temporal vulnerability as an independent risk dimension, not detectable through static modelling alone.

To address this, integrating KM survival probabilities into the scoring architecture provides a methodologically rigorous and operationally responsive enhancement. The end-to-end framework, which merges NLP for structured feature extraction, ensemble classification (XGBoost) for probabilistic labelling, and non-parametric survival analysis (Kaplan-Meier) for time-aware recalibration, enables dynamic, risk-adjusted scoring that reflects both the initial likelihood of contact and its temporal resilience.

This multidisciplinary approach supports real-time triage of patient referrals, aligning model outputs with the cadence and constraints of clinical trial workflows. The resulting framework functions not merely as a predictive engine but as a decision-support mechanism, prioritising outreach based on cumulative probability degradation and clinical urgency at the disease classification level. By contextualising risk within site-level operations, contact timelines and disease-specific referral behaviour, this framework bridges analytical performance with recruitment feasibility.

The findings provide empirical support for integrating Kaplan-Meier-derived survival probabilities into dynamic risk scoring models. Traditional static classification methods fail to account for the temporal decline in site contact and patient responsiveness. By incorporating time-to-event adjustments, the referral model can prioritise cases during the peak period for contact, minimising operational inefficiencies and reducing site burden.

Future work will focus on monitoring calibration drift arising from shifts in recruitment strategies, campaign responsiveness or systemic outreach delays. Evaluating model robustness across evolving referral pathways will be critical to ensuring sustained accuracy, equity and clinical utility.

7.3 Addressing bias and enhancing equity

While the HBM remains a robust behavioural framework, it does not explicitly accommodate structural inequities or population-level disparities. Equity was considered not by altering the HBM to include elements of diversity or structural factors, which could compromise its behavioural focus, but rather by integrating SVI features into the model's architecture. Similarly, fairness was addressed procedurally through model design. This method preserved the behavioural integrity of the HBM while accommodating the structural inequities that affect patient recruitment.

Model bias revealed through heterogeneous predictive performance in disease classification highlights the need for domain-specific calibration. Potential adjustments to decision thresholds could mitigate asymmetric performance and optimise operational utility in referral scoring workflows.

7.4 Theoretical and practical contribution

This research introduces an innovative interdisciplinary approach that combines theoretical models with practical applications. It uses a hybrid modelling framework that retains the strengths of individual-level cognitive concepts (e.g. perceived barriers and benefits) while also incorporating population-level fairness considerations directly into its design. In doing so, it represents a methodological advancement in which ethical safeguards can be built into models through careful feature selection and validation, rather than by expanding the underlying theoretical concepts [43]. Fairness was reinforced through stratified performance checks across SVI-defined subgroups, confirming the model’s stability and equity across varying levels of structural vulnerability.

This research seeks to contribute to the growing body of work that emphasises equity-informed AI/ML in patient recruitment, without compromising the integrity of domain-specific theoretical frameworks. It presents a blueprint for incorporating fairness into behavioural modelling pipelines by using structural covariates and performance stratification, rather than relying on post-hoc corrections or theoretical overreach.

8. Conclusion and managerial implications

Persistent challenges in patient recruitment continue to impede the efficiency and generalisability of clinical research, particularly during the pre-screening phase. Although some drivers of attrition reflect broader socio-environmental and health system constraints, improving early-stage measurement and triage can materially reduce operational burden and increase screening yield.

A central contribution of this research is demonstrating how behavioural theory can be operationalised within an AI/ML pipeline. By mapping pre-screening text to HBM-aligned constructs and combining these features with clinical and contextual variables, the PRS framework enables earlier identification of referrals at a higher likelihood of passing site phone screening. Importantly, findings also indicate that commonly under-expressed constructs, such as perceived benefits and perceived severity, may be weakly represented in routine screening interactions, suggesting a practical opportunity to redesign pre-screening instruments to elicit motivational readiness, perceived burden, and intent more explicitly.

This research further shows that behavioural and clinical predictors alone are insufficient when time-dependent engagement decay is not modelled. Incorporating survival-based adjustment provides a time-aware perspective on referral viability and supports operational prioritisation during the peak contact window. From a managerial standpoint, this enables a shift from static eligibility screening toward dynamic, risk-adjusted triage that better aligns outreach sequencing with recruitment workflows and capacity constraints.

Finally, equity considerations require that recruitment decision-support tools account for structural context and monitor performance across subgroups. In this framework, contextual and access-related influences are represented through protocol burden measures, SVI-derived covariates, and time-to-contact dynamics, alongside disaggregated performance assessment. Future recruitment strategies may also benefit from complementary, non-algorithmic interventions (e.g., tailored messaging and community-informed outreach) that address engagement barriers not fully observable in pre-screening data and should be evaluated prospectively alongside model-assisted prioritisation. Accordingly, future work should establish prospective validation and deployment utility evaluation to quantify operational benefit and ensure sustained calibration and equity under live conditions.

9. Recommendations and future work

The research demonstrated the potential to integrate HBM constructs with AI/ML modelling to enhance the quality of patient referrals in clinical trials. However, several areas require further refinement and strategic development to support real-world deployment, scalability, and ethical integrity.

The recommendations outlined in Table 5 are essential for future work and broader industry advancement.

Recommendations and future work
1. Establish standardisation protocols for clinical screening data
The variability in trial criteria phrasing and patient response semantics poses a significant barrier to the scalability of AI/NLP applications. Future research should support the development of shared ontologies, controlled vocabularies, and standardised data collection frameworks across trials. This will facilitate harmonisation of eligibility criteria, improve semantic interoperability, and reduce ambiguity in LLM-based extraction processes.
2. Improve model generalisability and equity across patient populations
Addressing issues such as class imbalance, therapeutic area sparsity, and algorithmic bias is essential to ensure the robustness and fairness of the model. Future work should focus on adaptive modelling strategies, such as cost-sensitive learning, Bayesian frameworks, and fairness-aware training. These should be evaluated not only on performance metrics but also on their impact across diverse demographic and linguistic subgroups.
3. Enhance the validity and interpretability of behavioural constructs
Mapping patient-generated text to the HBM constructs remains a promising but imperfect approach. Future studies should empirically validate these mappings through triangulation with actual trial outcomes, follow-up surveys and/or engagement analytics. This will strengthen the theoretical contribution of behavioural science within AI/ML frameworks and foster interdisciplinary alignment.
4. Prioritise data quality, provenance, and infrastructure
Missing or unverified data, especially for diagnoses and referral outcomes, compromises the validity of the model. Future systems should incorporate automated data quality checks, confidence scores based on source reliability, and, where feasible, linkages to external clinical records. Investments in interoperable infrastructure will be critical to achieving scalable, high-integrity analytics pipelines.
5. Design transparent and ethical AI governance models
As LLM-based models become integral to screening automation, continuous oversight is required to monitor model drift, audit fairness, and mitigate embedded biases. Human-in-the-loop feedback, explainability features, and community-driven auditing practices should be institutionalised to uphold ethical standards and foster stakeholder trust.
6. Enhancing HBM construct visibility
Future iterations may benefit from incorporating qualitative sentiment analysis, structured follow-up questions, or open-ended motivational prompts to more effectively elicit perceived benefits and threats in early-stage recruitment.

Table 5.

Recommendations and future work.

9.1 Prospective validation and deployment evaluation

This research reports retrospective model development and internal validation. To establish prospective performance under live operating conditions, future work will evaluate the PRS model in a forward-looking, time-split design using newly accrued referrals after a model freeze date. In an initial silent-mode deployment, predictions will be generated without influencing outreach decisions, enabling unbiased estimation of discrimination (AUC), calibration (Brier score; calibration slope/intercept), and operational utility (precision/recall at action thresholds and yield per outreach effort). Performance will be monitored at predefined intervals and disaggregated by disease classification and minority indicator to detect drift and inequitable degradation. Recalibration or retraining will be triggered when calibration or subgroup performance falls below pre-specified tolerances.

9.2 Deployment utility and decision-threshold analysis (non-monetary cost-benefit)

Because monetary costing parameters were not available in this retrospective dataset, deployment value will be quantified using operational proxies: successful phone screenings per unit outreach effort (e.g., per recruiter-hour or per 1000 outreach attempts), effort per success (attempts/minutes per successful screen), and, where available, downstream progression. Decision thresholds will be selected using an expected-utility approach, based on plausible relative cost ratios for false positives (wasted outreach) versus false negatives (missed eligible referrals), evaluated across disease classifications and the minority indicator, to ensure that efficiency gains do not introduce inequitable performance. Where trial-level cost inputs become available, these operational metrics can be translated into monetary net benefit.
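The expected-utility threshold selection can be sketched as follows; the cost ratio used here (a false negative three times as costly as a false positive) is an illustrative assumption, not a value from the study:

```python
# Sketch of expected-utility threshold selection. The relative costs of a
# false positive (wasted outreach) and a false negative (missed eligible
# referral) are illustrative assumptions.
import numpy as np

def expected_utility_threshold(y_true, p_pred, cost_fp=1.0, cost_fn=3.0):
    """Scan candidate thresholds; return the one minimising total cost."""
    thresholds = np.linspace(0.01, 0.99, 99)
    costs = []
    for t in thresholds:
        pred = p_pred >= t
        fp = np.sum(pred & (y_true == 0))   # wasted outreach
        fn = np.sum(~pred & (y_true == 1))  # missed eligible referrals
        costs.append(cost_fp * fp + cost_fn * fn)
    return float(thresholds[int(np.argmin(costs))])

rng = np.random.default_rng(3)
p_demo = rng.uniform(0, 1, 20000)
y_demo = rng.binomial(1, p_demo)
t_star = expected_utility_threshold(y_demo, p_demo)
print(t_star)  # for calibrated probabilities, near cost_fp/(cost_fp+cost_fn) = 0.25
```

For well-calibrated probabilities, decision theory places the optimal threshold at cost_fp/(cost_fp + cost_fn), so raising the penalty on missed eligible referrals lowers the threshold and widens outreach.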

10. Assumptions, limitations and delimitations

Despite the methodological rigour applied, this research is subject to several limitations detailed in Table 6.

Assumptions, limitations and delimitations
Category | Description
Missing protocol data | Excluding referrals with missing protocol data may introduce selection bias if the missingness is not random or is systematically associated with specific trial types, therapeutic areas, or patient characteristics.
Class imbalance | A notable imbalance between passed and failed referrals persists as a challenge, affecting model calibration and predictive discrimination.
Small sample sizes and heterogeneity | Disease classifications with low numbers of patient referrals, such as bacterial infections and mycoses (n = 1211), chemically induced disorders (n = 36), otorhinolaryngologic diseases (n = 153), and physiological phenomena (n = 480), particularly when coupled with high within-class heterogeneity, limit the model’s ability to generalise robustly across diverse trial profiles.
Redundant study criteria | The repeated use of similar but variably worded trial criteria across studies unnecessarily inflated the number of factor levels, increasing dataset complexity, model bias, and redundancy.
Ambiguous patient responses | Ambiguous patient responses to trial criteria questions, such as “unsure” or “not known”, complicate the evaluation of key trial criteria categories, namely diagnosis, symptoms, and treatment. These inconclusive responses directly impact the performance of the LLM-based Named Entity Recognition (NER) system, which classifies such responses under a generic “Other” category. This weakens the predictive power of relevant variables, reduces the model’s ability to infer meaningful behavioural constructs, and lowers the overall precision of candidate referral predictions.
Inconsistent question phrasing | Although LLM-based named entity recognition (NER) provides strong semantic parsing capabilities, its effectiveness is diminished when trial criteria questions are conflicting or inconsistently phrased. Such inconsistencies can lead to misinterpretations, overlooked nuances, and incorrect classification of eligibility criteria, resulting in information loss and noise in the modelling process.
LLM model bias | LLM-based models inherently carry biases from their original training data, which may impact performance across different demographic, linguistic, or cultural subgroups.
HBM proxy validity | LLM-based entity extraction and HBM mappings are assumed to serve as valid proxies for patients’ beliefs and attitudes regarding clinical participation. Patient responses to pre-screening questions are assumed to be honest, pertinent, and adequate for inferring the underlying behavioural constructs.
Diagnosis validity | The accuracy of medical diagnoses attached to patient Lead IDs cannot be conclusively verified. These diagnoses may represent self-reported conditions or data entered by recruiters and, therefore, may not accurately reflect formally confirmed clinical conditions.
Model interpretability | Despite enhancements through SHAP, the model’s interpretability continues to grapple with challenges typical of black-box machine learning approaches.

Table 6.

Assumptions, limitations and delimitations.

Acknowledgments

I want to acknowledge Tommy Habibe (MSc) for his substantial contribution to this research, particularly in operationalising the Health Belief Model (HBM) through Natural Language Processing (NLP). His expertise in leveraging Named Entity Recognition (NER) within large language models (LLMs) was instrumental in bridging theoretical constructs with practical application.

Conflict of interest

The authors declare no conflict of interest.

Thanks

I extend my sincere appreciation to SubjectWell for their support and collaboration in this research. SubjectWell enhances the patient journey in clinical trials to accelerate clinical outcomes. Powered by a robust patient experience platform featuring patient-centric technology, global reach, breakthrough creative, and a suite of services to reduce site burden, SubjectWell drives intelligence and efficiency for sponsors, CROs, sites, and site networks. From protocol development through recruitment, enrollment, retention, and post-trial real-world evidence studies, SubjectWell supports the entire clinical lifecycle.

Appendices and nomenclature

Appendix A: Patient referral dataset attributes

(See Table A1).

Domain | Fields | Description
Patient identity | Patient_Study_Id | Unique patient study identifiers
Demographics | Ethnicity, Gender, Age_Bin, Patient_Bmi, Distance_From_Site | Standardised patient sociodemographics and geographic proximity
Referral interaction | First_Contact_Date, Phone_Screen | Referral engagement; unit of analysis
Pre-screening criteria and responses | Criteria_Id, Lead_Response, Positive_For_Fact, Ensemble_Label | Pre-screening criteria identifier; structured screening responses; LLM-derived classifications
Study protocol attributes | Min_Bmi, Max_Age, Protocol_Duration, Protocol_Screening_Visits, Protocol_Site_Visits | Study burden; eligibility metadata; study commitment
Semantic embeddings | Bioclinicalbert_Ambiguity, Bioclinicalbert_Info | BERT-based semantic information scores
Ontological metadata | Mesh_Heading | Medically standardised disease or condition topics for the associated trial
Site and region attributes | Site_Id, County_Fips, Province_Abbr | Trial site geography and regional identifiers

Table A1.

Patient referral dataset attributes.

Appendix B1: Patient referral univariate analysis

5,436,136 observations; 69 variables; Phone_Screen_Date from 2020-04-21 to 2025-04-15 (Table B1).

Variable | N | Missing | Distinct
County_Fips | 5,436,136 | 0 | 544
Patient (Study) Id | 5,436,136 | 0 | 398,879
Ethnicity | 5,436,136 | 0 | 8
Study_ID | 5,436,136 | 0 | 267
Mesh_Heading | 5,436,136 | 0 | 27
Ensemble_Label | 5,436,136 | 0 | 5
Patient_Compensation_Note | 5,436,136 | 0 | 2
Protocol_Screening_Burden | 5,436,136 | 0 | 3
Phone Screen | 5,436,136 | 0 | 2
Phone Screen Date | 5,436,136 | 2,907,906 | 2
Patient_P_Connect | 5,436,136 | 135,994 | 177
Distance_From_Site | 5,436,136 | 21 | 16,563
Infosource | 5,436,136 | 0 | 2
Positive_For_Fact | 5,436,136 | 0 | 2
Site ID | 5,436,136 | 0 | 3418
Province_Abbr | 5,436,136 | 0 | 2
Internal_Status | 5,436,136 | 0 | 3
Variable class proportions

Variable | Class | Frequency | Proportion
Ensemble_Label | Behaviour | 1,772,327 | 0.326
Ensemble_Label | Diagnosis | 1,208,987 | 0.222
Ensemble_Label | Other | 993,929 | 0.183
Ensemble_Label | Symptoms | 1,438,875 | 0.265
Patient_Compensation_Note | 0 (no compensation) | 3,385,815 | 0.623
Patient_Compensation_Note | 1 (compensation) | 2,050,321 | 0.377
Protocol_Screening_Burden | High | 506,911 | 0.093
Protocol_Screening_Burden | Low | 2,821,604 | 0.519
Protocol_Screening_Burden | Medium | 2,107,621 | 0.388
Protocol_Screening_Visits | 0 | 72,331 | 0.013
Protocol_Screening_Visits | 1 | 3,641,817 | 0.670
Protocol_Screening_Visits | 2 | 1,352,784 | 0.249
Protocol_Screening_Visits | 3 | 285,835 | 0.053
Protocol_Screening_Visits | 4 | 82,473 | 0.015
Protocol_Screening_Visits | 5 | 896 | 0.000
Infosource | 1 (current medical) | 3,922,626 | 0.722
Infosource | 2 (medical history) | 1,513,510 | 0.278
Positive_For_Fact | False | 3,053,250 | 0.562
Positive_For_Fact | True | 2,382,886 | 0.438
Internal_Status | Active | 984,135 | 0.181
Internal_Status | Inactive | 3,588,143 | 0.660
Internal_Status | Paused | 863,858 | 0.159
Variable distribution

Variable | Mean | Gmd | 0.05 | 0.1 | 0.25 | 0.5 | 0.75 | 0.9 | 0.95 | Lowest | Highest
Patient_Age52.9117.8221304356647175-1 ∼ 4115 ∼ 120
Patient_BMI29.867.745202225293439420 ∼ 4304 ∼ 432
Patient_P_Connect0.160.10.050.060.080.130.220.30.340.00 ∼ 0.000.61 ∼ 0.65
Distance_From_Site30.240.22.547.313.522.229.350.20 ∼ 0.0410,280 ∼ 10,945
Protocol_Screening_Window34.2616.19142128284256700 ∼ 1075 ∼ 112
Protocol_Duration (Days)322.6308.4567712620437272810112 ∼ 211644 ∼ 2557
Protocol_Site_Visits11.086.2358101316240 ∼ 436 ∼ 60
AVG_BIOCLINICALBERT_INFO0.650.060.540.580.630.670.70.720.730.13 ∼ 0.140.80 ∼ 0.82
E_TOTPOP2,149,5882,274,120173,355286,108654,4531,272,2642,604,0534,726,1779,936,690416 ∼ 10,0263,289,701 ∼ 9,936,690
EP_POV1502.665.81112.817.521.524.326.327.83.9 ∼ 7.441.9 ∼ 47.8
EP_UNEMP5.31.373.74.24.65.46.47.17.30.4 ∼ 2.111.2 ∼ 15.1
EP_HBURD28.985.4420.822.225.829.331.933.936.612.9 ∼ 15.939.2 ∼ 48.5
EP_NOHSDP11.544.895.66.48.410.813.518.919.70.3 ∼ 2925.5 ∼ 31.4
EP_UNINSUR10.145.1744.66.79.112.51621.20.9 ∼ 1.921.6 ∼ 31
EP_AGE6515.313.621111.212.614.816.718.522.55.5 ∼ 9.733.9 ∼ 57.9
EP_AGE1721.943.0915.718.620.721.823.325.826.27.1 ∼ 14.330.3 ∼ 32.2
EP_DISABL11.72.448.89.110.211.212.515.115.86.5 ∼ 7.623.4 ∼ 30
EP_SNGPNT6.541.654.54.85.46.57.58.18.61.8 ∼ 2.911.5 ∼ 14.3
EP_LIMENG5.984.860.91.22.35.1812.512.50 ∼ 0.415.4 ∼ 20.6
EP_MINRTY52.7320.8621.527.639.554.46774.874.81.9 ∼ 6.290.4 ∼ 96.7
EP_MUNIT22.1713.696.89.413.918.52635.839.30 ∼ 148.2 ∼ 89.7
EP_MOBILE3.1283.1060.20.31.22.43.47.69.60 ∼ 0.436.2 ∼ 43.5
EP_CROWD4.2862.9711.21.52.13.56.18.8110.5 ∼ 0.910.5 ∼ 12.7
EP_NOVEH10.238.8323.345.26.78.714.131.80 ∼ 1.935.7 ∼ 77.9
RPL_THEME10.6330.28270.12980.23930.4620.65730.87210.90490.95130.003 ∼ 0.0100.98 ∼ 0.99
RPL_THEME20.46010.27190.09350.12250.28250.45150.66470.75660.85590.001 ∼ 0.0040.99
RPL_THEME30.8420.14910.54880.63730.7760.88960.94810.97360.97360.001 ∼ 0.1090.99
RPL_THEME40.69460.22750.23580.34330.62810.71710.85680.91760.96630.009 ∼ 0.0430.98 ∼ 0.99
RPL_THEMES0.6770.23830.19920.34010.56920.72830.86290.89250.92010.001 ∼ 0.0220.99
EP_NOINT9.9743.3355.66.37.99.711.714.315.32.6 ∼ 426.5 ∼ 36.6
EP_AFAM15.4112.331.73.5712.422.228.742.60 ∼ 0.461.3 ∼ 75.4
EP_HISP26.0319.873.75.310.225.134.548.7610 ∼ 1.482.9 ∼ 95.4
EP_ASIAN7.0295.8111.51.72.95.28.714.619.30 ∼ 0.430.3 ∼ 41.6
EP_AIAN0.25480.27340.10.10.10.10.20.41.40 ∼ 0.46.7 ∼ 39.8
EP_NHPI0.13260.194200000.20.30.70
EP_TWOMORE3.3851.0722.32.83.33.94.55.10 ∼ 0.78 ∼ 18.6
EP_OTHERRACE0.47720.19510.30.30.40.40.50.70.90

Table B1.

Patient referral univariate analysis results.

Appendix B2: Patient referral (site phone screened) univariate analysis

(See Table B2).

Variable (2,528,230 observations) | n | Missing | Distinct
County_Fips | 2,528,230 | 0 | 517 (from 544)
Patient Study ID | 2,528,230 | 0 | 173,632 (from 398,879)
Ethnicity | 2,528,230 | 0 | 8
Study_Id | 2,528,230 | 0 | 260 (from 267)
Mesh_Heading | 2,528,230 | 0 | 27
Ensemble_Label | 2,528,230 | 0 | 5
Patient_Compensation_Note | 2,528,230 | 0 | 2
Protocol_Screening_Burden | 2,528,230 | 0 | 3
Site_Id | 2,528,230 | 0 | 3141 (from 3418)

Variable class proportions

Ethnicity | n | % | Previous %
American Indian or Alaska Native | 58,293 | 2.3% |
Asian | 65,749 | 2.6% |
Black or African American | 756,907 | 29.9% | 27.9%
Hispanic or Latino | 368,308 | 14.6% |
Middle Eastern or North African | 2150 | 0.1% |
Native Hawaiian or Pacific Islander | 13,986 | 0.6% |
Unknown | 74,374 | 2.9% | 3.6%
White | 1,188,463 | 47.0% | 48.5%

Category | Subcategory | Frequency | Proportion
Ensemble_Label | Behaviour | 6973 | 0.003
Ensemble_Label | Diagnosis | 824,287 | 0.326
Ensemble_Label | Other | 561,614 | 0.222
Ensemble_Label | Treatment | 464,433 | 0.184
Ensemble_Label | Symptoms | 670,923 | 0.265
PATIENT_COMPENSATION_NOTE | 0 (no compensation) | 1,581,117 | 0.625
PATIENT_COMPENSATION_NOTE | 1 (compensation) | 947,113 | 0.375
PROTOCOL_SCREENING_BURDEN | High | 278,869 | 0.110 (from 0.09)
PROTOCOL_SCREENING_BURDEN | Low | 1,287,047 | 0.509
PROTOCOL_SCREENING_BURDEN | Medium | 962,314 | 0.381
PHONE_SCREEN | 0 (failed phone screen) | 1,264,309 | 0.51
PHONE_SCREEN | 1 (pass phone screen) | 1,263,921 | 0.49
INFOSOURCE | 1 (current medical) | 1,737,226 | 0.687 (from 0.722)
INFOSOURCE | 2 (medical history) | 791,004 | 0.313 (from 0.278)

Table B2.

Patient referral (site phone screened) univariate analysis results.

Appendix C: Bivariate analysis, Pearson correlation, and Spearman rho

(See Figure C1).

Figure C1.

Bivariate analysis- Pearson correlation and Spearman rho results.

Appendix D: Feature engineering, encoding and interaction terms

To support predictive modelling, a structured feature engineering approach was adopted. This process transformed the patient referral data into analytically meaningful features, designed to capture both behavioural and protocol-related dimensions relevant to screening outcomes (Table D1).

Domain | Variables | Description
Encoding and adjustment of information source | INFOSOURCE | The INFOSOURCE variable, which captures the origin of the patient information, was encoded as a binary feature distinguishing current medical information (coded as 1) from all other sources (coded as 2). Given the potential variation in information reliability, this transformation enabled downstream adjustment of information scores based on source provenance.
Semantic information polarity and adjustment | AVG_BIOCLINICALBERT_INFO, POSITIVE_FOR_FACT, INFOSOURCE | The semantic informativeness of pre-screening responses was captured using the AVG_BIOCLINICALBERT_INFO score, derived from domain-specific transformer models. To reflect the directionality of informativeness, scores were initially inverted (multiplied by −1) when the associated response was marked as not positively contributing to patient fact extraction (POSITIVE_FOR_FACT = FALSE). Further refinements scaled the information score by its source: information derived from non-medical sources (INFOSOURCE = 2) was upweighted by a factor of 1.2, whereas negatively contributing facts were down-weighted by 50% rather than fully inverted, retaining a partial information signal without introducing noise.
Interaction features | AVG_BIOCLINICALBERT_INFO, standardised patient age, standardised patient BMI | To capture non-linear dependencies between patient characteristics and their semantic information contribution, interaction terms were created: INTER_AGE_INFO, the product of standardised age and information score, and INTER_BMI_INFO, the product of standardised BMI and information score. Additionally, a protocol-adjusted distance burden variable (INTER_PROTOCOL_DISTANCE) was constructed, integrating the total number of visits with travel distance and normalising by protocol duration (in weeks). Where the visit count was zero, a near-zero placeholder (1e-6) was introduced to avoid undefined transformations, and a logarithmic transformation was applied to reduce skewness. An aggregate interaction score (INTER_AGG_INFO) was computed as the row-wise mean of the above interaction features, providing a consolidated metric representing the interaction between demographic, geographical, and semantic factors.
Composite informativeness index | AVG_BIOCLINICALBERT_INFO (INFO_ADJ), standardised patient age, standardised patient BMI | Standardised forms of patient age (AGE_SCALED) and BMI (BMI_SCALED) were computed to remove scale dependencies. These were averaged together with the adjusted informativeness score (INFO_ADJ) to derive a unified composite informativeness score (COMPOSITE_INFO_SCORE), reflecting a normalised integration of biological and semantic features. Infinite or undefined values resulting from earlier transformations were cleaned and imputed as missing (NA) to avoid downstream analytic distortions.
Protocol burden and patient effort metrics | STUDY_BURDEN, DURATION, TOTAL_VISITS, FREQUENCY, EFFORT | To reflect the patient burden associated with trial participation, a derived STUDY_BURDEN score was computed from protocol screening complexity, with burden levels encoded as ordinal values: 0 (low), 0.5 (medium), and 1 (high). Additional features captured the temporal and logistical dimensions of burden: DURATION, the total trial length in months; TOTAL_VISITS, the sum of site and screening visits; FREQUENCY, the average interval between visits (duration divided by visit count); and EFFORT, the estimated patient travel burden, calculated as the total distance across all visits (assuming two-way travel).
Exploratory factor analysis (EFA) | BARR | EFA [86] was conducted to construct a latent variable capturing perceived trial burden, a core component of the HBM barriers-to-action domain. The final model retained three objective indicators: trial duration, visit frequency, and total number of visits. The analysis yielded a strong unidimensional solution, explaining 55.3% of cumulative variance. All retained items loaded meaningfully (≥0.45), with duration (loading = 0.998) the dominant indicator. This latent barrier (BARR) construct was operationalised as a continuous variable and integrated into subsequent predictive modelling.

Table D1.

Feature engineering, encoding and interaction terms.
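The protocol-adjusted distance burden described above (near-zero placeholder for zero visits, normalisation by protocol weeks, log transform) can be sketched as follows; the values and exact column names are illustrative assumptions:

```python
# Sketch of the INTER_PROTOCOL_DISTANCE construction described above.
# Input values and the column names are illustrative assumptions.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "TOTAL_VISITS": [0, 4, 12],
    "DISTANCE_FROM_SITE": [10.0, 25.0, 5.0],
    "PROTOCOL_DURATION_DAYS": [84, 168, 365],
})

visits = df["TOTAL_VISITS"].replace(0, 1e-6)          # near-zero placeholder
weeks = df["PROTOCOL_DURATION_DAYS"] / 7              # protocol duration in weeks
raw_burden = visits * df["DISTANCE_FROM_SITE"] / weeks
df["INTER_PROTOCOL_DISTANCE"] = np.log1p(raw_burden)  # log transform to reduce skew
print(df["INTER_PROTOCOL_DISTANCE"].round(4).tolist())
```

The log1p form keeps zero-visit referrals near zero burden while compressing the long right tail produced by distant patients with many visits.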

Appendix E: Multivariate analysis, GLM model odds ratios (OR)

In interpreting the logistic regression results, odds ratios (ORs) were used to assess the direction and magnitude of each predictor’s effect on the likelihood of a patient completing the phone screening. An OR greater than 1 indicates that the predictor increases the likelihood of progressing to the phone screening stage, whereas an OR less than 1 suggests a decreased likelihood. An OR close to 1 implies that the predictor has minimal or no discernible effect. Predictors were considered statistically significant and thus meaningful in the model when the associated p-value was less than 0.05 (Table E1).

Variable | Estimate | std.error | p.value | OR | OR.low | OR.high
BURDEN:Low | 1.01 | 0.01 | 0.00 | 2.75 | 2.71 | 2.79
BURDEN:Medium | 0.97 | 0.01 | 0.00 | 2.63 | 2.60 | 2.66
DISTANCE_FROM_SITE | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00
E_TOTPOP | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00
EP_AGE17 | −0.09 | 0.00 | 0.00 | 0.92 | 0.91 | 0.92
EP_AGE65 | −0.05 | 0.00 | 0.00 | 0.95 | 0.95 | 0.96
EP_CROWD | 0.06 | 0.00 | 0.00 | 1.07 | 1.06 | 1.07
EP_DISABL | −0.05 | 0.00 | 0.00 | 0.95 | 0.95 | 0.95
EP_HBURD | 0.03 | 0.00 | 0.00 | 1.03 | 1.03 | 1.03
EP_LIMENG | −0.04 | 0.00 | 0.00 | 0.96 | 0.95 | 0.96
EP_MINRTY | 0.01 | 0.00 | 0.00 | 1.01 | 1.00 | 1.01
EP_MOBILE | −0.03 | 0.00 | 0.00 | 0.98 | 0.97 | 0.98
EP_MUNIT | −0.02 | 0.00 | 0.00 | 0.98 | 0.98 | 0.98
EP_NOHSDP | 0.01 | 0.00 | 0.00 | 1.01 | 1.01 | 1.02
EP_NOVEH | 0.01 | 0.00 | 0.00 | 1.01 | 1.01 | 1.01
EP_POV150 | 0.06 | 0.00 | 0.00 | 1.06 | 1.06 | 1.06
EP_SNGPNT | −0.06 | 0.00 | 0.00 | 0.94 | 0.93 | 0.94
EP_UNEMP | −0.05 | 0.00 | 0.00 | 0.95 | 0.95 | 0.96
EP_UNINSUR | 0.02 | 0.00 | 0.00 | 1.02 | 1.02 | 1.02
ETH:ASIAN | −0.17 | 0.01 | 0.00 | 0.84 | 0.82 | 0.86
ETH:BLACK OR AFRICAN AMERICAN | −0.14 | 0.01 | 0.00 | 0.87 | 0.85 | 0.89
ETH:HISPANIC OR LATINO | −0.13 | 0.01 | 0.00 | 0.88 | 0.87 | 0.90
ETH:MIDDLE EASTERN OR NORTH AFRICAN | −0.19 | 0.05 | 0.00 | 0.83 | 0.76 | 0.90
ETH:NATIVE HAWAIIAN OR PACIFIC ISLANDER | −0.03 | 0.02 | 0.09 | 0.97 | 0.93 | 1.00
ETH:UNKNOWN | 0.00 | 0.01 | 0.94 | 1.00 | 0.98 | 1.02
ETH:WHITE | −0.09 | 0.01 | 0.00 | 0.91 | 0.90 | 0.93
INFO:2 | 0.05 | 0.00 | 0.00 | 1.06 | 1.05 | 1.06
INTER_AGE_INFO | −0.01 | 0.00 | 0.00 | 0.99 | 0.99 | 0.99
INTER_BMI_INFO | 0.00 | 0.00 | 0.74 | 1.00 | 1.00 | 1.00
INTER_PROTOCOL_DISTANCE (scaled) | −0.04 | 0.00 | 0.00 | 0.96 | 0.95 | 0.96
LABEL:DIAGNOSIS | 0.40 | 0.03 | 0.00 | 1.50 | 1.42 | 1.57
LABEL:OTHER | 0.39 | 0.03 | 0.00 | 1.47 | 1.40 | 1.55
LABEL:SYMPTOMS | 0.47 | 0.03 | 0.00 | 1.60 | 1.52 | 1.68
LABEL:TREATMENT | 0.43 | 0.03 | 0.00 | 1.53 | 1.46 | 1.61
MESH:BEHAVIOUR AND BEHAVIOUR MECHANISMS | 1.25 | 0.03 | 0.00 | 3.49 | 3.32 | 3.67
MESH:CARDIOVASCULAR DISEASES | 0.69 | 0.03 | 0.00 | 2.00 | 1.90 | 2.10
MESH:CHEMICALLY-INDUCED DISORDERS | 1.67 | 0.16 | 0.00 | 5.33 | 3.91 | 7.26
MESH:CONGENITAL, HEREDITARY, AND NEONATAL DISEASES AND ABNORMALITIES | 1.08 | 0.03 | 0.00 | 2.94 | 2.80 | 3.09
MESH:DIAGNOSIS | 1.18 | 0.04 | 0.00 | 3.27 | 3.02 | 3.55
MESH:DIGESTIVE SYSTEM DISEASES | 0.06 | 0.03 | 0.02 | 1.06 | 1.01 | 1.11
MESH:ENDOCRINE SYSTEM DISEASES | 0.15 | 0.03 | 0.00 | 1.16 | 1.11 | 1.22
MESH:EYE DISEASES | −0.50 | 0.03 | 0.00 | 0.61 | 0.57 | 0.64
MESH:FEMALE UROGENITAL DISEASES AND PREGNANCY COMPLICATIONS | 0.75 | 0.03 | 0.00 | 2.12 | 2.02 | 2.24
MESH:HEMIC AND LYMPHATIC DISEASES | 0.03 | 0.03 | 0.21 | 1.03 | 0.98 | 1.09
MESH:IMMUNE SYSTEM DISEASES | 0.55 | 0.03 | 0.00 | 1.74 | 1.66 | 1.83
MESH:MALE UROGENITAL DISEASES | 0.76 | 0.03 | 0.00 | 2.14 | 2.03 | 2.26
MESH:MENTAL DISORDERS | 0.71 | 0.03 | 0.00 | 2.04 | 1.94 | 2.14
MESH:MUSCULOSKELETAL AND NEURAL PHYSIOLOGICAL PHENOMENA | 0.22 | 0.03 | 0.00 | 1.25 | 1.18 | 1.32
MESH:MUSCULOSKELETAL DISEASES | 0.34 | 0.03 | 0.00 | 1.41 | 1.34 | 1.48
MESH:NEOPLASMS | 0.64 | 0.03 | 0.00 | 1.90 | 1.79 | 2.00
MESH:NERVOUS SYSTEM DISEASES | 0.31 | 0.03 | 0.00 | 1.37 | 1.30 | 1.44
MESH:NUTRITIONAL AND METABOLIC DISEASES | 0.42 | 0.03 | 0.00 | 1.52 | 1.45 | 1.61
MESH:OTORHINOLARYNGOLOGIC DISEASES | 0.67 | 0.08 | 0.00 | 1.96 | 1.67 | 2.30
MESH:PATHOLOGICAL CONDITIONS, SIGNS AND SYMPTOMS | 0.38 | 0.03 | 0.00 | 1.46 | 1.39 | 1.53
MESH:PHYSIOLOGICAL PHENOMENA | 1.18 | 0.04 | 0.00 | 3.27 | 3.02 | 3.55
MESH:PSYCHOLOGICAL PHENOMENA AND PROCESSES | 0.22 | 0.03 | 0.00 | 1.25 | 1.18 | 1.32
MESH:RESPIRATORY TRACT DISEASES | 0.04 | 0.03 | 0.10 | 1.04 | 0.99 | 1.10
MESH:SKIN AND CONNECTIVE TISSUE DISEASES | 0.84 | 0.03 | 0.00 | 2.31 | 2.20 | 2.43
MESH:STOMATOGNATHIC DISEASES | −0.25 | 0.03 | 0.00 | 0.78 | 0.73 | 0.83
MESH:VIRUS DISEASES | −0.41 | 0.03 | 0.00 | 0.67 | 0.63 | 0.70
PATIENT_AGE | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00
PATIENT_BMI | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00
PATIENT_P_CONNECT | −0.53 | 0.01 | 0.00 | 0.59 | 0.58 | 0.61
PROTOCOL_DURATION | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00
PROTOCOL_SCREENING_VISITS | 0.00 | 0.00 | 0.46 | 1.00 | 1.00 | 1.01
PROTOCOL_SCREENING_WINDOW | −0.01 | 0.00 | 0.00 | 0.99 | 0.99 | 0.99
PROTOCOL_SITE_VISITS | 0.01 | 0.00 | 0.00 | 1.01 | 1.01 | 1.01
RPL_THEME1 | −2.66 | 0.06 | 0.00 | 0.07 | 0.06 | 0.08
RPL_THEME2 | 0.28 | 0.03 | 0.00 | 1.33 | 1.26 | 1.40
RPL_THEME3 | −1.18 | 0.05 | 0.00 | 0.31 | 0.28 | 0.33
RPL_THEME4 | −0.24 | 0.03 | 0.00 | 0.79 | 0.74 | 0.84
RPL_THEMES | 1.68 | 0.09 | 0.00 | 5.38 | 4.55 | 6.36

Table E1.

Multivariate analysis, GLM model odds ratios (OR).
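As a sketch of how the OR, OR.low, and OR.high columns in Table E1 arise from logistic-regression output (OR = exp(beta), with 95% Wald bounds), the following self-contained example fits a simulated model; it is illustrative, not the study's GLM:

```python
# Sketch: odds ratios and 95% Wald intervals from logistic regression.
# Data are simulated; this is not the study's model or data.
import numpy as np

def logit_fit(x, y, iters=25):
    """Newton-Raphson logistic regression; returns coefficients and SEs."""
    X = np.column_stack([np.ones(len(x)), x])   # add intercept
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        H = X.T @ (X * W[:, None])              # observed information matrix
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    se = np.sqrt(np.diag(np.linalg.inv(H)))
    return beta, se

rng = np.random.default_rng(1)
x = rng.normal(size=(5000, 1))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.4 * x[:, 0])))  # true OR = exp(0.4)
beta, se = logit_fit(x, y)
odds_ratio = np.exp(beta[1])
ci_low = np.exp(beta[1] - 1.96 * se[1])
ci_high = np.exp(beta[1] + 1.96 * se[1])
print(round(odds_ratio, 2), round(ci_low, 2), round(ci_high, 2))
```

Exponentiating the coefficient and the ends of its Wald interval is exactly the transformation that produces the OR.low and OR.high columns; an interval that excludes 1 corresponds to p < 0.05 under the interpretation rules above.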

Appendix F.1: HBM constructs feature engineering

To theoretically ground the analysis within the Health Belief Model (HBM), four core constructs were derived from structured patient data and integrated as explanatory features. These theoretically informed constructs were foundational to the predictive modelling framework and aligned with established behavioural science literature to enhance interpretability and construct validity (Table F1).

HBM domain | Variable | Description
Perceived susceptibility | SUSC | Calculated as the average of DIAGNOSIS and SYMPTOMS; reflects a patient’s subjective vulnerability to illness.
Perceived severity | SEV | Derived directly from DIAGNOSIS; captures the perceived seriousness of the health condition.
Perceived benefits | BEN | Computed as 1 − TREATMENT; represents the perceived utility of trial participation in the absence of treatment.
Perceived barriers | BARR | Normalised using min-max scaling; reflects logistical, informational, or psychological obstacles to trial participation.

Table F1.

HBM constructs feature engineering.
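The construct derivations in Table F1 can be sketched directly; the input columns and values below are illustrative assumptions (BARR_RAW stands in for the unscaled barrier score):

```python
# Sketch of the HBM construct derivations in Table F1.
# Input values are illustrative assumptions.
import pandas as pd

df = pd.DataFrame({
    "DIAGNOSIS": [0.9, 0.2, 0.6],
    "SYMPTOMS":  [0.7, 0.4, 0.8],
    "TREATMENT": [0.3, 0.9, 0.5],
    "BARR_RAW":  [12.0, 4.0, 20.0],
})

df["SUSC"] = df[["DIAGNOSIS", "SYMPTOMS"]].mean(axis=1)  # perceived susceptibility
df["SEV"] = df["DIAGNOSIS"]                              # perceived severity
df["BEN"] = 1 - df["TREATMENT"]                          # perceived benefits
barr_min, barr_max = df["BARR_RAW"].min(), df["BARR_RAW"].max()
df["BARR"] = (df["BARR_RAW"] - barr_min) / (barr_max - barr_min)  # min-max scaling
print(df[["SUSC", "SEV", "BEN", "BARR"]].round(3))
```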

Appendix F.2: HBM constructs imputation methodology

Missing data were handled using Multiple Imputation by Chained Equations (MICE) with predictive mean matching (method = ‘pmm’) and five iterations (maxit = 5). This approach was chosen for its robustness in preserving the underlying data distribution while minimising imputation bias. To validate the imputation method, the standard deviations before and after imputation were compared across key interaction variables. Minimal differences were observed (e.g., INTER_AGE_INFO_DIAGNOSIS: 0.7953 pre- vs. 0.7951 post-imputation; INTER_BMI_INFO_SYMPTOMS: 0.9471 vs. 0.9511), indicating that the imputation process preserved the variance structure of the original dataset (Table F2).

Variable | Standard deviation (Std) (prior to imputation) | Standard deviation (Std) (after imputation)
INTER_AGE_INFO_DIAGNOSIS | 0.7953214 | 0.7950995
INTER_AGE_INFO_TREATMENT | 0.9278759 | 0.9203363
INTER_AGE_INFO_OTHER | 0.8874828 | 0.8887972
INTER_AGE_INFO_SYMPTOMS | 0.9709799 | 0.9542551
INTER_BMI_INFO_DIAGNOSIS | 0.7843314 | 0.7837689
INTER_BMI_INFO_TREATMENT | 0.8922627 | 0.8932449
INTER_BMI_INFO_OTHER | 0.8739191 | 0.8721255
INTER_BMI_INFO_SYMPTOMS | 0.9470636 | 0.9511442
INTER_AGG_INFO_DIAGNOSIS | 0.6061491 | 0.6024828
INTER_AGG_INFO_TREATMENT | 0.6235046 | 0.6199936
INTER_AGG_INFO_OTHER | 0.5752551 | 0.5731147
INTER_AGG_INFO_SYMPTOMS | 0.6343418 | 0.6215216

Table F2.

HBM constructs imputation methodology.
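The variance-preservation check above can be sketched as follows. Scikit-learn's IterativeImputer is used here as a Python stand-in for the chained-equations imputer (the chapter used MICE with predictive mean matching), so the imputation method differs even though the validation logic is the same; data and the missingness rate are simulated assumptions:

```python
# Sketch of the imputation-validation check: compare column standard
# deviations before and after imputation. IterativeImputer is a stand-in
# for MICE/pmm; the data and missingness rate are simulated assumptions.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 3))
X[:, 1] += 0.5 * X[:, 0]                    # give the imputer some signal
X_miss = X.copy()
X_miss[rng.random(X.shape) < 0.1] = np.nan  # ~10% missing completely at random

X_imp = IterativeImputer(max_iter=5, random_state=0).fit_transform(X_miss)
std_pre = np.nanstd(X_miss, axis=0)         # observed-data standard deviations
std_post = np.std(X_imp, axis=0)            # post-imputation standard deviations
print(np.round(std_pre, 4), np.round(std_post, 4))  # should differ only slightly
```

Regression-based imputers shrink variance somewhat more than predictive mean matching, which donates observed values; that is one reason the chapter's pre/post standard deviations in Table F2 agree so closely.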

Appendix G: Patient referral scoring confusion matrix by mesh heading

(See Table G1).

Mesh_Heading | Tp | Tn | Fp | Fn | Accuracy | Sensitivity | Specificity
Bacterial infections and mycoses161981713159%11%92%
Behaviour and behaviour mechanisms190723768871%100%3%
Cardiovascular diseases4038978212854765%88%31%
Chemically-induced disorders540190%83%100%
Congenital, hereditary, and neonatal diseases and abnormalities235628497412771%95%23%
Diagnosis100141171%99%2%
Digestive system diseases59727727249374%2%100%
Endocrine system diseases9562712551130866%42%83%
Eye diseases554483921467%20%92%
Female urogenital diseases and pregnancy complications106876432043071%71%70%
Hemic and lymphatic diseases356113021662264%36%84%
Immune system diseases298422001340138166%68%62%
Male urogenital diseases110341439113374%89%51%
Mental disorders32781592168375267%81%49%
Musculoskeletal and neural physiological phenomena27469013235466%44%84%
Musculoskeletal diseases17561207131980258%69%48%
Neoplasms34731318215866%69%63%
Nervous system diseases16452150730109368%60%75%
Nutritional and metabolic diseases84454339716971%83%58%
Otorhinolaryngologic diseases181312269%90%52%
Pathological conditions, signs and symptoms14584221663212567%41%86%
Physiological phenomena7116292761%72%36%
Psychological phenomena and processes21774110338966%36%88%
Respiratory tract diseases188936841043230362%45%78%
Skin and connective tissue diseases26011280107180967%76%54%
Stomatognathic diseases5224515759%3%98%
Virus diseases1320262101067%1%100%

Table G1.

Patient referral scoring confusion matrix by mesh heading.

Appendix H: Patient referral scoring HBM constructs Spearman correlations

(See Table H1).

HBM construct (2) | HBM construct (1) | Spearman correlation | Disease classification
SeverityBenefit−0.8654575Bacterial infections and mycoses
SusceptibilityBenefit−0.9489828Bacterial infections and mycoses
BarriersBenefit−0.7079344Bacterial infections and mycoses
BenefitSeverity−0.8654575Bacterial infections and mycoses
SusceptibilitySeverity0.9326143Bacterial infections and mycoses
BarriersSeverity0.5051384Bacterial infections and mycoses
BenefitSusceptibility−0.9489828Bacterial infections and mycoses
SeveritySusceptibility0.9326143Bacterial infections and mycoses
BarriersSusceptibility0.6354392Bacterial infections and mycoses
BenefitBarriers−0.7079344Bacterial infections and mycoses
SeverityBarriers0.5051384Bacterial infections and mycoses
SusceptibilityBarriers0.6354392Bacterial infections and mycoses
SeverityBenefit−0.9928538Behaviour and behaviour mechanisms
SusceptibilityBenefit−0.9933512Behaviour and behaviour mechanisms
BenefitSeverity−0.9928538Behaviour and behaviour mechanisms
SusceptibilitySeverity0.9993919Behaviour and behaviour mechanisms
BenefitSusceptibility−0.9933512Behaviour and behaviour mechanisms
SeveritySusceptibility0.9993919Behaviour and behaviour mechanisms
SeverityBenefit−0.9939225Cardiovascular diseases
SusceptibilityBenefit−0.9395737Cardiovascular diseases
BenefitSeverity−0.9939225Cardiovascular diseases
SusceptibilitySeverity0.9458765Cardiovascular diseases
BenefitSusceptibility−0.9395737Cardiovascular diseases
SeveritySusceptibility0.9458765Cardiovascular diseases
SeverityBenefit−0.6597454Chemically induced disorders
SusceptibilityBenefit−0.739577Chemically induced disorders
BenefitSeverity−0.6597454Chemically induced disorders
SusceptibilitySeverity0.9832668Chemically induced disorders
BenefitSusceptibility−0.739577Chemically induced disorders
SeveritySusceptibility0.9832668Chemically induced disorders
SeverityBenefit−0.9934572Congenital, hereditary, and neonatal diseases and abnormalities
SusceptibilityBenefit−0.992469Congenital, hereditary, and neonatal diseases and abnormalities
BenefitSeverity−0.9934572Congenital, hereditary, and neonatal diseases and abnormalities
SusceptibilitySeverity0.9986425Congenital, hereditary, and neonatal diseases and abnormalities
BenefitSusceptibility−0.992469Congenital, hereditary, and neonatal diseases and abnormalities
SeveritySusceptibility0.9986425Congenital, hereditary, and neonatal diseases and abnormalities
SeverityBenefit−0.9709958Diagnosis
SusceptibilityBenefit−0.8863042Diagnosis
BenefitSeverity−0.9709958Diagnosis
SusceptibilitySeverity0.9079219Diagnosis
BenefitSusceptibility−0.8863042Diagnosis
SeveritySusceptibility0.9079219Diagnosis
SeverityBenefit−0.8934219Digestive system diseases
SusceptibilityBenefit−0.8545585Digestive system diseases
BenefitSeverity−0.8934219Digestive system diseases
SusceptibilitySeverity0.955635Digestive system diseases
BenefitSusceptibility−0.8545585Digestive system diseases
SeveritySusceptibility0.955635Digestive system diseases
SeverityBenefit−0.976574Endocrine system diseases
SusceptibilityBenefit−0.9513764Endocrine system diseases
BenefitSeverity−0.976574Endocrine system diseases
SusceptibilitySeverity0.9759969Endocrine system diseases
BenefitSusceptibility−0.9513764Endocrine system diseases
SeveritySusceptibility0.9759969Endocrine system diseases
SeverityBenefit−0.9895697Eye diseases
SusceptibilityBenefit−0.9419866Eye diseases
BenefitSeverity−0.9895697Eye diseases
SusceptibilitySeverity0.9635876Eye diseases
BenefitSusceptibility−0.9419866Eye diseases
SeveritySusceptibility0.9635876Eye diseases
SeverityBenefit−0.9906221Female urogenital diseases and pregnancy complications
SusceptibilityBenefit−0.95009Female urogenital diseases and pregnancy complications
BarriersBenefit−0.2490277Female urogenital diseases and pregnancy complications
BenefitSeverity−0.9906221Female urogenital diseases and pregnancy complications
SusceptibilitySeverity0.9566264Female urogenital diseases and pregnancy complications
BarriersSeverity0.2337339Female urogenital diseases and pregnancy complications
BenefitSusceptibility−0.95009Female urogenital diseases and pregnancy complications
SeveritySusceptibility0.9566264Female urogenital diseases and pregnancy complications
BenefitBarriers−0.2490277Female urogenital diseases and pregnancy complications
SeverityBarriers0.2337339Female urogenital diseases and pregnancy complications
SeverityBenefit−0.9918556Hemic and lymphatic diseases
SusceptibilityBenefit−0.9546052Hemic and lymphatic diseases
BenefitSeverity−0.9918556Hemic and lymphatic diseases
SusceptibilitySeverity0.9672369Hemic and lymphatic diseases
BenefitSusceptibility−0.9546052Hemic and lymphatic diseases
SeveritySusceptibility0.9672369Hemic and lymphatic diseases
SeverityBenefit−0.9926578Immune system diseases
SusceptibilityBenefit−0.9728108Immune system diseases
BenefitSeverity−0.9926578Immune system diseases
SusceptibilitySeverity0.9814646Immune system diseases
BenefitSusceptibility−0.9728108Immune system diseases
SeveritySusceptibility0.9814646Immune system diseases

Table H1.

Patient referral scoring HBM constructs Spearman correlations.

Appendix I: Modelling approaches

Two modelling approaches were tested: a wide-format structure, where each HBM domain was represented as a distinct feature in a single-row record per patient and trial referral, and a long-format structure, where each HBM domain contributed a separate row per patient per trial. Both approaches were evaluated to assess the trade-offs between model simplicity, granularity, and the extent of required imputation due to missing domain-level information.

To assess how data structure influences classification performance across clinical domains, confusion matrix metrics (accuracy, sensitivity, and specificity) were compared between wide- and long-format models. While overall accuracy remained broadly comparable, with minimal divergence in most mesh headings, there were critical differences in sensitivity and specificity that reflect trade-offs in granularity versus model conservatism. The wide-format model demonstrated higher specificity in most domains (e.g., digestive system diseases: 99.7 vs. 97.1%, chemically induced disorders: 100 vs. 5.6%), indicating a bias toward minimising false positives. However, this came at the cost of substantially lower sensitivity, particularly in underrepresented conditions such as stomatognathic diseases (3.1 vs. 11.5%) and Bacterial Infections and Mycoses (10.9 vs. 19.3%).

In contrast, the long-format model consistently improved sensitivity, especially in domains with variable HBM coverage. Notably, immune system diseases saw a gain from 68.4 to 76.2%, and skin and connective tissue diseases rose from 76.3 to 84.9%. Physiological phenomena achieved 100% sensitivity under the long format, albeit with a complete loss in specificity (0%). These shifts highlight the long-format model’s strength in recovering true positives, likely due to its finer-grained representation of domain-level constructs. However, this approach also introduced increased noise and overfitting in some categories, particularly where positive cases are rare and imputation rates were higher.

Overall, the wide-format model offered greater parsimony and better control of false positives, making it advantageous for high-prevalence categories or operational use cases requiring high specificity. In contrast, the long-format model was superior for recall-oriented applications, particularly in exploratory or safety-critical contexts where false negatives must be minimised. These findings support a context-sensitive deployment strategy that selects the format structure aligned with domain prevalence, data sparsity, and end-use priorities (Table I1).
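
A context-sensitive deployment rule of this kind can be sketched as a simple selector over the per-domain metrics. The helper function is hypothetical (not part of the study's pipeline); the example metrics are the stomatognathic diseases row of Table I1:

```python
def choose_format(wide_metrics, long_metrics, recall_oriented):
    """Pick the data structure whose error profile matches the use case:
    recall-oriented contexts compare sensitivity, otherwise specificity."""
    key = "sensitivity" if recall_oriented else "specificity"
    return "long" if long_metrics[key] > wide_metrics[key] else "wide"

# Stomatognathic diseases, per Table I1 (percentages).
wide = {"sensitivity": 3.1, "specificity": 97.9}
long = {"sensitivity": 11.5, "specificity": 92.3}

print(choose_format(wide, long, recall_oriented=True))   # long
print(choose_format(wide, long, recall_oriented=False))  # wide
```

In practice such a rule would also weigh domain prevalence and imputation burden, but even this two-way comparison formalises the specificity-versus-recall decision described above.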

MeSH heading | Accuracy (wide | long) | Sensitivity (wide | long) | Specificity (wide | long)
BACTERIAL INFECTIONS AND MYCOSES | (59.2 | 62.9) | (10.9 | 19.3) | (92.1 | 93.6)
BEHAVIOUR AND BEHAVIOUR MECHANISMS | (71.4 | 71.2) | (99.6 | 99.7) | (3 | 2.9)
CARDIOVASCULAR DISEASES | (65.3 | 64.1) | (88.1 | 85.4) | (31.5 | 33.9)
CHEMICALLY-INDUCED DISORDERS | (90 | 58.6) | (83.4 | 75.2) | (100 | 5.6)
CONGENITAL, HEREDITARY, AND NEONATAL DISEASES AND ABNORMALITIES | (70.6 | 71.2) | (94.9 | 95.7) | (22.6 | 19.4)
DIAGNOSIS | (70.7 | 69.1) | (99.1 | 100) | (2.4 | 0.2)
DIGESTIVE SYSTEM DISEASES | (74.5 | 73.4) | (2.4 | 13.4) | (99.7 | 97.1)
ENDOCRINE SYSTEM DISEASES | (66.4 | 65.9) | (42.3 | 41.8) | (83.2 | 83.3)
EYE DISEASES | (66.6 | 67.5) | (20.5 | 13.7) | (92 | 96.2)
FEMALE UROGENITAL DISEASES AND PREGNANCY COMPLICATIONS | (71 | 71.2) | (71.3 | 78.7) | (70.5 | 61.3)
HEMIC AND LYMPHATIC DISEASES | (64 | 65.7) | (36.5 | 26.6) | (84 | 93.6)
IMMUNE SYSTEM DISEASES | (65.6 | 67.2) | (68.4 | 76.2) | (62.2 | 55.7)
MALE UROGENITAL DISEASES | (74.4 | 73.8) | (89.3 | 83.9) | (51.5 | 57.9)
MENTAL DISORDERS | (66.7 | 68) | (81.4 | 84.6) | (48.7 | 45.6)
MUSCULOSKELETAL AND NEURAL PHYSIOLOGICAL PHENOMENA | (66.5 | 65.4) | (43.7 | 42) | (84 | 83.8)
MUSCULOSKELETAL DISEASES | (58.3 | 58.3) | (68.7 | 67.7) | (47.8 | 48.7)
NEOPLASMS | (66 | 64.8) | (68.8 | 66) | (63.3 | 63.7)
NERVOUS SYSTEM DISEASES | (67.6 | 68.8) | (60.1 | 62.2) | (74.7 | 75)
NUTRITIONAL AND METABOLIC DISEASES | (71.1 | 68.9) | (83.4 | 72) | (57.8 | 65.6)
OTORHINOLARYNGOLOGIC DISEASES | (68.9 | 70.5) | (90 | 46) | (52 | 86)
PATHOLOGICAL CONDITIONS, SIGNS AND SYMPTOMS | (67.1 | 67.8) | (40.7 | 50.7) | (86.5 | 81)
PHYSIOLOGICAL PHENOMENA | (60.9 | 69.1) | (72.5 | 100) | (35.6 | 0)
PSYCHOLOGICAL PHENOMENA AND PROCESSES | (66.1 | 65.4) | (35.9 | 41.3) | (87.8 | 84.2)
RESPIRATORY TRACT DISEASES | (62.5 | 62.3) | (45.1 | 49) | (78 | 74)
SKIN AND CONNECTIVE TISSUE DISEASES | (67.4 | 69.7) | (76.3 | 84.9) | (54.5 | 45.5)
STOMATOGNATHIC DISEASES | (58.6 | 58) | (3.1 | 11.5) | (97.9 | 92.3)
VIRUS DISEASES | (66.9 | 65.7) | (1.3 | 0.9) | (100 | 99.5)

Table I1. Modelling approaches comparison.

