The Solution has three main components: the Dataiku Application, the Clinical Site Intelligence Webapp, and the Sponsor Insights Dashboard. The **Dataiku Application** sets up the analysis for a given ClinicalTrials.gov API query. The **Clinical Site Intelligence Webapp** generates a study overview, similarity analysis, and associated clinical sites for a given study synopsis. The **Sponsor Insights Dashboard** provides an overview of clinical trials and clinical research sites sponsored by a selected lead sponsor.

# Dataiku Application
Users must set up the Solution via the [Dataiku Application](article:43) to access the Dashboards and Web Application. The Dataiku Application is available on the **home page of Dataiku under the Applications section**.

It will help you connect your data regardless of the connection type and seamlessly configure the whole flow according to your specific parameters.

Create a new Dataiku application by clicking the **CREATE APP INSTANCE** button. It will create a new instance of the parent project, which you can configure according to your specific needs. You can make as many instances as you need (for example, if you want to apply this Solution to other data).
![dataiku-application-instance.png](YEExS9YDXqnB)


## Connections Configuration
Select the preferred connection for the datasets and folders that the Solution will build. The optimal engine for that connection is then applied to the recipes within the flow. The Solution also needs suitable storage for the text embedding vectors and the similarity index, which are saved as pickle files.
![Screenshot 2024-01-19 at 16.19.29.png](lRyxW0arMfIY)
 1. Select a connection.
 2. Click on the **RECONFIGURE** button.
 
<div class="alert">
The current version supports filesystem, S3, and Snowflake connections. Because of an incompatibility with the geometry data type, the SQL connections Databricks, Redshift, and PostgreSQL are not supported.
</div>


 
 
## Define Study Scope
This section defines the customized query to the [ClinicalTrials.gov API](https://beta.clinicaltrials.gov/data-about-studies/learn-about-api). In other words, the customized query establishes the scope of the clinical trials that feed into this Solution's intelligence. The query convention follows the [documentation](https://beta.clinicaltrials.gov/data-about-studies/learn-about-api). 
![Screenshot 2024-01-19 at 16.23.06.png](lmtUR4Sh5wTm)
 1. Select or type in the search terms to define your customized query.
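
As a rough illustration of what such a query looks like, the sketch below assembles a request URL for the ClinicalTrials.gov API. The endpoint and the parameter names (`query.cond`, `query.locn`, `query.term`, `countTotal`, `pageSize`) are assumptions based on version 2 of the public API and should be checked against the documentation linked above. The helper `build_study_query` is hypothetical and is not part of the Solution.

```python
from urllib.parse import urlencode

# Assumed base URL of the ClinicalTrials.gov API v2 studies endpoint.
BASE_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_study_query(condition, location=None, term=None, page_size=1):
    """Return a request URL for a study search (hypothetical helper)."""
    params = {"query.cond": condition}
    if location:
        params["query.locn"] = location
    if term:
        params["query.term"] = term
    # countTotal asks the API to report how many studies match the query.
    params["countTotal"] = "true"
    params["pageSize"] = str(page_size)
    return f"{BASE_URL}?{urlencode(params)}"


url = build_study_query("cancer", location="United States")
print(url)
```

Requesting a single result together with the total count is one inexpensive way to gauge the size of a query before building the flow.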


## Include Demographic & Social Determinants of Health Dataset
This optional dataset augments the study enrollment rate prediction model and clinical site intelligence if included. The current release is limited to the SDOH data of USA counties. Read [Social Determinants Of Health](project:SOL_SDOH) for more information.
![Screenshot 2023-10-10 at 12.27.30.png](9FFr2RrV7Mz1)
 1. Tick to include the Demographic and SDOH dataset


## Build the Flow
This step processes the configurations and creates all pipelines and models in the flow.
![Screenshot 2023-10-10 at 14.12.55.png](YFvyLkk6JtX0)
 1. Click on the **BUILD** button to create the Flow
 
<div class="alert">
 Due to the scale and complexity of this Solution, the flow may take some time to complete. The run time is proportional to the scope of the ClinicalTrials.gov API query. For the data packaged with this release, we queried cancer studies in the US after 2018, which returned 15k unique studies in total. The total run time for DSS on the cloud (with 2 CPUs and 16 GB RAM) to complete all flow zones (including models) was around 270 minutes. We encourage users to test their query on the official ClinicalTrials.gov website (https://clinicaltrials.gov/) to estimate the size of the result. Users can then extrapolate the expected run time to build the Solution.  
 Since the queried studies are used to train the enrollment rate prediction model and build the similarity index, we recommend including at least a few thousand studies to achieve good model performance.
</div>
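
The extrapolation mentioned in the note above can be sketched as a simple linear scaling from the reference run (15k studies in around 270 minutes). This is only an approximation: it assumes the run time scales linearly with the number of studies and that the hardware is comparable to the reference setup. The function name `estimate_build_minutes` is hypothetical.

```python
# Reference run reported for this release: 15k studies in ~270 minutes
# on a cloud DSS instance with 2 CPUs and 16 GB RAM.
REFERENCE_STUDIES = 15_000
REFERENCE_MINUTES = 270.0


def estimate_build_minutes(n_studies):
    """Linearly scale the reference run time to a query of n_studies."""
    if n_studies < 0:
        raise ValueError("n_studies must be non-negative")
    return REFERENCE_MINUTES * n_studies / REFERENCE_STUDIES


print(estimate_build_minutes(5_000))  # 90.0
```

For example, a query returning 5,000 studies would be expected to take roughly 90 minutes under these assumptions.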


## Launch Web App 
Launch the Clinical Site Intelligence Web App to review insights from study similarity analysis and clinical site intelligence.
![Screenshot 2023-10-10 at 14.16.27.png](fEEp5nlmqNVP)
 1. Click on the Web App hyperlink to open it.
 
 
## Create Sponsor Dashboard
Create a Sponsor Dashboard to overview studies and sites sponsored by a selected lead sponsor.
![Screenshot 2024-04-12 at 10.22.33.png](Ghy7Dc7cfzXK)
 1. Select a lead sponsor
 2. Click on the **CREATE** button that matches your SDOH dataset configuration. If you included the SDOH dataset in your setup, choose **Create Sponsor Dashboard with SDOH Data**. Otherwise, choose **Create Sponsor Dashboard**.
 3. View the dashboard!
 
 
# Clinical Site Intelligence Webapp
 The Study Similarity & Clinical Site Intelligence [Web App](article:14) is an interactive interface for users to query clinical site intelligence with study protocols. It distills the operational history of similar studies and associated clinical sites, representing the intelligence with easy-to-read charts. Users initiate the Web App by providing a study protocol and can interact with each step/component. Finally, users can export the list of selected clinical sites in the last step of the Web App for further analysis. 

This web app has three tabs: **Study Summary**, **Studies and Sites**, and **Site Cards**.
1. Initiate each search by entering a study protocol in Tab 1, Study Summary.
2. Review and select similar studies and their associated clinical sites in Tab 2, Studies and Sites.
3. Read individual site cards, curate, and export the final list of site candidates in Tab 3, Site Cards.
 

## Study Summary
Start a new query with an existing or a novel study protocol. For an existing study, users must select a valid National Clinical Trial (NCT) Identification Number. For a novel study, users can enter a self-defined study protocol. The fields include study title, summary, cohort age, sex, inclusion and exclusion criteria, healthy volunteers, and MeSH conditions.
![webapp-study-review.png](0SktmkeilJpc)


## Studies and Sites
For a given study protocol, the Web App queries the study similarity index prebuilt by the Dataiku application and returns the top 20 similar study protocols. Then, it identifies clinical sites recruited by these similar studies. It shows the results in two tabs: **Similar studies** and **Candidate sites**. 
The left panel of both tabs serves as a filter that lets users drop studies or sites. Filtering in the **Similar studies** tab regenerates the list of sites in the **Candidate sites** tab. Filtering in the **Candidate sites** tab determines which sites are passed on to generate the site cards in Tab 3, Site Cards.
![webapp-study-similarity.png](uMFAALbSfDhX)
![webapp-associated-sites.png](OHNkYwrJkYDT)
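
The retrieval step described above can be pictured as a nearest-neighbor search over text embeddings. The sketch below is illustrative only: it assumes cosine similarity over an in-memory dictionary of vectors, whereas the actual similarity index and embedding model used by the Solution are not described here. The function names, the toy vectors, and the placeholder study IDs are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, index, k=20):
    """Return the k study IDs whose embeddings are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), study_id)
              for study_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [study_id for _, study_id in scored[:k]]


# Toy 2-dimensional "embeddings" keyed by placeholder study IDs.
toy_index = {
    "STUDY_A": [1.0, 0.0],
    "STUDY_B": [0.9, 0.1],
    "STUDY_C": [0.0, 1.0],
}
print(top_k_similar([1.0, 0.05], toy_index, k=2))
```

In the Web App the default of 20 results corresponds to the top 20 similar study protocols, and the sites attached to those studies form the candidate list.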

## Site Cards
The Site Cards provide visual insights into individual clinical research sites, including geolocation, SDOH, studies involved, and competing sponsors. The left panel logs the user's review history for the list of candidate sites and allows the user to drop sites. Finally, the user can export the curated list of sites for further analysis.
![webapp-sitecard-overview.png](3SPrpDAz2OEP)
![webapp-sitecard-stats.png](jUiLDLHnnvBp)
![webapp-sitecard-activities.png](kZ8GX2uHy1Lz)
![webapp-sitecard-hx.png](alAkclW3hoAJ)
![webapp-sitecard-sponsors.png](AvR2yG2RcqLb)

# Sponsor Insights Dashboard
The Sponsor Insights [Dashboard](article:12) gives an overview of the clinical trials and clinical research sites sponsored by a selected lead sponsor. The dashboard has two versions, depending on the Dataiku Application configuration: [Sponsor Insights with SDOH](dashboard:KjMTgA5) if the setup includes the SDOH dataset option and [Sponsor Insights](dashboard:ui8AOjN) otherwise. It contains up to three slides:
 1. Studies Overview
 2. Clinical Sites Overview
 3. Census Social Factors and CDC Chronic Disease Prevalences at Site Locations
 

## Studies Overview
The first slide provides the most up-to-date information from the ClinicalTrials.gov dataset. This Solution augments that information with study enrollment rate predictions for ongoing studies. The slide has two parts: study enrollment rate prediction and study characteristics.
![dashboard-study-enrollment.png](omPsQS8sd2nL)
![dashboard-study-overview.png](UtizL6vk3Gsa)


## Clinical Sites Overview
This slide summarizes the broader study activity and history, across all sponsors, at the clinical sites used by the selected sponsor. The data is sourced from ClinicalTrials.gov.
![dashboard-sitemap.png](NsjHv2etK4R4)
![dashboard-sponsors.png](VmW3NKcvyNNh)


## Social Determinants of Health on Sites
The last slide shows the locations of the facilities (sites) used for studies by the selected sponsor, together with census county populations and Social Vulnerability information. It is only available if the SDOH dataset was included during setup in the Dataiku Application. The current version is limited to USA census data.
![dashboard-sdoh.png](7dYOqkP11g1Z)