The Molecular Property Prediction Dataiku Application helps you configure the project's [key parameters](article:17) so that the elements in the flow zones can be built. It also lets multiple users work on their own instances of the solution without modifying the original project.

The first part of the application validates that the protein accession code exists in the chosen public database. The second part provides an interface for entering the project variables required for the analysis. The final step is to replace the [test_data](dataset:test_dataset) dataset and run the [BUILD ALL](scenario:BUILDALL) scenario to build the flow and update the dashboard. The parameters that the user needs to set are explained below.
 
### Database Selection
Select a **public chemical database**, which the application connects to automatically through its API, and specify the **target protein accession** code. The first scenario validates the presence of the accession code in the selected database.

### Data Preparation
Specify the parameters required for data preparation.

**Molecular bioactivity**: the ability of a molecule to bind to a biological target, measured here by the standard value **IC50**.
1. A molecule is considered *Active* if the ```IC50 value < Threshold for Active label```
2. A molecule is considered *Inactive* if the ```IC50 value > Threshold for Inactive label```
3. Otherwise, the molecule is considered *Intermediate*.
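
The three labeling rules above can be sketched in a few lines of Python. This is only an illustration, not the project's actual recipe: the function name, the example threshold values, and the exact label strings are assumptions made for the example.

```python
def label_bioactivity(ic50, active_threshold, inactive_threshold):
    """Label one IC50 value as 'Active', 'Inactive', or 'Intermediate'.

    Hypothetical helper: thresholds and IC50 must share the same unit.
    """
    if active_threshold > inactive_threshold:
        raise ValueError("Active threshold must not exceed inactive threshold")
    if ic50 < active_threshold:
        return "Active"
    if ic50 > inactive_threshold:
        return "Inactive"
    # Values between the two thresholds (inclusive) fall in neither class.
    return "Intermediate"


# Illustrative thresholds only (for example, values expressed in nM).
labels = [label_bioactivity(v, 1000, 10000) for v in (50, 5000, 20000)]
```

Values that fall exactly on a threshold are labeled as intermediate here, because both comparisons in the rules above are strict.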

### Machine Learning 

**Descriptors** characterize the chemical and physical properties of molecules. They serve as the input features of the regression model that predicts the molecular bioactivity value ```pIC50```.

The solution automatically computes two sets:
1. **Molecular Descriptors**: capture physicochemical properties as continuous numerical values. Examples include molecular weight and number of atoms.
2. **Fingerprint Descriptors**: represent the presence or absence of specific chemical structure features as binary or numerical codes generated from the canonical SMILES notation.

Note: By default, the Machine Learning model uses only the fingerprint descriptors as input features. The option below lets you include both sets.
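
For context on the regression target: pIC50 is conventionally defined as the negative base-10 logarithm of IC50 expressed in mol/L. The sketch below assumes IC50 values given in nanomolar, which this page does not state, and uses a hypothetical function name.

```python
import math


def ic50_nm_to_pic50(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 (assumed unit: nM)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be strictly positive")
    # pIC50 = -log10(IC50 in mol/L) and 1 nM = 1e-9 mol/L,
    # so pIC50 = 9 - log10(IC50 in nM).
    return 9.0 - math.log10(ic50_nm)
```

A lower IC50 (a more potent molecule) gives a higher pIC50.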

### Molecular Similarity 

The project quantifies the degree of structural resemblance between a novel scored molecule and the studied molecules used for training. The final field of the Dataiku Application lets you specify a novel molecule ID to initiate the analysis. You can interact dynamically with all the novel molecules in the Dashboard results.
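
As an illustration of how structural resemblance can be quantified, the sketch below computes the Tanimoto coefficient between fingerprints. This page does not state which similarity metric the project uses, so Tanimoto is an assumption here, chosen because it is a common choice for fingerprints. Each fingerprint is represented as a set of "on" bit indices, and the function names and molecule IDs are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        # Two empty fingerprints: define the similarity as 0.0 to avoid dividing by zero.
        return 0.0
    return len(a & b) / union


def most_similar(novel_fp, training_fps, top_n=3):
    """Rank training molecules by similarity to one novel molecule.

    training_fps maps a molecule ID to its fingerprint (set of on-bit indices).
    """
    scored = [(mol_id, tanimoto(novel_fp, fp)) for mol_id, fp in training_fps.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy fingerprints with made-up IDs, for illustration only.
training = {"mol_A": {1, 2, 3}, "mol_B": {2, 3, 4}, "mol_C": {7, 8}}
ranking = most_similar({1, 2, 3}, training, top_n=2)
```

A score of 1.0 means that the two fingerprints set exactly the same bits, and 0.0 means that they share none.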

 ![dku app.png](uPuSp6FhJpwY)




