# Molecular Properties

Quantitative representations of compounds are required to analyze and interpret chemical data in chemoinformatics and drug discovery. Descriptors characterize a compound's chemical and physical nature and can be used to compare compounds, classify them, and predict their behavior. For example, topological descriptors capture the connectivity of atoms within a molecule, such as the number of bonds, rings, and branches. Geometrical descriptors capture the three-dimensional shape of a molecule, including molecular volume, surface area, and the spatial arrangement of atoms. In this work, we use fingerprint descriptors (see [article](article:20)), which encode the presence or absence of specific chemical features as binary or numerical codes, and molecular descriptors (see [article](article:19)), which capture physicochemical properties as continuous numerical values. Both play a critical role in drug discovery: molecular descriptors are used for quantitative analysis and prediction, and fingerprints for structural comparison and database searches.

![descriptors.png](oNYVVZzDkc1z)

_Resource_: Baptista D, Correia J, Pereira B, Rocha M. Evaluating molecular representations in machine learning models for drug response prediction and interpretability. Journal of Integrative Bioinformatics. 2022;19(3):20220006. https://doi.org/10.1515/jib-2022-0006
 
 ## Molecular Measurements
 
### IC50 (Half-maximal Inhibitory Concentration)
**IC50** describes the interaction between a drug and its target by quantifying how strongly the drug inhibits that target. Some basic properties are:
 - IC50 is a measure of a drug's potency in inhibiting the activity of a specific biological target, such as an enzyme or receptor.
 - It represents the concentration of a drug required to inhibit the target's activity by 50%.
 - A lower IC50 value indicates a more potent drug, as it achieves the desired inhibitory effect at a lower concentration.
 - IC50 values are typically determined through dose-response curves, where the drug is tested at various concentrations, and the inhibitory effect is measured.
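
As a rough illustration of reading an IC50 off a dose-response curve, the sketch below interpolates on a log-concentration scale between the two measurements that bracket 50% inhibition. The Hill-type model used to generate the synthetic data, and the function names `percent_inhibition` and `estimate_ic50`, are assumptions made for this example and are not taken from the analysis itself; real workflows usually fit a full sigmoidal curve instead.

```python
import math


def percent_inhibition(concentration, ic50, hill_slope=1.0):
    """Percent inhibition under a simple Hill model (an assumed model)."""
    if concentration <= 0 or ic50 <= 0:
        raise ValueError("concentration and ic50 must be positive")
    return 100.0 / (1.0 + (ic50 / concentration) ** hill_slope)


def estimate_ic50(concentrations, inhibitions):
    """Estimate IC50 by log-linear interpolation.

    Inputs must be sorted by increasing concentration. Returns None when
    no pair of neighbouring points brackets 50% inhibition.
    """
    points = list(zip(concentrations, inhibitions))
    for (c0, y0), (c1, y1) in zip(points, points[1:]):
        if y0 <= 50.0 <= y1 and y1 > y0:
            fraction = (50.0 - y0) / (y1 - y0)
            log_ic50 = math.log10(c0) + fraction * (math.log10(c1) - math.log10(c0))
            return 10.0 ** log_ic50
    return None


# Synthetic data generated from the model itself (true IC50 = 100, arbitrary units).
doses = [1.0, 10.0, 100.0, 1000.0]
responses = [percent_inhibition(d, ic50=100.0) for d in doses]
estimated = estimate_ic50(doses, responses)
print(f"Estimated IC50: {estimated:.1f}")
```

Because the synthetic data include a measurement exactly at the true IC50, the estimate recovers it; with real, noisy measurements the interpolated value is only an approximation.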
 
### pIC50   - Target Feature for [Predict pIC50 (regression)](saved_model:zqa8kTkx)
**pIC50** is the negative base-10 logarithm of the IC50 value of a drug or inhibitor, with IC50 conventionally expressed in molar units (mol/L). It expresses the potency of a compound against a specific biological target on a more concise and standardized scale. The formula is:

```pIC50 = -log10(IC50)```

A higher pIC50 value indicates a more potent drug, as it corresponds to a lower IC50 value.
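
The conversion can be sketched in a few lines of Python. This example assumes the IC50 is reported in nanomolar (nM) and converts it to molar before taking the logarithm; the function name `pic50_from_ic50_nm` is hypothetical, and the units used in your own data may differ.

```python
import math


def pic50_from_ic50_nm(ic50_nm):
    """Convert an IC50 given in nanomolar (assumed unit) to pIC50."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    ic50_molar = ic50_nm * 1e-9  # nM -> M
    return -math.log10(ic50_molar)


# A tenfold drop in IC50 raises pIC50 by one unit.
for ic50_nm in (1000.0, 100.0, 1.0):
    print(f"IC50 = {ic50_nm:g} nM -> pIC50 = {pic50_from_ic50_nm(ic50_nm):.2f}")
```

Under this convention an IC50 of 1 µM (1000 nM) corresponds to a pIC50 of about 6, and an IC50 of 1 nM to a pIC50 of about 9.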

### Molecular Toxicity Target Feature for 
Molecular toxicity refers to the harmful effects a chemical compound can have at the molecular level, such as disrupting proteins, damaging DNA, or causing cell death. The ClinTox dataset combines FDA-approved drugs (treated as safe), taken from the SWEETLEAD database, with drugs that failed clinical trials due to toxicity, taken from ClinicalTrials.gov (AACT database). It provides binary toxicity labels to help machine learning models distinguish between safe and toxic compounds based on their molecular structure.
```toxicity_prediction (binary): Binary toxicity risk indicator (0 = Non-Toxic, 1 = Toxic)```


### Bioactivity Class
Molecular bioactivity encompasses the ability of a molecule to bind to a biological target. Understanding it is crucial in drug discovery, as it relates directly to a compound's potential therapeutic or adverse effects. In this analysis, the bioactivity class is set by user-defined thresholds on the IC50, which measures a compound's potency in inhibiting a biological target.

- if ```IC50 <= bioactivity_class_active``` => bioactivity class = "active"
- else if ```IC50 > bioactivity_class_inactive``` => bioactivity class = "inactive"
- else => bioactivity class = "intermediate"
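
The three rules above can be sketched as a small Python function. The parameter names mirror the thresholds named in the rules; the threshold values in the usage lines (1000 and 10000, in the same unit as the IC50) are illustrative assumptions only, since the actual thresholds are user-defined.

```python
def bioactivity_class(ic50, bioactivity_class_active, bioactivity_class_inactive):
    """Assign a bioactivity class from an IC50 and two user-defined thresholds.

    All three arguments must use the same concentration unit.
    """
    if ic50 <= bioactivity_class_active:
        return "active"
    if ic50 > bioactivity_class_inactive:
        return "inactive"
    return "intermediate"


# Illustrative thresholds (assumed values, not prescribed by the analysis).
for ic50 in (250.0, 5000.0, 20000.0):
    label = bioactivity_class(ic50, bioactivity_class_active=1000.0,
                              bioactivity_class_inactive=10000.0)
    print(f"IC50 = {ic50:g} -> {label}")
```

Note that a value exactly equal to the active threshold is classed as "active", while a value exactly equal to the inactive threshold is classed as "intermediate", following the `<=` and `>` comparisons in the rules.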

### Lipinski's Rule of Five 
Lipinski's Rule of Five is a widely used guideline in medicinal chemistry and drug discovery for evaluating the drug-likeness of potential compounds. Drug-likeness here is based on Absorption, Distribution, Metabolism, and Excretion (ADME), also known as the pharmacokinetic profile. Dr. Christopher Lipinski formulated a simple set of rules to assess how likely a compound is to become an orally active drug with favorable pharmacokinetic properties.

The Lipinski descriptors focus on four key physicochemical properties of a compound:
1. The molecular weight of the compound should ideally be below 500 daltons. This guideline helps ensure that the compound can efficiently penetrate cell membranes and be absorbed by the body.
  ```Molecular weight < 500 Dalton```
2. The calculated partition coefficient (LogP) should be less than 5. LogP is a measure of the compound's hydrophobicity, influencing its solubility and permeability.
  ```Octanol-water partition coefficient (LogP) < 5```
3. The number of hydrogen bond donor groups (such as hydroxyl or amino groups) should be less than or equal to 5. A high number of donors can affect oral bioavailability.
  ```Hydrogen bond donors <= 5```
4. The number of hydrogen bond acceptor groups (such as oxygen or nitrogen atoms) should be less than or equal to 10. A high number of acceptors can affect oral bioavailability.
  ```Hydrogen bond acceptors <= 10```
  
 **Note** that Lipinski’s Rule is not an absolute predictor of a compound’s success. Adhering to these guidelines increases the likelihood that a compound will have favorable pharmacokinetic properties and can be efficiently absorbed, distributed, metabolized, and excreted in the body. The Lipinski descriptors are, however, just one aspect of compound evaluation: other factors such as target specificity, toxicity, and therapeutic activity also play crucial roles in drug development.
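
As a minimal sketch, the four criteria listed above can be checked against descriptor values that have already been computed (for example with a cheminformatics toolkit). The function name `lipinski_violations` and the descriptor values in the usage lines are hypothetical and serve only to illustrate the comparisons.

```python
def lipinski_violations(molecular_weight, logp, h_bond_donors, h_bond_acceptors):
    """Return the names of the Rule of Five criteria that a compound fails."""
    checks = {
        "molecular_weight": molecular_weight < 500,  # daltons
        "logp": logp < 5,
        "h_bond_donors": h_bond_donors <= 5,
        "h_bond_acceptors": h_bond_acceptors <= 10,
    }
    return [name for name, passed in checks.items() if not passed]


# Hypothetical descriptor values for two made-up compounds.
small_compound = lipinski_violations(350.0, 2.5, 2, 5)
large_compound = lipinski_violations(650.0, 5.6, 2, 5)
print("small compound violations:", small_compound)
print("large compound violations:", large_compound)
```

Returning the list of failed criteria, rather than a single pass or fail flag, leaves it to the caller to decide how many violations are acceptable.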