## Public Databases
[**ChEMBL**](https://www.ebi.ac.uk/chembl/) is a large, open-access drug discovery database of bioactive molecules and their interactions with biological targets. It aims to capture medicinal chemistry data and knowledge across the pharmaceutical research and development process. The name reflects its home at the European Molecular Biology Laboratory (EMBL), whose European Bioinformatics Institute (EMBL-EBI) maintains it. The database records bioactive compounds, including small molecules and drugs, together with their measured activity against biological targets such as proteins, enzymes, and receptors. These interactions are central to understanding the mechanism of action of drugs and to identifying potential new drug candidates. Practical applications include identifying chemical tools for a target of interest, assessing compound selectivity, training machine learning models (e.g. for target prediction), generating drug repurposing hypotheses, assessing target tractability, and integrating with other drug discovery resources. Data can be retrieved through the REST API of its [Web Services](https://www.ebi.ac.uk/chembl/api/data/docs).
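
As a minimal sketch of pulling a record over the REST API, the snippet below builds the URL for a single molecule and reduces the JSON response to a few fields. The helper names (`molecule_url`, `fetch_molecule`, `summarize`) are hypothetical, and the field names are assumed from the ChEMBL molecule resource; check them against the Web Services documentation before relying on them.

```python
import json
from urllib.request import urlopen

# Base URL of the ChEMBL data web services.
CHEMBL_API = "https://www.ebi.ac.uk/chembl/api/data"


def molecule_url(chembl_id):
    """Build the JSON endpoint URL for one molecule, e.g. CHEMBL25."""
    return f"{CHEMBL_API}/molecule/{chembl_id}.json"


def fetch_molecule(chembl_id, timeout=30):
    """Download one molecule record (requires network access)."""
    with urlopen(molecule_url(chembl_id), timeout=timeout) as response:
        return json.load(response)


def summarize(record):
    """Keep a few fields; the key names are assumptions about the response."""
    structures = record.get("molecule_structures") or {}
    return {
        "chembl_id": record.get("molecule_chembl_id"),
        "name": record.get("pref_name"),
        "smiles": structures.get("canonical_smiles"),
    }

# Example call (not executed here because it needs network access):
#   summarize(fetch_molecule("CHEMBL25"))
```

Separating URL construction from the network call keeps the parsing logic testable offline.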

[**PubChem**](https://pubchem.ncbi.nlm.nih.gov/) is a freely accessible database maintained by the National Center for Biotechnology Information (NCBI), a part of the National Institutes of Health (NIH) in the United States. It serves as a comprehensive resource for information on the biological activities of small molecules. PubChem collects information on chemical structures, identifiers, chemical and physical properties, biological activities, patents, literature citations, and more for millions of chemical compounds.
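
PubChem can also be queried programmatically through its PUG REST interface. The sketch below looks up a compound by name and requests two computed properties. The helper names are hypothetical, and the response layout (`PropertyTable` → `Properties`) is an assumption to verify against the PUG REST documentation.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL of the PubChem PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(name, properties=("MolecularFormula", "MolecularWeight")):
    """Build a PUG REST URL that looks up a compound by name."""
    encoded = quote(name, safe="")  # escape spaces and slashes in the name
    return f"{PUG_REST}/compound/name/{encoded}/property/{','.join(properties)}/JSON"


def fetch_properties(name, timeout=30):
    """Return the first property record for a name (requires network access)."""
    with urlopen(property_url(name), timeout=timeout) as response:
        payload = json.load(response)
    # Assumed response layout: {"PropertyTable": {"Properties": [ {...} ]}}
    return payload["PropertyTable"]["Properties"][0]

# Example call (not executed here because it needs network access):
#   fetch_properties("aspirin")
```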

## Python Libraries
[**Molfeat**](https://molfeat.datamol.io/) is an open-source Python library that provides a unified interface for various molecular featurization methods. It aims to simplify feature extraction from molecules for machine learning applications, particularly in drug discovery and cheminformatics.

[**RDKit**](https://www.rdkit.org/) is a powerful and versatile open-source toolkit for cheminformatics and computational chemistry, written in C++ with Python bindings. The "RD" in its name comes from Rational Discovery, the company where it was originally developed. It offers a comprehensive set of functionality for molecule-related tasks, including molecular representation, fingerprinting, substructure searching, and visualization.
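
The tasks listed above can be sketched with RDKit in a few lines. This is a minimal illustration, assuming a reasonably recent RDKit release that ships `rdFingerprintGenerator`. The example molecules, the helper names, and the fingerprint settings (radius 2, 2048 bits) are choices made here for illustration.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import Descriptors, rdFingerprintGenerator

ASPIRIN = "CC(=O)Oc1ccccc1C(=O)O"
SALICYLIC_ACID = "O=C(O)c1ccccc1O"

# Morgan (circular) fingerprint generator; radius and size are arbitrary choices.
_morgan = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

# SMARTS pattern for a carboxylic acid group.
_carboxylic_acid = Chem.MolFromSmarts("[CX3](=O)[OX2H1]")


def parse(smiles):
    """Parse a SMILES string; RDKit returns None on failure."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return mol


def has_carboxylic_acid(smiles):
    """Substructure search against the carboxylic acid pattern."""
    return parse(smiles).HasSubstructMatch(_carboxylic_acid)


def tanimoto(smiles_a, smiles_b):
    """Tanimoto similarity between the Morgan fingerprints of two molecules."""
    fp_a = _morgan.GetFingerprint(parse(smiles_a))
    fp_b = _morgan.GetFingerprint(parse(smiles_b))
    return DataStructs.TanimotoSimilarity(fp_a, fp_b)


mol = parse(ASPIRIN)
print("Heavy atoms:", mol.GetNumAtoms())
print("Molecular weight:", round(Descriptors.MolWt(mol), 2))
print("Has carboxylic acid:", has_carboxylic_acid(ASPIRIN))
print("Similarity to salicylic acid:", tanimoto(ASPIRIN, SALICYLIC_ACID))
```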

## Notation System
[**SMILES (Simplified Molecular Input Line Entry System)**](https://archive.epa.gov/med/med_archive_03/web/html/smiles.html) is a notation system that represents the structure of a chemical molecule as a short ASCII string. It is widely used in chemistry and drug discovery to encode and communicate molecular structures. Each atom is written as its atomic symbol (e.g., C for carbon, O for oxygen); hydrogen atoms are usually implicit and inferred from normal valence. Adjacent atoms are assumed to be joined by a single bond, so the single-bond symbol (-) is normally omitted, while double bonds are written with an equals sign (=) and triple bonds with the hash symbol (#). Parentheses mark branches off the main chain, matching digits mark ring closures, and lowercase atomic symbols denote aromatic atoms. Square brackets enclose atoms that need explicit detail, such as a charge, an isotope, or an explicit hydrogen count (e.g., [NH4+]).

![smiles.png](7N1PGJRYavm6)

Source: [ResearchGate](https://www.researchgate.net/publication/339349026_Multiobjective_de_novo_drug_design_with_recurrent_neural_networks_and_nondominated_sorting)
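
To make the notation concrete, the sketch below splits a SMILES string into labeled tokens using only the standard library. It is a simplified illustration, not a validating parser: it assumes a subset of the grammar (no wildcard atoms or reaction syntax), and the token names and the `tokenize_smiles` helper are hypothetical.

```python
import re
from collections import Counter

# One named group per token class; alternatives are tried left to right,
# so two-letter symbols (Br, Cl) are listed before single letters.
TOKEN_PATTERN = re.compile(
    r"(?P<bracket_atom>\[[^\]]+\])"   # atoms with explicit detail, e.g. [NH4+]
    r"|(?P<atom>Br|Cl|[BCNOPSFI])"    # atoms written without brackets
    r"|(?P<aromatic_atom>[bcnops])"   # aromatic atoms in lowercase
    r"|(?P<bond>[-=#:/\\])"           # explicit bond symbols
    r"|(?P<branch>[()])"              # branch open and close
    r"|(?P<ring_closure>%\d{2}|\d)"   # ring-closure labels
    r"|(?P<dot>\.)"                   # separator between fragments
)


def tokenize_smiles(smiles):
    """Return a list of (token_class, text) pairs for a SMILES string."""
    tokens = []
    pos = 0
    while pos < len(smiles):
        match = TOKEN_PATTERN.match(smiles, pos)
        if match is None:
            raise ValueError(f"Unexpected character {smiles[pos]!r} at position {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
counts = Counter(kind for kind, _ in tokenize_smiles(aspirin))
print(dict(counts))
```

For real work, a toolkit parser such as RDKit's is the safer choice, since it also checks valence and ring closure consistency.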



