Dataset download
The dataset can be downloaded here (available when the hackathon starts):
https://docs.google.com/presentation/d/1ohGeGRV7fG8tISoTiU97VWO4n8-ha1Bl28THnV4tNPE/edit?usp=sharing
IBM Watson related resources
To access the computation resources provided by IBM Watson AI:
- Open the following link: https://skills.yourlearning.ibm.com/activity/URL-F86CD1F49D47?ngo-id=0302
- Select "Mark complete"
- Register on IBM SkillsBuild via your LinkedIn account
Dataset description
- Three datasets are provided in CSV (comma-separated values) format:
  - train.csv (9415 rows)
  - test_1.csv (750 rows)
  - test_2.csv (478 rows)
- Each row of a dataset corresponds to a molecule.
- Each CSV file contains the following columns:
  - smiles: chemical formula of the molecule in the SMILES format.
  - 199 molecular features (columns BalabanJ to qed), computed with the rdkit package.
  - ecfc_0000 to ecfc_2047 (2048 features): bit-vector representation of Morgan fingerprints.
  - fcfc_0000 to fcfc_2047 (2048 features): bit-vector representation of pharmacophore feature-based Morgan fingerprints.
  - class (train.csv only): the label to predict (1 for hERG inhibitor, 0 otherwise).
The best predictors will likely not use the complete set of 4295 features (199 + 2048 + 2048) provided in the datasets.
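Since feature selection is likely to matter, it can help to split the columns into the groups described above before modelling. The following is a minimal sketch using only the Python standard library. The function names read_header and group_columns are hypothetical, and the sketch assumes that every column not named smiles or class and not prefixed with ecfc_ or fcfc_ is an rdkit descriptor, which should be checked against the actual files.

```python
import csv


def read_header(path):
    """Return the column names found on the first line of a CSV file."""
    with open(path, newline="") as handle:
        return next(csv.reader(handle))


def group_columns(header):
    """Split column names into the groups listed in the dataset description."""
    groups = {"smiles": [], "descriptors": [], "ecfc": [], "fcfc": [], "label": []}
    for name in header:
        if name == "smiles":
            groups["smiles"].append(name)
        elif name == "class":
            groups["label"].append(name)
        elif name.startswith("ecfc_"):
            groups["ecfc"].append(name)
        elif name.startswith("fcfc_"):
            groups["fcfc"].append(name)
        else:
            # Assumption: every remaining column is an rdkit descriptor.
            groups["descriptors"].append(name)
    return groups
```

Calling group_columns(read_header("train.csv")) and checking the size of each group against the counts listed above (199, 2048 and 2048) is a quick sanity check that the file was read as expected.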
Cheminformatics resources
- rdkit is the most widely used package for processing molecules and computing molecular properties (e.g. molecular weight, charge, ...).
- Molecular fingerprints are commonly used features in the literature on molecular prediction. They result from applying a local kernel at multiple positions of the molecule and aggregating the outputs into a fixed-length vector.
- Pat Walters' tutorial on cheminformatics presents a wide variety of baseline ML predictors that combine rdkit, scikit-learn and other ML packages.
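To make the fingerprint idea above more concrete, here is a toy illustration in pure Python. It is not rdkit's Morgan implementation and does not reproduce the ecfc or fcfc columns: it ignores bond orders, charges and hydrogens, and the molecule is passed as a hypothetical atom list plus bond list rather than a SMILES string. It only shows the principle of hashing each atom's local neighbourhood at growing radii and folding the hashes into a fixed-length bit vector.

```python
import hashlib


def stable_hash(text):
    """Hash a string to an integer, identically across runs and machines."""
    return int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)


def toy_fingerprint(atoms, bonds, radius=2, n_bits=64):
    """Toy circular fingerprint.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) index pairs between bonded atoms
    """
    neighbours = [[] for _ in atoms]
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    bits = [0] * n_bits
    # Radius 0: each atom is described by its element symbol only.
    labels = [stable_hash(symbol) for symbol in atoms]
    for label in labels:
        bits[label % n_bits] = 1

    for _ in range(radius):
        # Grow each atom's environment by one bond: combine its own label
        # with the sorted labels of its neighbours.
        labels = [
            stable_hash(f"{labels[i]}:{sorted(labels[j] for j in neighbours[i])}")
            for i in range(len(atoms))
        ]
        for label in labels:
            bits[label % n_bits] = 1
    return bits
```

For example, toy_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)]) encodes the heavy-atom skeleton of ethanol. Because neighbour labels are sorted before hashing, the result does not depend on the order in which the atoms are listed.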
Statistics / ML resources
You may use any model or library you can find.
If you use an external model, you must mention it in your submission file.
Hackathon presentation (Margo + Qubit Pharmaceutical):
https://docs.google.com/presentation/d/1ohGeGRV7fG8tISoTiU97VWO4n8-ha1Bl28THnV4tNPE/edit?usp=sharing