Welcome to the MARGO Cheminformatics Hackathon, an exciting competition dedicated to advancing cheminformatics through machine learning!
Organizers
- MARGO - Main organizers
- Qubit Pharmaceuticals - Hackathon topic designers & cheminformatics expertise
- IBM - Provision of computing resources via the watsonx platform
About the challenge
In this hackathon, participants will focus on building a cardiotoxicity (heart toxicity) prediction model, a crucial challenge in drug discovery and chemical safety assessment.
By leveraging machine learning techniques, you'll work with real-world datasets to create models that can predict how chemical compounds might interact with biological systems, potentially identifying toxic compounds before they reach clinical trials or industrial applications.
This hackathon provides a unique opportunity to apply your data science and machine learning skills to solve complex biological problems and contribute to public health and safety. We are calling on bioinformaticians, data scientists, and ML experts to collaborate, innovate, and push the boundaries of predictive modeling.
The goal is simple: given a set of molecules labeled as toxic (1) or non-toxic (0), participants are expected to tackle the following three tasks:
(Task 1) - Predict the toxicity of a uniformly sampled set of molecules, denoted as test set 1.
(Task 2) - Predict the toxicity of 6 series of molecules. In drug discovery, a molecular series is a family of molecules that share a common global structure, differing only in fragments. These molecular series make up test set 2.
(Task 3) - Among the predictions of test set 1, select the 200 molecules for which predictions are the most reliable.
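For Task 3, one simple baseline, assuming your model outputs class probabilities, is to rank molecules by how far each predicted probability is from 0.5 and keep the most decisive ones. The helper below is a hypothetical illustration, not part of the official tooling:

```python
# Toy sketch for Task 3: select the k predictions whose class probability
# is furthest from 0.5, i.e. where the model is most decisive.
# `preds` pairs each SMILES with a hypothetical predicted probability
# of toxicity (class 1); real probabilities would come from your model.

def most_reliable(preds, k=200):
    """Return the k (smiles, probability) pairs ranked by confidence."""
    return sorted(preds, key=lambda p: abs(p[1] - 0.5), reverse=True)[:k]

# Tiny illustrative input (k lowered to 2 for the demo).
preds = [("CCO", 0.95), ("c1ccccc1", 0.52), ("CC(=O)O", 0.10)]
top = most_reliable(preds, k=2)
print(top)  # the two most confident predictions come first
```

Distance from 0.5 is only one possible confidence proxy; ensemble disagreement or similarity to the training set are common alternatives.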
These tasks are far from trivial, and there are many skills that can help you in your quest towards molecular toxicity estimation:
- Classic ML skills (Python & associated libraries)
- Statistics
- Computational and organic chemistry
- Graph theory
The accuracy of predictions will not be the only criterion taken into account when evaluating results. We encourage you to explore creative and innovative solutions!
Evaluation metrics
The following metrics will be used to assess the quality of the predictions:
(Task 1) - Cohen kappa score on test set 1.
(Task 2) - Accuracy on each series of test set 2 (6 accuracy metrics in total). These 6 scores will be averaged into a single evaluation metric.
(Task 3) - Accuracy on the first 200 rows of the submission file for test set 1.
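As a sanity check for Task 1, Cohen's kappa can be computed from scratch for binary 0/1 labels (scikit-learn's `cohen_kappa_score` gives the same number). This is a minimal sketch, not the official scoring code:

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for binary 0/1 labels: agreement corrected for chance."""
    n = len(y_true)
    # Observed agreement: fraction of matching labels.
    po = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected agreement if both labelings were independent, given their marginals.
    t1 = sum(y_true) / n   # fraction of class 1 in the true labels
    p1 = sum(y_pred) / n   # fraction of class 1 in the predictions
    pe = t1 * p1 + (1 - t1) * (1 - p1)
    return (po - pe) / (1 - pe)

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1]
print(round(cohen_kappa(y_true, y_pred), 3))  # → 0.333
```

Unlike plain accuracy, kappa stays near 0 for a classifier that ignores the inputs, which matters when the toxic/non-toxic classes are imbalanced.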
Leaderboards
The leaderboard for each task can be found here.
Submissions affect the leaderboards only after the judges have validated their content.
Requirements
What to Submit
Submit your source code in a compressed file format (zip).
Ensure that your code is well-documented, including clear instructions on how to run and test your model.
In addition to the code, you may include your predictions at the root of your zip file, in the following format:
pred_1.csv
| smiles | class |
| --- | --- |
| <molecule-SMILES> | 1 |
| ... | ... |
| <molecule-SMILES> | 0 |
pred_2.csv
| smiles | class |
| --- | --- |
| <molecule-SMILES> | 0 |
| ... | ... |
| <molecule-SMILES> | 1 |
If provided, these predictions will be used to evaluate performance metrics on the hackathon tasks.
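A submission file in the expected layout can be produced with the standard library alone; the SMILES strings and classes below are placeholders for your own predictions:

```python
import csv

# Hypothetical predictions: (SMILES, predicted class) pairs.
predictions = [("CCO", 1), ("c1ccccc1", 0)]

# Write pred_1.csv with the header the judges expect.
with open("pred_1.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["smiles", "class"])
    writer.writerows(predictions)
```

For Task 3, remember that row order in pred_1.csv matters: the first 200 rows are treated as your most reliable predictions.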
Prediction quality metrics
Each of the 3 tasks will be evaluated using either classification accuracy or the Cohen kappa score:
(task 1) - Cohen kappa score of the predictions in pred_1.csv.
(task 2) - Per-series accuracy of the predictions in pred_2.csv, averaged over all the series.
(task 3) - Accuracy of the first 200 molecules (rows) in pred_1.csv.
Judging Criteria
Your submission will be evaluated based on the following criteria:
- Model Performance on task 1 (20%)
- Model Performance on task 2 (20%)
- Model Performance on task 3 (20%)
- Innovation & Approach (30%)
- Reproducibility & Documentation (10%)
Prizes
1st Place
2nd Place
3rd Place
Judges
Antoine Mazarguil
ML expert / MARGO
Judging Criteria
- Task 1 performance
- Task 2 performance
- Task 3 performance
- Innovation & Approach: Does the approach seem relevant? Did the participants neglect important phenomena (bias, overfitting, ...)?
- Reproducibility & Documentation: Description of the preprocessing/method; clear specification of the tools (packages, ML algorithms, ...); clarity of the code.
Questions? Email the hackathon manager