Phase II – Hands-On Tracks
After Phase I, students split into two tracks: Data Science and Programming.
Data science track
The data science track focused on exploring a biological dataset (in this case a variant file in Variant Call Format, or VCF) and extracting useful insights from it. For this phase, the notebook contains code cells with simulated VCF data and further analyses, with the goal of encouraging participants to think about each step, each piece of code, and what it contributes to the analysis as a whole. The rationale was that with LLMs like ChatGPT, anyone can write code to analyse any kind of biological data, but only a few can truly understand and explain what is happening underneath. Feedback to participants emphasised looking at the raw data before drawing any inference or conclusion, particularly for Question 11, where the answer depended entirely on understanding the QUAL values in the data.
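The habit of checking the raw QUAL column before summarising it can be sketched in Python as follows. This is a minimal illustration rather than the notebook's own code: the VCF records, the variable SIMULATED_VCF and the helper read_qual_values are all invented here. It relies only on two facts from the VCF format: QUAL is the sixth tab-separated column, and a missing value is written as a dot.

```python
import io
import statistics

# Hypothetical VCF records, invented for illustration (not the workshop data).
SIMULATED_VCF = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t10177\t.\tA\tAC\t12.5\tPASS\t.
chr1\t10352\t.\tT\tTA\t48.0\tPASS\t.
chr1\t11008\t.\tC\tG\t.\tPASS\t.
chr1\t13110\t.\tG\tA\t99.0\tPASS\t.
"""


def read_qual_values(handle):
    """Return the QUAL column as floats, using None for missing ('.') values."""
    quals = []
    for line in handle:
        # Skip meta-information lines, the header line and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        raw_qual = fields[5]  # QUAL is the sixth column of a VCF record
        quals.append(None if raw_qual == "." else float(raw_qual))
    return quals


quals = read_qual_values(io.StringIO(SIMULATED_VCF))
present = [q for q in quals if q is not None]

# Look at the raw values first, then at the summary.
print("raw QUAL column:", quals)
print("missing values:", quals.count(None))
print("min / median / max:", min(present), statistics.median(present), max(present))
```

Printing the raw column first makes the missing value visible. A summary statistic alone would hide it, and code that converts every entry to a number without checking would fail on the dot.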
Programming track
The programming track focused on automation and efficiency. The video explains why programming is valuable for bioinformatics, emphasising the burden of high-volume, high-dimensional datasets. The task was to create a Python script that performs the list of analyses provided in the notebook. The main feedback concerned clear and efficient variable naming and assignment, to avoid mistakes and confusion when the variables are accessed again later in the script.
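The point about variable naming can be illustrated with a short Python sketch. The records, the MIN_QUAL threshold and all variable names below are assumptions made up for this example; they are not taken from the notebook or from any participant's script.

```python
# Hypothetical (position, QUAL) records, invented for illustration.
variant_records = [(10177, 12.5), (10352, 48.0), (13110, 99.0)]

# Assumed threshold, named once so it is not repeated as a bare number.
MIN_QUAL = 30.0

# Harder to revisit later: it is unclear what x and l hold.
# l = [x for x in variant_records if x[1] >= 30]

# Easier to revisit: each name states what the value contains, and
# intermediate results get their own name instead of overwriting an old one.
high_quality_records = [
    (position, qual) for position, qual in variant_records if qual >= MIN_QUAL
]
high_quality_positions = [position for position, _ in high_quality_records]

print(high_quality_positions)
```

Giving the filtered list a new name rather than reassigning variant_records keeps the original data available for later analyses in the same script.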