If you are a biologist working with Next Gen Sequencing data or bioinformatics, chances are that you are learning a scripting language and trying hard to make it work. Typically, all you want is to come up with a pipeline that works for you and then get back to the bench to do interesting experiments. Just as a well-written lab notebook is essential for reproducing a protocol later, a well-written program that is easy to understand, and that lets anyone reproduce the analysis results at any time, is critical.
Learning to program and learning to work with Next Gen Sequencing data at the same time is immensely challenging, frustrating, and fun all at once. Don’t worry, you are not alone: about 90% of scientists learn to write software on their own (see the reference below). That is why it pays to learn the best practices in programming and stick to them from very early on.
Here are ten tips to become a better programmer for Next Gen Sequencing analysis, or “Best Practices for Scientific Computing”.
1. Write programs for people, not computers.
2. Automate repetitive tasks.
3. Use the computer to record history.
4. Make incremental changes.
5. Use version control.
6. Don’t repeat yourself (or others).
7. Plan for mistakes.
8. Optimize software only after it works correctly.
9. Document the design and purpose of code rather than its mechanics.
10. Conduct code reviews.
Also, get active in forums like Stack Overflow for your favorite language or domain. Seeing other people’s code and multiple solutions to the same problem helps a great deal in becoming a good programmer.
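To make a few of the tips concrete, here is a small Python sketch of how tips 1, 6, 7, and 9 might look in practice. It is only an illustration: the function names `gc_content` and `mean_gc_content`, the toy reads, and the choice to allow only the characters A, C, G, T, and N are my own assumptions, not something taken from the manuscript.

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence (0.0 to 1.0).

    Tip 9: the docstring states the purpose, not the mechanics.
    Assumption for this sketch: only A, C, G, T, and N are valid,
    and N counts toward the sequence length.
    """
    sequence = sequence.upper()
    # Tip 7: plan for mistakes. Fail early with a clear message
    # instead of silently computing a wrong answer.
    if not sequence:
        raise ValueError("sequence is empty")
    unexpected = set(sequence) - set("ACGTN")
    if unexpected:
        raise ValueError(f"unexpected characters in sequence: {sorted(unexpected)}")
    # Tip 1: descriptive names make the calculation readable for people.
    gc_count = sequence.count("G") + sequence.count("C")
    return gc_count / len(sequence)


def mean_gc_content(sequences):
    """Return the mean GC fraction across a list of sequences."""
    # Tip 6: don't repeat yourself. Reuse gc_content rather than
    # copying its logic here.
    values = [gc_content(seq) for seq in sequences]
    return sum(values) / len(values)


reads = ["ACGT", "GGCC", "ATAT"]  # toy reads, made up for illustration
print(mean_gc_content(reads))
```

Because the validation lives in one function, a malformed read such as `"ACGX"` raises a `ValueError` wherever `gc_content` is used, so the mistake surfaces at the source rather than deep inside a pipeline.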
Are you thinking, wait a minute, I have seen these ten tips somewhere? You are right. The ten tips above are “copied” from “Best Practices for Scientific Computing”, a manuscript by D. A. Aruliah et al. published on arXiv. In fact, the ten tips are the ten section titles of the manuscript. It is a must-read for anybody interested in writing scientific programs, not just for Next Gen Sequencing analysis or bioinformatics.