I used the following resources during my self-learning process in data science; they contain a mix of DIY projects and theoretical exercises. I don’t think there is a single roadmap, but these learning materials can help you along the way:
- Statistics (Theoretical)
- A background in advanced maths and statistics is a must for developing a critical-thinking mindset in data science, rather than being an advanced user who applies models to data en masse. There are plenty of resources available covering statistical distributions, probability and statistical tests.
- A fantastic online book, R for Data Science, written by a statistics professor, which mixes R programming with statistics throughout the usual data-science process: importing, tidying, transforming, visualizing and modelling data, and communicating insights.
- Johns Hopkins University offers a very comprehensive Data Science program through Coursera. This basic course will give you a good foundation in the field, a good understanding of the R platform and the chance to develop your first project in a guided manner. You can start with this one.
- Data Science (Theoretical + Practical)
- Vijay Kotu, VP Analytics at ServiceNow, has created a few slides that will help you navigate the complex map of data science.
- RapidMiner, a software vendor, offers some free training: AI basics from RapidMiner.
- The R platform is your toolbox, and runs on Windows, macOS and UNIX.
- Statistical Thinking is a good way to learn how to use data to solve daily problems and make better decisions in business. The material will be very familiar to Six Sigma practitioners. SAS used to offer these courses free of charge.
- There are nice courses on DataCamp too, though they can be quite frustrating because they’re very step-by-step, and sometimes you may feel that you can’t see the forest for the trees.