Amid the current hype around AI and ML, one part of the workforce is getting a lot of attention: Data Scientists. They are the new blue-collar executives. You’ve already taken the first step by getting here, so let’s discuss how you can learn data science with no coding background!
Data Scientists need to be a complete package: a software and algorithm programmer, an analyst, a database manager, a Machine Learning expert, a statistical and operational mathematician, an NLP expert, and a cryptographer, all rolled into one.
That might seem like a lot of technical jargon, but once we take you through the following points, you’ll know where to start:
- Say ‘no’ to shortcuts-Work from the ground up!
To learn Data Science, start with the basics of the statistics and mathematics it requires. Develop an understanding of basic machine learning algorithms and try solving a real-life problem with them. It is equally essential to stay away from ‘phony’ courses, such as those promising to make you a data scientist in, say, 21 days! For a technically sophisticated job, there is no replacement for hard work.
- Stay away from ‘phony’ courses. Enroll only with professionals and institutes that have high credibility and a strong success rate.
- Take your time with it and be consistent.
- The basic idea here is to get a command of the fundamentals, step by step.
You might be tempted to start programming straight away, or to take up a hands-on data modeling project thinking you’ll learn as you go. Trust us: getting the fundamentals in place first is the better route.
- Level up your programming skills!
This is one crucial skill you need to be a successful professional in this field. Programming languages like C, C++, R, Python, and Java are part of the daily routine. You can follow these pointers to boost your programming skills:
- Start with the basics of C, as it is the base language on which many other programming languages are built.
- Understand the concepts, then up your programming game by developing example algorithms into working programs.
- Websites like TopCoder, CoderByte, and Project Euler host programming contests and challenges. You might want to try any of these to sharpen your programming ability.
- Proceed to R/Python once you’re through with the basics.
Following the above guide-points will take you a step closer to your dream profession.
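As one way to practice turning an example algorithm into a working program, here is a minimal Python sketch of a warm-up exercise in the style of the challenge sites mentioned above: adding up every number below a limit that is divisible by 3 or 5. The function name and the limits are illustrative choices, not taken from any site's official material.

```python
def sum_of_multiples(limit, divisors=(3, 5)):
    """Return the sum of all positive integers below `limit` that are
    divisible by at least one of `divisors`."""
    total = 0
    for n in range(1, limit):
        # any() stops at the first divisor that divides n evenly
        if any(n % d == 0 for d in divisors):
            total += n
    return total


if __name__ == "__main__":
    print(sum_of_multiples(10))    # 3 + 5 + 6 + 9 = 23
    print(sum_of_multiples(1000))
```

Once a version like this works, a good follow-up exercise is to rewrite it without the loop, using the formula for the sum of an arithmetic series, and check that both versions agree.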
- Start exploring and loving data
As a Data Scientist, you’re going to deal with data day and night. Statistics, mathematical number crunching, and organizing and segmenting data will be a routine part of your job. Data modeling can be taxing and quite complicated for the beginner, so get involved with data, statistics, and mathematics as soon as possible.
Once you are comfortable with numbers and heavy data crunching, with deriving relations between seemingly unrelated data, seeing the big picture, and telling a story through numbers, this is going to be a fruitful job for you.
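To make "deriving relations between data" a little more concrete, here is a small sketch that summarizes two columns of numbers and measures how strongly they move together using the Pearson correlation coefficient. It uses only Python's standard library; the numbers and the variable names are made up for illustration.

```python
import math
import statistics


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Made-up numbers for illustration only
hours_studied = [1, 2, 3, 4, 5]
exam_scores = [52, 58, 65, 70, 79]

print("mean score:", statistics.mean(exam_scores))
print("median score:", statistics.median(exam_scores))
print("correlation:", round(pearson_correlation(hours_studied, exam_scores), 3))
```

A value close to 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little or none. Keep in mind that correlation alone does not show that one quantity causes the other.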
- Give yourself homework-Learn data science by doing
At this point, you are training yourself from scratch to reach the technical soundness expected of a data science professional. When you feel comfortable with data modeling and the programming languages, start taking up projects of your own.
- Choose a field you want to work in, such as healthcare, sports, crime, or social justice, and take a relevant dataset from the internet. You’ll find plenty of datasets at websites like KDnuggets.com.
- Take a dataset, process it, crunch it. Play with your data however you like. Again, be consistent and try thinking beyond the obvious.
- You can use tools like Microsoft Azure, Google Cloud, etc. for creating meaningful data-science projects. Gradually, you will build up a portfolio of such projects to add to your credibility.
Again, follow your schedule religiously. Be as consistent as you can and keep practicing.
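A first homework project can be as small as loading a dataset, cleaning it, and computing a summary. The sketch below shows one way to do that with Python's standard library. The inline CSV text stands in for a file you would download, and the column names (`clinic`, `year`, `visits`) are hypothetical.

```python
import csv
import io
import statistics

# Hypothetical sample standing in for a downloaded CSV file.
RAW_CSV = """clinic,year,visits
Northside,2019,120
Northside,2020,
Riverside,2019,95
Riverside,2020,101
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Drop rows with a missing visit count and convert numbers to int."""
    cleaned = []
    for row in rows:
        if not row["visits"].strip():
            continue  # skip rows where the value is missing
        cleaned.append({
            "clinic": row["clinic"],
            "year": int(row["year"]),
            "visits": int(row["visits"]),
        })
    return cleaned


rows = clean_rows(load_rows(RAW_CSV))
visit_counts = [row["visits"] for row in rows]

print(len(rows), "usable rows")
print("mean visits:", round(statistics.mean(visit_counts), 2))
```

With a real file, you would replace `io.StringIO(text)` with an opened file object. Dropping incomplete rows is only one possible choice; depending on the dataset, filling in missing values may be more appropriate.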
- Sharpen your analytical skills-Observe, analyze, spot and discover insights
While practicing what you’ve learned from your books and other sources on datasets, think unconventionally and try to develop insights about the data. Start questioning the world around you: asking why, what, and how will help. There are gaps, unmet needs, and demands in every aspect of human life, and helping to address them is exactly what you will do as a data scientist. Keep these tips in mind:
- Focus on the story the data is telling: what it conveys and whether that leads to another prediction.
- Segment the data on as many bases as possible. Subject it to algorithms that you understand, and then try new algorithms on it.
- Be very consistent and pay attention to the outputs. Prepare summary reports. Also, take part in online data mining competitions like the ones held at KDnuggets.com.
- Critically examine your performance and resolve issues if any. Keep a record of these outputs and add them to your portfolio.
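Segmenting data on several bases, as suggested in the tips above, might look like the following sketch: the same records are grouped first by one field and then by two, and an average is computed per segment. The records and field names are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Made-up records for illustration only
records = [
    {"region": "north", "age_group": "18-34", "spend": 40},
    {"region": "north", "age_group": "35-54", "spend": 65},
    {"region": "south", "age_group": "18-34", "spend": 30},
    {"region": "south", "age_group": "35-54", "spend": 55},
    {"region": "north", "age_group": "18-34", "spend": 50},
]


def segment_mean(rows, keys, value):
    """Group rows by the given key fields and average `value` per group."""
    groups = defaultdict(list)
    for row in rows:
        segment = tuple(row[k] for k in keys)
        groups[segment].append(row[value])
    return {segment: mean(values) for segment, values in groups.items()}


by_region = segment_mean(records, ["region"], "spend")
by_region_and_age = segment_mean(records, ["region", "age_group"], "spend")

for segment, average in sorted(by_region_and_age.items()):
    print(segment, average)
```

Comparing the coarse segmentation with the finer one is often where the interesting questions start, for example when a difference between regions turns out to be driven by a single age group.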
- Get in touch with the professionals
Once you have gathered a bunch of your outputs, get in touch with someone who is already a data scientist. We recommend that you start building a strong professional network in this field. Choose someone who has been in the field for a while, with two or more years of experience as a data scientist. You can ask them to examine your work and tell you how well it reflects your capabilities. This way:
- You’ll get a professional opinion on how you are doing. Knowing where you stand regarding expertise, you can better yourself rather easily.
- Make it a point to discuss the current trends, salary criteria, expertise expectations, real-life impacts of the job on personal lifestyle, etc. You’ll have better insights about what you’re signing up for.
- If they like your work, ask them for a recommendation or referral. Always carry your portfolio if they have you meet someone.
- Start with an entry-level job!
Do not wait until you become the sole authority on Data Science. Remember we talked about starting from the ground up? Apply for an entry-level job once you have a grasp of the basics of data science.
Send your resume to the HR departments of companies that deal in ML or AI. Marketing giants also employ data scientists to crunch data for them and devise relevant predictions.
You can ask your professional friends about the pay packages and other standard practices before appearing for the interview. Keep their suggestions and advice in your mind and proceed accordingly.
- Keep up with the industry, discuss and indulge
This point matters because you are switching to a whole new field. Join online forums dedicated to Data Science, read blogs and articles about the latest developments, and keep a journal throughout your learning process. Each thing you learn is a brick in the fort you are building, and the fort here is data science.
- When learning advanced concepts like cognitive learning, deep learning, neural networks, keep pen and paper nearby and make notes if you have to.
- Applications like Evernote also come in handy for taking notes.
- If you are taking a course from somewhere, make brief notes in class. Focus more on understanding than writing.
- Take part in various mining competitions, coding competitions, etc. as often as you can. Competing refines your skills and promotes strategic thinking.
- Read journals from trusted publishers like IEEE, Springer, and Elsevier for the latest developments in data science, AI, and ML. These will help you stay aligned with industry events and standards.
- Tools to help (on the way to expertise)
In case you need more time to get a grip on programming languages, the following tools will help you process your data until you are good at it. However, it is preferable that you learn the requisite technologies.
- RapidMiner: RapidMiner, or RM, covers all the activities of predictive modeling, that is, data preparation, model building, validation, and finally deployment. It has predefined code blocks which you can join in multiple ways to run various algorithms without writing a single line of code. The current package includes:
- RapidMiner Studio: Stand-alone software popular for data preprocessing, statistical modeling, and visualization.
- RapidMiner Server: An enterprise-grade environment with central repositories that allows for smooth teamwork and project management combined with model deployment.
- RapidMiner Radoop: This tool implements big-data analytics capabilities centered around Hadoop.
- RapidMiner Cloud: A cloud-based platform that allows for easy sharing of information among various devices.
- DataRobot: DataRobot, or DR, automates the statistical processing and programming portion of the data scientist’s job, so the scientist needs to apply only business knowledge. It provides the following features:
- Parallel Processing: DR divides the computation among numerous multi-core processors and uses distributed algorithms to scale to larger data sets.
- Model Optimization: DR automatically identifies the best pre-processing and feature engineering for each modeling technique by employing text mining, imputation, scaling, variable type detection, etc.
- You can deploy your program or algorithm without writing any code.
- It also provides a Python SDK and APIs for quick integration of models into tools and software.
- BigML: A versatile platform for solving and automating Classification, Regression, Time Series Forecasting, Cluster Analysis, Anomaly Detection, Association Discovery, and Topic Modeling tasks. This platform provides the following modules:
- Sources: to bring in various sources of information
- Datasets: to create a dataset from the defined sources
- Models: to build predictive models
- Predictions: to generate predictions based on the model
- Ensembles: to form groups of various models
- Evaluation: to verify a model against validation sets
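The tools above hide the same basic flow behind a visual interface: take a source, build a dataset, fit a model, make predictions, and evaluate against held-out data. The sketch below walks through that flow in plain Python so you can see what each stage does. It does not use the API of RapidMiner, DataRobot, or BigML; the data is made up, the model is a simple 1-nearest-neighbour classifier chosen for brevity, and the ensemble stage is left out.

```python
import math

# Source -> dataset: made-up (height_cm, weight_kg) measurements with labels.
dataset = [
    ((150, 50), "small"), ((155, 54), "small"), ((160, 58), "small"),
    ((180, 82), "large"), ((185, 88), "large"), ((190, 95), "large"),
    ((152, 52), "small"), ((188, 90), "large"),
]

# Hold out the last two rows as a validation set.
train, validation = dataset[:6], dataset[6:]


def predict(features, training_rows):
    """Model and prediction: return the label of the closest training row
    (a 1-nearest-neighbour classifier using Euclidean distance)."""
    nearest = min(training_rows, key=lambda row: math.dist(features, row[0]))
    return nearest[1]


def evaluate(validation_rows, training_rows):
    """Evaluation: fraction of validation rows predicted correctly."""
    correct = sum(
        predict(features, training_rows) == label
        for features, label in validation_rows
    )
    return correct / len(validation_rows)


accuracy = evaluate(validation, train)
print("validation accuracy:", accuracy)
```

Working through a toy version like this once makes it easier to understand what a no-code tool is doing on your behalf, and to judge whether its results are reasonable.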
- Books to rely on: Books will be your best friends in this scenario, literally! Switch on your ‘student-mode’ and get started with some good books and tutorials relating to this field. Here are some recommendations:
- For C/C++
- Head First C (Beginner):
Written for the complete novice, Head First C takes you from the very basics to the intricacies of C in a rather fun way.
Pros: Easy-to-understand language, real-life application examples and conceptualization, and a focus on the complete program instead of parts right from the beginning.
Cons: This book uses command line tools like GCC extensively which might be a little intimidating for a beginner.
Verdict: We would recommend this book for the absolute beginner and those who would like a peek at data structure and basic concepts of computer programming.
- Data Structures Using C and C++ (Intermediate)
By Yedidyah Langsam, Moshe J. Augenstein & Aaron M. Tenenbaum
Pros: This book explains all the concepts related to data structures well. It also has a good number of example programs which, though lengthy, are well structured and easy to understand. This could be considered the ideal book for learning advanced data structures and their importance in algorithms.
Cons: It assumes that the reader has a basic understanding of how the C language works. It only deals with data structure programs. Hence it is not a very novice-friendly book.
Verdict: This book is an essential resource for understanding data structures and their importance. However, it is advisable to get a grip on C/C++ before you start studying this book.
- For R
- Hands-On Programming with R- By Garrett Grolemund (Beginner)
Pros: This book explains the concepts of R in excellent detail and simple language. It also has a good number of examples and projects that you can do yourself. It is a novice-centric book and works its way from the ground up. Some people feel that with R packages they don’t have to write loops and functions, which is a gross misunderstanding; this book emphasizes that you do write them.
Cons: None that we could find.
Verdict: This is the go-to book if you are interested in learning the concepts and coding of R.
- R Cookbook- By Paul Teetor
Pros: Another one written for novices to learn data science, this book unravels concepts like data pre-processing and manipulation, probability, time-series analysis, statistics, and their practical usage in R.
Cons: It doesn’t focus on the theoretical explanation of concepts but their practical implementation. Its focus is more on ‘how’ to do something than ‘what’ to do.
Verdict: For someone familiar with the ins and outs of R, this book is easy to understand. Because it focuses on practical implementation, it helps most in situations where you know what to do but not how to do it.
- R Graphics Cookbook- By Winston Chang
Pros: Written like a recipe book, this one focuses squarely on how to process data and convert it into engaging graphics that go beyond simple tables, how to customize graphics to display specific data, and much more. Since making data interesting and understandable is an integral part of a data scientist’s job, this book seems tailor-made.
Cons: Doesn’t focus on the theory of graphics in R.
Verdict: A must-read go-to manual for data scientists.
- Practical Data Science with R- By Nina Zumel & John Mount
Pros: This book discusses real-world problems and attempts to model them using R. This direct approach is a boon for learners who are trying to employ R on real problems. The book is replete with examples, and the focus is consistently on real-world problems, their modeling, and model deployment using R.
Cons: None that we could find.
Verdict: This is a good book for enhancing your R skills in modeling real-world problems and deploying those models.
- For Python
- Learn Python the Hard Way By Zed Shaw
Pros: Quite contrary to the title, this book is an easy-to-understand guide to Python. Python is highly adaptable and thus has multiple facets. This book takes you through all these concepts and enriches your knowledge in as simple a way as possible.
Cons: This book assumes you have an understanding of an object-oriented language like C++.
Verdict: Compared with R, which is oriented toward statistics and visualization, Python is easy to understand. This book’s approach is quite simple and bottom-up. It is a good book for anyone who wants to understand Python, although some prior understanding of OO languages will help.
- Mastering Python for Data Science By Samir Madhavan
Pros: This book first covers the Python libraries NumPy and pandas and how to import data from various sources into their data structures. It then moves on to working with linear equations in Python and drawing conclusions using inferential statistics. The book also covers advanced concepts like building a recommendation engine, high-end visualization using Python, ensemble modeling, etc.
Cons: It requires the reader to have an intermediate understanding of Python.
Verdict: After you have familiarized yourself with Python, this is the book to go to.
- Python for Data Analysis By Wes McKinney
Pros: Written by the main author of the pandas library, this book is quite comprehensive and covers all aspects of data processing and analysis in Python. The approach is simple and all-inclusive. Also, it has a good collection of examples and cases.
Cons: Requires a grasp of Python basics.
Verdict: As one would expect, to perform something productive with a language, you’ll need to know the semantics and the syntax. Similarly, to perform data analysis using Python, it is quite obvious that you’ll be expected to know the programming language. In all, this is an excellent book to help you build your command over data analysis using Python.
- Introduction to Machine Learning with Python By Andreas Müller and Sarah Guido
Pros: This is a novice-centric book in the domain of ML. It helps you build ML models from scratch in Python with scikit-learn. It also covers advanced methods for model evaluation and parameter tuning, methods for working with text data, text-specific processing techniques, and the whole shebang.
Cons: None so far.
Verdict: A must-read book for all beginners in ML and ML using Python.
- For Neural Networks and other advanced concepts
- Deep Learning Book - by Ian Goodfellow, Yoshua Bengio and Aaron Courville
Pros: This book works its way up from the basics of statistics and ML and presents them in relation to deep learning algorithms. It also discusses the latest developments in deep learning. The book is replete with models and graphs, making learning easier. The online HTML-based version of the book is free.
Cons: The major focus of this book is deep learning and metrics associated with it. Also, the online version is not easily downloadable or printable.
Verdict: After one is comfortable with the basics of ML, R, and Python, deep learning is the next step. Trying to skip the intermediate steps will only create obstacles in understanding deep learning techniques and models. This book fulfills its purpose well, and we recommend it.
- Statistics and Mathematics
- An Introduction to Statistical Learning- By Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani (Beginner)
This book uses R as the medium of modeling and works its way up from the basics to more advanced concepts. It provides the student with good examples, extensive datasets and data models for reference.
Pros: Extensive explanations and elaborate examples for most concepts, clear and crisp theory, and well-explained formulae that are almost always accompanied by related graphs.
Cons: For concepts like clustering, support vector machines, etc., richer explanations could have been provided, but these are minor issues as one can always find more examples online.
Verdict: This book is a great help in getting familiar with how statistics works in ML and its processes. It also analyzes the performance of algorithms on different sets of data, which gives you valuable insights into the efficiency of each algorithm and the situations for which it is best suited. We therefore recommend this book!
- Elements of Statistical Learning- By Trevor Hastie, Robert Tibshirani and Jerome Friedman (Intermediate and advanced)
This book serves as the follow-up to the book mentioned first in this section. It discusses advanced concepts like data mining, prediction, and inference in ML.
Pros: This book takes a comprehensive approach to related concepts like regression techniques, support vector machines and flexible discriminants, ensemble learning, undirected graphical models, etc., and uses examples freely. The exercises provided at the end of each concept include some remarkable problems and inspire strategic, out-of-the-box thinking.
Cons: It requires the student to have a good hold of statistics and ML basics, preferably with R/Python, as the book focuses on building predictive models and drawing inferences from them.
Verdict: From the point of view that it is a book for intermediate to advanced levels of learning, the language it uses is pretty simple and understandable. It is, however, advisable to read the first volume of this book (mentioned above) before beginning with this one.
Apart from these books, one could refer to online sources like web tutorials, blogs, and journals for new milestones and developments in Data Science. These will also help you learn data science.
Data Science is challenging, especially when you are switching to this domain without any coding or programming experience. But most data scientists will tell you that it’s a job worth all the effort. So, buckle up and embark upon your journey!