Data science: yes! Statistics: no?

TechNews Writer
Mon Apr 18, 2022

Data science is a hot topic, trending everywhere. People are curious about how artificial intelligence (AI), machine learning, and computer vision work. Building applications on these technologies seems fascinating, and in many cases people step into the field without knowing the depth involved. No, I am not trying to instill fear, but yes, it is no cakewalk. Just as marine scientists keep making new discoveries in the ocean, data science has depths that are still being explored. I want to highlight certain things that I feel beginners need to understand.

At a superficial level there are four main components: collecting data, analyzing the data, modeling, and predicting. Beginners often pick structured data, such as Excel sheets or text files, and proceed from there. That is not always the case in practice. Data can be very unstructured, messy, and inconsistent. The analysis stage includes exploratory data analysis, finding patterns and insights in the data, and visual representation through graphs. Based on this output, data modeling comes into the picture. Python has many large machine learning and scientific libraries, such as scikit-learn, SciPy, and TensorFlow. Internally, all these libraries are built on "stats." For example, logistic regression uses a sigmoid function to map its output to a value between 0 and 1, which takes us to the basics of statistics involving exponential terms; the concept behind linear regression takes us back to school, where we solved the equation of a line. If we have libraries to do the work and the data science pipeline is being automated, why do we need data scientists?
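To make the link to statistics concrete, here is a minimal sketch in plain Python of the two ideas just mentioned: the sigmoid function behind logistic regression and the line equation y = mx + c behind linear regression, fitted by ordinary least squares. The function names and the toy data are assumptions made up for illustration; they are not taken from scikit-learn or any other library.

```python
import math

def sigmoid(z):
    """Squash any real number into the range 0 to 1."""
    # Two branches avoid overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + c; returns (m, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

# Toy data lying exactly on y = 2x + 1 (invented for this example).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
m, c = fit_line(xs, ys)

# A score of 0 sits exactly on the decision boundary.
p = sigmoid(0.0)
```

A library call hides these few lines, but knowing what they do is what lets someone judge whether the model suits the data.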

A data scientist should decide which machine learning algorithm to use. That choice is based on observations from the exploratory data analysis (EDA), and the work continues with tuning the hyperparameters to increase accuracy, optimizing the loss function, transforming the data, and deriving new attributes. Sound domain knowledge is an essential asset for a data scientist or an analyst, and the work requires a strong foundation in statistics. Many job applicants notice that postings for data scientists require a Ph.D., or at least hands-on experience in the field. That may be why some find it tough to get a data scientist job as a complete fresher. Data science is continuously expanding. Amazon's Alexa and Apple's Siri are well-known voice assistants that use natural language processing technology, yet I feel they still need improvement.
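The hyperparameter tuning and loss optimization mentioned above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: a one-feature logistic regression trained by gradient descent, a made-up toy dataset, and the learning rate as the single hyperparameter being tried. For brevity the loss is measured on the training data itself; in practice a held-out validation set would be used.

```python
import math

def sigmoid(z):
    # Two branches avoid overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_loss(w, b, xs, ys):
    """Average negative log-likelihood of the labels under the model."""
    eps = 1e-12  # keeps log() away from 0
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(w * x + b), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)

def train(xs, ys, learning_rate, steps=200):
    """Plain gradient descent on the log loss, starting from w = b = 0."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data (invented): larger x tends to mean label 1, with some overlap.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

# Try a few candidate learning rates and keep the one with the lowest loss.
results = {}
for lr in [0.001, 0.01, 0.1, 0.5]:
    w, b = train(xs, ys, lr)
    results[lr] = log_loss(w, b, xs, ys)
best_lr = min(results, key=results.get)
```

The loop is mechanical, but choosing which values to try, which loss to trust, and when the result is good enough is where statistical judgment comes in.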

According to the Harvard Business Review, data scientist is the sexiest job of the 21st century. Many titles, such as analyst, business analyst, and data analyst, are used interchangeably in the market; in the end, they all rest on the fundamentals of data. Raw potatoes are awful to eat, but they become yummy when fried and seasoned with spices. Similarly, a data scientist must use their skill set and statistical techniques to tell a story out of dirty data, so that better decisions can be made. Data is the oil that drives the ship. AI will conquer the future.



Appears in
2022 - Spring - Issue 11