For the first post of this blog, I could not start any other way than with an article by one of my personal Statistics heroes: Leo Breiman. The article I am referring to is considered by many to be a watershed within statistics itself: Statistical Modeling: The Two Cultures (2001). Breiman’s article raises several discussions about statistical modeling; among them, I would like to highlight two aspects:
- Data Modeling Culture vs. Algorithmic Modeling Culture
- The use of data to solve problems
Data Modeling Culture vs. Algorithmic Modeling Culture:
The article begins by commenting on the nature of data: its generating process can be thought of as input variables x going in, output variables y coming out, and, between the two, whatever associates them, the real black box: nature. Thus, when working with data, two goals are sought: to predict future responses and to extract information about how nature generates the data.
To achieve these goals, statistical modeling comes into play, and two modeling cultures are possible. Breiman defines the first as the Data Modeling Culture, which begins by assuming a stochastic model for what is inside the black box (nature), for example linear regression, logistic regression, or the Cox model. The second, the Algorithmic Modeling Culture, considers the “inside of the box complex and unknown”, so instead of modeling it we look for a function that predicts y from x, e.g. decision trees or neural nets.
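To make the contrast a little more concrete, here is a minimal sketch of the two cultures looking at the same data. Everything in it is my own assumption for illustration and none of it comes from Breiman's article: the `nature` function (a quadratic rule plus noise), the sample sizes, and the use of a simple k-nearest-neighbours average as a stand-in for the decision trees and neural nets he mentions. The data model assumes a straight line with noise; the algorithmic model assumes nothing about the inside of the box and only tries to predict well.

```python
import random


def nature(x, rng):
    # Hypothetical "black box": a nonlinear rule plus noise.
    # In real life the analyst never gets to see this function.
    return x * x + rng.gauss(0.0, 0.3)


def fit_linear(xs, ys):
    # Data modeling culture: assume y = a + b*x + noise, estimate a and b
    # by ordinary least squares (closed form for one predictor).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, k=5):
    # Algorithmic modeling culture (stand-in): no model for the box,
    # just predict with the average response of the k closest points.
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


def mse(model, xs, ys):
    # Mean squared prediction error on held-out data.
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)  # fixed seed so the run is reproducible
train_x = [rng.uniform(-3, 3) for _ in range(200)]
train_y = [nature(x, rng) for x in train_x]
test_x = [rng.uniform(-3, 3) for _ in range(100)]
test_y = [nature(x, rng) for x in test_x]

linear_mse = mse(fit_linear(train_x, train_y), test_x, test_y)
knn_mse = mse(fit_knn(train_x, train_y), test_x, test_y)
print(f"data model (linear) test MSE:      {linear_mse:.3f}")
print(f"algorithmic model (k-NN) test MSE: {knn_mse:.3f}")
```

In this toy setup the straight line predicts poorly because the assumed model does not match what is inside the box. That is a consequence of how I chose `nature`, not a general verdict: had the box really been linear, the data model would have done at least as well and would also have been easier to interpret.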
Although this comparison between the two cultures runs throughout the article, the point I would like to highlight is precisely the beauty of modeling: we aim to find the best possible estimator (i.e. the one that brings the most accurate information), and the challenge is to predict and draw as much information as possible from the true black box that is nature. Models such as Deep Neural Networks are often called “black boxes”, but none of them really is one: even when they are hard to interpret, we know how they are computed, and they are themselves only models of nature, which is what remains truly unknown.

Algorithmic Modelling Culture and the true black box relationship
Use of data to solve problems:
Throughout his article, Breiman keeps returning to one idea: “If our goal as a field (Statistics) is to use data to solve problems (…)”. While it may seem trivial to the vast majority of data scientists to emphasize using data and modeling tools to solve and understand real problems, it has become pretty much a “golden rule” for me. During academic life it is quite common, at certain times, to overload yourself with theoretical issues and end up disconnected from real data. That is why reading about his projects, and about the ability of statistical modeling to transform and add value to society, made such an impression on me. As Leo himself would say, “The roots of statistics, as in science, lie in working with data and checking theory against data.”

Breiman paying tribute to another great statistician - Karl Pearson
Supported by all these ideas, I became really excited about all the possibilities that data science and statistical modeling can provide, and had the idea to start this blog to share, learn, and invite everyone who wants to join me to get hands-on with data, learn together, and try to unveil Breiman’s true black box through data!