Deep Feature Synthesis: Towards Automating Data Science Endeavors

In this paper, we develop the Data Science Machine, which is able to derive predictive models from raw data automatically.

To achieve this automation, we first propose and develop the Deep Feature Synthesis algorithm for automatically generating features for relational datasets.

The algorithm follows relationships in the data to a base field, and then sequentially applies mathematical functions along that path to create the final feature.
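The idea of following relationships to a base field and then stacking functions along the path can be illustrated with a minimal sketch. This is not the authors' implementation: the toy schema (customers, orders, items), the helper `aggregate`, and the feature names are all hypothetical, chosen only to show two functions applied in sequence along one relationship path.

```python
from statistics import mean

# Hypothetical toy schema: customers -> orders -> items (each one-to-many).
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 1},
    {"order_id": 12, "customer_id": 2},
]
items = [
    {"order_id": 10, "price": 5.0},
    {"order_id": 10, "price": 15.0},
    {"order_id": 11, "price": 40.0},
    {"order_id": 12, "price": 5.0},
]


def aggregate(parents, children, key, field, func, name):
    """Return copies of the parent rows with `func` applied to `field`
    over each parent's related child rows (None if there are no values)."""
    out = []
    for parent in parents:
        values = [
            child[field]
            for child in children
            if child[key] == parent[key] and child[field] is not None
        ]
        row = dict(parent)
        row[name] = func(values) if values else None
        out.append(row)
    return out


# Step 1: follow orders -> items down to the base field `price`, sum per order.
orders_fx = aggregate(orders, items, "order_id", "price", sum, "SUM(items.price)")

# Step 2: apply a second function one level up, averaging per customer.
customers_fx = aggregate(
    customers,
    orders_fx,
    "customer_id",
    "SUM(items.price)",
    mean,
    "MEAN(orders.SUM(items.price))",
)

for row in customers_fx:
    print(row)
```

Each call to `aggregate` moves one step back along the relationship path, so the final feature name records the sequence of functions that produced it.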

Second, we implement a generalizable machine learning pipeline and tune it using a novel Gaussian Copula process based approach.

We entered the Data Science Machine in 3 data science competitions that featured 906 other data science teams.

Our approach beats 615 teams in these data science competitions.

In 2 of the 3 competitions we beat a majority of competitors, and in the third, we achieved 94% of the best competitor’s score.

In the best case, with an ongoing competition, we beat 85.6% of the teams and achieved 95.7% of the top submission's score.

Introduction
Data science consists of deriving insights, knowledge, and predictive models from data. This endeavor includes cleaning and curating data at one end and disseminating results at the other; data collection and assimilation may also be involved.

After the successful development and proliferation of systems and software that are able to efficiently store, retrieve, and process data, attention has now shifted to analytics, both predictive and correlative.

Our goal is to make these endeavors more efficient, enjoyable, and successful.


