1. Ask a good question.
2. Answer the question while economizing on resources.
3. Communicate your results.
4. (Sometimes) Make recommendations to engineers or managers.
Asking a good question is probably the hardest thing to get right. If
you neglect this step, you'll spend days of your life working on
something that will have little impact. It's a skill that people who
focus on technical training tend to be bad at [..].
The real art of asking good questions is to consider your audience.
Who is going to be interested in the results and why are they going to
care? I find that the best questions have punchy answers, are usually
interesting to everyone, and usually affect a potential decision. On
the last point, the key is to think about how someone within your
organization might change their strategy due to your answer.
Effectively answering questions is where technical skills become
important. It's easy to get caught up in fancy algorithms and methods,
but those approaches are usually premature optimizations.
The best answers are 1) cheap and 2) easy to explain. Give me a table
of counts or event rates over regression coefficients or the first
eigenvector of your matrix decomposition. Perhaps it's a bit modest,
but I often describe data science as "advanced applied counting." [..]
Fancy, new, and complicated are usually bad qualities for a method. Take
it from Jay Kreps: "read current 3-5 pubs and note the stupid simple
thing they all claim to beat, implement that."
The other pattern I notice here is the unreasonable effectiveness of Polya's advice
for solving a math problem, particularly this aphorism: "If you can't
solve a problem, then there is an easier problem you can solve: find
it." Paraphrased for data scientists, if there is a question you can't
answer, there is an easier question you can answer (usually counting
something!).
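To make "advanced applied counting" concrete, here is a minimal sketch of the kind of table I mean: counts and event rates per group. The event log, the segment names, and the `event_rates` helper are all made up for illustration; real data would come from wherever your events live.

```python
from collections import Counter

# Hypothetical event log of (segment, converted) pairs, invented for illustration.
events = [
    ("mobile", True),
    ("mobile", False),
    ("mobile", False),
    ("desktop", True),
    ("desktop", True),
    ("desktop", False),
    ("tablet", False),
]


def event_rates(rows):
    """Return {segment: (hits, total, rate)} from (segment, happened) pairs."""
    # Count every row per segment, then only the rows where the event happened.
    totals = Counter(segment for segment, _ in rows)
    hits = Counter(segment for segment, happened in rows if happened)
    return {
        segment: (hits[segment], total, hits[segment] / total)
        for segment, total in totals.items()
    }


rates = event_rates(events)

# Print a small table: segment, hits / total, and the rate as a percentage.
for segment, (hit, total, rate) in sorted(rates.items()):
    print(f"{segment:<8} {hit:>3} / {total:<3} = {rate:.0%}")
```

A table like this is cheap to produce and easy to explain, which is the point: if it already answers the question, there is no need to reach for anything fancier.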
I firmly believe that data scientists should not be engineers or
managers. Engineers build things, managers make decisions, data
scientists answer questions. This is not to trivialize the role of data
scientists, who plausibly account for 2/3 of the steps in the
build-measure-learn loop. The answers can (and should) inform decisions
that managers make and help engineers build better products, but answers
always lead to more (and better!) questions.
Don't let the data science technical jargon drive your impression of
what is actually done in the field. In my experience, it's a research
job where you have autonomy to ask and answer some really interesting
questions. The fundamental challenge is being savvy enough to pick good
questions and find concise answers using minimal resources. Then you
must convince everyone to listen to you about what you found. In many
ways it's similar to academic research, but the differences are that the
cycle is tighter and your answers will often effect changes in the
business.