With new career fields emerging in the world of Big Data, founding and efficiently scaling your start-up means finding and keeping team members who can handle the vast amounts of data your concept needs to work. Lee Ngo hosted a panel at Seattle Startup Week, where expert panelists answered questions from founders and potential students.
Galvanize in Downtown Seattle hosted the event. It's a fitting venue for a topic like Big Data, as Galvanize offers a series of compressed, intensive courses for students and professionals seeking to expand their knowledge and skills in the start-up community.
Panel members included:
Amanda Casari of Concur
Diego Oppenheimer, founder of Algorithmia
Shawn Scully of Dato
Ruth Stern of Sternshus Cooperative
Each panelist had a solid background in math before entering the data science field. When asked about the evolution of the field, Diego said, “You can point to a date in 2011 when the term became a popular search word on Google.” That's when the field was born. Those who now call themselves data scientists were previously known as statisticians or researchers. Many disciplines provide a useful background for a career in the field. Ben said finance, economics, and computer science all produce great data scientists because they focus on analyzing data to form conclusions. The one core competency the industry requires is the ability to code at some level.
Shawn made clear the reason for the field's quick growth and the sharp demand. He said, “What has really changed in the last few years is the computing power now available and cloud access. Now we can massage the data and create new ways to solve problems.” Ben urged the audience to be cautious when looking for someone to analyze the massive amounts of data recently created.
“Data Science is so important because we are drowning in data across many sectors,” said Shawn. “Now we write software to solve business problems. The movement from academia to industry has led to some people not using scientific method correctly. Data science skills are still vague and somewhat undefined. Startups sometimes don't know what they are buying into.”
At its core, this is a science, and each panelist clearly agreed with Ben. The need for rigor varies from sector to sector. Shawn made the best point on this topic: data used for risk analysis in finance must be studied and vetted much more closely than email open rates for a marketing campaign.
When a business seeks to use its data, it's vital to ask the right questions. That's the most important part for a true scientist. The questions must also be relevant to the available data. Data scientists are not magicians. If they can't answer the necessary questions from the available data, then companies are wasting money. Amanda and Diego both said the same when asked. Each stressed the necessity of a proper framework and the right company culture before even seeking data analysis.
Ben was very clear as well. He said politics play a major role in the perceived competency of data scientists. If the politics of the company aren't ready for the answers that will come, then the company will be wasting more money. Diego took this idea a step further. He said companies that want a data scientist need to ask themselves one question: Do you want to generate revenue or create cost?
In the end, the entire panel came to a consensus about the nature of data in the business world. It's all truly subjective, based upon the background of the person looking at the data. Several panelists cited studies where different researchers were given the same data set and the same question to answer. One would expect consistent results across the group. However, the results weren't consistent. Finding causality and connection is, in the end, a craft more than a science. It's based upon the background and strengths of the person.
Diego had the best quote of the panel when he said, “What you are trying to do is answer a business question with incomplete data. All data science tries to do is decrease the percentage of incomplete data you have.”