No one is questioning the fact that data science is here in a big way, and it’s growing fast. Today, data science ninjas are perhaps the most sought-after talent on the market. For those who aren’t clear on what exactly a data scientist does: data scientists combine skills from computer science, statistics, engineering, business insight, and strategy to mine enormous volumes of data and pull out key insights that have major impacts on businesses.
It’s easy to see why companies have come to value data science, as the volume of data generated continues to grow exponentially. According to IBM, an estimated 2.5 quintillion bytes of data are created daily. That pace is so fast that 90 percent of the data that exists today was created in the past two years.
Without data scientists, consumers wouldn’t have the personalized recommendations of Hulu or the shopping experience of Amazon. They wouldn’t be exposed to the highly targeted content and advertising on all their favorite social media platforms. For the businesses involved here though, these represent clear, compelling revenue opportunities that just don’t happen without strong strategy and execution around their data. The issue is that data scientists aren’t exactly a commodity.
According to a McKinsey study, by next year the demand for data scientists in the U.S. alone will be half a million jobs. The problem is that there will be fewer than a quarter of a million data scientists available globally. To make the problem even worse, most of a data scientist’s time today is spent doing data engineering, which simply sets the table for them to start doing real data science.
Big Data Engineering
Maxime Beauchemin, a data engineer at Airbnb and architect supreme behind Airflow, defined data engineering as:
“The data engineering field could be thought of as a superset of business intelligence and data warehousing that brings more elements from software engineering.”
Data engineering exists because companies now have massive amounts of highly valuable data, but to yield insight, that data needs to be extracted and made sense of quickly and at scale. As far as specific needs or functions within data engineering are concerned, many fall under data integration and services.
With SaaS becoming the new standard for company operations, data integration has become as important and as challenging as it’s ever been. There is a critical need to synchronize referential data across systems, and SaaS applications depend on up-to-date data to function properly.
The fact of the matter is that today’s executives are often still signing deals without really thinking through or understanding the data integration challenges. Some SaaS providers do provide their own analytics offering, but these inherently lack any understanding of, or perspective on, the rest of a customer’s data infrastructure. It doesn’t make things any easier on executives that vendors typically downplay the amount of work proper integration will actually take in order to close more deals, faster.
Going a step further, data engineering can require developing services and tooling to automate work that data scientists previously did manually. Services and tooling for data ingestion, metric computation, anomaly detection, metadata management, experimentation, and instrumentation are common examples. There needs to be a constant priority on automating workloads and building abstractions that allow data scientists to climb the ladder of complexity.
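To make that kind of automation a little more concrete, here is a minimal Python sketch of two of the jobs named above, metric computation and anomaly detection. Everything in it is an assumption for illustration: the function names, the (day, value) event format, and the simple z-score rule are hypothetical, and this is not how Astronomer or any particular tool does it.

```python
from statistics import mean, stdev


def daily_totals(events):
    """Metric computation: sum event values per day.

    `events` is an iterable of (day, value) pairs (a hypothetical format).
    """
    totals = {}
    for day, value in events:
        totals[day] = totals.get(day, 0) + value
    return totals


def flag_anomalies(totals, threshold=2.0):
    """Anomaly detection: return the days whose total sits more than
    `threshold` sample standard deviations away from the mean."""
    values = list(totals.values())
    if len(values) < 2:
        return []  # not enough data to estimate spread
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # every day is identical, nothing stands out
    return [day for day, total in totals.items()
            if abs(total - mu) / sigma > threshold]


if __name__ == "__main__":
    # Made-up sample data: seven ordinary days and one spike.
    sample = [("d1", 60), ("d1", 40), ("d2", 102), ("d3", 98), ("d4", 101),
              ("d5", 99), ("d6", 100), ("d7", 103), ("d8", 500)]
    print(flag_anomalies(daily_totals(sample)))
```

A data scientist could run a check like this by hand in a notebook. The point of data engineering is to turn it into a scheduled, monitored step so nobody has to.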
Sounds like a lot of work, right? Well, we happen to know some folks who, rather than letting a shortage of talent hold back improvements in this space, have built their own software to solve this problem.
Astronomer powers data intelligence by unifying data to reveal actionable insights. Their platform connects and centralizes data, making it remarkably simple for anyone from business users to data scientists to quickly create and monitor data pipelines.
One of the most exciting things about this team is that they’ve developed a platform and solution that allows them to add value to a range of companies, whether they’re at a point where they’re ‘flying blind’ (big data initiatives are a new thing), ready for ‘intelligence’ (real-time analytics, predictive analytics, machine learning, AI), or looking for that ‘insanity mode’ gear (breakthrough improvements and new, innovative business models).
Along the way, they’re helping companies deal with siloed data, data centralization, cleaning, enrichment, aggregation, business intelligence, and simply getting out of ‘spreadsheet hell.’ They’re bringing on a wide range of new customers every day, from startups to enterprises, and we don’t see the demand slowing down anytime soon.
To give you a real sense of what makes Astronomer so special, I went ahead and did a quick Q&A with Astronomer cofounders Ry Walker and Tim Brunk. Check out what they had to say below:
What’s surprised you guys most about everything that’s happening with big data?
Ry: It’s been amazing to watch data go from a buzzword to a household term over the last few years. Because of that, we see so many companies scrambling to put a plan in place to change how they utilize their data. Their databases are now growing like crazy, but the challenge is turning raw data into a tangible business asset.
Tim: I think a result of this growth is a paradox of too much data: there’s a paralysis. The amount of data being generated is so much greater than the capacity most organizations have to process it. It’s a little like getting hit with a tidal wave for these companies. Marketers, for example, suddenly have data on every action taken on a website or app, CRM data, transactional data, social media, etc. Where do they even begin? Look for tools? Well, there are over 3,500 to choose from. And there’s probably not a data scientist on the team, which means it might take getting IT involved just to set something up. Big data utilization is lagging big data generation.
Ry: It’s also worth mentioning the explosion of data silos. It seems like many organizations long for the simplicity of centralization, but that’s risky these days. Now is the time for exploration and innovation, and that can feel messy. The order of the day is big data connectedness. Embrace the chaos, and adopt a tool (you know, like Astronomer) to build data pipelines across these silos, and find a way to bring clarity to the chaos.
What would you say is special about what you’re doing, especially compared to competitors?
Ry: From a technical perspective, we believe in providing a platform, which we affectionately refer to as our “machine,” that doesn’t compromise on capability, even if our customers don’t know how to write code. So internally, we’re very dev-focused, but we tell this machines+humans narrative. Our machine is our business, but we think our humans add a lot to the mix. We’re not afraid of getting our hands dirty with our customers to implement something tricky.
Tim: On that same note, we’re not just building cool tech, although we are; we’re also sitting in the trenches with the people struggling with a problem we can solve. So we’re solving the problem together. That approach is something that none of our competitors seem to be taking and it allows us to really speak our customers’ language.
When it comes to Astronomer, what are you most proud of?
Ry: That’s an easy one. The team. We’ve been fortunate to be joined by a lot of smart, passionate people, and every day the team blossoms further. We’re excited to think about all the skill development that we’re “sponsoring” and it’s obvious to us already that this early foundational Astronomer team will be legendary in coming years.
Tim: Another thing is, objectively, we shouldn’t be nearly as successful as we are. We have fantastic customers, top-notch investors and a brand awareness in the market that, frankly, catches us off guard sometimes. We got into the top accelerator program with startups from NYC, SF and London, and we’re the only Midwestern company to have been accepted, ever. There’s nothing impressive about us on paper (Ry and I have fairly normal resumes), but we’re doing something special. It’s kind of an island of misfit toys sensation, and I just love that.
Enough about big data, let’s talk about you guys. What would you say is the most important thing you do to keep from getting burnt out?
Ry: Burnout is inevitable. Sometimes an action-packed Monday can make us feel like we just worked a week. The key is to try to disconnect on weekends, and take as many trips as possible to at least keep the scenery interesting, drawing us away from our computers. We understand that we’re sacrificing some work-life balance in the first few years of this company in exchange for building something great.
Tim: But I don’t think balance and greatness are mutually exclusive. The thing that keeps me sane is taking one day off per week, completely. Nobody can even reach me. My wife and I might find a new spot to hike, go to brunch and a matinee, or play games and drink Bloody Marys. The point is having a day to do whatever sounds fun and spontaneous to us, whatever doesn’t feel like work. It’s the biblical principle of “sabbathing,” which is a fascinating reset and an extremely effective way to recover.
If you could give one piece of advice—about big data, startup life, whatever—what would it be?
Tim: OK, I’ve got one I’m still learning myself: be humble. Specifically, be willing to proactively seek out guidance and actually listen. Be willing to defer on certain things to your co-founder or team. Be willing to ask for financial help. We’ve taken out loans to make payroll. It’s humbling, but here we are. Be willing to admit that you don’t have everything figured out, and when you make mistakes, turn them into an opportunity. That sounds like a lot of advice, but true humility is kind of all-encompassing. It helps you build a better business because you’re willing to explore, ask questions and change.
Ry: I’ll go tactical: My advice is to start a blog. Writing is challenging but therapeutic, and it’s great for your personal brand. Write about anything you like, even short stories. Whatever. We rotate through our staff to author our weekly blog post in part because we want to encourage lifelong writing habits with our team.