

On June 17th, about two months ago, Lars Kamp of Intermix.io published a blog post titled Why we sold intermix.io to private equity in a shifting market. The post title sounds like a startup retrospective (which it was), but a close read reveals trends that the rest of us in data should pay close attention to.

Key to Intermix.io’s fate was the fact that it had bet the house on AWS’s Redshift. In the years afterwards, however, changes in the cloud data warehousing space made Redshift’s value proposition less and less attractive. To hear Kamp tell it, he had been tying his entire company to the wrong horse:

“Snowflake addressed our core segment - SMBs that use Redshift and were unhappy due to performance issues. Snowflake offered a better product that made the migration from Redshift (the market leader by 10x in terms of number of customers) worthwhile. And so our addressable market cratered within just two quarters.”

The fact that Kamp had to write an email to his investors and sell off his company to private equity buyers reflects some underlying market shifts that we should all pay attention to.

Lars Kamp started intermix.io with Paul Lappas in February 2016. A TechCrunch article from 2018, titled ‘Intermix.io looks to help data engineers find their worst bottlenecks’, described the startup this way:

“For any company built on top of machine learning operations, the more data it has, the better off it is - as long as it can keep it all under control. But as more and more information pours in from disparate sources, gets logged in obscure databases and is generally hard (or slow) to query, the process of getting that all into one neat place where a data scientist can actually start running the statistics is quickly running into one of machine learning’s biggest bottlenecks. That’s a problem Intermix.io and its founders, Paul Lappas and Lars Kamp, hope to solve.”

The article went on to describe the service as follows:

“Intermix.io works in a couple of ways. First, it tags all of that data, giving the service a meta-layer of understanding what does what, and where it goes. Second, it taps every input in order to gather metrics on performance and help identify those potential bottlenecks. And lastly, it’s able to track that performance all the way from the query to the thing that ends up on a dashboard somewhere. The idea here is that if, say, some server is about to run out of space somewhere or is showing some performance degradation, that’s going to start showing up in the performance of the actual operations pretty quickly - and needs to be addressed.”
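
The TechCrunch description doesn’t spell out how that tagging works under the hood, so here is a minimal sketch of the general technique, not Intermix.io’s actual implementation: metadata embedded in a SQL comment, so that a monitoring layer reading the warehouse’s query logs can attribute each query to an app or dashboard and time it. The INTERMIX_TAG marker, the function names, and the DB-API connection are all assumptions for illustration.

```python
# Hypothetical sketch of query-level tagging (not Intermix.io's API):
# metadata rides along in a SQL comment so a monitoring layer scanning the
# query logs can attribute each query to an app, user, or dashboard.
import json
import time


def tag_query(sql: str, **metadata) -> str:
    """Prepend a JSON metadata comment to the query text."""
    return f"/* INTERMIX_TAG {json.dumps(metadata)} */\n{sql}"


def run_tagged(cursor, sql: str, **metadata):
    """Execute a tagged query and record its wall-clock duration as a
    crude per-query performance metric."""
    start = time.monotonic()
    cursor.execute(tag_query(sql, **metadata))
    rows = cursor.fetchall()
    return rows, time.monotonic() - start


# Usage, assuming `conn` is a DB-API connection to the warehouse:
# rows, seconds = run_tagged(conn.cursor(),
#                            "SELECT count(*) FROM events",
#                            app="nightly_etl", dashboard="growth_kpis")
```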

To be exact, Intermix.io worked only with Redshift. Kamp and team had looked out at the cloud data warehousing space circa 2016, and saw that Redshift was the only real option they needed to support - and good thing too, since they had limited engineering resources. (Sharp readers would note that BigQuery was already around at the time, but as Kamp says: “With AWS about 10x the size of GCP at the time, it was a no-brainer to go with Redshift (…) Redshift was the only game in town.”)

In late 2016, Snowflake emerged out of seemingly nowhere. Like BigQuery, Snowflake was a massively parallel processing (MPP) columnar data warehouse that was built on top of a ‘shared-nothing’ architecture. This architecture meant that compute and storage were decoupled from each other - and, more importantly, could be scaled completely separately from each other. Kamp notes that Snowflake’s real advantage over Redshift was what he called its ‘serverless’ model, and what other people call ‘elastic scaling’: when you use Redshift, you have to provision servers and watch those provisioned servers carefully, because scaling Redshift wasn’t automatic. In contrast, Snowflake (and BigQuery) could invisibly scale up to however much compute or storage it needed to execute your query, without any intervention from you. This almost magical quality was what caused Intermix.io’s target market (SMBs who were unhappy with Redshift performance) to plateau. SMBs and long-time Redshift users began switching over to Snowflake.
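
To make that operational difference concrete, here is a rough sketch - not Intermix.io’s or Kamp’s code, with made-up cluster and warehouse names and placeholder credentials. On Redshift, scaling is an explicit resize of a provisioned cluster that you plan and trigger yourself; on Snowflake, you declare a warehouse once and it suspends when idle and resumes when the next query arrives.

```python
# Rough sketch of provisioned vs. elastic scaling. Cluster and warehouse
# names and credentials below are placeholders, not real infrastructure.
import boto3
import snowflake.connector

# Redshift: scaling is an explicit resize you plan, trigger, and babysit.
redshift = boto3.client("redshift", region_name="us-east-1")
redshift.resize_cluster(
    ClusterIdentifier="analytics-cluster",  # placeholder cluster name
    ClusterType="multi-node",
    NodeType="dc2.large",
    NumberOfNodes=8,                        # you choose (and pay for) the size
)

# Snowflake: compute is declared once and manages itself between queries.
conn = snowflake.connector.connect(user="...", password="...", account="...")
conn.cursor().execute("""
    CREATE WAREHOUSE IF NOT EXISTS analytics_wh
      WAREHOUSE_SIZE = 'XSMALL'
      AUTO_SUSPEND   = 60    -- pause compute after 60 idle seconds
      AUTO_RESUME    = TRUE  -- wake it up when the next query arrives
""")
```

In the second model, most of the provisioning and babysitting work that the first model demands simply goes away - which is the ‘almost magical quality’ that pulled Intermix.io’s customers toward Snowflake.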

And it’s worth noting that Snowflake took only two years to achieve this shift - in the post, Kamp outlines in great detail Snowflake’s flawless execution from launch in 2016 to aggressive growth in 2018.

Intermix.io’s fate tells us something about the two philosophies of cost that exist in data. I’ve written about the two philosophies on this blog before, but it’s been interesting to see this dynamic play out in another company’s story. Briefly stated, the two philosophies are:
