Thursday 21 April 2022

Show HN: Dassana. A JSON-native, schema-less logging solution built atop ClickHouse https://bit.ly/3L8fhdm

Hello HN, I'm Gaurav, founder and CEO of Dassana. We are coming out of stealth today and would like to invite the community to give us a try: https://bit.ly/3k020rv

First, a bit of backstory. I grew up with grep to search log files; the kind of person whose grep was aliased to grep -i. Then along came Splunk. It was a game-changer. For every single start-up I founded (there are a few) I used Splunk, and quite often we would run out of our ingestion quota. SumoLogic wasn't cheaper either, so we looked into DataDog. It was good until we started running into issues with aggregate queries (facets etc.), rehydration took forever, and the overall query experience was not fun (it wasn't fun with Splunk and SumoLogic either).

All these experiences over the last two decades led me to wish for a simple solution where I could just throw in a bunch of JSON/CSV data and query it with plain SQL. These days most logs are structured to begin with, and the complexity of parsing logs to extract fields has moved to log shippers such as Fluentd, Logstash, etc.

Enter HackerNews and ClickHouse. I first learned about ClickHouse from HackerNews and was completely floored by its performance. Given its performance and the storage savings from columnar storage, it was an obvious choice to build a logging solution on top of. As we started a POC with it, it became obvious that it would be a perfect fit for us if we could solve the problem of schema management. That is what we have been working on over the last six months or so. We designed a storage scheme that flattens JSON objects, and we expose a SQL interface that takes a SQL query and converts it into a query against our schema-less tables. Being JSON-native, we allow querying specific JSON objects inside arrays. This is something that is not possible with many logging vendors, and if you use something like Athena, good luck figuring out the query: it is possible, but quite complicated. Here is a sample query:

    select count(distinct eventName) from aws_cloudtrail where awsRegion = 'us-east-1'

Also, there are no indices, fields, facets, etc. in Dassana. You just send JSON/CSV logs and you query them with zero indexing latency.
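To make the flattening idea concrete, here is a toy sketch in Python of one way nested JSON can be flattened into dotted column paths. This is purely illustrative; it is not our actual storage scheme, and the path convention here (dots for objects, bracketed indices for arrays) is just for the example:

    # Toy sketch: flatten a nested JSON log into dotted column paths.
    # NOTE: illustrative only -- not Dassana's actual storage scheme.
    def flatten(obj, prefix=""):
        out = {}
        if isinstance(obj, dict):
            for key, value in obj.items():
                out.update(flatten(value, f"{prefix}{key}."))
        elif isinstance(obj, list):
            for i, value in enumerate(obj):
                out.update(flatten(value, f"{prefix}[{i}]."))
        else:
            out[prefix.rstrip(".")] = obj  # leaf value: drop the trailing dot
        return out

    event = {
        "eventName": "AssumeRole",
        "awsRegion": "us-east-1",
        "resources": [{"type": "AWS::IAM::Role", "arn": "arn:aws:iam::..."}],
    }
    print(flatten(event))
    # {'eventName': 'AssumeRole', 'awsRegion': 'us-east-1',
    #  'resources.[0].type': 'AWS::IAM::Role', 'resources.[0].arn': 'arn:aws:iam::...'}

Once every leaf has a stable path like this, a predicate on a nested field becomes an ordinary column filter, which is what lets a SQL query be rewritten against schema-less storage.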
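To give a feel for the "just send JSON, then query it" workflow, here is a hypothetical sketch. The endpoint URLs, header, and token below are placeholders made up for illustration, not our real API; only the overall shape (POST newline-delimited JSON, then query with SQL) reflects the model described above:

    # Hypothetical ingestion sketch -- endpoints and token are placeholders,
    # not Dassana's real API. Uses the standard requests library.
    import json
    import requests

    events = [
        {"eventName": "ConsoleLogin", "awsRegion": "us-east-1"},
        {"eventName": "AssumeRole", "awsRegion": "us-west-2"},
    ]

    # Ship newline-delimited JSON to a (made-up) ingestion endpoint.
    requests.post(
        "https://ingest.example.dassana.io/v1/logs/aws_cloudtrail",  # placeholder URL
        headers={"Authorization": "Bearer <YOUR-TOKEN>"},
        data="\n".join(json.dumps(e) for e in events),
    )

    # ...and query it back with plain SQL (again, a placeholder endpoint).
    resp = requests.post(
        "https://query.example.dassana.io/v1/sql",  # placeholder URL
        headers={"Authorization": "Bearer <YOUR-TOKEN>"},
        json={"sql": "select count(distinct eventName) from aws_cloudtrail"},
    )
    print(resp.json())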
And yes, we do support distributed joins among different data sources (we call them apps). Like any other distributed system it has limitations, but it generally works great for almost all log-related use cases.

One amazing side effect of what we built is that we can offer a unique pricing model that is a perfect match for logging data. Generally speaking, log queries tend to be specific: there is always some sort of predicate, such as a username, a hostname, or an IP address. But these queries run over large volumes of data. As such, they run insanely fast on our system, and we are able to charge separately for queries and reduce the cost of ingestion dramatically. In general, we expect our solution to be about 10x cheaper (and 10x faster) than other logging systems.

When not to use Dassana? It is not suitable for unstructured data. We don't offer full-text search (FTS) yet; we are more like a database for logs than a Lucene index over text files. With more and more people starting to use structured logs, this problem will go away on its own, but as I said, we do plan to offer FTS in the future. Note that you can already use log shippers such as Fluentd, Vector, Logstash, etc. to give structure to your logs.
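For readers who haven't used a log shipper, the transformation they perform is roughly the following: parse a raw text line into structured JSON before it is shipped. This is a generic Python illustration; the regex and field names are mine, not any particular shipper's built-in parser or configuration:

    # Sketch of what a log shipper does: turn a raw text line into
    # structured JSON. The pattern and field names are illustrative.
    import json
    import re

    LINE = '127.0.0.1 - alice [21/Apr/2022:17:16:02 +0000] "GET /health HTTP/1.1" 200'

    PATTERN = re.compile(
        r'(?P<ip>\S+) - (?P<user>\S+) \[(?P<ts>[^\]]+)\] "(?P<request>[^"]+)" (?P<status>\d+)'
    )

    match = PATTERN.match(LINE)
    if match:
        record = match.groupdict()
        record["status"] = int(record["status"])  # give numeric fields real types
        print(json.dumps(record))
    # {"ip": "127.0.0.1", "user": "alice", "ts": "21/Apr/2022:17:16:02 +0000",
    #  "request": "GET /health HTTP/1.1", "status": 200}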

What's next?

1. Grafana plugin. Here is a sneak preview: https://bit.ly/3L3LNxb
2. Alerting/Slack notifications. You will be able to save queries and get Slack notifications when results match.
3. JDBC driver.
4. TBD. You tell us what to build. Email me and I will personally follow up with you: gk 8 dassana dot input/output

I will be online all day today, happy to answer any questions. Feel free to reach out by email too.

April 21, 2022 at 05:16PM