Thursday 26 September 2024

Show HN: Structured, GitHub App for Automated DBT PR Reviews https://bit.ly/3Y2l1xh

Show HN: Structured, GitHub App for Automated DBT PR Reviews Scaling data teams today means dealing with the complexity of the modern data stack. While DBT has become a core tool for transforming raw data into structured, analytics-ready tables, most teams are using it in ways that lead to chaos: duplicated models, inconsistent metrics, and inefficient SQL that directly impacts cloud spend. The real issue isn’t with DBT itself; it’s in how it’s applied across teams.

Here’s the typical setup: Finance defines a revenue model, Marketing calculates customer lifetime value, and Product defines churn. All in DBT, but all with slightly different logic, leading to metric fragmentation. This results in data drift, conflicting reports, and a ton of unnecessary engineering time spent reconciling definitions. Worse, engineers end up re-inventing the wheel by duplicating logic that already exists in other models. The inefficiencies don’t stop there: suboptimal SQL patterns (e.g., full-table scans, poor joins) creep into production and drive up cloud costs.

We designed our GitHub App to automate the grunt work of DBT model management, focusing on three key areas: preventing redundant logic, maintaining the semantic layer, and optimizing SQL performance.

(1) Stop Redundant Models: A lot of teams waste time rebuilding models that already exist. Engineers aren’t aware of what’s been built, so they duplicate work. Our app automatically reviews pull requests, flags redundant models, and suggests reusing existing logic. This keeps your key metrics like revenue or churn consistent across teams and prevents conflicting reports.

(2) Maintain the Semantic Layer: DBT’s value is in creating a semantic layer, a consistent definition of business metrics. But as teams scale, maintaining this layer gets tricky. People unknowingly break it with small changes, leading to inconsistencies. Our app checks every new model for deviations from the semantic layer, flagging inconsistencies before they go live. This prevents those all-too-common situations where two departments are debating whose revenue number is right. By ensuring everyone’s using the same definitions, you avoid trust issues with the data.

(3) SQL Performance = Real Costs: Bad SQL isn’t just a performance problem; it’s a cost problem. Inefficient joins, full-table scans, and poorly written SQL in your DBT models can blow up your cloud bill. Our app reviews SQL in pull requests, flags inefficiencies, and suggests optimizations.

Example: An engineer submits a model that joins two large tables without filtering. Our app flags the full-table scan and suggests using indexed columns and adding WHERE filters. This reduces query cost and improves performance before the code hits production.

Data engineers are already stretched thin with the demands of modern data pipelines. By automating model consistency checks, semantic layer enforcement, and SQL performance reviews, our GitHub App frees up your team to focus on higher-impact work rather than wasting cycles on repetitive tasks or fighting fires caused by bad data logic.

The app is live; give it a try, and let us know how it’s improving your workflow. Also, keep an eye out for our upcoming DBT code generation features. We’re automating more of the heavy lifting soon.

https://bit.ly/4eC3fYc September 26, 2024 at 10:28PM
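To make the "join without filtering" example concrete, here is a minimal Python sketch of such a check. It is not the app's actual implementation, which the post does not describe. The function name `flag_unfiltered_joins` and the regex heuristic are assumptions for illustration only; a real reviewer would need a proper SQL parser and knowledge of the warehouse.

```python
import re


def flag_unfiltered_joins(sql: str) -> list[str]:
    """Return warnings for a SQL statement that joins tables with no WHERE clause.

    Hypothetical heuristic: it only looks for the keywords JOIN and WHERE,
    so it can miss subqueries, CTEs, and filters placed in the ON clause.
    """
    # Drop "--" line comments so commented-out keywords are not counted.
    text = re.sub(r"--[^\n]*", "", sql)
    has_join = re.search(r"\bjoin\b", text, flags=re.IGNORECASE) is not None
    has_where = re.search(r"\bwhere\b", text, flags=re.IGNORECASE) is not None

    warnings = []
    if has_join and not has_where:
        warnings.append("join without WHERE filter: possible full-table scan")
    return warnings


if __name__ == "__main__":
    model_sql = "select * from orders o join customers c on o.customer_id = c.id"
    for warning in flag_unfiltered_joins(model_sql):
        print(warning)
```

A check like this would run on the SQL files changed in a pull request; the keyword match is deliberately crude and is only meant to show the shape of the idea.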

Wednesday 25 September 2024

Show HN: Oku – A Web browser with an emphasis on local-first data storage https://bit.ly/4gHOt4a

Show HN: Oku – A Web browser with an emphasis on local-first data storage Hello HN, My name is Emil. I'm a recent graduate, currently unemployed, and I've been spending a lot of my time working on a passion project from my teenage years. When I was younger, I wanted a place on the Web that I could call my own: not a social media page, but a proper site. I was interested in the IndieWeb community for a while, but I grew to believe a P2P alternative to the Web made more sense than having people host federated services. My browser isn't production-ready, but I'm satisfied with the progress so far and would appreciate thoughts & feedback. Thank you! https://bit.ly/3MZQ7QP September 26, 2024 at 02:33AM

Show HN: FastIndex, an open-source search engine indexation tool https://bit.ly/3MXAidB

Show HN: FastIndex, an open-source search engine indexation tool There are a lot of paid search engine indexation tools out there and I wanted to create my own. I've been working as an engineer for over a decade now, and contributing to open source has always been something I wanted to do. So I decided to create FastIndex, an open-source search engine indexation alternative to paid solutions such as TagParrot, URLMonitor, Omega Indexer and many more. Source: https://bit.ly/3MWJhf3 Wiki: https://bit.ly/3N1Pd6q https://bit.ly/3MWJhf3 September 26, 2024 at 02:16AM

Show HN: A new, improved, and open-source clipboard history Chrome extension https://bit.ly/3XWudnI

Show HN: A new, improved, and open-source clipboard history Chrome extension Hey all! I recently built a new, improved, and open-source Chrome extension that lets you access, track, and manage your clipboard's history! It's very useful for things like refactoring code, finding obscure commands, and backing up form data. Existing solutions are either closed-source, slow, riddled with ads, or all of the above. My goal is to make this extension the most trustworthy, performant, and easy-to-use option! Your feedback would be very much appreciated, thanks! Extension: https://bit.ly/3XYvAm7... Repo: https://bit.ly/3XVBLay https://bit.ly/3zBb9lN September 26, 2024 at 12:51AM

Show HN: Public Domain Torrent Site https://bit.ly/3MYyTmU

Show HN: Public Domain Torrent Site I have been working on this site for 10 years. It is a BitTorrent Indexer that uses WebTorrent to make public domain educational media freely available on the Internet. https://bit.ly/3XYtHpA September 25, 2024 at 10:37PM

Tuesday 24 September 2024

Show HN: A toxic conversation with ChatGPT for research https://bit.ly/3THCn0T

Show HN: A toxic conversation with ChatGPT for research I started this conversation with ChatGPT as an exercise to explore how young people can be influenced by their peers toward toxic behavior that may lead them astray. So instead of asking generally, I took a devil's advocate approach. I am actually amazed and impressed by how suggestive ChatGPT is and by how well it understands current youthful traits. Any suggestions for other questions? When you pose a question, let me know what you intend to establish from the response. https://bit.ly/3TJShrI September 25, 2024 at 05:08AM

Show HN: Chrome extension to summarize HN comments using AI and LLMs https://bit.ly/3BjkdfG

Show HN: Chrome extension to summarize HN comments using AI and LLMs Hello! I built this to solve a personal problem: I didn't want to wade through long Hacker News comment threads to get the key takeaways from the discussions. This Chrome extension supports both OpenAI and Ollama (local LLMs) to summarize comments, and it displays the summary in Chrome's side panel. It's open source, as you can see from the link. Feedback appreciated! https://bit.ly/4ezQzRI September 25, 2024 at 02:16AM

Show HN: Broken Hill: A Productionized GCG Attack Tool for Use Against LLMs https://bit.ly/4dnGwxU

Show HN: Broken Hill: A Productionized GCG Attack Tool for Use Against LLMs https://bit.ly/4ddDgFm September 24, 2024 at 11:31PM

Show HN: Oodle – serverless, fully-managed, drop-in replacement for Prometheus https://bit.ly/4efFC7R

Show HN: Oodle – serverless, fully-managed, drop-in replacement for Prometheus Hello HN! My co-founder Vijay and I are excited to open up Oodle for everyone. We used to be observability geeks at Rubrik. Our metrics bills grew about 20x over 4 years. We tried to control spend by getting better visibility and by blocking high-cardinality labels like pod_id, cluster_id, and customer_id, but that made debugging issues complicated. App engineers hated having metrics blocked, and blocking others' code reviews was not fun for platform engineers either! Migrations (and lock-ins) were very painful: the first migration, from Influx to SignalFx, took 6+ months, and the second migration, from Splunk, took over a year.

Oodle is taking a new approach to building a cost-efficient serverless metrics observability platform. It delivers fast performance at high scale. We use a custom storage format on S3, tuned for metrics data. Queries are serverless. The hard part is achieving fast performance while optimizing for cost (every CPU cycle and every byte of storage and memory counts!). We've written about the architecture in more detail on our blog: https://bit.ly/4gA0RTC...

Try out our playground with 13M+ active time series/hr & 13B+ samples/day: https://bit.ly/3BhOYBF

Explore all features with live data via Quick Signup: https://bit.ly/47EawEA
- Instant exploration (<5min): Run one command to stream synthetic metrics to your account
- Easy integration (<15min): Explore with your data from existing Prometheus or OTel setup.

We’d love your feedback! cheers

https://blog.oodle.ai/building-a-high-performance-low-cost-metrics-observability-system/ September 24, 2024 at 01:39PM
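For readers unfamiliar with why labels like pod_id drive metrics cost, here is a small Python sketch of measuring label cardinality, the number of distinct values each label takes across a set of time series. It is not Oodle's code; the function names and the `limit` threshold are assumptions chosen for illustration.

```python
from collections import defaultdict


def label_cardinality(series: list[dict[str, str]]) -> dict[str, int]:
    """Count the distinct values seen for each label name across all series."""
    values: dict[str, set[str]] = defaultdict(set)
    for labels in series:
        for name, value in labels.items():
            values[name].add(value)
    return {name: len(seen) for name, seen in values.items()}


def high_cardinality_labels(series: list[dict[str, str]], limit: int) -> list[str]:
    """Return label names with more than `limit` distinct values, sorted by name."""
    counts = label_cardinality(series)
    return sorted(name for name, count in counts.items() if count > limit)


if __name__ == "__main__":
    # Hypothetical label sets, one dict per time series.
    sample = [
        {"job": "api", "pod_id": "p1"},
        {"job": "api", "pod_id": "p2"},
        {"job": "api", "pod_id": "p3"},
    ]
    print(label_cardinality(sample))
    print(high_cardinality_labels(sample, limit=2))
```

Each distinct combination of label values is a separate time series, so a label with many values multiplies the series count. That is why blocking such labels cuts cost but also removes the detail needed for debugging.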

Show HN: OpenFreeMap – Open-Source Map Hosting https://bit.ly/3BhOHyD

Show HN: OpenFreeMap – Open-Source Map Hosting Hi HN, After 9 years of running my own OpenStreetMap tile server infra for MapHub ( https://bit.ly/3XypM17 ), I've open-sourced it and launched OpenFreeMap. You can either self-host or use our public instance. Everything is open-source, including the full production setup; there’s no 'open-core' model here. Check out the repo ( https://bit.ly/3zBqYc1 ). The map data comes from OpenStreetMap. I also provide weekly full planet downloads in both Btrfs and MBTiles formats. I aim to cover the running costs of the public instance through donations. Looking forward to your feedback. https://bit.ly/4e8VCbW September 24, 2024 at 12:59PM

Show HN: An expression parser supporting multiple types https://bit.ly/3TENoQA

Show HN: An expression parser supporting multiple types This C library is part of a main project aimed at providing a reactive key-value (KV) database. The data is typed (numbers, strings, dates, or booleans) and can include formulas with references to other entries. Clients connected to this database receive a real-time data stream with updates to the subscribed keys, allowing them to react to changes and their dependencies. Essentially, it’s like building a distributed Excel, where data and formulas dynamically update across the system. I couldn’t find any libraries that offered the full set of features I needed for evaluating expressions, so I decided to create my own. This sub-project is open-source and available on GitHub. Feedback is welcome! https://bit.ly/3XTTD5p September 24, 2024 at 11:30AM
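The "distributed Excel" idea described above can be sketched in a few lines of Python. This is an illustration only, not the C library's API: the class `ReactiveStore`, its method names, and the use of Python callables in place of parsed formula expressions are all assumptions.

```python
from typing import Callable, Union

Value = Union[int, float, str, bool]


class ReactiveStore:
    """Toy in-memory key-value store where formula keys update when inputs change.

    Assumptions of this sketch: no cycle detection, dependencies must already
    exist when a formula is set, and a key that depends on several changed
    inputs may be recomputed more than once.
    """

    def __init__(self) -> None:
        self._values: dict[str, Value] = {}
        self._formulas: dict[str, tuple[list[str], Callable[..., Value]]] = {}
        self._subscribers: dict[str, list[Callable[[str, Value], None]]] = {}

    def subscribe(self, key: str, callback: Callable[[str, Value], None]) -> None:
        """Register a callback invoked with (key, new_value) on every change."""
        self._subscribers.setdefault(key, []).append(callback)

    def set_value(self, key: str, value: Value) -> None:
        """Store a plain value, replacing any formula previously bound to the key."""
        self._formulas.pop(key, None)
        self._update(key, value)

    def set_formula(self, key: str, deps: list[str], fn: Callable[..., Value]) -> None:
        """Bind a key to a function of other keys and evaluate it immediately."""
        self._formulas[key] = (deps, fn)
        self._recompute(key)

    def get(self, key: str) -> Value:
        return self._values[key]

    def _recompute(self, key: str) -> None:
        deps, fn = self._formulas[key]
        self._update(key, fn(*(self._values[d] for d in deps)))

    def _update(self, key: str, value: Value) -> None:
        if key in self._values and self._values[key] == value:
            return  # unchanged: nothing to notify or propagate
        self._values[key] = value
        for callback in self._subscribers.get(key, []):
            callback(key, value)
        # Propagate the change to every formula that references this key.
        for other, (deps, _) in self._formulas.items():
            if key in deps:
                self._recompute(other)


if __name__ == "__main__":
    store = ReactiveStore()
    store.subscribe("total", lambda k, v: print(f"{k} changed to {v}"))
    store.set_value("price", 10)
    store.set_value("qty", 3)
    store.set_formula("total", ["price", "qty"], lambda p, q: p * q)
    store.set_value("qty", 4)
```

In the project described, the subscriber callbacks would correspond to connected clients receiving a stream of updates, and the formulas would come from the expression parser rather than from Python functions.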

Monday 23 September 2024

Show HN: Interactive Noise-Level Map https://bit.ly/4gC8Jnx

Show HN: Interactive Noise-Level Map https://bit.ly/4gPlhIz September 23, 2024 at 06:22PM

Show HN: Trakk.js – Real-Time Code Monitoring and Docs Generation https://bit.ly/47zaWw3

Show HN: Trakk.js – Real-Time Code Monitoring and Docs Generation Hello HN, I’ve developed an npm package called trakk-js, designed to help frontend developers while they're developing their applications locally. It provides a clean UI panel that loads on top of your web application and presents all of the following in real time:
- Tracks function calls, errors, requests, and user interactions (with screenshots) with detailed insights.
- Filters unimportant events and shows app stats (slowest/fastest loops, frequent calls, etc.).
- Generates automatic documentation with a single click.

Why use it?
- Debugging: Clean UI to trace your app’s code flow in real-time.
- Onboarding: Newcomers can easily understand app behavior by tracking event flow.
- Documentation: Generates accurate, up-to-date docs automatically after code changes.

It's framework agnostic, works with .js, .jsx, .ts, .tsx files, and doesn’t modify your code. It's currently in test mode so feel free to check it out. Looking forward to your feedback! https://bit.ly/47yoIyV September 23, 2024 at 12:01PM

Sunday 22 September 2024

Show HN: I built a tool to roast landing pages using AI agents https://bit.ly/3Ba6R58

Show HN: I built a tool to roast landing pages using AI agents I built a tool to roast landing pages with AI agents. I was gathering feedback from watching landing page roast videos, and figured out I could prompt LLMs to analyse a screenshot and roast based on the same criteria. It's not 100% accurate yet, but it has been really insightful when I've tested it on my own websites. Let me know what you think! https://bit.ly/3B8mk5K September 23, 2024 at 02:05AM

Show HN: PlayCodeAI – A tool I created to let my kid create their own videogames https://bit.ly/47DT1UI

Show HN: PlayCodeAI – A tool I created to let my kid create their own videogames Hi HN, my kid has been dreaming of making videogames for a while. As you know, the entry bar is high and too many technical details are involved. A few months ago we started making games and tried many different approaches, from dragging blocks to writing code with a high-level library meant to keep things simple, but it was still too much work. Getting on the AI boat, I decided to try it for making games, and it turns out that it works very nicely. Still, many technical concepts are involved: what language should the game be written in? Where do you store the files and assets? How do you run it? I decided to create a simple playground where most of the technical details are already decided. It uses HTML/JavaScript for the portability they provide: we can get the code and see the result right away, and we can just as easily export it and upload it to a server. So far the results are amazing: my kid has managed to create 10 simple games in a few hours with no help at all (the landing page includes screenshots of these). I strongly believe this can help many other people bring their ideas to life. For now the tool is simple: it runs on the client side and requires an OpenAI API key to interact with the models. The drawback is that only paid users can access it; I hope to remove that barrier soon, but this was the simplest and fastest way to get things done. A big limitation relates to image loading: many images have CORS restrictions that ChatGPT can't detect. Games without images aren't fun, and I hope to solve this soon (suggestions welcome). I'd also love to make it simple to share the results with others. Anyway, if your kid wants to make a videogame, I encourage you to try this out. Any feedback will help me polish it so it is usable for others. Thanks. https://bit.ly/3BhumJJ September 22, 2024 at 09:19PM

Show HN: Inbound Email (SMTP) to Webhook https://bit.ly/4ee73yU

Show HN: Inbound Email (SMTP) to Webhook Here's my first (hopefully of many) open source release. A minimal script to receive emails via SMTP, parse content (including headers), store attachments in Amazon S3, and forward email content to a webhook. I use it to power DMARC report storage and email content testing. Some of the big email API providers have inbound APIs but costs can rack up fast if you're using them at scale. Hence why I built this. https://bit.ly/3MVyTnK

Features
- SMTP server to receive emails concurrently
- Parses incoming emails using mailparser
- Uploads attachments to Amazon S3
- Forwards parsed email content to a specified webhook
- Configurable via environment variables
- Handles large attachments gracefully
- Queue system for processing multiple emails and webhook requests simultaneously

https://bit.ly/3MVyTnK September 22, 2024 at 07:19AM
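The parse-then-forward step listed above can be illustrated with Python's standard library. The project itself uses the Node.js mailparser package, so this is a swapped-in sketch rather than the author's code; the function name `email_to_payload` and the payload field names are assumptions. The SMTP listener, the S3 upload, and the HTTP POST to the webhook are left out.

```python
import json
from email import message_from_bytes, policy


def email_to_payload(raw: bytes) -> dict:
    """Parse a raw RFC 5322 message into a JSON-serializable dict.

    Hypothetical payload shape: sender, recipient, subject, all headers,
    the text body, and attachment metadata (not the attachment bytes).
    """
    msg = message_from_bytes(raw, policy=policy.default)

    # Prefer the plain-text body and fall back to HTML when there is none.
    body = msg.get_body(preferencelist=("plain", "html"))
    text = body.get_content() if body is not None else ""

    attachments = [
        {
            "filename": part.get_filename(),
            "content_type": part.get_content_type(),
            "size": len(part.get_payload(decode=True) or b""),
        }
        for part in msg.iter_attachments()
    ]

    return {
        "from": str(msg["From"]),
        "to": str(msg["To"]),
        "subject": str(msg["Subject"]),
        # Repeated header names collapse to the last value in this sketch.
        "headers": {name: str(value) for name, value in msg.items()},
        "text": text,
        "attachments": attachments,
    }


if __name__ == "__main__":
    raw_message = (
        b"From: alice@example.com\r\n"
        b"To: bob@example.com\r\n"
        b"Subject: Hello\r\n"
        b"Content-Type: text/plain; charset=utf-8\r\n"
        b"\r\n"
        b"Hi Bob\r\n"
    )
    # The resulting JSON string is what would be sent as the webhook body.
    print(json.dumps(email_to_payload(raw_message), indent=2))
```

Keeping attachment bytes out of the payload mirrors the design described in the post, where attachments are stored separately and the webhook receives only the parsed content.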

Show HN: Formulaer – Free, simple and clean forms https://bit.ly/4gwfaIT

Show HN: Formulaer – Free, simple and clean forms https://bit.ly/3BmVAys September 22, 2024 at 08:04AM

Saturday 21 September 2024

Show HN: PDF to MD by LLMs – Extract Text/Tables/Image Descriptives by GPT4o https://bit.ly/4dio75H

Show HN: PDF to MD by LLMs – Extract Text/Tables/Image Descriptives by GPT4o I've developed a Python API service that uses GPT-4o for OCR on PDFs. It features parallel processing and batch handling for improved performance. Not only does it convert PDF to markdown, but it also describes the images within the PDF using captions like `[Image: This picture shows 4 people waving]`. In testing with NASA's Apollo 17 flight documents, it successfully converted complex, multi-oriented pages into well-structured Markdown. The project is open-source and available on GitHub. Feedback is welcome. https://bit.ly/3MSbO58 September 22, 2024 at 03:05AM

Show HN: A tool to easily convert any web page into an eBook https://bit.ly/47CBjkt

Show HN: A tool to easily convert any web page into an eBook I’ve built a tool that allows you to convert any web content into an ebook with just a few clicks. The idea came from my need to frequently save articles or web pages as ebooks for offline reading and sharing. However, I found existing tools either too complex or lacking flexibility, so I decided to create something simpler, faster, and that supports both EPUB and PDF formats.

Key Features:
- Quick Conversion: Input a list of URLs, with each URL treated as a chapter, and the tool will automatically generate an ebook in either EPUB or PDF format.
- Content Optimization: It automatically optimizes the content for better structure and readability in ebook format.
- Multilingual Support: It works with web pages in multiple languages.
- EPUB or PDF Output: You can choose the format you prefer.
- Send Directly to Kindle: You can send the generated ebook directly to your Kindle device from Ebookany.

I’d love to get feedback from the community to help improve the tool and add new features. Feel free to try the tool here: https://bit.ly/3MSTCZf . Thank you for reading, and I look forward to your thoughts! https://bit.ly/3MSTCZf September 21, 2024 at 07:59AM

Friday 20 September 2024

Show HN: Container Desktop – Podman Desktop Companion https://bit.ly/3XTvlJ3

Show HN: Container Desktop – Podman Desktop Companion https://bit.ly/4enTay8 September 20, 2024 at 07:08PM