Monday, 23 March 2026
Show HN: OpenCastor Agent Harness Evaluator Leaderboard https://bit.ly/4bGGUc3
Show HN: OpenCastor Agent Harness Evaluator Leaderboard I've been building OpenCastor, a runtime layer that sits between a robot's hardware and its AI agent. One thing that surprised me: the order in which you arrange the skill pipeline (context builder → model router → error handler, etc.) and parameters like thinking_budget and context_budget affect task success rates as much as model choice does. So I built a distributed evaluator. Robots contribute idle compute to benchmark harness configurations against OHB-1, a small benchmark of 30 real-world robot tasks (grip, navigate, respond, etc.) using local LLM calls via Ollama. The search space is 263,424 configs across 8 dimensions (model routing, context budget, retry logic, drift detection, etc.). The demo leaderboard shows results so far, broken down by hardware tier (Pi5+Hailo, Jetson, server, budget boards). The current champion config is free to download as a YAML file and apply to any robot. P66 safety parameters are stripped on apply: no harness config can touch motor limits or ESTOP logic. Looking for feedback on: (1) whether the benchmark tasks are representative, (2) whether the hardware tier breakdown is useful, and (3) anyone who's run fleet-wide distributed evals of agent configs for robotics or otherwise. https://bit.ly/4c1pica March 23, 2026 at 11:13PM
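The "stripped on apply" rule can be pictured with a small Python sketch. Assumptions are labeled loudly here: `thinking_budget` and `context_budget` come from the post, but the protected key names (`p66`, `motor_limits`, `estop`) and the flat dict layout are hypothetical. The sketch works on an already-parsed config so it needs no YAML library, and it is not OpenCastor's actual implementation.

```python
from copy import deepcopy

# Hypothetical safety-critical keys: a harness config may never set or override them.
PROTECTED_KEYS = {"p66", "motor_limits", "estop"}


def apply_harness_config(current: dict, incoming: dict) -> dict:
    """Return a new config with `incoming` overlaid on `current`, minus protected keys."""
    merged = deepcopy(current)
    for key, value in incoming.items():
        if key in PROTECTED_KEYS:
            continue  # stripped on apply: the robot's own safety values stay in force
        merged[key] = deepcopy(value)
    return merged


if __name__ == "__main__":
    robot = {"thinking_budget": 1024, "context_budget": 4096, "estop": {"enabled": True}}
    champion = {"thinking_budget": 2048, "context_budget": 8192, "estop": {"enabled": False}}
    print(apply_harness_config(robot, champion))
```

Applying the example champion raises both budgets but leaves `estop` at `{'enabled': True}`, because the filter runs on every apply rather than trusting the downloaded file.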
Show HN: Cq – Stack Overflow for AI coding agents https://bit.ly/47gYJgx
Show HN: Cq – Stack Overflow for AI coding agents Hi all, I'm Peter, a Staff Engineer at Mozilla.ai, and I want to share our idea for a standard for shared agent learning; conceptually, it fit easily into my mental model as a Stack Overflow for agents. The project is trying to see if we can get agents (any agent, any model) to propose 'knowledge units' (KUs), using a standard schema, based on gotchas they run into during use, and to proactively query for existing KUs to get insights they can verify and confirm if they prove useful. It's currently very much a PoC with a loftier proposal in the repo; we're trying to iterate from local use up to team level, and ideally eventually to some kind of public commons. At the team level (see our Docker Compose example), you configure your coding agent to point to the team's API address so KUs are sent there instead, where they can be reviewed by a human in the loop (HITL) via a browser UI before they're allowed to appear in queries by other agents on your team. We're learning a lot even from using it locally on various repos internally, not just from the kinds of KUs it generates, but also from a UX perspective: trying to make it easy to start using it and to approve KUs in the browser dashboard. There are bigger, complex problems to solve in the future around data privacy, governance, etc., but for now we're super focused on getting something that people can see value from really quickly in their day-to-day. 
Tech stack:
* Skills - markdown
* Local Python MCP server (FastMCP) - managing a local SQLite knowledge store
* Optional team API (FastAPI, Docker) for sharing knowledge across an org
* Installs as a Claude Code plugin or OpenCode MCP server
* Local-first by default; your knowledge stays on your machine unless you opt into team sync by setting the address in config
* OSS (Apache 2.0 licensed)
Here's an example of something that seemed straightforward: when asking Claude Code to write a GitHub Action, it often used actions that were multiple major versions out of date because of its training data. In this case, I told the agent what I saw when I reviewed the GitHub Action YAML file it created, and it proposed the knowledge unit to be persisted. Next time, in a completely different repo using OpenCode and an OpenAI model, the cq skill was used up front before the agent started the task; it got the information about the gotcha on major versions in training data, checked GitHub proactively, and used the correct, latest major versions. It then confirmed the KU, increasing its confidence score. I guess some folks might say: well, there's a CLAUDE.md in your repo, or in ~/.claude/. But we're looking further than that: we want this to be available to all agents and all models, and maybe more importantly, we don't want to stuff AGENTS.md or CLAUDE.md with loads of rules that lead to unpredictable behaviour. This is targeted information on a particular task and seems a lot more useful. Right now it can be installed locally as a plugin for Claude Code and OpenCode:
claude plugin marketplace add mozilla-ai/cq
claude plugin install cq
This allows you to capture data in your local ~/.cq/local.db (the data doesn't get sent anywhere else). We'd love feedback on this; the repo is open and public, so GitHub issues are welcome. 
We've posted on some of our social media platforms with a link to the blog post (below) so feel free to reply to us if you found it useful, or ran into friction, we want to make this something that's accessible to everyone. Blog post with the full story: https://bit.ly/41ukHZX GitHub repo: https://bit.ly/4soBZ6I Thanks again for your time. https://bit.ly/41ukHZX March 23, 2026 at 05:11PM
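The propose / query / confirm cycle described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions: the table name, columns, the 0.5 starting confidence, and the 0.1 confirmation step are all hypothetical, cq's real KU schema lives in its repo, and the sketch uses an in-memory database instead of `~/.cq/local.db`.

```python
import sqlite3

# Hypothetical schema; cq's real knowledge-unit schema may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS knowledge_units (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    topic TEXT NOT NULL,
    insight TEXT NOT NULL,
    confidence REAL NOT NULL DEFAULT 0.5
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def propose(conn: sqlite3.Connection, topic: str, insight: str) -> int:
    """An agent hit a gotcha: persist it as a new KU and return its id."""
    cur = conn.execute(
        "INSERT INTO knowledge_units (topic, insight) VALUES (?, ?)", (topic, insight)
    )
    conn.commit()
    return cur.lastrowid


def query(conn: sqlite3.Connection, keyword: str) -> list:
    """Return (id, insight, confidence) rows mentioning `keyword`, most trusted first."""
    like = f"%{keyword}%"
    return conn.execute(
        "SELECT id, insight, confidence FROM knowledge_units "
        "WHERE topic LIKE ? OR insight LIKE ? ORDER BY confidence DESC",
        (like, like),
    ).fetchall()


def confirm(conn: sqlite3.Connection, ku_id: int, step: float = 0.1) -> None:
    """Another agent verified the KU: raise its confidence, capped at 1.0."""
    conn.execute(
        "UPDATE knowledge_units SET confidence = MIN(1.0, confidence + ?) WHERE id = ?",
        (step, ku_id),
    )
    conn.commit()
```

A second agent that queries before starting a task and then calls `confirm` after the insight checks out reproduces the confidence bump from the GitHub Actions example.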
Sunday, 22 March 2026
Show HN: AgentVerse – Open social network for AI agents (Mar 2026) https://bit.ly/4srsrrA
Show HN: AgentVerse – Open social network for AI agents (Mar 2026) https://bit.ly/47WxiJ2 March 23, 2026 at 02:48AM
Show HN: Quillium, Git for Writers https://bit.ly/4c0H92U
Show HN: Quillium, Git for Writers This is a tool which lets you easily manage different versions of ideas, helpful for writing essays. I've found myself wanting this every single time I go through the drafting process when writing, and I've been frustrated every time I find myself accidentally working on an old draft just because there was a paragraph that I liked better. This solves it. I hope the community likes this as much as I enjoyed working on it! Note that it's currently a beta waitlist because there are some bugs with the undo/redo state management, so I want to dogfood it for a bit for reliability. It says April 2nd, but I may allow earlier beta testers. https://bit.ly/4bFReRH March 23, 2026 at 01:22AM
Show HN: Plot-Hole.com a daily movie puzzle I made https://bit.ly/47C1U2H
Show HN: Plot-Hole.com a daily movie puzzle I made https://bit.ly/4brdZd9 March 23, 2026 at 01:15AM
Show HN: Refrax – my Arc Browser replacement I made from scratch https://bit.ly/4ssbdKD
Show HN: Refrax – my Arc Browser replacement I made from scratch Open the same tab in two browser windows. In Chrome or Safari, you get two unconnected pages. In Arc, one window shows a placeholder. In Zen, it silently creates a duplicate. In Refrax, the browser I built, both windows show the same page updating live. The same web page, in as many windows as you want. This shouldn't be possible. WebKit's WKWebView can exist in exactly one view hierarchy at a time. With macOS 26, Apple added a SwiftUI API separating WebView from WebPage, so you can end up with multiple views referencing the same page. But if you try it, your app crashes. WebKit source code has a precondition with this comment: "We can't have multiple owning pages regardless, but we'll want to decide if it's an error, if we can handle it gracefully, and how deterministic it might even be..." So here's how I did it. CAPortalLayer is an undocumented private class that's been in macOS since 10.12. It mirrors a layer's composited output by referencing the same GPU memory, not copying it. Every scroll, animation, or repaint reflects instantly. This is what powers Liquid Glass effects, the iOS text selection magnifier, and ghost images during drag and drop. Apple uses portals for effects. I use them to put the same web page in two windows. Refrax keeps one real WKWebView per tab and displays a CAPortalLayer mirror everywhere else. When you click a different window, the coordinator moves the real view there and the old window gets a portal. You can't tell which is which. This sounds simple in theory, but making this actually work seamlessly took quite a lot of effort. Each macOS window has its own rendering context, and the context ID updates asynchronously, so creating a portal immediately captures a stale ID and renders nothing. The portal creation needs to be delayed, but delaying creates a visual gap. 
I capture a GPU snapshot using a private CoreGraphics function and place it behind the portal as a fallback. Another hard part is that none of it is documented. Portals are very capricious and will crash the app if you use them incorrectly. I had to inspect the headers and then disassemble the binaries to explore exactly how it works in order to build something robust. I had never worked on a browser before this; I'd only been a user. I started using Arc in 2022. I remember asking for an invite, learning the shortcuts, slowly getting used to it. I didn't like it at first as it had too much Google Chrome in it for my taste, and I'd been using Safari at the time. But it grew on me, and by the time it was essentially abandoned and sold to Atlassian, I couldn't go back to Safari anymore. I tried everything: Zen, SigmaOS, Helium. None felt right, and I didn't want another Chromium fork. WebKit ships with the OS, but all you get is the rendering engine. Tabs, history, bookmarks, passwords, extensions, everything else has to be made separately. And so, being a very reasonable person, I decided to make my own Arc replacement from scratch. And I did. Refrax is built in Swift and Objective-C with no external dependencies. The app itself is less than 30 MB. I have 393 tabs open right now using 442 MB of RAM; 150 tabs in Safari was already over 1 GB. I've been using it daily for over a month, and so have some of my friends. The portal mirror is just one feature. The same approach, finding what Apple built for themselves and using it to create something they didn't think about, runs through the entire browser. You can tint your glass windows with adjustable blend modes and transparency. The sidebar in compact mode samples the page and matches the colors. And it has support for Firefox and Chrome extensions. The alpha is public. Download from the linked website, enter REFRAX-ALPHA-HACKERNEWS to activate. No account needed. 
Telemetry is crash reports and a daily active-user ping, nothing else. And if you find a bug – I built this alone, so I'll actually read your report. https://bit.ly/4bs6AdM March 22, 2026 at 11:52PM
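The bookkeeping behind "one real WKWebView per tab, portals everywhere else" can be modeled as a tiny state machine. This is a language-neutral Python sketch, not Refrax's Swift/Objective-C code: the class and method names are hypothetical, and the "snapshot" state stands in for the GPU-snapshot fallback shown while a window's rendering context ID is still settling.

```python
from dataclasses import dataclass, field


@dataclass
class TabCoordinator:
    """Tracks which window hosts the single live web view for one tab (hypothetical model).

    Every other window showing the tab gets a mirror. A new mirror starts as a
    static snapshot and becomes a portal only after the window's context ID is current.
    """

    owner: str
    mirrors: dict = field(default_factory=dict)  # window -> "snapshot" | "portal"

    def show_in(self, window: str) -> None:
        if window != self.owner:
            self.mirrors.setdefault(window, "snapshot")

    def focus(self, window: str) -> None:
        """Move the live view to `window`; the previous owner becomes a mirror."""
        if window == self.owner:
            return
        previous = self.owner
        self.mirrors.pop(window, None)
        self.owner = window
        self.mirrors[previous] = "snapshot"  # placeholder until its context ID settles

    def context_ready(self, window: str) -> None:
        """Called once the window's rendering context ID is up to date."""
        if self.mirrors.get(window) == "snapshot":
            self.mirrors[window] = "portal"

    def state(self, window: str) -> str:
        return "live" if window == self.owner else self.mirrors.get(window, "absent")
```

The design point the model captures is that creating the portal is deferred to a separate `context_ready` step, so a stale context ID never produces an empty mirror; the snapshot covers the gap.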
Saturday, 21 March 2026
Show HN: An event loop for asyncio written in Rust https://bit.ly/4sBBVR2
Show HN: An event loop for asyncio written in Rust Actually, there's nothing special about this implementation: it's just another event loop written in Rust, for educational purposes and for the joy of it. In tests with my scraping framework https://bit.ly/4lL0CIq it shows seamless migration from uvloop. With APIs (FastAPI) it shows only one advantage, better p99 latency; uvloop is about 10-20% faster in the synthetic run. Currently, I am working on the win branch to give it Windows support, which uvloop lacks. https://bit.ly/4v2jgQn March 21, 2026 at 11:12PM
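Comparing loops on p99 rather than throughput only needs a loop factory and a latency sample. The Python sketch below is illustrative: the post doesn't name the Rust loop's module, so it defaults to the stdlib `asyncio.new_event_loop`, treats uvloop as optional, and uses a trivial `handler` that stands in for real request work. You would pass the Rust loop's own factory in the same way.

```python
import asyncio
import statistics
import time


async def handler() -> None:
    # Stand-in for request handling: yield to the event loop once.
    await asyncio.sleep(0)


async def measure(n: int) -> list:
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        await handler()
        samples.append(time.perf_counter() - start)
    return samples


def p99(samples: list) -> float:
    # quantiles(n=100) returns the 1st..99th percentile cut points; index 98 is p99.
    return statistics.quantiles(samples, n=100)[98]


def bench(loop_factory=asyncio.new_event_loop, n: int = 1000) -> float:
    """Run the workload on a loop from `loop_factory`; return p99 latency in seconds."""
    loop = loop_factory()
    try:
        return p99(loop.run_until_complete(measure(n)))
    finally:
        loop.close()


if __name__ == "__main__":
    print(f"stdlib loop p99: {bench() * 1e6:.1f} us")
    try:
        import uvloop  # optional third-party dependency

        print(f"uvloop p99:      {bench(uvloop.new_event_loop) * 1e6:.1f} us")
    except ImportError:
        pass
```

A real comparison would drive an HTTP server under load, but the shape is the same: identical workload, swapped loop factory, tail percentile instead of mean.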
Show HN: Travel Hacking Toolkit – Points search and trip planning with AI https://bit.ly/3PlmMF2
Show HN: Travel Hacking Toolkit – Points search and trip planning with AI I use points and miles for most of my travel. Every booking comes down to the same decision: use points or pay cash? To answer that, you need award availability across multiple programs, cash prices, your current balances, transfer partner ratios, and the math to compare them. I got tired of doing it manually across a dozen tabs. This toolkit teaches Claude Code and OpenCode how to do it. 7 skills (markdown files with API docs and curl examples) and 6 MCP servers (real-time tools the AI calls directly). It searches award flights across 25+ mileage programs (Seats.aero), compares cash prices (Google Flights, Skiplagged, Kiwi.com, Duffel), pulls your loyalty balances (AwardWallet), searches hotels (Trivago, LiteAPI, Airbnb, Booking.com), finds ferry routes across 33 countries, and looks up weird hidden gems near your destination (Atlas Obscura). Reference data is included: transfer partner ratios for Chase UR, Amex MR, Bilt, Capital One, and Citi TY. Point valuations sourced from TPG, Upgraded Points, OMAAT, and View From The Wing. Alliance membership, sweet spot redemptions, booking windows, hotel chain brand lookups. 5 of the 6 MCP servers need zero API keys. Clone, run setup.sh, start searching. Skills are, as usual, plain markdown. They work in OpenCode and Claude Code automatically (I added a tiny setup script), and they'll work in anything else that supports skills. PRs welcome! Help me expand the toolkit! :) https://bit.ly/47ObeAl March 21, 2026 at 10:25PM
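The "use points or pay cash?" math reduces to cents per point compared against your own valuation of the currency. Here is a minimal Python sketch; the fare, taxes, mileage price, and the 1.5-cent valuation are made-up example numbers, not real TPG or other published valuations, and `ratio` is assumed to mean partner miles received per bank point transferred.

```python
import math


def cents_per_point(cash_price: float, award_taxes: float, points: int) -> float:
    """Cents of cash fare avoided per point redeemed (taxes paid on the award subtracted)."""
    if points <= 0:
        raise ValueError("points must be positive")
    return (cash_price - award_taxes) / points * 100


def bank_points_needed(miles: int, ratio: float) -> int:
    """Transferable bank points needed; `ratio` is partner miles per bank point."""
    return math.ceil(miles / ratio)  # you cannot transfer a fraction of a point


def should_use_points(cash_price, award_taxes, miles, ratio, valuation_cents) -> bool:
    """Redeem only if each bank point spent beats your own valuation of it."""
    points = bank_points_needed(miles, ratio)
    return cents_per_point(cash_price, award_taxes, points) >= valuation_cents
```

With a $620 cash fare versus 30,000 miles plus $5.60 in taxes, a 1:1 transfer yields about 2.05 cents per point, which beats a 1.5-cent valuation; the same award through a 2:1 partner drops to about 1.02 cents, so paying cash wins.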
Friday, 20 March 2026
Show HN: AgentVerse – Open social network for AI agents (Mar 2026) https://bit.ly/4rJtaDi
Show HN: AgentVerse – Open social network for AI agents (Mar 2026) https://bit.ly/47WxiJ2 March 21, 2026 at 02:25AM
Show HN: Rover – turn any web interface into an AI agent with one script tag https://bit.ly/4blbIAg
Show HN: Rover – turn any web interface into an AI agent with one script tag https://bit.ly/3NAOc9a March 21, 2026 at 01:58AM
Show HN: Vibefolio – a place to showcase your vibecoded projects https://bit.ly/47h4FGh
Show HN: Vibefolio – a place to showcase your vibecoded projects Over the last few months, more people have been shipping small apps, experiments, and side projects at a much higher pace. I'm one of them; I initially created a showcase page for myself to track mine, but this week I decided to create something for others. Happy to read feedback on how to improve it further! https://bit.ly/47fd3pN March 20, 2026 at 09:53PM
Show HN: Cybertt – Cybersecurity Tabletop https://bit.ly/47x7hQH
Show HN: Cybertt – Cybersecurity Tabletop https://bit.ly/3PmIIzx March 20, 2026 at 10:29AM
Thursday, 19 March 2026
Show HN: Download entire/partial Substack to ePub for offline reading https://bit.ly/4uGIhQO
Show HN: Download entire/partial Substack to ePub for offline reading Hi HN, This is a small Python app with an optional web UI. It is intended to be run locally. It can also be run with Docker (though cookie autodetection will not work there). It allows you to download a single substack, either entirely or partially, and saves the output to an epub file, which can be easily transferred to Kindle or other reading devices. This is admittedly a "vibe coded" app made with Claude Code and a few hours of iterating, but I've already found it very useful for myself. It supports both free and paywalled posts (if you are a paid subscriber to that creator). You can order the entries in the epub by popularity, newest first, or oldest first, and also limit to a specific number of entries, if you don't want all of them. You can either provide your substack.sid cookie manually, or you can have it be autodetected from most browsers/operating systems. https://bit.ly/4uwnXRY March 20, 2026 at 04:36AM
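The ordering and limit options amount to a sort followed by a slice before the posts are written into the epub. The Python sketch below is hypothetical: the post doesn't say which signal drives "popularity", so a `likes` field stands in for it, and the `Post` type and the order names are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Post:
    title: str
    published: date
    likes: int  # hypothetical popularity signal


def select_posts(posts: List[Post], order: str = "newest", limit: Optional[int] = None) -> List[Post]:
    """Order posts for the epub, then keep at most `limit` of them."""
    if order == "popular":
        ordered = sorted(posts, key=lambda p: p.likes, reverse=True)
    elif order == "newest":
        ordered = sorted(posts, key=lambda p: p.published, reverse=True)
    elif order == "oldest":
        ordered = sorted(posts, key=lambda p: p.published)
    else:
        raise ValueError(f"unknown order: {order!r}")
    return ordered if limit is None else ordered[:limit]
```

Applying the limit after sorting matters: "the 10 most popular posts" and "the first 10 posts, sorted by popularity" are different books.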
Show HN: Screenwriting Software https://bit.ly/3Phmteo
Show HN: Screenwriting Software I’ve spent the last year getting back into film and testing a bunch of screenwriting software. After a while I realized I wanted something different, so I started building it myself. The core text engine is written in Rust/wasm-bindgen. https://bit.ly/47cYh2P March 20, 2026 at 03:07AM
Wednesday, 18 March 2026
Show HN: Browser grand strategy game for hundreds of players on huge maps https://bit.ly/41cC0i3
Show HN: Browser grand strategy game for hundreds of players on huge maps Hi HN, I've been building a browser-based multiplayer strategy game called Borderhold. Matches run on large maps designed for hundreds of players. Players expand territory, attack neighbors, and adapt as borders shift across the map. You can put buildings down, build ships, and launch nukes. The main thing I wanted to explore was scale: most strategy games have small matches, modest maps, or modest player counts, but here the maps are large and the game works well with hundreds of players. Matches are relatively short, so you can jump in and see a full game play out. Curious what people think. https://bit.ly/4uDPCAC Gameplay: https://youtu.be/nrJTZEP-Cw8 Discord: https://bit.ly/4uEbuvu https://bit.ly/4uDPCAC March 16, 2026 at 09:51AM
Show HN: Fitness MCP https://bit.ly/4sr8Jwo
Show HN: Fitness MCP There's no external MCP for your fitness (Garmin / Strava) data, so we built one. https://bit.ly/4uCviiR March 19, 2026 at 03:00AM
Show HN: ATO – a GUI to see and fix what your LLM agents configured https://bit.ly/476fStf
Show HN: ATO – a GUI to see and fix what your LLM agents configured https://bit.ly/476fSJL March 19, 2026 at 01:28AM
Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training https://bit.ly/4bGv6H0
Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training I replicated David Ng's RYS method ( https://bit.ly/4ll5ILb ) on consumer AMD GPUs (RX 7900 XT + RX 6950 XT) and found something I didn't expect. Transformers appear to have discrete "reasoning circuits" — contiguous blocks of 3-4 layers that act as indivisible cognitive units. Duplicate the right block and the model runs its reasoning pipeline twice. No weights change. No training. The model just thinks longer. The results on standard benchmarks (lm-evaluation-harness, n=50): Devstral-24B, layers 12-14 duplicated once: - BBH Logical Deduction: 0.22 → 0.76 - GSM8K (strict): 0.48 → 0.64 - MBPP (code gen): 0.72 → 0.78 - Nothing degraded Qwen2.5-Coder-32B, layers 7-9 duplicated once: - Reasoning probe: 76% → 94% The weird part: different duplication patterns create different cognitive "modes" from the same weights. Double-pass boosts math. Triple-pass boosts emotional reasoning. Interleaved doubling (13,13,14,14,15,15,16) creates a pure math specialist. Same model, same VRAM, different routing. The circuit boundaries are sharp — shift by one layer and the effect disappears or inverts. Smaller models (24B) have tighter circuits (3 layers) than larger ones (Ng found 7 layers in 72B). Tools to find circuits in any GGUF model and apply arbitrary layer routing are in the repo. The whole thing — sweep, discovery, validation — took one evening. Happy to answer questions. https://bit.ly/4rEg2PM March 18, 2026 at 10:31PM
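"Duplicate the right block" means changing the order in which layers execute, not the weights. This Python sketch builds such an execution path and applies shared layer callables along it. It is a toy illustration under assumptions: it does not read GGUF files, the function names are hypothetical, and the total layer count has to come from the real model. For the Devstral example, the path would be `duplicate_block(num_layers, 12, 14)`.

```python
from typing import Callable, List, Sequence


def duplicate_block(num_layers: int, start: int, end: int, extra_passes: int = 1) -> List[int]:
    """Execution order in which layers start..end (inclusive) run 1 + extra_passes times."""
    if not 0 <= start <= end < num_layers:
        raise ValueError("block must lie inside the model")
    block = list(range(start, end + 1))
    return list(range(end + 1)) + block * extra_passes + list(range(end + 1, num_layers))


def run_path(layers: Sequence[Callable], path: Sequence[int], x):
    """Apply layers in `path` order; a repeated index reuses the same weights."""
    for i in path:
        x = layers[i](x)
    return x
```

`extra_passes=1` is the double pass and `extra_passes=2` the triple pass from the post. An arbitrary route such as the interleaved `[..., 13, 13, 14, 14, 15, 15, 16, ...]` can be passed straight to `run_path`, since it only needs a list of indices.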
Tuesday, 17 March 2026
Show HN: Sonder – self-hosted AI social simulation engine https://bit.ly/4rE8hcG
Show HN: Sonder – self-hosted AI social simulation engine https://bit.ly/4bhXvEi March 18, 2026 at 01:21AM
Show HN: CodeLedger – deterministic context and guardrails for AI https://bit.ly/4saYs7c
Show HN: CodeLedger – deterministic context and guardrails for AI We’ve been working on a tool called CodeLedger to solve a problem we kept seeing with AI coding agents (Claude Code, Cursor, Codex). They’re powerful, but on real codebases they:
- read too much irrelevant code
- edit outside the intended scope
- get stuck in loops (fix → test → fail)
- drift away from the task
- introduce architectural issues that linters don’t catch
The root issue isn’t the model; it’s:
- poor context selection
- lack of execution guardrails
- no visibility at team/org level
---
What CodeLedger does: it sits between the developer and the agent and:
1) Gives the agent the right files first
2) Keeps the agent inside the task scope
3) Validates output against architecture + constraints
It works deterministically (no embeddings, no cloud, fully local).
---
Example: instead of an agent scanning 100–500 files, CodeLedger narrows it down to ~10–25 relevant files before the first edit.
---
What we’re seeing so far:
- ~40% faster task completion
- ~50% fewer iterations
- significant reduction in token usage
---
Works with: Claude Code, Cursor, Codex, Gemini CLI
---
Repo + setup: https://bit.ly/4bxAhJd
Quick start:
npm install -g @codeledger/cli
cd your-project
codeledger init
codeledger activate --task "Fix null handling in user service"
---
Would love feedback from folks using AI coding tools on larger codebases. Especially curious:
- where agents break down for you today
- whether context selection or guardrails are the bigger issue
- what other issues you are seeing
https://bit.ly/47F3l01 March 18, 2026 at 12:22AM
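Deterministic, embedding-free context selection can be as simple as ranking files by term overlap with the task text. The Python sketch below uses plain keyword scoring: path hits count double, ties break alphabetically, and a small stopword list is applied. This is a swapped-in technique for illustration only. The post does not describe CodeLedger's actual ranking, and the weights and stopwords are assumptions.

```python
import re
from collections import Counter
from typing import Dict, List

TOKEN = re.compile(r"[a-z0-9]+")
STOPWORDS = {"a", "an", "the", "in", "of", "to", "for", "and"}  # hypothetical list


def tokens(text: str) -> Counter:
    # Split camelCase before lowercasing so "UserService" yields "user" and "service".
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return Counter(TOKEN.findall(text.lower()))


def select_files(task: str, files: Dict[str, str], k: int = 10) -> List[str]:
    """Return up to k paths ranked by overlap with the task; same input, same output."""
    wanted = set(tokens(task)) - STOPWORDS
    scored = []
    for path, body in files.items():
        path_counts, body_counts = tokens(path), tokens(body)
        score = sum(2 * path_counts[t] + body_counts[t] for t in wanted)
        if score > 0:
            scored.append((-score, path))  # highest score first, then path for determinism
    return [path for _, path in sorted(scored)[:k]]
```

Because there is no model and no randomness in the ranking, the same task description always yields the same file list, which is what makes the "~10–25 files before the first edit" step reproducible.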
Show HN: I built a message board where you pay to be the homepage https://bit.ly/4sKqCps
Show HN: I built a message board where you pay to be the homepage I kept thinking about what would happen if a message board only had one slot. One message, front and center, until someone pays to replace it. That's the entire product. You pay the current message's decayed value plus a penny to take the homepage. Message values drop over time using a gravity-based formula (same concept HN uses for ranking), so a $10 message might only cost a few bucks to replace a day later. Likes slow the decay, dislikes speed it up. The whole thing runs on three mini PCs in my house (k3s cluster, PostgreSQL, Redis Sentinel). Is it overengineered for a message board? Absolutely. I genuinely don't know where this goes. Curious what HN thinks. Archive of past messages: https://bit.ly/3Pcn94I https://bit.ly/4bi0GvG March 17, 2026 at 01:06PM
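The pricing rule ("decayed value plus a penny", with votes bending the decay) can be sketched in Python. The author's exact formula isn't given. This sketch borrows the HN-style `(t + 2) ** gravity` denominator, normalized so the value equals the price at purchase time. The base gravity of 0.5 and the 0.05-per-vote adjustment are made-up parameters.

```python
def decayed_value(paid: float, hours: float, likes: int = 0, dislikes: int = 0,
                  base_gravity: float = 0.5, vote_step: float = 0.05) -> float:
    """Current value of a message bought for `paid` dollars `hours` ago (hypothetical formula).

    (2 / (hours + 2)) ** gravity is 1 at purchase and shrinks over time.
    Likes lower the gravity (slower decay); dislikes raise it (faster decay).
    """
    gravity = max(0.05, base_gravity - vote_step * likes + vote_step * dislikes)
    return paid * (2 / (hours + 2)) ** gravity


def replacement_cost(paid: float, hours: float, likes: int = 0, dislikes: int = 0) -> float:
    """Price to take the homepage: current decayed value plus one cent, in dollars."""
    return round(decayed_value(paid, hours, likes, dislikes) + 0.01, 2)
```

With these made-up parameters, a $10 message with no votes is worth about $2.77 after 24 hours, which is in line with the post's "only cost a few bucks to replace a day later".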
Monday, 16 March 2026
Show HN: Seasalt Cove, iPhone access to your Mac https://bit.ly/4cL7FOO
Show HN: Seasalt Cove, iPhone access to your Mac I feel like I finally built something I actually use every day and it has completely changed the way I think about work. AI workflows have flipped how devs operate. You're not heads down writing code anymore, you're bouncing between projects, instructing agents, reviewing their work, nudging them forward. The job is now less about typing and more about judgment calls. And the thing about that workflow is you spend a lot of time waiting. Waiting for the agent to finish, waiting for the next approval gate. That waiting doesn't have to happen at your desk. It doesn't have to happen in front of a monitor at all. I built Seasalt because I realized my iPhone could handle 80% of what I was chaining myself to my Mac for. Kick off the agent, walk away, review the diff from the store, on a walk, or in a separate room away from your Mac. Approve it. Start the next one, switch to another session. You don't need giant dual monitors for this. That's kind of the whole point. Also, I have a deep security background, so I felt it was 100% necessary to include end-to-end encryption with a zero-knowledge relay: no ports getting opened, no VPN configuration needed, and key validation in the onboarding flow. https://bit.ly/3PnfnVy March 16, 2026 at 11:48PM
Sunday, 15 March 2026
Show HN: Webassembly4J Run WebAssembly from Java https://bit.ly/41cf2aN
Show HN: Webassembly4J Run WebAssembly from Java I’ve released WebAssembly4J, along with two runtime bindings:
Wasmtime4J – Java bindings for Wasmtime https://bit.ly/471hULh
WAMR4J – Java bindings for WebAssembly Micro Runtime https://bit.ly/4blCCGY
WebAssembly4J – a unified Java API that allows running WebAssembly across different engines https://bit.ly/40CvoJI
The motivation was that Java currently has multiple emerging WebAssembly runtimes, but each exposes its own API. If you want to experiment with different engines, you have to rewrite the integration layer each time. WebAssembly4J provides a single API while allowing different runtime providers underneath.
Goals of the project:
- Run WebAssembly from Java applications
- Allow cross-engine comparison of runtimes
- Make WebAssembly runtimes more accessible to Java developers
- Provide a stable interface while runtimes evolve
Currently supported engines:
- Wasmtime
- WAMR
- Chicory
- GraalWasm
To support both legacy and modern Java environments, the project targets:
- Java 8 (JNI bindings)
- Java 11
- Java 22+ (Panama support)
Artifacts are published to Maven Central so they can be added directly to existing projects. I’d be very interested in feedback from people working on Java + WebAssembly integrations or runtime implementations. March 16, 2026 at 12:08AM
Show HN: Lockstep – A data-oriented programming language https://bit.ly/4lB6qEo
Show HN: Lockstep – A data-oriented programming language https://bit.ly/4lyvcF9 I want to share my work-in-progress systems language with a v0.1.0 release of Lockstep. It is a data-oriented systems programming language designed for high-throughput, deterministic compute pipelines. I built Lockstep to bridge the gap between the productivity of C and the execution efficiency of GPU compute shaders. Instead of traditional control flow, Lockstep enforces straight-line SIMD execution. You will not find any if, for, or while statements inside compute kernels; branching is entirely replaced by hardware-native masking and stream-splitting. Memory is handled via a static arena provided by the host. There is no malloc, no hidden threads, and no garbage collection, which guarantees predictable performance and eliminates race conditions by construction. Under the hood, Lockstep targets LLVM IR directly to leverage industrial-grade optimization passes. It also generates a C-compatible header for easy integration with host applications written in C, C++, Rust, or Zig. v0.1.0 includes a compiler with LLVM IR and C header emission, a CLI simulator for validating pipeline wiring and cardinality on small datasets, and an opt-in LSP server for real-time editor diagnostics, hover type info, and autocompletion. You can check out the repository to see the syntax, and the roadmap outlines where the project is heading next, including parameterized SIMD widths and multi-stage pipeline composition. I would love to hear feedback on the language semantics, the type system, and the overall architecture! https://bit.ly/4lyvcF9 March 16, 2026 at 01:14AM
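Masking and stream-splitting can be modeled with plain lists, where each list element is one SIMD lane. The Python sketch below is conceptual and does not show Lockstep syntax; the function names are hypothetical. The masked version computes both "branches" for every lane and blends them arithmetically, and the split version routes elements into two downstream streams instead of branching inside a kernel.

```python
from typing import List, Sequence, Tuple


def blend(mask: Sequence[int], a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Per-lane select with arithmetic only: lane mask 1 picks a, 0 picks b.

    Assumes finite values (0 * inf would produce nan).
    """
    return [m * x + (1 - m) * y for m, x, y in zip(mask, a, b)]


def masked_abs(xs: Sequence[float]) -> List[float]:
    # The comparison only builds a mask; both candidate results exist for every lane.
    mask = [int(x < 0) for x in xs]
    negated = [-x for x in xs]
    return blend(mask, negated, xs)


def split_stream(xs: Sequence[float], mask: Sequence[int]) -> Tuple[list, list]:
    """Stream-splitting: route each element to one of two downstream stages."""
    taken = [x for x, m in zip(xs, mask) if m]
    rest = [x for x, m in zip(xs, mask) if not m]
    return taken, rest
```

Masking keeps every lane busy and the output cardinality fixed, while splitting changes cardinality per stream. That is presumably why a simulator that validates pipeline wiring and cardinality is useful.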
Show HN: Open-source playground to red-team AI agents with exploits published https://bit.ly/4bawx1g
Show HN: Open-source playground to red-team AI agents with exploits published We build runtime security for AI agents. The playground started as an internal tool that we used to test our own guardrails. But we kept finding the same types of vulnerabilities because we think about attacks a certain way. At some point you need people who don't think like you. So we open-sourced it. Each challenge is a live agent with real tools and a published system prompt. Whenever a challenge is over, the full winning conversation transcript and guardrail logs get documented publicly. Building the general-purpose agent itself was probably the most fun part. Getting it to reliably use tools, stay in character, and follow instructions while still being useful is harder than it sounds. That alone reminded us how early we all are in understanding and deploying these systems at scale. First challenge was to get an agent to call a tool it's been told to never call. Someone got through in around 60 seconds without ever asking for the secret directly (which taught us a lot). Next challenge is focused on data exfiltration with harder defences: https://bit.ly/4b98dgc https://bit.ly/3PCKDjq March 15, 2026 at 11:29PM
Saturday, 14 March 2026
Show HN: Signet.js – A minimalist reactivity engine for the modern web https://bit.ly/3P7Oghg
Show HN: Signet.js – A minimalist reactivity engine for the modern web https://bit.ly/4uuhYwV March 15, 2026 at 03:58AM
Show HN: GrobPaint: Somewhere Between MS Paint and Paint.net https://bit.ly/472TcKq
Show HN: GrobPaint: Somewhere Between MS Paint and Paint.net https://bit.ly/47wryWg March 14, 2026 at 11:41PM
Show HN: Structural analysis of the D'Agapeyeff cipher (1939) https://bit.ly/4lwRcQA
Show HN: Structural analysis of the D'Agapeyeff cipher (1939) I am working on the D'Agapeyeff cipher, an unsolved cryptogram from 1939. Two findings that I haven't seen published before: 1. All 5 anomalous symbol values in the cipher cluster in the last column of a 14x14 grid. This turns out to be driven by a factor-of-2-and-7 positional pattern in the linear text. 2. Simulated annealing with Esperanto quadgrams (23M char Leipzig corpus) on a 2x98 columnar transposition consistently outscores English by 200+ points and recovers the same Esperanto vocabulary across independent runs. The cipher is not solved. But the combination of structural geometry and computational linguistics narrows the search space significantly. Work in progress, more to come! https://bit.ly/3PbwGc6 March 15, 2026 at 12:34AM
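Both findings rest on simple primitives. The first is mapping a linear position to a grid column: a 14x14 grid and a 2x98 layout both cover 196 cells. The second is scoring candidate plaintexts with quadgram log-probabilities. The Python sketch below is not the author's code. A toy Esperanto-like string replaces the 23M-character Leipzig corpus, and the simulated-annealing loop that would call `fitness` on each candidate transposition is not shown.

```python
import math
from collections import Counter
from typing import Dict, Tuple


def grid_column(position: int, width: int = 14) -> int:
    """Column (0-based) of a 0-based position when the text is written row by row."""
    return position % width


def letters_only(text: str) -> str:
    return "".join(c for c in text.lower() if c.isalpha())


def train_quadgrams(corpus: str) -> Tuple[Dict[str, float], float]:
    """log10 probability of each quadgram, plus a floor score for unseen ones."""
    text = letters_only(corpus)
    counts = Counter(text[i:i + 4] for i in range(len(text) - 3))
    total = sum(counts.values())
    logp = {q: math.log10(n / total) for q, n in counts.items()}
    return logp, math.log10(0.01 / total)


def fitness(candidate: str, logp: Dict[str, float], floor: float) -> float:
    """Higher means more language-like; annealing keeps key changes that raise it."""
    text = letters_only(candidate)
    return sum(logp.get(text[i:i + 4], floor) for i in range(len(text) - 3))
```

In a 14-wide layout, the last column holds exactly the positions where `position + 1` is a multiple of 14, that is, divisible by both 2 and 7. That is one way to read the factor-of-2-and-7 pattern.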
Friday, 13 March 2026
Show HN: Simple plugin to get Claude Code to listen to you https://bit.ly/4br70Qi
Show HN: Simple plugin to get Claude Code to listen to you Hey HN, My cofounder and I got tired of CC ignoring our markdown files, so we spent 4 days and built a plugin that automatically steers CC based on our previous sessions. The problem usually shows up after plan mode. What we've tried:
- Heavy use of plan mode (works great)
- CLAUDE.md, AGENTS.md, MEMORY.md
- Local context folder (upkeep is a pain)
- Cursor rules (for Cursor)
- claude-mem (OSS) -> does session continuity, not steering
We use fusion search to find your CC steering corrections:
- user prompt embeddings + bm25
- correction embeddings + bm25
- time decay
- target query embeddings
- exclusions
- metadata hard filters (such as files)
The CC plugin:
- Automatically captures memories/corrections without you having to remind CC
- Automatically injects corrections without you having to remind CC to do it
The plugin will merge, update, and distill your memories, and then inject the most relevant ones after each of your own prompts. We're not sure if we're alone in this. We're working on some benchmarks to see how effective context injection actually is in steering CC, and we know we need to keep improving extraction and search and add more integrations. We're passionate about the real-time and personalized context layer for agents: giving agents a way to understand what you mean when you say "this" or "that", and bringing the context of your world into a secure, structured, real-time layer all your agents can access. Would appreciate feedback on how you get CC to actually follow your markdown files and understand your modus operandi, feedback on the plugin, or anything else about real-time memory and context. - Ankur https://bit.ly/4bx6joK March 14, 2026 at 12:15AM
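The post lists the signals (embedding and BM25 searches, time decay) but not how they are fused. Reciprocal rank fusion (RRF) with an exponential age penalty is one common way to combine them, and it is swapped in here as an assumption rather than a description of the plugin. In this Python sketch, the ranked lists are assumed to come from BM25 and embedding searches that are not shown. The 14-day half-life and `k = 60` are illustrative defaults.

```python
from typing import Dict, List, Optional
import time


def rrf(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Reciprocal rank fusion: each list adds 1 / (k + rank) for every item it ranks."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return scores


def fuse_with_decay(rankings: List[List[str]], created_at: Dict[str, float],
                    half_life_days: float = 14.0, now: Optional[float] = None,
                    top_n: int = 3) -> List[str]:
    """Fuse ranked lists, halve each score per `half_life_days` of age, return the top items."""
    now = time.time() if now is None else now
    decayed = {
        item: score * 0.5 ** (((now - created_at[item]) / 86400) / half_life_days)
        for item, score in rrf(rankings).items()
    }
    return sorted(decayed, key=decayed.get, reverse=True)[:top_n]
```

RRF only uses ranks, so it can merge BM25 scores and cosine similarities without normalizing their very different scales. The half-life lets a fresh correction outrank an older one that matched slightly better.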
Show HN: Kube-pilot – AI engineer that lives in your Kubernetes cluster https://bit.ly/4lq1wK4
Show HN: Kube-pilot – AI engineer that lives in your Kubernetes cluster I built kube-pilot — an autonomous AI agent that runs inside your Kubernetes cluster and does the full dev loop: writes code, builds containers, deploys services, verifies they're healthy, and closes the ticket. You file a GitHub issue, it does the rest. What makes this different from AI coding tools: kube-pilot doesn't just generate code and hand it back to you. It lives inside the cluster with direct access to the entire dev stack — git, Tekton (CI/CD), Kaniko (container builds), ArgoCD (GitOps deployments), kubectl, Vault. Every tool call produces observable state that feeds into the next decision. The cluster isn't just where code runs — it's where the agent thinks. The safety model: all persistent changes go through git, so everything is auditable and reversible. ArgoCD is the only thing that writes to the cluster. Secrets stay behind Vault — the agent creates ExternalSecret references, never touches raw credentials. Credentials are scrubbed before reaching the LLM. Live demo: I filed GitHub issues asking it to build a 4-service office suite (auth, docs API, notification worker, API gateway). It built and deployed all of them autonomously. You can see the full agent loop — code, builds, deploys, verification, comments — on the closed issues: - https://bit.ly/4b8SihV... - https://bit.ly/4b8SiOX... - https://bit.ly/4lBAjEw... - https://bit.ly/4sPN6FP... One helm install gives you everything — the agent, Gitea (git + registry), Tekton, ArgoCD, Vault, External Secrets. No external dependencies. Coming next: Slack and Jira integrations (receive tasks and post updates where your team already works), Prometheus metrics and Grafana dashboards for agent observability, and Alertmanager integration so firing alerts automatically become issues that kube-pilot investigates and fixes. Early proof of concept. Rough edges. But it works. https://bit.ly/3Pk0p2x March 14, 2026 at 03:49AM
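"Credentials are scrubbed before reaching the LLM" is typically a redaction pass over tool output before it enters the prompt. The post doesn't describe kube-pilot's approach. This Python sketch uses a few hypothetical regex patterns (AWS access key ID format, bearer tokens, `password:`/`token=`-style pairs), and a production scrubber would need a much broader, tested set.

```python
import re
from typing import List, Pattern

# Hypothetical patterns for illustration only.
PATTERNS: List[Pattern[str]] = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                                   # AWS access key ID format
    re.compile(r"(?i)\bbearer\s+[a-z0-9._~+/-]+=*"),                   # HTTP bearer tokens
    re.compile(r"(?i)\b(?:password|passwd|secret|token)\s*[:=]\s*\S+"),  # key: value secrets
]


def scrub(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace anything matching a secret pattern before the text is sent to the model."""
    for pattern in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

Scrubbing is a backstop. The stronger guarantee in the post's design is structural: the agent only handles ExternalSecret references, so raw values from Vault should not appear in its context at all.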
Show HN: I wrote my first neural network https://bit.ly/4ltOFGV
Show HN: I wrote my first neural network I have been interested in neural nets since the 90's. I've done quite a bit of reading, but never gotten around to writing code. I used Gemini in place of Wikipedia to fill in the gaps of my knowledge. The coolest part of this was learning about dual numbers. You can see in early commits that I did not yet know about auto-diff; I was thinking I'd have to integrate a CAS library or something. Now, I'm off to play with TensorFlow. https://bit.ly/4cGH7y9 March 14, 2026 at 01:21AM
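Dual numbers are what make forward-mode automatic differentiation work. You evaluate f at x + ε, where ε² = 0, and the ε coefficient of the result is f′(x). Below is a minimal Python sketch that supports only + and *. The `Dual` and `derivative` names are illustrative and do not come from the author's repo:

```python
from dataclasses import dataclass

@dataclass
class Dual:
    """A number val + der*eps with eps**2 == 0; der carries the derivative."""
    val: float
    der: float = 0.0

    def _lift(self, other):
        # Treat plain numbers as constants (derivative 0).
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __mul__(self, other):
        o = self._lift(other)
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps: the product rule for free.
        return Dual(self.val * o.val, self.val * o.der + self.der * o.val)

    __rmul__ = __mul__

def derivative(f, x: float) -> float:
    """Evaluate f at x + 1*eps; the eps coefficient is f'(x)."""
    return f(Dual(x, 1.0)).der
```

For example, `derivative(lambda x: 3*x*x + 2*x + 1, 2.0)` gives 14.0, which matches 6x + 2 at x = 2. Forward mode needs one pass per input. That is why training libraries such as TensorFlow compute gradients over many weights with reverse mode instead.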
Show HN: EdgeWhisper – On-device voice-to-text for macOS (Voxtral 4B via MLX) https://bit.ly/4cMsuJQ
Show HN: EdgeWhisper – On-device voice-to-text for macOS (Voxtral 4B via MLX) I built a macOS voice dictation app where zero bytes of audio ever leave your machine. EdgeWhisper runs Voxtral Mini 4B Realtime (Mistral AI, Apache 2.0) locally on Apple Silicon via the MLX framework. Hold a key, speak, release — text appears at your cursor in whatever app has focus. Architecture: - Native Swift (SwiftUI + AppKit). No Electron. - Voxtral 4B inference via MLX on the Neural Engine. ~3GB model, runs in ~2GB RAM on M1+. - Dual text injection: AXUIElement (preserves undo stack) with NSPasteboard+CGEvent fallback. - 6-stage post-processing pipeline: filler removal → dictionary → snippets → punctuation → capitalization → formatting. - Sliding window KV cache for unlimited streaming without latency degradation. - Configurable transcription delay (240ms–2.4s). Sweet spot at 480ms. What it does well: - Works in 20+ terminals/IDEs (VS Code, Xcode, iTerm2, Warp, JetBrains). Most dictation tools break in terminals — we detect them and switch injection strategy. - Removes filler words automatically ("um", "uh", "like"). - 13 languages with auto-detection. - Personal dictionary + snippet expansion with variable support (, ). - Works fully offline after model download. No accounts, no telemetry, no analytics. What it doesn't do (yet): - No file/meeting transcription (coming) - No translation (coming) - No Linux/Windows (macOS only, Apple Silicon required) Pricing: Free tier (5 min/day, no account needed). Pro at $7.99/mo or $79.99/yr. I'd love feedback on: 1. Would local LLM post-processing (e.g., Phi-4-mini via MLX) for grammar/tone be worth the extra ~1GB RAM? 2. For developers using voice→code workflows: what context would you want passed to your editor? 3. Anyone else building on Voxtral Realtime? Curious about your experience with the causal audio encoder. https://bit.ly/4bqT09c March 13, 2026 at 11:57PM
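The 6-stage post-processing pipeline is easiest to picture as an ordered list of text-to-text functions. The toy sketch below rests on some obvious assumptions. The dictionary, snippet, and spoken-punctuation tables are made up. "like" is deliberately not treated as a filler, because it is often a real word. This is not EdgeWhisper's implementation:

```python
import re
from typing import Callable

# Hypothetical user data for the dictionary, snippet, and punctuation stages.
DICTIONARY = {"voxtral": "Voxtral", "mlx": "MLX"}
SNIPPETS = {"sig block": "Best regards, Ada"}
SPOKEN_PUNCT = {"period": ".", "comma": ",", "question mark": "?"}

def remove_fillers(t: str) -> str:
    # Drop "um"/"uh" plus an optional trailing comma and whitespace.
    return re.sub(r"\b(?:um|uh)\b,?\s*", "", t, flags=re.I)

def apply_dictionary(t: str) -> str:
    # Replace known words with their preferred spelling, matched case-insensitively.
    return re.sub(r"\w+", lambda m: DICTIONARY.get(m.group().lower(), m.group()), t)

def expand_snippets(t: str) -> str:
    for trigger, text in SNIPPETS.items():
        t = t.replace(trigger, text)
    return t

def punctuate(t: str) -> str:
    # Turn spoken punctuation into symbols, consuming the space before it.
    return re.sub(r"\s+(period|comma|question mark)\b", lambda m: SPOKEN_PUNCT[m.group(1)], t)

def capitalize(t: str) -> str:
    # Uppercase the first letter of the text and of each new sentence.
    return re.sub(r"(^|[.?!]\s+)([a-z])", lambda m: m.group(1) + m.group(2).upper(), t)

def tidy(t: str) -> str:
    # Final formatting pass: collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", t).strip()

PIPELINE: list[Callable[[str], str]] = [
    remove_fillers, apply_dictionary, expand_snippets, punctuate, capitalize, tidy,
]

def postprocess(text: str) -> str:
    for stage in PIPELINE:
        text = stage(text)
    return text
```

Order matters. Punctuation has to run before capitalization, or the capitalizer has no sentence boundaries to find.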
Show HN: What was the world listening to? Music charts, 20 countries (1940–2025) https://bit.ly/40pkPcZ
Show HN: What was the world listening to? Music charts, 20 countries (1940–2025) I built this because I wanted to know what people in Japan were listening to the year I was born. That question spiraled: how does a hit in Rome compare to what was charting in Lagos the same year? How did sonic flavors propagate as streaming made musical influence travel faster than ever? 88mph is a playable map of music history: 230 charts across 20 countries, spanning 8 decades (1940–2025). Every song is playable via YouTube or Spotify. It's open source and I'd love help expanding it — there's a link to contribute charts for new countries and years. The goal is to crowdsource a complete sonic atlas of the world. https://bit.ly/4s6DV3v March 10, 2026 at 05:18PM
Show HN: fftool – A Terminal UI for FFmpeg – Shows Command Before It Runs https://bit.ly/3NwJ71I
Show HN: fftool – A Terminal UI for FFmpeg – Shows Command Before It Runs https://bit.ly/4sCyUzG March 13, 2026 at 11:08AM
Thursday, 12 March 2026
Show HN: Global Maritime Chokepoints https://bit.ly/4sbPHdc
Show HN: Global Maritime Chokepoints https://bit.ly/4cLCnaE March 13, 2026 at 05:42AM
Show HN: Slop or not – can you tell AI writing from human in everyday contexts? https://bit.ly/4uwqqvW
Show HN: Slop or not – can you tell AI writing from human in everyday contexts? I’ve been building a crowd-sourced AI detection benchmark. Two responses to the same prompt — one from a real human (pre-2022, so provably predating the spread of AI slop on the internet), one generated by AI. You pick the slop. Three wrong and you’re out. The dataset: 16K human posts from Reddit, Hacker News, and Yelp, each paired with AI generations from 6 models across two providers (Anthropic and OpenAI) at three capability tiers. Same prompt, length-matched, no adversarial coaching — just the model’s natural voice with platform context. Every vote is logged with model, tier, source, response time, and position. Early findings from testing: Reddit posts are easy to spot (humans are too casual for AI to mimic); HN is significantly harder. I'll be releasing the full dataset on HuggingFace and I'll publish a paper if I can get enough data via this crowdsourced study. If you play the HN-only mode, you’re helping calibrate how detectable AI is on HN specifically. Would love feedback on the pairs — are any trivially obvious? Are some genuinely hard? https://bit.ly/4upYoBV March 12, 2026 at 10:53PM
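Every vote records model, tier, source, response time, and position, so per-group detection accuracy is a simple aggregation. The sketch below uses a hypothetical `Vote` record that holds just three of those fields. The coin-flip baseline for a two-way choice is 0.5:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Vote:
    source: str    # e.g. "reddit", "hn", "yelp"
    tier: str      # capability tier of the model that wrote the AI response
    correct: bool  # True if the player picked the AI-written response

def accuracy_by(votes, attr):
    """Share of votes that correctly picked the AI text, grouped by one Vote attribute."""
    hits, totals = defaultdict(int), defaultdict(int)
    for v in votes:
        key = getattr(v, attr)
        totals[key] += 1
        hits[key] += v.correct  # bool counts as 0 or 1
    return {key: hits[key] / totals[key] for key in totals}
```

Grouping by `source` is how you would see the Reddit-vs-HN gap described above. Grouping by `tier` shows whether stronger models are harder to spot.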
Wednesday, 11 March 2026
Show HN: A context-aware permission guard for Claude Code https://bit.ly/4cNXEk1
Show HN: A context-aware permission guard for Claude Code We needed something like --dangerously-skip-permissions that doesn’t nuke your untracked files, exfiltrate your keys, or install malware. Claude Code's permission system is allow-or-deny per tool, but that doesn’t really scale. Deleting some files is fine sometimes. And git checkout is sometimes not fine. Even when you curate permissions, 200 IQ Opus can find a way around it. Maintaining a deny list is a fool's errand. nah is a PreToolUse hook that classifies every tool call by what it actually does, using a deterministic classifier that runs in milliseconds. It maps commands to action types like filesystem_read, package_run, db_write, git_history_rewrite, and applies policies: allow, context (depends on the target), ask, or block. Not everything can be classified, so you can optionally escalate ambiguous stuff to an LLM, but that’s not required. Anything unresolved you can approve, and configure the taxonomy so you don’t get asked again. It works out of the box with sane defaults, no config needed. But you can customize it fully if you want to. No dependencies, stdlib Python, MIT. pip install nah && nah install https://bit.ly/4uo6cnR https://bit.ly/3PvHlOS March 12, 2026 at 12:26AM
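A deterministic classifier of this kind can be as simple as a prefix table that maps commands to action types, plus a policy table. The sketch below is hypothetical. The table contents, `classify`, and `decide` are illustrative and are not nah's taxonomy or API. It only shows the shape of the idea: find the longest matching prefix, then map the action type to allow / context / ask / block, with unknown commands falling back to ask:

```python
import shlex

# Hypothetical taxonomy: leading tokens of a shell command -> action type.
ACTION_TYPES = {
    ("cat",): "filesystem_read",
    ("ls",): "filesystem_read",
    ("rm",): "filesystem_delete",
    ("npm", "run"): "package_run",
    ("git", "status"): "git_read",
    ("git", "rebase"): "git_history_rewrite",
    ("git", "push", "--force"): "git_history_rewrite",
}

# Hypothetical policy per action type.
POLICIES = {
    "filesystem_read": "allow",
    "git_read": "allow",
    "package_run": "allow",
    "filesystem_delete": "context",  # depends on the target path
    "git_history_rewrite": "block",
}

def classify(command: str) -> str:
    tokens = shlex.split(command)
    # Try the longest prefix first so "git push --force" is not shadowed by a shorter rule.
    for n in range(len(tokens), 0, -1):
        action = ACTION_TYPES.get(tuple(tokens[:n]))
        if action:
            return action
    return "unknown"

def decide(command: str) -> str:
    """Map a command to allow / context / ask / block; unclassified commands default to ask."""
    return POLICIES.get(classify(command), "ask")
```

Plain prefix matching misses reordered flags such as `git push origin main --force`. That is one reason a real classifier would need to parse arguments per command.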
Show HN: Conduit – Headless browser with SHA-256 hash chain - Ed25519 audit trails https://bit.ly/40qLGW8
Show HN: Conduit – Headless browser with SHA-256 hash chain - Ed25519 audit trails I've been building AI agent tooling and kept running into the same problem: agents browse the web, take actions, fill out forms, scrape data -- and there's zero proof of what actually happened. Screenshots can be faked. Logs can be edited. If something goes wrong, you're left pointing fingers at a black box. So I built Conduit. It's a headless browser (Playwright under the hood) that records every action into a SHA-256 hash chain and signs the result with Ed25519. Each action gets hashed with the previous hash, forming a tamper-evident chain. At the end of a session, you get a "proof bundle" -- a JSON file containing the full action log, the hash chain, the signature, and the public key. Anyone can independently verify the bundle without trusting the party that produced it. The main use cases I'm targeting: - *AI agent auditing* -- You hand an agent a browser. Later you need to prove what it did. Conduit gives you cryptographic receipts. - *Compliance automation* -- SOC 2, GDPR data subject access workflows, anything where you need evidence that a process ran correctly. - *Web scraping provenance* -- Prove that the data you collected actually came from where you say it did, at the time you say it did. - *Litigation support* -- Capture web content with a verifiable chain of custody. It also ships as an MCP (Model Context Protocol) server, so Claude, GPT, and other LLM-based agents can use the browser natively through tool calls. The agent gets browse, click, fill, screenshot, and the proof bundle builds itself in the background. Free, MIT-licensed, pure Python. No accounts, no API keys, no telemetry. GitHub: https://bit.ly/40mFlLj Install: `pip install conduit-browser` Would love feedback on the proof bundle format and the MCP integration. Happy to answer questions about the cryptographic design. March 12, 2026 at 12:15AM
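The hash-chain part is easy to sketch with the standard library. Each link is SHA-256 over the previous hash plus a canonical JSON encoding of the action, so editing any earlier action changes every later hash. This is a generic illustration, not Conduit's bundle format, and `GENESIS`, `link`, `build_chain`, and `verify_chain` are hypothetical names. Signing the final hash with Ed25519 would need a third-party library such as `cryptography`, so that step is left out:

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the first link

def link(prev_hash: str, action: dict) -> str:
    """Hash an action together with the previous link's hash."""
    # sort_keys + compact separators give a canonical encoding, so equal dicts hash equally.
    payload = prev_hash + json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode()).hexdigest()

def build_chain(actions):
    hashes, prev = [], GENESIS
    for action in actions:
        prev = link(prev, action)
        hashes.append(prev)
    return hashes

def verify_chain(actions, hashes) -> bool:
    """Recompute the chain from the log; any edited, dropped, or reordered action breaks it."""
    return len(actions) == len(hashes) and build_chain(actions) == hashes
```

Only the final hash needs to be signed. That single signature commits to the whole ordered log.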
Tuesday, 10 March 2026
Show HN: CryptoFlora – Visualize SHA256 to a flower using Rose curves https://bit.ly/4lkcFMp
Show HN: CryptoFlora – Visualize SHA256 to a flower using Rose curves I made this side tool to visualize SHA-256 while building a loyalty card wallet application to easily identify if a collected stamp is certified by the issuer by simply seeing it, instead of scanning something like a QR code or matching a serial number. I think there are more potential use cases, like creating a random avatar based on an email address or something else. Feel free to share your feedback :) source code: https://bit.ly/3Ngkfeo https://bit.ly/4cBM2jV March 11, 2026 at 04:52AM
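One way to turn a digest into a flower is to slice its bytes into parameters for the rose curve r = cos(kθ). The byte-to-parameter mapping below is an illustrative assumption, not CryptoFlora's. With an integer k, the curve draws k petals when k is odd and 2k petals when k is even:

```python
import hashlib
import math

def flower_params(message: str) -> dict:
    """Derive rose-curve parameters from a SHA-256 digest (mapping is illustrative)."""
    d = hashlib.sha256(message.encode()).digest()
    return {
        "k": 2 + d[0] % 7,                           # petal parameter in 2..8
        "hue": int.from_bytes(d[1:3], "big") % 360,  # colour angle in degrees
        "rotation": d[3] / 255 * 2 * math.pi,        # 0..2*pi radians
        "scale": 0.5 + d[4] / 510,                   # 0.5..1.0
    }

def rose_points(params: dict, samples: int = 360):
    """Sample r = scale * cos(k * theta), rotated, as (x, y) points."""
    k, s, rot = params["k"], params["scale"], params["rotation"]
    pts = []
    for i in range(samples):
        theta = 2 * math.pi * i / samples
        r = s * math.cos(k * theta)
        pts.append((r * math.cos(theta + rot), r * math.sin(theta + rot)))
    return pts
```

Because the mapping is deterministic, the same stamp always renders the same flower. A different digest almost always changes at least one visible parameter.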
Show HN: Readhn – AI-Native Hacker News MCP Server (Discover, Trust, Understand) https://bit.ly/3Nw8u3F
Show HN: Readhn – AI-Native Hacker News MCP Server (Discover, Trust, Understand) I felt frustrated finding high-signal discussions on HN, and I started this project to better understand how this community actually works. That led me to build readhn, an MCP server that helps with three things: - Discover: find relevant stories/comments by keyword, score, and time window - Trust: identify credible voices using EigenTrust-style propagation from seed experts - Understand: show why each result is ranked, with explicit signals instead of a black-box score It includes 6 tools: discover_stories, search, find_experts, expert_brief, story_brief, and thread_analysis. I also added readhn setup so AI agents can auto-configure it (Claude Code, Codex, Cursor, and others) after pip install. I’d love feedback on: 1) whether these ranking signals match how you evaluate HN quality, 2) trust-model tradeoffs, 3) what would make this useful in your daily workflow. If this is useful to you, starring the repo helps others discover it: https://bit.ly/40pmNKh https://bit.ly/40pmNKh March 11, 2026 at 01:49AM
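EigenTrust-style propagation from seed experts can be sketched as a damped power iteration. At each step, a user's trust flows along their normalized outgoing endorsements, and a fraction `alpha` is reset to the seed set. The edge weights (for example, replies or upvotes between users), `alpha`, and the iteration count are all assumptions. This is not readhn's implementation:

```python
def eigentrust(local, seeds, alpha=0.15, iters=50):
    """
    local: {truster: {trustee: weight >= 0}}, e.g. how often one user upvotes another.
    seeds: non-empty set of pre-trusted users (the "seed experts").
    Returns {user: trust}; the scores sum to 1.
    """
    users = set(local) | {j for row in local.values() for j in row} | set(seeds)
    p = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in users}
    t = dict(p)
    for _ in range(iters):
        nxt = {u: alpha * p[u] for u in users}  # restart mass goes to the seeds
        for i in users:
            row = local.get(i, {})
            total = sum(row.values())
            if total == 0:
                # No outgoing endorsements: hand this user's trust back to the seeds.
                for s in seeds:
                    nxt[s] += (1 - alpha) * t[i] / len(seeds)
            else:
                for j, w in row.items():
                    nxt[j] += (1 - alpha) * t[i] * w / total
        t = nxt
    return t
```

Consider accounts that only endorse each other and are never endorsed by anyone reachable from the seeds. They end up with zero trust. That property is what makes seed-based propagation resistant to simple collusion.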
Show HN: Claude Code Token Elo https://bit.ly/4s44FSx
Show HN: Claude Code Token Elo https://bit.ly/4ddLBfJ March 10, 2026 at 05:29AM
Show HN: Modulus – Cross-repository knowledge orchestration for coding agents https://bit.ly/3P3vAPB
Show HN: Modulus – Cross-repository knowledge orchestration for coding agents Hello HN, we're Jeet and Husain from Modulus ( https://bit.ly/4s9fGBW ) - a desktop app that lets you run multiple coding agents with shared project memory. We built it to solve two problems we kept running into: - Cross-repo context is broken. When working across multiple repositories, agents don't understand dependencies between them. Even if we open two repos in separate Cursor windows, we still have to manually explain the backend API schema while making changes in the frontend repo. - Agents lose context. Switching between coding agents often means losing context and repeating the same instructions again. Modulus shares memory across agents and repositories so they can understand your entire system. It's an alternative to tools like Conductor for orchestrating AI coding agents to build product, but we focused specifically on multi-repo workflows (e.g., backend repo + client repo + shared library repo + AI agents repo). We built our own Memory and Context Engine from the ground up specifically for coding agents. Why build another agent orchestration tool? It came from our own problem. While working on our last startup, Husain and I were working across two different repositories. Working across repos meant manually pasting API schemas between Cursor windows — telling the frontend agent what the backend API looked like again and again. So we built a small context engine to share knowledge across repos and hooked it up to Cursor via MCP. This later became Modulus. Soon, Modulus will allow teams to share knowledge with others to improve their workflows with AI coding agents - enabling team collaboration in the era of AI coding. Our API will allow developers to switch between coding agents or IDEs without losing any context. 
If you wanna see a quick demo before trying out, here is our launch post - https://bit.ly/3NtfI8E We'd greatly appreciate any feedback you have and hope you get the chance to try out Modulus. https://bit.ly/4s9fGBW March 10, 2026 at 07:52PM
Monday, 9 March 2026
Show HN: Latchup – Competitive programming for hardware description languages https://bit.ly/4bhDgFy
Show HN: Latchup – Competitive programming for hardware description languages https://bit.ly/4cEqHGt March 10, 2026 at 07:06AM
Show HN: I Was Here – Draw on street view, others can find your drawings https://bit.ly/4rUNti6
Show HN: I Was Here – Draw on street view, others can find your drawings Hey HN, I made a site where you can draw on street-level panoramas. Your drawings persist and other people can see them in real time. Strokes get projected onto the 3D panorama so they wrap around buildings and follow the geometry, not just a flat overlay. Uses WebGL2 for rendering, Mapillary for the street imagery. The idea is for it to become a global canvas, anyone can leave a mark anywhere and others stumble onto it. https://bit.ly/40TNruT March 10, 2026 at 06:04AM
Show HN: SAT Protocol – static social networking https://bit.ly/3PqUmJw
Show HN: SAT Protocol – static social networking https://bit.ly/4rXCy7f March 10, 2026 at 04:25AM
Show HN: ChatJC – chatbot for resume/LinkedIn/portfolio info https://bit.ly/3OZ9A8y
Show HN: ChatJC – chatbot for resume/LinkedIn/portfolio info https://bit.ly/4b3Iy8M March 10, 2026 at 01:37AM
Sunday, 8 March 2026
Show HN: Toolkit – Visual Simulators for How Internet Protocols and Systems Work https://bit.ly/4syfVWP
Show HN: Toolkit – Visual Simulators for How Internet Protocols and Systems Work https://bit.ly/4d7ddmL March 8, 2026 at 09:23PM
Saturday, 7 March 2026
Show HN: Jarvey - a local JARVIS for MacOS https://bit.ly/46LTYLE
Show HN: Jarvey - a local JARVIS for MacOS https://bit.ly/3OWd0Jh March 8, 2026 at 12:04AM
Show HN: SiClaw – Open-source AIOps with a hypothesis-driven diagnostic engine https://bit.ly/40dYpLH
Show HN: SiClaw – Open-source AIOps with a hypothesis-driven diagnostic engine https://bit.ly/4rYciJW March 8, 2026 at 03:27AM
Show HN: [Help] I run 4 AI-driven companies simultaneously from my terminal https://bit.ly/4sC3Iki
Show HN: [Help] I run 4 AI-driven companies simultaneously from my terminal https://bit.ly/4cAtEbg March 7, 2026 at 11:13PM
Show HN: MicroBin – Easy File Sharing for Everyone – Self-Hostable https://bit.ly/4b89DpR
Show HN: MicroBin – Easy File Sharing for Everyone – Self-Hostable https://bit.ly/3NlpnxY March 7, 2026 at 10:07PM
Friday, 6 March 2026
Show HN: mTile – native macOS window tiler inspired by gTile https://bit.ly/4cyeD9O
Show HN: mTile – native macOS window tiler inspired by gTile Built this with codex/claude because I missed gTile[1] from Ubuntu and couldn’t find a macOS tiler that felt good on a big ultrawide screen. Most mac options I tried were way too rigid for my workflow (fixed layouts, etc) or wanted a monthly subscription. gTile’s "pick your own grid sizes + keyboard flow" is exactly what I wanted and used for years. Still rough in places and not full parity, but very usable now and I run it daily at work (forced mac life). [1]: https://bit.ly/4rhJXNF https://bit.ly/40iPJUh March 6, 2026 at 11:21PM
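The gTile-style interaction boils down to picking two cells on a user-chosen grid and resizing the window to their bounding box. Below is a small sketch of that arithmetic. The `Rect` type and `grid_frame` function are hypothetical, and applying the frame to a real window is out of scope:

```python
from typing import NamedTuple

class Rect(NamedTuple):
    x: float
    y: float
    w: float
    h: float

def grid_frame(screen: Rect, cols: int, rows: int,
               a: tuple[int, int], b: tuple[int, int]) -> Rect:
    """Frame covering the bounding box of grid cells a and b, each given as 0-indexed (col, row)."""
    c0, c1 = sorted((a[0], b[0]))
    r0, r1 = sorted((a[1], b[1]))
    cw, ch = screen.w / cols, screen.h / rows  # size of one grid cell
    return Rect(screen.x + c0 * cw, screen.y + r0 * ch,
                (c1 - c0 + 1) * cw, (r1 - r0 + 1) * ch)
```

On a 3440x1440 ultrawide with a 4x2 grid, selecting cells (3, 1) and (2, 0) gives the right half of the screen: `Rect(1720.0, 0.0, 1720.0, 1440.0)`.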
Thursday, 5 March 2026
Show HN: Kanon 2 Enricher – the first hierarchical graphitization model https://bit.ly/4boHrAq
Show HN: Kanon 2 Enricher – the first hierarchical graphitization model Hey HN, This is Kanon 2 Enricher, the first hierarchical graphitization model. It represents an entirely new class of AI models designed to transform document corpora into rich, highly structured knowledge graphs. In brief, our model is capable of: - Entity extraction, classification, and linking: identifying key entities like individuals, companies, governments, locations, dates, documents, and more, and classifying and linking them together. - Hierarchical segmentation: breaking a document up into its full hierarchy, including divisions, sections, subsections, paragraphs, and so on. - Text annotation: extracting common textual elements such as headings, signatures, tables of contents, cross-references, and the like. We built Kanon 2 Enricher from scratch. Every node, edge, and label in the Isaacus Legal Graph Schema (ILGS), which is the format it outputs to, corresponds to at least one task head in our model. In total, we built 58 different task heads jointly optimized with 70 different loss terms. Thanks to its novel architecture, unlike your typical LLM, Kanon 2 Enricher doesn't generate extractions token by token (which introduces the possibility of hallucinations) but instead directly classifies all the tokens in a document in a single shot. This makes it really fast. Because Kanon 2 Enricher's feature set is so wide, there are a myriad of applications it can be used for, from financial forensics and due diligence all the way to legal research. One of the coolest applications we've seen so far is where a Canadian government built a knowledge graph out of thousands of federal and provincial laws in order to accelerate regulatory analysis. Another cool application is something we built ourselves, a 3D interactive map of Australian High Court cases since 1903, which you can find right at the start of our announcement.
Our model has already been in use for the past month, since we released it through a closed beta that included Harvey, KPMG, Clifford Chance, Clyde & Co, Alvarez & Marsal, Smokeball, and 96 other design partners. Their feedback was instrumental in improving Kanon 2 Enricher before its public release, and we're immensely thankful to each and every beta participant. We're eager to see what other developers manage to build with our model now that it's out publicly. https://bit.ly/4ud0Aga March 3, 2026 at 09:55AM
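Classifying every token in one pass still needs a decoding step that turns per-token labels into spans. The post does not say which label scheme Kanon 2 Enricher uses internally. Assuming the common BIO convention (B- begins a span, I- continues it, O is outside), a minimal decoder looks like this:

```python
def decode_spans(tokens, labels):
    """Turn per-token BIO labels (e.g. B-ORG, I-ORG, O) into (type, text) spans."""
    spans, current_type, current = [], None, []

    def close():
        if current:
            spans.append((current_type, " ".join(current)))

    for tok, lab in zip(tokens, labels):
        # A B- tag, or an I- tag whose type doesn't continue the open span, starts a new span.
        if lab.startswith("B-") or (lab.startswith("I-") and lab[2:] != current_type):
            close()
            current_type, current = lab[2:], [tok]
        elif lab.startswith("I-"):
            current.append(tok)
        else:  # "O" closes any open span
            close()
            current_type, current = None, []
    close()
    return spans
```

Decoding is a single linear scan with no generation step, so it cannot produce text that is not in the document. This is the property the post contrasts with token-by-token extraction.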
Show HN: I built an AI exam prep platform for AWS certs after failing one myself https://bit.ly/4aY1AvG
Show HN: I built an AI exam prep platform for AWS certs after failing one myself Hey HN, I failed the AWS Advanced Networking Specialty exam. Studied for weeks, used the usual prep sites, thought I was ready — wasn't. The problem wasn't effort, it was the tools. Static question banks don't teach you to think through AWS architecture decisions. They teach you to pattern-match answers. That falls apart on the harder exams. So I built Knowza to fix that for myself, and then figured others probably had the same frustration. The idea: instead of a static question bank, use AI to generate questions, adapt to what you're weak on, and actually explain the reasoning behind each answer — the way a senior engineer would explain it, not a multiple-choice rubric. The stack: Next.js + Amplify Gen 2; DynamoDB (direct Server Actions, no API layer); AWS Bedrock (Claude) for question generation and explanations; Stripe for billing. The hardest part honestly wasn't the AI — it was getting question quality consistent enough that I'd trust it for real exam prep. Still iterating on that. Early days, one person, built alongside a day job. Would love feedback from anyone who's grinded AWS certs or has thoughts on AI-generated educational content. knowza.ai https://bit.ly/3MMq0R7 March 5, 2026 at 09:27PM
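One simple way to adapt to weak areas is to sample topics in proportion to a smoothed error rate. This is a hypothetical sketch, not Knowza's logic, and the topic names are made up. The Laplace smoothing (+1 wrong, +2 attempts) keeps untried topics in rotation:

```python
import random

def topic_weights(stats):
    """stats: {topic: (wrong, attempts)} -> Laplace-smoothed error rate per topic."""
    return {t: (wrong + 1) / (attempts + 2) for t, (wrong, attempts) in stats.items()}

def next_topic(stats, rng=random):
    """Pick the next topic to quiz, favouring topics the learner gets wrong most often."""
    weights = topic_weights(stats)
    topics = list(weights)
    return rng.choices(topics, weights=[weights[t] for t in topics], k=1)[0]
```

For example, 8 wrong out of 10 gives a weight of 0.75. An untried topic gets 0.5, and 1 wrong out of 10 gets about 0.17.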
Wednesday, 4 March 2026
Show HN: A shell-native cd-compatible directory jumper using power-law frecency https://bit.ly/4cz3be9
Show HN: A shell-native cd-compatible directory jumper using power-law frecency I have used this tool privately since 2011 to manage directory jumping. While it is conceptually similar to tools like z or zoxide, the underlying ranking model is different. It uses a power-law convolution with the time series of cd actions to calculate a history-aware "frecency" metric instead of the standard heuristic counters and multipliers. This approach moves away from point-estimates for recency. Most tools look only at the timestamp of the last visit, which can allow a "one-off" burst of activity to clobber long-term habits. By convolving a configurable history window (typically the last 1,000+ events), the score balances consistent habits against recent flukes. On performance: Despite the O(N) complexity of calculating decay for 1,000+ events, query time is ~20-30ms (Real Time) in ksh/bash, which is well below the threshold of perceived lag. I intentionally chose a Logical Path (pwd -L) model. Preserving symlink names ensures that the "Name" remains the primary searchable key. Resolving to physical paths often strips away the very keyword the user intends to use for searching. https://bit.ly/3N95WIu March 4, 2026 at 11:20AM
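The power-law convolution can be written as score(d) = Σ (t_now − t_i + τ)^(−α), summed over the visits t_i to directory d inside the history window. Here is a Python sketch of that formula, written in Python rather than shell for readability. `alpha`, `tau`, and `window` are illustrative defaults. Matching the keyword against the last path component is an assumption based on the logical-path discussion above:

```python
from collections import defaultdict

def frecency(events, now, alpha=0.25, tau=3600.0, window=1000):
    """
    events: (timestamp_seconds, logical_path) cd actions, oldest first.
    Each visit in the window adds (now - t + tau) ** -alpha to its directory,
    i.e. the visit time series is convolved with a power-law decay kernel.
    """
    scores = defaultdict(float)
    for t, path in events[-window:]:
        scores[path] += (now - t + tau) ** -alpha
    return dict(scores)

def best_match(events, now, keyword, **params):
    """Highest-scoring directory whose last path component contains keyword, or None."""
    scores = frecency(events, now, **params)
    candidates = [p for p in scores if keyword in p.rstrip("/").rsplit("/", 1)[-1]]
    return max(candidates, key=scores.get, default=None)
```

With these defaults, a directory visited once a day for 30 days outranks one visited five times in the last five minutes. Raising `alpha` to 1.0 makes the recent burst win instead, which shows how the exponent trades habit against recency.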
Tuesday, 3 March 2026
Show HN: DubTab – Live AI Dubbing in the Browser (Meet/YouTube/Twitch/etc.) https://bit.ly/4u1yiVL
Show HN: DubTab – Live AI Dubbing in the Browser (Meet/YouTube/Twitch/etc.) Hi HN — I’m Ethan, a solo developer. I built DubTab because I spend a lot of time in meetings and watching videos in languages I’m not fluent in, and subtitles alone don’t always keep up (especially when the speaker is fast). DubTab is a Chrome/Edge extension that listens to the audio of your current tab and gives you: 1. Live translated subtitles (optional bilingual mode) 2. Optional AI dubbing with a natural-sounding voice — so you can follow by listening, not just reading. The goal is simple: make it easier to understand live audio in another language in real time, without downloading files or doing an upload-and-wait workflow. How you’d use it: 1. Open a video call / livestream / lecture / any tab with audio 2. Start DubTab 3. Choose target language (and source language if you know it) 4. Use subtitles only, or turn on natural AI dubbing and adjust the audio mix (keep original, or duck it). What it’s good for: 1. Following cross-language meetings/classes when you’re tired of staring at subtitles 2. Watching live content where you can’t pause/rewind constantly 3. Language learners who want bilingual captions to sanity-check meaning 4. Keeping up with live news streams on YouTube when events are unfolding in real time (e.g., breaking international updates like U.S./Iran/Israel-related developments). Link: https://bit.ly/40HBFUo I’ll be in the comments and happy to share implementation details if anyone’s curious. https://bit.ly/40HBFUo March 4, 2026 at 02:04AM
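At the sample level, "duck it" just means scaling the original tab audio down before adding the dubbed voice. Below is a minimal sketch over float samples. The gain values are hypothetical, and the sketch says nothing about how DubTab's audio path is actually built:

```python
def mix(original, dub, duck_gain=0.2, dub_gain=1.0):
    """
    Mix two equal-rate float sample streams in [-1, 1].
    duck_gain=1.0 keeps the original at full level; smaller values duck it under the dub.
    The shorter stream is padded with silence, and the output is clipped to [-1, 1].
    """
    n = max(len(original), len(dub))
    out = []
    for i in range(n):
        o = original[i] if i < len(original) else 0.0
        d = dub[i] if i < len(dub) else 0.0
        s = duck_gain * o + dub_gain * d
        out.append(max(-1.0, min(1.0, s)))
    return out
```

"Keep original" corresponds to `duck_gain=1.0`. In that case the clipping matters, because two full-scale signals can sum past 1.0.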
Show HN: I built an LLM human rights evaluator for HN (content vs. site behavior) https://bit.ly/4l4c4yi
Show HN: I built an LLM human rights evaluator for HN (content vs. site behavior) My health challenges limit how much I can work. I've come to think of Claude Code as an accommodation engine — not in the medical-paperwork sense, but in the literal one: it gives me the capacity to finish things that a normal work environment doesn't. Observatory was built in eight days because that kind of collaboration became possible for me. (I even used Claude Code to write this post — but am only posting what resonates with me.) Two companion posts: on the recursive methodology ( https://bit.ly/409tFeD... ) and what 806 evaluated stories reveal ( https://bit.ly/4r7k9DW... ). I built Observatory to automatically evaluate Hacker News front-page stories against all 31 provisions of the UN Universal Declaration of Human Rights — starting with HN because its human-curated front page is one of the few feeds where a story's presence signals something about quality, not just virality. It runs every minute: https://bit.ly/4aKNMpG . Claude Haiku 4.5 handles full evaluations; Llama 4 Scout and Llama 3.3 70B on Workers AI run a lighter free-tier pass. The observation that shaped the design: rights violations rarely announce themselves. An article about a company's "privacy-first approach" might appear on a site running twelve trackers. The interesting signal isn't whether an article mentions privacy — it's whether the site's infrastructure matches its words. Each evaluation runs two parallel channels. The editorial channel scores what the content says about rights: which provisions it touches, direction, evidence strength. The structural channel scores what the site infrastructure does: tracking, paywalls, accessibility, authorship disclosure, funding transparency. The divergence — SETL (Structural-Editorial Tension Level) — is often the most revealing number. "Says one thing, does another," quantified.
Every evaluation separates observable facts from interpretive conclusions (the Fair Witness layer, same concept as fairwitness.bot — https://bit.ly/43DzQKs ). You get a facts-to-inferences ratio and can read exactly what evidence the model cited. If a score looks wrong, follow the chain and tell me where the inference fails. Per our evaluations across 805 stories: only 65% identify their author — one in three HN stories without a named author. 18% disclose conflicts of interest. 44% assume expert knowledge (a structural note on Article 26). Tech coverage runs nearly 10× more retrospective than prospective: past harm documented extensively; prevention discussed rarely. One story illustrates SETL best: "Half of Americans now believe that news organizations deliberately mislead them" (fortune.com, 652 HN points). Editorial: +0.30. Structural: −0.63 (paywall, tracking, no funding disclosure). SETL: 0.84. A story about why people don't trust media, from an outlet whose own infrastructure demonstrates the pattern. The structural channel for free Llama models is noisy — 86% of scores cluster on two integers. The direction I'm exploring: TQ (Transparency Quotient) — binary, countable indicators that don't need LLM interpretation (author named? sources cited? funding disclosed?). Code is open source: https://bit.ly/3MJJANP — the .claude/ directory has the cognitive architecture behind the build. Find a story whose score looks wrong, open the detail page, follow the evidence chain. The most useful feedback: where the chain reaches a defensible conclusion from defensible evidence and still gets the normative call wrong. That's the failure mode I haven't solved. My background is math and psychology (undergrad), a decade in software — enough to build this, not enough to be confident the methodology is sound. Expertise in psychometrics, NLP, or human rights scholarship especially welcome. Methodology, prompts, and a 15-story calibration set are on the About page. Thanks! 
https://bit.ly/4aKNMpG March 4, 2026 at 01:26AM
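TQ is described only as a direction, so here is one hypothetical reading: the fraction of binary indicators a story satisfies, with equal weights. The indicator list combines the post's examples (author named, sources cited, funding disclosed) with conflict-of-interest disclosure, which the post also measures:

```python
from dataclasses import dataclass, fields

@dataclass
class Indicators:
    """Binary, countable checks that need no LLM interpretation (this list is hypothetical)."""
    author_named: bool
    sources_cited: bool
    funding_disclosed: bool
    conflicts_disclosed: bool

def transparency_quotient(ind: Indicators) -> float:
    """Share of indicators satisfied, in [0, 1], with every indicator weighted equally."""
    values = [getattr(ind, f.name) for f in fields(ind)]
    return sum(values) / len(values)
```

Each input is a yes/no fact, so the score cannot cluster on a couple of integers the way the free-model structural scores do. It only moves in steps of 1/4 here.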
Show HN: Interactive WordNet Visualizer-Explore Semantic Relations as a Graph https://bit.ly/4l9DCCr
Show HN: Interactive WordNet Visualizer-Explore Semantic Relations as a Graph https://bit.ly/4l7NYTv March 3, 2026 at 10:17PM
Monday, 2 March 2026
Show HN: An Auditable Decision Engine for AI Systems https://bit.ly/4r0ct6d
Show HN: An Auditable Decision Engine for AI Systems https://bit.ly/4rKkhKt March 3, 2026 at 03:03AM
Show HN: PHP 8 disable_functions bypass PoC https://bit.ly/4coTizr
Show HN: PHP 8 disable_functions bypass PoC https://bit.ly/4ckhq6k March 3, 2026 at 02:12AM
Show HN: We filed 99 patents for deterministic AI governance (Prior Art vs. RLHF) https://bit.ly/3OHLRtr
Show HN: We filed 99 patents for deterministic AI governance (Prior Art vs. RLHF) For the last few months, we've been working on a fundamental architectural shift in how autonomous agents are governed. The current industry standard relies almost entirely on probabilistic alignment (RLHF, system prompts, constitutional training). It works until it's jailbroken or the context window overflows. A statistical disposition is not a security boundary. We've built an alternative: Deterministic Policy Gates. In our architecture, the LLM is completely stripped of execution power. It can only generate an "intent payload." That payload is passed to a process-isolated, deterministic execution environment where it is evaluated against a cryptographically hashed constraint matrix (the constitution). If it violates the matrix, it is blocked. Every decision is then logged to a Merkle-tree substrate (GitTruth) for an immutable audit trail. We filed 99 provisional patents on this architecture starting January 10, 2026. Crucially, we embedded strict humanitarian use restrictions directly into the patent claims themselves (The Peace Machine Mandate) so the IP cannot legally be used for autonomous weapons, mass surveillance, or exploitation. I wrote a full breakdown of the architecture, why probabilistic safety is a dead end, and the timeline of how we filed this before the industry published their frameworks: Read the full manifesto here: https://bit.ly/4l5y3Vx... The full patent registry is public here: https://bit.ly/4l1JNbI I'm the founder and solo inventor. Happy to answer any questions about the deterministic architecture, the Merkle-tree state persistence, or the IP strategy of embedding ethics directly into patent claims. March 2, 2026 at 11:56PM
Show HN: Open-Source Postman for MCP https://bit.ly/4l4lxG3
Show HN: Open-Source Postman for MCP https://bit.ly/40EKzC1 March 3, 2026 at 12:40AM
Sunday, 1 March 2026
Show HN: Vibe Code your 3D Models https://bit.ly/4aYHwto
Show HN: Vibe Code your 3D Models Hi HN, I’m the creator of SynapsCAD, an open-source desktop application I've been building that combines an OpenSCAD code editor, a real-time 3D viewport, and an AI assistant. You can write OpenSCAD code, compile it directly to a 3D mesh, and use an LLM (OpenAI, Claude, Gemini, ...) to modify the code through natural language. Demo video: https://www.youtube.com/watch?v=cN8a5UozS5Q A bit about the architecture: - It’s built entirely in Rust. - The UI and 3D viewport are powered by Bevy 0.15 and egui. - It uses a pure-Rust compilation pipeline (openscad-rs for parsing and csgrs for constructive solid geometry rendering) so there are no external tools or WASM required. - Async AI network calls are handled by Tokio in the background to keep the Bevy render loop smooth. Disclaimer: This is a very early prototype. The OpenSCAD parser/compiler doesn't support everything perfectly yet, so you will definitely hit some rough edges if you throw complex scripts at it. I mostly just want to get this into the hands of people who tinker with CAD or Rust. I'd be super happy for any feedback, architectural critiques, or bug reports—especially if you can drop specific OpenSCAD snippets that break the compiler in the GitHub issues! GitHub (Downloads for Win/Mac/Linux): https://bit.ly/3MDl1Cd Happy to answer any questions about the tech stack or the roadmap! https://bit.ly/3MDl1Cd February 27, 2026 at 06:27PM
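Running async AI calls in the background keeps slow network requests off the render loop. Each frame only polls for a finished result and never waits on one. Below is an analogous Python sketch that uses a thread and a non-blocking queue poll instead of Tokio and Bevy. The LLM call is simulated, and none of this is SynapsCAD's code:

```python
import queue
import threading
import time

def start_ai_request(prompt: str, results: "queue.Queue[str]") -> threading.Thread:
    """Run a simulated slow LLM call on a background thread."""
    def worker():
        time.sleep(0.05)  # stand-in for network latency
        results.put(f"// edited per: {prompt}")
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t

def render_loop(results: "queue.Queue[str]", max_frames: int = 100):
    """Each frame polls without blocking, so drawing never stalls on the AI call."""
    frames = 0
    while frames < max_frames:
        try:
            return frames, results.get_nowait()
        except queue.Empty:
            frames += 1
            time.sleep(0.01)  # stand-in for drawing one frame
    return frames, None
```

The frame loop keeps drawing while the request is in flight and picks up the edited code on whichever frame it first becomes available.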
Show HN: Logira – eBPF runtime auditing for AI agent runs https://bit.ly/3MP5orl
Show HN: Logira – eBPF runtime auditing for AI agent runs I started using Claude Code (claude --dangerously-skip-permissions) and Codex (codex --yolo) and realized I had no reliable way to know what they actually did. The agent's own output tells you a story, but it's the agent's story. logira records exec, file, and network events at the OS level via eBPF, scoped per run. Events are saved locally in JSONL and SQLite. It ships with default detection rules for credential access, persistence changes, suspicious exec patterns, and more. Observe-only – it never blocks. https://bit.ly/4sgvLW1 https://bit.ly/4sgvLW1 March 2, 2026 at 12:25AM
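Because events are stored as JSONL, a detection rule can be a predicate over one decoded event. The event schema and the three rules below are hypothetical stand-ins for logira's real formats and default rules. Like logira, the scanner only reports and never blocks:

```python
import json

SENSITIVE = ("/.ssh/", "/.aws/credentials", "/.netrc")  # hypothetical credential paths

def _cmdline(e: dict) -> str:
    return " ".join(e.get("argv", []))

# Hypothetical rules: name -> predicate over a single decoded event.
RULES = {
    "credential_access": lambda e: e["type"] == "file" and any(s in e["path"] for s in SENSITIVE),
    "suspicious_exec": lambda e: e["type"] == "exec" and "curl" in _cmdline(e) and "| sh" in _cmdline(e),
    "persistence": lambda e: e["type"] == "file" and e.get("op") == "write"
        and (e["path"].endswith((".bashrc", ".zshrc")) or "/cron" in e["path"]),
}

def scan(jsonl_text: str):
    """Return (rule_name, event) for every rule an event matches; observe-only."""
    alerts = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        for name, rule in RULES.items():
            if rule(event):
                alerts.append((name, event))
    return alerts
```

Keeping rules as plain predicates over stored events means they can be rerun on an old run's log after a new rule is written.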