Saturday, 22 July 2023

Show HN: Vanity, Recognition and Fighting Perfectionism – Buildlog for Git Vain https://bit.ly/3OmSyyr

Show HN: Vanity, Recognition and Fighting Perfectionism – Buildlog for Git Vain https://bit.ly/3Q8jt25 July 22, 2023 at 09:49AM

Friday, 21 July 2023

Show HN: I trained a 65B LLM on my texts to talk to myself (details inside) https://bit.ly/3Q5M4VS

Show HN: I trained a 65B LLM on my texts to talk to myself (details inside) I trained the 65b model on my texts so I can talk to myself. It's pretty useless as an assistant, and will only do stuff you convince it to, but I guess it's technically uncensored? I'll leave it up for a bit if you want to chat with it. I posted this to Reddit and had several hundred people talking to it.
Salient points from that discussion:
- LLAMA 1 65b
- Rank 128
- 5 epochs
- Batch size 1, 256 cutoff
- Trained in the Oobabooga suite using bitsandbytes 4-bit quantization for the lora
- Loss around 1.5 seems to give the most coherent results
- Trained on raw text dumps that are then parsed by a crappy Blazor Server app I threw together in a few hours. Text format is just "Sender:The Message\n"
- Trained on 2x 3090
- Training took about 16 hours at a 90% power cap on the 3090s
- Trained on ~30k texts (I talk a lot, that was just 2 years)
- There's nothing telling it that it's a robot, though it sometimes seems to know
- It was largely inspired by the Unreal Engine lora tutorial
I generated a list of fake names and addresses, pulled a list of my contacts, and then scripted out swapping the names and addresses for fictitious PII. I don't really send other sensitive data through text and my account is so thoroughly associated with my real name/location that the data leakage risk is manageable for the short period of time I'll have this available. It tends to hallucinate fake PII as well, which I think is partially a side effect of the data scrubbing. You'll notice it says things like that I live at 420 Ligma.
I'll need to mix some actual assistant tasks into the dataset before it will actually be useful as an assistant. Right now it's largely just for idle conversation. It's pretty ADHD and will randomly go off on its own tangents. I don't think it's the model. I think I just talk like that. Let me know if you have any questions or comments.
I built it for myself, but figured I'll let the communities that have taught and entertained me so much play with it a little, too. Note: it says some pretty unhinged stuff. There's absolutely no guardrails. It also tends to talk like you're already friends with history. https://bit.ly/3K9JA4S July 21, 2023 at 05:01PM
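The scrubbing step described in the post (swap real contact names for generated ones while keeping the "Sender:The Message\n" format) could look roughly like the sketch below. This is not the author's script: the function names, the sample contacts, and the whole-word regex approach are all assumptions made for illustration, and it covers names only, not addresses.

```python
import re

def build_name_map(real_names, fake_names):
    """Pair each real contact name with a fictitious replacement (hypothetical helper)."""
    if len(fake_names) < len(real_names):
        raise ValueError("need at least one fake name per real name")
    return dict(zip(real_names, fake_names))

def scrub_line(line, name_map):
    """Replace every whole-word occurrence of a real name in one line."""
    if not name_map:
        return line
    # Try longer names first so "Ann Lee" is matched before the shorter "Ann".
    names = sorted(name_map, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: name_map[m.group(1)], line)

def scrub_dump(text, name_map):
    """Scrub a raw dump of 'Sender:The Message' lines, keeping line endings intact."""
    return "".join(scrub_line(l, name_map) for l in text.splitlines(keepends=True))

# Made-up sample data in the post's "Sender:The Message\n" format.
name_map = build_name_map(["Ann Lee", "Ann", "Bob"], ["Zed Fox", "Zed", "Quinn"])
raw = "Ann Lee:hey Bob, call Ann\nBob:ok\n"
scrubbed = scrub_dump(raw, name_map)
print(scrubbed, end="")
```

Because the sender prefix is scrubbed with the same map as the message body, the model still sees a consistent (fictitious) identity for each contact across the dataset.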

Show HN: Guiding LLM outputs using Zod https://bit.ly/44T95PT

Show HN: Guiding LLM outputs using Zod https://bit.ly/3JyuUMJ July 21, 2023 at 10:02PM

Show HN: Datalake for Computer Vision Projects https://bit.ly/3O1x0pC

Show HN: Datalake for Computer Vision Projects Buddhika, Kelum, and Chong Han here. We are building a self-hosted data infrastructure platform for computer vision. Our community page is https://bit.ly/43DZkUG
In the past, we worked on a couple of high-scale computer vision projects in retail, farming, and hospitals in various capacities. These projects involved 2D object sections, 3D object tracking, and more advanced 3D perception. Like other CV engineers, we observed a common factor during these projects: one needs a large volume of high-quality data to build a production-deployable CV system.
Our biggest challenge was not having a robust data infrastructure to handle large volumes of data. Our S3 buckets were like a data swamp; we had so much raw image and video data in storage buckets without tracking. Instead of working on CV, we had to develop tools for data operations. We understand that many of us have our own custom scripts and stitch them together to make things happen in the CV pipeline. However, that approach is brittle and cumbersome to maintain.
We wanted to build a system on top of cloud buckets such as S3 that stores all file indexes, labels, metadata attributes, inference outputs, model training outcomes, and literally anything related to machine learning/computer vision. This makes it possible for us to search for anything and consume it efficiently. This behaves as a DataLake (by the way, "DataLake" is an overused term). All other downstream processes in the CV pipeline can access data more efficiently via the SDK and can also return data back to the Lake (e.g., training/inference outcomes).
The reason we made it self-hosted is to address data security and privacy concerns. Since data is fundamental to AI, we believe that companies and organizations should have complete control over it. Currently, we support AWS, GCP, and Azure cloud buckets; soon, we will support local storage. We ship this as a Docker container so you can just install it on any VM or local server. The installation script will do all the configuration automatically. The Python SDK and documentation are available but not perfect yet.
We’ve launched this under MIT and Elastic licenses so any developer can use it. Our goal is not to charge individual developers. We make money by charging a license fee for things like multiple users, multiple buckets, scalability with K8s, and providing support. Give it a try: https://bit.ly/43DZkUG Let us know what you think. July 22, 2023 at 12:15AM

Show HN: Qwokka – see what's great on Netflix https://bit.ly/46VEHGD

Show HN: Qwokka – see what's great on Netflix https://bit.ly/470UmnQ July 21, 2023 at 04:20PM

Show HN: Primo – a visual CMS with Svelte blocks, a code editor, and SSG https://bit.ly/3rEJuft

Show HN: Primo – a visual CMS with Svelte blocks, a code editor, and SSG https://bit.ly/3O1aSMa July 21, 2023 at 01:38PM

Show HN: A non-VC backed content creation and social media platform https://bit.ly/3O10kg6

Show HN: A non-VC backed content creation and social media platform Hey HN, I'm soft launching my MVP today and would love to hear your honest feedback. For the past few months I've been working extremely hard on this side hustle in my spare time (I have a day job as a CTO). I'm building a platform for writers, bloggers and content creators that's built for them rather than for investors and advertisers like most similar products and social media platforms backed by VCs. I wrote about why I built it here: https://bit.ly/3NYY09j And the landing page is here: https://bit.ly/3OmExQf I would really love your honest feedback, if you care to share it with me. :) In the next week or two I'll write about how I built this MVP in three months - the tech, the architecture, the experiments and the missteps... If you are curious, stay tuned! July 21, 2023 at 01:22PM

Show HN: LibreScroll – enable flywheel-scrolling on any generic mouse https://bit.ly/3Y2ZXGh

Show HN: LibreScroll – enable flywheel-scrolling on any generic mouse Based on the framerate-independent momentum simulation[0] that I used in my TPMouse script[1]. If you've ever used a mouse with an infinite scroll wheel, such as a Logitech, this utility for Windows basically recreates that functionality for any generic mouse. Actually, it's even better than that: this allows for simultaneous horizontal and vertical scrolling, so essentially it combines two of the best features of the Logitech MX Master -- horizontal wheel, and unlocked momentum scrolling -- into one intuitive control scheme. To enable horizontal scrolling, set the X-sensitivity to a value you prefer. [0] https://bit.ly/3Y18HNa... [1] https://bit.ly/3SsrjkW https://bit.ly/3XXzEkZ July 19, 2023 at 10:36AM
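For readers unfamiliar with the term, here is a minimal sketch of what a framerate-independent momentum simulation can look like. It is not LibreScroll's code and may differ from the method in the linked write-up: it assumes a simple exponential friction model, and the names `step`, `simulate`, and `friction` are invented for this example. The idea is to integrate the decay exactly over each frame's `dt`, so the total scroll distance is the same at any frame rate.

```python
import math

def step(velocity, friction, dt):
    """Advance one frame of length dt seconds; returns (scroll_delta, new_velocity).

    Assumed model: dv/dt = -friction * v, integrated exactly over dt,
    so the result does not depend on how time is sliced into frames.
    """
    decay = math.exp(-friction * dt)
    delta = velocity * (1.0 - decay) / friction
    return delta, velocity * decay

def simulate(velocity, friction, fps, seconds):
    """Run the simulation at a fixed frame rate; returns (total_scroll, final_velocity)."""
    dt = 1.0 / fps
    total = 0.0
    for _ in range(fps * seconds):
        delta, velocity = step(velocity, friction, dt)
        total += delta
    return total, velocity

# The same flick simulated at two frame rates ends up in (nearly) the same place.
at_60, _ = simulate(1000.0, 3.0, fps=60, seconds=1)
at_144, _ = simulate(1000.0, 3.0, fps=144, seconds=1)
print(at_60, at_144)
```

A naive per-frame update such as `velocity *= 0.95` would instead decay faster at higher frame rates, which is the problem the exact integration avoids.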

Thursday, 20 July 2023

Show HN: Open Video Game Data: A new approach to evaluating games https://bit.ly/3K679f0

Show HN: Open Video Game Data: A new approach to evaluating games
> Introduction
Our idea is to offer an alternative to well-known sites like Metacritic and OpenCritic, but with a different approach. Instead of being a score aggregator, we will be a list aggregator. Metacritic brings together reviews from multiple review sites in one place, providing a final score of 0-100 based on a weighted arithmetic average, where some critics carry more weight than others. An alternative to Metacritic is OpenCritic, where all critics are weighted equally in the final average. However, both still work with numeric scores.
> Why can relying on scores be problematic?
- Ratings only reflect the state of the game at launch
Today, more than ever, games are constantly evolving. Day-one patches are common; that is, games are released with bugs and incomplete content. However, with time and help from the community, these games can be improved, as was the case with No Man's Sky. When No Man's Sky was released in 2016, its average on Metacritic was just 61, due to the troubled release. Over the years the game has evolved significantly with updates, but its Metacritic score remains frozen at 61.
Alternative: As the lists are constantly evolving and updating, they more accurately reflect the current quality of the games, tracking their improvements and changes over time.
- The average score can be unfair as it is based on the number of critics
Sometimes, the number of critics heavily influences a game's rating. An example of this is The Legend of Zelda: Ocarina of Time, with an average of 99 on Metacritic, based on 22 critics, while The Legend of Zelda: Breath of the Wild averages 97, based on 109 critics. Getting a high average based on a large number of critics is extremely difficult, and this can influence the overall perception of a game.
Alternative: When a final list is created, all games have an equal chance of appearing in different lists. For example, game A might be included in 3 out of 11 lists, while game B might be mentioned in 5 out of 11 lists. The total number of lists will always be the same for all games.
- Relying on an average can be inaccurate
Metacritic converts the different rating scales of review sites into a single percentage-based quantitative scale. However, this conversion can be inaccurate and unfair, as each site uses a different rating system. This approach can result in important information being lost during conversion, affecting the accuracy of the final result.
Alternative: With our ranked lists approach, we eliminate the need to convert rating systems, as all lists, regardless of site, follow the same common logic. In all lists, there will always be first place, second place, and so on.
> A great alternative: *Open Video Game Data*
Our site aims to be just another alternative to score-based sites. Our approach to aggregating lists allows users to have a more comprehensive and up-to-date view of games, as these lists are constantly updated by the community. The calculation method is quite simple and transparent. All lists on the site have a maximum size of 15 games. When a game ranks first in a list, it is rewarded with 15 points, while if it ranks last, it only receives 1 point.
> Conclusion
Open Video Game Data seeks to provide gamers and game enthusiasts with a reliable tool to make informed decisions about which games to play, taking into account critics' opinions and the ongoing evolution of the gaming industry. Users can add critic lists and can also create personal lists, which are aggregated as well. With the active participation of the community, we hope to build an inclusive reference platform for the gaming community, promoting a more complete and up-to-date analysis of the games we love so much. Come be part of our community! Create an account and join us to explore the world of playlists. Welcome to Open Video Game Data!
Visit us at: https://bit.ly/44AxY38 July 21, 2023 at 01:26AM
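The point scheme described in the post (lists capped at 15 games, 15 points for first place, 1 point for last) can be sketched as below. The post only gives the two endpoints, so the linear rule `16 - rank` for the places in between, the tie-breaking by name, and all function names are assumptions for illustration, not the site's actual implementation.

```python
from collections import defaultdict

MAX_LIST_SIZE = 15  # stated in the post: lists hold at most 15 games

def points_for_rank(rank):
    """Points for a 1-based rank: 15 for first place, 1 for fifteenth.

    The linear step between those two endpoints is an assumption.
    """
    if not 1 <= rank <= MAX_LIST_SIZE:
        raise ValueError(f"rank must be between 1 and {MAX_LIST_SIZE}")
    return MAX_LIST_SIZE + 1 - rank

def aggregate(ranked_lists):
    """Sum points per game over all lists; returns (game, points) sorted best first."""
    totals = defaultdict(int)
    for ranked in ranked_lists:
        for rank, game in enumerate(ranked[:MAX_LIST_SIZE], start=1):
            totals[game] += points_for_rank(rank)
    # Highest total first; ties broken alphabetically (an arbitrary choice).
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))

# Three made-up lists: each is ordered from first place downward.
lists = [["A", "B", "C"], ["B", "A"], ["C", "B", "A"]]
ranking = aggregate(lists)
print(ranking)
```

Here game B totals 14 + 15 + 14 = 43 points, A totals 15 + 14 + 13 = 42, and C totals 13 + 15 = 28, so no conversion between rating scales is ever needed.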

Show HN: A fine-tuned Stable Diffusion model for generating Minecraft skins https://bit.ly/3O0fKkK

Show HN: A fine-tuned Stable Diffusion model for generating Minecraft skins https://bit.ly/44z4EKu July 20, 2023 at 11:40PM

Show HN: RAGstack – private ChatGPT for enterprise VPCs, built with Llama 2 https://bit.ly/44UFGoO

Show HN: RAGstack – private ChatGPT for enterprise VPCs, built with Llama 2 Hey Hacker News, we’re the cofounders at Psychic.dev ( https://bit.ly/3O8lVDs ) where we help companies connect LLMs to private data. With the launch of Llama 2, we think it’s finally viable to self-host an internal application that’s on par with ChatGPT, so we did exactly that and made it an open source project. We also included a vector DB and API server so you can upload files and connect Llama 2 to your own data.
The RAG in RAGstack stands for Retrieval Augmented Generation, a technique where the capabilities of a large language model (LLM) are augmented by retrieving information from other systems and inserting it into the LLM’s context window via a prompt. This gives LLMs information beyond what was provided in their training data, which is necessary for almost every enterprise application. Examples include data from current web pages, data from SaaS apps like Confluence or Salesforce, and data from documents like sales contracts and PDFs. RAG works better than fine-tuning the model because it’s cheaper, it’s faster, and it’s more reliable since the provenance of information is attached to each response.
While there are quite a few “chat with your data” apps at this point, most have external dependencies on APIs like OpenAI or Pinecone. RAGstack, on the other hand, only has open-source dependencies and lets you run the entire stack locally or on your cloud provider. This includes:
- Containerizing LLMs like Falcon, Llama2, and GPT4all with Truss
- Vector search with Qdrant
- File parsing and ingestion with Langchain, PyMuPDF, and Unstructured.io
- Cloud deployment with Terraform
If you want to dive into it yourself, we also published a couple of tutorials on how to deploy open source LLMs for your organization, and optionally give them access to internal documents without any data ever leaving your VPC.
- How to deploy Llama 2 to Google Cloud (GCP): https://bit.ly/44BBESu...
- How to connect Llama 2 to your own data using RAGstack: https://bit.ly/44zzBy5...
Let a thousand private corporate oracles bloom! https://bit.ly/3rtwglu July 20, 2023 at 06:11PM
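The retrieve-then-prompt loop that the post describes can be illustrated with a toy, dependency-free sketch. It is not RAGstack's code: a bag-of-words cosine similarity stands in for the embedding model and the Qdrant vector search, no LLM is actually called, and every function name, the prompt wording, and the sample documents are made up for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question (stand-in for vector search)."""
    q = Counter(tokenize(question))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]

def build_prompt(question, documents, k=2):
    """Insert the retrieved documents into the prompt that would be sent to the LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Made-up private documents.
docs = [
    "The sales contract with Acme renews on 1 March.",
    "Office plants are watered on Fridays.",
    "Acme contract value is 40,000 USD per year.",
]
question = "When does the Acme contract renew?"
prompt = build_prompt(question, docs, k=2)
print(prompt)
```

Only the two Acme documents end up in the prompt, which is the sense in which the model gets information beyond its training data while the provenance of each retrieved snippet stays visible.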

Show HN: Keeper – GPLv3 app to store your personal info based on YAML templates https://bit.ly/44UEIIU

Show HN: Keeper – GPLv3 app to store your personal info based on YAML templates Keeper is a command line GPLv3 application, programmed in Go, designed to privately store your personal info using custom formats described in YAML templates. It uses SQLite as the backend. https://bit.ly/3OiHJNL July 20, 2023 at 08:23AM

Show HN: I built a 'newspaper' that summarizes current events with GPT https://bit.ly/3DoxRM4

Show HN: I built a 'newspaper' that summarizes current events with GPT Hi HN! I built a proof of concept 'newspaper' that creates short news articles of today's current events with GPT. This is a very rough proof of concept but I can't help but think this concept is the future. I think we'll all have AI daily newspapers in whatever theme we want, covering any amount of information we want, with the ability to expand or contract article length instantly. Sharing just to get everyone thinking about this future. Thanks for reading! https://bit.ly/3DmWzfP July 20, 2023 at 11:29AM

Show HN: PDF Differ https://bit.ly/3DmRTXu

Show HN: PDF Differ https://bit.ly/3DlGRBJ July 20, 2023 at 11:46AM

Wednesday, 19 July 2023

Show HN: Scenic – explore and find interesting places along a route using GPT4 https://bit.ly/3Q0IOLl

Show HN: Scenic – explore and find interesting places along a route using GPT4 Hi HN, Made a small app because I found myself doing this manually quite a few times using GPT-4. You put in where you're starting and where you're going, and what kind of places you want to see along the way, and it'll find places and identify a reasonable(ish) path. It's very simple but it has some nice features (e.g. you can get a Google Maps link). The place features you search for can be basically anything and can be as opinionated as you'd like. This is just for fun, but please let me know if you have any suggestions! https://bit.ly/3K101Ar July 19, 2023 at 07:08PM

Show HN: Infisical – open-source secret management platform https://bit.ly/46XswZW

Show HN: Infisical – open-source secret management platform Hi HN, we’re the founders of Infisical, the open source secret management platform – it provides an end-to-end set of tools to manage your secrets across your team and infrastructure ( https://bit.ly/3G4HVMD ). Excited to show you all the progress that we’ve made in the past few months after our Launch HN in February ( https://bit.ly/3Dks2zj ) and Show HN in December ( https://bit.ly/3Wa8AND ).
During the previous Show HN and Launch HN, we received a ton of feedback which helped us improve Infisical. We’ve since released:
- Secret scanning: a new toolset to block commits with hardcoded secrets and continuously monitor your code.
- Folders: Deeper organizational structure within projects to accommodate microservice architectures and storage of more secret types like user API keys and OAuth tokens.
- Node and Python SDKs, Webhooks: More ways to integrate and start syncing secrets with Infisical across your infrastructure.
- Integrations with Terraform, Supabase, Railway, Checkly, Cloudflare Pages, Azure Key Vault, Laravel Forge, and more.
- Secret Referencing and Importing: to create a proper single source of truth.
- 1-click deployments to AWS EC2, Digital Ocean, Render, Fly.io ( https://bit.ly/44x117U ): More ways to self-host Infisical on your own infrastructure.
In addition, the platform has become more stable and undergone a full-coverage penetration test; we’ve also begun the SOC 2 (Type II) certification process.
Overall, we’re really lucky to have the support of the developer community, and, in fact, Infisical has gathered over 7k GitHub stars, and now processes over 200 million secrets per month for everyone from solo developers to public enterprises. Our repo is published under the MIT license so any developer can use Infisical. Again, the goal is to not charge individual developers.
We make money by charging a license fee for some enterprise features as well as providing a hosted version and support. Check out Infisical Cloud ( https://bit.ly/3G4HVMD ) or self-host Infisical on your own infrastructure ( https://bit.ly/3G4HZMn ). We’d love to hear what you think! We’re excited to continue building Infisical, and keep shipping features for you. Please let us know if you have any thoughts, feedback, or feature suggestions! https://bit.ly/3G4HVMD July 19, 2023 at 04:40PM

Show HN: Efficient intermediate data sharing for Kedro pipelines https://bit.ly/3OmveRx

Show HN: Efficient intermediate data sharing for Kedro pipelines Data processing pipelines are becoming increasingly complex, and intermediate data sharing is becoming the bottleneck, especially for data-intensive analytics and data preprocessing in machine learning and AI. This blog shows the possibility of efficient data sharing in data science pipelines, which naturally fits the settings of Kubernetes. It demonstrates how existing codebases can benefit from it without requiring an overhaul of the engineering effort. https://bit.ly/3DlW8CM July 19, 2023 at 10:53AM

Show HN: Hash functions from C++ running in WebAssembly https://bit.ly/3q2WyKW

Show HN: Hash functions from C++ running in WebAssembly https://bit.ly/43rDeEY July 19, 2023 at 11:56AM

Show HN: I created a platform to rally my community https://bit.ly/3K5gGmq

Show HN: I created a platform to rally my community https://bit.ly/46SQdm5 July 19, 2023 at 07:14AM

Show HN: ProseMirror.Net https://bit.ly/43F4Jv2

Show HN: ProseMirror.Net We've released a translation of the core ProseMirror projects to C#! Currently we are utilizing this library in our .NET backend to map collab edits and verify schema compliance for client-submitted steps. It's not a focus of ours, but it will be interesting to see how this might get used on platforms C# runs on natively, like iOS or Android. https://bit.ly/46S91BV July 19, 2023 at 12:48AM