Monday, 13 July 2020

Launch HN: Aquarium (YC S20) – Improve Your ML Dataset Quality https://bit.ly/3gVJtch

Hi everyone! I’m Peter from Aquarium ( https://bit.ly/2ATWWSy ). We help deep learning developers find problems in their datasets and models, then help fix them by smartly curating their datasets. We want to build the same high-power tooling for data curation that sophisticated ML companies like Cruise, Waymo, and Tesla have and bring it to the masses.

ML models are defined by a combination of code and the data that the code trains on. A programmer must think hard about what behavior they want from their model, assemble a dataset of labeled examples of what they want their model to do, and then train their model on that dataset. As they encounter errors in production, they must collect and label data for the model to train on to fix these errors, and verify they’re fixed by monitoring the model’s performance on a test set with previous failure cases. See Andrej Karpathy’s Software 2.0 article ( https://bit.ly/2C64Okw ) for a great description of this workflow.

My cofounder Quinn and I were early engineers at Cruise Automation (YC W14), where we built the perception stack + ML infrastructure for self-driving cars. Quinn was tech lead of the ML infrastructure team and I was tech lead for the Perception team. We frequently ran into problems with our dataset that we needed to fix, and we found that most model improvement came from improvement to a dataset’s variety and quality. Basically, ML models are only as good as the datasets they’re trained on.

ML datasets need variety so the model can train on the types of data that it will see in production environments. In one case, a safety driver noticed that our car was not detecting green construction cones. Why? When we looked into our dataset, it turned out that almost all of the cones we had labeled were orange. Our model had not seen many examples of green cones at training time, so it was performing quite badly on this object in production.
We found and labeled more green cones into our training dataset, retrained the model, and it detected green cones just fine.

ML datasets need clean and consistent data so the model does not learn the wrong behavior. In another case, we retrained our model on a new batch of data that came from our labelers and it was performing much worse on detecting “slow signs” in our test dataset. After days of careful investigation, we realized it was due to a change to our labeling process that caused our labelers to label many “speed limit signs” as “slow signs,” which was confusing the model and causing it to perform badly on detecting “slow signs.” We fixed our labeling process, did an additional QA pass over our dataset to fix the bad labels, retrained our model on the clean data, and the problems went away.

While there’s a lot of tooling out there to debug and improve code, there’s not a lot of tooling to debug and improve datasets. As a result, it’s extremely painful to identify issues with variety and quality and appropriately modify datasets to fix them.

ML engineers often encounter scenarios like:
- Your model’s accuracy measured on the test set is at 80%. You abstractly understand that the model is failing on the remaining 20% and you have no idea why.
- Your model does great on your test set but performs disastrously when you deploy it to production and you have no idea why.
- You retrain your model on some new data that came in, it’s worse, and you have no idea why.

ML teams want to understand what’s in their datasets, find problems in their dataset and model performance, and then edit / sample data to fix these problems. Most teams end up building their own one-off tooling in-house that isn’t very good. This tooling typically relies on naive methods of data curation that are really manual and involve “eyeballing” many examples in your dataset to discover labeling errors / failure patterns.
This works well for small datasets but starts to fail as your dataset size grows above a few thousand examples.

Aquarium’s technology relies on letting your trained ML model do the work of guiding what parts of the dataset to pay attention to. Users can get started by submitting their labels and corresponding model predictions through our API. Then Aquarium lets users drill into their model performance - for example, visualize all examples where we confused a labeled car for a pedestrian from this date range - so users can understand the different failure modes of a model. Aquarium also finds examples where your model has the highest loss / disagreement with your labeled dataset, which tends to surface many labeling errors (i.e., the model is right and the label is wrong!).

Users can also provide their model’s embeddings for each entry, which are an anonymized representation of what their model “thought” about the data. The neural network embeddings for a datapoint (generated by either our users’ neural networks or by our stable of pretrained nets) encode the input data into a relatively short vector of floats. We can then identify outliers and group together examples in a dataset by analyzing the distances between these embeddings. We also provide a nice thousand-foot-view visualization of embeddings that allows users to zoom into interesting parts of their dataset. ( https://youtu.be/DHABgXXe-Fs?t=139 )

Since embeddings can be extracted from most neural networks, this makes our platform very general. We have successfully analyzed datasets + models operating on images, 3D point clouds from depth sensors, and audio.

After finding problems, Aquarium helps users solve them by editing or adding data. After finding bad data, Aquarium integrates into our users’ labeling platforms to automatically correct labeling errors. After finding patterns of model failures, Aquarium samples similar examples from users’ unlabeled datasets (green cones) and sends those to labeling.
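
To make the two ideas above concrete, here is a minimal sketch in plain Python. It is not Aquarium’s API or implementation: the function names, the toy class probabilities, and the 2D embeddings are all made up for illustration, and cosine distance is just one reasonable choice of embedding distance. It ranks labeled examples by cross-entropy loss so the most suspicious labels surface first, then picks the unlabeled examples whose embeddings sit closest to a known failure case.

```python
import math


def label_loss(probs, label):
    """Cross-entropy loss of one prediction against its given label."""
    # Clamp to avoid log(0) when the model assigns zero probability.
    return -math.log(max(probs[label], 1e-12))


def rank_suspect_labels(predictions, labels):
    """Return example indices sorted from highest to lowest loss."""
    losses = [label_loss(p, y) for p, y in zip(predictions, labels)]
    return sorted(range(len(labels)), key=lambda i: losses[i], reverse=True)


def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def most_similar(query, candidates, k):
    """Indices of the k candidate embeddings closest to the query."""
    order = sorted(
        range(len(candidates)),
        key=lambda i: cosine_distance(query, candidates[i]),
    )
    return order[:k]


# Hypothetical model outputs: every example is labeled "slow_sign",
# but the model is confident that example 1 is a speed limit sign.
predictions = [
    {"slow_sign": 0.90, "speed_limit_sign": 0.10},
    {"slow_sign": 0.05, "speed_limit_sign": 0.95},
    {"slow_sign": 0.60, "speed_limit_sign": 0.40},
]
labels = ["slow_sign", "slow_sign", "slow_sign"]
suspects = rank_suspect_labels(predictions, labels)

# Hypothetical embeddings: one known failure case (say, a green cone)
# and a small pool of unlabeled examples to sample from.
failure_embedding = [1.0, 0.0]
unlabeled_embeddings = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.3], [-1.0, 0.0]]
to_label = most_similar(failure_embedding, unlabeled_embeddings, k=2)

print(suspects)  # review order for label QA
print(to_label)  # unlabeled examples to send to labeling
```

On real data the sorting would be replaced by a vectorized or approximate nearest-neighbor search, but the workflow is the same: let the model’s loss and embeddings decide which examples a human looks at first.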
Think about this as a platform for interactive learning. By focusing on the most “important” areas of the dataset that the model is consistently getting wrong, we increase the leverage of ML teams to sift through massive datasets and decide on the proper corrective action to improve their model performance.

Our goal is to build tools to reduce or eliminate the need for ML engineers to handhold the process of improving model performance through data curation - basically, Andrej Karpathy’s Operation Vacation concept ( https://youtu.be/g2R2T631x7k?t=820 ) as a service.

If any of those experiences speak to you, we’d love to hear your thoughts and feedback. We’ll be here to answer any questions you might have!

July 13, 2020 at 04:05PM

Show HN: A Simple Search Engine https://bit.ly/2OjqTym

Show HN: A Simple Search Engine https://bit.ly/3fwbXcw July 13, 2020 at 03:58PM

Show HN: Income/savings calculator for moving to Canada https://bit.ly/2Zmo3iG

Show HN: Income/savings calculator for moving to Canada https://bit.ly/3erTlJb July 13, 2020 at 03:47PM

Show HN: Simple Google Login in Go https://bit.ly/2Dvhth8

Show HN: Simple Google Login in Go https://bit.ly/2OiNlaT July 13, 2020 at 11:35AM

Show HN: Soup.io Downloader https://bit.ly/3fqNiFU

Show HN: Soup.io Downloader https://bit.ly/32eqZQk July 13, 2020 at 10:11AM

Show HN: Primo – all-in-one IDE, CMS, component library, static site generator https://bit.ly/32aAlfO

Show HN: Primo – all-in-one IDE, CMS, component library, static site generator https://bit.ly/2OmurzW July 13, 2020 at 01:51PM

Show HN: A thread hierarchy management library in C https://bit.ly/32bmHcb

Show HN: A thread hierarchy management library in C https://bit.ly/3fvC1Va July 13, 2020 at 01:21PM

Sunday, 12 July 2020

Show HN: Computer Vision Boilerplate (CVB) https://bit.ly/38UeTx4

Show HN: Computer Vision Boilerplate (CVB) https://bit.ly/329mgiI July 10, 2020 at 06:05PM

SZA Narrates How She Caught Her Man Cheating on Her with Her Friend; Keke Palmer Comes in with the Plot Twist

SZA has recounted the heartbreaking experience of catching her man cheating on her with her homegirl. The singer, who trended on social media a few days ago after she crowned herself queen of R’n’B, shared the tale in a Twitter thread. SZA revealed that she had been invited to a party by this particular friend […]

The post SZA Narrates How She Caught Her Man Cheating on Her with Her Friend; Keke Palmer Comes in with the Plot Twist appeared first on Best9jamusic.



source https://www.best9jamusic.com.ng/entertainment/sza-narrates-how-she-caught-her-man-cheating-on-her-with-her-friend-keke-palmer-comes-in-with-the-plot-twist/

Everything That Went Down On Week Six Of BBNaija Reunion

It’s been another exciting and intense week of Big Brother Naija. The tension, the laughs, and the forgiveness were amped up this week because the show ran for one hour instead of the usual thirty minutes of previous weeks. It will continue to run for one hour when it returns next Monday. This […]

The post Everything That Went Down On Week Six Of BBNaija Reunion appeared first on Best9jamusic.



source https://www.best9jamusic.com.ng/entertainment/everything-that-went-down-on-week-six-of-bbnaija-reunion/

Show HN: CubeChat – Party in 3D https://bit.ly/3iZ1eZU

Show HN: CubeChat – Party in 3D https://bit.ly/327pYtf July 12, 2020 at 04:56PM

Show HN: Library to Automatically Create UI for your ML Models https://bit.ly/32aO1HF

Show HN: Library to Automatically Create UI for your ML Models https://bit.ly/3iV7KAV July 12, 2020 at 04:55PM

Show HN: Aperio Fuzzer – A mutational fuzzer for testing web APIs https://bit.ly/3embfx4

Show HN: Aperio Fuzzer – A mutational fuzzer for testing web APIs https://bit.ly/2BUAtFF July 12, 2020 at 10:29PM

Pregnant YouTube star, Nicole Thea, dies along with unborn son

A pregnant YouTube star has died along with her unborn son, her heartbroken family announced tonight. Nicole Thea’s mum confirmed the 24-year-old passed away along with her unborn child called Reign on Saturday morning. Mum-to-be Nicole, who had 110,000 followers on Instagram, was a British dancer and Instagram influencer. Her devastated mum confirmed the double […]

The post Pregnant YouTube star, Nicole Thea, dies along with unborn son appeared first on Best9jamusic.



source https://www.best9jamusic.com.ng/entertainment/pregnant-youtube-star-nicole-thea-dies-along-with-unborn-son/

Show HN: 4x Your ML Model https://bit.ly/3iXlPxO

Show HN: 4x Your ML Model https://bit.ly/2ZlNYXB July 12, 2020 at 11:05AM

Victor Osuagwu’s daughter shares her father’s epic reply after she told him “I love you”

Angel Osuagwu, the teenage daughter of veteran Nollywood actor Victor Osuagwu, has shared her dad’s hilarious reply after she expressed her love for him. A Twitter user had taken to the platform to start a new trend. He told Twitter users to tell their fathers “I love you” and share the reply they get afterwards. […]

The post Victor Osuagwu’s daughter shares her father’s epic reply after she told him “I love you” appeared first on Best9jamusic.



source https://www.best9jamusic.com.ng/entertainment/victor-osuagwus-daughter-shares-her-fathers-epic-reply-after-she-told-him-i-love-you/

Show HN: Sed to C translator written in sed https://bit.ly/302qJ3V

Show HN: Sed to C translator written in sed https://bit.ly/32aAdwX July 12, 2020 at 06:11PM

“I Married Regina Daniels And My Other Wives As Virgins” – Ned Nwoko

Billionaire politician Ned Nwoko has disclosed that he married all his wives, including popular actress Regina Daniels, as virgins. In a recent interview, the politician said Regina is a very decent girl and he got more attracted to her when he found out she was a virgin. You got married to a popular actress, Regina […]

The post “I Married Regina Daniels And My Other Wives As Virgins” – Ned Nwoko appeared first on Best9jamusic.



source https://www.best9jamusic.com.ng/entertainment/i-married-regina-daniels-and-my-other-wives-as-virgins-ned-nwoko/

Show HN: Notado – Content-First Bookmarking https://bit.ly/2DqZXKX

Show HN: Notado – Content-First Bookmarking https://bit.ly/2DDkRa5 July 12, 2020 at 04:46PM

Show HN: Spaceboard – Pinterest for Markdown Notes https://bit.ly/2DAB4N7

Show HN: Spaceboard – Pinterest for Markdown Notes https://bit.ly/2DDkPit July 12, 2020 at 04:07PM