Tuesday, 11 November 2025

Show HN: Project AELLA – Open LLMs for structuring 100M research papers https://bit.ly/43qBgHX

We're releasing Project AELLA - an open-science initiative to make scientific knowledge more accessible through AI-generated structured summaries of research papers.

Blog: https://bit.ly/3WMXjEW
Visualizer: https://bit.ly/3WQtL9y
Models: https://bit.ly/4p77d0d , https://bit.ly/3LDNoPy

Highlights:
- Released 100K research paper summaries in standardized JSON format with interactive visualization.
- Fine-tuned open models (Qwen 3 14B & Nemotron 12B) that match GPT-5/Claude 4.5 performance at 98% lower cost (~$100K vs $5M to process 100M papers).
- Built on distributed "idle compute" infrastructure - think SETI@Home for LLM workloads.

Goal: Process ~100M papers total, then link to OpenAlex metadata and convert to copyright-respecting "Knowledge Units".

The models are open, the evaluation framework is transparent, and we're making the summaries publicly available. This builds on Project Alexandria's legal/technical foundation for extracting factual knowledge while respecting copyright.

The technical deep-dive in the post covers our training pipeline, dual evaluation methods (LLM-as-judge + QA dataset), and an economic comparison showing a 50x cost reduction vs closed models.

Happy to answer questions about the training approach, evaluation methodology, or infrastructure!

https://bit.ly/43qWIwj November 11, 2025 at 07:38PM
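The "98% lower cost" and "50x cost reduction" figures describe the same gap, which can be checked from the totals quoted in the post. The sketch below is only arithmetic on those quoted numbers (~$100K, $5M, 100M papers); the function and variable names are made up for illustration, and the per-paper costs are derived here, not stated by the authors.

```python
# Totals as quoted in the post; everything else is derived from them.
PAPERS = 100_000_000               # "~100M papers"
OPEN_MODEL_COST_USD = 100_000      # "~$100K" for the fine-tuned open models
CLOSED_MODEL_COST_USD = 5_000_000  # "$5M" for closed models


def cost_comparison(papers, open_cost, closed_cost):
    """Return per-paper costs, the cost ratio, and the percentage saved."""
    return {
        "open_per_paper_usd": open_cost / papers,
        "closed_per_paper_usd": closed_cost / papers,
        "cost_ratio": closed_cost / open_cost,
        "savings_pct": 100 * (1 - open_cost / closed_cost),
    }


result = cost_comparison(PAPERS, OPEN_MODEL_COST_USD, CLOSED_MODEL_COST_USD)
for name, value in result.items():
    print(f"{name}: {value:g}")
```

Under these totals the open models come out at roughly a tenth of a cent per paper against about five cents for closed models, which is where both the 50x ratio and the 98% saving come from.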
