We just launched Ethos, an open-source tool that visualizes what Hacker News is really thinking. It extracts entities, tracks sentiment, and groups discussions by concept, giving you a structured lens into the discourse happening across one of the internet’s most influential tech communities.

At devrupt.io, we build AI workflows that help real businesses work smarter, not harder. Ethos is one of those projects - it takes raw, unstructured data and turns it into something structured, searchable, and useful.

What Ethos Does

Ethos processes Hacker News stories and comments through an LLM workflow to produce several analysis views:

  • Dashboard - A high-level overview of current HN activity and trends.
  • Concepts - Groups discussions into thematic concepts so you can see what topics are clustering together.
  • Entities - Extracts and tracks named entities (people, companies, technologies) mentioned across threads.
  • Sentiment - Tracks the sentiment of discussions over time.
  • Discourse - Surfaces the structure of conversations and how arguments flow.
  • Search - Vector-powered search across the analyzed corpus.

You can explore it live at ethos.devrupt.io.

How the Workflow Works

The core of Ethos is a two-stage workflow: structured LLM analysis followed by vector embedding.

Stage 1: Structured Output with the LLM

Each HN story and comment is sent to llama-3.1-8b-instruct via OpenRouter with a strict JSON schema. The LLM is forced to return structured output with specific fields depending on whether it is analyzing a story or a comment.

For stories, the schema looks like this:

  • core_idea - The abstract concept beyond the specific story
  • concepts - 3-7 lowercase thematic tags
  • technologies - Specific tech mentioned (e.g. Rust, PostgreSQL)
  • entities - Companies and products mentioned (e.g. OpenAI, ChatGPT)
  • community_angle - Why HN cares about this story
  • sentiment - Enum from very negative to very positive
  • sentiment_score - Float from -1.0 to 1.0
  • controversy_potential - low, medium, or high
  • intellectual_depth - surface, moderate, or deep
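As a rough sketch, the story schema above can be mirrored in plain Python and a returned record checked against it. The field names come from the table; the exact enum values and the validator itself are illustrative assumptions, not the actual Ethos code.

```python
# Sketch of the story-analysis schema (field names from the table above;
# enum values and validation logic are assumptions for illustration).
STORY_ENUMS = {
    "sentiment": ["very negative", "negative", "neutral", "positive", "very positive"],
    "controversy_potential": ["low", "medium", "high"],
    "intellectual_depth": ["surface", "moderate", "deep"],
}
STORY_FIELDS = [
    "core_idea", "concepts", "technologies", "entities", "community_angle",
    "sentiment", "sentiment_score", "controversy_potential", "intellectual_depth",
]

def validate_story(record: dict) -> list:
    """Return a list of problems; an empty list means the record passes."""
    problems = ["missing field: %s" % f for f in STORY_FIELDS if f not in record]
    for field, allowed in STORY_ENUMS.items():
        if field in record and record[field] not in allowed:
            problems.append("%s not in %s" % (field, allowed))
    score = record.get("sentiment_score")
    if isinstance(score, (int, float)) and not -1.0 <= score <= 1.0:
        problems.append("sentiment_score out of [-1.0, 1.0]")
    if not 3 <= len(record.get("concepts", [])) <= 7:
        problems.append("expected 3-7 concepts")
    return problems

sample = {
    "core_idea": "local-first software",
    "concepts": ["sync", "offline", "crdts"],
    "technologies": ["Rust"],
    "entities": [],
    "community_angle": "HN loves tools that reduce cloud dependence",
    "sentiment": "positive",
    "sentiment_score": 0.6,
    "controversy_potential": "low",
    "intellectual_depth": "moderate",
}
assert validate_story(sample) == []
```

Validating on the way in like this is what makes the rest of the pipeline safe to build on: a record that fails the schema never reaches the embedding stage.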

For comments, the schema is similar but tuned for conversation:

  • argument_summary - Direct paraphrase of the comment
  • concepts - 2-5 abstract concepts
  • technologies - Specific tech mentioned
  • entities - Companies and products mentioned
  • comment_type - e.g. technical insight, personal experience, counterargument, humor
  • sentiment - Enum from very negative to very positive
  • sentiment_score - Float from -1.0 to 1.0

Comments are also analyzed with the parent comment’s summary as context so the LLM understands the flow of conversation.

Temperature is set to 0 for deterministic output, and there is retry logic with exponential backoff for rate limits and server errors.
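The retry behavior can be sketched as a small wrapper: exponential backoff with jitter on rate limits and server errors. The status codes, delays, and attempt count here are illustrative assumptions rather than the actual Ethos configuration.

```python
import random
import time

# Retryable statuses: rate limits (429) and transient server errors (5xx).
# These values, and the backoff schedule, are assumptions for illustration.
RETRYABLE = {429, 500, 502, 503}

def call_with_retries(call, max_attempts: int = 5, base_delay: float = 1.0):
    """Invoke call() until it succeeds or attempts are exhausted.

    call is expected to raise an exception carrying a .status attribute
    on failure (a stand-in for an HTTP client's error type).
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "status", None)
            if status not in RETRYABLE or attempt == max_attempts - 1:
                raise
            # Exponential backoff: base, 2x base, 4x base, ... plus jitter
            # so concurrent workers don't retry in lockstep.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

Jitter matters more than it looks: without it, every worker that hit the same rate limit retries at the same instant and hits it again.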

Stage 2: Embedding with ChromaDB

Once the structured output comes back from the LLM, the key fields are composed into a text string and passed to qwen3-embedding-8b (also via OpenRouter) to generate a vector embedding.

For stories, the embedding text is built from the core idea, concepts, community angle, and entities. For comments, it is built from the argument summary and concepts.
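Composing the embedding input is a matter of joining those fields into one string. A minimal sketch, assuming a newline separator and this field ordering (both are my assumptions, not confirmed details):

```python
# Build the text that gets sent to the embedding model. Field choice
# follows the description above; separator and ordering are assumptions.
def story_embedding_text(analysis: dict) -> str:
    parts = [
        analysis["core_idea"],
        " ".join(analysis["concepts"]),
        analysis["community_angle"],
        " ".join(analysis["entities"]),
    ]
    return "\n".join(p for p in parts if p)  # skip empty fields

def comment_embedding_text(analysis: dict) -> str:
    return "\n".join([analysis["argument_summary"], " ".join(analysis["concepts"])])

text = story_embedding_text({
    "core_idea": "local-first software",
    "concepts": ["sync", "offline"],
    "community_angle": "reduces cloud dependence",
    "entities": [],
})
```

Embedding this distilled text instead of the raw story or comment is the design choice doing the heavy lifting: the vector captures what the discussion is about rather than how it happens to be worded.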

These embeddings are then stored in ChromaDB alongside the structured metadata. ChromaDB is configured with cosine similarity and maintains two collections: one for stories and one for comments. This means you can do semantic search across the entire analyzed corpus, and the metadata lets you filter and aggregate by concept, sentiment, entity, or comment type.

The Full Cycle

The ingestion runs in two passes. First it processes stories (fetching top and new stories from the HN API), then it processes comments in breadth-first order. Each item goes through the LLM for structured analysis, gets embedded, and is stored in both PostgreSQL (for relational queries) and ChromaDB (for vector search). There is also a version system so that when we update the prompt, old records can be re-analyzed automatically.
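The breadth-first comment pass can be sketched with a simple queue. The HN API exposes each item's child ids via a `kids` field; the in-memory `items` lookup below stands in for fetching items from the API.

```python
from collections import deque

# Walk a story's comment tree breadth-first, so every parent is analyzed
# (and its summary available as context) before its replies.
def bfs_comment_ids(story: dict, items: dict) -> list:
    order = []
    queue = deque(story.get("kids", []))
    while queue:
        cid = queue.popleft()
        order.append(cid)
        queue.extend(items.get(cid, {}).get("kids", []))
    return order

# Toy thread: the story has comments 2 and 3; comment 2 has reply 4.
items = {2: {"kids": [4]}, 3: {}, 4: {}}
assert bfs_comment_ids({"kids": [2, 3]}, items) == [2, 3, 4]
```

Breadth-first order is what makes the parent-context trick from Stage 1 work: by the time a reply is analyzed, its parent's `argument_summary` already exists.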

The Budget Build

One of the goals with Ethos was to see how far we could push this workflow on a minimal budget. The answer: surprisingly far. The specific model matters less than having the right workflow in place.

The entire system was shipped for under $1 in infrastructure costs. OpenRouter gives you $1 in free credits when you sign up, and that was enough to build and ship the initial version.

We originally used qwen3-8b for the LLM and qwen3-embedding-8b for embeddings, but ran into capacity issues with qwen3-8b. We switched to llama-3.1-8b-instruct to stay within a similar budget while getting higher throughput. The trade-off of using a different model family for the LLM versus the embedding model is something we are actively evaluating. The current prompt was an MVP to get things shipped, and we are working on tuning it and assessing result quality as more data flows through the system.

The key constraint was that the LLM needed to support structured output, which narrowed the field of viable models at this price point. But that’s the point - once the workflow and structured schemas are solid, you can swap models in and out as better or cheaper options become available. The workflow is the product, not any single model.

Open Source

Ethos is fully open source. If you want to run it yourself, it takes a few minutes:

git clone https://github.com/devrupt-io/ethos.git
cd ethos

Add your OpenRouter API key to the .env file, then:

docker compose --profile dev up -d

That’s it. The whole stack comes up in Docker and you can start hacking.

The repository is at github.com/devrupt-io/ethos. We welcome PRs and feedback, especially on which metrics (sentiment vs. concepts) you find most useful.

Check out the Hacker News discussion for the community conversation around the launch.