I found a method where you categorize your data into about 10 dimensions using a classifier, then add metadata with spaCy. After that, you can store it in SQL and use an LLM to retrieve the data and generate responses to queries. Surprisingly, it works pretty well.
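For readers who want to picture the pipeline, here is a minimal sketch, not the author's actual implementation. It assumes a keyword lookup in place of the trained classifier, a regex in place of spaCy's entity extraction, and a fixed parameterized query in place of the LLM-written SQL. All names (`CATEGORY_KEYWORDS`, `classify`, `extract_metadata`, `retrieve`, the `docs` table) are hypothetical.

```python
import re
import sqlite3

# Hypothetical categories; the post mentions about 10, two are shown here.
CATEGORY_KEYWORDS = {
    "billing": {"invoice", "payment", "refund"},
    "support": {"error", "crash", "bug"},
}

def classify(text):
    """Stand-in classifier: pick the category with the most keyword hits."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    best = max(CATEGORY_KEYWORDS, key=lambda c: len(words & CATEGORY_KEYWORDS[c]))
    return best if words & CATEGORY_KEYWORDS[best] else "other"

def extract_metadata(text):
    """Stand-in for spaCy NER: treat capitalized tokens as rough entities."""
    return ",".join(sorted(set(re.findall(r"\b[A-Z][a-z]+\b", text))))

def build_store(docs):
    """Store each document with its category and metadata in SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs (id INTEGER PRIMARY KEY, category TEXT, entities TEXT, body TEXT)"
    )
    for body in docs:
        conn.execute(
            "INSERT INTO docs (category, entities, body) VALUES (?, ?, ?)",
            (classify(body), extract_metadata(body), body),
        )
    conn.commit()
    return conn

def retrieve(conn, question, limit=3):
    """Fetch rows in the question's category. In the described approach an LLM
    would write this query; here it is a fixed parameterized SELECT."""
    rows = conn.execute(
        "SELECT body FROM docs WHERE category = ? ORDER BY id LIMIT ?",
        (classify(question), limit),
    ).fetchall()
    return [r[0] for r in rows]

def build_prompt(question, passages):
    """Assemble the text that would be sent to the LLM for the final answer."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {question}"

DOCS = [
    "Acme sent the invoice twice and wants a refund.",
    "The app shows an error after the latest update.",
    "Lunch is at noon on Friday.",
]
conn = build_store(DOCS)
question = "Why was my payment refund delayed?"
hits = retrieve(conn, question)
prompt = build_prompt(question, hits)
```

The point of the sketch is only the shape: categorize, tag, store in SQL, filter with a query, then hand the rows to the LLM.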
So you’re just using an LLM to query an SQL database? Doesn’t seem that new. I’m not sure it will work as well as a vector database or a hybrid approach.
You’re shifting the weight from using embedding models to creating an SQL query with an LLM. Not sure it’s much more efficient.
But hey, if it works for your use case and you learned something new, good for you.
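To make "creating an SQL query with an LLM" concrete, here is a rough sketch of that step. `fake_llm_to_sql` is a hypothetical stand-in for whatever model call generates the query, and the `emails` table is made up. The SELECT-only check is a naive guard for illustration, not a real security boundary.

```python
import sqlite3

def fake_llm_to_sql(question):
    """Hypothetical stand-in for an LLM or Text2SQL model call."""
    return "SELECT subject FROM emails WHERE sender = 'alice' ORDER BY id;"

def run_read_only(conn, sql):
    """Run generated SQL only if it looks like a single SELECT statement."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped or not stripped.lower().startswith("select"):
        raise ValueError("only single SELECT statements are allowed")
    return conn.execute(stripped).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emails (id INTEGER PRIMARY KEY, sender TEXT, subject TEXT)")
conn.executemany(
    "INSERT INTO emails (sender, subject) VALUES (?, ?)",
    [("alice", "hello"), ("bob", "invoice"), ("alice", "follow-up")],
)
rows = run_read_only(conn, fake_llm_to_sql("What did Alice send me?"))
```

The cost moves from computing embeddings to one model call per question plus a cheap indexed lookup, which is the trade-off being questioned here.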
@Micah
I think using thousands of dimensions to vectorize data is inefficient. The method I shared in the article gives similar results but is more efficient. I still need to do more research on its limitations, but I haven’t seen anyone implement RAG with just SQL, a categorizer, and an LLM. If you know of any, let me know!
@Stevie
There are already Text2SQL models for this.
You’re right that if your setup works and uses less compute, that’s great.
But it’s important to remember that it all depends on the use case. A Text2SQL model might be fine for emails, but other cases might need vector or hybrid approaches.
If it works for you, keep using it, but it may not be better for everyone.
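For comparison, a hybrid setup can be sketched as an SQL filter followed by vector ranking. This is a toy illustration under loud assumptions: `embed` is a bag-of-words counter standing in for a real embedding model, and the `emails` table and `hybrid_search` helper are hypothetical.

```python
import math
import re
import sqlite3
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def hybrid_search(conn, category, query, k=2):
    """Narrow candidates with SQL metadata, then rank them by similarity."""
    rows = conn.execute(
        "SELECT body FROM emails WHERE category = ?", (category,)
    ).fetchall()
    q = embed(query)
    ranked = sorted(
        (r[0] for r in rows), key=lambda body: cosine(q, embed(body)), reverse=True
    )
    return ranked[:k]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emails (category TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO emails VALUES (?, ?)",
    [
        ("billing", "refund for duplicate invoice"),
        ("billing", "update the card on file"),
        ("support", "refund button crashes the app"),
    ],
)
top = hybrid_search(conn, "billing", "where is my refund", k=1)
```

The SQL filter keeps the candidate set small, and the similarity step handles wording that an exact category match would miss.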
@Micah
You’re right, there are a lot of details in this approach. RAG is often seen as the go-to solution, but I have a different way of looking at it.
@Stevie
That’s a good approach. If something simpler works, go with it!
@Stevie
You won’t find SQL to be slow, but you’ll find faster and more efficient alternatives in hybrid approaches like GraphRAG + embeddings, which are among the most advanced options available right now.
@Luca
I didn’t quite get what you mean. Could you explain a bit more?
@Stevie
Hmm…
Check out the Neo4j blog on how they implement embeddings on a knowledge base.