Checkstack - A Tech Lookup Tool
You know, sometimes I take on things because in pursuing them, I learn everything about that domain
Checkstack (checkstack.co) is one such pursuit
It started with a simple question - "What's the internet made of?"
I looked at a few existing solutions to this question. They were decent options, but some had accuracy issues, some were expensive & some didn't have the scale needed for something like this
But they got my mind screaming with ideas on how it could be cheaper, faster & better
So, one random evening, an intern at PlayTheory Labs & I decided it was time to build a Tech Lookup Database that indexes the internet
That was late 2023 - around November/December
After a bunch of iterations and deep work, we finally had Checkstack
A Tech Lookup Tool that indexes 100 million+ URLs across 25k+ technologies
But that's not the size of the internet. It's 1.5+ billion sites
And that's led me down so many rabbit holes to solve problems like -
- How do we get these URLs?
- How do we run a crawl of this size without breaking our wallet?
- How do we process/extract data this large?
- What kind of Database should we use? Managed? Self-hosted?
- How do we gather tech identifiers for the top technologies?
- What does an upload/update of the DB look like when dealing with 100+ million URLs?
- And so on...
I'll write more on how we got here & where we're going with this, along with our learnings
Thanks for dropping by
H