Thoughts

Checkstack - A Tech Lookup Tool

You know, sometimes I take on things because, in pursuing them, I learn everything in that domain

Checkstack (checkstack.co) is one such pursuit

It started with a simple question - "What's the internet made of?"

I looked at a few existing solutions to this question. Decent options, but some had accuracy issues, they were expensive & some didn't have the scale needed for something like this

But it got my mind screaming with ideas on how it could be cheaper, faster & better

So, one random evening, an intern at PlayTheory Labs & I decided that it was time to build a Tech Lookup Database that indexes the internet

That was late 2023 - around November/December

After a bunch of iterations and deep work - we finally had Checkstack

A Tech Lookup Tool that indexes 100 million+ URLs across 25k+ technologies

But that's not the size of the internet. It's 1.5+ billion sites

And that's led me down so many rabbit holes to solve problems like -

  • How do we get these URLs?
  • How do we run a crawl this size without breaking our wallet?
  • How do we process/extract data this large?
  • What kind of Database should we use? Managed? Self-hosted?
  • How do we gather tech identifiers for the top technologies?
  • What does an Upload/Update of the DB look like when dealing with 100+ million URLs?
  • And so on..

I'll write more on how we got here & where we're going with this, along with what we've learned

Thanks for dropping by

H