‘Producing a good index “is a job of deep reading, of working to understand a text in order to make the most judicious selection of its key elements”. There are some things technology can’t do — for the time being, at least.’ The FT published that line in 2021, in a review of a book on the history of the index. Funny how, less than a year later, the statement would no longer hold.
The index came into being to map the information in books and help readers find it faster. Book indexes exist to locate mentions of certain topics, people, or entities. A library catalog follows the same logic, but it is built to navigate the thousands of books a library holds.
When Google debuted in 1998, it became the index of the internet. In the pre-internet days, if you went to a library and asked a librarian for books on the role of the UK in World War I, the librarian would look up the books in the catalog under the subject “The Great War” or “The European War” and give you the names and call numbers of maybe five books on the subject that probably made mention of the UK. Your next step would be to get the books and comb their indexes for clues. The system was arduous, time-consuming, and prone to error.
Google search was a true revelation: a search engine built on top of an index that parses all written content, then ranks the most relevant sources and pushes them to the top.
Search engines are designed to serve the most relevant content for a query in the shortest amount of time and redirect the searcher elsewhere on the web. It's the inverse of searching library catalogs and sifting through book indexes: because the entire content is indexed, you start with your keyword search instead of starting with sources. That is what made search engines a revelation — you begin with a blank canvas, an empty search bar, and reach millions of sources in seconds.
Analogous is about finding parallels between the present and the past in the worlds of technology and finance. As I wrote here, I wanted to create a space where I can explore deeper context in the financial events of today’s technology companies through an understanding of historical precedents. But in my pursuit of becoming a good storyteller for these events, I realized I had to commit to being more of a generalist. A generalist can identify signals in uncommon places and connect the dots where the connections might not be obvious. To achieve this with conventional technology — incessant googling — would be challenging.

If I want to research the consolidation of the streaming video market, I'll be asking several questions as starting points: has this happened before? In which markets and when? What were the drivers of that consolidation, and do they match what's happening in streaming video? Those questions naturally translate into a series of Google search queries and a pile of links. It's a lot of noise, a long filtration process to get to anything meaningful, and I'd be left to my own devices to draw the connections. I didn't like my prospects, so I thought I should build my own library and create an index for my catalog — one that carries a deep understanding of the sources and can draw relationships and connect the dots. In computer science, that's called a knowledge base.
I'm creating a growing library of public sources: for now, filings of publicly listed companies, but eventually news, books, and magazines. Those sources can help me draw on events from the past to tell better stories about the technology world of today, and perhaps that of tomorrow.
Long live the index.