Here’s a thought:

What if we could extract (mine) all the important information from the books and use it to train an AI?

Trained correctly, this AI could become the smartest teacher on the planet.


Image credit: Toptal.

Why mine data inside books?

Books are among the purest forms of knowledge available. They are written with a great deal of thought and due diligence by people who are masters of their fields. Mining books is like unearthing precious metals: the books are the mine and the knowledge is the metal. However, those precious metals don’t hold as much value until they are crafted into beautiful designs and polished to give them the sheen that makes them so valuable. We intend to polish all the information and present it in a much more accessible format.

How do we mine the data?

A little history

Back in 2004, Google launched its own project, Google Books (originally called Google Print), to mine data from books with the same logic. However, only a limited number of eBooks were available on the market at that time. Google has been digitizing the content of printed books ever since, using its own OCR (Optical Character Recognition) technology, but it is infeasible to scan all of the roughly 140 million books ever published.

The Google Books initiative has been hailed for its potential to offer unprecedented access to what may become the largest online body of human knowledge and to promote the democratization of knowledge.

Things have changed now. Almost every new book has an eBook version. It’s much easier to mine data inside eBooks because the text is already digital, which eliminates the need to convert it using OCR. And thanks to the incredible research in the field of machine learning and AI, we have sophisticated algorithms that can mine the information inside eBooks.
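To show why eBooks remove the OCR step, here is a minimal Python sketch, not our actual pipeline. It assumes a DRM-free EPUB, which is essentially a zip archive of XHTML files, and pulls out the plain text using only the standard library. The function name `extract_epub_text` and the file name `book.epub` are hypothetical, chosen just for this illustration.

```python
import zipfile
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_epub_text(epub_file):
    """Return {member_name: plain_text} for each (X)HTML file in an EPUB.

    Assumption: the EPUB is DRM-free. Files are visited in name order,
    which is not necessarily the reading order declared by the book.
    """
    pages = {}
    with zipfile.ZipFile(epub_file) as archive:
        for name in sorted(archive.namelist()):
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                collector = _TextCollector()
                markup = archive.read(name).decode("utf-8", errors="replace")
                collector.feed(markup)
                collector.close()
                pages[name] = " ".join(collector.parts)
    return pages
```

Calling `extract_epub_text("book.epub")` would return a dictionary that maps each chapter file to its text, ready for whatever mining step comes next. A scanned book would need an OCR pass before reaching this point.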

How can we get started?

We need to create a digital library by obtaining the digital versions of books from their publishers. We will open this library to students and simultaneously mine the data inside those books.

Who will benefit from this data?

Virtually all stakeholders will benefit from this.

  • Students will get access to eBooks, and we will be able to assist them in learning with our AI.
  • We will share our mined data with the book publishers so that they can make informed decisions based on the kind of content their customers are reading. The publishers can relay this information to the authors to help prevent book failures.

What can be done with the smartest teacher?

Information scales. Our AI, being the smartest teacher, can therefore be made available to all students on the planet in an instant. It will be their personal guide, assisting them in learning the subject they choose. The AI will also keep improving itself by analyzing the study habits of the students.

Think of this AI like the OS in the movie ‘Her’. The OS, which named itself ‘Samantha’, mimicked human behaviour and gave human-like answers to the questions put forward by the main character, ‘Theodore’. Or think of it as your own personal ‘Jarvis’ from the movie ‘Iron Man’.

My startup, Learn Venue, has been working on this task for the past year and a half. There has been a lot of progress so far. We have been featured in, and have won, a lot of startup competitions, and we have been selected for the Web Summit conference, which is in Lisbon this year.

This journey hasn’t been easy. We had a working prototype back in February ’17, until the team split apart. The experience I gained while developing the product has taught me that the task is challenging but doable. I’m reaching out to everyone who thinks this project will bring a positive change. I’m asking you to come together and help us build it for future students and enthusiasts. You can be anyone, from technology freaks to business gurus to designers. A fully dedicated effort is required to make this project a reality.

I’m looking forward to hearing from you. You can reach out to me at: connect[at]saurabh[dot]io.


Amazon Kindle is already providing eBooks. What is so different about your product?

Kindle doesn’t provide all the scholarly eBooks. Also, we are adding intelligence to our library. This will not be an ordinary eBook library, but a more advanced one that assists you and recommends what to read.

“I’m a Machine Learning/NLP research scholar at MIT/Stanford. The end product you are trying to achieve is extremely challenging.”

Technological innovation is not possible without giving yourself a good challenge. One of the main motives for starting this project was to revolutionize how we study. Having been students ourselves, we believe we could have achieved much more, given the right study material and direction.

Why not mine the internet?

The internet, a sea of information, is not well suited to training an AI. There are a few problems with it:

  • Information accuracy: There’s so much data that it’s hard to determine whether a source is trustworthy. Google has been collecting data for over a decade, but it still isn’t good at delivering the right answers to scholarly questions (just providing links is not the answer).
  • Size and redundancy: The internet is huge and full of redundant data. Searching for the subject “Light” will return a huge amount of duplicate information.
  • Ranking: We would have to create complex page-ranking algorithms that guess whether an article on the internet is important or not, i.e. build another Google.

What motivates you to do this?

I believe that every child deserves quality teaching. Each one of them possesses talent that can help in devising solutions to trivial problems existing today. It all starts with educating them right. Education gives you the platform to think and implement. That implementation, mixed with critical thinking, leads to innovation.