Extract List Of Unique Words From a Book, Article (Supports .txt, .pdf format)



As an ESL learner, I often wonder: “How many words do I need to know to speak English well?” I read somewhere that you don’t need more than 2,000 words to carry on 99% of daily conversations. 2,000 words doesn’t seem like a lot: if I learn 10 words a day, I could be conversational in any language in just 200 days (less than a year).

So what if I want to enjoy novels and other books in English? How many words would I need to know?

I doubt the number would be a lot more than 2,000 words.

So I made a tool that reads all the text in a book (you can select more than one book, in .pdf or .txt format) and exports every unique word in it into a table.
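The post doesn’t describe how the tool works internally. As a rough illustration only, here is a minimal Python sketch of the same idea, assuming that a “word” is a run of ASCII letters with optional apostrophes and that case is ignored. The function names (`read_text`, `count_words`, `unique_words`) are hypothetical, and .pdf support assumes the third-party `pypdf` package is installed.

```python
import re
import sys
from collections import Counter
from pathlib import Path

# Assumed definition of a "word": a run of ASCII letters, optionally joined
# by apostrophes ("don't", "o'clock"). The real app may tokenize differently.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def read_text(path: str) -> str:
    """Return the plain text of a .txt or .pdf file."""
    p = Path(path)
    if p.suffix.lower() == ".pdf":
        # Third-party dependency (pip install pypdf), imported only for PDFs.
        from pypdf import PdfReader

        reader = PdfReader(str(p))
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    return p.read_text(encoding="utf-8", errors="ignore")


def count_words(text: str) -> Counter:
    """Count word occurrences in a string, ignoring case."""
    # Normalize curly apostrophes so "don’t" and "don't" count as one word.
    text = text.lower().replace("\u2019", "'")
    return Counter(WORD_RE.findall(text))


def unique_words(paths: list[str]) -> Counter:
    """Merge word counts across one or more books."""
    total: Counter = Counter()
    for path in paths:
        total.update(count_words(read_text(path)))
    return total


if __name__ == "__main__":
    counts = unique_words(sys.argv[1:])
    print(f"{len(counts)} different words")
    # Most frequent first, one "word<TAB>count" row per line.
    for word, n in counts.most_common():
        print(f"{word}\t{n}")
```

For example, `python unique_words.py huck_finn.txt` (a hypothetical file name) would print the number of different words, followed by each word and how often it appears. Counting raw letter runs is the simplest choice, but it treats “walk” and “walked” as two different words, so its totals will be higher than a count of dictionary headwords.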

Here are some results:

Adventures of Huckleberry Finn

[Screenshot: word table extracted from Adventures of Huckleberry Finn]

As you can see at the top, there are 6,461 different words in this book.

Wuthering Heights

[Screenshot: word table extracted from Wuthering Heights]

For Wuthering Heights, you would need to know 9,456 words.

That’s a lot of words.

The numbers aren’t 100% accurate. Some gibberish, such as website addresses and two-letter strings (like country codes), may get into the list. Still, as a rough estimate, about 95% of the entries are real words.
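Noise like this can be reduced with a few heuristics. The sketch below is hypothetical and is not how the app works: it drops tokens that look like web or e-mail addresses, tokens containing digits or other non-letter symbols, and one- or two-letter tokens that are not on a small, hand-picked list of common short English words (so country codes such as “uk” are dropped). The URL pattern and the `SHORT_WORDS` allowlist are assumptions you would tune for your own books.

```python
import re

# Hypothetical pattern for tokens that look like web or e-mail addresses.
URL_RE = re.compile(r"^(https?://|www\.)|\.(com|org|net|edu|gov)\b|@", re.IGNORECASE)

# Assumed allowlist of common one- and two-letter English words. Any other
# short token (e.g. a country code like "uk" or "fr") is treated as noise.
SHORT_WORDS = {
    "a", "i", "am", "an", "as", "at", "be", "by", "do", "go", "he", "if",
    "in", "is", "it", "me", "my", "no", "of", "oh", "on", "or", "so",
    "to", "up", "us", "we",
}

# Punctuation stripped from both ends of a token, including curly quotes.
EDGE_PUNCT = ".,;:!?\"'()[]{}\u201c\u201d\u2018\u2019"


def normalize(token: str) -> str:
    """Strip surrounding punctuation and lower-case a raw token."""
    return token.strip(EDGE_PUNCT).lower()


def is_probably_word(token: str) -> bool:
    """Heuristically decide whether a whitespace-separated token is a real word."""
    if URL_RE.search(token):
        return False
    core = normalize(token)
    # Allow inner apostrophes and hyphens ("don't", "well-known"), but reject
    # anything else that is not a letter, such as digits or stray symbols.
    letters = core.replace("'", "").replace("\u2019", "").replace("-", "")
    if not letters.isalpha():
        return False
    if len(core) <= 2:
        return core in SHORT_WORDS
    return True


def clean_tokens(text: str) -> list[str]:
    """Return the normalized tokens of `text` that pass the heuristic."""
    return [normalize(t) for t in text.split() if is_probably_word(t)]
```

For example, `clean_tokens("Visit www.example.com, said the UK guide")` keeps `visit`, `said`, `the` and `guide`, and drops the address and “UK”. Checking tokens against a full dictionary word list would be more reliable than a short allowlist, at the cost of shipping that list with the program.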

I find this a good way to quickly spot words I don’t know. I can click a word to see its definition (if available) on the right.

You can download the program here for free. If you have any suggestions, please let me know.

Download Word Extraction app here


8 thoughts on “Extract List Of Unique Words From a Book, Article (Supports .txt, .pdf format)”

  1. Hi,

    Thanks for taking the time to create this app. It works like a charm. It is what I have been looking for!

    I had come up with a similar idea, which I now see is not a new thing 🙂 What I wanted was an app that extracts the words of a book so I can prepare before reading it and make the reading flow more easily. It is frustrating when I am underground with no internet connection, come across a word I do not know, and can’t look it up.

    What I would like to have in the app is a way to export the words so I can delete the ones I know, keep the ones that are new to me, and apply the learning process that works for me. Is it easy to export the words from this app?

    Thank you for your efforts!

    1. Hi Rosco,

      Actually, I’m making an app like that. However, it’s a web app that lets you paste content and manage books. I’ll let you know when it’s ready.


  2. Hi! I study English and I was looking for a tool like this. It’s exactly what I was looking for, and it worked without any problems. Now I will use it to select the words I don’t know and look up their meanings in the dictionary. Could you update your software so that it finds the meanings of all the words at once, instead of looking them up one by one? That would be the best! Thank you, and I wish you all the best!

    1. It’s possible to make a tool like that. However, I’m quite busy right now. If you have an offer, I’m willing to listen.

  3. Hi,
    This is a very nice and easy-to-use app. Only one function is missing: exporting the list of words.
    Thanks in advance if you can help.

