
How to Read Wikipedia (and related sites/wikis) Offline using WikiRead

Posted on April 11th, 2011 by jdc15 under Tutorials | 7 Comments »

Since the WikiRead software may be a little tricky to use for people with less experience, I’m writing a simple beginner’s tutorial here on how to download and view all of Wikipedia offline.  Note that this works not only with Wikipedia, but also with Wikisource, Wiktionary, Wikia, and any other MediaWiki-based wiki that publishes its database.

 

Step 1:  Download WikiRead!

This is probably the easiest step.  First download the software here:

Download WikiRead

Alternatively, you can download the version that includes Wikibooks as a sample.  After clicking the link, you should get a popup asking whether to open or save the file.  You will want to save it somewhere you can find it later, so click “Save File” (in Firefox) or the equivalent in your web browser.

 

Step 2:  Extract the program

First you will need an archiving tool such as 7-Zip or WinRAR to extract the files.  Personally I use 7-Zip, because it is a great open source file archiver.  After you have installed one of these programs, browse to where you saved the file and open it with the archiver.  Since I use 7-Zip, I’ll give directions for it here.  After opening the archive with 7-Zip, you should see a list of files.  Highlight all of them.

Then right-click on them and choose “Copy To…”.

Then either type a new folder path or click the three dots (“…”) and choose where to store the program.  If you wish, you can make a new folder.  The program needs to be extracted out of the archive in order to run.
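
If you’d rather script this step, here is a minimal Python sketch of the same extraction.  It assumes the download is a .zip archive and uses “wikiread.zip” as a placeholder name; a .7z archive would need 7-Zip itself (or a library such as py7zr) instead.

    import zipfile
    from pathlib import Path

    archive = Path("wikiread.zip")   # placeholder: use your download's real name
    dest = Path("WikiRead")          # any folder outside the archive will do
    dest.mkdir(exist_ok=True)

    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)          # same effect as 7-Zip's "Copy To..."
        print("Extracted", len(zf.namelist()), "files to", dest)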

 

Step 3:  Download Wikipedia

Now you’ll need to download a dump of Wikipedia (or another wiki site).  Since Wikipedia is huge, this step may take several hours, so don’t be afraid to leave your computer on overnight to complete the download.  You can get a dump of Wikipedia here.  You need to download and save the file called pages-articles.xml.bz2, linked directly here for quick access.  For dumps of other Wikimedia wikis, go here.  For Wikia dumps, follow the instructions on this page.
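
If you prefer to fetch the dump from a script (handy for overnight downloads), a small Python sketch along these lines should work.  The “latest” URL below points at the current English Wikipedia dump on dumps.wikimedia.org; adjust it for a dated snapshot or a different wiki.

    import urllib.request

    # The "latest" English Wikipedia dump; swap in a dated URL from the
    # dumps page for a specific snapshot or another wiki.
    url = ("https://dumps.wikimedia.org/enwiki/latest/"
           "enwiki-latest-pages-articles.xml.bz2")

    def progress(blocks, block_size, total_size):
        print(f"\r{blocks * block_size / 1e9:6.2f} GB downloaded", end="")

    # urlretrieve streams straight to disk, so the multi-GB file
    # never has to sit in memory.
    urllib.request.urlretrieve(url, "pages-articles.xml.bz2", reporthook=progress)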

After downloading a wiki database dump, you’ll need to extract the XML data file.  This again may take several minutes even on a very fast machine.  Note that the file won’t be named exactly pages-articles.xml; it will have the database description prepended to the filename.  In my case, the file is called “enwiki-20110317-pages-articles.xml”.
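
If your archiver struggles with a file this size, the same decompression can be done as a stream with Python’s standard bz2 module.  A rough sketch, using my dump’s filename as the example:

    import bz2
    import shutil

    # The filename carries the dump date; substitute whatever yours is called.
    src = "enwiki-20110317-pages-articles.xml.bz2"

    # Decompress in 1 MB chunks so the multi-GB XML never has to fit in RAM.
    with bz2.open(src, "rb") as fin, open(src[:-len(".bz2")], "wb") as fout:
        shutil.copyfileobj(fin, fout, length=1024 * 1024)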

 

Step 4:  Index the Database

Don’t worry, you’re almost finished.  Next, copy the extracted pages-articles.xml file into the WikiRead directory, so that it sits in the same folder as “wikiread” and “wikiindex”.  Then drag the pages-articles.xml file onto the executable file “wikiindex”.

The indexer will proceed to index and recompress the database.  Again, this will take several minutes; just be patient and let it finish.

After this, you should be left with a number of files whose names start with “archive”.  These contain the compressed and indexed data.
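
For the curious, the core idea behind indexing is simple: scan the XML once and record where each article begins, so any page can later be fetched with a single seek instead of re-reading the whole file.  This is not WikiRead’s actual on-disk format (which also recompresses the text into those “archive” files), just a toy Python sketch of the technique:

    import re

    def build_index(xml_path):
        """Map each <title> to the byte offset of its enclosing <page>."""
        index, page_start, offset = {}, 0, 0
        with open(xml_path, "rb") as f:
            for line in iter(f.readline, b""):
                if b"<page>" in line:
                    page_start = offset
                else:
                    m = re.search(rb"<title>(.*?)</title>", line)
                    if m:
                        index[m.group(1).decode("utf-8")] = page_start
                offset += len(line)
        return index

    def read_page(xml_path, index, title):
        """Jump straight to one article instead of re-parsing the dump."""
        lines = []
        with open(xml_path, "rb") as f:
            f.seek(index[title])
            for line in iter(f.readline, b""):
                lines.append(line)
                if b"</page>" in line:
                    break
        return b"".join(lines).decode("utf-8")

    idx = build_index("pages-articles.xml")
    print(read_page("pages-articles.xml", idx, "Anarchism"))  # any title in your dump

A real tool would also write the index to disk instead of rebuilding it on every run, which is essentially what those “archive” files buy you.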

 

Step 5:  Test the Database and Delete the Original XML File

Now it’s time to give WikiRead a spin.  Double-click on “wikiread”.  If everything went well, you should see a screen like this:

Now you can search for different topics.  Since the indexer recreates all of the needed parts of the wiki in compressed form, it is safe to delete the original “pages-articles.xml” and “pages-articles.xml.bz2” files to save space.  From here on you can enjoy Wikipedia without any need for an internet connection.  If you wish, right-click on “wikiread”, then click “Send to” and then “Desktop” to put a shortcut on your desktop.  If you have any questions or comments, feel free to leave a note below.

Thanks for reading my tutorial.


7 Responses to “How to Read Wikipedia (and related sites/wikis) Offline using WikiRead”

  1. KH says:

    Any possibility that you can modify this program to work with Wikia dumps?

  2. jdc15 says:

    Hi!

    Wikia dumps should already work (as instructed above), though I haven’t updated this software in a long time. Let me know if it works for you.

    Regards,

    JDC

  3. KH says:

    Hey!

    It works well with the WoW Wikia dump, but if I try something like the Marvel or DC Wikia dump, it doesn’t work as well.

    Take for instance the Marvel Wikia entry for Iron Man (Earth-616): it displays (Iron Man (Earth-616) -> ).  Using View → Wiki Source, I can see the entire text of the page.

    I’m assuming that the issue is the non-standard way that the wikis are storing data (else why would WoW work so well and Marvel not?).

    Any thoughts on what the issue is would be helpful – I’ll try to download the packages you give under source and mess around.

    • jdc15 says:

      Hi,

      The code works by converting the Wiki information into HTML and then using wxWidgets to display it. If I get the time I’ll look into it in more detail.

      Regards,
      JDC

  4. WikiUser says:

    Too good.  It worked the first time, just by following the steps.  Thanks.

  5. Sam says:

    Hi,

    Just downloading the program now, hope it works; it’s fallout.wikia that I’m after downloading.  That makes sense, since I must’ve downloaded half the site 10 times over by now.

    Just a suggestion though.  I haven’t run the prog yet, but you mention rendering the HTML in-program using wxWidgets.  Would it be possible to either embed the user’s web browser, or otherwise control it (the same way Internet Explorer is often corralled into displaying stuff by Windows)?  Only not using Internet Explorer! (crosses self)

    If that’s not possible, how about just leaving the HTML pages as a directory structure, correctly linked to each other, and producing an index.html, or starthere.html or whatever, as an initial link?

    To enable some of the stuff like searching, perhaps your program could work as a rudimentary web server? Set to only accept requests from localhost, ideally that should keep most security problems at bay (least, as far as I know!).

    I just think it’d be better to leave HTML rendering to the experts, since so much work’s gone into browsers over the years, more than any one man could compete with.

    Look forward to trying it out. Especially since my net connection’s gone a bit flaky.

    Sam.

  6. Sam says:

    Well, I’ve used it, and it’s very useful!  Maybe some rudimentary inclusion of images would be nice, though what with stylesheets and the like being ignored, it’s difficult to work out which images need showing where and when.

    Re my earlier idea: rather than employing WikiRead as a web server, which would be a bit of a big deal, maybe a plug-in for Mozilla?  Would that be possible?  Possibly alongside a load of properly organised HTML files.
