Extract paragraphs from Wikipedia API using PHP cURL

Posted by Kane on Stack Overflow
Published on 2010-05-21T06:25:26Z

Here's what I'm trying to do using the Wikipedia (MediaWiki) API - http://en.wikipedia.org/w/api.php

  1. Do a GET on http://en.wikipedia.org/w/api.php?format=xml&action=opensearch&search=[keyword] to retrieve a list of suggested pages for the keyword

  2. Loop through each suggested page using a GET on http://en.wikipedia.org/w/api.php?format=json&action=query&export&titles=[page title]

  3. Extract any paragraphs found on the page into an array

  4. Do something with the array
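A minimal sketch of the requests in steps 1 and 2, assuming PHP 7 or later with the cURL extension. The helper names (`opensearch_url`, `export_url`, `http_get`) and the User-Agent string are hypothetical placeholders; only the endpoint and query parameters come from the steps above.

```php
<?php
// Endpoint taken from the steps above.
const WIKI_API = 'http://en.wikipedia.org/w/api.php';

// Step 1: URL that asks for suggested pages for a keyword.
function opensearch_url(string $keyword): string
{
    return WIKI_API . '?format=xml&action=opensearch&search=' . rawurlencode($keyword);
}

// Step 2: URL that asks for the export of one page by title.
function export_url(string $title): string
{
    return WIKI_API . '?format=json&action=query&export&titles=' . rawurlencode($title);
}

// Plain GET via cURL that returns the response body as a string.
function http_get(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($ch, CURLOPT_USERAGENT, 'ExampleParagraphFetcher/0.1 (placeholder)');
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}
```

Calling `http_get(export_url($title))` for each suggested title would then give the raw response body that step 3 has to work on.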

I'm stuck on step 3. The JSON response contains "\n\n" between paragraphs, but for some reason PHP's explode() function doesn't split the text on it.

Essentially I just want to grab the "meat" of each Wikipedia page (not titles or any formatting, just the content) and break it by paragraph into an array.
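One possible cause (an assumption, since the code isn't shown): explode() is being run on the raw JSON text, where a line break is stored as the two characters backslash and n, rather than on the decoded string. Below is a minimal sketch that decodes first and then splits on blank lines. The `text` key, the sample body, and the `split_paragraphs` name are made up for illustration; the real export response nests the page source more deeply and would still contain wiki markup.

```php
<?php
// Hypothetical helper: split already-decoded text into paragraphs on blank lines.
function split_paragraphs(string $text): array
{
    // \R matches any line break (\n, \r\n or \r); two or more in a row mark a paragraph gap.
    $parts = preg_split('/\R{2,}/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_map('trim', $parts);
}

// Made-up response body. Single quotes keep \n as backslash + n, as in raw JSON.
$raw = '{"text":"First paragraph.\n\nSecond paragraph."}';

// Splitting the raw JSON on real newlines finds nothing to split on.
$unsplit = explode("\n\n", $raw);

// Decoding turns the escapes into real line breaks, so the split works.
$decoded = json_decode($raw, true);
$paragraphs = split_paragraphs($decoded['text']);

print_r($paragraphs);
```

Note that the quoting of the delimiter matters too: in PHP, "\n\n" in double quotes means two real newlines, while '\n\n' in single quotes means four literal characters.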

Any ideas? Thanks!

© Stack Overflow or respective owner
