Extract paragraphs from Wikipedia API using PHP cURL

Posted by Kane on Stack Overflow
Published on 2010-05-21T06:25:26Z

Filed under: wikipedia | mediawiki | php | curl | parser

Here's what I'm trying to do using the Wikipedia (MediaWiki) API - http://en.wikipedia.org/w/api.php

1. Do a GET on http://en.wikipedia.org/w/api.php?format=xml&action=opensearch&search=[keyword] to retrieve a list of suggested pages for the keyword
2. Loop through each suggested page using a GET on http://en.wikipedia.org/w/api.php?format=json&action=query&export&titles=[page title]
3. Extract any paragraphs found on the page into an array
4. Do something with the array
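
For reference, the first two steps might be sketched roughly like this. It is only a sketch under some assumptions: the helper names (buildApiUrl, httpGet, suggestTitles, pageUrl) and the User-Agent string are made up for illustration, and the opensearch call asks for format=json rather than the format=xml shown above so that the result can go straight through json_decode().

```php
<?php
// Base endpoint taken from the question.
const WIKI_API = 'http://en.wikipedia.org/w/api.php';

// Hypothetical helper: build an API URL from an array of query parameters.
function buildApiUrl(array $params): string
{
    return WIKI_API . '?' . http_build_query($params, '', '&');
}

// Hypothetical helper: plain GET with cURL, returning the response body.
function httpGet(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects, if any
        CURLOPT_USERAGENT      => 'ExampleParagraphFetcher/0.1 (placeholder contact)', // assumed value
        CURLOPT_TIMEOUT        => 10,
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}

// Step 1: suggested page titles for a keyword.
// With format=json, opensearch returns an array whose second element is the title list.
function suggestTitles(string $keyword): array
{
    $json = httpGet(buildApiUrl([
        'format' => 'json',
        'action' => 'opensearch',
        'search' => $keyword,
    ]));
    $data = json_decode($json, true);
    return (isset($data[1]) && is_array($data[1])) ? $data[1] : [];
}

// Step 2: URL for one page, mirroring the query/export call from the question.
function pageUrl(string $title): string
{
    return buildApiUrl([
        'format' => 'json',
        'action' => 'query',
        'export' => '',      // flag parameter: present with an empty value
        'titles' => $title,
    ]);
}
```

A loop over suggestTitles('some keyword') that calls httpGet(pageUrl($title)) for each title would then cover step 2.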

I'm stuck on #3. I can see a bunch of JSON data that includes "\n\n" between paragraphs, but for some reason the PHP explode() function doesn't work.

Essentially I just want to grab the "meat" of each Wikipedia page (not titles or any formatting, just the content) and break it by paragraph into an array.
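
One guess at why explode() finds nothing: in the raw JSON text the "\n\n" is still a backslash followed by the letter n, not a real line break, so explode("\n\n", ...) with a double-quoted delimiter has nothing to match until the response has been through json_decode(). A minimal sketch of that idea follows; splitParagraphs() is a made-up helper, and the "text" key is a stand-in for wherever the page content actually sits in the decoded response.

```php
<?php
// Hypothetical helper: split decoded text into paragraphs on blank lines.
function splitParagraphs(string $text): array
{
    // Normalise line endings so "\r\n" and "\r" behave like "\n".
    $text = str_replace(["\r\n", "\r"], "\n", $text);

    // Split on one or more blank lines, dropping empty pieces.
    $parts = preg_split('/\n\s*\n/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Trim each paragraph and discard anything that ends up empty.
    $parts = array_map('trim', $parts);
    $parts = array_filter($parts, function ($p) {
        return $p !== '';
    });
    return array_values($parts);
}

// Stand-in payload (assumption): single quotes keep \n as two literal characters,
// which is how it appears in a raw JSON response body.
$raw = '{"text":"First paragraph.\n\nSecond paragraph.\n\n\nThird."}';

// On the raw string there are no real newlines, so nothing is split.
$naive = explode("\n\n", $raw);

// After decoding, the escapes become real newlines and the split works.
$decoded    = json_decode($raw, true);
$paragraphs = splitParagraphs($decoded['text']);
```

Note that the export output is wiki markup wrapped in XML, so the pieces would still contain templates and links that need stripping before they read as plain paragraphs.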

Any ideas? Thanks!

© Stack Overflow or respective owner
