Parsing HTML tables => PHP arrays

Does anyone know of code to parse an HTML table into a PHP array? There’s plenty of code that takes a PHP array and converts it to HTML, but Google didn’t turn up anything that goes the other way.
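For anyone landing here later: PHP’s bundled DOM extension can do most of the work. Here is a minimal sketch. It assumes the table markup is already in a string and that you only want the first table. `tableToArray()` is a hypothetical helper name, not a library function. Also note that `getElementsByTagName('tr')` picks up rows from nested tables too.

```php
<?php
// Minimal sketch: turn each <tr> of one <table> in an HTML string into an
// array of cell texts. tableToArray() is a hypothetical helper name.
function tableToArray(string $html, int $tableIndex = 0): array
{
    $doc = new DOMDocument();
    // loadHTML() accepts sloppy, non-XHTML markup; collect its warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $table = $doc->getElementsByTagName('table')->item($tableIndex);
    if ($table === null) {
        return []; // no such table in the document
    }

    $rows = [];
    foreach ($table->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $cell) {
            // Keep only <td> and <th> elements; skip whitespace text nodes.
            if ($cell instanceof DOMElement && in_array(strtolower($cell->nodeName), ['td', 'th'], true)) {
                $cells[] = trim($cell->textContent);
            }
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

$html = '<table><tr><th>Date</th><th>Price</th></tr>'
      . '<tr><td>2004-07-02</td><td> 9.99 </td></tr></table>';
print_r(tableToArray($html));
```

With the sample string, it returns one array per row: first the header row `['Date', 'Price']`, then `['2004-07-02', '9.99']`, with each cell trimmed.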

8 comments

#1 Eric TF Bat on 07.02.04 at 12:47 pm

It sounds like an interesting sort of five-finger exercise in PHP. I wouldn’t mind giving it a try; mail me directly in the next week or so and we’ll work out a spec, and I’ll see what I can do.

#2 Clayton Scott on 07.06.04 at 4:06 am

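#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
use PHP::Serialization qw(serialize);

# Read the HTML from STDIN (or from a file named on the command line)
my $html_string = do { local $/; <> };

# Only tables whose header row contains these columns are extracted
my $te = HTML::TableExtract->new( headers => [qw(Date Price Cost)] );
$te->parse($html_string);

# Collect the rows of every matching table, then serialize them all at once
my @tables;
foreach my $ts ($te->tables) {
    push @tables, [ $ts->rows ];
}
print serialize(\@tables);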
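On the PHP side, reading the serialized string back only takes `unserialize()`. This is a minimal sketch. It assumes the Perl script’s output has already been captured into a PHP string, whether through a file, a pipe, or some other route. The literal below is a hand-written stand-in for that output, not real output from the script: it holds one table with one row, "Date" and "Price".

```php
<?php
// Hand-written stand-in for the Perl script's output (an assumption, not real data):
// array of tables -> array of rows -> array of cell strings.
$encoded = 'a:1:{i:0;a:1:{i:0;a:2:{i:0;s:4:"Date";i:1;s:5:"Price";}}}';

// allowed_classes => false (PHP 7+) stops unserialize() from creating objects,
// which is sensible when the string comes from another process.
$tables = unserialize($encoded, ['allowed_classes' => false]);

foreach ($tables as $t => $rows) {
    foreach ($rows as $r => $cells) {
        echo "table $t, row $r: ", implode(' | ', $cells), "\n";
    }
}
```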

#3 Norbert Mocsnik on 07.15.04 at 11:07 pm

Recently I was looking for a print_r() output parser. Let me know if you have any ideas on where to look for one. Thanks. And yes, I’ve tried Google.

#4 Russell on 02.18.05 at 3:32 pm

Did anyone ever get this working? I am interested…

#5 Ross on 03.04.05 at 9:20 am

If it’s well-formed XHTML, you can use the PEAR library XML_Unserializer.

It will give you the XHTML as a PHP array.

Some characters or entities in your XHTML may need to be replaced first: &nbsp; is one of them, and it cost me an hour of head-scratching debugging!
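This sketch shows why `&nbsp;` causes trouble. A plain XML parser only knows the five predefined entities (`&lt;`, `&gt;`, `&amp;`, `&quot;`, `&apos;`), so without the XHTML DTD, `&nbsp;` is an undefined entity and parsing fails. SimpleXML stands in for XML_Unserializer here. The workaround of swapping the named entity for the numeric reference `&#160;` is the same either way.

```php
<?php
$xhtml = '<table><tr><td>Price:&nbsp;9.99</td></tr></table>';

libxml_use_internal_errors(true);           // collect parse errors instead of printing warnings
var_dump(simplexml_load_string($xhtml));    // bool(false): entity 'nbsp' is not defined
libxml_clear_errors();

// &#160; is the numeric character reference for the same non-breaking space.
$fixed = str_replace('&nbsp;', '&#160;', $xhtml);
$xml = simplexml_load_string($fixed);
echo (string) $xml->tr->td, "\n";           // "Price:" + U+00A0 + "9.99"
```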

#6 D L on 09.14.05 at 4:57 pm

I am trying to “scrape” content from URLs (XHTML content) and extract the contents of HTML tables into XML files.

Here is my configuration:

PHP 5
PEAR extensions installed:

Tidy
XML_Parser
XML_Serializer
XML_Tree

I fetch a URL and run it through Tidy to get cleaned-up XHTML as a string, then pass that string to XML_Unserializer. After the Tidy output is echoed to the screen, I get this error message:

Error: XML_Parser Invalid document end at XML input line 1.

The example files supplied with the XML_Parser installation are all in XML format, not XHTML.

So which extensions do I need to extract the contents of tables into a PHP array, stripping out redundant tags and so on?

Here is the URL I am using for testing:

http://www.geohive.com/global/world.php

From it, I am trying to extract two columns, country and population, into a simple XML format.
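A possible route for this that skips both Tidy and XML_Unserializer: `DOMDocument::loadHTML()` parses non-XHTML markup directly, and SimpleXML can build the output. This is a sketch under some assumptions. The page HTML must already be in a string, since fetching is left out. The wanted table must be the first one. Country and population must sit in columns 0 and 1; the real column positions on that page have to be checked by hand. `columnsToXml()` is a hypothetical helper name, and the sample row is placeholder data.

```php
<?php
// Extract two columns of the first table into <countries><country>... XML.
// Column indexes are assumptions; adjust them to the real page layout.
function columnsToXml(string $html, int $countryCol = 0, int $popCol = 1): string
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);                  // tolerant HTML parser, no XHTML needed
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $out = new SimpleXMLElement('<countries/>');
    $table = $doc->getElementsByTagName('table')->item(0);
    if ($table === null) {
        return $out->asXML();               // no table: empty <countries/>
    }
    foreach ($table->getElementsByTagName('tr') as $tr) {
        $cells = $tr->getElementsByTagName('td');   // <th> header rows yield no <td>s
        $country = $cells->item($countryCol);
        $pop = $cells->item($popCol);
        if ($country === null || $pop === null) {
            continue;                       // skip header rows and short rows
        }
        $entry = $out->addChild('country');
        // Property assignment escapes '&' and '<', unlike addChild()'s value argument.
        $entry->name = trim($country->textContent);
        $entry->population = trim($pop->textContent);
    }
    return $out->asXML();
}

$html = '<table><tr><th>Country</th><th>Population</th></tr>'
      . '<tr><td>Country A</td><td>1,000</td></tr></table>';
echo columnsToXml($html);
```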

#7 Rob Lewis on 09.21.06 at 10:36 am

The following script might help you achieve what you need:

http://reallyshiny.com/scripts/table-extractor.txt

#8 Sanjeev on 01.29.08 at 10:33 am

I am trying to do this with Perl’s HTML::Parser, and I have also tried the HTML::TableContent module.

But my program is not fetching the contents. Sometimes it does not return the content within the td elements.

If anybody has succeeded at this, please help me out by sending a snippet or a link where I can find the details.
