Parsing HTML tables => PHP arrays
July 1st, 2004 | PHP
Does anyone know of code to parse an HTML table into a PHP array? There’s plenty taking a PHP array and converting it to HTML, but Google didn’t turn up anything written to go the other way.
8 comments ↓
It sounds like an interesting sort of five-finger exercise in PHP. I wouldn’t mind giving it a try; mail me directly in the next week or so and we’ll work out a spec, and I’ll see what I can do.
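#!perl
use strict;
use warnings;
use HTML::TableExtract;
use PHP::Serialization qw(serialize);

# Read the HTML document to scan from standard input
my $html_string = do { local $/; <STDIN> };

# Match only tables whose header row has these columns
my $te = HTML::TableExtract->new( headers => [qw(Date Price Cost)] );
$te->parse($html_string);

# Examine all matching tables (tables() was table_states() in 1.x releases)
foreach my $ts ($te->tables) {
    # Serialize the rows (an array of arrays), not the table object itself
    print serialize([ $ts->rows ]), "\n";
}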
Recently I was looking for a parser for print_r() output. Let me know if you have any ideas on where to look for one. Thanks. And yes, I've tried Google.
Did anyone ever get this working? I am interested…
If it's well-formed XHTML, then you can use the XML_Unserializer class from the PEAR XML_Serializer package.
This will provide you with the XHTML in array format.
Some characters in your XHTML may need to be replaced first: one of them caused me a head-scratching hour of debugging!
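For comparison, here is a minimal sketch of the table-to-array step that swaps XML_Unserializer for PHP 5's bundled DOM extension, which also accepts HTML that is not well formed. The function name `html_table_to_array` and its `$tableIndex` parameter are made up for illustration, and the sample markup is a placeholder. It does not treat nested tables specially, so rows of an inner table would be returned along with the outer ones.

```php
<?php
// Hypothetical helper: returns each row of one table as an array of cell texts.
function html_table_to_array($html, $tableIndex = 0)
{
    $doc = new DOMDocument();
    // loadHTML() tolerates tag soup; collect its warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Pick the requested table; return an empty array if it does not exist.
    $table = $doc->getElementsByTagName('table')->item($tableIndex);
    if ($table === null) {
        return array();
    }

    $rows = array();
    foreach ($table->getElementsByTagName('tr') as $tr) {
        $cells = array();
        foreach ($tr->childNodes as $cell) {
            // Keep only the <td> and <th> element children of this row.
            if ($cell->nodeType === XML_ELEMENT_NODE
                && in_array($cell->nodeName, array('td', 'th'), true)) {
                $cells[] = trim($cell->textContent);
            }
        }
        if ($cells) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Placeholder markup, only to show the call.
$html = '<table><tr><th>Date</th><th>Price</th></tr>'
      . '<tr><td>2004-07-01</td><td>10</td></tr></table>';
print_r(html_table_to_array($html));
```

Each row comes back as a plain numeric array, so header rows are not separated from data rows; mapping header names to keys would be a further step.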
I am trying to “scrape” content from URLs (XHTML content) and extract contents of HTML tables into XML files.
Here is my configuration ..
PHP5
PEAR extensions installed
Tidy
XML_Parser
XML_Serializer
XML_Tree
So I target a URL using Tidy to get cleaned XHTML in a string ..
then apply this through XML_Unserializer
…
but I get this error message after the tidy script is echoed to screen.
Error: XML_Parser Invalid document end at XML input line 1.
Looking at the example files supplied with the XML_Parser installation, they are all in XML format, not XHTML.
So what extensions are needed to extract the contents of tables into a PHP array, stripping out redundant tags etc.?
Here is an example URL I am using for test purposes ..
http://www.geohive.com/global/world.php
and I am trying to extract two columns into a simple XML format ..
country and population.
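The last step described here, turning two columns into a simple XML file, could look roughly like the sketch below once the rows are available as PHP arrays. It uses PHP 5's bundled DOM extension; the function name `rows_to_xml` and the element names `countries` and `row` are assumptions chosen for illustration, and the sample rows are placeholders rather than real figures.

```php
<?php
// Hypothetical helper: $rows is a list of array(country, population) pairs.
function rows_to_xml(array $rows)
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;
    $root = $doc->appendChild($doc->createElement('countries'));

    foreach ($rows as $row) {
        $entry = $root->appendChild($doc->createElement('row'));

        // createTextNode() escapes &, < and > so the output stays well formed.
        $country = $entry->appendChild($doc->createElement('country'));
        $country->appendChild($doc->createTextNode($row[0]));

        // Drop every non-digit character (thousands separators, spaces);
        // this assumes the cell holds a plain whole number.
        $population = $entry->appendChild($doc->createElement('population'));
        $population->appendChild(
            $doc->createTextNode(preg_replace('/\D/', '', $row[1]))
        );
    }
    return $doc->saveXML();
}

// Placeholder rows, not real data.
echo rows_to_xml(array(
    array('Country A', '1,000'),
    array('Country B & C', '2,500'),
));
```

Building the document through DOM calls rather than string concatenation means special characters in country names cannot break the XML.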
The following script might help you achieve what you need:
http://reallyshiny.com/scripts/table-extractor.txt
I am trying to do this in Perl with HTML::Parser, and I have also tried the HTML::TableContent module.
But my program fails to fetch the contents.
Sometimes it does not return the content within the td elements.
If somebody has succeeded with this, please help me out by sending a snippet or a link where I can get the details.