How to remove a node based on contents of a subnode and then convert to CSV with headers intact

Posted by Morris Cox on Stack Overflow See other posts from Stack Overflow or by Morris Cox
Published on 2011-01-10T20:30:31Z Indexed on 2011/01/11 5:53 UTC
Read the original article Hit count: 336

Filed under:
|
|
|

I'm downloading a 10MB zipped XML file with wget, unzipping it to 40MB, trying to weed out test and expired entries (if the text of a certain node is "This is a test opportunity. Please DO NOT apply!" or if is in the past), and then convert to CSV.

However, I get this error:

PHP Notice: Array to string conversion in /home/morris/projects/grantsgov/xml2csv.php on line 46

I get Array errors because some entries in the XML file have more than one occurrence of a node (different text contents). The test entries are still present.

Contents of grantsgov.sh:

#!/bin/bash

wget --clobber "http://www.grants.gov/search/downloadXML.do;jsessionid=1n7GNpNF2tKZnRGLyQqf7Tl32hFJ1zndhfQpLrJJD11TTNzWMwDy!368676377?fname=GrantsDBExtract$(date +'%Y%m%d').zip" -O GrantsDBExtract$(date +"%Y%m%d").zip
    unzip GrantsDBExtract$(date +"%Y%m%d").zip
    php xml2csv.php
    zip GrantsDBExtracted$(date +"%Y%m%d").zip GrantsDBExtracted$(date +"%Y%m%d").csv

Contents of xml2csv.php:

<? 
$current_date = date("Ymd");

//$getfile=fopen("http://www.grants.gov/search/downloadXML.do;jsessionid=GqJNNmdLyJyMlsQqTzS2KdzgT5NMdhPp0QhG946JTmHzRltNTpMQ!368676377?fname=GrantsDBExtract$current_date.zip", "r");

$zip = zip_open("GrantsDBExtract$current_date.zip");
if(is_resource($zip)) {
  while ($zip_entry = zip_read($zip)) {
    $fp = fopen("./".zip_entry_name($zip_entry), "w");
    if (zip_entry_open($zip, $zip_entry, "r")) {
      $buf = zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));
      fwrite($fp,"$buf");
      zip_entry_close($zip_entry);
      fclose($fp);
    }
  }
  zip_close($zip);
}
// For future potential use
$xml = new DOMDocument('1.0', 'ascii');
$xpath = new DOMXpath($xml);
$xslFile = "feddata.xsl";

$filexml="GrantsDBExtract$current_date.xml";

if (file_exists($filexml)) {
    $xml = simplexml_load_file($filexml);
echo "Loaded $filexml\n";
$xslt = new XSLTProcessor();
$xsl = new DOMDocument();
//$XSL->load('feddata.xsl', LIBXML_NOCDATA);
$xsl->load($xslFile, LIBXML_NOCDATA);
$xslt->importStylesheet($xsl);
$xslt->transformToXML($xml);
$f = fopen("GrantsDBExtracted$current_date.csv", 'w');
      // create the CSV header row on the first time here
      $first = FALSE;
      $fields = array();
      foreach($record as $key => $value)
      {
        $fields[] = $key;
      }
      fwrite($f,implode(";",$fields)."\n");

foreach ($xml->FundingOppSynopsis as $fos) {
   fputcsv($f, get_object_vars($fos),',','"');
}
}
fclose($f);
} else {
 exit('Failed to open file.');
}
?>

Contents of feddata.xsl:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml"/>
<xsl:template match="node()|@*">
 <xsl:copy>
  <xsl:apply-templates select="node()|@*"/>
 </xsl:copy>
</xsl:template>

<xsl:template match="FundingOppSynopsis[AgencyMailingAddress = 'This is a test opportunity. Please DO NOT apply!']">
</xsl:template>
</xsl:stylesheet><xsl:strip-space elements="*"/>

Part of the XML file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Grants SYSTEM "http://www.grants.gov/search/dtd/XMLExtract.dtd">
  <Grants>
    <FundingOppSynopsis>
        <PostDate>08312007</PostDate>
        <UserID>None</UserID>
        <Password>None</Password>
        <FundingInstrumentType>CA</FundingInstrumentType>
        <FundingActivityCategory>DPR</FundingActivityCategory>
        <OtherCategoryExplanation>This is a test opportunity. Please DO NOT apply!</OtherCategoryExplanation>
        <NumberOfAwards>5</NumberOfAwards>
        <EstimatedFunding>4</EstimatedFunding>
        <AwardCeiling>2</AwardCeiling>
        <AwardFloor>1</AwardFloor>
        <AgencyMailingAddress>This is a test opportunity. Please DO NOT apply!</AgencyMailingAddress>
        <FundingOppTitle>This is a test opportunity. Please DO NOT apply!</FundingOppTitle>
        <FundingOppNumber>IVV-08312007-RG-OPP5</FundingOppNumber>
        <ApplicationsDueDate>09102007</ApplicationsDueDate>
        <ApplicationsDueDateExplanation>This is a test opportunity. Please DO NOT apply!</ApplicationsDueDateExplanation>
        <ArchiveDate>10102007</ArchiveDate>
        <Location>None</Location>
        <Office>None</Office>
        <Agency>None</Agency>
        <FundingOppDescription>This is a test opportunity. Please DO NOT apply!</FundingOppDescription>
        <CFDANumber>000000</CFDANumber>
        <EligibilityCategory>21</EligibilityCategory>
        <AdditionalEligibilityInfo>This is a test opportunity. Please DO NOT apply!</AdditionalEligibilityInfo>
        <CostSharing>N</CostSharing>
        <ObtainFundingOppText FundingOppURL="">Not Available</ObtainFundingOppText>
        <AgencyContact AgencyEmailDescriptor="This is a test opportunity. Please DO NOT apply!" AgencyEmailAddress="This is a test opportunity. Please DO NOT apply!">This is a test opportunity. Please DO NOT apply!</AgencyContact>
    </FundingOppSynopsis>

How can I fix the error and remove expired entries (based on ArchiveDate)? I suspect the second to last line in feddata.xsl needs to be fixed.

© Stack Overflow or respective owner

Related posts about php

Related posts about Xml