How to configure Apache's mod_proxy_html to work as an AJAX proxy?

Posted by dcerecedo on Server Fault
Published on 2013-04-24T23:53:23Z Indexed on 2014/06/11 9:28 UTC

I'm trying to build a website that lets you view and manipulate data from any page on any other website. To do that, I have to get around same-origin ('Allow Origin') restrictions: I'm loading the other domain's content in an iframe, and I have to manipulate that content with JavaScript downloaded from my domain.

My first attempt was to write a simple proxy myself: the other domain's page is requested through a server-side proxy written in Java that not only serves the content but also rewrites the links (src and href attributes) in it, so that the content those links reference is also downloaded through my handmade proxy. The result is not bad, but it has problems with URLs in CSS and scripts.

It was then that I realized that mod_proxy_html is supposed to do exactly this job. The problem is that I cannot figure out how to make it work as expected.

Let's suppose my server runs at my-domain.com, and to proxy and transform content from another domain I'd make a request like this:

my-domain.com/proxy?url=http://another-domain.com/some/content

I'd want mod_proxy_html to serve the content and rewrite the URLs found in http://another-domain.com/some/content as follows:

  1. Absolute URLs not from another-domain.com: no rewriting
  2. Root-relative URLs: /other/content -> /proxy?url=http://another-domain.com/other/content
  3. Relative URLs: other/content -> /proxy?url=http://another-domain.com/some/content/other/content
  4. Parent-relative URLs: ../other/content -> /proxy?url=http://another-domain.com/some/other/content

The URL should be specified at runtime, not at configuration time.
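
To pin down the four rules above, here is a minimal sketch of the mapping I have in mind, written in Python for brevity. It is not mod_proxy_html configuration. The function name `rewrite_url` and the constant `PROXY_PREFIX` are made up for illustration. The sketch assumes the page URL is treated as a directory, which is what rules 3 and 4 imply; standard resolution without a trailing slash would drop the last path segment. It also assumes that absolute links pointing at the proxied host itself should go through the proxy too.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical proxy endpoint, taken from the example request above.
PROXY_PREFIX = "/proxy?url="


def rewrite_url(link: str, page_url: str) -> str:
    """Map a link found in the page at page_url to its proxied form."""
    # Rules 3 and 4 treat the page URL as a directory, so make sure the
    # base ends with a slash before resolving relative links against it.
    base = page_url if page_url.endswith("/") else page_url + "/"

    parts = urlsplit(link)
    is_absolute = bool(parts.scheme or parts.netloc)
    if is_absolute and parts.netloc != urlsplit(base).netloc:
        # Rule 1: absolute URL on a different host (or a non-HTTP scheme
        # such as mailto:), leave it untouched.
        return link

    # Rules 2-4: urljoin resolves root-relative, relative and "../" links.
    # Assumption: the target is appended unencoded, as in the examples;
    # a real proxy would probably percent-encode the query value.
    return PROXY_PREFIX + urljoin(base, link)


if __name__ == "__main__":
    page = "http://another-domain.com/some/content"
    for link in ("http://example.org/x.png", "/other/content",
                 "other/content", "../other/content"):
        print(link, "->", rewrite_url(link, page))
```

The page URL is passed as an argument on each call, which reflects the requirement that it is only known at request time.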

Can this be achieved with mod_proxy_html? Could anyone provide a simple working configuration to start with?

EDIT 1: First approach

The following site config works fine with sites that use absolute URLs everywhere, such as http://www.huffingtonpost.es/. You could try this config on localhost: http://localhost/asset/http://www.huffingtonpost.es/

<VirtualHost *:80>
    ServerName localhost

    LogLevel debug

    ProxyRequests off
    RewriteEngine On
    RewriteRule ^/asset/(.*) $1 [P]
    ProxyHTMLURLMap $1 /asset/


    <Location /asset/>
        ProxyPassReverse /
        ProxyHTMLURLMap / /asset/
    </Location>
</VirtualHost>

But as explained in the documentation, if I hit a site that uses relative URLs, I'd like to have these rewritten in the HTML by mod_proxy_html. So I should change the Location block as follows:

    <Location /asset/>
        ProxyPassReverse /

        # Depending on your system, use one line or the other
        # Ubuntu:
        #SetOutputFilter proxy-html
        # Any other system:
        ProxyHTMLEnable On

        ProxyHTMLURLMap / /asset/
    </Location>

...which doesn't seem to work. Comments, hints and ideas welcome!
