Is there a bug with Apache 2.2 and content filters (and maybe mod_proxy)?

Posted by asciiphil on Server Fault
Published on 2013-11-01T15:50:08Z Indexed on 2013/11/01 15:57 UTC

I'm running Apache 2.2.15-29 on RHEL 6 (actually Scientific Linux 6.4) and I'm trying to set up a reverse proxy with content rewriting so all of the links on the proxied web pages are rewritten to reference the proxy host. I'm running into a problem with some of the content rewriting and I'd like to know if this is a bug or if I'm doing something wrong (and how to do it right, if applicable).

I'm proxying a subdirectory on an internal host (internal.example.com/foo) onto the root of an external host (external.example.com). I need to rewrite HTML, CSS, and JavaScript content to fix all of the URLs. I'm also hosting some content locally on the external host; I don't think that's a problem, but I mention it here for completeness.

My httpd.conf looks roughly like this:

<VirtualHost *:80>
    ServerName external.example.com
    ServerAlias example.com

    # Serve all local content directly, reverse-proxy all unknown URIs.
    RewriteEngine On
    RewriteRule ^(/(index\.html?)?)?$ http://internal.example.com/foo/ [P]
    RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} -f [OR]
    RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} -d
    RewriteRule ^.*$ - [L]
    RewriteRule ^/~ - [L]
    RewriteRule ^(.*)$ http://internal.example.com$1 [P]

    # Standard header rewriting.
    ProxyPassReverse / http://internal.example.com/foo/
    ProxyPassReverseCookieDomain  internal.example.com external.example.com
    ProxyPassReverseCookiePath /foo/ /

    # Strip any Accept-Encoding: headers from the client so we can process the pages
    # as plain text.
    RequestHeader unset Accept-Encoding

    # Use mod_proxy_html to fix URLs in text/html content.
    ProxyHTMLEnable On
    ProxyHTMLURLMap http://internal.example.com/foo/ /
    ProxyHTMLURLMap http://internal.example.com/foo /
    ProxyHTMLURLMap /foo/ /

    ## Use mod_substitute to fix URLs in CSS and Javascript
    #<Location />
    #    AddOutputFilterByType SUBSTITUTE text/css
    #    AddOutputFilterByType SUBSTITUTE text/javascript
    #    Substitute "s|http://internal.example.com/foo/|/|nq"
    #</Location>

    # Use mod_ext_filter to fix URLs in CSS and Javascript
    ExtFilterDefine fixurlcss mode=output intype=text/css cmd="/bin/sed -rf /etc/httpd/fixurls"
    ExtFilterDefine fixurljs mode=output intype=text/javascript cmd="/bin/sed -rf /etc/httpd/fixurls"
    <Location />
        SetOutputFilter fixurlcss;fixurljs
    </Location>
</VirtualHost>
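
To make the intended rewriting concrete, here is a minimal Python sketch of the mapping the CSS and JavaScript filters are supposed to apply. It is an illustration only: the contents of /etc/httpd/fixurls are not shown above, so the rule list is an assumption copied from the ProxyHTMLURLMap lines, and `URL_MAP` and `fix_urls` are hypothetical names. It does a plain global string replacement, which is closer to the mod_substitute rule than to what mod_proxy_html does.

```python
# Hypothetical illustration of the URL mapping; the rule list is assumed
# from the ProxyHTMLURLMap directives, not taken from /etc/httpd/fixurls.
URL_MAP = [
    ("http://internal.example.com/foo/", "/"),
    ("http://internal.example.com/foo", "/"),
    ("/foo/", "/"),
]


def fix_urls(text: str) -> str:
    """Apply each mapping in order as a plain global string replacement."""
    for old, new in URL_MAP:
        text = text.replace(old, new)
    return text


if __name__ == "__main__":
    print(fix_urls('body { background: url("http://internal.example.com/foo/img/a.png"); }'))
    print(fix_urls('@import "/foo/css/site.css";'))
```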

The text/html rewriting works just fine. For CSS and JavaScript, however, when I use either mod_substitute or mod_ext_filter, the external server sends the response as Transfer-Encoding: chunked, sends all of the data, and then closes the connection without sending the final, zero-length chunk. Some HTTP clients are unhappy with this. (Chrome won't process any content sent in this way, for example, so the pages don't get CSS applied to them.)
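
To show what "missing the final, zero-length chunk" means on the wire, here is a small Python sketch that walks a raw chunked body and reports whether it ends with the terminating `0` chunk. It assumes you have already captured the bytes that follow the response headers (for example from a packet capture); the function name `chunked_body_is_terminated` is hypothetical, and trailers after the last chunk are ignored.

```python
def chunked_body_is_terminated(raw: bytes) -> bool:
    """Return True if a chunked body contains the final zero-length chunk.

    `raw` is assumed to be the bytes that follow the response headers.
    """
    pos = 0
    while True:
        line_end = raw.find(b"\r\n", pos)
        if line_end == -1:
            return False  # data ended before another chunk-size line
        # The chunk size is hexadecimal; drop any ";extension" suffix.
        size_field = raw[pos:line_end].split(b";", 1)[0].strip()
        try:
            size = int(size_field.decode("ascii"), 16)
        except ValueError:
            return False  # not a valid chunk-size line
        if size < 0:
            return False
        if size == 0:
            return True  # found the last chunk
        pos = line_end + 2 + size
        # Each chunk's data must be followed by CRLF.
        if raw[pos:pos + 2] != b"\r\n":
            return False
        pos += 2


if __name__ == "__main__":
    complete = b"5\r\nhello\r\n0\r\n\r\n"
    truncated = b"5\r\nhello\r\n"
    print(chunked_body_is_terminated(complete))
    print(chunked_body_is_terminated(truncated))
```

The second sample mirrors the behaviour described above: all of the data arrives, but the stream stops before the `0` chunk.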

Here's a sample wget session:

$ wget -O /dev/null -S http://external.example.com/include/jquery.js
--2013-11-01 11:36:36--  http://external.example.com/include/jquery.js
Resolving external.example.com (external.example.com)... 192.168.0.1
Connecting to external.example.com (external.example.com)|192.168.0.1|:80... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Date: Fri, 01 Nov 2013 15:36:36 GMT
  Server: Apache
  Last-Modified: Tue, 29 Oct 2013 13:09:10 GMT
  ETag: "1d60026-187b8-4e9e0ec273e35"
  Accept-Ranges: bytes
  Vary: Accept-Encoding
  X-UA-Compatible: IE=edge,chrome=1
  Content-Type: text/javascript;charset=utf-8
  Connection: close
  Transfer-Encoding: chunked
Length: unspecified [text/javascript]
Saving to: `/dev/null'

    [ <=>                                                         ] 100,280     --.-K/s   in 0.005s  

2013-11-01 11:36:37 (19.8 MB/s) - Read error at byte 100280 (Success).Retrying.

--2013-11-01 11:36:38--  (try: 2)  http://external.example.com/include/jquery.js
Connecting to external.example.com (external.example.com)|192.168.0.1|:80... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 416 Requested Range Not Satisfiable
  Date: Fri, 01 Nov 2013 15:36:38 GMT
  Server: Apache
  Vary: Accept-Encoding
  Content-Type: text/html;charset=utf-8
  Content-Length: 260
  Connection: close

    The file is already fully retrieved; nothing to do.

Am I doing something wrong? Am I hitting some sort of Apache bug? What do I need to do to get it working? (Note that I'd prefer solutions that work within RHEL-6-packaged RPMs and upgrading to Apache 2.4 would be a last resort, as we have a lot of infrastructure built around 2.2 on this system at the moment.)
