Long connection times from PHP to MySQL on EC2
- by Erik Giberti
I'm having an intermittent issue connecting to a database slave with InnoDB. Intermittently I get connections taking longer than 2 seconds. These servers are hosted on Amazon's EC2.
The app server is PHP 5.2/Apache running on Ubuntu. The DB slave is running Percona's XtraDB 5.1 on Ubuntu 9.10. It's using an EBS Raid array for the data storage.
We already use skip name resolve and bind to address 0.0.0.0.
This is a stub of the PHP code that's failing
        $tmp = mysqli_init();
        $start_time = microtime(true);
        $tmp-options(MYSQLI_OPT_CONNECT_TIMEOUT, 2);
        $tmp-real_connect($DB_SERVERS[$server]['server'], 
                   $DB_SERVERS[$server]['username'], 
                   $DB_SERVERS[$server]['password'], 
                   $DB_SERVERS[$server]['schema'], 
                   $DB_SERVERS[$server]['port']);
        if(mysqli_connect_errno()){
            $timer = microtime(true) - $start_time;
            mail($errors_to,'DB connection error',$timer);
        }
There's more than 300Mb available on the DB server for new connections and the server is nowhere near the max allowed (60 of 1,200). Loading on both servers is < 2 on 4 core m1.xlarge instances.
Some highlights from the mysql config
max_connections = 1200
thread_stack = 512K
thread_cache_size = 1024
thread_concurrency = 16
innodb-file-per-table
innodb_additional_mem_pool_size = 16M
innodb_buffer_pool_size = 13G
Any help on tracing the source of the slowdown is appreciated.
[EDIT] I have been updating the sysctl values for the network but they don't seem to be fixing the problem. I made the following adjustments on both the database and application servers.
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_sack = 0
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_fin_timeout = 20
net.ipv4.tcp_keepalive_time = 180
net.ipv4.tcp_max_syn_backlog = 1280
net.ipv4.tcp_synack_retries = 1
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 87380 16777216
[EDIT] Per jaimieb's suggestion, I added some tracing and captured the following data using time. This server handles about 51 queries/second at this the time of day. The connection error was raised once (at 13:06:36) during the 3 minute window outlined below. Since there was 1 failure and roughly 9,200 successful connections, I think this isn't going to produce anything meaningful in terms of reporting.
Script:
date  /root/database_server.txt
(time mysql -h database_Server -D schema_name -u appuser -p apppassword -e '')  /dev/null 2 /root/database_server.txt
Results:
=== Application Server 1 ===
Mon Feb 22 13:05:01 EST 2010
real    0m0.008s
user    0m0.001s
sys     0m0.000s
Mon Feb 22 13:06:01 EST 2010
real    0m0.007s
user    0m0.002s
sys     0m0.000s
Mon Feb 22 13:07:01 EST 2010
real    0m0.008s
user    0m0.000s
sys     0m0.001s
=== Application Server 2 ===
Mon Feb 22 13:05:01 EST 2010
real    0m0.009s
user    0m0.000s
sys     0m0.002s
Mon Feb 22 13:06:01 EST 2010
real    0m0.009s
user    0m0.001s
sys     0m0.003s
Mon Feb 22 13:07:01 EST 2010
real    0m0.008s
user    0m0.000s
sys     0m0.001s
=== Database Server ===
Mon Feb 22 13:05:01 EST 2010
real    0m0.016s
user    0m0.000s
sys     0m0.010s
Mon Feb 22 13:06:01 EST 2010
real    0m0.006s
user    0m0.010s
sys     0m0.000s
Mon Feb 22 13:07:01 EST 2010
real    0m0.016s
user    0m0.000s
sys     0m0.010s
[EDIT] Per a suggestion received on a LinkedIn question, I tried setting the back_log value higher. We had been running the default value (50) and increased it to 150. We also raised the kernel value /proc/sys/net/core/somaxconn (maximum socket connections) to 256 on both the application and database server from the default 128. We did see some elevation in processor utilization as a result but still received connection timeouts.