[SYMPTOMS]
Strange timeouts occurring all over the system. The easiest one to check/reproduce is probably logging in using SSH:
the "username" prompt appears immediately, but it takes ~20 sec. until the "password" prompt shows up. After a successful login, it takes another ~20 sec. until you get a shell prompt.
---------------
I've even had the exact same strange timeout when trying to connect to an Oracle database: the queries themselves were damn fast, but logging into the database took forever!
---------------
[SOLUTION]
Check whether your system can reach (ping?) the nameservers listed in /etc/resolv.conf. If it can't, every lookup will still try them and wait for the timeout. So if you cannot fix this connection problem (or you don't want to), simply DELETE all invalid entries in /etc/resolv.conf - and that's it!
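If you'd rather not test each nameserver by hand, the check can be scripted. What follows is only a rough sketch in Python (my choice for illustration, nothing from the original setup): it reads the "nameserver" lines and sends each server one small DNS query over UDP port 53, which is a bit more direct than a ping. The function names, the 2 second timeout and the test hostname example.com are all assumptions made for this example.

```python
import socket
import struct
import time


def parse_nameservers(resolv_conf_text):
    """Return the addresses found on 'nameserver' lines, ignoring comments."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Cut off comments introduced by '#' or ';'.
        for marker in ("#", ";"):
            line = line.split(marker, 1)[0]
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def build_query(hostname="example.com", query_id=0x1234):
    """Build a minimal DNS query packet asking for an A record."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii") for part in hostname.split(".")
    )
    # Question: name, terminating zero byte, type A (1), class IN (1).
    return header + labels + b"\x00" + struct.pack(">HH", 1, 1)


def probe_nameserver(address, timeout=2.0):
    """Return the reply time in seconds, or None if no answer arrived."""
    family = socket.AF_INET6 if ":" in address else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(build_query(), (address, 53))
            sock.recvfrom(512)
        except OSError:
            # Covers timeouts as well as "network unreachable" and similar.
            return None
        return time.monotonic() - start


def report(path="/etc/resolv.conf"):
    """Print one status line per configured nameserver."""
    with open(path) as handle:
        for server in parse_nameservers(handle.read()):
            elapsed = probe_nameserver(server)
            status = "no answer" if elapsed is None else f"answered in {elapsed:.3f}s"
            print(f"{server}: {status}")
```

Calling report() prints one line per nameserver. Entries that show "no answer" are the candidates for fixing or deleting. One missing reply can also be a lost UDP packet, so it's worth running it twice before removing anything.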
Linux: strange timeouts
personal note
I know that probably no one is really interested in the background of this short knowledge base entry, but I've spent a WHOLE day finding this problem, and since it happened on an important server, I was about to lose my head if I couldn't fix it, so:
It took me ~3.5 hours to pinpoint the approximate location of the error, since there's a combination of Apache/Tomcat/Oracle running. The logs said that the database queries took a few msec, so I thought: it's not the database.
But the logs also said that loading a .jsp page took a few minutes (up to ~12 mins sometimes), which made me think that it might be Tomcat causing problems.
Forcing Tomcat to recompile all JSPs did not fix anything, but in the Apache logs I saw strange "AJP13: broken pipe" errors, which made me think that it was a problem with mod_jk+AJP13.
I switched from AJP13 to the new Coyote connector, but this still wasn't it.
I upgraded Tomcat - no change.
I increased the logging "volume" and saw that it took up to 2 mins to connect to the database - so I knew that the problem was somewhere in this corner.
I tried to connect manually to the database using sqlplus, and there I also saw a strange slowness when establishing a connection - again: the queries were fast. This made me assume that it's not JDBC.
----------
What finally led to the solution was that the login with SSH had 2 "timeout" places:
1) between username and password
2) between password and shell.
Luckily I'd had exactly the same slow SSH login on another server in the same network, which did NOT have this problem anymore after connecting it completely to the internet.
So I had to figure out what those 2 machines had in common:
It was their firewall-restricted outgoing access to the internet - and this led immediately to the assumption that there might be some "outgoing lookups" which fail - voilà.
----------
by the way...
I almost forgot:
The Oracle listener also took ages to start.