  #3   (View Single Post)  
Old 12th March 2013
jggimi
More noise than signal
 
Join Date: May 2008
Location: USA
Posts: 7,977

Quote:
Originally Posted by kbeaucha View Post
...After we added this controller and Mac connection, we began to experience times when the upstream port at the remote site would become unresponsive. Data wasn't traversing the tunnel for anything behind the Soekris; I believe the tunnel was being dropped. The upstream port would not allow ssh logins and would not respond to pings.
Your problem reflects something more than an IPsec tunnel being dropped. Your non-VPN communications -- ping and ssh -- were non-functional as well.
Quote:
Power-cycling the Soekris would bring everything back.
Was an admin ever watching the console when this happened? For example, the OS may have been functional but the NIC was not, or the OS may have panicked and dropped into ddb(4), or the OS may have been hung. Without a console (and an admin, local or remote) you cannot determine which of these three possibilities occurred.
Quote:
To eliminate the possibility that the Soekris was the cause, we replaced it with a (faster) PC Engines Alix unit. The problems seemed to go away for over a year, until last week, when the tunnel dropped again.

Due to some other problems I wasn't able to log into the Alix's serial port, but the upstream (and local network) ports still had link, and the admin for the switch that the upstream port was plugged into said he could see link and get the MAC address of the gateway.
Do you also lose ping response on the Alix? It's not completely clear if that's the case.

I am not sure what you mean by "link" -- if you are describing status lights on Ethernet hardware (switches / hubs), their meanings vary by NIC manufacturer, but they indicate physical connectivity, not data transfer. In the event of a software failure (OS hang / OS panic / NIC bug), electrical connections would not necessarily be severed.

I know enough about Ethernet to use it and administer it. I'm not a NIC hardware expert, nor a NIC driver writer. With that disclaimer out of the way, I think it is perfectly reasonable for Ethernet NICs to manage traffic independently of the OS, and to ignore (or pass on, depending on the type of Ethernet) non-broadcast Ethernet frames destined for MAC addresses other than their own. In like manner, I assume a NIC could respond appropriately to Ethernet frames that query for its MAC address. This is different from responding to ARP requests for IP address / MAC address resolution.
Quote:
I am open to suggestions on what to look for if this should occur again to help resolve the problem.
  1. Plug in a console, for use by you or your admin for the switch. Use it when this occurs to determine whether the OS is still operating, has hung, or has panicked.
  2. Monitor ongoing operation, while things are going well. Pay special attention to free mbufs -- if you run out of message buffers, your network stack will stop moving data. You can see current mbuf usage with the -m option to netstat. Script something that notifies you if you start running out of mbuf capacity.
  3. Review system logs from the time of the problem. After a hang or crash, these probably will not aid you. If you were logging to a remote syslog server, they probably will not aid you either. But if you are/were logging to /var/log locally, inspect the /var/log/messages* files for any messages from the time the errors occurred. If mbuf shortages were the cause, look for "mclpool limit reached" messages.
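
For the notification script in item 2, here is a rough sketch in Python rather than anything I have tested on your systems. It assumes `netstat -m` prints cluster lines shaped like "19/40/6144 mbuf 2048 byte clusters in use (current/peak/max)" and a line ending in "requests for memory denied"; check the output on your own release and adjust the patterns if it differs. The function names and the 80% threshold are my own choices for illustration.

```python
import re
import subprocess

# Assumed line shape (verify against your release):
#   "19/40/6144 mbuf 2048 byte clusters in use (current/peak/max)"
CLUSTER_RE = re.compile(
    r"^\s*(\d+)/(\d+)/(\d+) mbuf (\d+) byte clusters in use", re.MULTILINE
)
# Assumed line shape: "0 requests for memory denied"
DENIED_RE = re.compile(r"^\s*(\d+) requests for memory denied", re.MULTILINE)


def check_mbufs(netstat_output, threshold=0.8):
    """Return a list of warning strings; an empty list means nothing looked alarming."""
    warnings = []
    for current, _peak, maximum, size in CLUSTER_RE.findall(netstat_output):
        current, maximum = int(current), int(maximum)
        # Flag any cluster pool whose current usage is at or above the threshold.
        if maximum > 0 and current / maximum >= threshold:
            warnings.append(f"{size}-byte clusters: {current} of {maximum} in use")
    denied = DENIED_RE.search(netstat_output)
    if denied and int(denied.group(1)) > 0:
        warnings.append(f"{denied.group(1)} requests for memory denied")
    return warnings


def run_netstat():
    """Capture the text output of `netstat -m` on the local machine."""
    result = subprocess.run(
        ["netstat", "-m"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Run from cron, something like `check_mbufs(run_netstat())` gives you a list to mail or log whenever it is non-empty. Keeping the parsing separate from the command call lets you try it against saved `netstat -m` output first.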
Quote:
Originally Posted by ocicat View Post
..and the version of OpenBSD used is what?
That, too, is a good question. When you last mentioned it, in May of 2012, your systems were running 5.0.

Last edited by jggimi; 12th March 2013 at 12:32 AM. Reason: typos