--------------------------------------------------------------------------------
Request Number 89904
Serial Number: 89904
Status: resolved
Worked: 0 minutes
Queue: NSI
Subject: network troubles at UW external link maybe
Requestors: bruce
Cc:
Admin Cc:
Owner: dwpayne
Priority: 12 / 14
Due: Sat Dec 4 21:49:42 2004
Created: Fri Dec 3 21:49:42 2004 (4 days ago)
Last Contact: Tue Dec 7 07:44:23 2004 (23 hours ago)
Last Update: Tue Dec 7 07:44:24 2004 by dwpayne
Keywords:
Department:
Closure:
Dependencies:
==========================================================================
Date: Fri Dec 3 21:49:43 2004 (0 minutes)
Ticket created by bruce at uwaterloo.ca
--------------------------------------------------------------------------
Date: Fri, 3 Dec 2004 21:49:23 -0500 (EST)
From: Bruce Campbell
To: request@ist.uwaterloo.ca
Cc: scott@sciborg.uwaterloo.ca
Subject: network troubles at UW external link maybe
X-Miltered: at minos by Joe's j-chkmail ("http://j-chkmail.ensmp.fr")!
X-Virus-Scanned: ClamAV 0.80/614/Wed Dec 1 10:44:43 2004 clamav-milter version 0.80j on localhost
X-Virus-Status: Clean
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.4 (ist.uwaterloo.ca [129.97.108.150]); Fri, 03 Dec 2004 21:49:30 -0500 (EST)

perhaps.

I'm troubleshooting videoconferencing problems between a facility in EIT and one at the University of Guelph. It uses IP, H.323 to be exact. This system had been working since it was originally installed in C2 (I think), then moved to EIT in January. The facility was not heavily used over the summer. Since September, connections would drop and be difficult to restart.

Videoconferencing between the EIT facility and one in C2 or E2 works fine. So, whatever the problem is, it appears to be something beyond our local networks.
I conducted a series of bandwidth and latency tests between a Unix system in EIT and one at UoG. This showed excellent bandwidth available and close to zero packet loss (i.e. maybe 1 packet in 10,000 was lost). However, I did notice that lost packets would sometimes show up over 10 seconds later. This struck me as odd, but presumably a TCP stack would just discard these late packets. I conducted the same tests between two onsite Unix systems as well, and did not see the bizarre late-packet phenomenon. I have no idea whether this late-packet thing is a real problem or not.

The folks at Guelph claim their system works fine with others in Ontario, and that they only see the problem with the UW systems. Tandberg (who make the videoconference system) tech support conducted some tests and reported significant packet loss. They said the network is at fault. My tests to date do not seem to indicate a network problem. However, with the help of a sniffer, I too have confirmed packet loss during videoconferencing.

I have essentially determined from tests from the Tandberg that any attempt to connect to 129.97 behaves normally. For example, if I try to videoconference to the Arts mail server, I get a quick failure, as expected. Similarly with cn-rtext (the first step in the video protocol is to connect to TCP port 1720 on the target): a connect to port 1720 on cn-rtext returns a quick "connection refused" message.

If I go one step beyond cn-rtext, e.g. to ORION-WATERLOOU-RNE.DIST1-WTLO.IP.orion.on.ca or f0-16.na02.b011027-0.yyz01.atlas.cogentco.com, then the "connection refused" packet one would normally expect is lost, or delayed... somewhere. If I use the telnet command to try to connect to port 1720 on either of those routers, I get an immediate "connection refused". So there is something about the Tandberg TCP and/or the routers, or something, that seems not to work. I think.

I know this is vague.
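The late-packet observation above (probes counted as lost that turned up more than 10 seconds later) is easier to reason about when on-time, late, and lost probes are counted separately. The ticket does not say which test tool was used, so this is a hypothetical sketch: it assumes sequence-numbered probes that are echoed back, with both send and reply times recorded on the sending host (so no clock synchronization is needed), and it treats any reply slower than an assumed 1-second deadline as late.

```python
"""Sort sequence-numbered probe replies into on-time, late, and lost.

Hypothetical helper: assumes each probe is echoed back and that both the
send time and the reply time are recorded on the sending host, in seconds.
"""
from dataclasses import dataclass


@dataclass
class ProbeSummary:
    sent: int
    on_time: int
    late: list   # sequence numbers whose reply exceeded the deadline
    lost: list   # sequence numbers with no reply at all

    @property
    def loss_rate(self) -> float:
        # For interactive video a very late packet is as useless as a lost
        # one, so both count against the stream here.
        if self.sent == 0:
            return 0.0
        return (len(self.late) + len(self.lost)) / self.sent


def classify_probes(send_times: dict, reply_times: dict,
                    deadline: float = 1.0) -> ProbeSummary:
    """Compare per-sequence send and reply timestamps.

    `deadline` (seconds) is an assumed cut-off, not a value from the ticket.
    """
    on_time = 0
    late, lost = [], []
    for seq, sent_at in sorted(send_times.items()):
        replied_at = reply_times.get(seq)
        if replied_at is None:
            lost.append(seq)
        elif replied_at - sent_at > deadline:
            late.append(seq)
        else:
            on_time += 1
    return ProbeSummary(len(send_times), on_time, late, lost)


# Example: probe 2 never returns, probe 3 comes back about 10 s late.
# classify_probes({0: 0.0, 1: 0.1, 2: 0.2, 3: 0.3}, {0: 0.03, 1: 0.13, 3: 10.5})
# -> on_time=2, late=[3], lost=[2], loss_rate=0.5
```

A plain loss counter that only waits a few seconds would report probe 3 as lost and then be surprised when it arrives; separating the two cases makes a "rare loss plus very late arrivals" pattern visible.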
------------------------------------------------------------------------
Bruce Campbell
Engineering Computing
University of Waterloo
http://www.eng.uwaterloo.ca/~bruce/
519-888-4567 ext. 5889
PGP Key: http://www.eng.uwaterloo.ca/~bruce/public.txt
Tune: E3E3A3A3A3A3 A3A3B3B3C3C3A3A3E3E3A3A3A3A3 \
A3A3B3B3C3C3A3A3E3E3D3D3D3D3B3B3C3C3D3D3D3+E3-E3E3 \
A4A4 A4A4A4A4 A4A4G3G3F3F3E3E3F3F3G3G3 G3G3G3G3 \
G3G3F3F3E3E3D3D3E3E3F3F3 F3F3 B3B3C3C3D3D3E3E3E3\
E3
==========================================================================
Date: Sat Dec 4 08:56:10 2004 (0 minutes)
Correspondence added by bruce at uwaterloo.ca
--------------------------------------------------------------------------
X-Authentication-Warning: ecserv7.uwaterloo.ca: www set sender to bruce@engmail.uwaterloo.ca using -f
Date: Sat, 4 Dec 2004 08:55:52 -0500
From: Bruce Campbell
To: IST RequestTracker
Cc: "W. Nicoll, W. Scott Nicoll, Scott Nicoll"
Subject: Re: [UW-IST #89904] AutoReply: network troubles at UW external link maybe
References:
In-Reply-To:
User-Agent: Internet Messaging Program (IMP) 3.1 / FreeBSD-4.6.2
X-Originating-Ip: 65.93.96.46
X-Miltered: at rhadamanthus by Joe's j-chkmail ("http://j-chkmail.ensmp.fr")!
X-Miltered: at minos by Joe's j-chkmail ("http://j-chkmail.ensmp.fr")!
X-Virus-Scanned: ClamAV 0.80/614/Wed Dec 1 10:44:43 2004 clamav-milter version 0.80j on localhost
X-Virus-Status: Clean
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.4 (ist.uwaterloo.ca [129.97.108.150]); Sat, 04 Dec 2004 08:55:58 -0500 (EST)
RT-Send-Cc:

Some followup:

Things appear to work fine at the moment (Saturday morning). This has been the pattern. Sometimes things work normally, and a one-hour videoconference can be held between UW and UoG; other times things don't work.
Right now, packet sniffing shows the normal TCP interaction at the beginning, followed by the UDP traffic containing the video.

And of course I have not ruled out a layer 3 anomaly at the Extreme router. I observed that things work fine to all of 129.97 (inside and outside the Extreme area). Things work fine to cn-rtext. Then things behave strangely one step beyond that. It could be the fact that the target is not in 129.97, and the problem could be caused at the Extreme route point, related to the default route or something.

Anyway, I will have to wait until it starts failing again to do more tests.

--
Bruce Campbell
Engineering Computing
CPH-2374B
University of Waterloo
(519)888-4567 ext 5889

----------------------------------------
This mail sent through www.mywaterloo.ca
==========================================================================
Date: Sun Dec 5 21:46:42 2004 (0 minutes)
Correspondence added by wwwrt@mona.uwaterloo.ca
--------------------------------------------------------------------------
Date: Sun, 5 Dec 2004 21:46:37 -0500
From: IST Request Tracker
To: rt-general@mona.uwaterloo.ca
Subject: [UW-IST #89904] network troubles at UW external link maybe
RT-Send-Cc:

Comment submitted by bruce

Nine hours of continuous tests on Sunday worked, then things started to fail again Sunday evening.

I've traced the problem to sporadic blocking of 5555/tcp source-port traffic on the return path, at the UW external router or slightly beyond. Both ORION and Cogent are affected.

i.e. A packet with source port 5555/tcp can leave campus, but the reply does not make it back. Sometimes.

The Tandberg videoconference system uses source port 5555. Note that all ports should be open for the system in question (av-codec2), based on a previous request from Scott Nicoll.
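The comparison behind this diagnosis (telnet from an ephemeral source port gets an immediate "connection refused", while the Tandberg, which uses source port 5555, sometimes sees no reply) can be reproduced with a small TCP probe. This is a sketch, not the tool used in the ticket: `probe_tcp` and `compare_source_ports` are hypothetical names, only IPv4 is handled, and the 5-second timeout is an arbitrary choice. A quick `refused` means an RST came back; `timeout` means the SYN or its reply was dropped or held up somewhere on the path.

```python
import socket


def probe_tcp(host: str, port: int, src_port: int = 0,
              timeout: float = 5.0) -> str:
    """Attempt one IPv4 TCP connection and classify how it ends.

    'open'    - the handshake completed (something is listening).
    'refused' - an RST came back: the quick failure a router gives when
                nothing listens on the port.
    'timeout' - no answer within `timeout` seconds: the SYN or the reply
                was dropped or delayed somewhere on the path.
    Other socket errors (e.g. host unreachable) are raised to the caller.
    src_port=0 lets the OS pick an ephemeral port, as telnet does; a fixed
    value such as 5555 imitates the videoconference unit.
    """
    addr = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)[0][4]
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Let a fixed source port be re-bound soon after an earlier probe.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("", src_port))
        s.settimeout(timeout)
        try:
            s.connect(addr)
        except ConnectionRefusedError:
            return "refused"
        except socket.timeout:
            return "timeout"
    return "open"


def compare_source_ports(host: str, port: int = 1720,
                         fixed_src: int = 5555) -> dict:
    """Probe the same target from an ephemeral and from a fixed source port."""
    return {
        "ephemeral": probe_tcp(host, port),
        f"source {fixed_src}": probe_tcp(host, port, src_port=fixed_src),
    }
```

Run against one of the routers named earlier during a bad period, the symptom described in this ticket would appear as `refused` for the ephemeral probe and `timeout` for the source-5555 probe; during a good period both would come back `refused`. Holding everything but the source port constant is what isolates the port as the variable.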
==========================================================================
Date: Mon Dec 6 11:15:41 2004 (0 minutes)
Comments added by hawey
--------------------------------------------------------------------------
RT-Send-CC: rwwatt@ist
RT-Send-BCC:

Roger - Of interest? If so, please take ownership of this ticket and move it into your Queue...

Thanks,
Heather
==========================================================================
Date: Mon Dec 6 12:34:14 2004 (0 minutes)
Correspondence added by rwwatt
--------------------------------------------------------------------------
Date: Mon, 6 Dec 2004 12:07:36 -0500 (EST)
From: Roger Watt
To: "Heather Wey via RT"
Cc: Doug Payne
Subject: Re: [UW-IST #89904] network troubles at UW external link maybe
In-Reply-To:
References:
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.4 (ist.uwaterloo.ca [127.0.0.1]); Mon, 06 Dec 2004 12:33:54 -0500 (EST)
X-Miltered: at minos by Joe's j-chkmail ("http://j-chkmail.ensmp.fr")!
X-Virus-Scanned: ClamAV 0.80/618/Sun Dec 5 18:09:24 2004 clamav-milter version 0.80j on localhost
X-Virus-Status: Clean
RT-Send-Cc:

I don't own any queues in RT. I'll forward this to Doug.
==========================================================================
Date: Mon Dec 6 12:45:30 2004 (0 minutes)
Queue changed from general to NSI by dwpayne
==========================================================================
Date: Mon Dec 6 12:45:35 2004 (0 minutes)
Taken by dwpayne
==========================================================================
Date: Mon Dec 6 13:06:49 2004 (0 minutes)
Correspondence added by dwpayne
--------------------------------------------------------------------------
RT-Send-CC: rwwatt@ist.uwaterloo.ca

> The Tandberg videoconference system uses source port 5555.
> Note that all ports should be open for the system in
> question (av-codec2) based on a previous request from
> Scott Nicoll.

Good sleuthing, Bruce. Thanks for the additional specific data, which helped me to see the possible problem.
Although all TCP/UDP ports to the system(s) in question are indeed 'open', as in not expressly blocked, port 5555/tcp is identified by Cisco IP NBAR processing as being used by Napster, which is one of the problematic P2P apps that we traffic-shape because of bandwidth issues.

I assume that what you're seeing is packet drops under intensive campus P2P loading. I suspect that your testing is not enough to induce packet drops all by itself, but when it's combined with all the P2P apps, chances are that the occasional 5555 packet drop will occur. If not drops, then at least some delays.

Currently there is no IP exception list for the P2P traffic shaping; I guess I'll have to create one and add av-codec2 et al. to it. That will take a bit of time. I'll update the ticket once that's been done, hopefully later this afternoon.
==========================================================================
Date: Mon Dec 6 13:06:51 2004 (0 minutes)
Status changed from new to open by dwpayne
==========================================================================
Date: Mon Dec 6 15:18:41 2004 (0 minutes)
Correspondence added by dwpayne
--------------------------------------------------------------------------
RT-Send-CC: rwwatt@ist.uwaterloo.ca
RT-Send-BCC:

I've now modified the external router P2P shaping controls to use an exception list that includes the 4 IP addresses originally requested to be removed from the TCP/UDP port blocking rules. The change was implemented at 15:11:33 today (Dec 6).

Please let me know if this helps (or makes it worse :-)
==========================================================================
Date: Tue Dec 7 07:44:23 2004 (0 minutes)
Correspondence added by dwpayne
--------------------------------------------------------------------------
RT-Send-CC:
RT-Send-BCC:

I've heard no further problem reports, so I'll mark this resolved. Replying to this will re-open it, should that become necessary.
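The mechanism described in the NBAR explanation above (5555/tcp classified as Napster, all such traffic shaped together, and an exception list added as the fix) can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not Cisco's implementation: NBAR classification is reduced to a lookup on port 5555/tcp (the only port the ticket names), all P2P-classified packets share one token bucket whose rate and burst size are made up, and a packet that finds the bucket empty is dropped rather than queued (a real shaper would usually queue first, which would show up as the delays also mentioned). `Shaper`, `TokenBucket`, and the addresses in the test are hypothetical.

```python
"""Toy model of port-based P2P classification, shaping, and an exception list.

Assumptions, not Cisco's implementation: classification is reduced to a port
lookup, all P2P-classified packets share one token bucket, and a packet that
finds the bucket empty is dropped (a real shaper would usually queue it first).
"""
from dataclasses import dataclass, field
from ipaddress import ip_address

# Hypothetical port table: the ticket only says 5555/tcp is matched as Napster.
P2P_TCP_PORTS = {5555}


@dataclass
class TokenBucket:
    rate: float          # bytes of credit added per second (made-up value)
    capacity: float      # largest burst allowed, in bytes
    tokens: float = 0.0  # current credit; starts empty unless given
    last: float = 0.0    # time of the previous refill, in seconds

    def allow(self, size: int, now: float) -> bool:
        # Refill for the elapsed time, never beyond the bucket's capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False


@dataclass
class Shaper:
    bucket: TokenBucket
    exempt: set = field(default_factory=set)  # hypothetical exception list

    def __post_init__(self) -> None:
        # Normalize addresses so "129.97.1.10" and ip_address(...) compare equal.
        self.exempt = {ip_address(a) for a in self.exempt}

    @staticmethod
    def is_p2p(src_port: int, dst_port: int) -> bool:
        # Match either direction: a reply sent back *to* port 5555 is
        # classified the same way as the packet that left *from* it.
        return src_port in P2P_TCP_PORTS or dst_port in P2P_TCP_PORTS

    def forward(self, src: str, dst: str, src_port: int, dst_port: int,
                size: int, now: float) -> bool:
        """Return True if the packet is passed, False if it is dropped."""
        if not self.is_p2p(src_port, dst_port):
            return True
        if ip_address(src) in self.exempt or ip_address(dst) in self.exempt:
            return True
        # Everything else classified as P2P competes for the same credit.
        return self.bucket.allow(size, now)
```

Because the bucket is shared, whether a given 5555/tcp reply survives depends on how much other P2P-classified traffic arrived just before it, which fits the "works for nine hours, then fails in the evening" pattern; an address on the exception list bypasses the bucket entirely.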
==========================================================================
Date: Tue Dec 7 07:44:24 2004 (0 minutes)
Status changed from open to resolved by dwpayne