--------------------------------------------------------------------------------
Request Number 89904 
Serial Number: 89904   Status:resolved Worked: 0 minutes  Queue:NSI
      Subject: network troubles at UW external link maybe
   Requestors: bruce
           Cc: 
     Admin Cc: 
        Owner: dwpayne
     Priority: 12 / 14
          Due: Sat Dec  4 21:49:42 2004
      Created: Fri Dec  3 21:49:42 2004 (4 days ago)
 Last Contact: Tue Dec  7 07:44:23 2004 (23 hours ago)
  Last Update: Tue Dec  7 07:44:24 2004 by dwpayne
	         
Keywords:
	Department: 
	Closure: 
Dependencies: 

==========================================================================
Date: Fri Dec  3 21:49:43 2004 (0 minutes)
Ticket created by bruce at uwaterloo.ca
--------------------------------------------------------------------------
Date: Fri, 3 Dec 2004 21:49:23 -0500 (EST)
From: Bruce Campbell 
To: request@ist.uwaterloo.ca
Cc: scott@sciborg.uwaterloo.ca
Subject: network troubles at UW external link maybe
X-Miltered: at minos by Joe's j-chkmail ("http://j-chkmail.ensmp.fr")!
X-Virus-Scanned: ClamAV 0.80/614/Wed Dec  1 10:44:43 2004 clamav-milter version 0.80j on localhost
X-Virus-Status: Clean
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.4 (ist.uwaterloo.ca [129.97.108.150]); Fri, 03 Dec 2004 21:49:30 -0500 (EST)


perhaps.

I'm troubleshooting videoconferencing problems between a facility
in EIT and one at the University of Guelph.  It uses IP, H.323 to
be exact.  This system had been working since it was originally
installed in C2 (I think) then moved to EIT in January.  The
facility was not heavily used over the summer.  Since
September, connections have been dropping and have been
difficult to restart.

Videoconferencing between the EIT facility and one in C2 or
E2 works fine.  So, whatever the problem is, it appears to
be something beyond our local networks.

I conducted a series of bandwidth and latency tests between
a unix system in EIT and one at UoG.  These showed excellent
bandwidth available, and close to zero packet loss (i.e. about
1 packet per 10,000 was lost).  However, I did notice that
packets counted as lost would sometimes show up over 10
seconds later.
This struck me as odd, but presumably a TCP stack would
just discard these late packets.  I conducted the same
tests between two onsite unix systems also, and did not
see the bizarre late packet phenomenon.  I have no idea
whether this late packet thing is a real problem or not.
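
A minimal sketch of how "late" replies could be told apart from
truly "lost" ones in a test like the one above.  The function name,
the timestamp dictionaries and the 1 second timeout are assumptions
for illustration; the actual test tool used is not named here.

```python
from typing import Dict


def classify_probes(sent: Dict[int, float],
                    received: Dict[int, float],
                    timeout: float = 1.0) -> Dict[int, str]:
    """Label each probe sequence number as 'ok', 'late', or 'lost'.

    sent and received map a sequence number to a timestamp in seconds.
    A reply arriving more than `timeout` seconds after its probe is
    'late': a simple loss counter would already have counted it as lost.
    """
    result = {}
    for seq, t_sent in sent.items():
        t_recv = received.get(seq)
        if t_recv is None:
            result[seq] = "lost"    # no reply was ever seen
        elif t_recv - t_sent > timeout:
            result[seq] = "late"    # reply arrived, but after the timeout
        else:
            result[seq] = "ok"
    return result


if __name__ == "__main__":
    # Made-up timestamps: probe 2 comes back 10.5 s later, probe 3 never.
    sent = {1: 0.0, 2: 0.1, 3: 0.2}
    received = {1: 0.02, 2: 10.6}
    print(classify_probes(sent, received))
```

Keeping "late" separate from "lost" matters for real-time video,
where a packet that shows up seconds late is as useless as one that
never arrives.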

The folks at Guelph claim their system works fine with
others in Ontario, and they only see the problem with
the UW systems.

Tandberg tech support conducted some tests, and reported
significant packet loss.  They said the network is at
fault.  My own tests to date do not seem to indicate a network
problem.  (Tandberg makes the videoconference system.)

However, with the help of a sniffer, I too have confirmed
packet loss during videoconferencing.  From tests run on the
Tandberg itself, I have essentially determined that any attempt
to connect to an address in 129.97 behaves normally.  For
example, if I try to videoconference to the Arts mail server,
I get a quick failure, as expected.  Similarly with cn-rtext
(the first step in the video protocol is to connect to tcp port
1720 on the target), so a connect to port 1720 on cn-rtext
returns a quick connection refused message.  If I go
one step beyond cn-rtext, like to
ORION-WATERLOOU-RNE.DIST1-WTLO.IP.orion.on.ca or
f0-16.na02.b011027-0.yyz01.atlas.cogentco.com
then the "connection refused" packet one would
normally expect is lost, or delayed... somewhere.

If I use the telnet command to try to connect to port
1720 on either of those routers, I get an immediate
connection refused.  So, there is something about
the Tandberg's TCP and/or the routers, or something,
that seems to not work.  I think.
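
The telnet check above can be approximated with a small script.
This is only a sketch: the function name and the 5 second timeout
are assumptions, and it tests an ordinary unix TCP stack, not the
Tandberg's.

```python
import socket


def probe_tcp(host: str, port: int = 1720, timeout: float = 5.0) -> str:
    """Attempt a TCP connect and report how the attempt ended.

    Returns 'connected', 'refused' (a reset came back promptly),
    'timeout' (no answer within `timeout` seconds), or 'error: ...'.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "connected"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # e.g. name resolution failure or no route to host
        return f"error: {exc}"
    finally:
        sock.close()


if __name__ == "__main__":
    # Loopback is used here only so the example runs anywhere.
    print(probe_tcp("127.0.0.1", 1720, timeout=2.0))
```

A quick "refused" is the healthy result for a host with nothing
listening on port 1720; a "timeout" is what a lost or delayed
"connection refused" packet would look like from the caller's side.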

I know this is vague.

------------------------------------------------------------------------
Bruce Campbell
Engineering Computing
University of Waterloo
http://www.eng.uwaterloo.ca/~bruce/
519-888-4567 ext. 5889
PGP Key: http://www.eng.uwaterloo.ca/~bruce/public.txt
Tune: E3E3A3A3A3A3  A3A3B3B3C3C3A3A3E3E3A3A3A3A3  \
A3A3B3B3C3C3A3A3E3E3D3D3D3D3B3B3C3C3D3D3D3+E3-E3E3    \
A4A4  A4A4A4A4  A4A4G3G3F3F3E3E3F3F3G3G3  G3G3G3G3  \
G3G3F3F3E3E3D3D3E3E3F3F3  F3F3  B3B3C3C3D3D3E3E3E3\
E3

==========================================================================
Date: Sat Dec  4 08:56:10 2004 (0 minutes)
Correspondence added by bruce at uwaterloo.ca
--------------------------------------------------------------------------
X-Authentication-Warning: ecserv7.uwaterloo.ca: www set sender to bruce@engmail.uwaterloo.ca using -f
Date: Sat,  4 Dec 2004 08:55:52 -0500
From: Bruce Campbell 
To: IST RequestTracker 
Cc: "W. Nicoll, W. Scott Nicoll, Scott Nicoll" 
Subject: Re: [UW-IST #89904] AutoReply: network troubles at UW external link maybe 
References: 
In-Reply-To: 
User-Agent: Internet Messaging Program (IMP) 3.1 / FreeBSD-4.6.2
X-Originating-Ip: 65.93.96.46
X-Miltered: at rhadamanthus by Joe's j-chkmail ("http://j-chkmail.ensmp.fr")!
X-Miltered: at minos by Joe's j-chkmail ("http://j-chkmail.ensmp.fr")!
X-Virus-Scanned: ClamAV 0.80/614/Wed Dec  1 10:44:43 2004 clamav-milter version 0.80j on localhost
X-Virus-Status: Clean
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.4 (ist.uwaterloo.ca [129.97.108.150]); Sat, 04 Dec 2004 08:55:58 -0500 (EST)
RT-Send-Cc: 


Some followup:

Things appear to work fine at the moment (Saturday morning).  This
has been the pattern.  Sometimes things work normally, and a 1 hour
videoconference can be held between UW and UoG, other times things
don't work.  Right now, packet sniffing shows the normal TCP interaction at
the beginning, followed by the udp traffic containing the video.

And of course I have not ruled out a layer 3 anomaly at the Extreme
router.  I observed things work fine to all of 129.97 (inside
and outside the Extreme area).  Things work fine to cn-rtext.
Then things behave strangely one step beyond that.  The trigger
could be that the target is outside 129.97, in which case the
problem could lie at the Extreme routing point, perhaps related
to the default route.

Anyways, I will have to wait until it starts failing again
to do more tests.

-- 
Bruce Campbell
Engineering Computing
CPH-2374B
University of Waterloo
(519)888-4567 ext 5889

----------------------------------------
This mail sent through www.mywaterloo.ca

==========================================================================
Date: Sun Dec  5 21:46:42 2004 (0 minutes)
Correspondence added by wwwrt@mona.uwaterloo.ca
--------------------------------------------------------------------------
Date: Sun, 5 Dec 2004 21:46:37 -0500
From: IST Request Tracker 
To: rt-general@mona.uwaterloo.ca
Subject: [UW-IST #89904] network troubles at UW external link maybe 
RT-Send-Cc: 

 
Comment submitted by bruce

9 hours of continuous tests Sunday worked, then
things started to fail again Sunday evening.

I've traced the problem to sporadic 5555/tcp source
port blocking on return at the UW external router, or slightly
beyond.  Both orion and cogent are affected.

ie.  A packet with source port 5555/tcp can leave campus,
     but the reply does not make it back.  Sometimes.

The tandberg videoconference system uses source port 5555.
Note that all ports should be open for the system in
question (av-codec2) based on a previous request from
Scott Nicoll.
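
A minimal sketch of how the source port dependence could be
reproduced from an ordinary unix host: bind the socket to a chosen
source port before connecting, then compare against an OS-chosen
port.  The function names and the timeout are assumptions for
illustration, not the exact tests that were run.

```python
import socket
from typing import Dict, Iterable


def probe_from_port(host: str, port: int, source_port: int = 5555,
                    timeout: float = 5.0) -> str:
    """TCP connect from a fixed source port.

    source_port=0 lets the OS pick an ephemeral port.  Returns
    'connected', 'refused', or 'timeout'.  A failure to bind (for
    example, the source port is already in use) raises OSError.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick re-use of the same source port between test runs.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.settimeout(timeout)
    try:
        sock.bind(("", source_port))
        sock.connect((host, port))
        return "connected"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    finally:
        sock.close()


def compare_source_ports(host: str, port: int = 1720,
                         source_ports: Iterable[int] = (5555, 0),
                         timeout: float = 5.0) -> Dict[int, str]:
    """Run the same probe from each source port and collect the outcomes."""
    return {sp: probe_from_port(host, port, sp, timeout)
            for sp in source_ports}
```

If the result for source port 5555 is "timeout" while the result for
an OS-chosen port is a prompt "refused" or "connected", the replies
to 5555 are being dropped or delayed somewhere on the return path.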


==========================================================================
Date: Mon Dec  6 11:15:41 2004 (0 minutes)
Comments added by hawey
--------------------------------------------------------------------------
RT-Send-CC: rwwatt@ist
RT-Send-BCC:

Roger - Of interest?  If so, please take ownership of this ticket and 
move it into your Queue...

Thanks,
Heather


[wwwrt@mona.uwaterloo.ca - Sun Dec  5 21:46:42 2004]:

>  
> Comment submitted by bruce
> 
> 9 hours of continuous tests Sunday worked, then
> things started to fail again Sunday evening.
> 
> I've traced the problem to sporadic 5555/tcp source
> port blocking on return at the UW external router, or slightly
> beyond.  Both orion and cogent are affected.
> 
> ie.  A packet with source port 5555/tcp can leave campus,
>      but the reply does not make it back.  Sometimes.
> 
> The tandberg videoconference system uses source port 5555.
> Note that all ports should be open for the system in
> question (av-codec2) based on a previous request from
> Scott Nicoll.
> 


==========================================================================
Date: Mon Dec  6 12:34:14 2004 (0 minutes)
Correspondence added by rwwatt
--------------------------------------------------------------------------
Date: Mon, 6 Dec 2004 12:07:36 -0500 (EST)
From: Roger Watt 
To: "\"Heather Wey via RT\" " 
Cc: Doug Payne 
Subject: Re: [UW-IST #89904] network troubles at UW external link maybe 
In-Reply-To: 
References: 
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.4 (ist.uwaterloo.ca [127.0.0.1]); Mon, 06 Dec 2004 12:33:54 -0500 (EST)
X-Miltered: at minos by Joe's j-chkmail ("http://j-chkmail.ensmp.fr")!
X-Virus-Scanned: ClamAV 0.80/618/Sun Dec  5 18:09:24 2004 clamav-milter version 0.80j on localhost
X-Virus-Status: Clean
RT-Send-Cc: 

I don't own any queues in RT. I'll forward this to Doug.

> Date: Mon, 6 Dec 2004 11:15:42 -0500
> From: "\"Heather Wey via RT\" "
>     
> Cc: rwwatt@ist.uwaterloo.ca
> Subject: [UW-IST #89904] network troubles at UW external link maybe
>
>
> Roger - Of interest?  If so, please take ownership of this ticket and
> move it into your Queue...
>
> Thanks,
> Heather
>
>
> [wwwrt@mona.uwaterloo.ca - Sun Dec  5 21:46:42 2004]:
>
> >
> > Comment submitted by bruce
> >
> > 9 hours of continuous tests Sunday worked, then
> > things started to fail again Sunday evening.
> >
> > I've traced the problem to sporadic 5555/tcp source
> > port blocking on return at the UW external router, or slightly
> > beyond.  Both orion and cogent are affected.
> >
> > ie.  A packet with source port 5555/tcp can leave campus,
> >      but the reply does not make it back.  Sometimes.
> >
> > The tandberg videoconference system uses source port 5555.
> > Note that all ports should be open for the system in
> > question (av-codec2) based on a previous request from
> > Scott Nicoll.
> >
>

==========================================================================
Date: Mon Dec  6 12:45:30 2004 (0 minutes)
Queue changed from general to NSI by dwpayne

==========================================================================
Date: Mon Dec  6 12:45:35 2004 (0 minutes)
Taken by dwpayne

==========================================================================
Date: Mon Dec  6 13:06:49 2004 (0 minutes)
Correspondence added by dwpayne
--------------------------------------------------------------------------
RT-Send-CC: rwwatt@ist.uwaterloo.ca

> The tandberg videoconference system uses source port 5555.
> Note that all ports should be open for the system in
> question (av-codec2) based on a previous request from
> Scott Nicoll.

Good sleuthing, Bruce. Thanks for the additional specific data which
helped me to see the possible problem.

Although all TCP/UDP ports to the system(s) in question are indeed
'open', as in not expressly blocked, port 5555/tcp is identified by
Cisco IP NBAR processing as being used by Napster, which is one of the
problematic P2P apps that we traffic-shape because of bandwidth issues.
I assume that what you're seeing is packet drops under intensive campus
P2P loading. I suspect that your testing is not enough to induce packet
drops all by itself, but when it's combined with all the P2P apps,
chances are that the occasional 5555 packet drop will occur. If not
drops, then at least some delays.

Currently there is no IP exception list to the P2P traffic shaping; I
guess I'll have to create one and add av-codec2 et al. to it. That will
take a bit of time. I'll update the ticket once that's been done,
hopefully later this afternoon.

==========================================================================
Date: Mon Dec  6 13:06:51 2004 (0 minutes)
Status changed from new to open by dwpayne

==========================================================================
Date: Mon Dec  6 15:18:41 2004 (0 minutes)
Correspondence added by dwpayne
--------------------------------------------------------------------------
RT-Send-CC: rwwatt@ist.uwaterloo.ca
RT-Send-BCC:

I've now modified the external router P2P shaping controls to use an
exception list that includes the 4 IP addresses originally requested to
be removed from the TCP/UDP port blocking rules. The change was
implemented at 15:11:33 today (Dec 6).

Please let me know if this helps (or makes it worse :-)
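
As a rough illustration of the logic only (this is a toy model, not
router configuration): traffic is treated as P2P when it uses a port
on the shaped list, unless the campus host is on the exception list.
The addresses below are placeholders from the documentation range,
not the real av-codec2 addresses, and real NBAR classification is
assumed here to reduce to a simple port match.

```python
from ipaddress import ip_address

# Per the ticket, 5555/tcp is classified as Napster and therefore shaped.
P2P_PORTS = {5555}

# Hypothetical placeholder addresses standing in for the exempted hosts.
EXEMPT_HOSTS = {ip_address("192.0.2.10"), ip_address("192.0.2.11")}


def is_shaped(campus_ip: str, src_port: int, dst_port: int,
              exempt=EXEMPT_HOSTS) -> bool:
    """Return True if a TCP packet would be subject to P2P shaping."""
    if ip_address(campus_ip) in exempt:
        return False    # the exception list wins over the port match
    return src_port in P2P_PORTS or dst_port in P2P_PORTS


if __name__ == "__main__":
    # A non-exempt host using port 5555 is shaped; an exempt one is not.
    print(is_shaped("192.0.2.50", 5555, 1720))
    print(is_shaped("192.0.2.10", 5555, 1720))
```

Checking the exception list first means an exempted host is never
shaped, whatever ports its traffic happens to use.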
==========================================================================
Date: Tue Dec  7 07:44:23 2004 (0 minutes)
Correspondence added by dwpayne
--------------------------------------------------------------------------
RT-Send-CC:
RT-Send-BCC:

I've heard no further problem reports, so I'll mark this resolved.
Replying to this will re-open it, should that become necessary.
==========================================================================
Date: Tue Dec  7 07:44:24 2004 (0 minutes)
Status changed from open to resolved by dwpayne