Level3 Outage – Likely Juniper Bug

Information is still limited at this time. At approximately 14:15 UTC an event occurred that caused our BGP sessions with Level3 to reset; they re-established five minutes later.

There are confirmed reports that this event within Level3 has affected most of North America. The outage has also affected other networks running Juniper routers, with the majority of them seeing their devices core dump and reload.

At this point the speculation is that a new BGP update bug within specific Juniper version trains was just discovered or triggered.

Phyber’s engineering staff will be monitoring the situation closely and will disable our Level3 peerings if necessary to maintain stability of the network.

One Wilshire Power Notification

Date: Thursday, October 13, 2011 

Location of Event: One Wilshire, 624 Grand Ave, Los Angeles, California

Critical Load: On Generator Power

Risk to Operation: Possible disruption in powered services.

Summary Description of Event:  The Data Center has experienced a utility power disruption. While power in the data center is protected, CoreSite facilities staff is on site and verifying proper functionality of the facility systems. 

Level3 Fiber Cut

Level3 has lost connectivity on the Fiber route connecting Phoenix to Dallas. Customers will notice intermittent packet loss and high latency for traffic traversing the Level3 network. We are closely monitoring the situation and will make a determination on adjusting our BGP routing policies and announcements depending on how the outage progresses.

99.83% Availability

Normally a 99.83% availability report for a given month would be absolutely dreadful; however, our Pingdom report for August only tells part of the story…

[Image: August availability]

We moved datacenters!

Normally a monitoring system would be suspended for the duration of any significant maintenance that would result in outages/alarms being generated. For our move (and subsequent upgrade of our name servers) we elected to keep our external monitoring running to gauge our performance.

Phyber runs multiple redundant name servers, so even though we show downtime on the public IPs, our customers’ experience was seamless with zero service impact. Unfortunately, the migration did put an end to some rather impressive machine uptimes.

September is back to normal with 100% availability. Check out our public reports on Pingdom for up-to-the-minute information on Phyber’s critical infrastructure.
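
For a sense of scale, the August figure can be converted into minutes of recorded downtime. The snippet below is a rough sketch of that arithmetic only: the function name is hypothetical, and it assumes availability is measured evenly across all 31 days of the month, which may not match how Pingdom samples or rounds its reports.

```python
def downtime_minutes(availability_pct: float, days: int) -> float:
    """Minutes of downtime implied by an availability percentage over `days` days.

    Assumption: availability is a simple fraction of total wall-clock time.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - availability_pct) / 100.0


# August has 31 days, i.e. 44,640 minutes in total.
august = downtime_minutes(99.83, 31)
print(f"{august:.1f} minutes")  # about 75.9 minutes
```

Under those assumptions, 99.83% works out to a little over an hour and a quarter of monitored downtime on the public IPs for the whole month, move included.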