The Internet has been eventful lately, with one outage after another. Let us review them first.
On May 11, 2015, at around 21:00, NetEase News, NetEase Cloud Music, Yixin, Youdao Cloud Notes and other NetEase mobile applications could not refresh normally, and NetEase's games were also down. Cause: an attack on the backbone network.
On the afternoon of May 27, 2015, some users reported an Alipay network failure; they could not log in to their accounts or make payments. Cause: a fiber-optic cable was cut during excavation. Duration: 4 hours.
At 11:09 on May 28, 2015, the Ctrip official website and app failed and could not be opened. Full recovery came at 23:29 the same day, so the whole process took more than 12 hours. Cause: operator error. Duration: about 12 hours.
On June 5, 2015, the Toutiao home page and app were inaccessible and returned 500 errors directly. Cause: unknown. Duration: about 30 minutes.
At 12:30 on June 15, 2015, Zhihu could not be opened and showed the error [server raised a problem]. At around 13:45, the Zhihu pages returned to normal. Cause: data center failure. Duration: about 60 minutes.
What went wrong? Is our Internet business really this fragile? Are the carriers always causing trouble behind the scenes? Is our system architecture bad? Or is our operations capability really that weak? Taken broadly, I would classify all of these as operations and maintenance problems. Even so, looking at the failures above from an operations point of view, I have to say the official conclusions are not professional enough, and I hope they are not the internal conclusions as well.
1. NetEase said that an attack on the backbone network affected its business, yet that day it seems only NetEase's services were affected.
2. A four-hour outage from a cut fiber. For a core business, the first principle is to restore service. Even if Alipay did not have an active-active deployment, it must have a backup center, so why was all traffic not switched over to it? Something must have gone wrong internally. Still, Alibaba is impressive: it can turn a negative into a positive. They turned "5.27" into a technical support day, with plenty of fanfare.
3. The Ctrip incident. I wrote about it earlier in [Ctrip event: depth analysis of operation and maintenance debt and solutions], so I will not go into details here.
4. Toutiao and its 500 internal server errors. This news could have made the headlines by itself, yet there has been no formal explanation. For a 500 error, the recovery time was rather long, since 500 errors are easy to locate, and my suspicion is Database >