Facebook botches a BGP update and takes its DNS down with it
It seems that at around 17:30 Spanish time, the Facebook team made a major change to its BGP configuration that took the company's DNS records down with it. As a result, the Facebook, WhatsApp and Instagram domains have completely disappeared from the Internet. If you try to access any of these services, they will not load, because clients do not know how to reach the servers.
The Internet is made up of thousands of autonomous systems (AS), which use the BGP protocol to communicate with one another and exchange routes. When we connect to Facebook, the first thing we must do is query the DNS servers to obtain the public IP address we need to reach. The packets are then routed from the origin of the connection (us) to the destination through several intermediate routers, each of which must have the routes needed to forward us towards the destination: the Facebook servers.
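The routing step described above can be sketched as a longest-prefix match over a routing table. This is only an illustrative toy: the table, the prefixes (taken from documentation address ranges) and the next-hop names are hypothetical, and real routers hold routes learned via BGP rather than a hard-coded dictionary.

```python
import ipaddress

# Hypothetical routing table: prefix -> next hop (documentation ranges, not real routes).
ROUTES = {
    "0.0.0.0/0": "isp-gateway",
    "203.0.113.0/24": "peer-a",
    "203.0.113.128/25": "peer-b",
}

def next_hop(routes, destination):
    """Return the next hop of the most specific prefix covering destination, or None."""
    addr = ipaddress.ip_address(destination)
    best = None  # (network, hop) of the longest matching prefix found so far
    for prefix, hop in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None
```

With this table, `next_hop(ROUTES, "203.0.113.200")` picks the /25 entry over the /24 and the default route. With an empty table, the function returns `None`: the router has no route towards the destination at all.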
Although the servers themselves are still up and running without problems, Facebook relies on its own DNS servers to reach its services, and since that DNS is not working, nobody can reach the destination. If we run an nslookup from our Internet connection, the DNS server immediately tells us that the Facebook domain, or any related domain, cannot be found.
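The same check that nslookup performs can be approximated from Python's standard library. The sketch below is a minimal illustration, not a diagnostic tool: the `resolve` helper and its `lookup` parameter are names introduced here, and it uses whatever resolver the operating system is configured with.

```python
import socket

def resolve(hostname, lookup=socket.getaddrinfo):
    """Return the sorted IP addresses for hostname, or [] if the name cannot be resolved."""
    try:
        results = lookup(hostname, None)
    except socket.gaierror:
        # The resolver could not find the name (or could not be reached).
        return []
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in results})
```

While a domain's records are unavailable, a call such as `resolve("facebook.com")` would be expected to return an empty list, which is the programmatic counterpart of nslookup reporting that the domain cannot be found.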
The failure that has brought down the entire Facebook platform is a bad BGP update, which also makes it impossible to access these systems remotely to fix the problem straight away: when a change is made to BGP, it propagates quickly to all the other routers involved. Facebook staff have been in the datacentres for hours trying to solve the problem physically. However, the people with the knowledge needed to authenticate to the systems and make the changes are working remotely from home, and of course they cannot access Facebook remotely to fix it.
What has happened is similar to configuring the firewall of a remote server over SSH and locking ourselves out by mistake. In this case, once BGP was updated and the new routes propagated, there was simply no longer a “path” to these devices, so the changes cannot be rolled back because connectivity has been lost.
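The lockout can be illustrated with a toy model in which reachability depends only on which prefixes a network announces. Everything here is hypothetical (the prefixes, the addresses and the `withdraw_all` helper) and says nothing about Facebook's actual configuration; it only shows why the management path disappears together with the services.

```python
import ipaddress

def reachable(announced, address):
    """True if at least one announced prefix covers address."""
    addr = ipaddress.ip_address(address)
    return any(addr in ipaddress.ip_network(prefix) for prefix in announced)

def withdraw_all(announced):
    """Model a faulty update that withdraws every announced prefix."""
    return set()

# Hypothetical prefixes and hosts, taken from documentation address ranges.
ANNOUNCED = {"192.0.2.0/24", "198.51.100.0/24"}
DNS_SERVER = "192.0.2.53"      # serves the public domains
MGMT_HOST = "198.51.100.10"    # used by engineers for remote administration

before = (reachable(ANNOUNCED, DNS_SERVER), reachable(ANNOUNCED, MGMT_HOST))
after_update = withdraw_all(ANNOUNCED)
after = (reachable(after_update, DNS_SERVER), reachable(after_update, MGMT_HOST))
```

In this model both hosts are reachable before the update and neither is afterwards, so the very connection needed to undo the change is gone as well.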
Facebook uses its own DNS for absolutely everything: WhatsApp, VoIP calls, Facebook’s internal email and so on. Therefore, if the DNS goes down, the means of fixing it remotely go down too. Because Facebook has very tight security to prevent attacks, and even to prevent its own employees from making critical changes, only a few people have the knowledge and credentials needed to get in and fix it.
What if it really was an attack?
It is being said on the Internet that the Anonymous group has attacked Facebook. If an attack had seriously compromised the company’s infrastructure, the most logical response would be to cut all communications at the root, which is precisely the effect of a BGP update that withdraws Facebook’s routes from routers around the world. For a company the size of Facebook, whose staff have years of experience and are among the best in the world in their field, it is quite strange to get a BGP update so wrong that all communication with the outside world is lost, unless there was a good reason for it: a major hack.
Other services have problems too
Other services such as Google and Telegram are also having some stability issues, and they may be the next to fall. Right now these services are not working entirely correctly: for example, they do not let you download or upload photos, and browsing the Internet with Google sometimes returns an error. If you have a smartphone, it may well tell you that you are connected to a WiFi network without Internet access. This is because phones verify the Internet connection by contacting Google servers, and these appear to be down or not working well, so the phone incorrectly reports that there is no connection.
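A connectivity check of this kind can be sketched as a small HTTP probe. The version below assumes the probe URL that Android devices are commonly reported to use, which is expected to answer with an empty HTTP 204 response; the URL, the `fetch_status` helper and the `has_internet` function are assumptions made for illustration, not a description of how any particular phone implements the check.

```python
import urllib.request

# Assumed probe endpoint; expected to answer with HTTP 204 and no body.
PROBE_URL = "http://connectivitycheck.gstatic.com/generate_204"

def fetch_status(url, timeout=5):
    """Return the HTTP status code obtained when requesting url."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status

def has_internet(fetch=fetch_status, url=PROBE_URL):
    """True only if the probe answers with the expected 204 status."""
    try:
        return fetch(url) == 204
    except OSError:
        # DNS failures, timeouts and HTTP errors all surface as OSError subclasses.
        return False
```

If the probe server is slow or unreachable, `has_internet()` returns `False` even though the rest of the Internet may be working, which matches the misleading "no Internet" warning described above.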
The reason for these failures is that people are trying to reach Facebook’s domains in an “avalanche” and continuously. The DNS servers cannot resolve these domains, so they become overloaded with requests, and for this reason it sometimes seems that other services have gone down too when they have not.