Yes! Area 53, we're finally going there. A super secret projects facility located in the middle of nowhere in the Nevada desert, it's an area also known as Groom Lake. It has facilities built inside the mountains to hide the alien technologies. It is so top secret that even the President of the United States does not have the level of security clearance to access the site. It's what's called "above top secret", and it is classified as a black project. "For eyes only". You cannot go within 10 miles of its perimeter without putting yourself at risk of getting shot.
It is a site managed by private military contractors, and it is so elusive that no one has any idea what goes on inside.
The whole facility is so compartmentalized and distributed in nature that the people who work there are each delegated only a small section of the work, so one worker doesn't know what the other one is working on, even though they might be working on the same technology. There are NDAs in place to keep the people who work there from talking about their job to each other, or to anyone. There are specially designed, blacked-out 747 planes just to transport the people who work there.
This place is highly compartmentalized, super elusive, and hard for people to understand.
Oh, wait.... I think I might have got this mixed up with Area 51. Obviously there is no such thing as Area 53. What I meant to discuss today is actually something called Route 53.
Don't worry, it's not a national highway. It is simply a DNS service provided by AWS. Fun fact: it's called Route 53 because the standard port for DNS is 53.
Now let's hold on for a second. Before I understand how Route 53 functions in AWS, I'd like to know what in the world DNS is. What does it do, and how does it function?
I guess that's the rabbit hole we are going down today, and not Area 51. Sorry if I got your hopes up. But do bear with me as we drill down into this subject; it might not be very different in the way it functions compared to our Area 51 analogy.
People say DNS, simply put, is the Domain Name System and its job is simply to translate domain names. That is true, but it is an extremely high-level abstraction.
Every website we access has a name, in an internet-specific format. It might start with "https://", a prefix your browser adds automatically most of the time. It means the site is secured and encrypted and has an SSL/TLS security certificate (this is a good subject to cover in the future). However, without deviating for now, let's just focus on what comes after that.
After https:// you will mostly get to see
- www (World Wide Web)
- google (the name)
- .com (the root, more formally called the top-level domain or TLD)
The last one, which is .com here, can in other cases be .net, .org, .gov, etc., but this post still treats it as the root.
So, put together, google.com. is considered to be a domain name, which consists of two parts:
1. A name and
2. The root.
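If it helps to see those parts pulled apart, here is a tiny sketch. The function name `split_domain` is my own invention for illustration, not part of any DNS library; it simply drops the trailing period and splits on the dots.

```python
def split_domain(name: str) -> list[str]:
    """Split a domain name into its dot-separated parts, left to right."""
    # A trailing period is allowed in a domain name; drop it before splitting.
    # Domain names are case-insensitive, so normalise to lower case.
    return name.rstrip(".").lower().split(".")


parts = split_domain("www.google.com.")
print(parts)      # ['www', 'google', 'com']
print(parts[-1])  # com  (the last part, e.g. com, net, org)
print(parts[-2])  # google  (the name)
```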
But what does DNS have to do with it?
Each and every website that you type into your browser directs you to a server or a set of servers. A server is nothing more than a souped-up computer, and its primary job is to serve you, for instance by providing a service where you can look up anything on the internet.
In order for these servers to connect to the internet and serve you via the internet, they need a public IP address. These IP addresses come in two different formats, IPv4 and IPv6. IP stands for Internet Protocol. These are unique numbers through which all the computers communicate with each other.
It might seem like I might be going a bit off topic, but I assure you if you stay along it will all fall into place. At least I hope so.
So google.com, when entered in a browser, is just a name, but for your computer it doesn't mean much, because it does not understand words, only binary. So when you put the domain google.com into your browser, your computer gets some help to translate it into an IP address, which by the way for google.com is 173.194.216.101. It is a public IPv4 address that belongs to one of many google.com servers.
Go ahead and paste that number in your browser instead of google.com and see what you get.
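To make the "only binary" point concrete, here is a small sketch using Python's standard `ipaddress` module. It shows the same address the way a computer holds it: one 32-bit number. The variable names are just for illustration.

```python
import ipaddress

addr = ipaddress.IPv4Address("173.194.216.101")

# An IPv4 address is a single 32-bit number; the dotted form is for humans.
as_bits = format(int(addr), "032b")
dotted_bits = ".".join(as_bits[i:i + 8] for i in range(0, 32, 8))

print(addr.version)  # 4
print(dotted_bits)   # 10101101.11000010.11011000.01100101
```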
You have to remember, irrespective of what you type into your browser, the main goal for you is to establish a connection with the server (in this case google.com's server) so it can render the services you require.
Something is doing this translation for our computer, changing words into numbers. What's doing this?
Additionally, why do we need it to be translated? Why can't we simply use the IPv4 address and paste it into our browsers instead of typing words?
Well, for starters, we as humans are not very good at remembering numbers. That's why we maintain a phonebook and assign a name to each phone number. You could say our minds function quite the opposite of how computers function. We need a name or an image to remember things, not numbers.
That is exactly what DNS does. It provides you with the underlying number (IP address) when you type in the name google.com. It looks up the information for you in the back end.
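You can ask for this lookup yourself. The sketch below uses `socket.getaddrinfo` from Python's standard library, which hands the name to whatever resolver your operating system is configured to use. The wrapper name `resolve` is my own, and it only asks for IPv4 addresses to keep things simple.

```python
import socket


def resolve(name: str) -> list[str]:
    """Return the IPv4 addresses the operating system finds for a name."""
    infos = socket.getaddrinfo(
        name, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each result is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is (ip, port), so take the ip and de-duplicate.
    return sorted({info[4][0] for info in infos})
```

On a machine with internet access, calling `resolve("google.com")` should give back one or more addresses. Which ones you get can differ by location and time, so don't expect the exact number quoted above.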
More importantly, DNS serves as a security layer. Let's say you'd like to visit a popular site you use frequently, like your bank's website, chase.com for instance. If the domain name chase.com were freely available for anyone to claim, I as a hacker could create an exact replica of the site, use the same domain, and map it to my own hosted services by assigning my IP. You log in, thinking you're logging into your account, but actually it's routing to a fake site created by me for the sole purpose of stealing your information. "Phishing" scams are not that different, but with phishing, scammers can only imitate the domain name and cannot use the actual domain. HTTPS (with its SSL/TLS certificate), DNS delegation, and decentralization all aim to solve these potential issues.
That's all good to know. But "how" exactly does DNS function? Remember, this is a rabbit hole we want to go down after all. So let's dig.
There are about 4.3 billion possible IPv4 addresses (not all of them usable as public addresses) and around 340 undecillion, roughly 3.4 x 10^38, IPv6 addresses.
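Those numbers come straight from the address sizes: an IPv4 address is 32 bits long and an IPv6 address is 128 bits long. A quick check:

```python
# Every extra bit doubles the number of possible addresses.
ipv4_total = 2 ** 32
ipv6_total = 2 ** 128

print(f"{ipv4_total:,}")    # 4,294,967,296
print(f"{ipv6_total:.2e}")  # 3.40e+38
```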
It soon became very apparent to some very intelligent people at the top, who understood the complexities of the internet, that this was going to turn out to be one GIANT phonebook. But this is not just any phonebook. This is something that will be used by everybody in the world, so it needs to be managed in a very secure way.
Meaning, the entire phonebook had to be broken down into pieces, and they had to establish a root-level system where the job of translating a particular website is delegated from top to bottom.
This means:
- No single entity manages the entire phonebook
- There is less risk of any one central organization tampering with the records
- It's distributed in nature
- The tasks are delegated from one server to another (top down)
Let's put this into perspective and see how this actually works. Let's say I would like to go to a website called www.example.com.
Now notice that period after the com. I put it there deliberately. In .com. the final period represents the root, and your browser is smart enough to add it for you without you ever having to type it.
The .com. is the starting point of this entire cycle. When I type in www.example.com, the first thing my computer does is send the question to a resolver server. The resolver, in turn, starts from its Root Hints file. What my computer really wants in response is a record from something called a "Zone File". Simply put, a zone file is like a page of the phone book, and the record we are after pairs the name www.example.com with its corresponding IP address.
Don't worry about the root hints file too much for now, but just remember it has 13 sets of data in it: the names and addresses of the 13 root servers, which are run by 12 different organizations.
(The addresses of these servers are exactly what the Resolver Server has on hand at the beginning in the form of the Root Hints file. They give the Resolver Server a starting point and send it in the right direction.)
But it is important to keep in mind that these 12 organizations only manage the servers; they do not manage what's in those servers. The management of the data inside these servers is handled by an organization called IANA, the Internet Assigned Numbers Authority. IANA delegates the management of each TLD to a TLD Manager. There is an entire list of TLD Managers in IANA's Root Zone Database.
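About those 13 sets of data in the root hints file: the root servers are named after the letters a through m, under root-servers.net. A short sketch that lists the names (the real file also carries an IPv4 and IPv6 address for each, which I am leaving out here):

```python
import string

# The 13 root server names run from a.root-servers.net to m.root-servers.net.
ROOT_SERVERS = [
    f"{letter}.root-servers.net" for letter in string.ascii_lowercase[:13]
]

print(len(ROOT_SERVERS))  # 13
print(ROOT_SERVERS[0])    # a.root-servers.net
print(ROOT_SERVERS[-1])   # m.root-servers.net
```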
TLD simply stands for Top Level Domain, such as .com, .net, .org, etc. For example, the .com TLD is managed by Verisign Global Registry Services. That's just one; there are hundreds of TLDs with different managers.
Now, back to our Resolver Server: when it goes to the Root Servers, it asks for the best way to get to example.com. The response it gets in return is to go to this other dude because he knows all about .com, basically pointing it to a Root Zone Server (more commonly called a TLD name server; in this case it is managed by Verisign Global Registry Services).
Back on the mission, our resolver server then reaches out to the Root Zone Server asking the same question: where can I find the zone file for example.com? Now finally, the Root Zone Server responds, "Oh, this guy. I know exactly where he lives." This is when the Root Zone Server points our Resolver Server in the direction of a Name Server. The Name Server knows exactly what our resolver server needs, and it hands over the record from the zone file it's looking for.
So let's trace this step by step:
- Our computer wants to visit example.com, so it sends the question to the resolver server
- The resolver server is now on a mission and knows exactly where to start with the help of its root hints file
- It reaches the Root Servers and asks about the name. Depending on the root in the name (.com here), it gets directed towards that specific Root Zone Server
- The Root Zone Server looks at the name under .com. and directs our resolver server to a Name Server
- The Name Server recognizes example.com and hands over the record from its zone file to our resolver server
- The resolver server then returns to the client, which is our computer, with the answer
- Your computer, with the help of that answer, is now able to connect to your favorite website, example.com
By the way, this all happens in a split second.
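The steps above can be sketched as a toy simulation. Nothing here talks to the real internet: the three dictionaries are made-up stand-ins for the Root Servers, the server that knows all about .com, and the Name Server. The server names are hypothetical, and 192.0.2.10 is an address from a range reserved for documentation, not the real address of example.com.

```python
# Hypothetical stand-ins for the three layers. Each layer only knows
# who to ask next, except the last one, which holds the actual record.
ROOT_SERVER = {"com": "com-tld-server"}
TLD_SERVERS = {"com-tld-server": {"example.com": "example-name-server"}}
NAME_SERVERS = {"example-name-server": {"www.example.com": "192.0.2.10"}}


def walk_the_tree(name: str) -> tuple[str, list[str]]:
    """Resolve a name top down and return (ip, list of steps taken)."""
    name = name.rstrip(".").lower()
    labels = name.split(".")
    tld = labels[-1]                 # e.g. "com"
    domain = ".".join(labels[-2:])   # e.g. "example.com"
    steps = []

    # Step 1: the root only knows who handles .com
    tld_server = ROOT_SERVER[tld]
    steps.append(f"root server: ask {tld_server} about .{tld}")

    # Step 2: the .com server only knows which name server has example.com
    name_server = TLD_SERVERS[tld_server][domain]
    steps.append(f"{tld_server}: ask {name_server} about {domain}")

    # Step 3: the name server holds the record itself
    ip = NAME_SERVERS[name_server][name]
    steps.append(f"{name_server}: {name} is at {ip}")
    return ip, steps


ip, steps = walk_the_tree("www.example.com.")
for step in steps:
    print(step)
```

Notice that no single dictionary can answer the question alone, which is the whole point of the delegation.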
One question, though: how did the Root Zone Server know exactly which Name Server was holding the zone file our resolver server was looking for?
That's because the Root Zone Server belongs to what we call a "Register" (the standard term is registry). The server that holds the zone file belongs to what is known as the "Registrar".
If you have ever registered a domain using godaddy.com or any other domain name provider, you did it via a registrar. That's what godaddy.com is. They have a tie-up with the domain registers, and when you buy a domain, godaddy.com first checks with the Register whether that name is available. If it is, then as a registrar it is obligated to update the Register with the information for the newly created domain. To put it simply, it is basically saying, "Hey, if someone ever comes looking for this domain name, send them over to me."
I hope this makes a bit of sense. One key thing to keep in mind is that at each point of interaction, the resolver server treats that point of contact as an authoritative server, and each authoritative server delegates to the one below it because it doesn't know all the details; it only knows the right direction to point our resolver server in. Let's put this into perspective with the help of a simple diagram.
This whole process is also known as walking the tree, or more of an upside-down tree, because the information is always delegated by the root servers from top to bottom (the top being the root part). It is also good to know that the Resolver Server resides within your ISP (Internet Service Provider) most of the time, but sometimes it might also be configured on your computer itself by the operating system.
Another important thing: this whole DNS resolution process doesn't happen every time in real-life scenarios. Most of the time, your ISP keeps the answers for the most popular (most visited) websites already cached on its resolver server, which significantly reduces DNS traffic. Even the browser itself caches the answers for a certain amount of time. The browser doesn't cache them for long, though, because unlike some of the high-traffic websites, some of the smaller ones usually do not have a static IP, which could make the cached answer invalid. (We will cover the DNS routing feature in Part 2.)
So, I'll wrap this up for now.
DNS resolution is not as simple as a phonebook translation, though it does that job. It's a highly distributed system where no one server has the complete answer, because no one server has all the information. It is also extremely secure due to its distributed nature.
Now it's beginning to sound a bit more like Area 51...
In Part 2, let's dive into Route 53 and see exactly how it functions and where it really fits into this whole DNS walking-the-tree cycle. I'll give you a hint: it does a lot more than what a typical DNS service does.
I hope you found this post useful. Feel free to follow and comment below. I hope to see you in the next one.
Later.