Ask HN: How did the internet discover my subdomain?
I have a domain that is not live. As expected, loading the domain returns: Error 1016.
However... I have a subdomain with a non-obvious name, like: userfileupload.sampledomain.com
This subdomain IS LIVE but has NOT been publicized/posted anywhere. It's a custom URL where authenticated users upload media via presigned URLs to my Cloudflare R2 bucket.
I am using CloudFlare for my DNS.
How did the internet find my subdomain? Some sample user agents are: "Expanse, a Palo Alto Networks company, searches across the global IPv4 space multiple times per day to identify customers' presences on the Internet. If you would like to be excluded from our scans, please send IP addresses/domains to: scaninfo@paloaltonetworks.com", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_7; en-us) AppleWebKit/534.20.8 (KHTML, like Gecko) Version/5.1 Safari/534.20.8", "Mozilla/5.0 (Linux; Android 9; Redmi Note 5 Pro) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.89 Mobile Safari/537.36",
The bots' GET requests are failing, as designed, but I'm wondering how the bots even knew the subdomain existed?!
Hi, our company does this basically "as-a-service".
The options for finding it are basically limitless. The best source is probably the Certificate Transparency project, as others suggested. But it does not end there; other things we do include internet crawls, domain bruteforcing against wildcard DNS, dangling vhost identification, default certs on servers (connect to an IP on 443 and grab the default cert), and many others.
Security by obscurity does not work. You can not rely on "people won't find it". Once it's online, everyone can find it. No matter how you hide it.
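To illustrate the Certificate Transparency angle mentioned above, here is a minimal sketch (assuming Python 3 and the third-party requests package) that pulls names for a domain out of crt.sh's public JSON interface; the domain below is a placeholder:

    import requests  # third-party HTTP client, assumed installed

    def ct_subdomains(domain):
        # crt.sh exposes Certificate Transparency search results as JSON;
        # "%.<domain>" matches any certificate name under that domain.
        resp = requests.get(
            "https://crt.sh/",
            params={"q": "%." + domain, "output": "json"},
            timeout=30,
        )
        resp.raise_for_status()
        names = set()
        for entry in resp.json():
            # one certificate can list several names, newline-separated
            for name in entry.get("name_value", "").splitlines():
                names.add(name.strip().lower())
        return sorted(names)

    print(ct_subdomains("example.com"))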
Hi, former pentester here. If any one of your trusted clients is using a google/chromium based browser, the telemetry from that browser (webdiscovery) would reveal the existence of the subdomain in question. As others have said, security by obscurity doesn't work.
Current pen tester here, and this guy is right. There was a Google blog post years ago where Google planted a site with an unguessable URL, indexed it, and used Internet Explorer (with the Bing toolbar) to surf the site. Shortly after, the site was also listed on Bing.
Google went for a "gotcha" moment, but Microsoft basically responded with "yeah, we didn't steal it from Google, you had telemetry enabled".
Total shitshow
Would love to read this if a link is still around
The Bing Sting!
https://googleblog.blogspot.com/2011/02/microsofts-bing-uses...
https://www.bbc.com/news/technology-12343597
https://news.ycombinator.com/item?id=2165469
https://moz.com/blog/the-bing-sting-facts-why-bing-arent-cop...
Security through obscurity is a tool, not a solution to security.
Use it as the last thing you do, not the first. If I run SSH on, say, port 42531 it will absolutely be found eventually... but 99%+ of automated scans will never see it, which benefits me. And that comes after all the sshd_config and PAM work, patching, and miscellaneous hardening is done first.
That's a worn-out example, and just to make the point (I actually run on 22)... the benefit to me is that most skiddy scanners will never see it, and if it helps me avoid the one actor out there looking to mass-exploit an unpublished 0-day, then, as the last thing I did, it may have bought some extra time, because they're going for 22.
So, to mostly prevent this: disable direct IP access, use wildcard certificates, and don't use guessable subdomains like www or mail.
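As a rough sketch of the "disable direct IP access" idea (not the original poster's setup), here is a toy Python server that answers only when the Host header matches the expected name and returns 404 otherwise; the hostname and port are placeholders:

    from http.server import BaseHTTPRequestHandler, HTTPServer

    EXPECTED_HOST = "userfileupload.sampledomain.com"  # placeholder vhost name

    class VhostOnlyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            host = (self.headers.get("Host") or "").split(":")[0].lower()
            if host != EXPECTED_HOST:
                # Raw-IP or guessed-name request: reveal nothing useful.
                self.send_error(404)
                return
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"hello from the expected vhost\n")

    if __name__ == "__main__":
        HTTPServer(("0.0.0.0", 8080), VhostOnlyHandler).serve_forever()

In practice this check usually lives in the reverse proxy's default/catch-all server block rather than in application code, but the idea is the same.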
security through obscurity just isn't. keep your shiz up to date and use proper access controls!
DANE would help here: register a harmless-sounding domain name that leaks nothing, use DNSSEC and NSEC3, and host your hidden service on a subdomain whose name is a 63-byte string of randomly selected ASCII characters. But this isn't really an option.
Why the DNSSEC, which then requires NSEC3? Shouldn't a wildcard certificate do the job in conjunction with normal unsigned DNS?
Subdomainfinder.com ??
Dozens of others will also find it.
Really, it's this simple today.
Sorry for a bit of self-promo, but just to explain: we run https://reconwave.com/, basically an EASM product, but more focused on the network/DNS/setup level.
Finding all things about domains is one of the things that we do. And yes, it's very easy.
There are many services like subdomainfinder - e.g. dnsdumpster and merklemap. We built our own as well on https://search.reconwave.com/. But it's a side project and it does not pay our bills.
I think your comment resulted in a hug of death for that service ;)
> Security by obscurity does not work. You can not rely on "people won't find it". Once it's online, everyone can find it. No matter how you hide it.
Especially do not name your domains in a way that leaks MNPI (material non-public information)! Like, imagine if publicly traded companies A and B were discussing a merger or acquisition: do not name your domain A-and-B.com, m'kay?
Case in point: When Daimler and Chrysler merged, they had a law firm (with no other ties to either company) register the DaimlerChrysler domains weeks before the merger was made public.
I don’t recall if anybody noticed before they went public, but as this thread shows, today it would be noticed for sure.
One of the earlier seasons of Survivor had the winner leaked because of something similar.
Their website had bios of every player, with a <playername>.jpg headshot. As they were voted out, their headshot was replaced with <playername>_eliminated.jpg.
As soon as someone realized that, they entered in every player's name with _eliminated.jpg. One player had a 404 for that file.
Well, I sure hope the remainder of my URLs are safe.
Like, in: example.com/secret-id-48723487345
I hope the last bit is not leaked somehow (?)
Btw, we need a "falsehoods programmers believe about URLs" ...
Although there is: https://www.netmeister.org/blog/urls.html
> Although there is: https://www.netmeister.org/blog/urls.html
I think the section named "Pathname" is wrong. It describes the path of a URL as if every server were Apache serving static files with its default configuration. It should describe how the path is converted into an HTTP request.
For instance, the article states that "all of these go to the same place: https://example.org https://example.org/ https://example.org// https://example.org//////////////////". That's wrong. A web client sends a distinct HTTP request for each case, e.g. starting with `GET // HTTP/1.1`, so the server will receive distinct paths. The assertion that they "go to the same place" makes no sense in the general case.
That’s all absolutely true, but I have found that wildcard DNS zones with wildcard certificates tend to get zero un-solicited traffic as long as the client devices are not browsers.
I.e.: if the host is listening only to some specific host header but registered with a wildcard prefix, then drive-by attackers have no trivial way to guess the prefix.
I would never rely on this for security, but it does help cut down on the “spam” in the request logs so that I can focus on the real errors.
This works best for API endpoints not used by browsers or embedded into web pages.
It’s also my current preferred setup for Internet-facing non-production sites. Otherwise they get so much attack traffic that the real log entries might be less than 0.1% of the total.
[flagged]
Irrespective of whether they are proud of what they are doing, I found the post helpful and educational. Let's not prevent people from sharing their knowledge, as it might help us protect ourselves. A consequence of such a line of questioning would be that in the future they'd be hesitant to share their knowledge to avoid being judged.
Why would enumerating a wildcard dns through brute force be something that evokes pride or shame?
I sadly did not see the comment above, but I'd just like to add that these bruteforce and sniffing methods are targeted only at our paying customers.
We built a global reverse-DNS dataset solely from cert transparency logs. Our active scanning/bruteforcing runs only against assets owned by our customers.
…as long as your tools are only in your hands to be used, correct? Once a tool is created and used on a machine with access to the greater internet, doesn’t your logic hold that its security is compromised inherently? Not saying you have been infiltrated, or a rogue employee has cleverly exported a copy or the methodology to duplicate it off-site, but I’m not saying that hasn’t happened either.
You can find a dozen projects on Github that do this, it's not sensitive information that needs protecting
It's not that hard to write this code. It's not a nuclear weapon.
[dead]
Given that bad actors can also do this, I'd say that publicly advertising the fact and thereby drawing attention to misconceptions about security is a net good thing.
If you look at the company they founded it's a service to protect yourself. Not to willy-nilly go out into the open web to find hidden subdomains.
I assumed they do it for customers who pay them to determine their security profile.
[flagged]
That's not what that phrase means. That's not even what the word "obscure" means. Obscurity is trying to not draw attention to something, or keep it hidden (as in "nobody knows that it's there", not "you know that it's there but can't access it"). Encryption doesn't obscure data unless you're stretching the definition of the word beyond its useful purpose.
[dead]
[dead]
verb: keep from being seen; conceal.
In what way is what he’s describing not obscurity?
Two points:
1. Encrypted data is not hidden. You still know that there is data, it's just in a form that you can't understand. Just as difficult higher-level math isn't "obscured" from a non-mathematician (who knows that it is math, but can't decode it), encrypted data is not obscured.
2. You could make the argument that the data is actually hidden, but the fact that data is there is not hidden. This is pointless pedantry, though. It is both contrary to the way that everybody uses the word and stretches the meaning of the word to the point that it's not useful. There is a common understanding of what "Security through obscurity" means ( https://en.wikipedia.org/wiki/Security_through_obscurity ) and interpreting it far beyond that is not useful. It simply breaks down communication into annoying semantic arguments. I enjoy semantic arguments, but not tedious, pedantic ones where one person just argues that a word isn't what everybody understands it to mean.
More specifically, it's about WHAT is being obscured. "Security through obscurity" is about trying to be secure by keeping the details or mechanisms of a system secret, not the data itself.
Running your SSH server on port 8822 is security through obscurity.
Port knocking isn't, I don't think.
Yes that is what the word obscure means.
But the phrase “security through obscurity” is an industry term that refers to keeping things secure purely by not letting people know they exist.
In contrast with encryption, where I can tell you exactly where the encrypted data is, but you can’t access it.
Security through obscurity is hiding a bicycle in a bush and hoping no one notices it, encryption is more like locking it to a bike rack with a very good lock.
[dead]
In every way, because context matters, and the original commenter intentionally recontextualized it just to be contrarian.
[dead]
It is about the existence or the methodology being obscured, not the contents of an encrypted message. The point of that phrase is to contrast one type of security for another. You and I can know exactly what tool was used to encrypt something, and all the mathematics behind it, but still fail to decrypt it without the requisite private key.
You wouldn’t call a room behind a locked door “obscured.” Even if it’s technically correct in the most stretched definition (which I’m not convinced of), either way it’s not how people actually use the word.
This was explained in the third sentence of the post that you're responding to
"Security through obscurity" can definitely be defined in a meaningful way.
The opposite of "bad security through obscurity" is using completely public and standard mechanisms/protocols/algorithms such as TLS, PGP or pin tumbler locks. The security then comes from the keys and other secrets, which are chosen from the space permitted by the mechanism with sufficient entropy or other desirable properties.
The line is drawn between obscuring the mechanism, which is designed to have measurable security properties (cryptographic strength, enumeration prevention, lock security pins), and obscuring the keys that are essentially just random hidden information.
Obscuring the mechanism provides some security as well, sure, but a public mechanism can be publicly verified to provide security based only on secret keys.
If we are going to go down this road, I want to call it occult security, because it sounds much more sexy, and it's more accurate: you are casting spells and incantations to hide things from the world.
Semantics. Considering this is your first comment ever and your account was made an hour ago I'll assume this is ragebait
encryption obfuscates data, as in the data is completely illegible unless you have the proper keys
> To make so confused or opaque as to be difficult to perceive or understand
https://www.thefreedictionary.com/obfuscate
obscuring data is different, it’s about hiding it from view or minimising the likelihood of it being found.
> To make dim, indistinct, or impossible to see
https://www.thefreedictionary.com/obscure
they are two wholly different actions.
—
> Tiered access controls obscure who can do what in the system.
i’ve seen plenty of examples where an access control system explicitly says what role/tier is required. access control is for “trust” management (who do we trust with what).
This is the most confidently incorrect post I've seen in a long time.
seriously.
Actually it’s quite correct.
[dead]
> “Security through obscurity” is the only security there is.
> Encryption obscures data.
I don't think you understand what "security through obscurity" means. What encryption does is literally the opposite of obscure, in this context. It is out in the open and documented. And the same with the rest of your examples.
[dead]
Actually it's just too short. To be complete, it would have to be something like "security through obscurity _OF THE MECHANISM_."
Which basically means it was always a shit saying, like most fancy quips were.
"Security by obscurity does not work"
This is one of those false voyeur OS internet tenets designed to get people to publish their stuff.
Obscurity is a fine strategy, if you don't post your source that's good. If you post your source, that's a risk.
The fact that you can't rely on that security measure is just a basic security tenet that applies to everything: don't rely on a single security measure, use redundant barriers.
Truth is we don't know how the subdomain got leaked. Subdomains can be passwords and a well crafted subdomain should not leak, if it leaks there is a reason.
> Subdomains can be passwords and a well crafted subdomain should not leak,
I disagree. A subdomain is not secret in any way. There are many ways in which it is transmitted unencrypted. A couple:
- DNS resolution (multiple resolvers and authoritative servers)
- TLS SNI
- HTTP Host header
There are many middle boxes that could perform safety checks on behalf of the client, and drop it into a list to be rescanned.
- Virus scanners
- Firewalls
- Proxies
I once worked for a company which was using a subdomain of an internal development domain to do some completely internal security research on our own products. The entire domain got flagged in Safe Browsing despite never being exposed to the outside world. We think Chrome's telemetry flagged it, and since it was technically routable as a public IP (all public traffic on that IP was blackholed), Chrome thought it was a public website.
I saw a similar thing happen with a QA team's domains. Google flagged them as malicious and the company never managed to get them unflagged.
Our lawyers knew their lawyers so there was a friendly chat and we got added to an internal whitelist within Google.
>It's not encrypted in transit
Agree.
But who said that all passwords or shibboleths should be encrypted in transit?
It can serve as a canary for someone snooping your traffic. Even if you encrypt it, you don't want people snooping.
To date, for the subdomains I never publish, I haven't had anyone attempt to connect to them.
It's one of those redundant measures.
And it's also one of those risks that you take, you can maximize security by staying at home all day, but going out to take the trash is a calculated risk that you must take or risk overfocusing on security.
It's similar to port knocking. If you are encrypting it, it's counterproductive, it's a low effort finishing touch, like a nice knot.
Truth is we don't know that the subdomain got leaked. The example user agent they give says that the methodology they're using is to scan the IPv4 space, which is a great example of why security through obscurity doesn't work here: The IPv4 space is tiny and trivial to scan. If your server has an IPv4 address it's not obscure, you should assume it's publicly reachable and plan accordingly.
> Subdomains can be passwords and a well crafted subdomain should not leak, if it leaks there is a reason.
The problem with this theory is that DNS was never designed to be secret and private and even after DNS over HTTPS it's still not designed to be private for the servers. This means that getting to "well crafted" is an incredibly difficult task with hundreds of possible failure modes which need constant maintenance and attention—not only is it complicated to get right the first time, you have to reconfigure away the failure modes on every device or even on every use of the "password".
Here are just a few failure modes I can think of off the top of my head. Yes, these have mitigations, but it's a game of whack-a-mole and you really don't want to try it:
* Certificate transparency logs, as mentioned.
* A user of your "password" forgets that they didn't configure DNS over HTTPS on a new device and leaves a trail of logs through a dozen recursive DNS servers and ISPs.
* A user has DNS over HTTPS but doesn't point it at a server within your control. One foreign server having the password is better than dozens and their ISPs, but you don't have any control over that default DNS server nor how many different servers your clients will attempt to use.
* Browser history.
Just don't. Work with the grain, assume the subdomain is public and secure your site accordingly.
> The IPv4 space is tiny and trivial to scan
Something many people don't expect is that the IPv6 space is also tiny and trivial to scan, if you follow certain patterns.
For example, many server hosts give you a /48 or /64 subnet, and your server is at your prefix::1 by default. If they have a /24 and they give you a /48, someone only has to scan 2^24 addresses at that host to find all the ones using prefix::1.
Assuming everyone is using a /48 and binding to prefix::1, that's a 2^16 difference from scanning the IPv4 address space. Assuming a specific host with only one IPv6 /24 block and delegating /64s, it's a 2^8 difference. Scanning for every /64 across the entire IPv6 space is definitely not as tiny.
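Back-of-the-envelope numbers for the cases above, as a quick sketch (nothing more than arithmetic):

    IPV4 = 2 ** 32                    # whole IPv4 address space
    one_host_48 = 2 ** (48 - 24)      # one provider /24, customers get a /48, server at prefix::1
    one_host_64 = 2 ** (64 - 24)      # same provider /24, customers get a /64 instead
    all_48s = 2 ** 48                 # every possible /48 prefix, probing ::1 in each

    print(one_host_48)                # 16777216 addresses, i.e. 2^24 probes
    print(one_host_64 // IPV4)        # 256, i.e. 2^8 times the IPv4 space
    print(all_48s // IPV4)            # 65536, i.e. 2^16 times the IPv4 space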
AWS only allows routing a /80 to EC2 instances, which makes a huge difference.
It doesn't mean that we should rely on obscurity, but the entire space is not as tiny as IPv4's.
Interesting, so you may see the IPv6 space as a tree and go just for the first addresses of each block.
But if you just choose a random address, you would enjoy a bit more immunity from brute-force scanners here.
IPv6 address space may be trivial from this perspective, but imagine trying to establish two-way contact with a user on a smartphone on a mobile network. Or a user whose Interface ID (64 bits) is regenerated randomly every few hours.
Just try leaving a User Talk page message on Wikipedia, and good luck if the editor even notices, or anyone finds that talk page again, before the MediaWiki privacy measures are implemented.
> Obscurity is a fine strategy
> Subdomains can be passwords and a well crafted subdomain should not leak
Your comment is really odd to read; I'm not sure I understand you, but I'm sure you don't mean it like that. Just to re-iterate the important points:
1. Do not rely on subdomains for security, subdomains can easily leak in innumerable ways including in ways outside of your control.
2. Security by obscurity must never be relied on for security but can be part of a larger defense in depth strategy.
---
https://cwe.mitre.org/data/definitions/656.html
> This reliance on "security through obscurity" can produce resultant weaknesses if an attacker is able to reverse engineer the inner workings of the mechanism. Note that obscurity can be one small part of defense in depth, since it can create more work for an attacker; however, it is a significant risk if used as the primary means of protection.
It's a pretty weak CWE category.
"The product uses a protection mechanism whose strength depends heavily on its obscurity, such that knowledge of its algorithms or key data is sufficient to defeat the mechanism."
If you can defeat the mechanism, that's not very impactful when it's only one stage of a multi-round mechanism. Especially if breaching or crossing that perimeter alerts the admin!
Lots of uncreative blue teamers here
This is the worst take...
People consistently misuse the Swiss cheese security metaphor to justify putting multiple ineffective security barriers in place.
The holes in the cheese are supposed to represent unknown or very difficult to exploit flaws in your security layers, and that's why you ideally want multiple layers.
You can't just stack up multiple known to be broken layers and call something secure. The extra layers are inconvenient to users and readily bypassed by attackers by simply tackling them one at a time.
Security by obscurity is one such layer.
I've heard that Swiss cheese analogy when it comes to the seasoning on a cast iron pan.
Even if you have tons and tons of layers of seasoning, you still don't put tomato sauce or whatever on it.
So according to you, a picket fence or a wire fence is just a useless thing that makes things less usable by users?
Security does not consist only of 100% or 99.99% effective mechanisms; there needs to be a flow of information and an inherent risk. If you are only designing absolute barriers, then you are rarely considering the actual surface of relevant user interactions. A life form consisting only of skin might be very secure, but it's practically useless.
> "Security by obscurity does not work"
The saying is "security by obscurity is not security" which is absolutely true.
If your security relies on the attacker not finding it or not knowing how it works, it's not actually secure.
Obscurity has its own value, of course. I strongly recommend running any service that's likely to be scanned for regularly on a non-standard port wherever practical, simply to reduce the number of connection logs you need to sort through. Obscurity works for what it actually offers. That has nothing to do with security, though, and unfortunately it's hard in cases where a human is likely to want to type in your service address, because most user-facing services have little to no support for SRV records.
Two of the few services that do have widespread SRV support are SIP VoIP and Minecraft, and coincidentally the former is my day job while I've also run a personal Minecraft server for over a decade. I can say that the couple of systems I still have running public-facing SIP on port 5060 get scanned tens of thousands of times per hour while the ones running on non-standard ports get maybe one or two activations of fail2ban a month. Likewise my Minecraft server has never seen a single probe from anyone other than an actual player.
>"If your security relies on "
Again, if your security relies on any one thing, it's a problem. A secure system needs redundant mechanisms.
Can you think of a single mechanism that if implemented would make a system secure? I think not.
Sure, a 12 gauge slug right through the processor.
Good measure, but you may also want to keep some unslugged processors in case you need to counterattack.
Q.E.D
It's become an anti-cliche. Security via obscure technique is a valid security layer in the exact same way a physical lock tumbler will not unlock when any random key is inserted and twisted. It's not great but it's not terrible and it does a fine job until someone picks or breaks it open.
I don’t think that analogy works well, a subdomain that is not published is more like hiding the key to the front door in the garden somewhere… does a fine job of keeping the house secure until someone finds it…
Terrible analogy.
Why not use letters and packages which is the literal metaphor these services were built on?
It's like relying on public header information to determine whether an incoming letter or package is legitimate.
If it says: To "Name LastName" or "Company", then it's probably legitimate. Of course it's no guarantee, but it filters the bulk of Nigerian Prince spam.
It gets you past the junk box, but you don't have to trust it with your life.
Nuance.
Keeping a key secret is not security by obscurity, but keeping the existence of a lock secret is.
So many thoughts on that, but from my perspective - obscurity is ok, but you can not depend on it at all.
Great example is port knocking - it hides your open port from random nmap, but would you leave it as the only mechanism preventing people getting to your server? No. So does it make sense to have it? Well maybe, it's a layer.
Kerckhoffs' principle comes to my mind as well here.
So while I agree with you that obscurity is a fine strategy, you can never depend on it.
>obscurity is a fine strategy, you can never depend on it
Right, I'm arguing that this is a property of all security mechanisms. You can never depend on a single security mechanism. Obscurity is no different. You cannot depend only on encryption, you cannot depend only on air gaps, you cannot depend only on obscurity, you cannot depend only on firewalls, you cannot depend only on user permissions, you cannot depend only on legal deterrents, you cannot depend only on legal threats, etc..
As long as you don't go into "nah, I have another protection barrier, I don't need the best possible security for my main barrier" mode...
Or in other words, if you place absolutely zero trust in it, consider it as good as broken by every single script kid, and publicly known, then yeah, it's fine.
But then, why are you investing time into it? Almost everybody that makes low-security barriers is relying on it.
> "Security by obscurity does not work"
Depends on the context and exposure. Sometimes a key under a rock is perfectly fine.
I used to work for a security company that REALLY oversold security risks to sell products.
The idea that someone was going to wardrive through your suburban neighborhood with a networked cluster of GPUs to crack your AES keys and run a MITM attack for web traffic is honestly pretty far fetched unless they are a nation-state actor.
Realistically we get into $3 wrench territory pretty quickly too.
They could also just cut and tip both ends of the Ethernet cable I have running between my house and my outbuilding too. I probably wouldn't notice if I'm asleep.
Metaphor aside, this is a very standard attack surface: you don't need to imagine such a close tap, just imagine that at any point in the multi-node internet an attacker controls a node and snoops the traffic in its role as a relaying router.
With inflation looks like its now a $5 wrench :-)
https://xkcd.com/538/
AmazonBasics is good enough in this case! ;)
Obscurity can be fantastic.
One of my favorite patterns for sending large files around is to drop them in a public blob storage bucket with a type-4 GUID as the name. No consumer needs to authenticate or sign in; they just need to know the resource name. After a period of time the files can be automatically expired to minimize the impact of URL sharing/stealing.
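A minimal sketch of that pattern, assuming boto3 and an S3-compatible bucket; the bucket name is made up, and expiry would be handled by a lifecycle rule configured elsewhere:

    import uuid
    import boto3  # third-party AWS SDK, assumed installed

    s3 = boto3.client("s3")
    BUCKET = "example-public-dropbox"  # hypothetical bucket with public read access

    def share_file(path):
        # Unguessable type-4 GUID prefix; the consumer only needs the URL.
        key = "{}/{}".format(uuid.uuid4(), path.rsplit("/", 1)[-1])
        s3.upload_file(path, BUCKET, key)
        # A lifecycle rule on the bucket (configured separately) expires objects
        # after a few days, limiting the impact of shared or stolen URLs.
        return "https://{}.s3.amazonaws.com/{}".format(BUCKET, key)

    print(share_file("report.pdf"))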
Wouldn't the blob storage host be able to see your obscure file?
I suppose if it's encrypted, no. Like the pastebin service I run, it's encrypted at rest. It doesn't even touch disks, so I mean, that's a decent answer to mine own question.
No, it's a very sensible slogan to keep people from doing a common, bad thing.
Obscurity helps cut down on noise and low effort attacks and scans. It only helps as a security mechanism in that the remaining access/error logs are both fewer and more interesting.
I definitely see its value as a very naive recommendation, to stop someone from literally relying on an algorithmic or low-entropy secret. Literally something you may learn in your first class on security.
However on more advanced levels, a more common error is to ignore the risks of open source and being public. If you don't publish your source code, you are massively safer, period.
I guess your view on the subject depends on whether you think you are ahead of the curve by taking the naive interpretation. It's like investing in the stock market based on your knowledge of supply and demand.
making things obscure and hard to find is indeed a sound choice, as long as it's not the single measure taken. i think people tout this sentence because it's popular to say, without thinking further.
you don't put an unauthenticated thing on a difficult-to-find subdomain and call it secure. but your nicely secured page is more secure if it's also very tedious to find. it's less of a low-hanging fruit.
as you state, there always needs to be a leak. but the dns system is quite leaky, and often sources won't fix it or won't admit it's even broken by design.
strong passwords are also insecure if they leak, so you obscure them from prying eyes, securing them by obscurity.
A lot of the pushback I'm seeing is that people are assuming that you always want to make things more secure. That security is a number that needs to go up, like income or profit, as opposed to numbers that need to go down, like cost and taxes.
The possibility that I'm adding this feature to something that would otherwise have been published on a public domain does not cross people's minds, so it is not thought of as an additional security measure, but as the removal of a security feature.
Similarly, it is assumed that there's either an unauthenticated endpoint or an authentication mechanism behind the subdomain. There may be a simple idempotent server running, such that there is no concern for abuse, but it may be desirable to reduce the code executed by random spearfishing scanners that only have an IP.
This brings me again to the competitive economic take on the subject: people believe that this wisdom nugget they hold, "security by obscurity", is a valuable tenet, and they bet on it and desperately try to find someone to use it on. You can tell when a meme is overvalued because people try to use it on you even when it doesn't fit; it means they are dying to actually apply it.
My bet is that "Security through obscurity" is undervalued, not as a rule or law, or a definite thing, but as a basic correlation: keep a low profile, and you'll be safer. If you want to get more sales, you will need to be a bit more open and transparent and that will expose you to more risk, same if you want transparency for ethical or regulation reasons. You will be less obscure and you will need to compensate with additional security mechanisms.
But it seems evident to me that if you don't publish your shit, you are going to have much less risk, and need to implement less security mechanisms for the same risks as compared to voicing your infrastructure and your business, duh.
> This is one of those false voyeur OS internet tennets designed to get people to publish their stuff.
No it isn’t, it’s a push to get people to login protect whatever they want to keep to themselves.
It’s silly to say informing people that security through obscurity is a weak concept is trying to convince them to publish their stuff.
If security through obscurity didn't provide any benefit then governments wouldn't have built entire frameworks for protecting classified information.
So the only thing protecting classified docs is the public not knowing where they are? That's what security through obscurity is.
No, it's not the only thing, but it is one layer of defense in depth.
No one is saying that obfuscation should be the only layer. Your defense should never hinge on any single protection layer.
So we're all agreeing here. It's ok to hide stuff from sight, but hiding stuff from sight isn't actually security and can't replace at the very least, having password protection.
Depending on one's threat model, any technique can be a secure strategy.
Is my threat model a network of dumb nodes doing automatic port scanning? Tucking a system on an obscure IPv6 address and never sharing the address may work OK. Running some bespoke, unauthenticated SSH-over-Carrier-Pigeon (SoCP) tunnel may be fine. The adversaries in the model are pretty dumb, so intrusion detection is also easy.
But if the threat model includes any well-motivated, intelligent adversary (disgruntled peer, NSA, evil ex-boyfriend), it will probably just annoy them. And as a bonus, for my trouble, it will be harder to maintain going forward.
It's a bit more complex than that as well. You might have attackers of both types and different datapoints that have different security requirements. And these are not necessarily scalars, you may need integrity for one, privacy for the other.
Even when considering hi sophistication attackers, and perhaps especially with regards to them, you may want to leave some breadcrumbs for them to access your info.
If the deep state wants my company's info, they can safely get it by subpoenaing my provider's info, I don't need to worry about them as an attacker for privacy, as they have the access to the information if needed.
If your approach to security is to add cryptography everywhere, make everything as secure as possible, and imagine that you are up against a nation-state adversary (or conversely, to add security until you satisfy a requirement commensurate with your adversary), then you are literally reducing one of the most important design requirements of your system to a single scalar that you attempt to maximize while not compromising other tradeoffs.
A straightforward lack of nuance. It's like having a tax strategy consisting of number go down, or pricing strategy of price go up, or cost strategy of cost go down, or risk strategy of no risk for me, etc...
Obscurity as a single control does not work. That's what the phrase hints at. In combination with other controls, it could be part of an effective defense. Context matters though.
The only thing you're definitely complicating with security by obscurity is getting a clear picture of your own security posture.
There are a number of companies, not just Palo Alto Networks, that perform various different scales of scans of the entire IPv4 space, some of them perform these scans multiple times per day.
I set up a set of scripts to log all "uninvited activity" to a couple of my systems, from which I discovered a whole bunch of these scanner "security" companies. Personally, I treat them all as malicious.
There are also services that track Newly Registered Domains (NRDs).
Tangentially:
NRD lists are useful for DNS block lists since a large number of NRDs are used for short term scam sites.
My little, very amateur, project to block them can be found here: https://github.com/UninvitedActivity/UninvitedActivity
Edited to add: Direct link to the list of scanner IP addresses (although hasn't been updated in 8 months - crikey, I've been busy longer than I thought): https://github.com/UninvitedActivity/UninvitedActivity/blob/...
Getting the domain name from the IP address is not trivial, though. In fact, it should be impossible, if the name really hasn't been published (barring guessing attempts), so OP's question stands.
The OP is misunderstanding what's happened, based on what's been posted. The OP has a server with an IP address. They're seeing GET requests in the server's logs and is assuming people have found the server's DNS name.
In fact, the scanners are simply searching the IP address space and sending GET requests to any web server they find. No DNS discovery needed.
Are you sure that’s the case? IP addresses != domains, so I'm guessing the bots are including a Host header containing the obfuscated domain in their requests.
My guess is OP is using a public DNS server that sells aggregated user requests. All it takes is one request from their machine to a public machine on the internet, and it’s now public knowledge.
That entirely depends on whether the GET requests were providing the (supposed to be hidden) hostname in the `Host` header (and potentially SNI TLS extension).
I had this issue with internal domains indexed by Google. The domains were not published anywhere by my company. They were scanned by leakix.net, which apparently scans the whole web for vulnerabilities and publishes web pages containing the domain names associated with each IP address. I guess they read them from the certificates.
There is another source: certificates showing up on a server or load balancer during the TLS handshake. When a client connects without indicating a server name via SNI, some servers will reply with a default certificate or a list of valid server names.
That is only the case when there is an explicit PTR record; for instance, one of my assigned addresses can be named that way due to a PTR entry in the zone file for that IPv4. But unless they've explicitly configured this, or are using a hosting service that does it without asking, it won't be what is happening. It isn't practical to do a reverse lookup from "normal" name-to-address records (it is possible to build a partial reverse mapping by collecting a huge number of DNS query results, but not really practical unless you are someone like Google or Cloudflare running a popular resolution service).
Not sure what you are trying to tell me. This isn't guaranteed to work. If you define a reverse lookup record for your domain, then that counts as published in my book.
This is correct.
I love how the ARPANET still lives on through reverse DNS PTRs.
https://www.youtube.com/watch?v=V78GUSOS-EM
I do something similar. Any hits on the default nginx vhost get logged, logs get parsed out and "repeat offenders" get put on the shitlist. I use ipset/iptables but this can also be done with fail2ban quite simply.
https://nbailey.ca/post/block-scanners/
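A rough sketch of the same idea in Python, assuming the catch-all vhost writes to its own access log in the usual combined format; the log path and threshold are made up for illustration:

    import re
    from collections import Counter

    LOG_PATH = "/var/log/nginx/default-vhost.access.log"  # hypothetical path
    ip_field = re.compile(r"^(\S+) ")  # client IP is the first field

    hits = Counter()
    with open(LOG_PATH) as log:
        for line in log:
            match = ip_field.match(line)
            if match:
                hits[match.group(1)] += 1

    # Anything that hit the catch-all vhost repeatedly is a block-list candidate
    # for ipset/iptables or fail2ban.
    for ip, count in hits.most_common():
        if count >= 5:
            print(ip, count)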
This is security theater.
No, it's security by obscurity which is a single, but important, step above security theatre.
To not appear on the radar is to not invite investigation; if they can't see the door they won't try to pry it open.
If you're already on their radar, or if they already know the door is there (even if they can't directly see it), then it's less effective.
Only kinda.
Doing something like this can prevent you from showing up on Shodan.io which is used by many users/bots to find servers without running massive scans themselves.
How does an ip scan help with general DNS resolution at all?
They scan certain ports as well, which can provide them with 'fingerprints' as to what's running on those ports, which can then invite further investigation.
If ports 80 or 443 are open and there's a web server fingerprint (Apache, nginx, caddy, etc) then they could use further tools to try to discover domain names etc.
Certificate Transparency logs, or they don't actually know the domain name: just port-scanning[1] then making requests to open web ports.
[1] Turns out you can port-scan the entire internet in under 5 minutes: https://github.com/robertdavidgraham/masscan
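A toy illustration of that two-step process (nowhere near masscan's speed), assuming the requests package; the address below is a documentation placeholder:

    import socket
    import requests  # third-party; TLS verification disabled on purpose here

    def probe(ip):
        # Step 1: is anything listening on 443?
        with socket.socket() as sock:
            sock.settimeout(3)
            if sock.connect_ex((ip, 443)) != 0:
                return  # closed or filtered
        # Step 2: request the bare IP. No subdomain knowledge is needed;
        # whatever comes back (error page, redirect, default cert) gets logged.
        resp = requests.get("https://" + ip + "/", verify=False, timeout=5)
        print(ip, resp.status_code, resp.headers.get("Server"))

    probe("192.0.2.1")  # TEST-NET placeholder address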
Port scanning usually can't discover subdomains. Most servers don't expose the names of the domains they serve content for. In the case of HTTP, they usually only serve the subdomain's content if the Host request header includes it.
I could be wrong, but the Palo Alto scanner says it's using global ipv4 space, so not using DNS at all. So actually the subdomain has not been discovered at all.
This is exactly what’s happening based on the log snippet posted. Has nothing to do with subdomains, has everything to do with it being on the internet.
How deep in the domain hierarchy you are doesn't matter from a network layer: a bare tld (yes this exists), a normal domain, a subdomain, a sub-subdomain, etc can all be assigned different IPs and go different places. You can issue a GET against / for any IP you want (like we see in the logs OP posted). The only time this would actually matter is if a host at an address is serving content for multiple hostnames and depends on the Host header to figure out which one to serve -- but even those will almost always have a default.
You can discover IP addresses, sure. Just enumerate them. But this doesn't give you the domain, as long as there is no reverse DNS record.
I'm quite sure OP meant a virtual host only reachable with the correct Host: header.
Most servers just listen on :80 and respond to all requests. Almost nobody checks the host header intentionally, it's just a happy mistake if they use a reverse proxy.
You can often decloak servers behind Cloudflare because of this.
But OP's post already answered their question: someone scanned ipv4 space. And what they mean is that a server they point to via DNS is receiving requests, but DNS is a red herring.
This really depends on the setup. Most web servers host multiple virtual hosts. IP addresses are expensive.
If you're deploying a service behind a reverse proxy, it either must be only accessible from the reverse proxy via an internal network, or check the IP address of the reverse proxy. It absolutely must not trust X-Forwarded-For: headers from random IPs.
I just don't see how any of this matters. OP's server is reachable via ipv4 and someone sent an http request to it. Their post even says that this is the case.
I'm guessing they meant it discovered a virtual host behind a subdomain.
And in the case of HTTPS they need to insist on SNI (and TLS 1.3 requires it).
This.
I have a DNS client that feeds into my passive DNS database by reading CT logs and then trying to resolve them.
What do you use it for?
Last few times I tried to do this my ISP cut off my internet every time. Assholes. It comes back, but they're still assholes for it.
Not sure why everyone is going on about certificate transparency logs when the answer is right there in the user agent. The company is scanning the ipv4 space and came upon your IP and port.
Finding the IP does not mean finding the domain. When making an HTTP request to an IP, you still specify the domain you want to reach. For example, you can configure your /etc/hosts to point xxxnakedhamsters.google.com at 8.8.8.8 and make the HTTP request, which will cause Google to receive the domain in the request (i.e. the header Host: xxxnakedhamsters.google.com) and refuse it or try to redirect. Of course, this only covers plain HTTP, because HTTPS will require a matching certificate. That's why people are talking about certificates.
But there's no evidence in the OP's post that they have, in fact, discovered the domain. The only thing posted is that there is a GET request to a listening web server.
The OP and all the people talking about certificates are making the same assumption: namely, that the scanning company discovered the DNS name for the server and tried to connect. When, in fact, they simply iterate through IP address blocks and make GET requests to any listening web servers they find.
I really doubt CloudFlare gives them an IPv4 and they can see all the logs for said IPv4
OP states that the domain was discovered
No they didn't. They said "How did the internet find my subdomain?" They're assuming the internet found their subdomain. They don't provide any evidence that happened, just that they found their IP address.
Depending on the web server's configuration, you very much _can_ find the domain which is configured on an IP address, by attempting to connect to that IP address via HTTPS and seeing what certificate gets served. Here's an example:
https://138.68.161.203/
> Web sites prove their identity via certificates. Firefox does not trust this site because it uses a certificate that is not valid for 138.68.161.203. The certificate is only valid for the following names: exhaust.lewiscollard.com, www.exhaust.lewiscollard.com
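A minimal sketch of that same check in Python, assuming the third-party cryptography package: connect to the bare IP on 443 without SNI and read whatever certificate the server hands back.

    import socket
    import ssl
    from cryptography import x509  # third-party, for parsing the DER certificate

    def default_cert_subject(ip, port=443):
        ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
        ctx.check_hostname = False       # probing an IP, not a hostname
        ctx.verify_mode = ssl.CERT_NONE  # accept whatever certificate is offered
        with socket.create_connection((ip, port), timeout=5) as sock:
            # No server_hostname means no SNI, so the server falls back to its default cert.
            with ctx.wrap_socket(sock) as tls:
                der = tls.getpeercert(binary_form=True)
        cert = x509.load_der_x509_certificate(der)
        return cert.subject.rfc4514_string()

    print(default_cert_subject("138.68.161.203"))  # the IP from the example above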
I don't think that does you any good for Cloudflare, though. They will definitely be using SNI.
That doesn't really matter, though. While OP is using Cloudflare, the actual server behind it is still a publicly-accessible IP address that an IPv4 space scanner can easily stumble upon.
I misunderstood, I thought the subdomain was an R2 bucket. If it's just normal Cloudflare proxying to some backend this is probably the most likely answer.
That said, while I think it's not the case here, using Cloudflare doesn't mean the underlying host is accessible, as even on the free tier you can use Cloudflare Tunnels, which I often do.
they only state they are using cloudflare for DNS, they didn't say if they were proxying the connection
Also a valid point. I guess without more details all we can really do is speculate about the exact setup. That said, I do now agree that the most likely answer is "the underlying host was accessible and caught by an IPv4 scanner" since well, that's pretty much what it says anyway.
First thing I’d do for an IP that answers is a reverse lookup, so I expect that’s at least in the list of things they’d try.
> When doing HTTP request to IP you specify the domain you want to connect to
No, you make HTTP requests to an IP, not a domain. You convert the domain name to an IP in an earlier step (via a DNS query). You can connect to servers using their raw IPs and open ports all day if you like, which is what's happening here. Yes servers will (likely) reject the requests by looking at the host header, but they will still receive the request.
It's rather hilarious that nobody mentioned this in 7 hours. What am I missing?
~4 billion probes (the whole IPv4 space) in a few hours is nothing for a company with decent resources. OP: in case you didn't follow, they're literally trying every possible IPv4 address and seeing if something exists on standard ports at that address.
I believe it would be harder to find out your domain that way if you were using SNI and only forwarded/served requests that used the correct host. But if you aren't using SNI, your server is probably just responding to any TLS connect request with your subdomain's cert, which will reveal your hostname.
> What am I missing?
That it was in fact mentioned many hours earlier, in more than one top level comment.
I was referring more to the fact that the user agent explicitly contained the answer, rather than suggestions that it was IP scanning. But you're right I do see one comment that mentions that. And many more likely assumed the OP already figured that part out.
The user agent contains a partial answer. IP scanning doesn't give you the actual subdomain, so the question is slightly wrong or there are missing pieces.
Judging by the logs (user agents really) right now in the submission, it's hard to tell if the requests were actually for the domain (since the request headers aren't included) or just for the IP.
Yes, that's the question being wrong option I listed.
> What am I missing?
It's very common for people to read only up to the point where they feel they can comment, then skip immediately to the comment box. So, basically, no one read it.
Funny, that'd be so unthinkable for me to do! But you're probably right.
Just the default hostname. It won't reveal all of them or any of the IP addresses of that box. secret-freedom-fighter.ice-cream-shop.example.com could have the same IP as example.com and you'd only know example.com
If you've got one cert with a subject alt name for each host, they'd see them all. If you use SNI and they have different certificates, the domains might still be in Certificate Transparency logs. If a wildcard cert is used, that could help to conceal the exact subdomain.
Okay. But how did they get the proper host header?
There are a couple easy possibilities depending on server config.
1. Not using SNI, and all https requests just respond with the same cert. (Example, go to https://209.216.230.207/ and you'll get a certificate error. Go to the cert details and you'll see the common name is news.ycombinator.com).
2. http upgrades to https with a redirect to the hostname, not IP address. (Example, go to http://209.216.230.207/ and you get a 301 redirect to https://news.ycombinator.com)
Could be a number of ways for example a default TLS cert, or a default vhost redirect.
I actually had a job once a few years ago where I was asked to hide a web service from crawlers and so I did some of these things to ensure no info leaked about the real vhost.
I don't think op said that they had the correct host header?
Who says they did?
Also it's Palo Alto. They're not some kiddie scripters. https://en.m.wikipedia.org/wiki/Palo_Alto_Networks
Hm?
They sell you security but provide you with CVEs en masse.
https://www.cybersecuritydive.com/news/palo-alto-networks--h...
Ah yes we all know if you sell a firewall the code has to be 100% bug free unbreakable
Looking at how they earned their 100s of CVEs, script kiddie almost looks like a compliment
Am I google when I come with the useragent 'google here, no evil'?
That perfectly fits the midwit meme. Lots of people are smart enough to know about transparency logs, but not smart enough to read the OP's post and understand the details.
The details aren't there, so it's "assume" rather than "understand".
The only proper response to OP's question is to ask for clarification: is the subdomain pointing to a separate IP? Are the logs vhost-specific or not?
If you don't get the answers, all you can do is to assume, and both assumptions may end up being right or wrong (with varying probability, perhaps).
I'm surprised nobody mentioned subfinder yet: https://github.com/projectdiscovery/subfinder
Subfinder uses different public and private sources to discover subdomains. Certificate Transparency logs are a great source, but it also has some other options.
Is it available under HTTPS? Then it's probably in a Certificate Transparency log.
Yes, https via cloudflare's automatic https. Thanks for the info.
Yeah, this is a surprisingly little-known fact: all certs being logged means all subdomain names get logged.
Wildcard certs can hide the subdomains, but then your cert works on all subdomains. This could be an issue if the certs get compromised.
Usually there isn’t sensitive information in subdomain names, but i suspect it often accidentally leaks information about infrastructure setups. "vaultwarden.example.com" existing tells you someone is probably running a vaultwarden instance, even if it’s not publicly accessible.
The same kind of info can leak via dns records too, I think?
> The same kind of info can leak via dns records too, I think?
That's correct; "passive DNS" data is sold by many large public DNS providers. They tell you (for a fee) what questions were asked and answered that meet your chosen criteria. So, e.g., maybe you're interested in what questions and answers matched A? something.internal.bigcorp.example in February 2025.
They won't tell you who asked (IP address, etc.) but they're great for discovering that even though it says 404 for you, bigcorp.famous-brand-hr.example is checked regularly by somebody, probably BigCorp employees who aren't on their VPN - suggesting very strongly that although BigCorp told Famous Brand HR not to list them as a client that is in fact the HR system used by BigCorp.
I had coworkers at a previous employer go change settings in CloudFlare trying to troubleshoot instead of reaching out to me. They changed the option that caused CF proxy to issue a cert for every subdomain instead of using the wildcard. They didn't understand why I was pissed that they had now written every subdomain we had in use to the public record in addition to doing it without an approved change request.
Automated agents can tail the certificate log to discover new domains as the certs are issued. But if you want to explore subdomains manually, https://crt.sh/ is a nice tool.
If you're using infra in the shape [Cloudflare -> your VM], I'd recommend setting a firewall on the VM so that it can be accessed only from Cloudflare.
This way, you will force everyone to go through Cloudflare and utilize all those fancy bot blocking features they have.
Do you know how to access these logs?
https://crt.sh/ is one example, if you sign using e.g. Let's Encrypt.
Answered below, but https://crt.sh/ is what I use.
https://www.merklemap.com/
If it is on DNS, it is discoverable. Even if it were not, the message you pasted says outright that they scan the entire IP space, so they could be hitting your server's IP without having a clue there is a subdomain serving your stuff from it.
> If it is on DNS, it is discoverable.
In the context of what OP is asking this is not true. DNS zones aren't enumerable - the only way to reliably get the complete contents of the zone is to have the SOA server approve a zone transfer and send the zone file to you. You can ask if a record in that zone exists but as a random user you can't say "hand over all records in this zone". I'd imagine that tools like Cloudflare that need this kind of functionality perform a dictionary search since they get 90% of records when importing a domain but always seem to miss inconspicuously-named ones.
> Even if it were not, the message you pasted says outright that they scan the entire IP space, so they could be hitting your server's IP without having a clue there is a subdomain serving your stuff from it.
This is likely what's happening. If the bot isn't using SNI or sending a host header then they probably found the server by IP. The fact that there's a heretofore unknown DNS record pointing to it is of no consequence. *EDIT: Or the Cert Transparency log as others have mentioned, though this isn't DNS per se. I learn something new every day :o)
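For what it's worth, a tiny sketch of the dictionary approach mentioned above: since a zone can't be listed, just resolve candidate names and keep the hits. The wordlist is a stand-in for the multi-thousand-entry lists real tools ship with, and the domain is a placeholder.

    import socket

    WORDLIST = ["www", "mail", "api", "dev", "staging", "vpn", "test"]

    def guess_subdomains(domain):
        for label in WORDLIST:
            name = "{}.{}".format(label, domain)
            try:
                addrs = {info[4][0] for info in socket.getaddrinfo(name, None)}
            except socket.gaierror:
                continue  # NXDOMAIN or lookup failure: not a hit
            print(name, sorted(addrs))

    guess_subdomains("example.com")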
> In the context of what OP is asking this is not true. DNS zones aren't enumerable - the only way to reliably get the complete contents of the zone is to have the SOA server approve a zone transfer and send the zone file to you.
This is generally true but also if you watch authoritative-only dns server logs for text strings matching ACL rejections, there's plenty of things out there which are fully automated crawlers attempting to do entire zone transfers.
There are a non zero number of improperly configured authoritative dns servers out there on the internet which will happily give away a zone transfer to anyone who asks for it, at least, apparently enough to be useful that somebody wrote crawlers for it. I would guess it's only a few percent of servers that host zonefiles but given the total size of the public Internet, that's still a lot.
In the context of DNSSEC dns zones are very much enumerable. Cloudflare does amazing tricks to avoid this https://blog.cloudflare.com/black-lies/
Cloudflare themselves gives more information here:
> NSEC3 was a “close but no cigar” solution to the problem. While it’s true that it made zone walking harder, it did not make it impossible. Zone walking with NSEC3 is still possible with a dictionary attack.
So, hardening it against enumerability is a question of inserting non-dictionary names.
Zone transfers are super interesting topic. Thanks for mentioning that.
It's basically the way to get all the DNS records a DNS server holds for a zone. Interestingly, in some countries this is illegal and in others it is considered best practice.
Generally, enabled zone transfers are considered a misconfiguration and should be disabled.
We did research on that a few months back and found out that 8% of all global name servers have it enabled.[0]
[0] - https://reconwave.com/blog/post/alarming-prevalence-of-zone-...
That's concerning. I thought everyone knows that zone transfers should be generally disallowed, especially when coming from random hosts.
In practice it's not so far-fetched: a zone transfer is just another DNS query at the protocol level; I suppose you can conceptually view it as sending a file if you consider the DNS response a file. Something like "host -t axfr my.domain ns1.my.domain" will show the zone, depending on how the domain's name server is configured (e.g. in BIND, the allow-transfer directive can be used to make it public, require an IP ACL to match the query source, etc.).
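Roughly the same check in Python, assuming the third-party dnspython package; the nameserver address and zone below are placeholders:

    import dns.query  # from the dnspython package
    import dns.zone

    def try_axfr(nameserver_ip, zone_name):
        try:
            zone = dns.zone.from_xfr(dns.query.xfr(nameserver_ip, zone_name, timeout=10))
        except Exception as exc:
            # Most properly configured servers will refuse the transfer.
            print("transfer refused or failed:", exc)
            return
        for name in sorted(zone.nodes):
            print(name)

    try_axfr("192.0.2.53", "example.com")  # placeholder nameserver and zone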
No sensible DNS provider has zone transfers enabled by default. OP mentioned using CloudFlare, and they certainly don't.
> in bind, allow-transfer directive
Configuring BIND as an authoritative server for a corporate domain when I was a wee lad is how I learned DNS. It was and still is bad practice to allow zone transfers without auth. If memory serves I locked it down between servers via key pairs.
If you know what to query, sure. You can't just say "give me all subdomains"; it doesn't work that way. The subdomain was discovered via certificate transparency logs.
Question: How does a subdomain get discovered by a member of the public if there are no references to it anywhere online?
The only thing I can think of that would let you do that would be a DNS zone transfer request, but those are almost always disallowed from most origin IPs.
https://en.m.wikipedia.org/wiki/DNS_zone_transfer
you also have zone walking with DNS NSEC
https://www.domaintools.com/resources/blog/zone-walking-zone...
See my comment above (https://news.ycombinator.com/item?id=43289743); there are many techniques!
Certificate transparency logs.
Ahh yeah, my internet network knowledge was never super strong, and now is rusty to boot. Thanks for your note.
Shouldn't the web server only respond to a configured domain, else 404?
Depends if it's configured like that; by default, usually no.
ArchiveTeam has some docs about this:
https://wiki.archiveteam.org/index.php/Finding_subdomains
I'm so often amazed (but no longer surprised) at the depth of niche (relatively) info and tools out there.
As others have said, likely cert transparency logs. Use a wildcard cert to avoid this. They are free via Let's Encrypt and possibly a couple of other ACME providers. I have loads of wildcard certs. Bots will try guessing names, but like you I do not use easily guessable names and the bots never find them. I log all DNS answers. I assume Cloudflare supports strict SNI, but I have no idea if they have their own automation around wildcard certs. Sometimes I renew wildcard certs I am not even using just to give the bots something to do.
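For reference, a minimal certbot sketch for a wildcard, assuming the Cloudflare DNS plugin is installed and an API-token credentials file exists (the file path and domain are placeholders):

  # wildcard issuance requires the DNS-01 challenge
  certbot certonly --dns-cloudflare \
    --dns-cloudflare-credentials ~/.secrets/cloudflare.ini \
    -d example.com -d '*.example.com'

Only example.com and *.example.com end up in the CT logs, not the individual host names.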
I have been just relying on CloudFlare's automatic https. But I will look into my own certs, though will likely just use CloudFlare's. I don't mind the internet knowing the subdomain I posted about; was curious how the bots found it!
Certificate Transparency would also be my guess. These are logs published by big TLS certificate issuers to cross-check and make sure they're not issuing certificates for domains they have no standing on.
The way around this is to issue a wildcard for your root domain and use that. Your main domain is discoverable but your subs aren't.
There are other routes: leaky extensions, leaky DNS servers, bad internet security system utilities that phone home about traffic. Who knows?
Unless your IP address redirects to your subdomain —not unheard of— it's not somebody IP/port scanning. Webservers don't typically leak anything about the domains they serve for.
Thanks for everyone's perspectives. Very educational and admittedly lots outside the boundaries of my current knowledge. I have thus far relied on CloudFlare's automatic https and simple instant subdomain setup for their worker microservice I'm using.
There are evidently technical/footprint implications of that convenience. Fortunately, I'm not really concerned with the subdomain being publicly known; I was more curious how it became publicly known.
I had to scroll pretty far down to see the first comment referring to the second most likely leak (after certificate transparency logs): some ISP sold their DNS query log, and yours was in it.
People buying such records do so for various reasons, for example to seed some crawler they've built.
There is a chance that your subdomain is the first/default virtual host in your web server setup (or the subdomain's access log is the default log file), so any requests to the server's IP address get logged under this virtual host. That means they didn't access your subdomain; they accessed your server by IP address but got logged in your subdomain's access log.
And this is the correct answer, thank you.
Transparency logs are the likely answer, except if you have a wildcard cert (or no HTTPS, obviously).
IP scans are just that: scans for live ports. If you do not provide a Host header in your request, you get whatever default response was set up. That can be a default site, a 404, or anything else.
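You can see what such a scanner gets from your own server with something like this (the IP is a placeholder):

  # request by bare IP, then with a bogus Host header, and compare the answers
  curl -sk -o /dev/null -w '%{http_code}\n' https://203.0.113.10/
  curl -sk -o /dev/null -w '%{http_code}\n' -H 'Host: nonexistent.invalid' https://203.0.113.10/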
This discussion makes me wonder, how hard is it to find a Google Document that was shared with "Anyone with the link"?
TIL (from this thread) : You can abuse TLS handshakes to effectively reverse-DNS an IP address without ever talking to a DNS server! Is this built into dig yet? :)
(Alright, some IP addresses, not all of them)
I also wonder if this is a potential footgun for eSNI deployments: If you add eSNI support to a server, you must remember to also make regular SNI mandatory - otherwise, an eavesdropper can just ask your server nicely for the domain that the eSNI encryption was trying to hide from it.
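A rough sketch of the trick, assuming OpenSSL 1.1.1+ for the -ext flag (the IP is a placeholder):

  # connect by bare IP (no SNI) and read the names off whatever default cert comes back
  openssl s_client -connect 203.0.113.10:443 </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -ext subjectAltName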
Lifehack - it's especially awesome in cases where the server operator is using self-signed certs or a private certificate authority, because you will not find those in the public cert logs.
I'm having the same issue.
https://securitytrails.com/ also had my "secret" staging subdomain.
I made a catch-all certificate, so the subdomain didn't show up in CT logs.
It's still a secret to me how my subdomain ended up in their database.
They could be purchasing DNS query logs from ISPs.
Serious question: Do you really think that Cloudflare is trying to keep these kinds of thing private? If so, I'd suggest that's not a reasonable expectation.
Related question (not rhetorical). If you do DNS for subdomains yourself (and just use Cloudflare to point dns.example.com at your box) will the subdomain queries leak and show up in aggregate datasets? What I'm asking is if query recursion is always handled locally or if any of the reasonably common software stacks resolve it remotely.
As well as assuming Cloudflare sells DNS lists, it's probably safe to assume the operators of public resolvers like 8.8.8.8, 9.9.9.9 and 1.1.1.1 (that is Google, Quad9 and Cloudflare again) are looking at their logs and either selling them or using them internally.
Maybe your server responded to a plain IP-addressed request with the real name...
Host header is a request header, not a response one, isn't it?
He said he used a wildcard cert though. So what part of the response would contain the subdomain in that case?
Let me list some of the ways that precious subdomain could have been leaked:
1) CZDS/DNS record sharing program
2) CT Logs
3) Browser SCT audit
4) Browser telemetry
5) DNS logs
6) DPI
7) Antivirus/OS telemetry
8) Virus/Malware/Tracker
9) Brute forcing DNS records
10) DNSSEC
11) Server software with AutoTLS
12) Servers screaming their hostnames over any protocol/banner thing
13) Typing anything on the browser search bar
14) Posting it anywhere
And many other novel ways I can't think of right now. I have successfully hidden some of my subdomains in the past, but it definitely requires dedication. Simple silly mistakes can make all your efforts go to waste. Ask any red/blue teamer.
Want to hide something? Roll everything on your own.
Most likely passive DNS data: if you use your subdomain, you make DNS queries for it. If the DNS server you use to resolve your domains shares this data, it can be picked up by others.
Using the Certificate Transparency logs I'd imagine.
Also note that your domains are live in the sense that they're allocated (they exist). Whether a web server or anything else actually backs them is a different question entirely.
For "secret" subdomains, you'll want a wildcard certificate. That way only that will show on the CT logs. Note that if you serve over IPv4, the underlying host will be eventually discovered anyways by brute-force host enumeration, and the domain can still be discovered using dictionary attacks / enumeration.
Never touched Cloudflare so this is as far as I can help you.
If a HTTPS service should be hard to discover, an easy way is to hide it behind a subdirectory. Something like https://subdomain.domain.example/hard_to_find_secret_string.
Another option is wildcard certificates.
This obviously can't be the only protection. But if an attacker doesn't know about a service, or misses it during discovery, they can't attack it.
Did you ever email the URL to somebody? We had the same issue years ago, where Google seemed to be crawling/indexing new subdomains it found in emails.
Nope, never emailed or posted to anyone. Just me (it's my solo project at the moment).
Some CAs (Amazon) allow not publishing to the Certificate Transparency Log. But if you do this, browsers will block the connection by default. Chromium browsers have a policy option to skip this check for selected URLs. See: CertificateTransparencyEnforcementDisabledForURLs.
Some may find this more desirable than wildcard certificates and their drawbacks.
Firefox is currently rolling out the same thing. They will treat any non-publicly-logged certificate as insecure.
I'm surprised Amazon offers the option to not log certificates. The whole idea is that every issued cert should get logged. That way, fraudulently issued certs are either well documented in public logs, or at least not trusted by the browser.
It doesn't seem like the choice has any impact on that. It just protects user privacy if that's what they want to prioritize.
Depending on the issuer to log all certs would never work. You can't rely on the untrusted entity to out themselves for you.
The security comes from the browser querying the log and warning you if the entry is missing. In that sense declining to log a cert is similar to self signing one. The browser will warn and users will need to accept. As long as the vast majority of sites don't do that then we maintain a sort of herd immunity because the warnings are unexpected by the end user.
I should have included in my post, this technique only makes sense in the context of private or internal endpoints.
To avoid subdomain discovery, I usually acquire a certificate at the domain level and add a wildcard SAN.
> Some may find this more desirable
Why?
A CISA article on wildcard security risks. Some of this comes from common misimplementations (e.g. reusing private keys across servers), but not all of it.
https://www.cisa.gov/news-events/alerts/2021/10/08/nsa-relea... Direct: https://media.defense.gov/2021/Oct/07/2002869955/-1/-1/0/CSI...
Why not experiment with multiple variations? For example, as part of the experiment, run your own DNS, use non-standard DNS encryption like CurveDNS or even no DNS at all, use a non-standard port for HTTPS, a self-signed CA, TLS with no SNI extension, or even CurveCP instead of CAs and TLS. If non-discoverability is the goal, there are infinite ways to deviate from web developer norms.
If "the internet fails to find the subdomain" when using non-standard practices and conventions, then perhaps "following the internet's recommendations", e.g. using Cloudflare, might be partially the cause of discoverability.
Would be surprised if Expanse scans more than a relatively small selection of common ports.
Can I ask an adjacent question? I have a bunch of DNS A records for locallyaccessedservice.mydomain.tld pointing to my 10.0.0.x NAS's nginx reverse proxy, so I can use HTTPS and DNS to access the services locally and via Tailscale. My cert is for *.domain.tld. It's nothing critical and only accessible within my LAN, but is there any reason I shouldn't be doing this from a security point of view? I guess someone could phish that to another globally accessible server if DNS changed and I wouldn't notice, but I don't see how that would be an issue. There are a couple of nginx services exposed to the public, but not on those specific domains, so I guess that is an attack vector.
As always, it depends on your threat model. Generally, having private IPs in public DNS is not great, because a potential attacker gets "a general idea" of what your private net looks like.
But I'd say there's no issue if everything else is secured properly.
Great, thank you. I've mulled over running separate reverse proxies for public and internal services instead.
Palo Alto (network devices like firewalls, etc.) is able to scan the sites that users behind their devices want to visit. These are very popular devices in many companies. Users can also have agents installed on their computers that likewise have access to the sites they visit.
This is what I was thinking it must be, along the lines of Cisco NAC. Could monitor via browser plugin for full URLs or DNS server for domains.
I imagine the certificate transparency log is the avenue, but local monitoring and reporting up as a new URL or domain to scan for malware seems similarly plausible.
1) Are you sure that they are using the subdomain? They could be connecting via IP or an alternate host address.
2) Are you using TLS? Unless you are using a wildcard cert, the FQDN will have been published as part of the certificate transparency logs.
The AXFR DNS query type (a zone transfer) allows querying every record in a zone, subdomains included. There are security restrictions around who can do it on which DNS servers. Given the number of places online one can run a subdomain query, I suspect it's mostly a matter of paying the right fees to the right DNS provider.
If you've made any kind of DNS entries involving this subdomain, then congratulations, you've notified the world of its existence. There are tools out there that leverage this information and let you get all the subdomains for a domain. Here's the first one I found in a quick search:
https://pentest-tools.com/information-gathering/find-subdoma...
This site will find any subdomain, for any domain, so long as it previously had a certificate (ssl/tls)
https://crt.sh/
This is incorrect (or at least only technically correct). It is only true for subdomains with public, trusted-CA-signed certificates issued since certificate transparency has existed, and only for subdomains with a specific, non-wildcard certificate.
https://crt.sh can find your subdomain only when it doesn't have a wildcard certificate (*.somedomain.com).
Thanks for mentioning. I checked it out, and am learning lots of new stuff (ie, realize how much I do not know).
Doesn’t find any of my semi secret subdomains.
Some bots scan using giant lists of subdomains, e.g. https://github.com/danielmiessler/SecLists/tree/master/Disco.... Your subdomain may be on that giant combined_subdomains list, or perhaps some other lists that other tools use.
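A minimal brute-force sketch along those lines (the wordlist name is one of the SecLists files; the domain is a placeholder, and a wildcard DNS record would make every name appear to resolve):

  # try each candidate name and print the ones that resolve
  while read -r name; do
    ip=$(dig +short "$name.example.com" A)
    [ -n "$ip" ] && echo "$name.example.com -> $ip"
  done < subdomains-top1million-5000.txt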
Assuming this is not direct traffic to your IP, people will say it is because of TLS logs. Maybe it is in your case. But if you spin up a CF worker on a subdomain, you will also get hit by traffic immediately, and those certificates are wildcards. I think CF leaks subdomains in some cases. I've never seen this behavior when using CF just as a DNS server, though.
> I am using CloudFlare for my DNS.
Based on this, it sounds like you exposed your resource and advertised it to others: reverse DNS, get the IP, scan the IP.
Probably simpler: you exposed a resource publicly on IPv4, and if it exists, it'll be scanned. There are probably hundreds of companies scanning the entire 0.0.0.0/0 space at all times.
Could it be that Chrome shared the web page with advertisers?
https://www.ghacks.net/2021/03/16/wonder-about-the-data-goog...
Someone might have used an open-source tool like sublist3r.
oh yes that for sure
yea was gonna mention this as well lol
Be careful with these. I had a subdomain like this (completely unlisted) with a Google OAuth flow on it, using a development mode Google app. Somehow, the domain was discovered, and Google decided that using their OAuth flow was a phishing scam, and delisted my entire toplevel domain as a result!
What do you mean "careful with these"? With subdomains?
Yes, unlisted subdomains. I updated my post to be clearer.
I must be missing something. What does “unlisted” mean in this context?
I have plenty of subdomains I don’t “advertise” (tell people about online) but “unlisted” is a weird thing to call those. Also I don’t see how it would matter at all when it comes to Google auth.
My guess is they blocked it based on the subdomain name itself. I made a "steamgames" subdomain to list Steam games I have extra copies of (from bundles) for friends to grab for free. Less than a day after I put it up I started getting Chrome scare pages. I switched it to "games" and there have been no issues.
DNS enumeration (brute force) with a good wordlist, zone transfer, or leaking the name through a certificate served when accessing your host via IP address are all possibilities.
The name "userfileupload" is far from not-obvious, so that would be my guess.
https://www.merklemap.com pops to mind.
Interesting! Just checked them out.
"MerkleMap gathers its information by continuously monitoring and live tailing Certificate Transparency (CT) logs, which are operated by organizations like Google, Cloudflare, and Let's Encrypt. "
I made this, thank you!
ICANN zone files - https://www.icann.org/resources/pages/czds-2014-03-03-en
Maybe it's a cloudflare controlled scanner?
Maybe you published the subdomain in a cert?
Snooped traffic is unlikely.
This is a good question: if you don't publish a subdomain, scanners should not reach it. If they do, there's a leak in your infra.
>I am using CloudFlare for my DNS.
Could have been discovered from the SSL cert request for the subdomain.
Others are saying CT logs but my own subdomains are on wildcard certificates, in which case I suspect they are discovered by DPI analysis of DNS traffic and resold, such as by Team Cymru.
I assume you host this with an HTTPS certificate, so you can look up your subdomains at:
https://crt.sh/?q=sampledomain.com
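If you'd rather script it, crt.sh also has a JSON output mode (%25 is the URL-encoded % wildcard; jq is assumed to be installed):

  curl -s 'https://crt.sh/?q=%25.sampledomain.com&output=json' \
    | jq -r '.[].name_value' | sort -u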
Like people have said already; Certificate Transparency logs.
There are countless tools for subdomain enumeration. I personally use subfinder or amass when doing recon on bug bounty targets.
LPT, this is an object lesson in the weakness of security through obscurity
Security by obscurity can be a great additional measure for an already secure system. It can reduce attack surface, make it less likely to get attacked in the first place. In some cases (like this one) it can also be much easier to break than expected.
I mean you could argue that this is more of a multi-factor authentication lesson.
Just knowing 1 "secret"— a subdomain in this case —shouldn't get you somewhere you shouldn't.
In general you should always assume that any password has been (or could be) compromised. So in this case, more factors should be involved such as IP restricting for access, an additional login page, certificate validation, something...
If you're using HTTPS, then you're probably using letsencrypt and so your subdomain will appear on the CT logs that are publicly accessible.
One thing you could do is use a wildcard certificate, and then use a non-obvious subdomain under it. I actually have something similar: in my setup, all my web traffic goes to haproxy frontends which forward traffic to the appropriate backend, and I was sick of setting up multiple new certificates for each new subdomain, so I just replaced them all with a single wildcard cert instead. This means I'm not advertising each new subdomain on the CT list, and even though they all look nominally the same when visited (same holding page on the index and same /api handling), just one of the subdomains decodes an additional URL path that provides access to status monitoring.
Separately, that Palo Alto Networks company is a real pain. They connect to absolutely everything in their attempts to spam the internet. Frankly, I'm sick of even my mail servers being bombarded with HTTP requests on port 25 and the resultant log spam.
Does the IP address for that subdomain have a DNS PTR record set? If it does, someone can discover the subdomain by querying the PTR record for the IP.
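A quick way to check (the IP is a placeholder):

  dig -x 203.0.113.10 +short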
If it does, I did not set it up; it would have been automatically done by CloudFlare when I told it to use my custom subdomain for the upload urls.
Additionally to what other people said, you can assume Cloudflare is selling lists of DNS names to someone.
CSP headers can leak urls, but I assume that isn't the cause here if the subdomain is an entirely separate project
> Expanse, a Palo Alto Networks company, searches across the global IPv4 space multiple
So my guess is reverse DNS
It's pretty common to bruteforce subdomains of a domain you might be interested in, especially by attackers.
Cloudflare uses certificates with numerous other site names included as alt names, so your site name could have been discovered via any other site that happens to use that same cert.
Put it behind IPv6 and it likely won't happen again. The address space is massive.
If you ever email a link and it hits gmail, Google will index it.
What happens if you google your subdomain? Maybe the bots have some sort of dictionary files and just run through them, and when there is a match they append some .html extension to it, or maybe they prepend the dictionary word to the domain as a subdomain?
Did you send a link over email, WhatsApp or something like that?
Your subdomain may have been discovered through certificate transparency logs, search engine crawling, passive DNS, leaked links, or third-party analytics tools.
The discovery of your unpublished subdomain by bots likely stems from a combination of technical factors related to DNS, server configuration, and bot behavior. Here's a breakdown of the possible reasons and solutions:

1. DNS Leaks or Wildcard Records
- Wildcard DNS entries: If your main domain (sampledomain.com) has a wildcard DNS record (e.g., *.sampledomain.com), any subdomain (including userfileupload.sampledomain.com) could be automatically resolved to your server's IP. Even if the main domain is inactive, the wildcard might expose the subdomain.
- Exposed subdomain DNS records: If the subdomain's DNS records (e.g., A/CNAME records) are explicitly configured but not removed, bots could reverse-engineer them via DNS queries or IP scans.
- Fix: Remove or restrict wildcard DNS entries and delete unused subdomain records from your DNS provider (e.g., Cloudflare).

2. Server IP Scanning
- IP-based discovery: Bots like Expanse systematically scan IP addresses to identify active services. If your subdomain's server is listening on ports 80/443 (HTTP/HTTPS), bots may perform a port scan to detect open ports, then attempt common subdomains (e.g., userfileupload, upload, media) on the detected IP to guess valid domains.
- Fix: Block unnecessary ports (e.g., close port 80/443 if unused) and use a firewall (e.g., ufw or Cloudflare Firewall Rules) to reject requests from suspicious IPs.

3. Cloudflare's Default Behavior
- Page Rules or Workers: If the subdomain is configured with Cloudflare Workers, default error pages, or caching rules, it might generate responses that bots can crawl. For example, a 404 Not Found page with a custom message could be indexed by search engines, and Worker scripts might inadvertently expose endpoints (e.g., /_worker.js).
- Fix: Delete unused subdomains from Cloudflare's DNS settings and ensure Workers/routes are only enabled for intended domains.

4. Reverse DNS Lookup
- IP-to-domain mapping: If your server's IP address is shared or part of a broader range, bots might reverse-resolve the IP to discover associated domains (e.g., via dig -x <IP>).
- Fix: Use a dedicated IP address for sensitive subdomains, and contact your ISP to request removal from public IP databases.

5. Authentication Flaws
- Presigned URLs in error messages: If the subdomain's server returns detailed error messages (e.g., 403 Forbidden) when accessed without authentication, bots might parse these messages to infer valid endpoints or credentials.
- Fix: Customize error pages to show generic messages (e.g., "Access Denied"), and log and block IPs attempting brute-force access.

How to prevent future discoveries:
- Remove unused DNS records: Delete the subdomain from Cloudflare's DNS settings entirely.
- Disable wildcards: Avoid *.sampledomain.com wildcards to limit exposure.
- Firewall rules: Block IPs from scanners (e.g., Palo Alto Networks, Expanse) using Cloudflare's DDoS Protection or a firewall.
- Monitor logs: Use tools like grep or Cloudflare logs to track access patterns and block suspicious IPs.
- Use authentication: Require API keys, tokens, or OAuth for all subdomain requests.

Example workflow for debugging:

  # Check Cloudflare DNS records for the subdomain:
  dig userfileupload.sampledomain.com +trace

  # Inspect server logs for recent requests:
  grep -E "^ERROR|DENY" /var/log/nginx/access.log

  # Block Expanse IPs via Cloudflare Firewall:
  # 1. Go to Cloudflare > Firewall > Tools.
  # 2. Add a custom rule to block IPs (e.g., from scaninfo@paloaltonetworks.com).

By tightening DNS, server, and firewall configurations, you can minimize exposure of your internal subdomains to bots.
presumably it has a DNS record