Ask HN: How did the internet discover my subdomain?

268 points by govideo 3 days ago

I have a domain that is not live. As expected, loading the domain returns: Error 1016.

However... I have a subdomain with a non-obvious name, like: userfileupload.sampledomain.com

This subdomain IS LIVE but has NOT been publicized/posted anywhere. It's a custom URL for authenticated users to upload media via presigned URLs to my Cloudflare R2 bucket.

I am using CloudFlare for my DNS.

How did the internet find my subdomain? Some sample user agents are: "Expanse, a Palo Alto Networks company, searches across the global IPv4 space multiple times per day to identify customers' presences on the Internet. If you would like to be excluded from our scans, please send IP addresses/domains to: scaninfo@paloaltonetworks.com", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_7; en-us) AppleWebKit/534.20.8 (KHTML, like Gecko) Version/5.1 Safari/534.20.8", "Mozilla/5.0 (Linux; Android 9; Redmi Note 5 Pro) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.89 Mobile Safari/537.36",

The bots' GET requests are failing, as designed, but I'm wondering how the bots even knew the subdomain existed?!

yatralalala 2 days ago

Hi, our company does this basically "as-a-service".

The options for finding it are basically limitless. The best source is probably the Certificate Transparency project, as others suggested. But it doesn't end there; some other things we do are internet-wide crawls, domain bruteforcing against wildcard DNS, dangling vhost identification, default certs on servers (connect to an IP on 443 and read the default cert), and many others.
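
For example, a quick way to see what the Certificate Transparency logs already expose is the public crt.sh aggregator (hypothetical domain; %25 is just a URL-encoded % wildcard):

    curl -s 'https://crt.sh/?q=%25.sampledomain.com&output=json' \
      | jq -r '.[].name_value' | sort -u

Every name that ever had a non-wildcard certificate issued for it shows up in a list like that, and CT logs are append-only, so it never goes away.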

Security by obscurity does not work. You can not rely on "people won't find it". Once it's online, everyone can find it. No matter how you hide it.

  • no-dr-onboard 2 days ago

    Hi, former pentester here. If any one of your trusted clients is using a Google/Chromium-based browser, the telemetry from that browser (web discovery) would reveal the existence of the subdomain in question. As others have said, security by obscurity doesn't work.

  • hackernewsdhsu 10 hours ago

    Security through obscurity is a security tool, not a security solution.

    Use it as the last thing you do, not the first. If I run SSH on, say, port 42531 it will absolutely be found... but 99%+ of automated scans will never see it, which benefits me. That comes after all the sshd_config and PAM stuff, patching, misc hardening, etc. is done first.

    That's a worn-out example, and just to make a point (I run on 22)... The benefit to me is that most skiddy scanners will never see it, and if I dodge the one actor out there looking to mass-exploit an unpublished 0-day, then, as the last thing I did, I may have bought some extra time, because they're going for 22.
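
    To be concrete, that ordering is roughly: do everything else in sshd_config first, and only then move the port; a minimal sketch (the port number is just an example):

        # /etc/ssh/sshd_config (fragment)
        PermitRootLogin no
        PasswordAuthentication no
        Port 42531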

  • AtNightWeCode 2 days ago

    So, to mostly prevent this:

    Disable direct IP access. Use wildcard certificates. Don't use guessable subdomains like www or mail.

    • compootr 2 days ago

      security through obscurity just isn't. keep your shiz up to date and use proper access controls!

  • cryptonector 2 days ago

    DANE would help here: register a harmless-sounding domain name that leaks nothing, use DNSSEC and NSEC3, and host your hidden service on a subdomain whose name is a 63-byte string of randomly selected ASCII characters. But this isn't really an option.

    • Dylan16807 2 days ago

      Why the DNSSEC, which then requires NSEC3? Shouldn't a wildcard certificate do the job in conjunction with normal unsigned DNS?

  • 1970-01-01 2 days ago

    Subdomainfinder.com ??

    Dozens of others will also find it.

    Really, it's this simple today.

    • yatralalala 2 days ago

      Sorry for a bit of self-promo, but just to explain: we run https://reconwave.com/, basically an EASM product, but more focused on the network/DNS/setup level.

      Finding all things about domains is one of the things that we do. And yes, it's very easy.

      There are many services like subdomainfinder, e.g. dnsdumpster and merklemap. We built our own as well at https://search.reconwave.com/. But it's a side project and it doesn't pay our bills.

    • binarymax 2 days ago

      I think your comment resulted in a hug of death for that service ;)

  • cryptonector 2 days ago

    > Security by obscurity does not work. You can not rely on "people won't find it". Once it's online, everyone can find it. No matter how you hide it.

    Especially do not name your domains in a way that leaks MNPI! Like, imagine publicly traded companies A and B were discussing a merger or acquisition: do not register A-and-B.com, m'kay?

    • pacificmint 2 days ago

      Case in point: When Daimler and Chrysler merged, they had a law firm (with no other ties to either company) register the DaimlerChrysler domains weeks before the merger was made public.

      I don’t recall if anybody noticed before they went public, but as this thread shows, today it would be noticed for sure.

      • FireBeyond 6 hours ago

        One of the earlier seasons of Survivor had the winner leaked because of something similar.

        Their website had bios of every player, with a <playername>.jpg headshot. As they were voted out, their headshot was replaced with <playername>_eliminated.jpg.

        As soon as someone realized that, they entered in every player's name with _eliminated.jpg. One player had a 404 for that file.

  • amelius 2 days ago

    Well, I sure hope the remainder of my URLs are safe.

  • jiggawatts a day ago

    That’s all absolutely true, but I have found that wildcard DNS zones with wildcard certificates tend to get zero un-solicited traffic as long as the client devices are not browsers.

    I.e.: if the host is listening only to some specific host header but registered with a wildcard prefix, then drive-by attackers have no trivial way to guess the prefix.

    I would never rely on this for security, but it does help cut down on the “spam” in the request logs so that I can focus on the real errors.

    This works best for API endpoints not used by browsers or embedded into web pages.

    It’s also my current preferred setup for Internet-facing non-production sites. Otherwise they get so much attack traffic that the real log entries might be less than 0.1% of the total.

  • nkmnz 2 days ago

    [flagged]

    • ivell 2 days ago

      Irrespective of whether they are proud of what they are doing, I found the post helpful and educational. Let's not prevent people from sharing their knowledge, as it might help us protect ourselves. A consequence of this line of questioning would be that in the future they'd be hesitant to share their knowledge to avoid being judged.

    • tmerc 2 days ago

      Why would enumerating a wildcard dns through brute force be something that evokes pride or shame?

      • yatralalala 2 days ago

        I sadly did not see the comment above, but I'd like to add that these bruteforce and sniffing methods are targeted only at our paying customers.

        We built our global reverse-DNS dataset solely from cert transparency logs. Our active scanning/bruteforcing runs only against assets owned by our customers.

        • 6stringmerc 2 days ago

          …as long as your tools are only in your hands to be used, correct? Once a tool is created and used on a machine with access to the greater internet, doesn’t your logic hold that its security is compromised inherently? Not saying you have been infiltrated, or a rogue employee has cleverly exported a copy or the methodology to duplicate it off-site, but I’m not saying that hasn’t happened either.

          • lkt 2 days ago

            You can find a dozen projects on GitHub that do this; it's not sensitive information that needs protecting.

          • cryptonector 2 days ago

            It's not that hard to write this code. It's not a nuclear weapon.

    • lxgr 2 days ago

      Given that bad actors can also do this, I'd say that publicly advertising the fact and thereby drawing attention to misconceptions about security is a net good thing.

    • remlov 2 days ago

      If you look at the company they founded it's a service to protect yourself. Not to willy-nilly go out into the open web to find hidden subdomains.

    • BLKNSLVR 2 days ago

      I assumed they do it for customers who pay them to determine their security profile.

  • sl1ckback 2 days ago

    [flagged]

    • MyOutfitIsVague 2 days ago

      That's not what that phrase means. That's not even what the word "obscure" means. Obscurity is trying to not draw attention to something, or keep it hidden (as in "nobody knows that it's there", not "you know that it's there but can't access it"). Encryption doesn't obscure data unless you're stretching the definition of the word beyond its useful purpose.

      • elliotbnvl 2 days ago

        verb: keep from being seen; conceal.

        In what way is what he’s describing not obscurity?

        • MyOutfitIsVague 2 days ago

          Two points:

          1. Encrypted data is not hidden. You still know that there is data, it's just in a form that you can't understand. Just as difficult higher-level math isn't "obscured" from a non-mathematician (who knows that it is math, but can't decode it), encrypted data is not obscured.

          2. You could make the argument that the data is actually hidden, but the fact that data is there is not hidden. This is pointless pedantry, though. It is both contrary to the way that everybody uses the word and stretches the meaning of the word to the point that it's not useful. There is a common understanding of what "Security through obscurity" means ( https://en.wikipedia.org/wiki/Security_through_obscurity ) and interpreting it far beyond that is not useful. It simply breaks down communication into annoying semantic arguments. I enjoy semantic arguments, but not tedious, pedantic ones where one person just argues that a word isn't what everybody understands it to mean.

          More specifically, it's about WHAT is being obscured. "Security through obscurity" is about trying to be secure by keeping the details or mechanisms of a system secret, not the data itself.

          • genewitch 2 days ago

            Running your SSH server on port 8822 is security through obscurity.

            Port knocking isn't, I don't think.
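
            For reference, port knocking is usually something like this knockd.conf sketch (sequence, timeout and command are illustrative, not a recommendation):

                [openSSH]
                    sequence    = 7000,8000,9000
                    seq_timeout = 5
                    tcpflags    = syn
                    command     = /sbin/iptables -A INPUT -s %IP% -p tcp --dport 22 -j ACCEPT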

        • dghlsakjg 2 days ago

          Yes that is what the word obscure means.

          But the phrase “security through obscurity” is an industry term that refers to keeping things secure purely by not letting people know they exist.

          In contrast with encryption, where I can tell you exactly where the encrypted data is, but you can’t access it.

          Security through obscurity is hiding a bicycle in a bush and hoping no one notices it, encryption is more like locking it to a bike rack with a very good lock.

        • DrammBA 2 days ago

          In every way, because context matters, and the original commenter intentionally recontextualized it just to be contrarian.

        • cnity 2 days ago

          It is about the existence or the methodology being obscured, not the contents of an encrypted message. The point of that phrase is to contrast one type of security for another. You and I can know exactly what tool was used to encrypt something, and all the mathematics behind it, but still fail to decrypt it without the requisite private key.

        • ewmiller 2 days ago

          You wouldn’t call a room behind a locked door “obscured.” Even if it’s technically correct in the most stretched definition (which I’m not convinced of), either way it’s not how people actually use the word.

        • Minor49er 2 days ago

          This was explained in the third sentence of the post that you're responding to

    • purkka 2 days ago

      "Security through obscurity" can definitely be defined in a meaningful way.

      The opposite of "bad security through obscurity" is using completely public and standard mechanisms/protocols/algorithms such as TLS, PGP or pin tumbler locks. The security then comes from the keys and other secrets, which are chosen from the space permitted by the mechanism with sufficient entropy or other desirable properties.

      The line is drawn between obscuring the mechanism, which is designed to have measurable security properties (cryptographic strength, enumeration prevention, lock security pins), and obscuring the keys that are essentially just random hidden information.

      Obscuring the mechanism provides some security as well, sure, but a public mechanism can be publicly verified to provide security based only on secret keys.

    • KaiserPro 2 days ago

      If we are going to go down this road, I want to call it occult security, because it sounds much sexier, and it's more accurate: you are casting spells and incantations to hide things from the world.

    • morellt 2 days ago

      Semantics. Considering this is your first comment ever and your account was made an hour ago I'll assume this is ragebait

    • dijksterhuis 2 days ago

      encryption obfuscates data, as in the data is completely illegible unless you have the proper keys

      > To make so confused or opaque as to be difficult to perceive or understand

      https://www.thefreedictionary.com/obfuscate

      obscuring data is different, it’s about hiding it from view or minimising the likelihood of it being found.

      > To make dim, indistinct, or impossible to see

      https://www.thefreedictionary.com/obscure

      they are two wholly different actions.

      > Tiered access controls obscure who can do what in the system.

      i’ve seen plenty of examples where an access control system explicitly says what role/tier is required. access control is for “trust” management (who do we trust with what).

    • monkaiju 2 days ago

      This is the most confidently incorrect post I've seen in a long time.

    • crazygringo 2 days ago

      > “Security through obscurity” is the only security there is.

      > Encryption obscures data.

      I don't think you understand what "security through obscurity" means. What encryption does is literally the opposite of obscure, in this context. It is out in the open and documented. And the same with the rest of your examples.

    • LaGrange 2 days ago

      Actually, it's just too short. To be complete, it would have to be something like "security through obscurity _OF THE MECHANISM_".

      Which basically means it was always a shit saying, like most fancy quips were.

  • TZubiri 2 days ago

    "Security by obscurity does not work"

    This is one of those false voyeur OS internet tenets designed to get people to publish their stuff.

    Obscurity is a fine strategy: if you don't post your source, that's good. If you post your source, that's a risk.

    The fact that you can't rely on that security measure alone is just a basic security tenet that applies to everything: don't rely on a single security measure; use redundant barriers.

    Truth is, we don't know how the subdomain got leaked. Subdomains can be passwords, and a well-crafted subdomain should not leak; if it leaks, there is a reason.

    • zevlag 2 days ago

      > Subdomains can be passwords and a well crafted subdomain should not leak,

      I disagree. A subdomain is not secret in any way. There are many ways in which it is transmitted unencrypted. A couple:

      - DNS resolution (multiple resolvers and authoritative servers)
      - TLS SNI
      - HTTP Host header

      There are many middle boxes that could perform safety checks on behalf of the client, and drop it into a list to be rescanned.

      - Virus scanners
      - Firewalls
      - Proxies
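
      You can watch the name leak from your own machine; e.g. tshark will show the SNI field of every outgoing TLS ClientHello in cleartext (interface name is an assumption):

          tshark -i eth0 -Y 'tls.handshake.extensions_server_name' \
            -T fields -e tls.handshake.extensions_server_name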

      • dharmab 2 days ago

        I once worked for a company which was using a subdomain of an internal development domain to do some completely internal security research on our own products. The entire domain got flagged in Safe Browsing despite never being exposed to the outside world. We think Chrome's telemetry flagged it, and since it was technically routable as a public IP (all public traffic on that IP was blackholed), Chrome thought it was a public website.

        • mkl95 2 days ago

          I saw a similar thing happen with a QA team's domains. Google flagged them as malicious and the company never managed to get them unflagged.

          • dharmab 2 days ago

            Our lawyers knew their lawyers so there was a friendly chat and we got added to an internal whitelist within Google.

      • TZubiri 2 days ago

        >It's not encrypted in transit

        Agree.

        But who said that all passwords or shibboleths should be encrypted in transit?

        It can serve as a canary for someone snooping your traffic. Even if you encrypt it, you don't want people snooping.

        To date, for the subdomains I never publish, I haven't had anyone attempt to connect to them.

        It's one of those redundant measures.

        And it's also one of those risks that you take, you can maximize security by staying at home all day, but going out to take the trash is a calculated risk that you must take or risk overfocusing on security.

        It's similar to port knocking. If you are encrypting it, it's counterproductive; it's a low-effort finishing touch, like a nice knot.

    • lolinder 2 days ago

      Truth is we don't know that the subdomain got leaked. The example user agent they give says that the methodology they're using is to scan the IPv4 space, which is a great example of why security through obscurity doesn't work here: The IPv4 space is tiny and trivial to scan. If your server has an IPv4 address it's not obscure, you should assume it's publicly reachable and plan accordingly.

      > Subdomains can be passwords and a well crafted subdomain should not leak, if it leaks there is a reason.

      The problem with this theory is that DNS was never designed to be secret and private and even after DNS over HTTPS it's still not designed to be private for the servers. This means that getting to "well crafted" is an incredibly difficult task with hundreds of possible failure modes which need constant maintenance and attention—not only is it complicated to get right the first time, you have to reconfigure away the failure modes on every device or even on every use of the "password".

      Here are just a few failure modes I can think of off the top of my head. Yes, these have mitigations, but it's a game of whack-a-mole and you really don't want to try it:

      * Certificate transparency logs, as mentioned.

      * A user of your "password" forgets that they didn't configure DNS over HTTPS on a new device and leaves a trail of logs through a dozen recursive DNS servers and ISPs.

      * A user has DNS over HTTPS but doesn't point it at a server within your control. One foreign server having the password is better than dozens and their ISPs, but you don't have any control over that default DNS server nor how many different servers your clients will attempt to use.

      * Browser history.

      Just don't. Work with the grain, assume the subdomain is public and secure your site accordingly.

      • immibis 2 days ago

        > The IPv4 space is tiny and trivial to scan

        Something many people don't expect is that the IPv6 space is also tiny and trivial to scan, if you follow certain patterns.

        For example, many server hosts give you a /48 or /64 subnet, and your server is at your prefix::1 by default. If they have a /24 and they give you a /48, someone only has to scan 2^24 addresses at that host to find all the ones using prefix::1.

        • Sayrus 2 days ago

          Assuming everyone is using /48 and binding to prefix::1, that's a 2^16 difference with scanning the IPv4 address space. Assuming a specific host with only one IPv6 /24 block and delegating /64, this is a 2^12 difference. Scanning for /64 on the entire IPv6 space is definitely not as tiny.

          AWS only allows routing a /80 to EC2 instances, which makes a huge difference.

          It doesn't mean that we should rely on obscurity, but the entire space is not tiny the way IPv4's is.

          • TZubiri 2 days ago

            Interesting, so you can see the IPv6 space as a tree and go just for the first addresses of each block.

            But if you just choose a random address you would enjoy a bit more immunity from brute-force scanners here.

        • AStonesThrow 2 days ago

          IPv6 address space may be trivial from this perspective, but imagine trying to establish two-way contact with a user on a smartphone on a mobile network. Or a user whose Interface ID (64 bits) is regenerated randomly every few hours.

          Just try leaving a User Talk page message on Wikipedia, and good luck if the editor even notices, or anyone finds that talk page again, before the MediaWiki privacy measures are implemented.

    • lyu07282 2 days ago

      > Obscurity is a fine strategy

      > Subdomains can be passwords and a well crafted subdomain should not leak

      Your comment is really odd to read; I'm not sure I understand you, but I'm sure you don't mean it like that. Just to reiterate the important points:

      1. Do not rely on subdomains for security, subdomains can easily leak in innumerable ways including in ways outside of your control.

      2. Security by obscurity must never be relied on for security but can be part of a larger defense in depth strategy.

      ---

      https://cwe.mitre.org/data/definitions/656.html

      > This reliance on "security through obscurity" can produce resultant weaknesses if an attacker is able to reverse engineer the inner workings of the mechanism. Note that obscurity can be one small part of defense in depth, since it can create more work for an attacker; however, it is a significant risk if used as the primary means of protection.

      • TZubiri 2 days ago

        It's a pretty weak CWE category.

        "The product uses a protection mechanism whose strength depends heavily on its obscurity, such that knowledge of its algorithms or key data is sufficient to defeat the mechanism."

        If you can defeat the mechanism, that's not very impactful if it's one stage of a multi-round mechanism. Especially if breaching or crossing that perimeter alerts the admin!

        Lots of uncreative blue teamers here

    • Diggsey 2 days ago

      This is the worst take...

      People consistently misuse the Swiss cheese security metaphor to justify putting multiple ineffective security barriers in place.

      The holes in the cheese are supposed to represent unknown or very difficult to exploit flaws in your security layers, and that's why you ideally want multiple layers.

      You can't just stack up multiple known to be broken layers and call something secure. The extra layers are inconvenient to users and readily bypassed by attackers by simply tackling them one at a time.

      Security by obscurity is one such layer.

      • genewitch 2 days ago

        I've heard that Swiss cheese analogy when it comes to the seasoning on a cast iron pan.

        Even if you have tons and tons of layers of seasoning, you still don't put tomato sauce or whatever on it.

      • TZubiri 2 days ago

        So according to you, a picket fence or a wire fence is just a useless thing that makes things less usable by users?

        Security does not consist only of 100% or 99.99% effective mechanisms. There needs to be a flow of information and an inherent risk; if you are only designing absolute barriers, then you are rarely considering the actual surface of relevant user interactions. A life form consisting only of skin might be very secure, but it's practically useless.

    • wolrah 2 days ago

      > "Security by obscurity does not work"

      The saying is "security by obscurity is not security" which is absolutely true.

      If your security relies on the attacker not finding it or not knowing how it works, it's not actually secure.

      Obscurity has its own value, of course. I strongly recommend running any service that's likely to be scanned for regularly on non-standard ports wherever practical, simply to reduce the number of connection logs you need to sort through. Obscurity works for what it actually offers. That has nothing to do with security, though, and unfortunately it's hard in cases where a human is likely to want to type in your service address, because most user-facing services have little to no support for SRV records.

      Two of the few services that do have widespread SRV support are SIP VoIP and Minecraft, and coincidentally the former is my day job while I've also run a personal Minecraft server for over a decade. I can say that the couple of systems I still have running public-facing SIP on port 5060 get scanned tens of thousands of times per hour while the ones running on non-standard ports get maybe one or two activations of fail2ban a month. Likewise my Minecraft server has never seen a single probe from anyone other than an actual player.
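
      For anyone unfamiliar, SRV records are what let clients find the real port; hypothetical zone-file entries (priority, weight, port, target) look like:

          _sip._udp.example.com.        IN SRV 10 5 15060 sip.example.com.
          _minecraft._tcp.example.com.  IN SRV 0  5 25565 mc.example.com.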

      • TZubiri 2 days ago

        >"If your security relies on "

        Again, if your security relies on any one thing, it's a problem. A secure system needs redundant mechanisms.

        Can you think of a single mechanism that if implemented would make a system secure? I think not.

        • genewitch 2 days ago

          Sure, a 12 gauge slug right through the processor.

          • TZubiri a day ago

            Good measure, but you may also want to keep some unslugged processors in case you need to counterattack.

            Q.E.D

    • 1970-01-01 2 days ago

      It's become an anti-cliche. Security via obscure technique is a valid security layer in the exact same way a physical lock tumbler will not unlock when any random key is inserted and twisted. It's not great but it's not terrible and it does a fine job until someone picks or breaks it open.

      • gitgud 2 days ago

        I don’t think that analogy works well, a subdomain that is not published is more like hiding the key to the front door in the garden somewhere… does a fine job of keeping the house secure until someone finds it…

        • TZubiri 2 days ago

          Terrible analogy.

          Why not use letters and packages which is the literal metaphor these services were built on?

          It's like relying on public header information to determine whether an incoming letter or package is legitimate.

          If it says: To "Name LastName" or "Company", then it's probably legitimate. Of course it's no guarantee, but it filters the bulk of Nigerian Prince spam.

          It gets you past the junk box, but you don't have to trust it with your life.

          Nuance.

      • lxgr 2 days ago

        Keeping a key secret is not security by obscurity, but keeping the existence of a lock secret is.

    • yatralalala 2 days ago

      So many thoughts on that, but from my perspective - obscurity is ok, but you can not depend on it at all.

      A great example is port knocking: it hides your open port from a random nmap scan, but would you leave it as the only mechanism preventing people from getting to your server? No. So does it make sense to have it? Well, maybe; it's a layer.

      Kerckhoffs' principle comes to my mind as well here.

      So while I agree with you that obscurity is a fine strategy, you can never depend on it alone.

      • TZubiri a day ago

        >obscurity is fine strategy, you can never depend on it ever.

        Right, I'm arguing that this is a property of all security mechanisms. You can never depend on a single security mechanism. Obscurity is no different. You cannot depend only on encryption, you cannot depend only on air gaps, you cannot depend only on obscurity, you cannot depend only on firewalls, you cannot depend only on user permissions, you cannot depend only on legal deterrents, you cannot depend only on legal threats, etc..

      • marcosdumay 2 days ago

        As long as you don't go into "nah, I have another protection barrier, I don't need the best possible security for my main barrier" mode...

        Or in other words, if you place absolutely zero trust in it, consider it as good as broken by every single script kid, and publicly known, then yeah, it's fine.

        But then, why are you investing time into it? Almost everybody that makes low-security barriers is relying on it.

    • legitster 2 days ago

      > "Security by obscurity does not work"

      Depends on the context and exposure. Sometimes a key under a rock is perfectly fine.

      I used to work for a security company that REALLY oversold security risks to sell products.

      The idea that someone was going to wardrive through your suburban neighborhood with a networked cluster of GPUs to crack your AES keys and run a MITM attack for web traffic is honestly pretty far fetched unless they are a nation-state actor.

      • natebc 2 days ago

        Realistically we get into $3 wrench territory pretty quickly too.

        • throwway120385 2 days ago

          They could also just cut and tip both ends of the Ethernet cable I have running between my house and my outbuilding too. I probably wouldn't notice if I'm asleep.

          • TZubiri 2 days ago

            Metaphor aside, this is a very standard attack surface. You don't need to imagine such a close tap; just imagine that at any point in the multi-node internet an attacker has a node and snoops the traffic in its role as a relaying router.

    • bob1029 2 days ago

      Obscurity can be fantastic.

      One of my favorite patterns for sending large files around is to drop them in a public blob storage bucket with a type 4 guid as the name. No consumer needs to authenticate or sign in. They just need to know the resource name. After a period of time the files can be automatically expired to minimize the impact of URL sharing/stealing.
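
      A minimal sketch of the pattern, assuming an S3-style bucket whose lifecycle rule (configured separately) expires objects after a few days; the names here are made up:

          NAME=$(uuidgen)
          aws s3 cp ./big-file.zip "s3://public-drop-bucket/$NAME"
          echo "https://public-drop-bucket.s3.amazonaws.com/$NAME"

      The 122 random bits in a v4 GUID are doing the same job a long random password would.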

      • genewitch 2 days ago

        Wouldn't the blob storage host be able to see your obscure file?

        I suppose if it's encrypted, no. Like the pastebin service I run, it's encrypted at rest. It doesn't even touch disks, so I mean, that's a decent answer to mine own question.

    • unethical_ban 2 days ago

      No, it's a very sensible slogan to keep people from doing a common, bad thing.

      Obscurity helps cut down on noise and low effort attacks and scans. It only helps as a security mechanism in that the remaining access/error logs are both fewer and more interesting.

      • TZubiri 2 days ago

        I definitely see its value as a very naive recommendation to keep someone from literally relying on an algorithmic or low-entropy secret. Literally something you may learn in your first class on security.

        However on more advanced levels, a more common error is to ignore the risks of open source and being public. If you don't publish your source code, you are massively safer, period.

        I guess your view on the subject depends on whether you think you are ahead of the curve by taking the naive interpretation. It's like investing in the stock market based on your knowledge of supply and demand.

    • sim7c00 2 days ago

      making things obscure and hard to find is indeed a sound choice, as long as it's not the single measure taken. i think people tout this sentence because it's popular to say, without thinking further.

      you don't put an unauthenticated thing on a difficult-to-find subdomain and call it secure. but your nicely secured page is more secure if it's also very tedious to find. it's less low-hanging fruit.

      as you state, there always needs to be a leak. but the dns system is quite leaky, and often sources won't fix it or won't admit it's even broken by their design.

      strong passwords are also insecure if they leak, so you obscure them from prying eyes, securing them by obscurity.

      • TZubiri 2 days ago

        A lot of the pushback I'm seeing is that people are assuming that you always want to make things more secure. That security is a number that needs to go up, like income or profit, as opposed to numbers that need to go down, like cost and taxes.

        The possibility that I'm adding this feature to something that would otherwise have been published on a public domain does not cross people's minds, so it is not thought of as an additional security measure, but as the removal of a security feature.

        Similarly, it is assumed that there's an unauthenticated endpoint or an authentication mechanism behind the subdomain. There may be a simple idempotent server running, such that there is no concern for abuse, but it may be desirable to reduce the code executed for random scanners that only have an IP.

        This brings me again to the competitive economic take on the subject: people believe that this wisdom nugget they hold about "security by obscurity" is a valuable tenet, and they bet on it and desperately try to find someone to use it on. You can tell when a meme is overvalued because they try to use it on you even when it doesn't fit; it means they are dying to actually apply it.

        My bet is that "Security through obscurity" is undervalued, not as a rule or law, or a definite thing, but as a basic correlation: keep a low profile, and you'll be safer. If you want to get more sales, you will need to be a bit more open and transparent and that will expose you to more risk, same if you want transparency for ethical or regulation reasons. You will be less obscure and you will need to compensate with additional security mechanisms.

        But it seems evident to me that if you don't publish your shit, you are going to have much less risk, and will need to implement fewer security mechanisms for the same risk, as compared to advertising your infrastructure and your business, duh.

    • yapyap 2 days ago

      > This is one of those false voyeur OS internet tennets designed to get people to publish their stuff.

      No it isn’t, it’s a push to get people to login protect whatever they want to keep to themselves.

      It’s silly to say informing people that security through obscurity is a weak concept is trying to convince them to publish their stuff.

      • HeatrayEnjoyer 2 days ago

        If security through obscurity didn't provide any benefit then governments wouldn't have built entire frameworks for protecting classified information.

        • ehutch79 2 days ago

          So the only thing protecting classified docs is the public not knowing where they are? That's what security through obscurity is.

          • HeatrayEnjoyer 2 days ago

            No, it's not the only thing, but it is one layer of defense in depth.

            No one is saying that obfuscation should be the only layer. Your defense should never hinge on any single protection layer.

            • ehutch79 a day ago

              So we're all agreeing here. It's ok to hide stuff from sight, but hiding stuff from sight isn't actually security and can't replace at the very least, having password protection.

    • 0hijinks 2 days ago

      Depending on one's threat model, any technique can be a secure strategy.

      Is my threat model a network of dumb nodes doing automatic port scanning? Tucking a system on an obscure IPv6 address and never sharing the address may work OK. Running some bespoke, unauthenticated SSH-over-Carrier-Pigeon (SoCP) tunnel may be fine. The adversaries in the model are pretty dumb, so intrusion detection is also easy.

      But if the threat model includes any well-motivated, intelligent adversary (disgruntled peer, NSA, evil ex-boyfriend), it will probably just annoy them. And as a bonus, for my trouble, it will be harder to maintain going forward.

      • TZubiri 2 days ago

        It's a bit more complex than that as well. You might have attackers of both types and different datapoints that have different security requirements. And these are not necessarily scalars, you may need integrity for one, privacy for the other.

        Even when considering high-sophistication attackers, and perhaps especially with regard to them, you may want to leave some breadcrumbs for them to access your info.

        If the deep state wants my company's info, they can safely get it by subpoenaing my provider's info, I don't need to worry about them as an attacker for privacy, as they have the access to the information if needed.

        If your approach to security is to add cryptography everywhere and make everything as secure as possible and imagine that you are up against a nation-state adversary (or conversely, that you add security until you satisfy a requirement commensurate with your adversary), then you are literally reducing one of the most important design requirements of your system to a single scalar that you attempt to maximize while not compromising other tradeoffs.

        A straightforward lack of nuance. It's like having a tax strategy consisting of number go down, or pricing strategy of price go up, or cost strategy of cost go down, or risk strategy of no risk for me, etc...

    • batch12 2 days ago

      Obscurity as a single control does not work. That's what the phrase hints at. In combination with other controls, it could be part of an effective defense. Context matters though.

    • lxgr 2 days ago

      The only thing you're definitely complicating with security by obscurity is getting a clear picture of your own security posture.

BLKNSLVR 3 days ago

There are a number of companies, not just Palo Alto Networks, that perform scans of the entire IPv4 space at various scales; some of them perform these scans multiple times per day.

I set up a set of scripts to log all "uninvited activity" to a couple of my systems, from which I discovered a whole bunch of these scanner "security" companies. Personally, I treat them all as malicious.

There are also services that track Newly Registered Domains (NRDs).

Tangentially:

NRD lists are useful for DNS block lists since a large number of NRDs are used for short term scam sites.

My little, very amateur, project to block them can be found here: https://github.com/UninvitedActivity/UninvitedActivity

Edited to add: Direct link to the list of scanner IP addresses (although hasn't been updated in 8 months - crikey, I've been busy longer than I thought): https://github.com/UninvitedActivity/UninvitedActivity/blob/...

  • mr_mitm 3 days ago

    Getting the domain name from the IP address is not trivial, though. In fact, it should be impossible, if the name really hasn't been published (barring guessing attempts), so OP's question stands.

    • melevittfl 3 days ago

      The OP is misunderstanding what's happened, based on what's been posted. The OP has a server with an IP address. They're seeing GET requests in the server's logs and assuming people have found the server's DNS name.

      In fact, the scanners are simply walking the IP address space and sending GET requests to any address that answers. No DNS discovery needed.

      • alfiedotwtf 2 days ago

        Are you sure that's the case? IP addresses != domain, so I'm guessing the bots are including a Host header in their requests containing the obfuscated domain.

        My guess is OP is using a public DNS server that sells aggregated user requests. All it takes is one request from their machine to a public machine on the internet, and it’s now public knowledge.

      • lxgr 2 days ago

        That entirely depends on whether the GET requests were providing the (supposed to be hidden) hostname in the `Host` header (and potentially SNI TLS extension).

    • venj 3 days ago

      I had this issue with internal domains indexed by Google. The domains were not published anywhere by my company. They were scanned by leakix.net, which apparently scans the whole web for vulnerabilities and publishes web pages containing the domain names associated with each IP address. I guess they read them from the certificates.

      • jhart99 2 days ago

        There is another source: certs showing up on a server or load balancer during the TLS handshake. When a client tries to connect without indicating a server name via SNI, some servers will reply with a default cert or even a list of valid server names.

    • okasaki 2 days ago

          $ host 209.216.230.207
          207.230.216.209.in-addr.arpa domain name pointer news.ycombinator.com.
      • dspillett 2 days ago

        That is when there is an explicit PTR record, for instance one of my assigned addresses can be named that way due to:

            74.231.187.81.in-addr.arpa. 3600 IN PTR ns2.nogoodnamesareleft.com.
        
        in the zone file for that IPv4, but unless they've explicitly configured this, or are using a hosting service that does it without asking, it won't be what is happening.

        It isn't practical to do a reverse lookup from “normal” name-to-address records like

            ns2.nogoodnamesareleft.com. IN A 81.187.231.74
        
        (it is possible to build a partial reverse mapping by collecting a huge number of DNS query results, but not really practical unless you are someone like Google or Cloudflare running a popular resolution service)
      • mr_mitm 2 days ago

        Not sure what you are trying to tell me. This isn't guaranteed to work. If you define a reverse lookup record for your domain, then that counts as published in my book.

  • yabones 2 days ago

    I do something similar. Any hits on the default nginx vhost get logged, logs get parsed out and "repeat offenders" get put on the shitlist. I use ipset/iptables but this can also be done with fail2ban quite simply.

    https://nbailey.ca/post/block-scanners/
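
    The nginx side of that is just a catch-all server block, roughly like this (log path is mine, not from the linked post):

        server {
            listen 80 default_server;
            server_name _;
            access_log /var/log/nginx/uninvited.log;
            return 444;   # nginx-specific: close the connection without responding
        }

    Anything landing in that log asked for the bare IP or a name you don't serve, so it's fair game for the ipset/iptables ban list.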

    • immibis 2 days ago

      This is security theater.

      • BLKNSLVR 2 days ago

        No, it's security by obscurity which is a single, but important, step above security theatre.

        To not appear on the radar is to not invite investigation; if they can't see the door they won't try to pry it open.

        If you're already on their radar, or if they already know the door is there (even if they can't directly see it), then it's less effective.

      • Sohcahtoa82 2 days ago

        Only kinda.

        Doing something like this can prevent you from showing up on Shodan.io which is used by many users/bots to find servers without running massive scans themselves.

  • drpossum 2 days ago

    How does an ip scan help with general DNS resolution at all?

    • BLKNSLVR 2 days ago

      They scan certain ports as well, which can provide them with 'fingerprints' as to what's running on those ports, which can then invite further investigation.

      If ports 80 or 443 are open and there's a web server fingerprint (Apache, nginx, caddy, etc) then they could use further tools to try to discover domain names etc.

parliament32 3 days ago

Certificate Transparency logs, or they don't actually know the domain name: just port-scanning[1] then making requests to open web ports.

[1] Turns out you can port-scan the entire internet in under 5 minutes: https://github.com/robertdavidgraham/masscan
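
A scan like that is basically a one-liner; something along these lines (don't point it at networks you don't own, and the exclude file is an assumption):

    masscan 0.0.0.0/0 -p80,443 --rate 1000000 --excludefile exclude.txt -oL hits.txt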

  • andix 3 days ago

    Port scanning usually can't discover subdomains. Most servers don't expose the list of domains they serve content for. In the case of HTTP, they usually only serve the subdomain's content if the Host: request header includes it.
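
    E.g. hitting the bare IP gets you whatever the default vhost serves; to reach the named vhost you have to supply the name yourself (hypothetical IP and the OP's example domain):

        curl -s -H 'Host: userfileupload.sampledomain.com' http://203.0.113.10/
        curl -sk --resolve userfileupload.sampledomain.com:443:203.0.113.10 \
          https://userfileupload.sampledomain.com/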

    • benfortuna 3 days ago

      I could be wrong, but the Palo Alto scanner says it's scanning the global IPv4 space, so it's not using DNS at all. So actually the subdomain has not been discovered at all.

      • reactordev 3 days ago

        This is exactly what’s happening based on the log snippet posted. Has nothing to do with subdomains, has everything to do with it being on the internet.

    • parliament32 3 days ago

      How deep you are in the domain hierarchy doesn't matter at the network layer: a bare TLD (yes, this exists), a normal domain, a subdomain, a sub-subdomain, etc. can all be assigned different IPs and go different places. You can issue a GET against / for any IP you want (like we see in the logs OP posted). The only time this would actually matter is if a host at an address is serving content for multiple hostnames and depends on the Host header to figure out which one to serve -- but even those will almost always have a default.

      • andix 3 days ago

        You can discover IP addresses, sure. Just enumerate them. But this doesn't give you the domain, as long as there is no reverse DNS record.

        I'm quite sure OP meant a virtual host only reachable with the correct Host: header.

    • hombre_fatal 3 days ago

      Most servers just listen on :80 and respond to all requests. Almost nobody checks the Host header intentionally; it's just a happy mistake if they use a reverse proxy.

      You can often decloak servers behind Cloudflare because of this.

      But OP's post already answered their question: someone scanned ipv4 space. And what they mean is that a server they point to via DNS is receiving requests, but DNS is a red herring.

      • andix 3 days ago

        This really depends on the setup. Most web servers host multiple virtual hosts. IP addresses are expensive.

        If you're deploying a service behind a reverse proxy, it either must be only accessible from the reverse proxy via an internal network, or check the IP address of the reverse proxy. It absolutely must not trust X-Forwarded-For: headers from random IPs.

        • hombre_fatal 3 days ago

          I just don't see how any of this matters. OP's server is reachable via ipv4 and someone sent an http request to it. Their post even says that this is the case.

          • andix 3 days ago

            I'm guessing they meant it discovered a virtual host behind a subdomain.

    • cryptonector 2 days ago

      And in the case of HTTPS they need to insist on SNI (and TLS 1.3 requires it).

  • icehawk 3 days ago

    This.

    I have a DNS client that feeds into my passive DNS database by reading CT logs and then trying to resolve the names in them.

  • giancarlostoro 3 days ago

    Last few times I tried to do this my ISP cut off my internet every time. Assholes. It comes back, but they're still assholes for it.

paxys 3 days ago

Not sure why everyone is going on about certificate transparency logs when the answer is right there in the user agent. The company is scanning the ipv4 space and came upon your IP and port.

  • p0w3n3d 3 days ago

    Finding the IP does not mean finding the domain. When making an HTTP request to an IP, you specify the domain you want to connect to. For example, you can configure your /etc/hosts to point xxxnakedhamsters.google.com at 8.8.8.8 and make the HTTP request, which will cause Google to receive the domain request (i.e. the header Host: xxxnakedhamsters.google.com) and either refuse it or try to redirect. Of course this only applies to HTTP, because HTTPS will require a certificate. That's why people are talking about certificates.

    • melevittfl 3 days ago

      But there's no evidence in the OP's post that they have, in fact, discovered the domain. The only thing posted is that there is a GET request to a listening web server.

      The OP and all the people talking about certificates are making the same assumption: namely, that the scanning company discovered the DNS name for the server and tried to connect. When, in fact, they simply iterate through IP address blocks and make GET requests to any listening web servers they find.

      • denysvitali 2 days ago

        I really doubt CloudFlare gives them an IPv4 and they can see all the logs for said IPv4

      • p0w3n3d 2 days ago

        OP states that the domain was discovered

        • crazygringo 2 days ago

          No they didn't. They said "How did the internet find my subdomain?" They're assuming the internet found their subdomain. They don't provide any evidence that that happened, just that scanners found their IP address.

    • lewiscollard 3 days ago

      Depending on the web server's configuration, you very much _can_ find the domain which is configured on an IP address, by attempting to connect to that IP address via HTTPS and seeing what certificate gets served. Here's an example:

      https://138.68.161.203/

      > Web sites prove their identity via certificates. Firefox does not trust this site because it uses a certificate that is not valid for 138.68.161.203. The certificate is only valid for the following names: exhaust.lewiscollard.com, www.exhaust.lewiscollard.com
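
      The same check from a shell: connect to the bare IP (so typically no SNI is sent) and read whatever default certificate comes back:

          openssl s_client -connect 138.68.161.203:443 </dev/null 2>/dev/null \
            | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'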

      • jchw 3 days ago

        I don't think that does you any good for Cloudflare, though. They will definitely be using SNI.

        • kelnos 3 days ago

          That doesn't really matter, though. While OP is using Cloudflare, the actual server behind it is still a publicly-accessible IP address that an IPv4 space scanner can easily stumble upon.

          • jchw 2 days ago

            I misunderstood, I thought the subdomain was an R2 bucket. If it's just normal Cloudflare proxying to some backend this is probably the most likely answer.

            That said, while I think it's not the case here, using Cloudflare doesn't mean the underlying host is accessible, as even on the free tier you can use Cloudflare Tunnels, which I often do.

            • ratg13 2 days ago

              They only state they are using Cloudflare for DNS; they didn't say whether they were proxying the connection.

              • jchw 2 days ago

                Also a valid point. I guess without more details all we can really do is speculate about the exact setup. That said, I do now agree that the most likely answer is "the underlying host was accessible and caught by an IPv4 scanner" since well, that's pretty much what it says anyway.

    • ghusto 3 days ago

      First thing I’d do for an IP that answers is a reverse lookup, so I expect that’s at least in the list of things they’d try.

    • paxys 2 days ago

      > When doing HTTP request to IP you specify the domain you want to connect to

      No, you make HTTP requests to an IP, not a domain. You convert the domain name to an IP in an earlier step (via a DNS query). You can connect to servers using their raw IPs and open ports all day if you like, which is what's happening here. Yes servers will (likely) reject the requests by looking at the host header, but they will still receive the request.

  • peeters 3 days ago

    It's rather hilarious that nobody mentioned this in 7 hours. What am I missing?

    ~5 billion scans in a few hours is nothing for a company with decent resources. OP: in case you didn't follow, they're literally trying every possible IPv4 address and seeing if something exists on standard ports at that address.

    I believe it would be harder to find out your domain that way if you were using SNI and only forwarded/served requests that used the correct host. But if you aren't using SNI, your server is probably just responding to any TLS connect request with your subdomain's cert, which will reveal your hostname.

    • Dylan16807 3 days ago

      > What am I missing?

      That it was in fact mentioned many hours earlier, in more than one top level comment.

      • peeters 3 days ago

        I was referring more to the fact that the user agent explicitly contained the answer, rather than suggestions that it was IP scanning. But you're right I do see one comment that mentions that. And many more likely assumed the OP already figured that part out.

        • Dylan16807 3 days ago

          The user agent contains a partial answer. IP scanning doesn't give you the actual subdomain, so the question is slightly wrong or there are missing pieces.

          • diggan 3 days ago

            Judging by the logs (user agents really) right now in the submission, it's hard to tell if the requests were actually for the domain (since the request headers aren't included) or just for the IP.

            • Dylan16807 2 days ago

              Yes, that's the "question being wrong" option I listed.

    • globular-toast 3 days ago

      > What am I missing?

      It's very common for people to read only up to the point where they feel they can comment, then skip immediately to the comments. So, basically, no one read it.

      • flemhans 2 days ago

        Funny, that'd be so unthinkable for me to do! But you're probably right.

    • fragmede 3 days ago

      Just the default hostname. It won't reveal all of them or any of the IP addresses of that box. secret-freedom-fighter.ice-cream-shop.example.com could have the same IP as example.com and you'd only know example.com

      • A1kmm 3 days ago

        If you've got one cert with a subject alt name for each host, they'd see them all. If you use SNI and they have different certificates, the domains might still be in Certificate Transparency logs. If a wildcard cert is used, that could help to conceal the exact subdomain.

  • pkulak 3 days ago

    Okay. But how did they get the proper host header?

    • peeters 3 days ago

      There are a couple easy possibilities depending on server config.

      1. Not using SNI, and all https requests just respond with the same cert. (Example, go to https://209.216.230.207/ and you'll get a certificate error. Go to the cert details and you'll see the common name is news.ycombinator.com).

      2. http upgrades to https with a redirect to the hostname, not IP address. (Example, go to http://209.216.230.207/ and you get a 301 redirect to https://news.ycombinator.com)
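
      Both are easy to check from a shell; e.g. for case 2, the redirect target comes straight back in the Location header (same IP as above):

          curl -sI http://209.216.230.207/ | grep -i '^location'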

    • INTPenis 3 days ago

      Could be a number of ways for example a default TLS cert, or a default vhost redirect.

      I actually had a job once a few years ago where I was asked to hide a web service from crawlers and so I did some of these things to ensure no info leaked about the real vhost.

    • jimnotgym 3 days ago

      I don't think op said that they had the correct host header?

    • paxys 2 days ago

      Who says they did?

  • 4ndrewl 3 days ago

    Also it's Palo Alto. They're not some kiddie scripters. https://en.m.wikipedia.org/wiki/Palo_Alto_Networks

  • ozim 3 days ago

    That perfectly fits the midwit meme. Lots of people are smart enough to know about transparency logs, but not smart enough to read OP's post and understand the details.

    • seba_dos1 2 days ago

      The details aren't there, so it's "assume" rather than "understand".

      The only proper response to OP's question is to ask for clarification: is the subdomain pointing to a separate IP? Are the logs vhost-specific or not?

      If you don't get the answers, all you can do is to assume, and both assumptions may end up being right or wrong (with varying probability, perhaps).

andix 3 days ago

I'm surprised nobody mentioned subfinder yet: https://github.com/projectdiscovery/subfinder

Subfinder uses different public and private sources to discover subdomains. Certificate Transparency logs are a great source, but it also has some other options.

Kikawala 3 days ago

Is it available under HTTPS? Then it's probably in a Certificate Transparency log.

  • govideo 3 days ago

    Yes, https via cloudflare's automatic https. Thanks for the info.

    • snailmailman 3 days ago

      Yeah, this is a surprisingly little-known fact: all certs being logged means all subdomain names get logged.

      Wildcard certs can hide the subdomains, but then your cert works on all subdomains. This could be an issue if the certs get compromised.

      Usually there isn’t sensitive information in subdomain names, but i suspect it often accidentally leaks information about infrastructure setups. "vaultwarden.example.com" existing tells you someone is probably running a vaultwarden instance, even if it’s not publicly accessible.

      The same kind of info can leak via dns records too, I think?

      • tialaramex 3 days ago

        > The same kind of info can leak via dns records too, I think?

        That's correct "passive DNS" is sold by many large public DNS providers. They tell you (for a fee) what questions were asked and answered which meet your chosen criteria. So e.g. maybe you're interested, what questions and answers matched A? something.internal.bigcorp.example in February 2025.

        They won't tell you who asked (IP address, etc.) but they're great for discovering that even though it says 404 for you, bigcorp.famous-brand-hr.example is checked regularly by somebody, probably BigCorp employees who aren't on their VPN - suggesting very strongly that although BigCorp told Famous Brand HR not to list them as a client that is in fact the HR system used by BigCorp.

      • Arrowmaster 2 days ago

        I had coworkers at a previous employer go change settings in CloudFlare trying to troubleshoot instead of reaching out to me. They changed the option that caused CF proxy to issue a cert for every subdomain instead of using the wildcard. They didn't understand why I was pissed that they had now written every subdomain we had in use to the public record in addition to doing it without an approved change request.

    • thisisgvrt 3 days ago

      Automated agents can tail the certificate log to discover new domains as the certs are issued. But if you want to explore subdomains manually, https://crt.sh/ is a nice tool.

    • yatralalala 2 days ago

      If your infra is set up as [Cloudflare -> your VM], I'd recommend setting the firewall on the VM so that it can only be accessed from Cloudflare.

      This way you force everyone to go through Cloudflare and get to utilize all those fancy bot-blocking features they have.
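
      A rough sketch of what that can look like with ufw, assuming the default incoming policy is already "deny" (Cloudflare publishes its ranges at https://www.cloudflare.com/ips-v4 and ips-v6):

          for net in $(curl -s https://www.cloudflare.com/ips-v4); do
            sudo ufw allow proto tcp from "$net" to any port 443
          done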

codingdave 3 days ago

If it is on DNS, it is discoverable. Even if it were not, the message you pasted says outright that they scan the entire IP space, so they could be hitting your server's IP without having a clue there is a subdomain serving your stuff from it.

  • alexjplant 3 days ago

    > If it is on DNS, it is discoverable.

    In the context of what OP is asking this is not true. DNS zones aren't enumerable - the only way to reliably get the complete contents of the zone is to have the SOA server approve a zone transfer and send the zone file to you. You can ask if a record in that zone exists but as a random user you can't say "hand over all records in this zone". I'd imagine that tools like Cloudflare that need this kind of functionality perform a dictionary search since they get 90% of records when importing a domain but always seem to miss inconspicuously-named ones.

    > Even if it were not, the message you pasted says outright that they scan the entire IP space, so they could be hitting your server's IP without having a clue there is a subdomain serving your stuff from it.

    This is likely what's happening. If the bot isn't using SNI or sending a host header then they probably found the server by IP. The fact that there's a heretofore unknown DNS record pointing to it is of no consequence. *EDIT: Or the Cert Transparency log as others have mentioned, though this isn't DNS per se. I learn something new every day :o)

    • walrus01 3 days ago

      > In the context of what OP is asking this is not true. DNS zones aren't enumerable - the only way to reliably get the complete contents of the zone is to have the SOA server approve a zone transfer and send the zone file to you.

      This is generally true, but if you watch authoritative-only DNS server logs for text strings matching ACL rejections, there are plenty of fully automated crawlers out there attempting entire zone transfers.

      There is a non-zero number of improperly configured authoritative DNS servers on the internet that will happily hand a zone transfer to anyone who asks, apparently enough of them to be useful, since somebody wrote crawlers for it. I would guess it's only a few percent of servers hosting zone files, but given the total size of the public internet, that's still a lot.

      • majke 3 days ago

        In the context of DNSSEC, DNS zones are very much enumerable. Cloudflare does amazing tricks to avoid this: https://blog.cloudflare.com/black-lies/

        • eqvinox 3 days ago

          Cloudflare themselves give more information here:

          > NSEC3 was a “close but no cigar” solution to the problem. While it’s true that it made zone walking harder, it did not make it impossible. Zone walking with NSEC3 is still possible with a dictionary attack.

          So, hardening it against enumerability is a question of inserting non-dictionary names.
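
          For the plain-NSEC case the walk is easy to see with dig. A sketch, where zone.example is a placeholder for a zone signed with NSEC rather than NSEC3:

              # the denial-of-existence (NSEC) record returned for a bogus name
              # also names the *next* real name in the zone, so you can hop
              # through the entire zone one query at a time
              dig +dnssec +noall +authority a000.zone.example A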

    • yatralalala 2 days ago

      Zone transfers are super interesting topic. Thanks for mentioning that.

      It's basically a way to get all the DNS records a name server holds for a zone. Interestingly, in some countries this is illegal, while in others it's considered best practice.

      Generally, open zone transfers are considered a misconfiguration and should be disabled.

      We did research on that a few months back and found that 8% of all global name servers have it enabled.[0]

      [0] - https://reconwave.com/blog/post/alarming-prevalence-of-zone-...

      • stwrzn 2 days ago

        That's concerning. I thought everyone knew that zone transfers should generally be disallowed, especially from random hosts.

    • fulafel 3 days ago

      In practice it's not so far-fetched: a zone transfer is just another DNS query at the protocol level; I suppose you can conceptually view it as sending a file if you consider the DNS response a file. Something like "host -t axfr my.domain ns1.my.domain" will show the zone, depending on how the domain's name server is configured (e.g. in BIND, the allow-transfer directive can be used to make it public, require an IP ACL to match the query source, etc.).
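
      The dig equivalent, if you want to check your own zone (my.domain and ns1.my.domain are placeholders):

          dig AXFR my.domain @ns1.my.domain
          # a properly locked-down server typically answers "; Transfer failed."
          # (REFUSED); a misconfigured one dumps every record in the zone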

      • elric 3 days ago

        No sensible DNS provider has zone transfers enabled by default. OP mentioned using CloudFlare, and they certainly don't.

      • alexjplant 3 days ago

        > in bind, allow-transfer directive

        Configuring BIND as an authoritative server for a corporate domain when I was a wee lad is how I learned DNS. It was and still is bad practice to allow zone transfers without auth. If memory serves I locked it down between servers via key pairs.
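
        For anyone setting this up today, BIND ships a helper for exactly that. A sketch, assuming BIND 9.11+; the key name and paths are placeholders:

            # generate a TSIG key stanza and append it to the config
            tsig-keygen -a hmac-sha256 xfer-key >> /etc/bind/named.conf.keys
            # then, in named.conf, restrict transfers to holders of that key:
            #   allow-transfer { key "xfer-key"; };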

  • dnsfax 3 days ago

    If you know what to query, sure. You can't just say "give me all subdomains"; it doesn't work that way. The subdomain was discovered via certificate transparency logs.

  • EQYV 3 days ago

    Question: How does a subdomain get discovered by a member of the public if there are no references to it anywhere online?

    The only thing I can think of that would let you do that would be a DNS zone transfer request, but those are almost always disallowed from most origin IPs.

    https://en.m.wikipedia.org/wiki/DNS_zone_transfer

  • govideo 3 days ago

    Ahh yeah, my internet network knowledge was never super strong, and now is rusty to boot. Thanks for your note.

  • paulnpace 3 days ago

    Shouldn't the web server only respond to a configured domain, else 404?

    • precommunicator 3 days ago

      Depends on whether it's configured like that; by default, usually no.

LinuxBender 3 days ago

As others have said, likely cert transparency logs. Use a wildcard cert to avoid this. They are free using LetsEncrypt and possibly a couple other ACME providers. I have loads of wildcard certs. Bots will try guessing names but like you I do not use easily guessable names and the bots never find them. I log all DNS answers. I assume cloudflare supports strict-SNI but no idea if they have their own automation around wildcard certs. Sometimes I renew wildcard certs I am not even using just to give the bots something to do.
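
For reference, a wildcard needs a DNS-01 challenge. A sketch with certbot and its Cloudflare DNS plugin (assumes the certbot-dns-cloudflare plugin is installed and ~/.secrets/cloudflare.ini holds an API token; the path and domain are placeholders):

    certbot certonly \
      --dns-cloudflare \
      --dns-cloudflare-credentials ~/.secrets/cloudflare.ini \
      -d 'sampledomain.com' -d '*.sampledomain.com'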

  • govideo 3 days ago

    I have been just relying on CloudFlare's automatic https. But I will look into my own certs, though will likely just use CloudFlare's. I don't mind the internet knowing the subdomain I posted about; was curious how the bots found it!

oliwarner 2 days ago

Certificate Transparency would also be my guess. These are logs published by big TLS certificate issuers to cross-check and make sure they're not issuing certificates for domains they have no standing on.

The way around this is to issue a wildcard for your root domain and use that. Your main domain is discoverable but your subs aren't.

There are other routes: leaky extensions, leaky DNS servers, bad internet security system utilities that phone home about traffic. Who knows?

Unless your IP address redirects to your subdomain —not unheard of— it's not somebody IP/port scanning. Webservers don't typically leak anything about the domains they serve for.

govideo 3 days ago

Thanks for everyone's perspectives. Very educational and admittedly lots outside the boundaries of my current knowledge. I have thus far relied on CloudFlare's automatic https and simple instant subdomain setup for their worker microservice I'm using.

There are evidently technical/footprint implications of that convenience. Fortunately, I'm not really concerned with the subdomain being publicly known; I was more curious how it became publicly known.

  • groestl 2 days ago

    I had to scroll pretty far down to see the first comment referring to the second most likely leak (after certificate transparency lists): some ISP sold their DNS query log, and yours was in it.

    People buying such records do so for various reasons, for example to seed some crawler they've built.

ciaovietnam 3 days ago

There is a chance that your subdomain is the first/default virtual host in your web server setup (or the subdomain's access log is the default log file), so any requests to the server's IP address get logged to this virtual host. That means they didn't access your subdomain; they accessed your server's IP address but got logged in your subdomain's access log.

  • BrandoElFollito 3 days ago

    And this is the correct answer, thank you.

    Transparency logs are fine except if you have a wildcard cert (or no https, obviously).

    IP scans are just this: scans for live ports. If you do not provide a Host header in your call, you get whatever default response was set up. This can be a default site, a 404 or anything else.
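
    One way to make that default response useless to scanners is a catch-all server block that drops IP-only and unknown-Host requests. A sketch for nginx, written as a shell snippet; the file path and throwaway self-signed cert are placeholders:

        cat > /etc/nginx/conf.d/00-default-drop.conf <<'EOF'
        server {
            listen 80 default_server;
            listen 443 ssl default_server;
            ssl_certificate     /etc/ssl/dummy.crt;  # throwaway self-signed cert
            ssl_certificate_key /etc/ssl/dummy.key;
            server_name _;
            return 444;  # nginx-specific: close the connection without a response
        }
        EOF
        nginx -t && systemctl reload nginx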

codazoda 2 days ago

This discussion makes me wonder, how hard is it to find a Google Document that was shared with "Anyone with the link"?

xg15 3 days ago

TIL (from this thread) : You can abuse TLS handshakes to effectively reverse-DNS an IP address without ever talking to a DNS server! Is this built into dig yet? :)

(Alright, some IP addresses, not all of them)

I also wonder if this is a potential footgun for eSNI deployments: If you add eSNI support to a server, you must remember to also make regular SNI mandatory - otherwise, an eavesdropper can just ask your server nicely for the domain that the eSNI encryption was trying to hide from it.

  • yatralalala 2 days ago

    Lifehack - it's especially awesome in cases where the server operator is using self-signed certs / private cert authorities, because you will not find those in public cert logs.

vince14 3 days ago

I'm having the same issue.

https://securitytrails.com/ also had my "secret" staging subdomain.

I made a catch-all certificate, so the subdomain didn't show up in CT logs.

It's still a secret to me how my subdomain ended up in their database.

  • selcuka 3 days ago

    They could be purchasing DNS query logs from ISPs.

  • johnklos 3 days ago

    Serious question: do you really think that Cloudflare is trying to keep these kinds of things private? If so, I'd suggest that's not a reasonable expectation.

    • fc417fc802 3 days ago

      Related question (not rhetorical). If you do DNS for subdomains yourself (and just use Cloudflare to point dns.example.com at your box) will the subdomain queries leak and show up in aggregate datasets? What I'm asking is if query recursion is always handled locally or if any of the reasonably common software stacks resolve it remotely.

      • immibis 2 days ago

        As well as assuming Cloudflare sells DNS lists, it's probably safe to assume the operators of public resolvers like 8.8.8.8, 9.9.9.9 and 1.1.1.1 (that is Google, Quad9 and Cloudflare again) are looking at their logs and either selling them or using them internally.

  • arccy 3 days ago

    maybe your server responded to a plain ip addressed request with the real name...

    • averageRoyalty 3 days ago

      Host header is a request header, not a response one, isn't it?

    • fc417fc802 3 days ago

      He said he used a wildcard cert though. So what part of the response would contain the subdomain in that case?

MacGyver101 2 days ago

Let me list some of the ways that precious subdomain could have been leaked

1) CZDS/DNS record sharing program

2) CT Logs

3) Browser SCT audit

4) Browser telemetry

5) DNS logs

6) DPI

7) Antivirus/OS telemetry

8) Virus/Malware/Tracker

9) Brute forcing DNS records

10) DNSSEC

11) Server softwares with AutoTLS

12) Servers screaming their hostnames over any protocol/banner thing

13) Typing anything on the browser search bar

14) Posting it anywhere

And many other novel ways I can't think of right now. I have successfully hidden some of my subdomains in the past, but it definitely requires dedication. Simple silly mistakes can make all your efforts go to waste. Ask any red/blue teamer.

Want to hide something? Roll everything on your own.

lockhead 3 days ago

Most likely passive DNS data: if you use your subdomain, you make DNS queries for it. If the DNS server you use to resolve your domains shares this data, it can be picked up by others.

perching_aix 2 days ago

Using the Certificate Transparency logs I'd imagine.

Also note that your domains are live as they're allocated (they exist). Whether a web server or anything else actually backs them is a different question entirely.

For "secret" subdomains, you'll want a wildcard certificate. That way only that will show on the CT logs. Note that if you serve over IPv4, the underlying host will be eventually discovered anyways by brute-force host enumeration, and the domain can still be discovered using dictionary attacks / enumeration.

Never touched Cloudflare so this is as far as I can help you.

andix 3 days ago

If an HTTPS service should be hard to discover, an easy way is to hide it behind a subdirectory. Something like https://subdomain.domain.example/hard_to_find_secret_string.

Another option is wildcard certificates.

This obviously can't be the only protection. But if an attacker doesn't know about a service, or misses it during discovery, they can't attack it.

8bitchemistry 3 days ago

Did you ever email the URL to somebody? We had the same issue years ago where google seemed to be crawling/indexing new subdomains it finds in emails.

  • govideo 3 days ago

    Nope, never emailed or posted to anyone. Just me (it's my solo project at the moment).

thedougd 3 days ago

Some CAs (Amazon) allow not publishing to the Certificate Transparency Log. But if you do this, browsers will block the connection by default. Chromium browsers have a policy option to skip this check for selected URLs. See: CertificateTransparencyEnforcementDisabledForURLs.
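
For managed Chrome installs that policy is just a JSON file. A sketch for Chrome on Linux (the path is the standard managed-policy location on that platform; the hostname is a placeholder, and other platforms use GPO/plists instead):

    cat > /etc/opt/chrome/policies/managed/ct-exceptions.json <<'EOF'
    {
      "CertificateTransparencyEnforcementDisabledForURLs": ["internal.sampledomain.com"]
    }
    EOF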

Some may find this more desirable than wildcard certificates and their drawbacks.

  • snailmailman 3 days ago

    Firefox is currently rolling out the same thing. They will treat any non-publicly-logged certificate as insecure.

    I’m surprised amazon offers the option to not log certificates. The whole idea is that every issued cert should get logged. That way, fraudulently-issued certs are either well documented in public logs- or at least not trusted by the browser.

    • fc417fc802 3 days ago

      It doesn't seem like the choice has any impact on that. It just protects user privacy if that's what they want to prioritize.

      Depending on the issuer logging all certs would never work. You can't rely on the untrusted entity to out themselves for you.

      The security comes from the browser querying the log and warning you if the entry is missing. In that sense declining to log a cert is similar to self signing one. The browser will warn and users will need to accept. As long as the vast majority of sites don't do that then we maintain a sort of herd immunity because the warnings are unexpected by the end user.

      • thedougd 2 days ago

        I should have included in my post, this technique only makes sense in the context of private or internal endpoints.

  • navigate8310 3 days ago

    To avoid subdomain discovery, I usually acquire the certificate at the domain level and add a wildcard SAN.

1vuio0pswjnm7 2 days ago

Why not experiment with multiple variations? For example, as part of the experiment, run your own DNS, use non-standard DNS encryption like CurveDNS, or even no DNS at all; use a non-standard port for HTTPS, a self-signed CA, TLS with no SNI extension, or even TCPCurve instead of CAs and TLS. If non-discoverability is the goal, there are infinite ways to deviate from web developer norms.

If "the internet fails to find the subdomain" when using non-standard practices and conventions, then perhaps "following the internet's recommendations", e.g., use Cloudflare, etc., might be partly to blame for the discoverability.

Would be surprised if Expanse scans more than a relatively small selection of common ports.

zeagle 2 days ago

Can I ask an adjacent question? I have a bunch of DNS A record entries like locallyaccessedservice.mydomain.tld pointing to my 10.0.0.x NAS's nginx reverse proxy, so I can use HTTPS and DNS to access them locally and via Tailscale. My cert is for *.domain.tld. It's nothing critical and only accessible within my LAN, but is there any reason I shouldn't be doing this from a security point of view? I guess someone could phish that to another globally accessible server if DNS changed and I wouldn't notice, but I don't see how that would be an issue. There are a couple of nginx services exposed to the public, but not those specific domains, so I guess that is an attack vector.

  • yatralalala 2 days ago

    As always, it depends on your threat model. Generally, having private IPs in public DNS is not great, because a potential attacker gets "a general idea" of what your private net looks like.

    But I'd say there's no issue if everything else is secured properly.

    • zeagle 2 days ago

      Great thank you. I've mulled around running separate reverse proxies for public and internal services instead.

arkfil 3 days ago

Palo Alto (network devices like firewalls, etc.) is able to scan the sites that users behind their devices want to visit. These are very popular devices in many companies. Users can also have agents installed on their computers that likewise have access to the sites they visit.

  • opello 3 days ago

    This is what I was thinking it must be, along the lines of Cisco NAC. Could monitor via browser plugin for full URLs or DNS server for domains.

    I imagine the certificate transparency log is the avenue, but local monitoring and reporting up as a new URL or domain to scan for malware seems similarly plausible.

supermatt 2 days ago

1) Are you sure that they are using the subdomain? They could be connecting via IP or an alternate host address.

2) Are you using TLS? Unless you are using a wildcard cert, then the FQDN will have been published as part of the certificate transparency logs.

daggersandscars 3 days ago

DNS query type AXFR allows for subdomain querying. There are security restrictions around who can do it on what DNS servers. Given the number of places online one can run a subdomain query, I suspect it's mostly a matter of paying the right fees to the right DNS provider.

mightybyte 2 days ago

If you've made any kind of DNS entries involving this subdomain, then congratulations, you've notified the world of its existence. There are tools out there that leverage this information and let you get all the subdomains for a domain. Here's the first one I found in a quick search:

https://pentest-tools.com/information-gathering/find-subdoma...

alberth 3 days ago

This site will find any subdomain, for any domain, so long as it previously had a certificate (ssl/tls)

https://crt.sh/
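
crt.sh also has a JSON output that's handy for scripting. A sketch (sampledomain.com is a placeholder; %25 is a URL-encoded "%" wildcard):

    curl -s 'https://crt.sh/?q=%25.sampledomain.com&output=json' \
      | jq -r '.[].name_value' | sort -u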

  • averageRoyalty 3 days ago

    This is incorrect (or at least only technically correct). It is only true for subdomains with public, trusted-CA-signed certificates issued since certificate transparency has existed, and only for subdomains with a specific, non-wildcard certificate.

  • socrateslee 3 days ago

    https://crt.sh can find your subdomain only when it doesn't have a wildcard certificate (*.somedomain.com).

  • govideo 3 days ago

    Thanks for mentioning. I checked it out, and am learning lots of new stuff (ie, realize how much I do not know).

  • nvarsj 3 days ago

    Doesn’t find any of my semi secret subdomains.

AtNightWeCode 2 days ago

Assuming this is not direct traffic to your IP, people will say it is because of TLS logs. Maybe it is in your case. But if you spin up a CF worker on a subdomain, you will also get hit by traffic immediately. And those certificates are wildcards. I think CF leaks subdomains in some cases. I've never seen this behavior when using CF just as a DNS server, though.

itscrush 2 days ago

> I am using CloudFlare for my DNS.

Based on this, it sounds like you exposed your resource and advertised it for others: reverse DNS, get IP, scan IP.

Probably simpler: you exposed a resource publicly on IPv4, and if it exists, it'll be scanned. There are probably hundreds of companies scanning the entire 0.0.0.0/0 space at all times.

melson 3 days ago

Someone might have used an open-source tool like sublist3r.

  • rawbytes 3 days ago

    oh yes that for sure

  • ackbar03 3 days ago

    yea was gonna mention this as well lol

CGamesPlay 2 days ago

Be careful with these. I had a subdomain like this (completely unlisted) with a Google OAuth flow on it, using a development mode Google app. Somehow, the domain was discovered, and Google decided that using their OAuth flow was a phishing scam, and delisted my entire toplevel domain as a result!

  • yoavm 2 days ago

    What do you mean "careful with these"? With subdomains?

    • CGamesPlay 2 days ago

      Yes, unlisted subdomains. I updated my post to be clearer.

      • joshstrange 2 days ago

        I must be missing something. What does “unlisted” mean in this context?

        I have plenty of subdomains I don’t “advertise” (tell people about online) but “unlisted” is a weird thing to call those. Also I don’t see how it would matter at all when it comes to Google auth.

        My guess is they blocked it based on the subdomain name itself. I made a "steamgames" subdomain to list Steam games I have extra copies of (from bundles) for friends to grab for free. Less than a day after I put it up I started getting Chrome scare pages. I switched it to "games" and there have been no issues.

eat 2 days ago

DNS enumeration (brute force) with a good wordlist, zone transfer, or leaking the name through a certificate served when accessing your host via IP address are all possibilities.

The name "userfileupload" is far from not-obvious, so that would be my guess.

OuterVale 3 days ago
  • govideo 3 days ago

    Interesting! Just checked them out.

    "MerkleMap gathers its information by continuously monitoring and live tailing Certificate Transparency (CT) logs, which are operated by organizations like Google, Cloudflare, and Let's Encrypt. "

    • Eikon 3 days ago

      I made this, thank you!

TZubiri 2 days ago

Maybe it's a cloudflare controlled scanner?

Maybe you published the subdomain in a cert?

Snooped traffic is unlikely.

This is a good question: if you don't publish a subdomain, scanners should not reach it. If they do, there's a leak in your infra.

Saris a day ago

>I am using CloudFlare for my DNS.

Could have been discovered from the SSL cert request for the subdomain.

ThePowerOfFuet 3 days ago

Others are saying CT logs but my own subdomains are on wildcard certificates, in which case I suspect they are discovered by DPI analysis of DNS traffic and resold, such as by Team Cymru.

bashwizard 3 days ago

Like people have said already: Certificate Transparency logs.

There are countless tools for subdomain enumeration. I personally use subfinder or amass when doing recon on bug bounty targets.

fsckboy 3 days ago

LPT, this is an object lesson in the weakness of security through obscurity

  • andix 3 days ago

    Security by obscurity can be a great additional measure for an already secure system. It can reduce attack surface, make it less likely to get attacked in the first place. In some cases (like this one) it can also be much easier to break than expected.

  • bangaladore 3 days ago

    I mean you could argue that this is more of a multi-factor authentication lesson.

    Just knowing one "secret" (a subdomain in this case) shouldn't get you somewhere you shouldn't be.

    In general you should always assume that any password has been (or could be) compromised. So in this case, more factors should be involved such as IP restricting for access, an additional login page, certificate validation, something...

ralferoo 3 days ago

If you're using HTTPS, then you're probably using letsencrypt and so your subdomain will appear on the CT logs that are publicly accessible.

One thing you could do is use a wildcard certificate, and then use a non-obvious subdomain from that. I actually have something similar - in my set up, all my web-traffic goes to haproxy frontends which forward traffic to the appropriate backend, and I was sick of setting up multiple new certificates for each new subdomain, so I just replaced them all with a single wildcard cert instead. This means that I'm not advertising each new subdomain on the CT list, and even though they all look nominally the same when visiting - same holding page on index and same /api handling, just one of the subdomains decodes an additional URL path that provides access to status monitoring.

Separately, that Palo Alto Networks company is a real pain. They connect to absolutely everything in their attempts to spam the internet. Frankly, I'm sick of even my mail servers being bombarded with HTTP requests on port 25 and the resultant log spam.

spl757 3 days ago

Does the IP address for that subdomain have a DNS PTR record set? If it does, someone can discover the subdomain by querying the PTR record for the IP.
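
Easy to check from a shell; the IP below is a documentation placeholder:

    dig -x 203.0.113.10 +short
    # prints the PTR name if one is set; for a Cloudflare-proxied record the
    # public IP is a shared Cloudflare edge address, so any PTR there would
    # not normally name your subdomain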

  • govideo 3 days ago

    If it does, I did not set it up; it would have been automatically done by CloudFlare when I told it to use my custom subdomain for the upload urls.

immibis 2 days ago

Additionally to what other people said, you can assume Cloudflare is selling lists of DNS names to someone.

f4c39012 3 days ago

CSP headers can leak urls, but I assume that isn't the cause here if the subdomain is an entirely separate project

Gabrys1 2 days ago

> Expanse, a Palo Alto Networks company, searches across the global IPv4 space multiple

So my guess is reverse DNS

nusl 3 days ago

It's pretty common to bruteforce subdomains of a domain you might be interested in, especially by attackers.

aspbee555 2 days ago

Cloudflare uses certificates with numerous other sites' names included on the certificate as alt names, so your site name could have been discovered via any other site that happens to use that same cert.

clvx 2 days ago

Put it behind IPv6 and it likely won't happen again. The address space is massive.

bbarnett 2 days ago

If you ever email a link and it hits gmail, Google will index it.

3oil3 3 days ago

What happens if you google your subdomain? Maybe the bots have some sort of dictionary files and they just run them, and when there is a match, then they append it with some .html extension, or maybe they prepend it to the match as a subdomain of it?

_trampeltier 3 days ago

Did you send a link over email, WhatsApp or something like that?

pagealert 21 hours ago

The discovery of your unpublished subdomain by bots likely stems from a combination of technical factors related to DNS, server configuration, and bot behavior. Here's a breakdown of the possible reasons and solutions:

1. DNS Leaks or Wildcard Records

Wildcard DNS entries: If your main domain (sampledomain.com) has a wildcard DNS record (e.g., *.sampledomain.com), any subdomain (including userfileupload.sampledomain.com) could be automatically resolved to your server's IP. Even if the main domain is inactive, the wildcard might expose the subdomain.

Exposed subdomain DNS records: If the subdomain's DNS records (e.g., A/CNAME records) are explicitly configured but not removed, bots could reverse-engineer them via DNS queries or IP scans.

Fix: Remove or restrict wildcard DNS entries and delete unused subdomain records from your DNS provider (e.g., Cloudflare).

2. Server IP Scanning

IP-based discovery: Bots like Expanse systematically scan IP addresses to identify active services. If your subdomain's server is listening on ports 80/443 (HTTP/HTTPS), bots may perform a port scan to detect open ports, then attempt common subdomains (e.g., userfileupload, upload, media) on the detected IP to guess valid names.

Fix: Block unnecessary ports (e.g., close 80/443 if unused) and use a firewall (e.g., ufw or Cloudflare Firewall Rules) to reject requests from suspicious IPs.

3. Cloudflare's Default Behavior

Page Rules or Workers: If the subdomain is configured with Cloudflare Workers, default error pages, or caching rules, it might generate responses that bots can crawl. For example, a 404 Not Found page with a custom message could be indexed by search engines, and Worker scripts might inadvertently expose endpoints (e.g., /_worker.js).

Fix: Delete unused subdomains from Cloudflare's DNS settings and ensure Workers/routes are only enabled for intended domains.

4. Reverse DNS Lookup

IP-to-domain mapping: If your server's IP address is shared or part of a broader range, bots might reverse-resolve the IP to discover associated domains (e.g., via dig -x <IP>).

Fix: Use a dedicated IP address for sensitive subdomains, and contact your ISP to request removal from public IP databases.

5. Authentication Flaws

Presigned URLs in error messages: If the subdomain's server returns detailed error messages (e.g., 403 Forbidden) when accessed without authentication, bots might parse these messages to infer valid endpoints or credentials.

Fix: Customize error pages to show generic messages (e.g., "Access Denied"), and log and block IPs attempting brute-force access.

How to Prevent Future Discoveries

Remove unused DNS records: delete the subdomain from Cloudflare's DNS settings entirely.

Disable wildcards: avoid *.sampledomain.com wildcards to limit exposure.

Firewall rules: block IPs from scanners (e.g., Palo Alto Networks, Expanse) using Cloudflare's DDoS protection or a firewall.

Monitor logs: use tools like grep or Cloudflare logs to track access patterns and block suspicious IPs.

Use authentication: require API keys, tokens, or OAuth for all subdomain requests.

Example Workflow for Debugging

    # Check Cloudflare DNS records for the subdomain:
    dig userfileupload.sampledomain.com +trace

    # Inspect server logs for recent requests:
    grep -E "^ERROR|DENY" /var/log/nginx/access.log

    # Block Expanse IPs via Cloudflare Firewall:
    # 1. Go to Cloudflare > Firewall > Tools.
    # 2. Add a custom rule to block IPs (e.g., from scaninfo@paloaltonetworks.com).

By tightening DNS, server, and firewall configurations, you can minimize exposure of your internal subdomains to bots.

artursapek 3 days ago

presumably it has a DNS record