Sample Results From Processing a Large Feed of Shady Covid-Type Domains

More on the Covid Threat Web Space

Introduction

There are a multitude of new sites with Covid-19 related names now in existence, and many vendors and individuals are producing lists of “shady covid domains” these days, which they distribute through various channels as a public service to help combat spam, scams, phishing, and malware attacks.

These feeds range from simple lists of newly registered domains with "covid" or "corona" or similar patterns in the name, to lists that have been processed by an AI or Machine Learning (ML) system that takes other characteristics of the registration/hosting into account, and attempts to weed out the likely legitimate sites as False Positives (FPs).
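To make the "simple list" end of that range concrete, here is a minimal sketch of a keyword filter over newly registered domains. The pattern, the function name, and the sample domains are all made up for illustration; real feeds presumably use broader term lists and additional signals.

```python
import re

# Hypothetical keyword pattern; an assumption, not any vendor's actual rule set.
COVID_PATTERN = re.compile(r"covid|corona", re.IGNORECASE)

def flag_covid_like(domains):
    """Return the domains whose name contains a Covid-type keyword."""
    return [d for d in domains if COVID_PATTERN.search(d)]

# Made-up sample of newly registered domains (reserved .example TLD).
sample = [
    "covid19-relief-fund.example",
    "coronado-surf-shop.example",
    "weather-today.example",
    "CoronaVirusMaps.example",
]

flagged = flag_covid_like(sample)
print(flagged)
```

Note that a plain substring match also catches the unrelated "coronado" domain, which is one reason the more sophisticated feeds add a filtering step to weed out FPs.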

Many of our customers then submit those lists to Symantec, with a request to either “please block all of these threats!” or “please check all of these.” This is part of our WebPulse ecosystem, where we track Web reputation, threats, and shady behavior, and send that data to various Symantec products.

Of course, no one – whether the original list creator or the Symantec Enterprise WebPulse team – can actually visit all of the sites (which totaled over 130,000 when I last saw an estimate). And even if someone did (briefly) visit each site, they would be faced with the daunting task of deciding, for those sites with Covid-19 related content, if the site is actually legitimate, or merely apparently legitimate.

Instead, we run these lists through our big AI/ML systems, to see what they think. (And we do manually visit quite a few of the sites, to see what we think.) This report will share some results that we think are representative of the lists in general.

(This is part of our contribution to cyberthreatcoalition.org, a group of thousands of researchers and volunteers working in this space. Check it out – among other things, they provide a free, well-vetted blocklist of malicious domains.)

Executive Summary

The results here come from an in-depth look at one of the large lists, but they are consistent with results we’ve seen from other lists, so we feel this is a good basic view of the current “Covid Web Threat Space”.

All of the big lists we have checked contain a relatively small number of actual threats, a slightly higher number of False Positives (FPs – legitimate sites wrongly flagged as shady), and a huge number of other domains that aren't really being visited much.

WebPulse's AI generally agrees that these latter ones should be flagged as Suspicious as a precaution, as the WebPulse dynamic voting system is already flagging most of them with elevated Risk Levels.

Note: A vetted blocklist is available HERE 

Details

A more-or-less random block of 2000 shady domains was selected as a test set. (Basically, it was the block I happened to be checking when I decided to spend more time gathering some formal statistics.) This list was produced by a Machine Learning (ML) system, so in theory it should be mostly free of FPs.

Here is how the list of 2000 domains broke down:

Def. FPs    Likely FPs    TPs (evil)    TPs (shady)    Leftovers
19          7             30            1785           149

(1) FPs: In other words, we have 19/2000 marked in our DB as pretty definite FPs (legit sites) – that is, unless there is a really sneaky Bad Guy behind one of them, or it gets hacked – and another 7 that are probably legit, but would take more time to investigate and decide. For now, I consider them to be legit sites.

Typical FPs include: legitimate “covid/corona/etc.” domains that are trying to be helpful; sites selling Covid-19 T-shirts; subdomains on various government, university, and corporate domains that are set up for their employees or customers; unrelated existing domains for things like “Coronado” and “Corona” [beer]; and attempted typosquat/brand-abuse domains registered by MarkMonitor and similar services on behalf of their corporate clients.

(Min) FP% = 1.3%   //this is a minimum percentage (keep reading)...

(2) TPs: There were 30 “True Positive” domains which already carried a WebPulse database category indicating a “high confidence bad site” (Malware, Phishing, Spam, or Scam). These are the kinds of sites that people are warning you to watch out for when they publish their lists in the first place.

(Min) Malicious TP% = 1.5%   //again, this is a minimum

Well, that doesn’t look very good so far, as we have nearly equivalent FP and TP rates – meaning that if you decided to block every site on the list, you would basically be flipping a coin on each one as to whether it was keeping you safe or getting in your way.

(3) Shady TPs: However, there were a huge number of “shady” domains (1785/2000), or 89.25%, where we have a Suspicious or Placeholder (or both) category assigned in the DB. Many of these may be intended for shady use, but it's hard to say how many. In any case, we would certainly vote to block them until further evidence comes in.

So, if we add the “shady” ones to the “evil” ones, we have a much better TP percentage:

Total TP% = 90.75%
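For readers who want to check the arithmetic, the percentages so far can be reproduced from the counts in the breakdown. This is just a sketch of the calculation; the variable names are mine, and the last figure (the share of "evil" sites among the confidently labeled ones) is my own way of expressing the coin-flip remark, not a number reported by WebPulse.

```python
TOTAL = 2000  # size of the test set

# Counts taken from the breakdown of the 2000 domains.
def_fps, likely_fps = 19, 7
evil_tps, shady_tps = 30, 1785

def pct(count, total=TOTAL):
    """Express a count as a percentage of the total."""
    return 100.0 * count / total

min_fp_pct = pct(def_fps + likely_fps)      # 26 / 2000
min_evil_tp_pct = pct(evil_tps)             # 30 / 2000
shady_pct = pct(shady_tps)                  # 1785 / 2000
total_tp_pct = pct(evil_tps + shady_tps)    # 1815 / 2000

# Among only the confidently labeled domains (evil TPs plus FPs),
# what share is actually malicious? Close to a coin flip.
labeled = evil_tps + def_fps + likely_fps
evil_share_pct = 100.0 * evil_tps / labeled

print(f"(Min) FP%           = {min_fp_pct:.2f}")
print(f"(Min) Malicious TP% = {min_evil_tp_pct:.2f}")
print(f"Shady TP%           = {shady_pct:.2f}")
print(f"Total TP%           = {total_tp_pct:.2f}")
print(f"Evil share of labeled domains = {evil_share_pct:.1f}")
```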

It’s also worth noting that the most common type of site in all of these lists, by far, is our category of “Placeholders” (i.e., either a “parking page” from the site host, or a basic “congratulations on your new site!” page, etc.).

One subgroup is the “squatters” who register domains and hope to be able to sell them to someone else later for a hefty markup. (Given the absurd number of domains in these lists, I suspect their dreams of wealth are greatly overblown in this case.)

However, most of the Placeholders are not the “hey, want to buy this domain?” type of pages, but more generic ones, and some of these could be domains that a Bad Guy is holding in reserve for a future attack. Either way, no harm if you decide to block these for now.

Leftovers

(4) What about the 149 sites marked as “Leftovers”? These sites aren’t in the DB, so they are being rated by WebPulse in real-time when someone visits them, via its “Dynamic Voting System”. And I thought it would be interesting to provide a snapshot of what that looks like:

  • 11 “Probably Shady” (fairly strong negative vote)
  • 20 “Maybe Shady” (weaker negative vote)
  • 66 “Neutral” (mixed votes)
  • 52 “Probably Benign” (weak positive vote)
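As a quick sanity check on that snapshot, the four buckets can be tallied and expressed as shares of the 149 Leftovers. This is only a sketch of the bookkeeping; the dictionary and variable names are mine, and it says nothing about how the Dynamic Voting System actually computes its votes.

```python
# Vote buckets for the Leftovers, as listed in the snapshot.
leftover_votes = {
    "Probably Shady": 11,
    "Maybe Shady": 20,
    "Neutral": 66,
    "Probably Benign": 52,
}

total_leftovers = sum(leftover_votes.values())

# Print each bucket with its share of the Leftovers.
for label, count in leftover_votes.items():
    share = 100.0 * count / total_leftovers
    print(f"{label:16s} {count:3d}  {share:5.1f}%")

# Combined negative votes (strong plus weak).
negative = leftover_votes["Probably Shady"] + leftover_votes["Maybe Shady"]
print(f"Negative votes: {negative} of {total_leftovers}")
```

The buckets add up to the full 149, and about one in five Leftovers carries some kind of negative vote.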

Of interest to our final FP percentage estimation, there are 15 of these 149 with high enough traffic levels, and weak positive votes, that they are likely to be benign. Adding these 15 to the 36 FPs above, we end up with 51 likely FPs:

FP% (revised estimate) = 2.5%

...with an unknown number of additional FPs from among all of the low-traffic, low-evidence ones we decided to flag as “Shady TPs” above. (As a “guesstimate” I’d say at least 5%, maybe as high as 10%. Note also that the FP% is expected to rise over time, as more of the Placeholder domains get populated with the owner’s intended legitimate content.)

But for now, since somebody else's ML system says they're probably shady, and our AI/ML systems essentially agree, why not flag them to keep our customers safer?

Conclusion

In going through tens of thousands of domains from these lists, and looking at the opinions of WebPulse’s Dynamic Voting System, I consider it to be doing a darn good job of dynamically identifying the shadiest domains – good candidates to be added to our Database as Suspicious.

This includes thousands and thousands of Placeholder domains – by far the most common subtype – which are also worth adding to the DB.

But there are also a lot of Good Guys out there: well-meaning cyber citizens who want to do something positive as a contribution to helping us all through the pandemic. There are dozens of sites which crunch the statistics or present maps, often focusing on their particular corner of the world. There are hundreds of “wash your hands and cover your coughs” sites, relaying reliable advice about how to prevent (or slow) the spread of Covid-19, and describing the symptoms which should prompt you to seek testing.

To keep helping our customers, we will keep sorting through the lists, separating the legitimate sites that are trying to do useful and positive things from the sites that are clearly up to no good, and from the massive group in between of sites that (for now) are basically just sitting there.

Symantec Enterprise Blogs
About the Author

Chris Larsen

Architect, Research Engineer - Symantec

Chris Larsen has decades of software development, natural language processing, and machine learning experience. At Symantec, he’s an Architect and Research Engineer on the WebPulse threat research team, and a long-time security blogger.
