You rely on analytics and metrics every day in digital marketing. But what if they were not accurate? What if they weren’t accurate because they lacked basic cybersecurity and were easily tampered with? This is what I have witnessed for the last two decades, SMH (shaking my head). And the current tech used in programmatic advertising and related analytics is no better.
Not just errors in measurement, but active tampering with metrics
Let me start with an example. Did you know attackers can easily write false data into every version of Google Analytics before version 4, which was introduced in October 2020? (Most haven’t upgraded to GA4). Why would anyone want to tamper with your Google Analytics? For starters, traffic sellers can tamper with GA to make it appear they delivered traffic to a site that purchased traffic from them, when no actual traffic was sent, thus saving time, resources, and bandwidth and increasing profit margins. Adtech companies selling remarketing services can tamper with GA to claim credit for ecommerce purchases that had already occurred, even though none of it was “caused” by their remarketing efforts. How’d they do that? Simple. They take a 20-pageview visit that resulted in a purchase and insert a 21st pageview at the beginning of that sequence to make it appear that the visit came from a click on a remarketing ad, when no ads were even run; in this way, they altered a 20-pageview “direct” visit to look like a 21-pageview visit that came from remarketing. This is why some remarketing programs that appear too good to be true are indeed too good to be true. They appear so awesome because the analytics were falsified. (Performance marketers, don’t believe everything your vendor tells you; in fact, don’t believe ANYTHING your vendor tells you. You’re not actually getting 10,000% RoAS from these campaigns.)
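To make this concrete, here is a minimal sketch of the mechanism (the property ID, page path, and referrer below are made up for illustration): the pre-GA4 Universal Analytics Measurement Protocol accepted unauthenticated hits from anyone who knew a site’s property ID, which is visible in the page source.

```python
# Minimal sketch: the (now-deprecated, pre-GA4) Universal Analytics Measurement
# Protocol accepted unauthenticated hits from anyone who knew a site's property ID.
# The property ID, page path, and referrer below are made up for illustration.
import uuid
import requests

forged_hit = {
    "v": "1",                      # Measurement Protocol version
    "tid": "UA-12345678-1",        # victim's property ID, visible in page source
    "cid": str(uuid.uuid4()),      # any client ID is accepted
    "t": "pageview",               # hit type
    "dp": "/landing-page",         # page path the attacker wants to appear in reports
    "dr": "https://remarketing-vendor.example/ad-click",  # fake referrer to claim credit
}

resp = requests.post("https://www.google-analytics.com/collect", data=forged_hit, timeout=10)
print(resp.status_code)  # the endpoint returned 2xx regardless; the forged hit shows up in reports
```

Nothing in that exchange proves the hit came from a real visitor on the real site; older GA simply recorded whatever it was sent.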
Most sites have not upgraded to GA4, which has basic security measures. Data from @deepsee.io shows that among the top 1 million domains, about half use GA and about 1 in 4 (23% = 18.1% + 4.8%) have GA4; if we look into the long tail, across 5.4 million root domains, about 19% (1 in 5) have GA4. So roughly 4 in 5 sites running Google Analytics are still on older versions that are vulnerable to having false data written into them.
In the last article I talked about errors and mistakes in measurement. Data showed that between 1/3 and 3/4 of ads were never served, never downloaded to the device, or never rendered on screen, despite the ad buyer paying for the bids won. Would you simply trust the accuracy of the numbers reported by the ad platform you buy ads from, without any means to independently verify the data yourself? Of course, some of those errors were “unintentional,” but obviously you should be more careful with important stuff like the metrics that determine what you pay for, right? See: Advertisers’ Own Ad-Server-of-Record — Why and How? In this article I am not talking about errors, but about active tampering with the metrics: how prevalent it is and how easily it occurs, due to the lack of basic security mechanisms.
Lack of basic cybersecurity and the ease of tampering with metrics
As far back as the beginning of digital advertising, ad buyers relied on metrics like traffic counts to determine which sites they wanted to buy ads from (high-traffic ones) and how much they would pay. So clever fraudsters used simple techniques to inflate traffic counts. This was as simple as taking the same comScore beacon and sticking it on hundreds of satellite sites, in order to inflate the traffic of a single main site. The lack of a basic security mechanism — checking which domain the beacon was actually installed on — meant that fraudsters could trivially manipulate comScore traffic counts. Thousands of advertisers relied on comScore traffic counts and paid millions of dollars for their data, services, and reports without realizing how inaccurate the data was because of how easily it was falsified. In the “programmatic era” (since 2012), brand new websites with no actual traffic could easily falsify their own Google Analytics to make it appear they had tons of traffic, in order to trick ad exchanges into letting them onto the exchange. As you can already see, the lack of basic cybersecurity and the ease of tampering with metrics have wide-ranging effects, not least of which is making criminals’ lives very easy for the last two decades of digital advertising.
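For illustration, here is a minimal sketch of the missing check, assuming a hypothetical registry that maps each beacon/site ID to the domain it was issued for. The referrer is itself declared by the client, but even this simple cross-check would have caught the copy-paste-the-beacon scheme described above.

```python
# Minimal sketch of the missing check: when a traffic beacon fires, verify that
# the page sending it actually belongs to the site ID it is reporting for.
# The registry and field names here are hypothetical, for illustration only.
from urllib.parse import urlparse

REGISTERED_DOMAINS = {
    "site-001": "mainsite.example",   # domain this beacon ID was issued for
}

def beacon_hit_is_plausible(site_id: str, referer_header: str) -> bool:
    registered = REGISTERED_DOMAINS.get(site_id)
    if not registered or not referer_header:
        return False
    actual_host = urlparse(referer_header).hostname or ""
    # Accept the exact domain or its subdomains; reject satellite sites
    return actual_host == registered or actual_host.endswith("." + registered)

# A beacon copied onto a satellite site would fail this check:
print(beacon_hit_is_plausible("site-001", "https://satellite-77.example/page"))  # False
print(beacon_hit_is_plausible("site-001", "https://www.mainsite.example/home"))  # True
```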
Turning our attention to the tech used in programmatic advertising, we see the same lack of security mechanisms and basic checks that would have prevented the most basic and trivial forms of fraud. Take domain spoofing, for instance. This is where a fake site can “say” it is some other recognizable domain in order to get bids. All it has to do is falsify the domain it writes into the bid request — i.e. fakesite123.com can simply write marthastewart.com into the bid request. (This is also how Breitbart gets around your domain block lists.) Any layperson with a shred of common sense would ask why no one checks that the domain written into the bid request is the domain from which the bid was actually requested. Right?? Right???? But, alas, header bidder code doesn’t even do this basic check, and domain spoofing remains rampant. Look at the quantities of faked bid requests in the table below (marked as “unknown or unpermissioned”) for top domains and apps — nearly 400 billion faked bid requests pretending to be yahoo.com, and another 200 billion for mail.yahoo (who still uses Yahoo Mail?) on a 30-day basis. Similarly, if the ad exchanges checked that the sellerID matched the domain, they’d be able to easily pick out domain spoofing; but alas, why lop off 50% of your own volume when no one is complaining or even asking the exchange to check this?
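Here is a minimal sketch of that basic check, assuming the exchange or header bidding wrapper has an independently observed page URL (for example, the Referer on the ad call) to compare against; the field names follow OpenRTB conventions and the domains are made up.

```python
# Minimal sketch of the missing check: does the domain declared in the bid
# request match the page the request actually came from? The observed page URL
# must come from an independent source (e.g., the Referer on the ad call),
# not from the bid request itself. All values below are made up.
from urllib.parse import urlparse

def normalize(host: str) -> str:
    host = (host or "").lower()
    return host[4:] if host.startswith("www.") else host

def declared_domain_matches_source(bid_request: dict, observed_page_url: str) -> bool:
    declared = normalize(bid_request.get("site", {}).get("domain", ""))
    actual = normalize(urlparse(observed_page_url).hostname or "")
    return bool(declared) and (actual == declared or actual.endswith("." + declared))

# A spoofed request: fakesite123.com writes a premium domain into the bid request
spoofed = {"site": {"domain": "marthastewart.com", "page": "https://fakesite123.com/article"}}
print(declared_domain_matches_source(spoofed, "https://fakesite123.com/article"))  # False

# An honest request declaring its own domain passes
legit = {"site": {"domain": "fakesite123.com", "page": "https://fakesite123.com/article"}}
print(declared_domain_matches_source(legit, "https://fakesite123.com/article"))  # True
```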
Further, because everything in the bid request is declared (not detected), any fraudster can say anything they want in any of the variables. That means the data is easily falsified and unreliable. Any fraud detection tech that uses this unreliable data as input is also completely useless. Every vendor selling pre-bid fraud detection is essentially selling you snake oil. They may check that the domain in the bid request is not a known fraudulent domain; but fraudsters pass mainstream publishers’ domains by lying in the bid request. The vendor may check for IP addresses that are known to be part of botnets; but fraudsters rotate IP addresses, dump ones that are blacklisted and get new ones, or route traffic through proxy services to disguise the IP addresses. Pre-bid detection vendors may even check whether a cookie is from a known bot; but bots simply dump those non-working cookies and get new ones to easily avoid this detection. So pre-bid fraud filtering is like bringing a sieve to try to plug a dam burst. I can’t think of any other way to put it kindly. A well-known fraud detection firm claims to look at 15 trillion bid requests per week; when I asked them how many they actually catch and filter, they refused to answer. (Hint: it’s because they let most of it through.) Don’t waste your money on pre-bid fraud detection. It is entirely useless because the input data from the bid request is entirely declared, unreliable, and easily falsified.
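To illustrate why, here is a sketch of a pre-bid filter that inspects only declared fields (the blocklists and field values are made up); since the fraudster writes every one of those fields, “passing” is simply a matter of declaring clean-looking values.

```python
# Minimal sketch of a pre-bid filter that only looks at declared fields.
# Every field it inspects is set by whoever constructs the bid request,
# so "passing" the filter is just a matter of choosing clean-looking values.
BLOCKED_DOMAINS = {"known-fraud-site.example"}
BLOCKED_IPS = {"203.0.113.7"}
BLOCKED_COOKIES = {"bot-cookie-123"}

def prebid_filter_passes(bid_request: dict) -> bool:
    return (
        bid_request.get("site", {}).get("domain") not in BLOCKED_DOMAINS
        and bid_request.get("device", {}).get("ip") not in BLOCKED_IPS
        and bid_request.get("user", {}).get("id") not in BLOCKED_COOKIES
    )

# A fraudster simply declares a mainstream domain, a fresh IP, and a brand-new
# cookie -- and sails through.
forged = {
    "site": {"domain": "marthastewart.com"},
    "device": {"ip": "198.51.100.42"},
    "user": {"id": "fresh-cookie-456"},
}
print(prebid_filter_passes(forged))  # True
```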
Anything declared should be treated with an abundance of skepticism.
What if they tell you they do some verification?
Cool, cool. But WHAT are they verifying, and is it useful verification? In many cases, it’s not. Let me show you a long-running example from mobile advertising. Some platforms will say they check for mobile deviceIDs to ensure ads are served to mobile devices. But did they check whether the deviceIDs were real ones? On the left side of the chart below, fraudsters can easily pass random deviceIDs and get ads to serve. If the platform or fraud detection vendor just checks for the presence of a deviceID, or doesn’t check whether the IDs are real, then random IDs get through. What if they did check that the deviceIDs were real, by verifying there was a real phone number and SIM card associated with each one? Fraudsters actively get around these checks by replaying real deviceIDs that were previously verified to be real. A basic security check, verifying that the deviceID passed in the bid request matches the device that actually made the call, would have prevented this form of fraud.
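Here is a minimal sketch of what a presence/format-only check amounts to (IDFA/GAID values look like UUIDs); a randomly generated ID sails through, and a replayed real ID would pass even a lookup against known-real IDs.

```python
# Minimal sketch: a "check" that only validates the *format* of a mobile
# device ID (IDFA/GAID look like UUIDs). Randomly generated IDs pass it,
# and replayed real IDs pass even a lookup against a list of known-real IDs.
import re
import uuid

UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I
)

def looks_like_device_id(device_id: str) -> bool:
    return bool(UUID_RE.match(device_id))

print(looks_like_device_id(str(uuid.uuid4())))  # True -- a fabricated, random ID passes
# Tying the ID to the device that actually made the call (the check described
# above as missing) is exactly what this kind of validation cannot do.
```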
The same thing happened with ads.txt. Exchanges checked for the PRESENCE of ads.txt files, but they didn’t check the CONTENTS of the ads.txt files and cross-reference the data against sellers.json. If they did, then half of the bid requests would be discarded for “no match” or “mismatch.” No exchange is going to cut off half of its own volume if no buyers are complaining or even asking about this.
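For illustration, here is a minimal sketch of that cross-check against ads.txt (the domain and seller ID values are made up; real entries also carry a DIRECT/RESELLER relationship field, and a fuller check would also confirm the seller appears in the exchange’s sellers.json).

```python
# Minimal sketch of the cross-check described above: fetch the publisher's
# ads.txt and confirm the (exchange, seller account ID) pair in the bid
# request is actually authorized. Domain and seller ID values are made up.
import requests

def fetch_ads_txt_entries(domain: str) -> set:
    entries = set()
    resp = requests.get(f"https://{domain}/ads.txt", timeout=10)
    for line in resp.text.splitlines():
        line = line.split("#", 1)[0].strip()                  # drop comments
        parts = [p.strip().lower() for p in line.split(",")]
        if len(parts) >= 3:
            entries.add((parts[0], parts[1]))                  # (ad system domain, seller account id)
    return entries

def seller_is_authorized(declared_domain: str, exchange_domain: str, seller_id: str) -> bool:
    return (exchange_domain.lower(), seller_id.lower()) in fetch_ads_txt_entries(declared_domain)

# Example (values are illustrative): a bid request claiming to be example-news.com,
# sold through exchange.example with seller account 12345
# print(seller_is_authorized("example-news.com", "exchange.example", "12345"))
```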
What if the verification vendors were stupid?
Am I being unkind to verification vendors? No, not if they are actually exposing themselves — i.e. labeling something as a bot or not directly on the page, in the code, or in the bid request, with no encryption or obfuscation. The bid request or the code on the page literally says IVT=1 or IVT=0. This means the bad guys can easily A/B test their bots against these verification vendors’ tech to make sure they all get marked IVT=0 (not a bot) and the ad gets served. Would you call this lack of basic security anything but stupid? I don’t think they can claim it’s an “oopsies” after ten years of selling millions of dollars of fraud detection services fraudulently to ad buyers; the tech didn’t and doesn’t work.
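To show how little effort that A/B testing takes, here is a sketch that reads an exposed verdict straight out of a measurement URL; the parameter name “ivt” and the domain are illustrative, not any particular vendor’s.

```python
# Minimal sketch of why an unobfuscated verdict is self-defeating: if the
# verdict is written in the clear, a bot operator can read it directly and
# keep tweaking the bot until every variant is labeled "not a bot".
# The parameter name "ivt" and the domain are illustrative only.
from urllib.parse import urlparse, parse_qs

def read_plaintext_verdict(measurement_url: str):
    params = parse_qs(urlparse(measurement_url).query)
    return params.get("ivt", [None])[0]

print(read_plaintext_verdict("https://verifier.example/log?campaign=123&ivt=0"))  # "0" -> ad gets served
```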
What kind of stupid would you call the screenshot above from the VAST ad serving standard: calling static images like firstQuartilePixel.gif, midpointPixel.gif, thirdQuartilePixel.gif, or completePixel.gif to indicate how much of the video ad was watched? Which pixel do you think fraudsters would call to make it appear that all of the video ads were viewed to completion? Right, the completePixel.gif. Why do you think so many video ads appear to be viewed to completion? The metrics were falsified to look great because there was no basic security mechanism to prevent anyone from doing this. It is trivial for criminals to falsify all of these metrics. Yes, some parties do use other methods to verify — for example, completePixel.gif should not be called 1 second after a 30-second video ad started playing; if it is, clearly the ad was not completely viewed. But how many verification vendors are doing this? Ask them to prove it to you.
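Here is a minimal sketch of that timing sanity check (the timestamps and tolerance value are illustrative): a “complete” fired one second after “start” on a 30-second creative cannot be a real completion.

```python
# Minimal sketch of the timing sanity check described above: a "complete"
# tracking pixel fired only 1 second after the "start" pixel for a 30-second
# creative cannot be a real completion. Values are illustrative.
def completion_is_plausible(start_ts: float, complete_ts: float, ad_duration_s: float,
                            tolerance_s: float = 2.0) -> bool:
    elapsed = complete_ts - start_ts
    return elapsed >= ad_duration_s - tolerance_s

print(completion_is_plausible(start_ts=0.0, complete_ts=1.0, ad_duration_s=30.0))   # False
print(completion_is_plausible(start_ts=0.0, complete_ts=29.5, ad_duration_s=30.0))  # True
```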
Basic security measures are essential to ensuring the accuracy of the metrics that we rely on in digital advertising; not enough are doing these basic things. See: Why Does Digital Marketing Appear to Perform So Well and Fraud Appear So Low?
The vast weakness of unencrypted urls and unobfuscated code
How do you get click-through rates greater than 100% on ads? Bots copying the click-through url and just repeatedly loading it, without any further ad impressions being served. These urls are plain text and all the variables like “utm_source=” can be read, copied, and falsified. Bad guys are literally mocking us by deliberately passing “utm_source=netflix.com” when Netflix does not run any programmatic ads. How do bad guys commit fraud at scale without even setting up websites, running botnets, or paying for bandwidth to serve actual webpages? They simply make naked ad calls. The following code block is a naked ad call for one of my own ads from my experimental campaign. Copy and paste it into any browser (make sure ad blocking is off) and you can load just my ad, with no corresponding webpage. Fraudsters save a lot of bandwidth and time by not having to load webpages; their bots just load the ads and get paid the CPM. Note in the code below, you can see the domain being passed — grunge.com. This ad never actually ran on that site; you just invoked it by pasting a url into your browser. This was possible because the url had no encryption and everything was in the clear. This makes it trivial for bad guys to do this kind of fraud.
Every affiliate link is an unencrypted url with unobfuscated parameters. Every click tracker url is in the clear, plain text. Mobile fraudsters ripped off Uber to the tune of millions of dollars by falsifying the data going into attribution platforms. They copied the click-through urls and repeatedly loaded them so they would be the “last click” and get credit for the cost per install. The examples of insecure ad tech could go on forever. I will end with this one. Fraudsters are so optimized now, they don’t even need to make naked ad calls like the above. They can simply construct bid requests using python scripts, flood the exchange endpoints, and make money. Five separate cases illustrate this form of large-scale fraud over the years: 1) Sports Bot in 2017 was no bot at all, just billions of faked bid requests saying they came from the domains of famous sports teams and sports leagues; 2) 404bot in 2020 was no bot at all, just trillions of faked bid requests that passed urls of non-existent webpages (404 error in server speak); 3) researcher C documented a vast botnet replaying click-through urls going to an automotive site without running any ads; 4) researcher D found fraudsters generating high-frequency faked bid requests sent to CTV endpoints, rotating amongst thousands of CTV app names, streaming device types, and residential IP addresses; and 5) researcher E tricked ad servers into serving ads by constructing entirely falsified bid requests in header bidder code, complete with IVT=0 (not a bot) and viewability=100% (yay!).
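To make the weakness concrete, here is a small sketch, using a made-up click-tracker URL and parameter names, showing how anyone holding such a plain-text URL can read its parameters, rewrite them, and replay the result.

```python
# Minimal sketch of the weakness described above: because tracking URLs are
# plain text, every parameter can be read and rewritten by whoever holds the
# URL. The tracker domain and parameters here are made up for illustration.
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse

original = "https://clicktracker.example/track?utm_source=realpartner.com&campaign=summer"

parts = urlparse(original)
params = parse_qs(parts.query)
params["utm_source"] = ["netflix.com"]          # nothing stops anyone from rewriting this
forged = urlunparse(parts._replace(query=urlencode(params, doseq=True)))

print(forged)
# https://clicktracker.example/track?utm_source=netflix.com&campaign=summer
# Replaying a click-through URL like this over and over, with no ad ever served,
# is all it takes to become the "last click" and claim the credit.
```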
“organized crime syndicates are harvesting money from digital advertising every year as easily as turning on a faucet”
The lack of the most basic cybersecurity mechanisms has made ad fraud a veritable walk in the park for criminals, from script kiddies to organized crime syndicates, harvesting money as easily as turning on a faucet. I’ll end by saying that I won’t be mean to the hard-working folks in ad tech. They mean well, but it’s hard to fix a 10-year-old plane that is still in flight. There is so much legacy code that new engineers literally have no idea where to even look for the problems, let alone how to fix them without crashing the plane entirely. They are also so busy putting out daily fires that they don’t have the luxury of time to go back and do a code review for functionality, let alone security.
So What?
So, what should you do? Don’t assume the metrics have any security mechanisms in place; ask the vendors to show you and prove it to you. Don’t assume the data has not been tampered with; bad guys have both the motive and the means to easily tamper with unsecured data and unencrypted urls and parameters. Double-check the data yourself; ask more questions; use common sense to see whether things make sense.
And use an analytics platform with security, like FouAnalytics, if you want. See: Cybersecurity Measures Built Into FouAnalytics. And serve your own ads, so you control the source of truth and have enough supporting data to understand whether the metrics are reasonably correct, because basic security mechanisms are in place to prevent tampering. See: Advertisers’ Own Server-of-Record, Operated by FouAnalytics