Vote fraud theory: Hammer, Scorecard, and NY Times json files

background

I've been presented with another theory of fraud in the 2020 Presidential election. Have a look at the video, which compares snapshots of the election results as extracted from NY Times .json files to "prove" that votes were switched and lost. This is done via some python code we'll take a look at below. The video walks through this Natural News post in a very lively and shout-splaining fashion. Bold claims are made, including:

  • NY Times .json files are analogous to a "Flight Data Recorder (black box)" when it comes to being official transaction logs of election night results
  • over 500k votes were switched from Trump to Biden
  • Bannon and Giuliani have confirmed whistleblowers from Dominion, a ballot counting software company (that was on 11/11; the pattern is always the same: dangle some mind-blowing tidbit and then we never hear about it again. Nevertheless, for weeks I'll hear people citing "those Dominion whistleblowers they found")
  • a fraud simulator (spoiler: it just switches key states when you toggle between fraud and no fraud scenarios, with no explanation of how it's calculating anything at all)
  • Trump and DoD personnel watched this fraud happen in real time, have it recorded, and allowed the fraud to happen so they'd have proof (but still filed lawsuits in five states, just to withdraw from two)

In any case, we'll be focused on the data analysis, and the short answer is: no, there's nothing here, folks.

explanation

The gist of the theory is simple:

  • the NY Times stores the data from its live election night updates in json files
  • a person on the internet downloaded those and found irregularities
  • because the files must be ground truth, tied directly to ballot counting machines via infallible link, these irregularities imply fraud

There are two key legs this theory stands on. The first is a huge leap, in my opinion: that the NY Times data is to be treated as ground truth for anything. The news page adds some handwaving "truth boost" by noting that the data is from Edison Research, which is also used by many news organizations… but again, where do they get it? Why do we think that Edison Research somehow equals "wired into the mainframe" here? Other sources on election night were wrong and had to be corrected; why are these magical .json files infallible?

Even the author of this code admits to a fatal flaw:

[screenshot: the code's author acknowledging this flaw]

Just in case it's the .json files themselves that somehow raise credibility: if you aren't familiar, they are literally just another way to store data. There's nothing magical about them. If the analysis had downloaded all the Times' Excel or .csv files, the situation would be identical.
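To make that concrete, here's a trivial sketch (a made-up record, not the Times' actual schema) showing the same snapshot serialized as JSON and as CSV; the container format changes nothing about the data or its credibility:

import csv
import io
import json

# a hypothetical snapshot record, purely for illustration
snapshot = {'timestamp': '2020-11-04T01:51:26Z', 'votes': 573857,
            'vote_shares': {'trumpd': 0.578, 'bidenj': 0.401}}

# the same information as JSON...
print(json.dumps(snapshot))

# ...and flattened into CSV; neither is more "official" than the other
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['timestamp', 'votes', 'trumpd', 'bidenj'])
writer.writerow([snapshot['timestamp'], snapshot['votes'],
                 snapshot['vote_shares']['trumpd'], snapshot['vote_shares']['bidenj']])
print(buf.getvalue())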

The second key premise is that these findings are at all meaningful or indicative of something specific. Let's look at the example used in their primer section:

[screenshot: the primer example from the post, two consecutive NY Times snapshots for Michigan]

The data is as follows, with columns added for candidate votes (total votes * %):

time                  total votes   trump, %   trump, votes   biden, %   biden, votes
2020-11-04 01:51:26        573857      0.578         331689      0.401         230117
2020-11-04 01:51:52        574417      0.568         326269      0.406         233213
delta                         560     -0.010          -5420      0.005           3096

Assuming the first premise (the perfect truth of files used for news article visualizations), this does look odd. In ~30 seconds, the total votes went up, but somehow one of the candidates lost ~10x that many, and the other gained ~5x that many, with the net exchange being way off from the change in total votes.
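If you want to reproduce the candidate-vote columns and deltas in that table yourself, it's just share times total, rounded to whole votes:

# the two consecutive snapshots from the table above
then = {'votes': 573857, 'trumpd': 0.578, 'bidenj': 0.401}
now = {'votes': 574417, 'trumpd': 0.568, 'bidenj': 0.406}

for cand in ('trumpd', 'bidenj'):
    v_then = round(then['votes'] * then[cand])   # 331689 / 230117
    v_now = round(now['votes'] * now[cand])      # 326269 / 233213
    print(cand, v_now - v_then)                  # -5420 / +3096

print('total delta:', now['votes'] - then['votes'])  # +560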

To apply this to all the data, someone wrote code which reads through all these snapshots and looks for these instances, tallies them all up, and spits out summary values per state, leading to these kinds of magnitudes across various states:

[screenshot: the post's per-state tally of "switched" and "lost" votes]

Is it true? At least the presenter repeatedly welcomed folks to try for themselves, so I did.

Before we move on, I want you to try something. You now know the angle this theory takes and have seen some concrete numbers. In Pennsylvania, votes were "switched" and "lost" to the tune of 220k and 940k, respectively. In your mind, or better yet on paper, sketch what a plot of y=votes vs. x=time might look like for a state with 6.86 million total votes (3% switched, 13% lost during election night).

digging in

I'd already been playing with voting data pulled from the Times, so it was a relatively simple matter to get all the timestamped files for the state and county results. I'm leaving some code out for brevity (like purging non-states from the dict of abbreviations used as states below), but this will get you close.

import data.us_state_abbrev as states
import json
import pandas as pd
import plotnine
import requests

from plotnine import *

### get all NY times data files
base_url = 'https://static01.nyt.com/elections-assets/2020/data/api/2020-11-03/race-page'
for state in states.us_state_abbrev.keys():
    state_code = state.replace(' ', '-')
    with open('./data/pres_ts_2020_nyt/{}.json'.format(state), 'wb') as f:
        f.write(requests.get('{}/{}/president.json'.format(base_url, state_code.lower())).content)

Next, we run through all the files and grab metadata and each snapshot, along with total votes and candidate percentages at that time:

def proc_results(state):
    with open('./data/pres_ts_2020_nyt/{}.json'.format(state), 'r') as f:
        dat = json.loads(f.read())

    results = [{
	'state': r['state_name'],
	'time': t['timestamp'],
	'votes_tot': t['votes'],
	'cand': c,
	'perc': v}
	for r in dat['data']['races'] for t in r['timeseries'] for c, v in t['vote_shares'].items()]

    return results

results = [proc_results(state) for state in states.us_state_abbrev.keys()]
results = [pd.DataFrame(r) for r in results]
df = pd.concat(results)

Lastly, we do some post-processing and add calculated columns.

df['time'] = pd.to_datetime(df['time'])
df['votes'] = (df['votes_tot'] * df['perc']).astype(int)
df['i'] = df.groupby(['state', 'cand'])['votes'].transform(lambda x: range(len(x)))
df = df.sort_values(['state', 'cand', 'time'])
df['votes_perc'] = df.groupby(['state', 'cand'], as_index=False)['votes'].transform(lambda x: x/x.max())
print(df)
###
       state                      time  votes_tot    cand   perc   votes  votes_perc   i
3    Alabama 2020-11-04 01:23:07+00:00       1053  bidenj  0.585     616    0.000731   0
5    Alabama 2020-11-04 01:28:39+00:00       2190  bidenj  0.602    1318    0.001563   1
7    Alabama 2020-11-04 01:29:35+00:00       3709  bidenj  0.489    1813    0.002150   2
9    Alabama 2020-11-04 01:32:46+00:00       4280  bidenj  0.474    2028    0.002405   3
11   Alabama 2020-11-04 01:40:51+00:00       5249  bidenj  0.476    2498    0.002963   4
..       ...                       ...        ...     ...    ...     ...         ...  ..
92   Wyoming 2020-11-11 20:35:54+00:00     276630  trumpd  0.700  193641    0.999819  46
94   Wyoming 2020-11-11 20:36:41+00:00     276648  trumpd  0.700  193653    0.999881  47
96   Wyoming 2020-11-11 20:37:31+00:00     276679  trumpd  0.700  193675    0.999995  48
98   Wyoming 2020-11-11 20:37:50+00:00     276681  trumpd  0.700  193676    1.000000  49
100  Wyoming 2020-11-11 21:13:55+00:00     276765  trumpd  0.699  193458    0.998874  50

If you haven't used plotnine, I highly recommend it. It's the python equivalent to ggplot2 in R, and I just don't think I'll ever be able to live without facet plots again.

p = ggplot(df, aes(x='time', y='votes', color='cand', group='state+cand')) + geom_line() + facet_wrap('~state', ncol=10)
p = p + scale_color_manual(breaks=['bidenj', 'trumpd'], values=['blue', 'red'], guide=False)
p = p + scale_y_continuous(name='delta')
p = p + scale_x_datetime(name='timestamp')
p = p + theme_minimal() + theme(text=element_text(family='Hack'),
				plot_background=element_rect(fill='white'),
				axis_text_x=element_text(angle=315, hjust=0))
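One usage note: the code above only builds the plot object. In plotnine, printing the object renders it, and .save() writes it to disk (the filename and dimensions below are just placeholders):

print(p)  # render interactively / in a notebook
p.save('all_states.png', width=20, height=12, dpi=150)  # hypothetical output path and size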

[figure: votes vs. time for each candidate, faceted by state]

What do you see? Per our earlier exercise, when I did this last night, despite already thinking NYT json reliability was suspect, this certainly didn't look like what I expected. The video is 18 minutes of yelling about the incontrovertible fact that this shows election thievery! I see… a bunch of dips followed by recovery, and then all is well? Let's pick two and take a closer look:

[figure: votes vs. time for Florida and Pennsylvania]

Hmmm. Let's look at the area around zero.

df_sub = df.loc[df['state'].isin(['Florida', 'Pennsylvania'])]
print(df_sub[df_sub['state']=='Florida'].head())
print(df_sub[df_sub['state']=='Pennsylvania'].head())
print(df_sub[df_sub['votes'] == 0])
print(df_sub.groupby('state', as_index=False).agg({'time': 'min', 'votes': 'first'}))
###
# first 5 rows for FL
      state                      time  votes_tot    cand   perc   votes  i  votes_perc
3   Florida 2020-11-04 00:02:01+00:00     560771  bidenj  0.428  240009  1    0.045334
5   Florida 2020-11-04 00:02:42+00:00     844914  bidenj  0.434  366692  2    0.069263
7   Florida 2020-11-04 00:03:26+00:00    1004171  bidenj  0.435  436814  3    0.082507
9   Florida 2020-11-04 00:05:44+00:00    1457441  bidenj  0.467  680624  4    0.128559
11  Florida 2020-11-04 00:07:05+00:00    1515932  bidenj  0.461  698844  5    0.132001

# first 5 rows for PA
	   state                      time  votes_tot    cand   perc  votes  i  votes_perc
2   Pennsylvania 2020-11-04 00:19:27+00:00         77  bidenj  0.571     43  1    0.000013
5   Pennsylvania 2020-11-04 00:30:20+00:00          0  bidenj  0.000      0  2    0.000000
6   Pennsylvania 2020-11-04 00:42:45+00:00         11  bidenj  0.636      6  3    0.000002
9   Pennsylvania 2020-11-04 00:59:15+00:00          0  bidenj  0.000      0  4    0.000000
10  Pennsylvania 2020-11-04 01:03:44+00:00      64535  bidenj  0.799  51563  5    0.015074

# entries with votes==0
	  state                      time  votes_tot    cand  perc  votes  i  votes_perc
1       Florida 2020-11-04 06:43:00+00:00          0  bidenj   0.0      0  0         0.0
0       Florida 2020-11-04 06:43:00+00:00          0  trumpd   0.0      0  0         0.0
5  Pennsylvania 2020-11-04 00:30:20+00:00          0  bidenj   0.0      0  2         0.0
9  Pennsylvania 2020-11-04 00:59:15+00:00          0  bidenj   0.0      0  4         0.0
1  Pennsylvania 2020-11-04 09:25:23+00:00          0  bidenj   0.0      0  0         0.0
4  Pennsylvania 2020-11-04 00:30:20+00:00          0  trumpd   0.0      0  2         0.0
8  Pennsylvania 2020-11-04 00:59:15+00:00          0  trumpd   0.0      0  4         0.0
0  Pennsylvania 2020-11-04 09:25:23+00:00          0  trumpd   0.0      0  0         0.0

# earliest time in each state
	  state                      time
0       Florida 2020-11-04 00:02:01+00:00
1  Pennsylvania 2020-11-04 00:19:27+00:00

Honestly, who knows, but at this point we can see a few things:

  • the data is in order of increasing time
  • the first entries have non-zero votes
  • later, both states have votes=0 entries
  • even so, each plot recovers, and near the same pre-dip total and in the same relative ranking (Biden or Trump ahead pre/post-dip)
  • the i column matches the order of the data; I added it before sorting by time. Note that in both cases the i=0 timestamp is not the earliest in the data. In the PA case, the first entry in the series (i=0) has a timestamp 9 hours later than the earliest timestamp (a quick check of this follows below)
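Here's that quick check, comparing each state's earliest timestamp against the timestamp sitting at i=0 (a small sketch against the df_sub built above):

# earliest timestamp overall vs. the timestamp of the i=0 entry per state
first_by_index = df_sub[df_sub['i'] == 0].groupby('state')['time'].min()
first_by_time = df_sub.groupby('state')['time'].min()
print(pd.concat({'time_at_i0': first_by_index, 'earliest_time': first_by_time}, axis=1))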

How those zeros got entered with later times, I have no idea, but if you do any data analysis or write code, this is not that surprising. Again, this is data used for a news site graphic. Maybe they started collecting data from Edison's API (and again, this is not to say that Edison is ground truth vote source), and someone noticed all their plots were missing a starting point where votes=0 so they manually entered it. Or the endpoint went down and logged a zero entry because it failed to collect any data? We don't even know if total votes and candidate percents were calculated with respect to the same source data.

What if we just treat votes=0 as a fluke and remove those rows (2 for FL, 6 for PA)? I don't suppose you drew a plot of what 220k switched votes and 940k lost votes looks like, did you? I'm guessing it isn't this:

[figure: Florida and Pennsylvania with the votes=0 rows removed]
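For reference, that cleanup is just a filter on the votes column (a quick sketch against the df_sub built above; the plot is the same ggplot call as before with the filtered frame swapped in):

# treat votes == 0 as a data-entry fluke and drop those snapshots
df_nz = df_sub[df_sub['votes'] > 0]
print(len(df_sub) - len(df_nz))   # 8 rows dropped: 2 for Florida, 6 for Pennsylvania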

And if we leave in the zeros and just plot the data in the order of the snapshots, here's what we see. Again, I doubt this matches anyone's intuitive interpretation of "Trump lost 940k votes during the PA election."

[figure: Florida and Pennsylvania plotted by snapshot index, zeros left in]
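That snapshot-order view is just the earlier plot with the i column on the x-axis instead of the timestamp (a sketch of the idea):

# plot by snapshot index rather than timestamp, zeros left in
p = ggplot(df_sub, aes(x='i', y='votes', color='cand', group='state+cand')) + geom_line() + facet_wrap('~state', ncol=2)
p = p + scale_color_manual(breaks=['bidenj', 'trumpd'], values=['blue', 'red'], guide=False)
p = p + scale_x_continuous(name='snapshot index')
p = p + scale_y_continuous(name='votes')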

the code

Now, what's the code behind this, anyway? Well, again, while I admit I think this theory is absolutely preposterous, kudos for sharing the source code. If you know python, I'm sorry, nightmares are inevitable for you tonight:

def findfraud(NAME):
    with open(NAME + '.json', encoding="utf8") as f:
        x = json.load(f)
    TotalVotesLostTrump = 0
    TotalVotesLostBiden = 0
    TrumpToThird = 0
    TrumpToBiden = 0
    BidenToTrump = 0
    ThirdToTrump = 0
    ThirdToBiden = 0
    BidenToThird = 0
    TotalVotesLostThird = 0
    series = x["data"]["races"][0]["timeseries"]
    for i in range(len(series)):
            thirdPartyNow = series[i]["votes"] * (1 - series[i]["vote_shares"]["bidenj"] - series[i]["vote_shares"]["trumpd"])
            thirdPartyThen = series[i-1]["votes"] * (1 - series[i-1]["vote_shares"]["bidenj"] - series[i-1]["vote_shares"]["trumpd"])
            TrumpNow = series[i]["votes"] * series[i]["vote_shares"]["trumpd"]
            TrumpThen = series[i-1]["votes"] * series[i-1]["vote_shares"]["trumpd"]
            BidenNow = series[i]["votes"] * series[i]["vote_shares"]["bidenj"]
            BidenThen = series[i-1]["votes"] * series[i-1]["vote_shares"]["bidenj"]
            if i != 0 and TrumpNow < TrumpThen and (TrumpThen - TrumpNow) > (0.00049999 * series[i]["votes"]) + 50:
                if BidenNow > BidenThen or thirdPartyNow > thirdPartyThen:
                    if TrumpNow - TrumpThen <= BidenNow - BidenThen or TrumpNow - TrumpThen <= thirdPartyNow - thirdPartyThen:
                        print ("(TRUMP")
                        print ("Index : " + str(i) + " Past Index : " + str(i-1))
                        print (TrumpNow - TrumpThen)
                        TrumpLostNow = TrumpThen - TrumpNow
                        TrumpLostTotal = TrumpThen - TrumpNow
                        if BidenNow > BidenThen and TrumpNow - TrumpThen <= BidenNow - BidenThen:
                            if BidenNow - BidenThen > TrumpLostTotal:
                                TrumpToBiden += TrumpLostTotal
                                TrumpLostTotal = 0
                            else:
                                TrumpToBiden += BidenNow - BidenThen
                                TrumpLostTotal -= BidenNow - BidenThen
                        if thirdPartyNow > thirdPartyThen and TrumpNow - TrumpThen <= thirdPartyNow - thirdPartyThen:
                            if thirdPartyNow - thirdPartyThen > TrumpLostTotal:
                                TrumpToThird += TrumpLostTotal
                                TrumpLostTotal = 0
                            else:
                                TrumpToThird += thirdPartyNow - thirdPartyThen
                                TrumpLostTotal -= thirdPartyNow - thirdPartyThen
                        if TrumpLostNow < 0:
                            TrumpLostNow = 0
                        TotalVotesLostTrump += TrumpLostNow - TrumpLostTotal
                        print ("TRUMP)")

That's just one of the three main logic branches; there are two others just like it for Biden and third-party votes. I admit I don't follow all of this. Reading someone else's code, particularly when it's as heavily nested and inefficient as this, is tedious. The gist is comparing consecutive timeseries indices for changes in vote counts, and trying to infer who they went to by comparing the shifts in the other parties as well, hence printing out this summary at the end:

print (str(str(TotalVotesLostTrump)  + " TRUMP LOST"))
print (str(TrumpToBiden) + " Trump to Biden")
print (str(TrumpToThird) + " Trump to Third")
print (str(str(TotalVotesLostBiden)  + " BIDEN LOST"))
print (str(BidenToTrump) + " Biden to Trump")
print (str(BidenToThird) + " Biden to Third")
print (str(str(TotalVotesLostThird)  + " 3RD PARTY LOST"))
print (str(ThirdToBiden) + " Third to Biden")
print (str(ThirdToTrump) + " Third to Trump")
if BidenToTrump > TrumpToBiden:
    print (str(BidenToTrump - TrumpToBiden) + " TRUMP")
elif TrumpToBiden > BidenToTrump:
    print (str(TrumpToBiden - BidenToTrump) + " BIDEN")
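To make the trigger easier to follow, here's a condensed pandas restatement of roughly what that Trump branch checks, written against the df built earlier (a sketch of the detection condition only, not the follow-on bookkeeping of where the votes supposedly "went"):

def flag_drops(df_state, cand='trumpd', other='bidenj'):
    # implied vote counts per snapshot index for each candidate
    d = df_state.pivot_table(index='i', columns='cand', values='votes').sort_index()
    tot = df_state.groupby('i')['votes_tot'].first().sort_index()
    drop = -d[cand].diff()   # positive when cand loses votes vs. the prior snapshot
    gain = d[other].diff()   # positive when other gains votes
    # roughly the findfraud() trigger: a drop bigger than ~0.0005 * total + 50
    # while the other candidate's count went up
    mask = (drop > 0.0005 * tot + 50) & (gain > 0)
    return mask[mask].index

pa = df[df['state'] == 'Pennsylvania']
print(flag_drops(pa))                                   # Trump drops while Biden gains
print(flag_drops(pa, cand='bidenj', other='trumpd'))    # and the reverse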

There's another function, lostvotes(), that looks to do something similar, tabulating all the lost votes per candidate. So, what do they show? I modified them to point to my data directory and ran them on Pennsylvania, which is a fun one due to its massive numbers.

findfraud('Pennsylvania')
lostvotes('Pennsylvania')

# findfraud()
88769.8330000007 TRUMP LOST
49718.495999999344 Trump to Biden
39051.33700000135 Trump to Third
27413.24899999972 BIDEN LOST
7159.232000000076 Biden to Trump
20254.016999999643 Biden to Third
223140.98500000124 3RD PARTY LOST
216223.19600000113 Third to Biden
6917.789000000117 Third to Trump
42559.26399999927 BIDEN

# lostvotes()
Index : 2 Past Index : 1
-77
Index : 4 Past Index : 3
-11
Index : 48 Past Index : 47
-239804
Index : 56 Past Index : 55
-586189
-187542.07999999996 TRUMP
-613071.5320000001 BIDEN
-25467.387999999948 THIRD
-826081

Returning to the video for a moment, we have the following quote at 7:55:

We know that this election was stolen over the net in real time. This list is organized by
voting systems. And now they're going to highlight some of the swing states. In
Pennsylvania they recorded 220,883 votes that switched to Joe Biden. Trump lost 941,248
votes.

findfraud() shows votes going all over the place. I suppose one would rationalize this away by saying that fraud is somewhat of a shell game, and having votes go from Trump to Biden and Biden to Trump would help hide the activities. Assuming lostvotes() gives the summary picture, though, Biden lost ~3x the votes that Trump did. Again, I applaud the invitation to run the code, but with the rave reviews for this incredible, rock-solid finding in the YouTube comments, I'm guessing no one actually did.

Even the original author of this work (PedeInspector) disagrees with this interpretation, despite it being already passed along publicly by Trump.

[screenshot: PedeInspector, the original author of the code, disputing this interpretation]

My assessment at this point is that this code just blindly runs some calculations, without actually verifying what it's finding or whether that seems sane. For example… seeing the indices with the biggest losses at 48 and 56 in the data, let's go look at them.

print(df[df['state']=='Pennsylvania'].iloc[45:50])
	    state                      time  votes_tot    cand   perc   votes   i  votes_perc
92   Pennsylvania 2020-11-04 02:11:01+00:00    1106477  bidenj  0.643  711464  46    0.207993
94   Pennsylvania 2020-11-04 02:13:11+00:00    1111586  bidenj  0.641  712526  47    0.208303
96   Pennsylvania 2020-11-04 02:14:32+00:00     871782  bidenj  0.592  516094  48    0.150877
98   Pennsylvania 2020-11-04 02:14:56+00:00     877724  bidenj  0.590  517857  49    0.151393
100  Pennsylvania 2020-11-04 02:16:43+00:00     888907  bidenj  0.586  520899  50    0.152282

print(df[df['state']=='Pennsylvania'].iloc[52:57])
	    state                      time  votes_tot    cand   perc   votes   i  votes_perc
106  Pennsylvania 2020-11-04 02:18:59+00:00    1288475  bidenj  0.677  872297  53    0.255012
108  Pennsylvania 2020-11-04 02:19:33+00:00    1288604  bidenj  0.677  872384  54    0.255037
110  Pennsylvania 2020-11-04 02:21:59+00:00    1325632  bidenj  0.664  880219  55    0.257327
112  Pennsylvania 2020-11-04 02:22:45+00:00     739443  bidenj  0.627  463630  56    0.135540
114  Pennsylvania 2020-11-04 02:23:32+00:00     779178  bidenj  0.629  490102  57    0.143279

Taking a look at this another way, we can just calculate the diff() on our votes column, and look for the biggest changes:

df = df.sort_values(['state', 'cand', 'i'])
df['delta'] = df.groupby(['state', 'cand'])['votes'].transform(lambda x: x.diff())
print(df[(df['state']=='Pennsylvania') & (df['delta'] < -10000)])
print(df[(df['state']=='Pennsylvania') & (df['delta'] < -10000)]['delta'].sum())
###
# biggest swings in votes
	    state                      time  votes_tot    cand   perc    votes    i  votes_perc     delta
96   Pennsylvania 2020-11-04 02:14:32+00:00     871782  bidenj  0.592   516094   48    0.150877 -196432.0
112  Pennsylvania 2020-11-04 02:22:45+00:00     739443  bidenj  0.627   463630   56    0.135540 -416589.0
97   Pennsylvania 2020-11-04 02:14:32+00:00     871782  trumpd  0.399   347841   48    0.103760  -42325.0
103  Pennsylvania 2020-11-04 02:17:03+00:00     774021  trumpd  0.301   232980   51    0.069497 -127916.0
113  Pennsylvania 2020-11-04 02:22:45+00:00     739443  trumpd  0.363   268417   56    0.080068 -145180.0
376  Pennsylvania 2020-11-04 04:08:51+00:00    2984522  trumpd  0.560  1671332  188    0.498553  -17876.0

# total "lost votes"
-946318.0

Does this fit the narrative at all? Biden is the source of 600k of those 900k lost votes… but the quote above presents Trump as having lost almost 1M votes in PA. How about the other swing states? lostvotes() returned all zeros for Arizona, Georgia, Michigan, and Wisconsin. How about findfraud()?

findfraud('Arizona')
findfraud('Georgia')
findfraud('Michigan')
findfraud('Wisconsin')

# Arizona
4492.284999999916 TRUMP LOST
532.839999999851 Trump to Biden
3959.445000000065 Trump to Third
11413.710000000196 BIDEN LOST
4917.905999999959 Biden to Trump
6495.804000000237 Biden to Third
36773.55299999989 3RD PARTY LOST
35530.15799999998 Third to Biden
1243.3949999999095 Third to Trump
4385.066000000108 TRUMP

# Georgia
20215.65600000025 TRUMP LOST
6956.512000000046 Trump to Biden
13259.144000000204 Trump to Third
5456.885000000213 BIDEN LOST
3530.7619999998715 Biden to Trump
1926.1230000003416 Biden to Third
98070.76399999915 3RD PARTY LOST
81906.734999999 Third to Biden
16164.02900000017 Third to Trump
3425.7500000001746 BIDEN

# Michigan
16807.160000000382 TRUMP LOST
9395.204000000726 Trump to Biden
7411.955999999656 Trump to Third
24250.137000000395 BIDEN LOST
11981.314000000828 Biden to Trump
12268.822999999567 Biden to Third
149435.23900000064 3RD PARTY LOST
124803.46400000044 Third to Biden
24631.775000000212 Third to Trump
2586.1100000001024 TRUMP

# Wisconsin
0 Trump to Biden
390.70799999998417 Trump to Third
13927.016999999993 BIDEN LOST
5800.282999999821 Biden to Trump
8126.734000000171 Biden to Third
59287.08899999898 3RD PARTY LOST
50930.62899999939 Third to Biden
8356.459999999588 Third to Trump
5800.282999999821 TRUMP

Are you seeing a pattern here? No matter how we slice and dice this thing, there's just no substance. In three of four cases, Biden lost more votes to Trump than the other way around (and I think the printout at the bottom is the net result, so Trump came out net benefiting from this). Ironically, the one where Biden supposedly had an advantage was Georgia, which was the closest of the four.

Here's all 50 states again, with the votes=0 entries removed:

[figure: votes vs. time for all 50 states, votes=0 entries removed]

Is it everything you imagined?

a final angle

There's a more subtle issue going on here.

  • inputs are % share and total votes
  • for each candidate, we take as true both inputs to calculate candidate votes
  • this implies that we have discovered the true votes behind the scenes
  • with each new update, the deltas are being calculated based on reported votes and percentages… while the whole premise of this calculation is that we've discovered a more accurate truth

In essence, claiming the data is manipulated/false while continuing to rely on it for subsequent calculations is contradictory. Let's return to the concrete example used from Michigan:

time                  total votes   trump, %   trump, votes   biden, %   biden, votes
2020-11-04 01:51:26        573857      0.578         331689      0.401         230117
2020-11-04 01:51:52        574417      0.568         326269      0.406         233213
delta                         560     -0.010          -5420      0.005           3096

Indeed, the data makes no sense here and there's no way to resolve the conflict. We're supposed to have a deficit of 3096-5420=-2324 votes above, but we're positive 560. Think of total votes like your net worth, and Trump and Biden as the % contributions from two of your bank accounts. I've just handed you a statement that shows that your net worth went up by $560, but individually your accounts are down by $2300. Is this proof of theft? Are you actually richer, or poorer? You can't actually say unless you can account for all the dollars, including the negative ones.
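To spell the bookkeeping problem out with the numbers from the table above:

# candidate-level change vs. total-vote change between the two snapshots
biden_delta = 233213 - 230117      # +3096
trump_delta = 326269 - 331689      # -5420
total_delta = 574417 - 573857      # +560

# the two named candidates are down a combined 2324 votes while the total is up 560;
# the ~2900-vote gap has to live in the third-party share and rounding, neither of
# which this table shows, so there's no way to say where anything actually "went"
print(biden_delta + trump_delta, total_delta)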

Also realize that the code above used negative values as the catch condition. But how do we differentiate between legitimate positive votes and "corrected" positive votes? Absent all of this, the best off-the-cuff hunch you could have is "the account that lost the most probably did worse than the other," right? Except that's not what we find. I've grouped by state and candidate to add up all the "losses" that appear in the data, and plotted them against the final candidate percentage (last data point in each timeseries).

Biden "lost" move votes by far, and he also won the states in which these losses occurred.

df_agg = df.groupby(['state', 'cand'], as_index=False).agg({
    'delta': [lambda x: x[x>0].sum(), lambda x: x[x<0].sum()],
    'perc': 'last',
    'votes': 'max',
    'votes_tot': 'last'})
df_agg.columns = ['state', 'cand', 'pos', 'neg', 'perc', 'votes_c', 'votes_tot']

p = ggplot(df_agg, aes(x='perc', y='neg', color='cand')) + geom_point()
p = p + scale_color_manual(breaks=['bidenj', 'trumpd'], values=['blue', 'red'], guide=False)
p = p + scale_y_continuous(name='sum of negative vote changes')
p = p + scale_x_continuous(name='ending candidate % votes')
p = p + theme_minimal() + theme(text=element_text(family='Hack'),
				plot_background=element_rect(fill='white'),
				axis_text_x=element_text(angle=315, hjust=0))

[figure: sum of negative vote changes vs. ending candidate %, by state and candidate]

Viewed another way, we can show cumulative "lost" votes vs. time for both candidates by state. As is no surprise, I don't think this fits the narrative.

[figure: cumulative "lost" votes vs. time, by state and candidate]
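For reference, that cumulative view can be built from the delta column computed earlier, summing only the negative changes in time order (a sketch of the idea, not necessarily the exact call behind the figure above):

# cumulative sum of only the negative vote changes ("lost" votes), per state and candidate
df = df.sort_values(['state', 'cand', 'time'])
df['lost_cum'] = df.groupby(['state', 'cand'])['delta'].transform(
    lambda x: x.clip(upper=0).fillna(0).cumsum())

p = ggplot(df, aes(x='time', y='lost_cum', color='cand', group='state+cand')) + geom_line() + facet_wrap('~state', ncol=10)
p = p + scale_color_manual(breaks=['bidenj', 'trumpd'], values=['blue', 'red'], guide=False)
p = p + scale_y_continuous(name='cumulative "lost" votes')
p = p + scale_x_datetime(name='timestamp')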

conclusion

This theory strikes me as the kind of thing wide-eyed kids do when they've just had the epiphany that maybe with magnets in just the right positions they could create a perpetual motion machine. A bit of sketchy code on the internet pumped out numbers that said what people wanted to hear, and now this is holy fact etched in stone.

Take a moment to recall what you were actually presented with, not what you thought you saw, or hoped you saw:

  • someone told you that New York Times data files were equivalent to official, government records, functioning "like a flight recorder black box"
  • code you don't understand and never looked at ran some if/else logic and proclaimed that votes were switched and lost between candidates
  • these lost votes were presented as having moved from Trump to Biden

Every idea you're presented with is a transaction. Someone's pitching you something in return for your "mental bet." With ideas on the internet, I think it feels like the bet is cheap: "Well, if this is wrong, I still know the overall story is true, even if this thing didn't manage to show it." If you bet wrong, what's the cost? Just move on to another theory, and no one will be tracking you down to pay up on all the comments, tweets, and posts re-circulating this information you ended up being wrong about.

Now, what if someone was selling you this story in exchange for a million dollars to save our country from an egregious attack against democracy? I mean, aren't those the real stakes here? If this is true, something criminally awful happened. Given what you've seen, will you step up materially and tangibly for the cause? Will you find a million dollars, no matter the personal cost, to save our country? If not, consider not betting with your neurons so cheaply, or at least taking a harder look before doing so.