To estimate how much it will cost to close the digital divide in broadband access we need two pieces of information: the cost to serve each location, and how many locations there are. In this post, I’m going to come up with a proof-of-concept estimate of the cost to serve any location; in a future post—which will likely be more interesting—I’ll use those two numbers to estimate how far broadband funding might go.
The FCC has a cost model that estimates build-out costs for broadband deployment, but it isn’t public information. To get access, you need to sign a protective order which stipulates you’re using the data for a filing in a specific old FCC docket. So I don’t have access to the data. What I can do is take the bits of the data that are public and reverse engineer them to cover all locations. To be clear, I wish I could use the real data. That cost model would be hugely valuable to states and advocacy groups looking to contribute to the broadband deployment conversation. But here we go.
As part of the FCC’s RDOF reverse auction, they published reserve prices for each eligible census block group. This is the price below which they were willing to commit funds for broadband buildout. As detailed in this post, the reserve prices imply a cost of $26.5 billion over ten years for deployment to 5.3 million eligible locations, or about $5,000 per location on average. (To be clear, to my knowledge no one has confirmed the RDOF reserve prices are the same as the cost model, but it seems logical they are similar.) But using the most recent Form 477 data, about half of the currently unserved housing units were not eligible for RDOF, and thus we don’t have an estimate of the cost to serve them (in the form of their reserve price). There’s an RDOF reserve price for less than 25% of the underserved housing units.
To estimate the cost to serve a location I use a simple model where price per location is a function of the density of the census block, the distance from that census block to the nearest census block served by either cable or fiber, an indicator for urban or rural, and state fixed effects to capture differences in cost in each of the states. To be sure, there are other datasets that would likely be useful in reverse engineering the cost, including the National Land Cover Database, elevation datasets, even soil types.
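To make the model structure concrete, here is a minimal sketch of a linear regression with those four ingredients: density, distance to the nearest cable or fiber block, a rural indicator, and state dummies. It is not the code behind this post. The function names, the three states, the coefficient values, and every data point are made up for illustration, and the toy reserve prices are generated without noise so the fit can be checked by eye. It uses only the Python standard library; a real analysis would more likely use a regression package.

```python
# Minimal sketch of the regression described above, standard library only.
# Every number below is a synthetic placeholder, NOT FCC or RDOF data.

STATES = ["PA", "WV", "OH"]  # the first state is the omitted baseline


def design_row(state, density, distance_km, is_rural):
    """Intercept, density, distance, rural indicator, then state dummies."""
    dummies = [1.0 if state == s else 0.0 for s in STATES[1:]]
    return [1.0, density, distance_km, float(is_rural)] + dummies


def fit_ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y, solved with
    Gauss-Jordan elimination and partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs, row):
    """Predicted price per location for one design row."""
    return sum(c * x for c, x in zip(coefs, row))


# Hypothetical coefficients used only to generate toy reserve prices:
# intercept, per unit of density, per km to cable/fiber, rural, WV, OH.
TOY_COEFS = [2000.0, -3.0, 150.0, 800.0, 500.0, -200.0]

# (state, housing-unit density, km to nearest cable/fiber block, is_rural)
blocks = [
    ("PA", 12.0, 4.0, 1), ("PA", 85.0, 0.5, 0), ("PA", 30.0, 2.5, 1),
    ("WV", 5.0, 9.0, 1), ("WV", 40.0, 1.0, 0), ("WV", 18.0, 6.0, 1),
    ("OH", 60.0, 1.5, 0), ("OH", 9.0, 7.5, 1), ("OH", 25.0, 3.0, 0),
]
X = [design_row(*b) for b in blocks]
y = [predict(TOY_COEFS, row) for row in X]  # noise-free toy reserve prices

coefs = fit_ols(X, y)

# Estimate a price for a block that has no reserve price of its own.
estimate = predict(coefs, design_row("WV", 22.0, 5.0, 1))
print([round(c, 2) for c in coefs], round(estimate, 2))
```

Dropping one state as the baseline keeps the dummies from being collinear with the intercept, so each remaining state coefficient reads as a cost difference relative to that baseline.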
I’ve made a couple of grievous modeling sins. There’s covariance between my independent variables and these relationships might not be linear, but most importantly I’m making predictions outside the scope of the original dataset. For example, I’m using reserve prices from a program for unserved areas to predict costs in underserved areas.
Let’s look at the model results. I held out a data set to test the model against. For these Census blocks, we have the RDOF reserve price and the predicted RDOF reserve price. Here is the distribution of the difference between the predicted and actual cost.
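For readers who want to reproduce this kind of check, the sketch below shows one way to hold out a test set and summarize the gap between predicted and actual prices. I am not claiming this is the exact procedure used here: the 20% holdout share, the helper names, and the four example prices are assumptions for illustration only.

```python
import random
import statistics


def train_test_split(rows, test_share=0.2, seed=0):
    """Shuffle a copy of the rows and set aside a share as the holdout set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_share))
    return shuffled[n_test:], shuffled[:n_test]


def residual_summary(actual, predicted):
    """Summarize predicted minus actual price for the holdout blocks."""
    residuals = [p - a for a, p in zip(actual, predicted)]
    return {
        "mean_error": statistics.fmean(residuals),
        "median_error": statistics.median(residuals),
        "mean_abs_error": statistics.fmean(abs(r) for r in residuals),
    }


# Made-up holdout prices, for illustration only.
actual = [1000.0, 2500.0, 4000.0, 5200.0]
predicted = [1100.0, 2400.0, 4300.0, 5000.0]
summary = residual_summary(actual, predicted)
print(summary)

# Splitting ten hypothetical block IDs leaves eight to fit on and two to test.
train, test = train_test_split(range(10))
```

A mean error near zero with a large mean absolute error would indicate a model that is unbiased on average but noisy for any single block, which matters when the estimates are later summed over many locations.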
Looking at the predicted and actual RDOF reserve price for underserved and unserved is also instructive. In the first image below, for “unserved” areas only, we see the model skews slightly less expensive than the reserve prices. This makes sense: if a census block wasn’t in RDOF, it was probably excluded for being lower cost. For underserved areas, the model comes in a little higher than the RDOF reserve prices. Underserved areas are where the model will perform the worst: RDOF reserve prices didn’t cover underserved areas. We’re stretching the predictive power of the RDOF reserve prices into underserved areas.
A map is a nice way to visualize it. The first map is the RDOF reserve prices. The second map adds in the modeled reserve prices for all unserved and underserved areas. It focuses on the areas south and east of Pittsburgh leading to the Appalachian Mountains.
Future posts can use this model to make estimates of the cost to serve underserved and unserved areas. But they’re just that: estimates. Stay tuned.
You describe your cost model as being estimated at the Census block level, but aren't the reserve prices, which I gather is what you use as your dependent variable, only available at the block group level? If so, how did you use the reserve prices in a regression where the unit of observation is a Census block?