Complex network of United States migration

Charyyev, Batyr; Gunes, Mehmet Hadi

doi:10.1186/s40649-019-0061-6

Research
Open access
Published: 24 January 2019

Computational Social Networks volume 6, Article number: 1 (2019)

Abstract

Economists and social scientists have studied human migration extensively. However, the complex network of human mobility in the United States (US) has not been studied in depth. In this paper, we analyze the migration network between counties and states in the US between 2000 and 2015 to characterize the overall structure of US migration and its yearly changes using temporal analysis. We aggregated the network over different time windows and analyzed it at both the county and state levels. Analyzing flows between US counties and states, we focus on migration during different periods such as the economic prosperity of the housing boom and the economic hardship of the housing bust. We observed that nodes at the county and state levels usually remain active, but there are considerable fluctuations in links. This indicates that migration patterns change over time. However, we could identify a backbone at both the county and state levels using a disparity filter. Finally, we analyze the impact of political and socioeconomic factors on migration. Using a gravity model, we observe that population, political affiliation, poverty, and unemployment rate influence US migration.

Introduction

Research has shown that there is a correlation between economic growth and net migration as people seek opportunities [1]. People typically move to places with greater economic opportunity than where they were located. From the post-World War II period to the 1980s, the United States (US) economy was driven by manufacturing and people were inclined to move to areas of high manufacturing [2]. Since larger population areas usually had more manufacturing-based jobs, migration was towards areas of high population [3]. In addition, people typically moved from North to South and East to West.

As factories have been relocated to countries with lower manufacturing costs, there has been a steady decline of manufacturing jobs in the US since the 1980s. As the US economy has become primarily driven by services and technology, the migration patterns have changed [4]. As new jobs are not centralized anymore, people have been less likely to move in the last three decades than they were historically. In some cases, depopulation of urban centers has been witnessed. As technology and service-based jobs are not concentrated in certain areas, people do not need to head to hubs.

Common migration patterns can be observed during times of economic prosperity, such as booms and bubbles, and economic hardship, such as recessions. During economic prosperity, financially secure people move to desirable living locations, and movement to certain hubs of job opportunities also increases [5]. On the other hand, during economic hardship, migration patterns are chaotic as people scatter towards different places and leave areas that were once economically prosperous [6].

Certain factors pull people to particular locations; these pull factors include affordable housing, attractive climate, better employment opportunities, and family ties [7]. Other factors push people from their current location; these push factors include higher taxes, unemployment, natural disasters, and low chances of marriage.

While researchers have analyzed international migration patterns as a complex network, to our knowledge, US migration patterns have not been analyzed as a complex network. Studies of international migration networks found that (i) the country migration network has a power-law degree distribution; (ii) the world has become more interconnected and shows small-world network characteristics; and (iii) there are communities among the countries between which people migrate.

In this paper, we analyze the network structure of US migration at the county and state levels between 2000 and 2015. First, we analyze common network properties. We aggregate the data over different time windows to analyze the emergence and robustness of network properties over time. Then, we look at dynamic properties of the network. We analyze evolutionary dynamics at the node and link levels at different timescales and identify the structural backbone of migration in the US. Finally, we inspect political and socioeconomic factors affecting migration, and evaluate these factors using a gravity model of migration. In an earlier study [8], we analyzed the migration between 2004 and 2008, focusing on political aspects.

In the rest of the paper, we summarize related work in “Related work”, present our methodology in “Methodology”, perform a detailed analysis of US migration in “U.S. migration network”, analyze temporal network characteristics in “Network dynamics”, rank counties and states in “Central counties and states”, and analyze political and socioeconomic factors in “Migration factors”. Finally, we conclude in “Conclusions”.

Related work

In this section, we provide an overview of related work on the analysis of complex migration networks, dynamic temporal networks, and socioeconomic factors affecting migration.

Researchers have analyzed interregional and international migration patterns. Kemper [9] analyzes migration between Eastern and Western Germany before and after unification. Niedomysl [10] analyzes interregional migration in Sweden, focusing on how demographic, socioeconomic, and geographical aspects determine residential preferences. Davis et al. [11] analyze global migration, focusing on the correlation of migrant destination choice with historical, cultural, and economic factors. Fagiolo and Mastrorillo [12] study the correlation of migration and international trade from a complex network perspective. Tranos et al. [13] analyze international migration using a gravity model and network-based regression techniques. The authors reveal the importance of physical and cultural proximity in migration along with the existence of both pull effects of prosperity and push effects of a young population. Aleskerov et al. [14] analyze central countries in international migration. The common questions in these analyses are: How do the structure and topology of the network evolve with time? Is there clustering between specific countries or regions? Which countries or regions are central in the migration network? These studies show the importance of socioeconomic, geographical, and political factors in shaping the migration network structure.

To analyze different social and economic factors, most studies use a gravity model of migration. These studies focus on factors like infant mortality rate, education quality, political view, and income growth. Rayp and Ruyssen [15] extend the gravity model of immigration, focusing on economic and demographic factors in sub-Saharan African migration. The authors found that growth prospects and opportunities for employment and education, rather than sociopolitical circumstances, are the main factors for migration. Bergh et al. [16] analyze the effect of institutional quality and poverty on migration using a gravity model. They observed that the migration decision depends on expectations about future income levels and that poor institutions act as a push factor, but absolute poverty in the origin country limits migration. Similarly, [17] examined internal migration flows in New Zealand using the gravity model. The authors reveal that the deterrence effect of distance on migration increased in all periods until the 1996–2001 period and then decreased slightly in the 2001–2006 period. Furthermore, improvements in connectivity through reduced travel time have not increased migration flows.

As migration networks are not static, it is important to analyze the dynamic properties of the system to reveal important patterns. Holme and Saramäki [18] present measures of dynamic networks, including the representation of dynamic networks as static graphs and models of temporal networks. Some studies have analyzed similar temporal networks. For instance, [19] analyzes the dynamics of link utilization using records of communication in a large social network. The authors observed that the roles of nodes change dramatically from day to day, and hence assert that interventions targeting hubs will have significantly less effect than previously thought. Gautreau et al. [20] analyzed the US airport network and observed that the links that disappear have similar properties to the ones that appear, and that the disappearing and appearing links have a weight that is low on average but broadly distributed. They also reveal that links between airports with different traffic are very volatile. Bajardi et al. [21] examine dynamic patterns of cattle trade movements. They aggregate cattle trade movement data into different time windows and study temporal properties of the network. The authors observed that the nodes and links forming the backbone vary strongly depending on the time window and that the memory of the backbone rapidly fades away from one snapshot to the next. They also found that the centrality of nodes fluctuates strongly over time, which hinders assessment of the spreading potential of premises.

Methodology

Data collection

We collected migration data among US counties between 2000 and 2015 from the Internal Revenue Service (IRS) [22]. As each citizen needs to report their income and pay taxes, the IRS collects addresses of citizens every year. The IRS records address changes and provides county-level summaries, which we use to calculate the number of people migrating from one county to another each year.

We determined the political affiliation of each county from the votes each party obtained during the presidential elections. We retrieved presidential election results for each county from the United States Census for every fourth year, when the presidential elections happen [23]. Note that local or congressional election results would provide a better picture of the political inclinations of people in a county every 2 years, but those results were not readily available.

We also gathered the unemployment and poverty rate data from the US Department of Agriculture [24] and federal expenditure and crime rate from the US Census [23].

Data cleaning

To ensure consistency, the collected data needed some cleaning. For the data between 2000 and 2012, the IRS applied a threshold of 20 to report migration from one county to another. Hence, if fewer than 20 people migrated between two counties, the flow was not reported in the data. However, between 2013 and 2015, the threshold was increased to 40. Therefore, we raised the threshold to 40 for all periods. In addition, there was a change in the code of six counties. For example, Shannon County, SD with a county code of 46–113 changed its name to Oglala Lakota County, SD with a county code of 46–102. We updated entries for these six counties to account for changes in county codes. Moreover, we removed outlier data points in four counties of Kansas, as one had a very high weighted in-degree, whereas the other three had high weighted out-degrees. These values were higher than the county populations and cancelled each other out. Hence, we removed these four data points as a reporting anomaly. Finally, poverty rate data were not available for 2000–2003. Thus, we excluded these 4 years from the analyses of socioeconomic factors affecting migration.

Network construction

The collected data are cleaned and analyzed by constructing a migration network between counties. We used 2010 FIPS State and County codes to match entries in different data sets.

In the migration network, nodes represent counties, and directed and weighted edges represent the population that has migrated from one county to another. Immigration and emigration are represented by in-degree and out-degree, respectively. We find the migration flux of a county by dividing the migration of the county by its population for each year. In our analyses for each snapshot, we only considered nodes with an in-degree or out-degree greater than zero, and named them active nodes, where only active edges are present in the network.
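The construction described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' pipeline: the record layout `(source FIPS, destination FIPS, migrants)`, the function names, the sample flows, and the population figures are all assumptions, and the flux is computed separately for inflow and outflow because the text does not specify which total is divided by population.

```python
from collections import defaultdict

THRESHOLD = 40  # uniform reporting threshold applied to all years in the cleaning step

def build_network(records, threshold=THRESHOLD):
    """Return {(src, dst): migrants}, keeping only flows at or above the threshold."""
    edges = defaultdict(int)
    for src, dst, migrants in records:
        if migrants >= threshold:
            edges[(src, dst)] += migrants
    return dict(edges)

def weighted_degrees(edges):
    """Weighted in-degree (immigration) and out-degree (emigration) per county."""
    w_in, w_out = defaultdict(int), defaultdict(int)
    for (src, dst), w in edges.items():
        w_out[src] += w
        w_in[dst] += w
    return dict(w_in), dict(w_out)

def active_nodes(edges):
    """Counties with in-degree or out-degree greater than zero."""
    return {county for edge in edges for county in edge}

def migration_flux(edges, population):
    """(inflow, outflow) divided by population for each county with known population."""
    w_in, w_out = weighted_degrees(edges)
    return {c: (w_in.get(c, 0) / pop, w_out.get(c, 0) / pop)
            for c, pop in population.items() if pop > 0}

# Hypothetical one-year records keyed by 5-digit FIPS codes (values are made up).
records = [("06037", "06065", 1200), ("06065", "06037", 800), ("06037", "32003", 35)]
edges = build_network(records)          # the 35-person flow falls below the threshold
w_in, w_out = weighted_degrees(edges)
flux = migration_flux(edges, {"06037": 9_500_000, "06065": 1_500_000})
```

With these sample records, only the two flows between the first two counties survive, so the third county is not an active node in this snapshot.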

Aggregating the data over time windows of length ∆t enables us to create static snapshots that we can analyze with the usual techniques of network theory. In particular, we can create 16/∆t consecutive snapshots, as we have 16 years of data. By analyzing different values of ∆t, we can identify network properties that are sensitive to the value of ∆t. In addition to ∆t = 1 year, we considered ∆t = 2 years and ∆t = 4 years, which produce 8 and 4 aggregated networks, respectively. These periods enable us to analyze the effects of local elections with ∆t = 2 years and presidential elections with ∆t = 4 years. The 4-year periods also correspond to the housing boom between 2004 and 2007 and the recession between 2008 and 2011.
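As a rough illustration of the windowing, the following sketch merges yearly edge lists into 16/∆t consecutive snapshots. It assumes that edge weights are summed within a window, which the text does not state explicitly; the function name and the toy data are hypothetical.

```python
def aggregate(yearly, first_year, last_year, dt):
    """Merge yearly networks {year: {(src, dst): w}} into consecutive windows of dt years.

    Returns a list of (start_year, end_year, edges) tuples. Weights of an edge
    are summed over the years of a window (an assumption of this sketch).
    """
    n_years = last_year - first_year + 1
    if n_years % dt != 0:
        raise ValueError("dt must divide the number of years evenly")
    snapshots = []
    for start in range(first_year, last_year + 1, dt):
        merged = {}
        for year in range(start, start + dt):
            for edge, w in yearly.get(year, {}).items():
                merged[edge] = merged.get(edge, 0) + w
        snapshots.append((start, start + dt - 1, merged))
    return snapshots

# Toy data: one edge present every year, a second edge present only in 2001.
yearly = {year: {("A", "B"): 50} for year in range(2000, 2016)}
yearly[2001][("B", "A")] = 40

snaps_2y = aggregate(yearly, 2000, 2015, 2)   # 8 snapshots
snaps_4y = aggregate(yearly, 2000, 2015, 4)   # 4 snapshots
windows_4y = [(start, end) for start, end, _ in snaps_4y]
```

With ∆t = 4, the second and third windows are 2004–2007 and 2008–2011, which line up with the housing boom and bust periods named above.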

U.S. migration network

In this section, we analyze networks aggregated at different timescales and levels (county and state) to identify properties that remain stable or change across time and levels. Network size and characteristics depend on the time window ∆t. Table 1 summarizes the basic properties of the networks at the county and state levels. Results show that the US migration network has small-world characteristics, as each network at each ∆t has a high clustering coefficient and a low average path length. We observe that some of the 3112 counties are not part of the migration network in every year. Over 15 years, 71 counties have incoming migration of less than 40 people and 57 counties have outgoing migration of less than 40 people, while 47 of the counties are common to both sets. Network characteristics are similar for each ∆t within a network level, but differ between the county and state levels. As the ∆t timescale increases, the number of edges increases by an order of magnitude, but the number of nodes does not change much. This indicates that there is not much activation/deactivation of nodes, as they are active most of the time.

Table 1 US migration network characteristics

The number of edges and the average weighted degree are lowest in 2014, while the average path length is highest at both the county and state levels (see Additional file 1: Table S1 and Additional file 2: Table S2). This indicates that, among the analyzed years, the least migration among counties occurred in 2014. Comparing the housing boom (2004–2007) and housing bust (2008–2011) periods, we observe that people migrated between more counties during the housing boom than during the housing bust, with similar overall network characteristics. Migration seems to have declined between 2013 and 2015.

High modularity of the networks indicates that there are communities in each of the networks. In addition, the assortativity of the networks is slightly positive for all years, with 2014 being the lowest. The assortative correlation between nodes indicates that people slightly prefer to move between counties of similar size rather than between counties of different sizes, i.e., large to small or small to large. The state-level network has larger assortativity and smaller modularity compared to the county level, as it is a much denser network.

Analyzing migration over the 2000–2015 period, we detected 18 communities among counties using the algorithm of [25] as implemented in Gephi [26]. Additional file 3: Figure S1 presents a graph of the US migration network, where nodes represent US counties and are colored based on the communities they belong to. We observe that counties with the highest incoming population belong to the same community if they are geographically close. Los Angeles, CA is in the same community as Clark, NV and Maricopa, AZ along with other counties in CA, whereas the top NY and TX counties are in their own communities.

At the state level, we observe similar communities even when different periods are analyzed, as shown in Fig. 1, for years of 2000 and 2015. Additional file 4: Table S4 presents the communities of each state for each year. We observe that for each of the 16 networks, there are five communities among the states, but the community sets differ with each period, as some of the states belong to different communities during different years. For instance, some of the Midwest states are part of different communities. We observe that states are not segregated based on the political affiliation, but are clustered based on the geographic closeness.

We observe that 54.5% of edges are between counties of different states, whereas only 29.3% of migration is interstate (see Additional file 5: Table S3). That is, a significant majority of migration is within the state, while there is greater variation in the destination county in interstate migration. The only year in which the majority of edges are within the state is 2014, which indicates that there was less incentive to migrate to another state.
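The gap between the share of interstate edges and the share of interstate migrants arises because one figure counts links and the other weights them by flow. A minimal sketch of both measures follows; it relies on the first two digits of a 5-digit county FIPS code identifying the state, and the function name and sample weights are hypothetical.

```python
def interstate_shares(edges):
    """Return (share of edges crossing state lines, share of migrants crossing state lines).

    Counties are keyed by 5-digit FIPS codes whose first two digits are the state code.
    """
    cross_edges = 0
    cross_weight = 0
    total_weight = 0
    for (src, dst), w in edges.items():
        total_weight += w
        if src[:2] != dst[:2]:
            cross_edges += 1
            cross_weight += w
    return cross_edges / len(edges), cross_weight / total_weight

# Made-up flows: one heavy in-state link and two light interstate links.
edges = {
    ("06037", "06065"): 900,  # within California
    ("06037", "32003"): 60,   # California to Nevada
    ("06037", "04013"): 40,   # California to Arizona
}
edge_share, migrant_share = interstate_shares(edges)
```

In this toy network two of the three edges are interstate, yet they carry only a tenth of the migrants, which mirrors the pattern of many light interstate links alongside fewer but heavier in-state links.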

These results confirm that people migrate to geopolitically similar regions as reported in earlier studies of [10, 11]. In addition, US migration network has similar characteristics across the analyzed time periods, with some variation between the years that will be further analyzed.

An important characteristic for summarizing a network is the degree distribution. Figure 2 presents the degree distribution for the US county migration network. We observe that the overall shape of the degree distribution is exponential, and the distributions are similar across different timescales and periods (others not shown). We observe that, in general, people migrate from a larger number of counties than the number of counties they migrate to, and the difference increased during the housing bust. Overall, the spread between in- and out-migration is higher during the housing boom, indicating an imbalanced flow among counties, whereas it is narrower during the housing bust, indicating a balanced flow to/from a county. Over the 15-year duration, the migration total of a county is sometimes greater than its population in the year 2000 (see Additional file 6: Figure S2-A). The counties with smaller inflow generally lose more of their population. In addition, net migration to and from most of the counties is close to 0 for the 15-year duration, but the largest counties have a larger net loss or gain through migration. For instance, CA-Los Angeles has the highest net loss, whereas CA-Riverside has the highest net gain.

Considering the state level, we observe that, over the 15 years, the total in- and out-migration of DC and the in-migration of Nevada are each greater than the respective population in the year 2000 (see Additional file 6: Figure S2-B). Considering net migration over 15 years, the largest population gain by percentage is in Nevada and the largest loss is in New York State.

Network dynamics

Network dynamics can be characterized by the activity and inactivity of nodes and edges along with how their properties change over time once they are active. In the county-level network, nodes are usually active, with a probability of 0.8 of being continuously active over the 15-year period. On the other hand, edges have a probability of 0.15 of being always active. Hence, in this section, we focus on the edge dynamics. We analyze how links appear and disappear from the network for different time windows ∆t and, once they are active, how their weights change.

The period of activity for a given value of ∆t is defined as the number of consecutive time steps in which the link is active. Figures 3 and 4 show the activity and inactivity distributions of links for different values of ∆t, respectively. We observe that both distributions (activity and inactivity) at both levels (county and state) show similar behavior. When ∆t is 1 year, there is a slight fluctuation of P(t) when t = 2014 and 2015 because of the decline of migration in 2014 compared to other years. Some of the links that were continuously active until 2014 change their state to inactive in 2014 and then back to active in 2015. With ∆t = 2 years or higher, this slight fluctuation disappears among different networks with the same ∆t.
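The definition of activity and inactivity periods can be made concrete with a run-length computation over a link's yearly state. The sketch below is illustrative only; the function names and the example link are hypothetical, and the distribution is a simple normalized count of period lengths.

```python
from collections import Counter
from itertools import groupby

def run_lengths(states):
    """Split a sequence of booleans (active per time step) into period lengths.

    Returns (activity_periods, inactivity_periods), each a list of the lengths
    of consecutive runs in chronological order.
    """
    active, inactive = [], []
    for is_active, run in groupby(states):
        length = sum(1 for _ in run)
        (active if is_active else inactive).append(length)
    return active, inactive

def length_distribution(lengths):
    """Empirical probability of each period length."""
    counts = Counter(lengths)
    total = sum(counts.values())
    return {length: count / total for length, count in sorted(counts.items())}

# A link that is active in 2000-2013, inactive in 2014, and active again in 2015.
link_states = [True] * 14 + [False] + [True]
active_periods, inactive_periods = run_lengths(link_states)
activity_dist = length_distribution(active_periods)
```

For this link the single inactive year splits one long activity period into periods of 14 and 1 time steps, which is the kind of effect the 2014 dip has on the ∆t = 1 distributions.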

In similar studies of cattle trade movements [21] and the air transportation network [20], most of the nodes and links are continuously active or inactive, as they have daily data. Our network shows similar behavior in terms of activity and inactivity of links, as our data are yearly. Yet, there is a considerable number of links that are active for long durations. In particular, the migration between counties/states that are central or geographically close is always active. For example, Los Angeles, CA and Harris, TX are central nodes in the county-level network, and Texas and California are central nodes in the state-level network; there is migration between each of these pairs every year. Similarly, the Washoe, NV–Pershing, NV county pair and the Utah–Colorado state pair are geographically close and have yearly migration between them.

Since most of the links are active or inactive for short periods of time, we want to analyze the mechanism behind the appearance and disappearance of links. We used the mechanism proposed by [20] to evaluate the fractions of appearing links f^a and disappearing links f^d as a function of their weight. This enables us to see whether there is a correlation between the stability of links and the number of people migrating. The fraction of appearing links can be formulated as f^a(w) = E^a(w|t)/E(w|t), where E(w|t) is the number of links with weight w at time t and E^a(w|t) is the number of such links that were not active at time t − 1 and thus appear at time t. Similar logic is applied for the fraction of disappearing links f^d(w) = E^d(w|t)/E(w|t), where E^d(w|t) is the number of links of weight w that were active at time t − 1 but are not active at t, and E(w|t) is defined as for f^a. Figures 5 and 6 provide results of f^a and f^d for the county and state levels, respectively, for different ∆t timescales.
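A minimal sketch of these two fractions for a pair of consecutive snapshots is given below. Several choices here are assumptions rather than details from the text: weights are grouped into decade bins (40–99, 100–999, and so on) so that each bin holds enough links, and for disappearing links both the count and the denominator use the weight at the earlier snapshot, since a vanished link has no weight at the later one.

```python
from collections import Counter

def decade_bin(weight):
    """Bin index by order of magnitude of a positive integer weight (assumed binning)."""
    return len(str(int(weight))) - 1

def appearing_fraction(prev, curr, bin_of=decade_bin):
    """f^a per weight bin: links in `curr` that were absent from `prev`."""
    total, appearing = Counter(), Counter()
    for edge, w in curr.items():
        b = bin_of(w)
        total[b] += 1
        if edge not in prev:
            appearing[b] += 1
    return {b: appearing[b] / total[b] for b in total}

def disappearing_fraction(prev, curr, bin_of=decade_bin):
    """f^d per weight bin: links in `prev` that are absent from `curr`."""
    total, gone = Counter(), Counter()
    for edge, w in prev.items():
        b = bin_of(w)
        total[b] += 1
        if edge not in curr:
            gone[b] += 1
    return {b: gone[b] / total[b] for b in total}

# Toy snapshots at t - 1 and t (weights are made up).
prev = {("A", "B"): 50, ("A", "C"): 45, ("B", "C"): 5000}
curr = {("A", "B"): 60, ("C", "A"): 70, ("B", "C"): 4000}
f_a = appearing_fraction(prev, curr)
f_d = disappearing_fraction(prev, curr)
```

In this toy case half of the light links appear or disappear between snapshots while the heavy link persists; averaging such per-pair results over all consecutive snapshots would give curves comparable in spirit to those in the figures.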

We observe that f^a and f^d have almost identical behavior, similar to the cattle trade movements [21] and the airline transportation network [20]. However, the overall behavior of f^a and f^d differs from those studies. In the cattle trade network, links with intermediate weight are stable, whereas links with small or large weight are unstable because of different commercial forces and limitations on a premise’s capacity. In the air travel network, there is a positive correlation between links’ stability and weight, as links with large weights are economically convenient and involve hubs that are not necessarily the source or destination of a traveler. Our migration analysis shows some similarity to air travel. However, in the migration network, links with very large weights show unstable behavior, whereas they were more stable in the air transportation network. This indicates that mass migration does not always happen between two counties and fluctuates over the years.

When we compare county and state-level networks, it seems that there is a correlation between stability of a link and its weight with slight fluctuations in high weight links at the state level. However, when we remove self-edges from analyses (see Additional file 7: Figure S3), we can see that county and state levels have identical behavior in terms of the fraction of appearance and disappearance of links.

In our previous analyses, we concluded that migration from one county to another may fluctuate in different years. Thus, we analyze the evolution of link weight over time using the growth rate function r_ij(t) = log(w_ij(t + 1)/w_ij(t)) [21]. From Figs. 7 and 8, we can observe that most of the weight increments are small, but sudden large increases or decreases of weights can be observed with a non-negligible probability. This behavior can be seen at different network levels (i.e., county and state) and for different ∆t timeframes. Similar results were obtained from analyses of firm growth [27], air transportation [20], and cattle trade movement [21].
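The growth rate can be computed directly from two consecutive snapshots. The sketch below assumes the natural logarithm (the base is not stated in the text) and evaluates r_ij only for links that are active in both snapshots, since the ratio is undefined otherwise; the function name and weights are hypothetical.

```python
import math

def growth_rates(prev, curr):
    """r_ij = log(w_ij(t + 1) / w_ij(t)) for links present in both snapshots."""
    shared = prev.keys() & curr.keys()
    return {edge: math.log(curr[edge] / prev[edge]) for edge in shared}

# Toy snapshots at t and t + 1 (weights are made up).
prev = {("A", "B"): 100, ("A", "C"): 50}
curr = {("A", "B"): 200, ("B", "C"): 70}
rates = growth_rates(prev, curr)   # only ("A", "B") is active in both years
```

A doubling of the flow gives r = log 2 ≈ 0.69 and a halving gives the same magnitude with the opposite sign, so the distribution of r treats increases and decreases symmetrically.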

We observed that links in a network fluctuate in terms of weight, as there is a strong instability of links with small weights. Hence, we want to identify the backbone of the network, if there is one. Global thresholding of links as a function of weight may give misleading results, because it may dismiss links with small weights that are locally very important [21]. Therefore, we utilized the disparity filter, which was introduced in [28], to identify the backbone of the network. We created the backbone of the network for each year (16 networks) at the county and state levels. Then, we computed the overlap for each of the (16 × 15)/2 = 120 pairs of backbones. The overlap of two networks is calculated as the ratio of the intersection to the union of their edges, i.e., | E₁ ∩ E₂ |/| E₁ ∪ E₂|.
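The overlap measure is the Jaccard index of the two edge sets. A short sketch follows; the function names are hypothetical, and the stand-in backbones are made up solely to check the pair count.

```python
from itertools import combinations

def overlap(edges_1, edges_2):
    """Ratio of intersection to union of two edge sets (Jaccard index)."""
    e1, e2 = set(edges_1), set(edges_2)
    union = e1 | e2
    return len(e1 & e2) / len(union) if union else 1.0

def pairwise_overlaps(backbones):
    """Overlap for every unordered pair of yearly backbones {year: edges}."""
    return {(a, b): overlap(backbones[a], backbones[b])
            for a, b in combinations(sorted(backbones), 2)}

# Made-up stand-in backbones: every year shares one edge and has one unique edge.
backbones = {year: {("A", "B"), ("A", str(year))} for year in range(2000, 2016)}
overlaps = pairwise_overlaps(backbones)
example = overlap({("A", "B"), ("B", "C"), ("C", "D")},
                  {("B", "C"), ("C", "D"), ("D", "E")})
```

Sixteen yearly backbones yield 120 unordered pairs, matching the (16 × 15)/2 count above; the `example` pair shares two of four distinct edges and so has an overlap of 0.5.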

In a disparity filter, the significance level of filtering can be configured with the α parameter, and we repeated the analysis for different values of α. Tables 2 and 3 present the overlaps of the backbones as color-coded matrices. The overlap of two successive backbones is around 80% for the county level and 90% for the state level. Moving away from the diagonal may reduce this value to 65% for the county level and to 80% for the state level. These results indicate that there exists a backbone for the migration network at both levels, even though the county-level links are not as stably active.
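For readers unfamiliar with the technique, the following sketch shows the disparity filter as it is commonly formulated: for a node with k links and total strength s, a link of weight w receives the score (1 − w/s)^(k − 1), and the link is kept when the score falls below α. How the filter was applied to the directed migration network is not described in the text, so the rule used here (score each edge against the source's outgoing links and the target's incoming links, and keep it if either score passes) is an assumption of this sketch, as are the function name and the sample weights.

```python
from collections import defaultdict

def disparity_backbone(edges, alpha):
    """Keep edges of {(src, dst): w} whose disparity score is below alpha.

    Each edge is scored twice, once among the outgoing edges of its source and
    once among the incoming edges of its target; it is kept if either score is
    below alpha (assumed handling of direction).
    """
    out_strength, out_degree = defaultdict(float), defaultdict(int)
    in_strength, in_degree = defaultdict(float), defaultdict(int)
    for (src, dst), w in edges.items():
        out_strength[src] += w
        out_degree[src] += 1
        in_strength[dst] += w
        in_degree[dst] += 1

    def score(w, strength, degree):
        # A node with a single link cannot single out that link as significant.
        if degree < 2:
            return 1.0
        return (1.0 - w / strength) ** (degree - 1)

    backbone = {}
    for (src, dst), w in edges.items():
        s_out = score(w, out_strength[src], out_degree[src])
        s_in = score(w, in_strength[dst], in_degree[dst])
        if min(s_out, s_in) < alpha:
            backbone[(src, dst)] = w
    return backbone

# Made-up flows: one dominant link and two minor links leaving the same county.
edges = {("A", "B"): 900, ("A", "C"): 50, ("A", "D"): 50}
strict = disparity_backbone(edges, alpha=0.05)
loose = disparity_backbone(edges, alpha=0.95)
```

Because the score is relative to each node's own strength, a link that is small in absolute terms can still be retained when it dominates the flows of one of its endpoints, which is the advantage over a global weight threshold noted above.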

Table 2 Evolution of yearly county-level migration network backbones

Table 3 Evolution of yearly state-level migration network backbones

We take the backbone networks to be the ones produced with α = 0.5. Figure 9 presents the county- and state-level backbone networks, where nodes in the same community are drawn in the same color.

Central counties and states

In this section, we compare different centrality metrics to determine and analyze top-ranked counties by each centrality metric. In particular, we compare population, weighted out-degree, weighted in-degree, degree, hub, eigenvector, PageRank, and betweenness centralities. Figure 10 compares the top 10 ranking of counties and states based on their populations in 2000, weighted out-degree, weighted in-degree, degree, hub, eigenvector, PageRank, and betweenness centralities considering the backbone identified in the previous section.

Betweenness centrality ranks nodes that are in between most of the shortest paths among node pairs. In our case, counties/states that are on the pathways of most migrations will be ranked highest. Hub centrality ranks nodes that are connected to authority nodes that receive links from hubs. Hence, a highly central hub links to highly central authorities, and vice versa. In our case, a high hub centrality indicates counties/states that have outgoing links to central counties receiving migration from many. Eigenvector centrality ranks nodes based on the centrality of nodes that they are connected to. In our case, a high-ranked county/state is connected to other high-ranked counties with incoming or outgoing migration. PageRank centrality adjusts the generous recognition of the eigenvector centrality, and ranks a node higher if it is linked from other important and stingy nodes or if it is highly linked.
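The mutual reinforcement between hubs and authorities can be illustrated with a short power iteration. This sketch is not the implementation used for the figures; using edge weights in the update, normalizing scores to sum to one, and fixing the number of iterations are all assumptions, and the sample flows are made up.

```python
def hits(edges, iterations=50):
    """Hub and authority scores for a weighted directed network {(src, dst): w}."""
    nodes = {node for edge in edges for node in edge}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority: weighted sum of hub scores of nodes sending migrants here.
        auth = dict.fromkeys(nodes, 0.0)
        for (src, dst), w in edges.items():
            auth[dst] += w * hub[src]
        norm = sum(auth.values()) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub: weighted sum of authority scores of the destinations.
        new_hub = dict.fromkeys(nodes, 0.0)
        for (src, dst), w in edges.items():
            new_hub[src] += w * auth[dst]
        norm = sum(new_hub.values()) or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return hub, auth

# Made-up flows: A and B both send migrants to C; A also sends a few to D.
edges = {("A", "C"): 100, ("B", "C"): 50, ("A", "D"): 10}
hub, auth = hits(edges)
top_hub = max(hub, key=hub.get)
top_authority = max(auth, key=auth.get)
```

Here C becomes the top authority because it receives from both senders, and A becomes the top hub because most of its outflow goes to that authority.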

Comparing the backbone network with the overall network (∆t = 16) (see Additional file 8: Figure S4-A), we observe overlap in the top five central counties. Except for PageRank, most of the centralities show similar rankings with higher ranks. Overall, we observe that sun-belt counties that attract large inflows of migration are among the top 10 ranked counties. Even though Maricopa, AZ has a relatively lower population than other counties in 2010, it is the top county with respect to most of the centrality measures.

Comparing the housing boom (2004–2007) and housing bust (2008–2011) with the backbone network, we observe that the weighted in- and out-degree rankings of counties changed between different periods, except for the top-ranked Los Angeles, CA. In addition, while the in-degree rankings of some counties were higher during the housing boom than overall (e.g., San Diego, CA and Orange, CA), their rankings fell after the housing boom. The housing bust seems to have been beneficial to some counties (e.g., Maricopa, AZ), with higher in-degree and lower out-degree. We observe that, except for Maricopa, AZ, the ranking of top betweenness counties changes with the housing boom or bust. Except for Maricopa, AZ, the top six counties keep the same hub centrality rank during different periods. Harris, TX has lower eigenvector centrality during the housing boom, indicating fewer connections to central counties, while it has higher centrality during the housing bust, indicating more connections to central counties. The reverse effect is observed with Clark, NV. We observe that the majority of the counties with top PageRank are in the sun-belt. Among top-ranked counties, Clark, NV loses the most rank with the housing bust.

Figure 10b presents the top 10 states based on the centralities, considering the backbone network of states identified in the previous section. Similar to the backbone of the county network, in the state backbone, the top ranks in centralities are shared by California, Texas, New York, Florida, and Arizona. These states have high rankings due to their counties with high centrality. For example, California is driven by Los Angeles County, while Texas is driven by Harris County. Considering the overall network (∆t = 16) (see Additional file 8: Figure S4-B), New York State had the highest overall net population loss, even though it had the second highest incoming migration and fourth highest outgoing migration, and it is not in the top nine in any of the other centralities.

Migration factors

In this section, we analyze impacts of socioeconomic factors (such as unemployment, poverty, crime rate, and federal expenditure) and political factors on migration, and then perform a gravity model based analysis of identified factors.

Socioeconomic factors

We obtained the unemployment and poverty rate data from US Department of Agriculture [24] and federal expenditure and crime rate from US Census [23]. In our analysis, we calculated the difference of the unemployment and poverty rates at the destination county with respect to the source county. As the migration data are for a whole year we used the previous year’s data for source county and adjusted for annual percentage changes.

Figure 11 presents the migration change between source and destination county, where each bar shows how many individuals migrated between counties with a difference of a given percentage of poverty. For instance, in 2005, 1,172,986 individuals migrated to another county with a poverty 1% less than their origin county and 1,130,399 migrated to a county with a 1% higher poverty. Overall, people moved to a county with 1.16% less poverty than their source county in 2005 and 2.47% less poverty in 2009. These results indicate that people had stronger motivation to move to counties that are more prosperous especially in 2009.
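The two quantities described above, the per-difference migrant counts shown as bars and the overall average difference, can be sketched as follows. The text does not spell out how the overall figure is averaged, so the migrant-weighted mean of the destination-minus-source rate used here is an assumption, as are the rounding of differences to whole percentage points, the function names, and all sample values.

```python
from collections import defaultdict

def difference_histogram(edges, rate):
    """Migrants per rounded (destination - source) rate difference, in percentage points."""
    bars = defaultdict(int)
    for (src, dst), migrants in edges.items():
        if src in rate and dst in rate:
            bars[round(rate[dst] - rate[src])] += migrants
    return dict(bars)

def mean_rate_difference(edges, rate):
    """Migrant-weighted mean of (destination rate - source rate)."""
    weighted_sum = 0.0
    total = 0
    for (src, dst), migrants in edges.items():
        if src in rate and dst in rate:
            weighted_sum += migrants * (rate[dst] - rate[src])
            total += migrants
    return weighted_sum / total if total else 0.0

# Made-up poverty rates (percent) and one year of flows.
poverty = {"A": 15.0, "B": 12.0, "C": 16.0}
edges = {("A", "B"): 300, ("A", "C"): 100}
bars = difference_histogram(edges, poverty)
mean_diff = mean_rate_difference(edges, poverty)
```

A negative mean indicates that, on balance, migrants moved to counties with a lower rate than their origin; in this toy example the mean is −2 percentage points.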

Figure 12 presents the migration change between source and destination county, where each bar shows how many individuals migrated between counties with a difference of a given percentage of unemployment. For instance, in 2005, 3,152,814 individuals migrated to another county with an unemployment rate 1% less than their origin county and 3,825,457 migrated to a county with a 1% higher unemployment rate. Overall, people moved to a county with 0.28% lower unemployment in 2005 and 3.72% higher unemployment in 2009. This indicates that, in 2009, a strong consideration for migration between counties was the unemployment rate of the destination county compared to 2005, when people moved to counties with slightly fewer jobs than their origin county.

When the crime rate of counties was analyzed, we observed that in 2005 and 2009, people migrated to a county with a 0.006% higher and a 0.02% lower crime rate than their origin county, respectively, on average. Similarly, when we analyzed federal expenditure per county, we observed that in 2005 and 2009, people migrated to a county with 0.53% and 0.1% lower expenditure than their origin county, respectively, on average. These results indicate that the crime rate or federal expenditure of a county was not a major factor in people’s decision to migrate.

Political factors

We then analyzed whether the political affiliation of a county correlates with its migration. We used the presidential election results of 2004 and 2008, which fall approximately at the beginning of the housing boom and housing bust periods, obtained from the US Census [23]. Migration patterns for the 2000 and 2012 elections were similar to those for 2004 and 2008, respectively, as the same party won the respective elections.

Table 4 shows the statistical significance of political affiliation and weighted in-degree of each county over the entire time frame. The results suggest that people are slightly attracted towards moderately Republican counties. This was expected, since the housing boom was generally located in Republican counties. Another interesting observation is that a strong political affiliation (either Republican or Democratic) had a negative influence on the migration influx. This suggests that people tend to move towards more neutral areas rather than strongly Republican or Democratic counties.

Table 4 Political affiliation and growth percentage

Figure 13 presents the flow of people in 2005 based on the Republican Party vote in the 2004 Presidential election, which was won by the Republican candidate. Each block shows the number of people who migrated from counties within a given range of Republican vote share. For instance, the first block indicates that 1,144,070 people migrated in 2005 from counties with a Republican vote of less than 25% in the 2004 Presidential election. Some of these people migrated to another county with a similar Republican vote, and a slim portion of them moved to a strongly Republican county with over 75% Republican vote. Overall, 756,276 people moved in 2005 to a county with less than 25% Republican vote. Note that people who did not migrate are not counted in these figures. Similarly, Fig. 14 shows the flow of people in 2009 based on the Democratic Party vote in the 2008 Presidential election, which was won by the Democratic candidate. Compared to the prior election, we observe greater mobility for highly Democratic counties.
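The block-to-block flows described above can be tabulated with a short sketch like the one below. It is an illustration under stated assumptions, not the authors' implementation: the text names only the "less than 25%" and "over 75%" ranges, so the intermediate 25% and 50% boundaries, the rule that a value on a boundary falls into the upper range, and all function and variable names are hypothetical.

```python
from collections import defaultdict

# Assumed bin boundaries in percent; only <25% and >75% are named in the text.
BIN_EDGES = (25.0, 50.0, 75.0)


def vote_bin(share):
    """Return the bin index (0..3) for a vote share given in percent."""
    for index, edge in enumerate(BIN_EDGES):
        if share < edge:
            return index
    return len(BIN_EDGES)


def flows_between_bins(flows, vote_share):
    """Sum migrants for every (source bin, destination bin) pair.

    flows:      iterable of (source, destination, migrants) tuples.
    vote_share: dict mapping county id -> party vote share in percent.
    """
    matrix = defaultdict(int)
    for source, destination, migrants in flows:
        key = (vote_bin(vote_share[source]), vote_bin(vote_share[destination]))
        matrix[key] += migrants
    return dict(matrix)


def outflow_by_bin(matrix):
    """Total migrants leaving each source bin (the size of each block)."""
    totals = defaultdict(int)
    for (source_bin, _), migrants in matrix.items():
        totals[source_bin] += migrants
    return dict(totals)


def inflow_by_bin(matrix):
    """Total migrants arriving in each destination bin."""
    totals = defaultdict(int)
    for (_, destination_bin), migrants in matrix.items():
        totals[destination_bin] += migrants
    return dict(totals)
```

In this layout, the entry for bin 0 in `outflow_by_bin` plays the role of the first block in the figure, and the entry for bin 0 in `inflow_by_bin` corresponds to the number of people who moved to a county in the lowest vote range.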

We then analyzed the state-level migration based on political affiliation of States. In Fig. 15, we observe that after the 2004 Presidential election that was won by the Republican candidate, people migrated towards more Republican States. However, after the 2008 Presidential election that was won by the Democratic candidate, people still moved to the Republican States, as shown in Fig. 16.

To further explore the structure of the migration network between Democratic and Republican counties, we analyzed the community structure during the housing boom and the housing bust, considering only counties that are majority Democratic or majority Republican. Figure 17 shows the community structure of the Democratic and Republican counties during the housing boom (2004–2007), each shown without the other. We observe that 612 Democratic counties form 56 communities, but 47 are not connected to the giant component. Similarly, 2521 Republican counties form 106 communities, with 86 being isolated. Figure 18 presents the community structure of the Democratic and Republican counties during the housing bust (2008–2011). The giant component is broken into 35 and 47 communities, respectively, when the Republican and Democratic nodes are removed from the network. We observe that 913 Democratic counties form 41 communities, but 38 are not connected to the giant component. Similarly, 2226 Republican counties form 115 communities, with 94 being isolated.

Overall, we observe that there are fewer Democratic counties, but they have larger populations. On the other hand, there are more Republican counties, but with smaller populations. Removing nodes of either political affiliation disrupts the migration network. This suggests that even though counties of the same political affiliation are connected to each other, several nodes still have cross-affiliation links with counties of the other party. Close to half of the total migrant population crosses the political affiliation boundary over the period of this study. These results suggest that political affiliation, especially that of the destination, has some impact on US migration.

Gravity model

We used the gravity model of migration to analyze the individual impact of each factor. We estimated migration by ordinary least squares, as formulated in [29]:

$$\ln M_{ij} = \delta + \alpha \ln P_{i} + \beta \ln P_{j} - \gamma \ln D_{ij} + \varepsilon_{ij}, \tag{1}$$

where M_ij is the number of people who migrated from i to j; P_i and P_j are the populations of the origin and destination, respectively; and D_ij is the straight-line distance between origin and destination in km. Distances between counties were retrieved from the National Bureau of Economic Research [30]. The parameters δ, α, β, and γ are estimated by ordinary least squares (OLS) regression, in which ε_ij is the random error term.
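To make the estimation step concrete, the following is a minimal sketch of fitting Eq. (1) by OLS using only the Python standard library. It is not the authors' code: the function names, the input layout, and the choice to solve the normal equations by Gaussian elimination are assumptions made for illustration, and a production analysis would more likely rely on a statistics package. All inputs must be strictly positive, since the model is fitted in logarithms.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: regressors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_gravity_model(observations):
    """Estimate delta, alpha, beta, gamma of Eq. (1) by OLS.

    observations: iterable of (migrants, pop_origin, pop_destination,
    distance_km) tuples, all strictly positive (assumed layout).
    """
    rows, targets = [], []
    for migrants, pop_i, pop_j, dist in observations:
        rows.append([1.0, math.log(pop_i), math.log(pop_j), math.log(dist)])
        targets.append(math.log(migrants))
    k = 4
    # Normal equations: (X^T X) b = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    b = solve_linear_system(xtx, xty)
    # Eq. (1) carries a minus sign on the distance term, so gamma = -b[3].
    return {"delta": b[0], "alpha": b[1], "beta": b[2], "gamma": -b[3]}
```

Because the distance term enters Eq. (1) with a negative sign, a positive fitted γ means that migration decreases with distance.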

We normalized the data to obtain values between 0 and 1. Then, we applied OLS regression to Eq. (1) by projecting the migration flow on population, poverty, unemployment, and political affiliation independently. The projection on population, poverty, and unemployment is straightforward. However, when projecting on political affiliation, we first identified whether the source is Democratic or Republican. If the source is Democratic, we used the Democratic percentage of votes for the source and the destination based on the last presidential election. We applied OLS regression for each year from 2004 to 2015. A summary of the parameters with the mean square error for different ∆t time periods is provided in Table 5. We observe a similar impact of each parameter for source and destination across different ∆t periods. In addition, Table 6 presents the parameters when all factors are considered. The combined results indicate that unemployment has the highest impact, while source political affiliation has the least.
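The two preprocessing steps described above can be sketched as follows. This is a hedged illustration rather than the authors' procedure: the text does not state which normalization was used, so min-max scaling is an assumption, as is the symmetric rule that a Republican source uses the Republican percentages (the text only spells out the Democratic case). Treating a tie as Democratic and all function and parameter names are likewise hypothetical.

```python
def min_max_normalize(values):
    """Scale values linearly into [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant input: map everything to 0
    return [(v - lo) / (hi - lo) for v in values]


def political_shares(source, destination, dem_share, rep_share):
    """Pick the vote-share pair used as P_i and P_j for one flow.

    dem_share, rep_share: dicts mapping county id -> vote share in percent
    from the last presidential election.
    Returns (party of the source, share at source, share at destination).
    """
    if dem_share[source] >= rep_share[source]:
        # Democratic source: use Democratic shares at both ends.
        party, shares = "Democratic", dem_share
    else:
        # Republican source: use Republican shares (assumed by symmetry).
        party, shares = "Republican", rep_share
    return party, shares[source], shares[destination]
```

Note that a value scaled to exactly 0 cannot be passed through the logarithm in Eq. (1), so a sketch like this would need to handle that case before regression.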

Table 5 Parameter estimations of the gravity model of migration

Table 6 Combined parameter estimations of the gravity model of migration

Overall, for the population analyses, α and β converged to [0.5, 0.6] and γ converged to [0.7, 0.8]. Similar results were obtained for analyses of migration in the UK [31], with γ converging to [1.5, 1.6], and for analyses of New Zealand [17], with α, β, and γ converging to [0.8, 0.9]. However, the results for poverty and unemployment rate in US migration differ from related studies. For example, in analyses of internal migration in Africa [15], α converged to 0.38 and β converged to 6.4 for unemployment, while α converged to 0.32 and β converged to −0.23 for poverty.

Conclusions

In this paper, we analyzed US migration as a complex network at county and state levels. Our results confirm what was found in the literature about migration dynamics. We observe that counties are usually active, but show strong fluctuations in their links. For instance, links with small weights are typically unstable, but some links with very large weights fluctuate as well. Furthermore, we observed that once edges are active, their weight increments are typically small, but they might show a sudden large increase or decrease. Despite fluctuations at the link level, there are backbones in both the county-level and state-level networks. We observe that the migration network has maintained its characteristics during the political and economic instability since the millennium, and only recently has exhibited slightly differing network characteristics.

In analyses of political and socioeconomic factors, we observe that the political inclination of the destination and economic factors such as poverty and unemployment rate play a role in US migration, whereas crime rate and federal expenditure do not. In addition, population and geographic considerations were indeed strong indicators of US migration flow. Compared to earlier studies on migration, US migration conforms to the gravity model, but has some differences from other countries.

Finally, compared to other temporal networks of cattle movement and air transportation, US migration network has higher activity of nodes over the years and has edge dynamics somewhat similar to the airflow.

References

1. Rayer S, Brown DL. Geographic diversity of inter-county migration in the United States, 1980–1995. Popul Res Study. 2001;20(3):229–52.
2. Beeson PE, DeJong DN, Troesken W. Population growth in U.S. counties 1840–1990. Regional Science and Urban Economics. 2001.
3. McHugh KE, Gober P. Short term dynamics of U.S. Interstate Migration System. 1992.
4. Frey W. The new urban revival in the United States (vol. 30). Urban Studies. 1993.
5. Molloy R, Smith CL, Wozniak A. Internal migration in the United States. J Econ Perspect. 2011;25(3):173–96.
6. Galle OR, Burr JA, Potter LB. Rethinking measures of migration: on the decomposition of net migration. Soc Indicators Res. 1993;28(2):157–71.
7. Fagiolo G, Mastrorillo M. The International Migration Network. 2012.
8. Goldade T, Charyyev B, Gunes MH. Network analysis of migration patterns in the United States. Complex networks & their applications. Berlin: Springer; 2017. p. 770–83.
9. Kemper F-J. Internal migration in eastern and western. Reg Stud. 2004;38(6):659–78.
10. Niedomysl T. Residential preferences for interregional migration in Sweden: demographic, socioeconomic, and geographical determinants. Environ Plann A. 2008;40(5):1109–31.
11. Davis KF, D’Odorico P. Global spatio-temporal patterns in human migration: a complex network perspective. PLoS ONE. 2013;8:e53723.
12. Fagiolo G, Mastrorillo M. Does human migration affect international trade? A complex-network perspective. PLoS ONE. 2014;9:e97331.
13. Tranos E, Gheasi M, Nijkamp P. International migration: a global complex network. Environ Plann B. 2015;42(1):4–22.
14. Aleskerov F, Khutorskaya O, Buldyaev A, Yamilov A. Network analysis of international migration. In: International conference on network analysis. Berlin: Springer; 2016. p. 177–85.
15. Rayp G, Ruyssen I. Africa on the move: an extended gravity model of intra-regional migration. Migration, a world in motion: a multinational conference on migration and migration policy. Maastricht: Association for Public Policy Analysis and Management Netherlands; 2010.
16. Bergh A, Mirkina I, Nilsson T. Pushed by poverty or by institutions? Determinants of global migration flows. IFN Working Paper. 2015.
17. Alimi O, Mare D, Poot J. Does distance still matter for internal migration and, if so, how? Evidence from 1986 to 2006. Labour, Employment and Work in New Zealand. 2015.
18. Holme P, Saramäki J. Temporal networks. Phys Rep. 2012;519(3):97–125.
19. Braha D, Bar‐Yam Y. From centrality to temporary fame: dynamic. Complexity. 2006;12(2):59–63.
20. Gautreau A, Barrat A, Barthélemy M. Microdynamics in stationary complex networks. In: Proceedings of the national academy of sciences. 2009. p. 8847–52.
21. Bajardi P, Barrat A, Natale F, Savini L, Colizza V. Dynamical patterns of cattle trade movements. PLoS ONE. 2011;6(5):e19869.
22. SOI Tax Stats—migration data. (n.d.). From Internal Revenue Service: https://www.irs.gov/statistics/soi-tax-stats-migration-data. Accessed 30 Nov 2017.
23. United States Census. (n.d.). U.S. Census Bureau: https://www.census.gov/support/USACdataDownloads.html.
24. United States Department of Agriculture (n.d.). https://www.ers.usda.gov/data-products/county-level-data-sets/. Accessed 25 Jan 2018.
25. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mech. 2008;2008(10):P10008.
26. Bastian M, Heymann S, Jacomy M. Gephi: an open source software for exploring and manipulating networks. In: International AAAI conference on weblogs and social media. 2009.
27. Stanley MH, Amaral LA. Scaling behavior in the growth of companies. Nature. 1996;379(6568):804–6.
28. Serrano MA, Boguná M. Extracting the multiscale backbone of complex weighted networks. Proc Natl Acad Sci. 2009;106(16):6483–8.
29. Poot J, Alimi O, Cameron MP, Maré DC. Gravity model of migration: the successful comeback of an ageing superstar in regional science. J Reg Res. 2016; 63–86.
30. National Bureau of Economic Research. County distance. http://www.nber.org/data/county-distance-database.html. 2016.
31. Lomax N, Norman P. Subnational migration in the United Kingdom: producing a consistent time series using a combination of available data and estimates. J Popul Res. 2013;30(3):265–88.

Authors’ contributions

BC performed the research, data collection and analyses. MG designed the research, provided guidance and contributed to the writing of the manuscript. Both authors read and approved the final manuscript.

Acknowledgements

We would like to thank Travis Goldade for initial work in this study. This material is based upon work in part supported by the National Science Foundation under Grant Number EPS-IIA-1301726.

Competing interests

The authors declare that they have no competing interests.

Availability of data and materials

Data and materials are available at https://github.com/ComplexNetwork-USMigration/data.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Nevada Reno, Reno, NV, USA
Batyr Charyyev & Mehmet Hadi Gunes

Corresponding author

Correspondence to Mehmet Hadi Gunes.

Additional files

Additional file 1: Table S1.

County-level US Migration Network Characteristics. Number of nodes, edges, average degree, average weighted degree, diameter, average path length, clustering coefficient, modularity, and assortativity for each network of ∆t at county level.

Additional file 2: Table S2.

State-level US Migration Network Characteristics. Number of nodes, edges, average degree, average weighted degree, diameter, average path length, clustering coefficient, modularity, and assortativity for each network of ∆t at state level.

Additional file 3: Figure S1.

County-Level Migration Network between 2000 and 2015. Node size is determined by its population. Counties with the highest 20 incoming migration, i.e., weighted in-degree, are labeled and label color shows whether the county was Republican or Democratic in the 2000 presidential elections. Nodes are colored based on the community they belong to and edges are colored based on the node they originate from.

Additional file 4: Table S4.

Communities among States. Communities of states for each year.

Additional file 5: Table S3.

Interstate and Intrastate Migration. Yearly links and migration within and between states.

Additional file 6: Figure S2.

County (left) and State (right) Ranking by Population. Counties and States are ranked by the population in 2000.

Additional file 7: Figure S3.

Fraction of Appearance and Disappearance of State-Level Links with no Self Edge (left: ∆t = 1, right: ∆t = 2). Weight is the number of migrating people.

Additional file 8: Figure S4.

Top 10 Ranking of Counties in the Overall Network based on Different Centralities. The network is obtained for ∆t=16.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Charyyev, B., Gunes, M.H. Complex network of United States migration. Comput Soc Netw 6, 1 (2019). https://doi.org/10.1186/s40649-019-0061-6

Received: 21 February 2018
Accepted: 06 January 2019
Published: 24 January 2019
DOI: https://doi.org/10.1186/s40649-019-0061-6
