Title: | Tools and Data for Quantitative Peace Science Research |
---|---|
Description: | These are useful tools and data sets for the study of quantitative peace science. The goal of this package is to provide tools and data sets for original research that closely mimic what a user previously had to get from a software package that may not be well-sourced or well-supported. Those software bundles were useful to the extent that they encouraged replications of long-standing analyses by starting the data-generating process from scratch. However, much of that functionality can be reproduced relatively quickly and more transparently in the R programming language. |
Authors: | Steve Miller [aut, cre] |
Maintainer: | Steve Miller <[email protected]> |
License: | GPL-2 |
Version: | 1.1.9 |
Built: | 2024-11-16 05:05:49 UTC |
Source: | https://github.com/svmiller/peacesciencer |
add_archigos()
allows you to add some information about leaders to dyad-year
or state-year data. The function leans on an abbreviated version of the data, which also comes in this package.
add_archigos(data)
data |
a dyad-year data frame (either "directed" or "non-directed") or state-year data frame |
The function leans on attributes of the data that are provided by the create_dyadyears()
or
create_stateyears()
function. Make sure that function (or data created by that function) appears at the top
of the proverbial pipe.
add_archigos()
takes a dyad-year or state-year data frame and adds a few summary
variables based off the leader-level data. These include whether there was a leader transition in the
state-year (or first/second state in the dyad-year), whether there was an "irregular" leader transition,
the number of leaders in the state-year, the unique leader ID for Jan. 1 of the year, and the unique leader
ID for Dec. 31 of the year.
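As a rough illustration of the kind of summarizing described above, the base R sketch below collapses a made-up leader-tenure table into one row per state-year. The toy data, the column names (`leader_id`, `start`, `end`, `leadertransition`), and the helper `summarize_leaders()` are all hypothetical; this is not the package's implementation.

```r
# Toy leader tenures for one hypothetical state (ccode 999); all values are invented.
leaders <- data.frame(
  ccode     = c(999, 999, 999),
  leader_id = c("A", "B", "C"),
  start     = as.Date(c("1998-03-01", "2000-05-10", "2000-11-20")),
  end       = as.Date(c("2000-05-10", "2000-11-20", "2003-01-15"))
)

# Hypothetical helper: summarize the leaders of one state in one year.
summarize_leaders <- function(leaders, state, year) {
  jan1  <- as.Date(sprintf("%d-01-01", year))
  dec31 <- as.Date(sprintf("%d-12-31", year))
  # Keep every tenure that overlaps the calendar year.
  x <- leaders[leaders$ccode == state &
                 leaders$start <= dec31 & leaders$end >= jan1, ]
  data.frame(
    ccode            = state,
    year             = year,
    n_leaders        = nrow(x),
    leader_jan1      = x$leader_id[x$start <= jan1 & x$end >= jan1][1],
    leader_dec31     = x$leader_id[x$start <= dec31 & x$end >= dec31][1],
    leadertransition = as.integer(nrow(x) > 1)
  )
}

summarize_leaders(leaders, 999, 2000)
```

The same helper could be applied to each side of a dyad-year to get the first-state and second-state versions of these variables.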
Steven V. Miller
Goemans, Henk E., Kristian Skrede Gleditsch, and Giacomo Chiozza. 2009. "Introducing Archigos: A Dataset of Political Leaders" Journal of Peace Research 46(2): 269–83.
# just call `library(tidyverse)` at the top of your script
library(magrittr)

cow_ddy %>% add_archigos()

create_stateyears() %>% add_archigos()
add_atop_alliance()
allows you to add Alliance Treaty Obligations and Provisions (ATOP)
data to a (dyad-year, leader-dyad-year) data frame.
add_atop_alliance(data)
data |
a data frame with appropriate peacesciencer attributes |
Data are from version 5.0 of ATOP.
This function will also work with leader-dyad-years, though users should be careful with leader-level applications of alliance data. Alliance data are primarily communicated yearly, making it possible—even likely—that at least one leader-dyad in a given year is credited with an alliance that was not active in the particular leader-dyad. The ATOP alliance data are not communicated with time measurements more granular than the year, at least for dyad-years. The alliance-level data provided by ATOP do have termination dates, but I am unaware how well these start and termination dates coincide with particular members joining after the fact or exiting early. The alliance phase data appear to communicate that "phases" are understood as beginning/ending when the underlying document is amended in such a way that it affects one of their variable codings, but this may or may not be because of a signatory joining after the fact or exiting early. More guidance will be useful going forward, but use these data for leader-level analyses with that in mind.
add_atop_alliance()
takes a (dyad-year, leader-dyad-year) data frame and adds information about
the alliance pledge in that given dyad-year from the ATOP data. These include whether there was an alliance
with a defense pledge, an offense pledge, neutrality pledge, non-aggression pledge, or pledge for consultation
in time of crisis.
Steven V. Miller
Leeds, Brett Ashley, Jeffrey M. Ritter, Sara McLaughlin Mitchell, and Andrew G. Long. 2002. Alliance Treaty Obligations and Provisions, 1815-1944. International Interactions 28: 237-60.
# just call `library(tidyverse)` at the top of your script
library(magrittr)

cow_ddy %>% add_atop_alliance()
add_capital_distance()
allows you to add capital-to-capital
distance to a (dyad-year, leader-year, leader-dyad-year, state-year) data frame. The capitals are coded in the
cow_capitals
and gw_capitals
data frames, along with their latitudes and longitudes. The resulting distance variable, capdist, is calculated using the "Vincenty" method (i.e. "as the crow flies") and is expressed in kilometers.
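For readers curious about what an "as the crow flies" calculation involves, here is a minimal base R sketch of the spherical special case of Vincenty's formula. Whether the package uses the spherical or the ellipsoidal variant, and which radius it assumes, is not stated here, so the function name `vincenty_sphere_km()` and the default radius are assumptions for illustration only.

```r
# Great-circle distance via the spherical special case of Vincenty's formula.
# Inputs are in decimal degrees; the result is in kilometers.
# The default radius (6378.137 km, the WGS84 equatorial radius) is an assumption.
vincenty_sphere_km <- function(lat1, lon1, lat2, lon2, r = 6378.137) {
  to_rad <- pi / 180
  p1 <- lat1 * to_rad
  p2 <- lat2 * to_rad
  dl <- (lon2 - lon1) * to_rad
  num <- sqrt((cos(p2) * sin(dl))^2 +
              (cos(p1) * sin(p2) - sin(p1) * cos(p2) * cos(dl))^2)
  den <- sin(p1) * sin(p2) + cos(p1) * cos(p2) * cos(dl)
  r * atan2(num, den)
}

# Approximate coordinates for two capitals (illustrative, not taken from cow_capitals).
vincenty_sphere_km(48.86, 2.35, 52.52, 13.40)  # Paris to Berlin
```

The `atan2()` form is used because it stays numerically stable for both very short and near-antipodal distances.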
add_capital_distance(data)
data |
a data frame with appropriate peacesciencer attributes |
The function leans on attributes of the data that are provided by one of the "create" functions
in this package (e.g. create_dyadyears() or create_stateyears()).
add_capital_distance()
takes a (dyad-year, leader-year, leader-dyad-year, state-year) data frame and adds the
capital-to-capital distance between the first state and the second state (in dyad-year data) or the minimum
capital-to-capital distance for a given state in a given year. A minor
note about this function: cases of capital transition are recorded in the
underlying data but, in the conversion to capital-years (and eventual
merging into a dyad-year data frame), the Jan. 1 capital is used for calculating
distances.
Steven V. Miller
# just call `library(tidyverse)` at the top of your script
library(magrittr)

cow_ddy %>% add_capital_distance()

create_stateyears() %>% add_capital_distance()
add_ccode_to_gw()
allows you to match, as well as one can, Correlates of War system membership data
with Gleditsch-Ward system data.
add_ccode_to_gw(data)
data |
a data frame with appropriate peacesciencer attributes |
The data-raw
directory on the project's Github contains
more information about the underlying data that assists in merging in
these codes.
The user will invariably need to be careful and ask why they want these data included. The issue here is that the two systems have different compositions and the merging process will not (and cannot) be perfect. A case like Gran Colombia is not too difficult to handle (i.e. CoW does not have this entity and none of the splinter states conflict with CoW's coding). However, a case like the unification of West Germany and East Germany is harder. Here, Correlates of War treats the unification as the reappearance of the original Germany whereas Gleditsch-Ward treat the unification as an incorporation of East Germany into West Germany. The script will not create state-year or dyad-year duplicates for the Gleditsch-Ward codes. The size of the original data remains unchanged. However, there will be some year duplicates for various Correlates of War codes (prominently Serbia and Yugoslavia in 2006). Use with care. You can also use the countrycode package. Whether you use this function or the countrycode package, do not do this kind of merging without assessing the output.
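One way to "assess the output" is to run a few mechanical checks after the merge. The base R sketch below uses an invented crosswalk in which two Gleditsch-Ward codes map to the same Correlates of War code in one year; the data, object names, and codes are hypothetical and only illustrate the checks.

```r
# Toy state-year frame keyed on Gleditsch-Ward codes (values invented).
gw_sy <- data.frame(gwcode = c(1, 1, 2, 2), year = c(2005, 2006, 2005, 2006))

# Hypothetical crosswalk: in 2006, gwcodes 1 and 2 both map to ccode 10.
crosswalk <- data.frame(
  gwcode = c(1, 1, 2, 2),
  year   = c(2005, 2006, 2005, 2006),
  ccode  = c(10, 10, 20, 10)
)

merged <- merge(gw_sy, crosswalk, by = c("gwcode", "year"), all.x = TRUE)

# Check 1: the merge should not change the number of rows.
same_size <- nrow(merged) == nrow(gw_sy)

# Check 2: list any ccode-years that now appear more than once.
key <- merged[c("ccode", "year")]
ccode_dups <- merged[duplicated(key) | duplicated(key, fromLast = TRUE), ]

# Check 3: count state-years that found no match.
n_unmatched <- sum(is.na(merged$ccode))

same_size
ccode_dups
n_unmatched
```

What to do with any rows that Check 2 returns is a substantive decision that depends on the research question.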
add_ccode_to_gw()
takes a (dyad-year, leader-year, leader-dyad-year, state-year) data frame that already has Gleditsch-Ward
state system codes and adds their corollary Correlates of War codes.
Steven V. Miller
# just call `library(tidyverse)` at the top of your script
library(magrittr)

create_dyadyears(system = "gw") %>% add_ccode_to_gw()

create_stateyears(system = 'gw') %>% add_ccode_to_gw()
add_contiguity()
allows you to add Correlates of War contiguity
data to a dyad-year, leader-year, leader-dyad-year, or state-year data frame.
add_contiguity(data)
data |
a data frame with appropriate peacesciencer attributes |
The contiguity codes in the dyad-year data range from 0 to 5. 1 = direct land contiguity. 2 = separated by 12 miles of water or fewer (a la Stannis Baratheon). 3 = separated by 24 miles of water or fewer (but more than 12 miles). 4 = separated by 150 miles of water or fewer (but more than 24 miles). 5 = separated by 400 miles of water or fewer (but more than 150 miles).
Importantly, 0 denotes dyads that are not contiguous at all in the CoW contiguity data. This is a conscious decision on my part as I do not think of CoW's contiguity data as exactly ordinal. Cross-reference CoW's contiguity data with the minimum distance data in this package to see how some dyads that CoW codes as not contiguous are in fact very close to each other, sometimes even land-contiguous. For example, Zimbabwe and Namibia are separated by only a few hundred feet of water at that peculiar intersection of the Zambezi River where the borders of Zambia, Botswana, Namibia, and Zimbabwe meet. There is no contiguity record for this in the CoW data. There are other cases where contiguity records are situationally missing (e.g. India-Bangladesh, and Bangladesh-Myanmar in 1971) or other cases where states are much closer than CoW's contiguity data imply (e.g. Pakistan and the Soviet Union were separated by under 30 kilometers of Afghan territory). The researcher is free to recode these 0s to be, say, 6s, but this is why peacesciencer does not do this.
For additional clarity, the "master records" produce duplicates for cases when the contiguity relationship changed in a given year. This function returns the minimum contiguity relationship observed in that given year. There should be no duplicates in the returned output.
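The two steps described here (collapse duplicates to the minimum contiguity type, then treat unmatched dyad-years as 0) can be sketched in base R as follows. The toy records and the column name `conttype` are assumptions for illustration and are not drawn from the package's data.

```r
# Toy "master records" with a within-year change for one dyad (values invented).
contig <- data.frame(
  ccode1   = c(100, 100, 200),
  ccode2   = c(101, 101, 201),
  year     = c(1950, 1950, 1950),
  conttype = c(4, 1, 2)
)

# Keep the minimum (closest) contiguity type per dyad-year.
contig_min <- aggregate(conttype ~ ccode1 + ccode2 + year, data = contig, FUN = min)

# Left-join onto a dyad-year frame; dyads with no contiguity record get 0.
ddy <- data.frame(ccode1 = c(100, 200, 300), ccode2 = c(101, 201, 301), year = 1950)
out <- merge(ddy, contig_min, by = c("ccode1", "ccode2", "year"), all.x = TRUE)
out$conttype[is.na(out$conttype)] <- 0
out
```

Because 0 here means "no record" and not "closest", any model that treats this column as ordinal would need a recode first.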
Be mindful that the data are fundamentally state-year and that extensions to leader-level data should be understood as approximations for leaders in a given state-year.
add_contiguity()
takes a data frame and adds information
about the contiguity relationship based on the "master records" for the
Correlates of War direct contiguity data (v. 3.2). If the data are dyad-year
(or leader dyad-year), the function returns the lowest contiguity type
observed in the dyad-year (if contiguity is observed at all). If the data
are state-year (or leader-year), the function returns the total number of
land and sea borders calculated from these master records.
Steven V. Miller
Stinnett, Douglas M., Jaroslav Tir, Philip Schafer, Paul F. Diehl, and Charles Gochman (2002). "The Correlates of War Project Direct Contiguity Data, Version 3." Conflict Management and Peace Science 19 (2):58-66.
# just call `library(tidyverse)` at the top of your script
library(magrittr)

cow_ddy %>% add_contiguity()

create_stateyears() %>% add_contiguity()
add_cow_alliance()
allowed you to add Correlates of War alliance
data to a dyad-year data frame. However, this function is deprecated at the
request of the data set's maintainer and any use of the Correlates of War's
alliance data will have to be done manually. The function now returns a stop
communicating this development.
add_cow_alliance(data)
data |
a dyad-year or leader-dyad-year data frame (either "directed" or "non-directed") |
Duplicates in the original directed dyad-year alliance data were
pre-processed. Check cow_alliance
in the package's data-raw
directory on Github for more information.
This function will also work with leader-dyad-years, though users should be careful with leader-level applications of alliance data. Alliance data are primarily communicated yearly, making it possible—even likely—that at least one leader-dyad in a given year is credited with an alliance that was not active in the particular leader-dyad. The Correlates of War's alliance data are not communicated with time measurements more granular than the year. Apply these data to leader-level analyses with that in mind.
add_cow_alliance()
now returns a stop communicating the maintainer's
request to reject all software that facilitates the use of the data in this
fashion. add_cow_alliance()
previously took a dyad-year data frame and
added information about the alliance pledge in that given dyad-year. These
include whether there was an alliance with a defense pledge, neutrality
pledge, non-aggression pledge, or pledge for consultation in time of crisis
(entente).
Steven V. Miller
Gibler, Douglas M. 2009. International Military Alliances, 1648-2008. Congressional Quarterly Press.
## Not run:
# just call `library(tidyverse)` at the top of your script
library(magrittr)

cow_ddy %>% add_cow_alliance()
## End(Not run)
add_cow_majors()
allows you to add Correlates of War major power variables
to a dyad-year, leader-year, leader dyad-year, or state-year data frame.
add_cow_majors(data)
data |
a data frame with appropriate peacesciencer attributes |
Be mindful that the data are fundamentally state-year and that extensions to leader-level data should be understood as approximations for leaders in a given state-year.
add_cow_majors()
takes a data frame and adds information
about major power status for the given state or dyad in that year. If the
data are dyad-year (or leader dyad-year), the function returns two
columns for whether the first state (i.e. ccode1) or the second state (i.e. ccode2) is a major power in the given year, according
to the Correlates of War. 1 = is a major power. 0 = is not a major
power. If the data are state-year (or leader-year), the function
returns just one column (cowmaj) for whether the
state was a major power in a given state-year.
Steven V. Miller
Correlates of War Project. 2017. "State System Membership List, v2016." Online, https://correlatesofwar.org/data-sets/state-system-membership/
# just call `library(tidyverse)` at the top of your script
library(magrittr)

cow_ddy %>% add_cow_majors()
add_cow_mids()
merges in CoW's MID data to a dyad-year data frame.
The version of the CoW-MID data in this package is version 5.0.
add_cow_mids(data, keep)
data |
a dyad-year data frame (either "directed" or "non-directed") |
keep |
an optional parameter, specified as a character vector, passed to the function in a If |
Dyads are capable of having multiple disputes in a given year, which can create a problem for merging into a complete dyad-year data frame. Consider the case of France and Italy in 1860, which had three separate dispute onsets that year (MID#0112, MID#0113, MID#0306), as illustrative of the problem. This merging process employs several rules to whittle down these duplicate dyad-years for merging into a dyad-year data frame.
The function will also return a message to the user about the case-exclusion rules that went into this process. Users who are interested in implementing their own case-exclusion rules should look up the "whittle" class of functions also provided in this package.
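To make the duplicate problem concrete, the base R sketch below builds a toy dyad-year with three disputes and whittles it to one row. The dispute numbers, attribute values, and the three rules used (highest hostility, then highest fatality, then lowest dispute number) are placeholders chosen for illustration; the package's actual case-exclusion rules are the ones reported in its message and in the "whittle" functions.

```r
# Toy dyad-year dispute records; dispute ids and attributes are invented.
disputes <- data.frame(
  ccode1   = c(100, 100, 100, 200),
  ccode2   = c(101, 101, 101, 201),
  year     = c(1860, 1860, 1860, 1860),
  dispnum  = c(1, 2, 3, 4),
  hostlev  = c(2, 4, 4, 3),
  fatality = c(0, 1, 3, 0)
)
key <- c("ccode1", "ccode2", "year")

# Count disputes per dyad-year to see where duplicates arise.
n_per_dyadyear <- aggregate(dispnum ~ ccode1 + ccode2 + year,
                            data = disputes, FUN = length)

# Placeholder rules: highest hostility, then highest fatality,
# then lowest dispute number as a final tie-breaker.
ord <- order(disputes$ccode1, disputes$ccode2, disputes$year,
             -disputes$hostlev, -disputes$fatality, disputes$dispnum)
sorted   <- disputes[ord, ]
whittled <- sorted[!duplicated(sorted[key]), ]
whittled
```

After whittling, each dyad-year appears once, so the result can be merged into a complete dyad-year frame without inflating its row count.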
add_cow_mids()
takes a dyad-year data frame and adds dyad-year dispute information
from the CoW-MID data.
Steven V. Miller
Palmer, Glenn, Roseanne W. McManus, Vito D'Orazio, Michael R. Kenwick, Mikaela Karstens, Chase Bloch, Nick Dietrich, Kayla Kahn, Kellan Ritter, and Michael J. Soules. 2021. "The MID5 Dataset, 2011–2014: Procedures, Coding Rules, and Description." Conflict Management and Peace Science.
# just call `library(tidyverse)` at the top of your script
library(magrittr)

cow_ddy %>% add_cow_mids()

# keep just the dispute number and Side A/B identifiers
cow_ddy %>% add_cow_mids(keep=c("dispnum","sidea1", "sidea2"))
add_cow_trade()
allows you to add Correlates of War trade data to your
(dyad-year, leader-year, leader-dyad-year, state-year) data frame.
add_cow_trade(data)
data |
a data frame with appropriate peacesciencer attributes |
For the dyad-year (and leader-dyad-year) data, there must be some kind of information loss in order to work within the limited space available to this package. This package loads a truncated version of the data in which the trade values are rounded to three decimal places in order to greatly reduce the disk space for this package. I do not think this to be terribly problematic, though I admit I do not like it. If this is a problem for your research question, you may want to consider not using this function for dyad-year or leader-dyad-year data.
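To gauge what rounding to three decimal places can cost, a quick base R check on a few invented trade values (in millions of current USD) is enough. The numbers below are made up and only show the size of the rounding error.

```r
# Invented trade flows in millions of current USD.
flow <- c(0.00042, 12.345678, 1507.1239, NA)
flow_rounded <- round(flow, 3)

# Rounding to three decimals moves each value by at most 0.0005,
# but very small flows can collapse to zero.
max(abs(flow - flow_rounded), na.rm = TRUE)
flow_rounded[1] == 0
```

Since the unit is millions of USD, the worst-case error is on the order of 500 USD per flow, which matters mainly for dyads with very small trade values.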
Be mindful that the data are fundamentally state-year or dyad-year and that extensions to leader-level data should be understood as approximations for leaders (leader-dyads) in a given state-year (dyad-year).
add_cow_trade()
takes a (dyad-year, leader-year, leader-dyad-year, state-year) data frame and
adds information about the volume of trade in that given dyad-year or state-year. For the state-year (leader-year)
data, these are minimally the sum of all imports and the sum of all exports. For dyad-year (leader-dyad-year) data,
this function returns the value of imports in current million USD in the first country from
the second country (and vice-versa) along with their "smooth" equivalents.
Steven V. Miller
Barbieri, Katherine, Omar M. G. Keshk, and Brian Pollins. 2009. "TRADING DATA: Evaluating our Assumptions and Coding Rules." Conflict Management and Peace Science. 26(5): 471-491.
# just call `library(tidyverse)` at the top of your script
library(magrittr)

# The function below works, but depends on running `download_extdata()` beforehand.
# cow_ddy %>% add_cow_trade()

create_stateyears() %>% add_cow_trade()
add_cow_wars()
allows you to add Correlates of War war data to a
dyad-year or state-year data frame.
add_cow_wars(data, type, intratype = "all")
data |
a data frame with appropriate peacesciencer attributes |
type |
the type of war you want to add. Options include "inter" or "intra". |
intratype |
the types of armed conflicts the user wants to consider, specified as a character vector.
Options include "local issues" and "central control". Applicable only if |
Intra-state war data are coerced into true state-year data by first selecting the duplicate state-years on unique onsets, then whichever war was the deadliest. The inter-state war data work functionally the same way.
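One possible reading of this rule, sketched in base R: within a duplicated state-year, prefer rows that mark a war onset, then keep the deadliest of those. The toy data and the column names (`warnum`, `onset`, `deaths`) are hypothetical, and the package's own script is the authority on the exact procedure.

```r
# Toy state-year war records; all values are invented.
wars <- data.frame(
  ccode  = c(500, 500, 500, 600),
  year   = c(1990, 1990, 1990, 1990),
  warnum = c(1, 2, 3, 4),
  onset  = c(0, 1, 1, 1),            # 1 = the war began in this state-year
  deaths = c(9000, 1200, 4000, 700)
)

# Sort so onsets come first, then the deadliest war, within each state-year.
ord <- order(wars$ccode, wars$year, -wars$onset, -wars$deaths)
sorted <- wars[ord, ]

# Keep the first row per state-year to get true state-year data.
state_year <- sorted[!duplicated(sorted[c("ccode", "year")]), ]
state_year
```

Note that under this reading an ongoing war with more deaths (war 1 above) loses to a less deadly war that started in that year.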
On intra-state wars: the primary_state variable is used to identify the government
principally fighting the domestic non-state actor over central control or
local issues. Internationalized civil wars are included in the data, but not
for outside actors that intervene on behalf of the government or rebel group.
Extra-state war functionality is not available right now as I try to figure out the demand for its use.
add_cow_wars()
takes a dyad-year or state-year data frame and
returns information about wars from either the inter-state or intra-state war
data set from the Correlates of War. The function works for state-year data
when the user wants information about intra-state wars.
The function works for dyad-year data when the user wants information about
inter-state wars.
Steven V. Miller
Dixon, Jeffrey, and Meredith Sarkees. 2016. A Guide to Intra-State Wars: An Examination of Civil Wars, 1816-2014. Thousand Oaks, CA: Sage.
Sarkees, Meredith Reid, and Frank Wheldon Wayman. 2010. Resort to War: A Data Guide to Inter-State, Extra-State, Intra-State, and Non-State Wars, 1816-2007. Washington DC: CQ Press.
# just call `library(tidyverse)` at the top of your script
library(magrittr)

create_stateyears(system = "cow") %>%
  add_cow_wars(type = "intra", intratype = "central control")

create_stateyears(system = "cow") %>%
  add_cow_wars(type = "intra", intratype = "local issues")

cow_ddy %>% add_cow_wars(type = "inter")
add_creg_fractionalization()
allows you to add information about the
fractionalization/polarization of a state's ethnic and religious groups to
your data.
add_creg_fractionalization(data)
data |
a data frame with appropriate peacesciencer attributes |
Please see the information for the underlying data creg, and the associated R script in the data-raw directory, to see how
these data are generated.
The creg
data have a few duplicates. When standardizing to true CoW
codes, the duplicates concern Serbia/Yugoslavia in 1991 and 1992 as well as
Russia/the Soviet Union in 1991. When standardizing to true Gleditsch-Ward
codes, the duplicates concern Serbia/Yugoslavia in 1991 and Russia/Soviet
Union in 1991. In those cases, the function does a group-by arrange for
the more fractionalized/polarized estimate under the (reasonable, I think)
assumption that these are estimates prior to the dissolution of those
states. If this is problematic, feel free to consult the underlying data
and merge those in manually.
The underlying data have both Gleditsch-Ward codes and Correlates of War
codes. The merge it makes depends on what you declare as the "master"
system at the top of the pipe (i.e. in create_dyadyears() or create_stateyears()). If, for example, you run
create_stateyears(system="cow") and follow it with add_gwcode_to_cow(), the merge will be on the Correlates of War
codes and not the Gleditsch-Ward codes. Consult the script
to see how this is achieved.
Be mindful that the data are fundamentally state-year and that extensions to leader-level data should be understood as approximations for leaders in a given state-year.
add_creg_fractionalization()
takes a dyad-year, leader-year,
leader-dyad-year, or state-year data frame, whether the primary state
identifiers are from the Correlates of War system or the Gleditsch-Ward
system, and returns information about the fractionalization and
polarization of the state(s) in a given year. The function returns four
additional columns when the data are state-year (or leader-year) and returns eight
additional columns when the data are dyad-year (or leader-dyad-year).
The columns returned are the fractionalization of ethnic groups, the
polarization of ethnic groups, the fractionalization of religious groups,
and the polarization of religious groups. When the data are dyad-year
(or leader-dyad-year), the return doubles because it provides information
for both states in the dyad.
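For intuition about how the two quantities differ, here is a small base R sketch using the common Herfindahl-style fractionalization index and the Reynal-Querol form of polarization, both computed from group population shares. That the creg-based columns follow exactly these formulas is an assumption on my part; the function names and example shares are hypothetical, and the data-raw script is the authority.

```r
# Fractionalization: 1 minus the sum of squared group shares.
fractionalization <- function(shares) {
  shares <- shares / sum(shares)
  1 - sum(shares^2)
}

# Polarization in the Reynal-Querol form: 1 - sum(((0.5 - p) / 0.5)^2 * p).
polarization <- function(shares) {
  shares <- shares / sum(shares)
  1 - sum(((0.5 - shares) / 0.5)^2 * shares)
}

two_equal  <- c(0.5, 0.5)    # two equally sized groups
many_small <- rep(0.1, 10)   # ten equally sized groups

c(frac = fractionalization(two_equal),  pol = polarization(two_equal))
c(frac = fractionalization(many_small), pol = polarization(many_small))
```

Two equal groups maximize polarization but not fractionalization, while many small groups do the reverse, which is why the two measures are reported separately.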
Steven V. Miller
Alesina, Alberto, Arnaud Devleeschauwer, William Easterly, Sergio Kurlat and Romain Wacziarg. 2003. "Fractionalization". Journal of Economic Growth 8: 155-194.
Montalvo, Jose G. and Marta Reynal-Querol. 2005. "Ethnic Polarization, Potential Conflict, and Civil Wars" American Economic Review 95(3): 796–816.
Nardulli, Peter F., Cara J. Wong, Ajay Singh, Buddy Petyon, and Joseph Bajjalieh. 2012. The Composition of Religious and Ethnic Groups (CREG) Project. Cline Center for Democracy.
# just call `library(tidyverse)` at the top of your script
library(magrittr)

cow_ddy %>% add_creg_fractionalization()

create_stateyears() %>% add_creg_fractionalization()

create_stateyears(system = "gw") %>% add_creg_fractionalization()
add_democracy()
allows you to add estimates of democracy to your data.
add_democracy(data)
data |
a data frame with appropriate peacesciencer attributes |
Be mindful that the data are fundamentally state-year and that extensions to leader-level data should be understood as approximations for leaders in a given state-year.
A vignette on the package's website talks about how these data are here
primarily to encourage you to maximize the number of observations in the
analysis to follow. Xavier Marquez' QuickUDS
estimates have the best
coverage. If democracy is ultimately a control variable, or otherwise a
variable not of huge concern for the analysis (i.e. the user has no
particular stake on the best measurement of democracy or the best
conceptualization and operationalization of "democracy"), please
use Marquez' estimates instead of Polity or V-Dem. If the user is
doing an analysis of inter-state conflict, and across the standard
post-1816 domain in conflict studies, definitely don't use
the Polity data because the extent of its missingness is both large and
unnecessary. Please read the vignette describing these issues
here: http://svmiller.com/peacesciencer/articles/democracy.html
add_democracy()
takes a (dyad-year, leader-year, leader-dyad-year,
state-year) data frame and adds information about the level of democracy
for the state or two states in the dyad in a given year. If the data are
dyad-year or leader-dyad-year, the function adds six total columns for
the first state (i.e. ccode1 or gwcode1) and the second state (i.e. ccode2 or gwcode2) about the level of democracy measured by the
Varieties of Democracy project (v2x_polyarchy), the Polity project (polity2), and Xavier Marquez' QuickUDS extensions/estimates. If the
data are state-year or leader-year, the function returns three additional
columns to the original data that contain that same information for a given
state in a given year.
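The doubling of columns for dyadic data follows from merging the same state-year information twice, once per side of the dyad. A minimal base R sketch with invented values is below; the suffixed column names (`polity2_1`, `polity2_2`) are hypothetical and are not necessarily the names the function returns.

```r
# Toy state-year democracy scores (values invented).
dem <- data.frame(ccode = c(1, 2, 3), year = 2000, polity2 = c(10, -7, NA))

# Toy dyad-year frame.
ddy <- data.frame(ccode1 = c(1, 1, 2), ccode2 = c(2, 3, 3), year = 2000)

# Rename the state-year columns so they can attach to each side of the dyad.
side1 <- setNames(dem, c("ccode1", "year", "polity2_1"))
side2 <- setNames(dem, c("ccode2", "year", "polity2_2"))

# Two left joins: one for the first state, one for the second.
out <- merge(ddy, side1, by = c("ccode1", "year"), all.x = TRUE)
out <- merge(out, side2, by = c("ccode2", "year"), all.x = TRUE)
out <- out[order(out$ccode1, out$ccode2),
           c("ccode1", "ccode2", "year", "polity2_1", "polity2_2")]
out
```

Missing scores carry through as `NA` on whichever side they occur, which is where differences in coverage between the three measures show up.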
Steven V. Miller
Coppedge, Michael, John Gerring, Carl Henrik Knutsen, Staffan I. Lindberg, Jan Teorell, David Altman, Michael Bernhard, M. Steven Fish, Adam Glynn, Allen Hicken, Anna Luhrmann, Kyle L. Marquardt, Kelly McMann, Pamela Paxton, Daniel Pemstein, Brigitte Seim, Rachel Sigman, Svend-Erik Skaaning, Jeffrey Staton, Agnes Cornell, Lisa Gastaldi, Haakon Gjerlow, Valeriya Mechkova, Johannes von Romer, Aksel Sundtrom, Eitan Tzelgov, Luca Uberti, Yi-ting Wang, Tore Wig, and Daniel Ziblatt. 2020. "V-Dem Codebook v10" Varieties of Democracy (V-Dem) Project.
Marshall, Monty G., Ted Robert Gurr, and Keith Jaggers. 2017. "Polity IV Project: Political Regime Characteristics and Transitions, 1800-2017." Center for Systemic Peace.
Marquez, Xavier, "A Quick Method for Extending the Unified Democracy Scores" (March 23, 2016). doi:10.2139/ssrn.2753830
Pemstein, Daniel, Stephen Meserve, and James Melton. 2010. "Democratic Compromise: A Latent Variable Analysis of Ten Measures of Regime Type." Political Analysis 18(4): 426-449.
# just call `library(tidyverse)` at the top of your script
library(magrittr)

cow_ddy %>% add_democracy()

create_stateyears(system="gw") %>% add_democracy()

create_stateyears(system="cow") %>% add_democracy()
add_fpsim()
allows you to add a variety of dyadic foreign policy
similarity measures to your (dyad-year, leader-dyad-year) data frame.
add_fpsim(data, keep)
data |
a data frame with appropriate peacesciencer attributes |
keep |
an optional parameter, specified as a character vector, about
what dyadic foreign policy similarity measure(s) the user wants returned
from this function. If |
For the dyad-year (and leader-dyad-year) data, there must be some kind of information loss in order to reduce the disk space data like these command. In this case, all calculations are rounded to three decimal places. I do not think this to be terribly problematic, though I admit I do not like it. If this is a problem for your research question (though I can't imagine it would be), you may want to consider not using this function for dyad-year or leader-dyad-year data.
Be mindful that the data are fundamentally dyad-year and that extensions to leader-level data should be understood as approximations for leader-dyads in a given dyad-year.
The data this function uses are directed dyad-year and the merge is a left-join, making this function agnostic about whether your dyad-year (or leader-dyad-year) data are directed or non-directed.
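The point about the left join can be seen with a small base R sketch: because the lookup table holds both orderings of every pair, a non-directed frame and a directed frame both find a match for every row. The toy values and the column name `kappa` are invented for illustration.

```r
# Directed lookup: one row per ordered pair (values invented).
fpsim <- data.frame(
  ccode1 = c(1, 2, 1, 3),
  ccode2 = c(2, 1, 3, 1),
  year   = 2000,
  kappa  = c(0.4, 0.4, -0.1, -0.1)
)

# The same dyads expressed as non-directed and as directed dyad-years.
nondirected <- data.frame(ccode1 = c(1, 1), ccode2 = c(2, 3), year = 2000)
directed    <- data.frame(ccode1 = c(1, 2, 1, 3), ccode2 = c(2, 1, 3, 1), year = 2000)

by_cols <- c("ccode1", "ccode2", "year")
nd_out <- merge(nondirected, fpsim, by = by_cols, all.x = TRUE)
d_out  <- merge(directed,    fpsim, by = by_cols, all.x = TRUE)

nd_out
d_out
```

In both cases the row count of the user's data is preserved, and only the matching ordered pairs are pulled from the lookup.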
Haege's (2011) article reads at first glance as agnostic about which of these particular measures you should consider a "preferred" or "default" measure of dyadic foreign policy similarity. Indeed, the 2011 publication in Political Analysis mostly drives the point home that S has important limitations and the multiple variants Haege calculates are not substitutable. This means a user interested in measuring dyadic foreign policy similarity might have to cycle through all of them to assess their varying effects whereas a user interested in this as just a control variable for the model can (probably) get by with picking just one and not belaboring the measure any further.
An evaluation of the data, the article, and an email exchange with the author lead to the following points the user should consider. What follows is a rationale for why users should think of kappa as a default measure for dyadic foreign policy similarity, and why the "valued" equivalent for the alliance data is an inadvisable default. The example at the end of the document offers the operational "nudge" for what the user should want from this function.
The choice of measure will in part depend on the temporal domain. If the user has just a post-WWII sample, the UN voting measures offer better coverage. We're all partial to the alliance data, though, because of its 19th century coverage.
Haege urges the use of chance-corrected measures, like Cohen's (1960) kappa or Scott's (1955) pi. Of the two, Haege suggests kappa over pi. The rationale is the user would need to build in a very strong assumption that the baseline propensity of forming a tie in the dyad is the same for both members of the dyad to make Scott's (1955) pi as appropriate an estimate as Cohen's (1960) kappa even as both have the important chance correction.
The choice of squared versus absolute distances is arbitrary. Users probably do not think about the differences, or know about the differences. S was usually calculated with absolute differences in software packages, though this was rarely explained to the user. Comparability with S might be an argument in favor of absolute distance as a default, but keep in mind that squared distances are much more commonly used in most other types of distance and association metrics.
The choice of binary or valued is also a design choice for the user to consider on the full merits, though the practice of valuing alliance ties on a quantitative scale builds in strong assumptions about the scale of alliance strength as presented in something like the Correlates of War or ATOP typology. S has traditionally done this by default, which is another reason its application in a lot of quantitative peace science research is suspect.
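The difference between kappa and pi in the points above comes down to how expected chance agreement is computed. The base R sketch below shows this for the simplest case of binary ties; the tie profiles are invented, the helper `chance_corrected()` is hypothetical, and Haege's actual measures also cover valued data and the squared versus absolute distance variants, which this sketch does not attempt.

```r
# Binary tie profiles for two states across ten third parties (invented data):
# 1 = tie (e.g. an alliance) with that third party, 0 = no tie.
a <- c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)
b <- c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0)

chance_corrected <- function(a, b) {
  po <- mean(a == b)                         # observed agreement
  pa <- mean(a)                              # state A's propensity to form a tie
  pb <- mean(b)                              # state B's propensity to form a tie
  pe_kappa <- pa * pb + (1 - pa) * (1 - pb)  # Cohen: each state keeps its own marginal
  pbar  <- (pa + pb) / 2
  pe_pi <- pbar^2 + (1 - pbar)^2             # Scott: one pooled marginal for both states
  c(kappa = (po - pe_kappa) / (1 - pe_kappa),
    pi    = (po - pe_pi) / (1 - pe_pi))
}

chance_corrected(a, b)
```

The two values coincide only when both states have the same propensity to form ties, which is the strong assumption the text attributes to pi.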
add_fpsim()
takes a (dyad-year, leader-dyad-year) data frame and
adds information about the dyadic foreign policy similarity, based on
several measures calculated and offered by Frank Haege.
Steven V. Miller
For any use of these data whatsoever (except for Tau-b), please cite Haege (2011). Data are version 2.0.
Haege, Frank M. 2011. "Choice or Circumstance? Adjusting Measures of Foreign Policy Similarity for Chance Agreement." Political Analysis 19(3): 287-305.
Tau-b is calculated by me and not Haege, and no additional citation (beyond citing the package) is necessary.
Additional citations depend on what particular measure of similarity you're using, whether Kendall's (1938) Tau-b, Signorino and Ritter's (1999) S, Cohen's (1960) kappa, or Scott's (1955) pi. Haege (2011) is part of a chorus arguing against the use of S, though S measures are included in these data if you elect to ignore the chorus and use this measure. Likewise, Tau-b is in here, though it is not a good measure of dyadic foreign policy similarity for reasons that Signorino and Ritter (1999) mention. Haege (2011) argues for a chance-corrected measure of dyadic foreign policy similarity, either Cohen's (1960) kappa or Scott's (1955) pi.
Cohen, Jacob. 1960. "A Coefficient of Agreement for Nominal Scales." Educational and Psychological Measurement 20(1): 37-46.
Kendall, M.G. 1938. "A New Measure of Rank Correlation." Biometrika 30(1/2): 81–93.
Scott, William A. 1955. "Reliability of Content Analysis: The Case of Nominal Scale Coding." Public Opinion Quarterly 19(3): 321–5.
Signorino, Curtis S. and Jeffrey M. Ritter. 1999. "Tau-b or Not Tau-b: Measuring the Similarity of Foreign Policy Positions." International Studies Quarterly 43(1): 115–44.
Haege (2011) also suggests you cite the underlying data informing the similarity measure, whether it is UN voting or alliances. In his case, he recommended a Voeten citation from 2013 and the alliance data proper. In the case of the alliances, I know Gibler's (2009) book is recommended even if the alliance data have since been updated (and reflected in this measure). In the UN voting data, my understanding is the 2017 paper in Journal of Conflict Resolution is also the preferred citation.
Bailey, Michael A., Anton Strezhnev, and Erik Voeten. 2017. "Estimating Dynamic State Preferences from United Nations Voting Data." Journal of Conflict Resolution 61(2): 430–456.
Gibler, Douglas M. 2009. International Military Alliances, 1648-2008. Washington DC: CQ Press.
## Not run:
# just call `library(tidyverse)` at the top of your script.
library(magrittr)

# The function below works, but depends on
# running `download_extdata()` beforehand.
cow_ddy %>% add_fpsim()

# Select just the two kappa measures that are suggested defaults.
# `kappaba`: kappa for binary alliance data if you have pre-WWII data.
# `kappavv`: kappa for UN voting data if you have just post-WWII data.
cow_ddy %>% add_fpsim(keep=c("kappaba", "kappavv"))
## End(Not run)
add_gml_mids()
merged in GML's MID data to a (dyad-year, leader-year,
leader-dyad-year, state-year) data frame. The current version of the GML MID
data is 2.2.1. However, this function is deprecated at the request of one of
the data set's maintainers, insisting that the use of the data be done
manually with design decisions made 100% at the discretion of the researcher.
The function now returns a stop communicating this development.
add_gml_mids(data, keep, init = "sidea-all-joiners")
data |
a data frame with appropriate peacesciencer attributes |
keep |
an optional parameter, specified as a character vector,
applicable to just the dyad-year data, and passed to the function in a
If |
init |
how should initiators be coded? Applicable only to state-year,
leader-dyad-year, and leader-year data. This parameter accepts one of
three possible values ( |
Dyads are capable of having multiple disputes in a given year, which can create a problem for merging into a complete dyad-year data frame. Consider the case of France and Italy in 1860, which had three separate dispute onsets that year (MID#0112, MID#0113, MID#0306), as illustrative of the problem. This merging process employs several rules to whittle down these duplicate dyad-years for merging into a dyad-year data frame.
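The general idea of whittling duplicate dyad-years can be sketched in base R. This is an illustration only: the data values are invented, and the rule order (highest hostility, then highest fatality, then lowest dispute number) is a hypothetical stand-in rather than the set of rules this package applied.

```r
# Toy dyad-year conflict data with invented values (not GML MID data).
disputes <- data.frame(
  dyad     = c("A-B", "A-B", "A-B", "A-C"),
  year     = c(1860, 1860, 1860, 1860),
  dispnum  = c(1, 2, 3, 4),
  hostlev  = c(2, 4, 4, 3),
  fatality = c(0, 0, 1, 0)
)

# Hypothetical rule order: highest hostility first, then highest fatality,
# then lowest dispute number as a final tie-breaker.
ord <- order(disputes$dyad, disputes$year,
             -disputes$hostlev, -disputes$fatality, disputes$dispnum)
ranked <- disputes[ord, ]

# Keep the first (top-ranked) row in each dyad-year.
whittled <- ranked[!duplicated(ranked[, c("dyad", "year")]), ]
```

The result has one row per dyad-year, which is what a merge into a complete dyad-year data frame requires. Different rule orders will keep different disputes, which is why these are design decisions for the researcher.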
The function will also return a message to the user about the case-exclusion rules that went into this process. Users who are interested in implementing their own case-exclusion rules should look up the "whittle" class of functions also provided in this package.
Determining "initiation" for state-year summaries of inter-state disputes is possible since there is an implied directionality of "initiation." In about half of all cases, this is straightforward. You can use the participant summaries and determine that if the dispute was bilateral and the dispute did not escalate beyond an attack, the state on Side A initiated the dispute. For multilateral MIDs, these conditions still hold at least for originators. However, there is considerable difficulty for cases where 1) participant-level summaries suggested actions at the level of clash or higher, 2) the participant was a joiner and not an originator. The effort required to flesh this out is enormous, and perhaps forthcoming in a future update.
add_gml_mids()
allows you to make one of three judgment
calls here (see the arguments section of the documentation).
If it were my call to make, I would say you should probably use the option
"sidea-all-joiners"
. My review of the MID data with Doug Gibler
suggests most states that join a dispute are not roped into a conflict
(i.e. targeted by some other state) after the first incident. They
routinely initiate their entry into the conflict, which is
what this concept of "initiation" is supposed to capture in the
literature. There are no doubt cases where some third state is brought into
the dispute by the actions of some other state even as the original MID
coding rules place a high barrier on coding that type of dispute entry.
However, individually assessing whether a state initiated
their entry into a MID under something other than the simplest of cases
(e.g. bilateral cases where the highest participant action fell short of
a clash) would be too time-consuming. It would require an audit of almost
half of all participant-level summaries in the data. In a forthcoming
publication, Gibler and Miller offer excellent coverage here with a
new data set on militarized events. However, this would include only
confrontations after World War II.
add_gml_mids()
now returns a stop communicating one of the maintainer's
requests to reject all software that facilitates the use of the data in this
fashion. Before, it took a (dyad-year, leader-year, leader-dyad-year,
state-year) data frame and added dispute information from the GML MID data.
If the data are dyad-year, the return is a laundry list of information about
onsets, ongoing conflicts, and assorted participant- and dispute-level
summaries. If the data are leader-dyad-year, these are carefully matched to
leaders as well. If the data are state-year or leader-year, the function
returns information about ongoing disputes (and onsets) and whether there
were any ongoing disputes (and onsets) the state (or leader) initiated.
Steven V. Miller
Gibler, Douglas M., Steven V. Miller, and Erin K. Little. 2016. “An Analysis of the Militarized Interstate Dispute (MID) Dataset, 1816-2001.” International Studies Quarterly 60(4): 719-730.
## Not run:
# just call `library(tidyverse)` at the top of your script
library(magrittr)

cow_ddy %>% add_gml_mids()

# keep just the dispute number and Side A/B identifiers
cow_ddy %>% add_gml_mids(keep=c("dispnum","sidea1", "sidea2"))
## End(Not run)
add_gwcode_to_cow()
allows you to match, as well as one can, Gleditsch-Ward system membership data
with Correlates of War state system membership data.
add_gwcode_to_cow(data)
data |
a data frame with appropriate peacesciencer attributes |
The data-raw
directory on the project's Github contains more
information about the underlying data that assists in merging in these
codes.
The user will invariably need to be careful and ask why they want these data included. The issue here is that the two systems have different compositions and the merging process will not (and cannot) be perfect. We can note that a case like Serbia/Yugoslavia is not too difficult to handle (since "Serbia" never overlaps with "Yugoslavia" in the Gleditsch-Ward data and Correlates of War understands Serbia as the predecessor state, dominant state, and successor state to Yugoslavia). However, there is greater weirdness with a case like Yemen/Yemen Arab Republic. The script will not create state-year or dyad-year duplicates for the Correlates of War codes. The size of the original data remains unchanged. However, there will be some year duplicates for various Gleditsch-Ward codes (e.g. Yemen, again). Use with care. You can also use the countrycode package. Whether you use this function or the countrycode package, do not do this kind of merging without assessing the output.
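One way to assess the output is to count duplicates on both sets of identifiers after the merge. The base R sketch below uses an invented toy data frame (the codes are made up and the column names are assumed for illustration) to show the kind of check I mean.

```r
# Toy state-year data after a code merge (codes here are invented).
merged <- data.frame(
  ccode  = c(901, 902, 903, 903),
  gwcode = c(900, 900, 903, 903),
  year   = c(1990, 1990, 1990, 1991)
)

# The original identifiers should still be unique within a year ...
cow_dupes <- sum(duplicated(merged[, c("ccode", "year")]))

# ... but the added codes can repeat within a year.
gw_counts <- aggregate(ccode ~ gwcode + year, data = merged, FUN = length)
names(gw_counts)[3] <- "n"
gw_dupes <- gw_counts[gw_counts$n > 1, ]
```

Any rows in `gw_dupes` are cases to inspect by hand before using the added codes in a later merge.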
add_gwcode_to_cow()
takes a (dyad-year, leader-year, leader-dyad-year, state-year)
data frame that already has Correlates of War
state system codes and adds their corollary Gleditsch-Ward codes.
Steven V. Miller
# just call `library(tidyverse)` at the top of your script
library(magrittr)

cow_ddy %>% add_gwcode_to_cow()

create_stateyears() %>% add_gwcode_to_cow()
add_igos()
allows you to add information from the Correlates of War International
Governmental Organizations data to dyad-year or state-year data, matching on Correlates of War system codes.
add_igos(data)
data |
a dyad-year data frame (either "directed" or "non-directed") or a state-year data frame. |
The function leans on attributes of the data that are provided by the create_dyadyears()
or
create_stateyears()
function. Make sure that function (or data created by that function) appears at the top
of the proverbial pipe.
add_igos()
takes a dyad-year data frame or state-year data frame and adds information available from
the Correlates of War International Governmental Organizations data. If the data are dyad-year, the function returns
the original data with just one additional column for the total number of mutual IGOs for which both members of the dyad
are full members. If the data are state-year, the function returns the original data with four additional columns. These are
the number of IGOs for which the state is a full member, the number of IGOs for which the state is an associate member, the
number of IGOs for which the state is an observer, and the number of IGOs for which the state is involved in any way (i.e. the
sum of the other three columns).
Steven V. Miller
Pevehouse, Jon C.W., Timothy Nordstrom, Roseanne W. McManus, and Anne Spencer Jamison. 2020. "Tracking Organizations in the World: The Correlates of War IGO Version 3.0 datasets." Journal of Peace Research 57(3): 492-503.
Wallace, Michael, and J. David Singer. 1970. "International Governmental Organization in the Global System, 1815-1964." International Organization 24: 239-87.
# just call library(tidyverse) at the top of the pipe
library(magrittr)

cow_ddy %>% add_igos()

create_stateyears() %>% add_igos()
add_lead()
allows you to add some data recorded in the LEAD data to your leader-year or leader-dyad-year data.
add_lead(data, keep)
data |
a leader-year or leader-dyad-year data frame |
keep |
an optional parameter, specified as a character vector, about what leader attributes
the user wants to return from this function. If |
add_lead()
takes a leader-year or leader-dyad-year data frame and adds some data recorded in the LEAD data to it.
For leader-dyad-year data, suffixes of "1" and "2" are added to the data to indicate attributes of the first leader (obsid1
)
or the second leader (obsid2
), respectively.
Steven V. Miller
Ellis, Carli Mortenson, Michael C. Horowitz, and Allan C. Stam. 2015. "Introducing the LEAD Data Set." International Interactions 41(4): 718–741.
# just call `library(tidyverse)` at the top of your script
library(magrittr)

create_leaderyears() %>% add_lead()

create_leaderyears() %>% add_lead(keep = c("yrsexper"))
add_lwuf()
allows you to add estimates of leader
willingness to use force to leader-year data or leader-dyad-year data.
add_lwuf(data, keep)
data |
a leader-year or leader-dyad-year data frame as generated in peacesciencer |
keep |
an optional argument, specified as a character vector, of the variables from the |
See lwuf
for more information, but I'll copy-paste it here
too.
The letter published by Carter and Smith (2020) contains more information about what these thetas refer to. The "M1" theta is a variation of the standard Rasch model from the boilerplate information in the LEAD data. The authors consider this to be "theoretically relevant" or "risk-related" as these all refer to conflict or risk-taking. The "M2" theta expands on "M1" by including political orientation and psychological characteristics. "M3" and "M4" expand on "M1" and "M2" by considering all 36 variables in the LEAD data.
The authors construct and include all these measures, though their analyses
suggest "M2" is the best-performing measure. You should probably consider
using theta2_mean
as your default estimate of leader willingness
to use force in leader-year analyses.
add_lwuf()
takes a leader-year or leader-dyad-year data
frame and adds estimates of leader willingness to use force, as
generated by Carter and Smith (2020).
Steven V. Miller
Carter, Jeff and Charles E. Smith, Jr. 2020. "A Framework for Measuring Leaders' Willingness to Use Force." American Political Science Review 114(4): 1352–1358.
# just call `library(tidyverse)` at the top of your script
library(magrittr)

create_leaderyears() %>% add_lwuf()
add_minimum_distance()
allows you to add the minimum
distance (in kilometers) to a (dyad-year, leader-year, leader-dyad-year, state-year) data frame. These estimates
are recorded in the cow_mindist
and gw_mindist
data that come with this package. The
data are current as of the end of 2019.
add_minimum_distance(data)
data |
a data frame with appropriate peacesciencer attributes |
The function leans on attributes of the data that are provided by one of the "create" functions
in this package (e.g. create_dyadyears()
or create_stateyears()
).
add_minimum_distance()
takes a (dyad-year, leader-year, leader-dyad-year, state-year) data frame and adds the
minimum distance between the first state and the second state (in dyad-year data) or the minimum
of the minimum distances (the "minimum minimum" distance) for a given state in a given year.
Steven V. Miller
Schvitz, Guy, Luc Girardin, Seraina Ruegger, Nils B. Weidmann, Lars-Erik Cederman, and Kristian Skrede Gleditsch. 2022. "Mapping The International System, 1886-2017: The CShapes 2.0 Dataset." Journal of Conflict Resolution 66(1): 144-161.
Weidmann, Nils B. and Kristian Skrede Gleditsch. 2010. "Mapping and Measuring Country Shapes: The cshapes Package." The R Journal 2(1): 18-24.
# just call `library(tidyverse)` at the top of your script
library(magrittr)

cow_ddy %>% add_minimum_distance()

create_dyadyears(system = "gw") %>% add_minimum_distance()

create_stateyears(system ="gw") %>% add_minimum_distance()
add_nmc()
allows you to add the Correlates of War National Material Capabilities data to your data.
add_nmc(data)
data |
a data frame with appropriate peacesciencer attributes |
Be mindful that the data are fundamentally state-year and that extensions to leader-level data should be understood as approximations for leaders in a given state-year.
add_nmc()
takes a (dyad-year, leader-year, leader-dyad-year, state-year) data frame and adds information
about the national material capabilities for the state or two states in the dyad in a given year.
If the data are dyad-year (or leader-dyad-year),
the function adds 12 total columns for the first state (i.e. ccode1
) and the
second state (i.e. ccode2
) for all estimates of national
material capabilities provided by the Correlates of War project. If the data are
state-year (or leader-year), the function returns six additional columns
to the original data that contain that same information for a given state in a given year.
Steven V. Miller
Singer, J. David, Stuart Bremer, and John Stuckey. 1972. "Capability Distribution, Uncertainty, and Major Power War, 1820-1965." In Bruce Russett (ed.) Peace, War, and Numbers, Beverly Hills: Sage, 19-48.
Singer, J. David. 1987. "Reconstructing the Correlates of War Dataset on Material Capabilities of States, 1816-1985." International Interactions 14(1): 115-32.
# just call `library(tidyverse)` at the top of your script
library(magrittr)

cow_ddy %>% add_nmc()

create_stateyears() %>% add_nmc()
add_peace_years()
calculates peace years for your ongoing conflicts. The function
works for both dyad-year and state-year data generated in peacesciencer. As of the forthcoming
v. 0.7.0, add_peace_years()
will be deprecated for the more generic and versatile add_spells()
. Users
are free to continue with the function, though I recommend it only for more balanced panels (like state-year or dyad-year),
and less for imbalanced panels (like leader-years, or leader-dyad-years). As the change in name implies, add_spells()
will
have greater flexibility with both cross-sectional units and time.
add_peace_years(data, pad = FALSE)
data |
a dyad-year data frame (either "directed" or "non-directed") or state-year data frame |
pad |
an optional parameter, defaults to FALSE. If TRUE, the peace-year calculations fill in cases where panels are unbalanced/have gaps. Think of a state like Germany disappearing for 45 years as illustrative of this. |
The function internally uses sbtscs()
from stevemisc. In the interest of full disclosure,
sbtscs()
leans heavily on btscs()
from DAMisc. I optimized some code for performance.
Importantly, the underlying function (sbtscs()
in stevemisc, by way of btscs()
in DAMisc)
has important performance issues if you're trying to run it when your event data are sandwiched by observations
without any event data. Here's what I mean. Assume you got the full Gleditsch-Ward state-year data from 1816 to 2020
and then added the UCDP armed conflict data to it. If you want the peace-years for this, the function will fail because
every year from 1816 to 1945 (along with 2020, as of writing) has no event data. You can force the function to "not fail"
by setting pad = TRUE
as an argument, but it's not clear this is advisable for this reason. Assume you wanted event data
in UCDP for just the extrasystemic onsets. The data start in 1946 and, in 1946, the United Kingdom,
Netherlands, and France had extrasystemic conflicts. For all years before 1946, the events are imputed as 1
for those countries that had 1s in the first year of observation and everyone else is NA and implicitly assumed to be a zero.
For those NAs, the function runs a sequence resulting in some wonky spells in 1946 that are not implied by (the absence of) the
data. In fact, none of those are implied by the absence of data before 1946.
The function works just fine if you truncate your temporal domain to reflect the nature of your event data. Basically,
if you want to use this function more generally, filter your dyad-year or state-year data to make sure there are no years
without any event data recorded (e.g. why would you have a CoW-MID analysis of dyad-years with observations before 1816?). This
is less a problem when years with all-NAs succeed (and do not precede) the event data. For example, the UCDP conflict data
run from 1946 to 2019 (as of writing). Having 2020 observations in there won't compromise the function output when pad = TRUE
is included as an argument.
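Truncating the temporal domain to the years with observed event data can be done with a simple filter before calculating peace years. The base R sketch below uses an invented toy panel with hypothetical column names. It only illustrates the filtering step, not the peace-year calculation itself.

```r
# Toy state-year panel, 1940-2020; events are only observed for 1946-2019.
panel <- expand.grid(state = c("A", "B"), year = 1940:2020)
panel$event <- NA
observed <- panel$year >= 1946 & panel$year <= 2019
panel$event[observed] <- 0
panel$event[panel$state == "A" & panel$year == 1950] <- 1

# Find the years in which at least one unit has event data, then keep
# only the span from the first to the last of those years.
has_data <- tapply(!is.na(panel$event), panel$year, any)
years_ok <- as.integer(names(has_data)[has_data])
trimmed <- panel[panel$year >= min(years_ok) & panel$year <= max(years_ok), ]
```

After the filter, no year in `trimmed` is without event data, which is the condition the underlying function needs.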
Finally, add_peace_years()
will only calculate the peace years and will leave the temporal dependence adjustment
to the taste of the researcher. Importantly, I do not recommend manually creating splines or square/cube terms because
it creates more problems in adjusting for temporal dependence in model predictions. In a regression formula in R,
you can specify the Carter and Signorino (2010) approach as
... + gmlmidspell + I(gmlmidspell^2) + I(gmlmidspell^3)
(assuming you ran add_peace_years()
on a dyad-year data frame
including the Gibler-Miller-Little conflict data).
The Beck et al. cubic splines approach is ... + splines::bs(gmlmidspell, 4)
. This function includes
the spell and three splines (hence the 4 in the command). Either approach makes for easier model predictions,
given R's functionality.
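The point about keeping these terms in the formula can be seen with simulated data. In this base R sketch, the data frame, the variable `spell` (a stand-in for something like `gmlmidspell`), and the data-generating process are all invented for illustration.

```r
set.seed(8675309)
n <- 500
toy <- data.frame(
  x     = rnorm(n),
  spell = sample(0:30, n, replace = TRUE)   # stand-in for `gmlmidspell`
)
# Invented data-generating process: onset risk declines with time at peace.
toy$onset <- rbinom(n, 1, plogis(-1 + 0.5 * toy$x - 0.1 * toy$spell))

# Carter and Signorino (2010): cubic polynomial of the spell, in the formula.
m_poly <- glm(onset ~ x + spell + I(spell^2) + I(spell^3),
              data = toy, family = binomial(link = "logit"))

# The spline specification described above, also kept in the formula.
m_bs <- glm(onset ~ x + splines::bs(spell, 4),
            data = toy, family = binomial(link = "logit"))

# Because the terms live in the formula, new data need only `x` and `spell`.
newdat <- data.frame(x = 0, spell = 0:30)
p_poly <- predict(m_poly, newdata = newdat, type = "response")
p_bs   <- predict(m_bs, newdata = newdat, type = "response")
```

Had the squared and cubed terms (or the spline bases) been created as separate columns by hand, every prediction data frame would need those columns rebuilt consistently, which is where mistakes tend to creep in.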
add_peace_years()
takes a dyad-year or state-year data frame and adds peace years for ongoing conflicts.
Dyadic conflict data supported include the Correlates of War (CoW) Militarized Interstate Dispute (MID) data set and the
Gibler-Miller-Little (GML) corrections to CoW-MID. State-level conflict data supported in this function include the UCDP
armed conflict data and the CoW intra-state war data.
Steven V. Miller
Armstrong, Dave. 2016. “DAMisc: Dave Armstrong's Miscellaneous Functions.” R package version 1.4-3.
Beck, Nathaniel, Jonathan N. Katz, and Richard Tucker. 1998. "Taking Time Seriously: Time-Series-Cross-Section Analysis with a Binary Dependent Variable." American Journal of Political Science 42(4): 1260–1288.
Carter, David B. and Curtis S. Signorino. 2010. "Back to the Future: Modeling Time Dependence in Binary Data." Political Analysis 18(3): 271–292.
Miller, Steven V. 2017. “Quickly Create Peace Years for BTSCS Models with sbtscs in stevemisc.” http://svmiller.com/blog/2017/06/quickly-create-peace-years-for-btscs-models-with-stevemisc/
# just call `library(tidyverse)` at the top of your script
library(magrittr)

cow_ddy %>%
  add_gml_mids(keep = NULL) %>%
  add_cow_mids(keep = NULL) %>%
  add_contiguity() %>%
  add_cow_majors() %>%
  filter_prd() %>%
  add_peace_years()
add_rugged_terrain()
allows you to add information, however crude,
about the "ruggedness" of a state's terrain to your (dyad-year, leader-year, leader-dyad-year, state-year) data.
add_rugged_terrain(data)
data |
a data frame with appropriate peacesciencer attributes |
Please see the information for the underlying data rugged
, and the
associated R script in the data-raw
directory, to see how these data are generated.
Importantly, these data are time-agnostic and move slowly. We're talking about geography here.
Both data sets benchmark around 1999-2000 and it's a leap of faith to use these data for comparisons
across the entirety of the Correlates of War or Gleditsch-Ward system membership. Every use of data
of these types has been either a cross-sectional snapshot or a state-to-state comparison
after World War II (think of your prominent civil war studies here). Be mindful about what you expect
to get from these data.
The underlying data have both Gleditsch-Ward codes and Correlates of War codes. The merge it makes depends
on what you declare as the "master" system at the top of the pipe (e.g. in create_dyadyears()
or
create_stateyears()
). If, for example, you run create_stateyears(system="cow")
and follow
it with add_gwcode_to_cow()
, the merge will be on the Correlates of War codes and not the Gleditsch-Ward
codes. You can see the script mechanics to see how this is achieved.
add_rugged_terrain()
takes a (dyad-year, leader-year, leader-dyad-year, state-year)
data frame, whether the primary state identifiers are from the Correlates of War
system or the Gleditsch-Ward system, and returns information about the
"ruggedness" of the state's terrain. The two indicators returned are the
"terrain ruggedness index" calculated by Nunn and Puga (2012) and a logarithmic
transformation of how mountainous the state is (as calculated by Gibler and Miller, 2014).
The dyad-year (leader-dyad-year) data get four additional columns (i.e. both indicators for both states
in the dyad) whereas the state-year data get just the two additional columns.
Steven V. Miller
Fearon, James D., and David Laitin. 2003. "Ethnicity, Insurgency, and Civil War." American Political Science Review 97(1): 75–90.
Gibler, Douglas M. and Steven V. Miller. 2014. "External Territorial Threat, State Capacity, and Civil War." Journal of Peace Research 51(5): 634-646.
Nunn, Nathan and Diego Puga. 2012. "Ruggedness: The Blessing of Bad Geography in Africa." Review of Economics and Statistics. 94(1): 20-36.
Riley, Shawn J., Stephen D. DeGloria, and Robert Elliot. 1999. "A Terrain Ruggedness Index That Quantifies Topographic Heterogeneity." Intermountain Journal of Sciences 5: 23–27.
# just call `library(tidyverse)` at the top of your script
library(magrittr)

cow_ddy %>% add_rugged_terrain()

create_stateyears() %>% add_rugged_terrain()

create_stateyears(system = "gw") %>% add_rugged_terrain()
add_sdp_gdp()
allows you to add estimated GDP and "surplus"
domestic product data from a 2020 analysis published in
International Studies Quarterly by Anders, Fariss, and Markowitz.
add_sdp_gdp(data)
data |
a data frame with appropriate peacesciencer attributes |
The function leans on attributes of the data that are provided by one of the "create" functions. Make sure a recognized function (or data created by that function) appears at the top of the proverbial pipe. Users will also want to note that the underlying function accesses two different data sets. It appears that the results published in International Studies Quarterly used the Correlates of War classification, but a follow-up repository on Github uses the Gleditsch-Ward classification. To the extent that these estimates are generated by simulation, the estimates will be slightly different across both data sets even for common observations (e.g. the United States in 1816).
Because these are large nominal numbers, the estimates have been log-transformed. Users can always exponentiate these if they choose. Researchers can use these data to construct reasonable estimates of surplus GDP per capita, but must exponentiate the underlying variables before doing this.
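The order of operations matters for the per capita calculation. The base R sketch below uses invented values and hypothetical column names (`log_sdp`, `log_pop`) as stand-ins for the log-transformed estimates. Check the names in your own data before adapting it.

```r
# Toy state-year rows; column names and values are invented stand-ins for
# the log-transformed estimates described above.
toy <- data.frame(
  log_sdp = log(c(5e9, 2e11)),
  log_pop = log(c(1e7, 5e7))
)

# Exponentiate first, then divide.
toy$sdp_pc <- exp(toy$log_sdp) / exp(toy$log_pop)

# Equivalent on the log scale: subtract, then exponentiate.
toy$sdp_pc_alt <- exp(toy$log_sdp - toy$log_pop)

# Dividing the logged values directly is not a per capita quantity.
toy$wrong <- toy$log_sdp / toy$log_pop
```

The first two columns agree with each other. The third is a ratio of logarithms and has no per capita interpretation.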
Be mindful that the data are fundamentally state-year and that extensions to leader-level data should be understood as approximations for leaders in a given state-year.
add_sdp_gdp()
takes a (dyad-year, leader-year, leader-dyad-year,
state-year) data frame and adds information about the estimated gross
domestic product (in 2011 USD) for that year, the estimated population
in that year, the GDP per capita in that year, and what Anders, Fariss
and Markowitz term the "surplus domestic product" in that year. If the
data are dyad-year (leader-dyad-year), the function adds eight total
columns for the first state (i.e. ccode1) and the second state
(i.e. ccode2) for all these estimates. If the data are state-year
(or leader-year), the function returns four additional columns to the
original data that contain that same information for a given state in
a given year.
Steven V. Miller
Anders, Therese, Christopher J. Fariss, and Jonathan N. Markowitz. 2020. "Bread Before Guns or Butter: Introducing Surplus Domestic Product (SDP)" International Studies Quarterly 64(2): 392–405.
# just call `library(tidyverse)` at the top of your script
library(magrittr)

cow_ddy %>% add_sdp_gdp()

create_stateyears() %>% add_sdp_gdp()

create_stateyears(system = "gw") %>% add_sdp_gdp()
add_spells()
calculates "spells" in your state-year, leader-year,
or dyad-year data. The application here is mostly concerned with
things like "peace spells" between conflicts in a given cross-sectional
unit (e.g. a state or dyad).
add_spells(data, conflict_event_type = "ongoing", ongo = FALSE)
data |
an applicable data frame (e.g. leader-year, dyad-year, state-year, as created in peacesciencer) |
conflict_event_type |
type of event for which spells should be calculated, either "ongoing" or "onset". Default is "ongoing". If "ongoing", the spells are calculated on the presence of an ongoing event. If "onset", spells are calculated on the onset of a conflict event with successive zeros (if observed) calculated as "peace". See Details section for more. |
ongo |
If TRUE, successive 1s are considered ongoing events and treated as NA after the first 1. If FALSE, successive 1s are all treated as failures. Defaults to FALSE. |
The function internally uses ps_spells()
from stevemisc. In
the interest of full disclosure, ps_spells()
leans heavily on
add_duration()
from spduration. I optimized some code
for performance.
Thinking of an application like peace-years, add_spells()
will
only calculate the peace years and will leave the temporal dependence
adjustment to the taste of the researcher. Importantly, I do not recommend
manually creating splines or square/cube terms because it creates more
problems in adjusting for temporal dependence in model predictions.
In a regression formula in R, you can specify the Carter and Signorino
(2010) approach as
... + gmlmidspell + I(gmlmidspell^2) + I(gmlmidspell^3)
(assuming
you ran add_spells()
on a dyad-year data frame including the
Gibler-Miller-Little conflict data). The Beck et al. cubic splines approach
is ... + splines::bs(gmlmidspell, 4)
. This function includes the
spell and three splines (hence the 4 in the command). Either approach
makes for easier model predictions, given R's functionality.
Thinking of our dyadic analyses of conflict, I've always understood
that something like "peace-years" should be calculated on the ongoing
event and not the onset of the event. Think of something like the
Iran-Iraq War (MID#2115) as illustrative here. The MID (which became
a war) started in 1980 and ended in 1988. There are no other bilateral
incidents between Iran-Iraq independent of the war, per Correlates of War
coding rules. If peace years are calculated at the "onset" of the event,
it would list peace-years between the two countries from 1981 to 1988.
I've never understood that to make sense, but still I've seen others insist
this is the correct way to do it. add_peace_years()
would force the
calculation on the ongoing event, which I still maintain is correct.
add_spells()
will allow you to calculate on onsets, even if
ongoing events are the default.
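To make the difference concrete, here is a minimal sketch on a toy Iran-Iraq series. The helper `years_since_event()` is hypothetical and is not the routine used by add_spells(); the package's exact counting conventions may differ. It simply counts the consecutive event-free years that precede each year.

```r
# Hypothetical helper (not part of peacesciencer): for each year, count the
# consecutive years without an event immediately before it.
years_since_event <- function(event) {
  spell <- integer(length(event))
  counter <- 0L
  for (i in seq_along(event)) {
    spell[i] <- counter
    counter <- if (event[i] == 1) 0L else counter + 1L
  }
  spell
}

# Toy series built from the dates in the text: the dispute is ongoing
# from 1980 to 1988 and has its onset in 1980.
toy <- data.frame(year = 1978:1990)
toy$ongoing <- as.integer(toy$year >= 1980 & toy$year <= 1988)
toy$onset <- as.integer(toy$year == 1980)

toy$spell_ongoing <- years_since_event(toy$ongoing)
toy$spell_onset <- years_since_event(toy$onset)
toy
```

Under the onset-based count, the spell variable keeps climbing through the war years after 1980, which is the behavior the paragraph above objects to. Under the ongoing-based count, it stays at zero until the dispute ends.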
The underlying function for add_spells()
will stop without a return
if there are NAs bracketing observed events. The surest way
this will happen is if you're doing something like a dyad-year analysis
of inter-state conflicts from 1816 to 2010, but create_dyadyears()
created observations from 2011 to 2020 for you as well. Remove those
before using this function and confine the temporal domain to just those
time-units (e.g. years) for which there is observed event data.
See what I do in the example below.
add_spells()
takes a dyad-year, leader-year, or state-year data
frame and adds spells for ongoing conflicts. Dyadic conflict data supported
include the Correlates of War (CoW) Militarized Interstate Dispute (MID)
data set and the Gibler-Miller-Little (GML) corrections to CoW-MID.
State-level conflict data supported in this function include the UCDP
armed conflict data and the CoW intra-state war data. Leader-year
conflict data supported include the GML MID data.
Steven V. Miller
Beger, Andreas, Daina Chiba, Daniel W. Hill, Jr, Nils W. Metternich, Shahryar Minhas and Michael D. Ward. 2018. “spduration: Split-Population and Duration (Cure) Regression.” R package version 0.17.1.
Beck, Nathaniel, Jonathan N. Katz, and Richard Tucker. 1998. "Taking Time Seriously: Time-Series-Cross-Section Analysis with a Binary Dependent Variable." American Journal of Political Science 42(4): 1260–1288.
Carter, David B. and Curtis S. Signorino. 2010. "Back to the Future: Modeling Time Dependence in Binary Data." Political Analysis 18(3): 271–292.
# just call `library(tidyverse)` at the top of your script
library(magrittr)

# confine the temporal domain to years with observed conflict data
aaa <- subset(cow_ddy, year <= 2010)

aaa %>%
  add_gml_mids(keep = NULL) %>%
  add_cow_mids(keep = NULL) %>%
  add_contiguity() %>%
  add_cow_majors() %>%
  filter_prd() %>%
  add_spells()
add_strategic_rivalries()
merges in Thompson and Dreyer's (2012) strategic rivalry data
to a dyad-year data frame. The data are currently right-bound at 2010.
add_strategic_rivalries(data, across_types = 1)
data |
a dyad-year data frame (either "directed" or "non-directed") |
across_types |
optional, relevant for state-year, takes a value of 1, 2, or 3 to look for whether one of three types fits criteria for ideological, interventionary, positional, spatial rivalry. Defaults to 1. |
add_strategic_rivalries()
will include some other information derived from the rivalry data that the
user may not want (e.g. start year of the rivalry). Feel free to select those out after the fact. Function includes
an on-the-fly adjustment for Austria for rivalry #79. In this case, the Austria-Serbia rivalry continues for two years
after Austria-Hungary (ccode: 300
) became Austria (ccode: 305
).
The across_types
argument is optional and observed for only state-year calls. It defaults to 1. At the default,
the function looks into the rivalry data (in td_rivalries
) and focuses on the type1
column. If, say, a state has
an ongoing rivalry and it is primarily spatial, it codes that as a spatial rivalry. Assume you input across_types = 2
, the
function then looks across both the type1
and type2
columns to see if there is a spatial component to the rivalry as
either its primary or secondary dimension. If so, it codes that as a 1. across_types
must be 1, 2, or 3.
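The column-scanning logic described above can be sketched as follows. The table `riv`, its rows, and the helper `has_type()` are all hypothetical stand-ins; this is not the code inside add_strategic_rivalries(), and the real td_rivalries data may be laid out differently.

```r
# Toy rivalry-type table; the rows are invented for illustration.
riv <- data.frame(
  rivalryno = 1:3,
  type1 = c("spatial", "positional", "ideological"),
  type2 = c(NA, "spatial", "positional"),
  type3 = c(NA, NA, "spatial"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: does `kind` appear anywhere in the first
# `across_types` type columns of each row?
has_type <- function(data, kind, across_types = 1) {
  stopifnot(across_types %in% 1:3)
  cols <- paste0("type", seq_len(across_types))
  hits <- data[, cols, drop = FALSE] == kind
  as.integer(rowSums(hits, na.rm = TRUE) > 0)
}

spatial_1 <- has_type(riv, "spatial", across_types = 1)
spatial_2 <- has_type(riv, "spatial", across_types = 2)
spatial_3 <- has_type(riv, "spatial", across_types = 3)
```

In this toy table the second rivalry only counts as spatial once the secondary dimension is considered, and the third only once all three columns are scanned.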
add_strategic_rivalries()
takes a dyad-year data frame and adds information about ongoing strategic rivalries. It
will also include a simple dummy variable for whether there was an ongoing rivalry in the year or not.
For state-year data, it returns the count of ongoing strategic rivalries for the state in the year that meet a given criterion (i.e.
whether the state has an interventionary, ideological, positional, or spatial rivalry in an ongoing year, and how many).
Steven V. Miller
Miller, Steven V. 2019. "Create and Extend Strategic (International) Rivalry Data in R". URL: http://svmiller.com/blog/2019/10/create-extend-strategic-rivalry-data-r/
Thompson, William R. and David Dreyer. 2012. Handbook of International Rivalries. CQ Press.
# just call `library(tidyverse)` at the top of your script
library(magrittr)

cow_ddy %>% add_strategic_rivalries()

# across_types defaults to 1
create_stateyears() %>% add_strategic_rivalries()
add_ucdp_acd()
allows you to add UCDP Armed Conflict data to a state-year data frame.
add_ucdp_acd(data, type, issue, only_wars = FALSE)
data |
state-year data frame |
type |
the types of armed conflicts the user wants to consider, specified as a character vector.
Options include "extrasystemic", "interstate", "intrastate", and "II". "II" is convenience shorthand for
"internationalized intrastate". If you want just one (say: "intrastate"), then the type you want in quotes
is sufficient. If you want multiple, wrap it in a vector with |
issue |
do you want to subset the data to just different armed conflicts over different types of issues?
If so, specify those here as you would with the |
only_wars |
subsets the conflict data to just those with intensity levels of "war" (i.e. >1,000 deaths). Defaults to FALSE. |
Right now, only state-year data are supported. It's worth saying
that "both" in the issue
argument should not be understood as
equivalent to c("territory","government")
. The former is a kind of
"AND" (in boolean speak) and is an explicit category in the data. The latter
is an "OR" (in boolean speak) and is in all likelihood what you want if you
are tempted to specify "both" in the issue
argument.
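A small base R sketch of that distinction, assuming a toy table with a hypothetical `issue` column. The column name and rows are invented for illustration and are not the package's internal data.

```r
# Toy conflict table; "both" is its own explicit category.
toy_acd <- data.frame(
  conflict_id = 1:5,
  issue = c("territory", "government", "both", "government", "territory"),
  stringsAsFactors = FALSE
)

# "both": only conflicts coded as being over territory AND government.
only_both <- toy_acd[toy_acd$issue == "both", ]

# c("territory", "government"): conflicts over territory OR government.
either <- toy_acd[toy_acd$issue %in% c("territory", "government"), ]

nrow(only_both)
nrow(either)
```

The two subsets do not overlap in this toy table, which is why specifying "both" when "either" is meant would return far fewer cases than expected.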
add_ucdp_acd()
takes a state-year data frame and returns state-year information from the
UCDP Armed Conflict data set (v. 20.1). The variables returned are whether there is an ongoing armed
conflict in that year, whether there was an armed conflict episode onset that year, what was the maximum
intensity observed that year (if an armed conflict was observed), and a character vector of the associated
conflict IDs that year.
Steven V. Miller
Gleditsch, Nils Petter; Peter Wallensteen, Mikael Eriksson, Margareta Sollenberg & Havard Strand (2002) Armed Conflict 1946–2001: A New Dataset. Journal of Peace Research 39(5): 615–637.
Pettersson, Therese; Stina Hogbladh & Magnus Oberg (2019). Organized violence, 1989-2018 and peace agreements. Journal of Peace Research 56(4): 589-603.
# just call `library(tidyverse)` at the top of your script
library(magrittr)
library(dplyr)

create_stateyears(system = "gw") %>%
  filter(between(year, 1946, 2019)) %>%
  add_ucdp_acd()

create_stateyears(system = "gw") %>%
  filter(between(year, 1946, 2019)) %>%
  add_ucdp_acd(type = "intrastate", issue = "government")
add_ucdp_onsets()
allows you to add information about conflict episode onsets from the UCDP
data program to state-year data.
add_ucdp_onsets(data)
data |
a state-year data frame |
The function leans on attributes of the data that are provided by the create_dyadyears() or create_stateyears() function. Make sure that function (or data created by that function) appears at the top of the proverbial pipe. The underlying data are version 19.1. Importantly, the UCDP yearly onset data are nominally state-year, but technically state-dyad-episode-year for cases of onsets. For example, there are four France-1946 observations because of four new conflict episodes with Cambodia, Laos, Thailand, and Vietnam. There are two Panama-1989 episodes, one for the invasion by the United States and another for a failed coup attempt. That means there are duplicates in the original data that I process into summaries. The user will probably want to consider some kind of recoding here.
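A minimal sketch of collapsing duplicated state-year rows into one summary row, using the France-1946 and Panama-1989 cases from the text. The data frame `toy_onsets` and the column names `newconf`, `sum_new_conflicts`, and `any_onset` are hypothetical; the raw UCDP file and the package's processing script may use different names and steps.

```r
# Toy state-dyad-episode-year rows: four for France (220) in 1946 and
# two for Panama (95) in 1989. Each row marks one new conflict episode.
toy_onsets <- data.frame(
  gwcode = c(220, 220, 220, 220, 95, 95),
  year = c(1946, 1946, 1946, 1946, 1989, 1989),
  newconf = 1
)

# Collapse to one row per state-year by summing the episode markers.
sy <- aggregate(newconf ~ gwcode + year, data = toy_onsets, FUN = sum)
names(sy)[names(sy) == "newconf"] <- "sum_new_conflicts"

# One possible recoding: a simple onset dummy from the summary count.
sy$any_onset <- as.integer(sy$sum_new_conflicts > 0)
sy
```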
add_ucdp_onsets()
takes a state-year data frame and adds a few summary
variables based off armed conflict onsets data provided by UCDP. The variables returned are
the sum of new conflict dyads (should they exist) in a given state-year, and the sum of new onset episodes (or new conflicts) that are
separated by one, two, three, five, or 10 years since the last conflict episode.
Steven V. Miller
Gleditsch, Nils Petter; Peter Wallensteen, Mikael Eriksson, Margareta Sollenberg & Havard Strand (2002) Armed Conflict 1946–2001: A New Dataset. Journal of Peace Research 39(5): 615–637.
Pettersson, Therese; Stina Hogbladh & Magnus Oberg (2019). Organized violence, 1989-2018 and peace agreements. Journal of Peace Research 56(4): 589-603.
# just call `library(tidyverse)` at the top of your script
library(magrittr)
library(dplyr)

create_stateyears(system="gw") %>% add_ucdp_onsets()

create_stateyears() %>% add_gwcode_to_cow() %>% add_ucdp_onsets()

# Recall, these are summaries. You'll need to post-process to what you want.
create_stateyears(system="gw") %>%
  add_ucdp_onsets() %>%
  mutate(onset = ifelse(sumonset1 > 0, 1, 0))
These are leader-level data drawn from the Archigos data. Space considerations mean I offer here just a few columns based on these data. Data are version 4.1.
archigos
A data frame with 3409 observations on the following 11 variables.
gwcode
a numeric vector for the Gleditsch-Ward state code
obsid
a character vector for observation ID
leadid
the unique leader identifier
leader
the leader name
yrborn
the year the leader was born
gender
a categorical variable for leader gender ("M" for men, "W" for women)
startdate
a date for the leader start date
enddate
a date for the leader end date
entry
a character vector for the leader's entry type
exit
a character vector for the leader's exit type
exitcode
a character vector for more information about the leader's exit type
Space considerations mean I can only offer a few columns from the overall data. Archigos data are rich with information. Consult the raw data available on Hein Goemans' website for more.
To best conform with data requirements on CRAN, a few leader names were renamed if they included irregular characters (e.g. umlauts or accents). These leaders, in these particular applications, have been renamed to "(Juan Orlando) Hernandez" (HON-2014), "(Antonio) Saca Gonzalez" (SAL-2004), "Julian Trujillo Largacha" (COL-1878), "Cesar Gaviria Trujillo" (COL-1990), "Gabriel Garcia Moreno" (ECU-1869), "Marcos A. Morinigo" (PAR-1894-1), "Higinio Morinigo" (PAR-1940), "Sebastian Pinera" (CHL-2010), "Sauli Niinisto" (FIN-2012), "Louis Gerhard De Geer" (SWD-1876), "Stefan Lofven" (SWD-2014), "Lars Lokke Rasmussen" (DEN-2009, DEN-2015), and "Fernando de Araujo" (ETM-2008-1). None of these names contain these special characters in the data here.
For clarity's sake, I renamed the ccode column in the raw data to gwcode. The original name may mislead a user peeking into the data into thinking these are Correlates of War state codes. They are Gleditsch-Ward state codes.
Goemans, Henk E., Kristian Skrede Gleditsch, and Giacomo Chiozza. 2009. "Introducing Archigos: A Dataset of Political Leaders" Journal of Peace Research 46(2): 269–83.
These are directed dyad-year-level data for alliance obligations and provisions from the ATOP project
atop_alliance
A data frame with 272,046 observations on the following eight variables.
ccode1
a numeric vector for the Correlates of War state code for the first state
ccode2
a numeric vector for the Correlates of War state code for the second state
year
a numeric vector for the year
atop_defense
a numeric vector that equals 1 if there was an alliance observed with a defense pledge
atop_offense
a numeric vector that equals 1 if there was an alliance observed with an offense pledge
atop_neutral
a numeric vector that equals 1 if there was an alliance observed with a neutrality pledge
atop_nonagg
a numeric vector that equals 1 if there was an alliance observed with a non-aggression pledge
atop_consul
a numeric vector that equals 1 if there was an alliance observed with a consultation pledge
The data-raw
directory on the project's Github shows how the data were processed.
Leeds, Brett Ashley, Jeffrey M. Ritter, Sara McLaughlin Mitchell, and Andrew G. Long. 2002. Alliance Treaty Obligations and Provisions, 1815-1944. International Interactions 28: 237-60.
These are democracy data for all Correlates of War state system members.
ccode_democracy
A data frame with 16536 observations on the following 5 variables.
ccode
the Correlates of War system code
year
a numeric vector for the year
v2x_polyarchy
the Varieties of Democracy "polyarchy" estimate
polity2
the polity2 score from the Polity project
xm_qudsest
an extension of the Unified Democracy Scores (UDS) estimates, made possible by the QuickUDS package from Xavier Marquez.
Missing data connote data that are unavailable for various reasons. Either there is no democracy data to code or, in the case of the Polity project, the state system member is outright not evaluated for the variable.
The Polity data are from 2017. The Varieties of Democracy data are version 10. Xavier Marquez' QuickUDS
estimates (i.e. extensions of Pemstein et al. (2010)) come from a package Marquez makes available on his Github (https://github.com/xmarquez/QuickUDS).
Coppedge, Michael, John Gerring, Carl Henrik Knutsen, Staffan I. Lindberg, Jan Teorell, David Altman, Michael Bernhard, M. Steven Fish, Adam Glynn, Allen Hicken, Anna Luhrmann, Kyle L. Marquardt, Kelly McMann, Pamela Paxton, Daniel Pemstein, Brigitte Seim, Rachel Sigman, Svend-Erik Skaaning, Jeffrey Staton, Agnes Cornell, Lisa Gastaldi, Haakon Gjerlow, Valeriya Mechkova, Johannes von Romer, Aksel Sundtrom, Eitan Tzelgov, Luca Uberti, Yi-ting Wang, Tore Wig, and Daniel Ziblatt. 2020. "V-Dem Codebook v10" Varieties of Democracy (V-Dem) Project.
Marshall, Monty G., Ted Robert Gurr, and Keith Jaggers. 2017. "Polity IV Project: Political Regime Characteristics and Transitions, 1800-2017." Center for Systemic Peace.
Marquez, Xavier, "A Quick Method for Extending the Unified Democracy Scores" (March 23, 2016). doi:10.2139/ssrn.2753830
Pemstein, Daniel, Stephen Meserve, and James Melton. 2010. "Democratic Compromise: A Latent Variable Analysis of Ten Measures of Regime Type." Political Analysis 18(4): 426-449.
This is a complete list of capitals and capital transitions for Correlates of War state system members. I use it internally for calculating capital-to-capital distances in the add_capital_distance() function.
cow_capitals
A data frame with 252 observations on the following 7 variables.
ccode
a numeric vector for the Correlates of War state code
statenme
a character vector for the state
capital
a character vector for the name of the capital
styear
a character vector for the start year. See details section for more information.
endyear
a character vector for the end year. See details section for more information.
lat
a numeric vector of the latitude coordinates for the capital
lng
a numeric vector of the longitude coordinates for the capital
For convenience, the start year for most states is 1816. Samoa, for example, was not a state
in 1816. However, the functions that use the cow_capitals
data will not create observations for states
that did not exist at a given point in time.
The data should be current as of the end of 2020.
Cases where a start year is not 1816 indicate a capital transition. For example, Brazil's capital moved from Rio de Janeiro to Brasilia (a planned capital) in 1960. Only 25 states in the data experienced a capital transition. The most recent was Burundi in 2018. Indonesia, as of writing, is planning on a capital transition, but this has not been completed yet.
Kazakhstan renamed its capital for the state leader in 2019. These data retain the name of Astana. This will be changed in the event the software I use records this change.
The capitals data are not without some peculiarities. Prominently, Portugal transferred the Portuguese court from Lisbon to Rio de Janeiro from 1808 to 1821. This is recorded in the data. Anyone familiar with the inter-state conflict data will note there was no war or dispute between, say, Portugal and Spain (or Portugal and any other country) at any point during this time, but the transfer does create some weirdness: it suggests a massive distance between two countries, like Portugal and Spain, that are otherwise land-contiguous.
On Spain: the republican government moved the capital at the start of the civil war (in 1936) to Valencia. However, it abandoned this capital by 1937. I elect to not record this capital transition.
The data also do some (I think) reasonable back-dating of capitals to coincide with states in transition without necessarily formal capitals by the first appearance in the state system membership data. These concern Lithuania, Kazakhstan, and the Philippines. Kaunas is the initial post-independence capital of Lithuania. Almaty is the initial post-independence capital of Kazakhstan. Quezon City is the initial post-independence capital of the Philippines. This concerns, at the most, one or two years for each of these three countries.
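To illustrate why the Lisbon-to-Rio transfer matters for capital-to-capital distances, here is a great-circle (haversine) sketch in base R. The helper `haversine_km()` is hypothetical, the coordinates are approximate values typed by hand rather than pulled from cow_capitals, and the package itself may compute distances with a different method.

```r
# Hypothetical helper: great-circle distance in kilometers on a sphere
# with the given radius (default is a mean Earth radius of 6371 km).
haversine_km <- function(lat1, lng1, lat2, lng2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Approximate coordinates, for illustration only.
caps <- data.frame(
  capital = c("Lisbon", "Madrid", "Rio de Janeiro"),
  lat = c(38.72, 40.42, -22.91),
  lng = c(-9.14, -3.70, -43.17)
)

lisbon_madrid <- haversine_km(caps$lat[1], caps$lng[1],
                              caps$lat[2], caps$lng[2])
rio_madrid <- haversine_km(caps$lat[3], caps$lng[3],
                           caps$lat[2], caps$lng[2])
```

With Rio de Janeiro recorded as the capital, the Portugal-Spain distance jumps from a few hundred kilometers to several thousand, even though the two states share a land border.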
These contain an abbreviated version of the "master records" for the Correlates of War direct contiguity data. Data contain a few cosmetic changes to assist with some functions downstream from it.
cow_contdir
A data frame with 2025840 observations on the following 5 variables.
ccode1
a numeric vector for the Correlates of War state code for the first state
ccode2
a numeric vector for the Correlates of War state code for the second state
conttype
a numeric vector for the contiguity relationship
begin
the year-month when this contiguity relationship begins (YYYYMM)
end
the year-month when this contiguity relationship ends (YYYYMM)
The "master record" provided by the Correlates of War is "non-directed." I make these data "directed" for convenience.
For clarity, the contiguity codes range from 1 to 5. 1 = direct land contiguity. 2 = separated by 12 miles of water or fewer (a la Stannis Baratheon). 3 = separated by 24 miles of water or fewer (but more than 12 miles). 4 = separated by 150 miles of water or fewer (but more than 24 miles). 5 = separated by 400 miles of water or fewer (but more than 150 miles). Cases of separation by more than 400 miles of water are not included in the master record (but are easily discerned based on complete dyad-year data).
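The coding scheme above can be attached to the data as readable labels. This is a sketch on a toy table; the rows in `toy_cont` are invented for illustration, and the label wording and the `landcontig` column name are my own assumptions.

```r
# Toy directed contiguity records; rows are invented for illustration.
toy_cont <- data.frame(
  ccode1 = c(2, 2, 200),
  ccode2 = c(20, 40, 220),
  conttype = c(1, 4, 3)
)

# Labels follow the five categories listed in the text.
cont_labels <- c(
  "direct land contiguity",
  "water: 12 miles or fewer",
  "water: more than 12, up to 24 miles",
  "water: more than 24, up to 150 miles",
  "water: more than 150, up to 400 miles"
)

toy_cont$contlabel <- factor(toy_cont$conttype, levels = 1:5,
                             labels = cont_labels)

# A common simplification: a dummy for direct land contiguity only.
toy_cont$landcontig <- as.integer(toy_cont$conttype == 1)
toy_cont
```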
Stinnett, Douglas M., Jaroslav Tir, Philip Schafer, Paul F. Diehl, and Charles Gochman (2002). "The Correlates of War Project Direct Contiguity Data, Version 3." Conflict Management and Peace Science 19 (2):58-66.
This is a complete directed dyad-year data frame of Correlates of War
state system members. I offer it here as a shortcut for various other functions when
I am working on new additions and don't want to invest time in waiting for
create_dyadyears()
to run. As a general rule, this data frame is
updated after every calendar year to include the most recently concluded
calendar year.
cow_ddy
A data frame with the following 3 variables.
ccode1
a numeric vector for the Correlates of War state code for the first state
ccode2
a numeric vector for the Correlates of War state code for the second state
year
a numeric vector for the year
Data are a quick generation from the create_dyadyears()
function in this package.
This is a complete (I believe) data set on Correlates of War states and Gleditsch-Ward states, a byproduct
of a full_join()
between gw_states
and cow_states
that leans largely on the
state abbreviation variable.
cow_gw_years
A data frame with 16936 observations on the following 6 variables.
gwcode
a Gleditsch-Ward state code
stateabb
the state abbreviation, which was the greatest source of agreement between both data sets
gw_statename
the state name as it appears in the Gleditsch-Ward data
ccode
a Correlates of War state code
cow_statename
the state name as it appears in the Correlates of War data
year
a numeric vector for the year
The data-raw
directory on the project's Github contains more information about how these data were
created. I'm going to use it for internal stuff. The workflow is going to treat the Correlates of War state system
membership codes as more of the "master" codes, for which the user can add Gleditsch-Ward identifiers as they see
fit. Data are extended to 2020, assuming no changes to state system membership for either data set.
This is a non-directed dyad-year version of the Correlates of War IGOs data. I use it internally for merging IGOs data into dyad-year data.
cow_igo_ndy
A data frame with 917695 observations on the following 4 variables.
ccode1
the Correlates of War state system code for the first state
ccode2
the Correlates of War state system code for the second state
year
the year
dyadigos
the sum of mutual IGOs for which each state appears as a full member in a given year
The data-raw
directory on the project's Github contains additional information about how these data were generated from the otherwise
enormous dyad-year IGOs data provided by the Correlates of War project. Given the size of that data, and the size limitations of R packages for CRAN,
the data I provide here can only be simpler summaries. If you want specifics, you'll need to consult the underlying raw data provided on the Correlates
of War project.
Pevehouse, Jon C.W., Timothy Nordstrom, Roseanne W McManus, Anne Spencer Jamison, 2020. “Tracking Organizations in the World: The Correlates of War IGO Version 3.0 datasets”, Journal of Peace Research 57(3): 492-503.
Wallace, Michael, and J. David Singer. 1970. "International Governmental Organization in the Global System, 1815-1964." International Organization 24: 239-87.
This is a state-year version of the Correlates of War IGOs data. I use it internally for merging IGOs data into state-year data.
cow_igo_sy
A data frame with 1557 observations on the following 6 variables.
ccode
the Correlates of War state system code for the state
year
the year
sum_igo_full
the sum of IGOs for which the state is a full member in a given year
sum_igo_associate
the sum of IGOs for which the state is just an associate member in a given year
sum_igo_observer
the sum of IGOs for which the state is just an observer in a given year
sum_igo_anytype
the sum of IGOs for which the state is a member of any kind in a given year.
The data-raw
directory on the project's Github contains additional information about how these data were generated from the otherwise
enormous dyad-year IGOs data provided by the Correlates of War project. Given the size of that data, and the size limitations of R packages for CRAN,
the data I provide here can only be simpler summaries. If you want specifics, you'll need to consult the underlying raw data provided on the Correlates
of War project.
Pevehouse, Jon C.W., Timothy Nordstrom, Roseanne W McManus, Anne Spencer Jamison. 2020. “Tracking Organizations in the World: The Correlates of War IGO Version 3.0 datasets”, Journal of Peace Research 57(3): 492-503.
Wallace, Michael, and J. David Singer. 1970. "International Governmental Organization in the Global System, 1815-1964." International Organization 24: 239-87.
These are the Correlates of War major powers data.
cow_majors
A data frame with 14 observations on the following 8 variables.
ccode
a numeric vector for the Correlates of War country code
styear
the start year as a major power
stmonth
the start month as a major power
stday
the start day as a major power
endyear
the end year as a major power
endmonth
the end month as a major power
endday
the end day as a major power
version
a version identifier
Data are provided "as-is" with no additional re-cleaning before inclusion into this data set (beyond eliminating the state abbreviation).
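Start and end years like these are typically turned into a state-year indicator. The sketch below shows one way to do that in base R. The rows in `toy_majors`, the helper `is_major()`, and the column name `cowmaj` are assumptions for illustration, and the sketch ignores the month and day columns entirely.

```r
# Toy major-power spells for one state; values are for illustration only.
toy_majors <- data.frame(
  ccode = c(255, 255),
  styear = c(1816, 1925),
  endyear = c(1918, 1945)
)

# Toy state-year rows to be flagged.
sy <- data.frame(ccode = 255, year = c(1900, 1920, 1930, 1950))

# Hypothetical helper: is the state inside any of its major-power spells
# in the given year? (Year-level only; month and day are not used.)
is_major <- function(ccode, year, majors) {
  any(majors$ccode == ccode &
        majors$styear <= year &
        majors$endyear >= year)
}

sy$cowmaj <- as.integer(
  mapply(is_major, sy$ccode, sy$year,
         MoreArgs = list(majors = toy_majors))
)
sy
```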
Correlates of War Project. 2017. "State System Membership List, v2016." Online, https://correlatesofwar.org/data-sets/state-system-membership/
These are directed dyadic dispute year data derived from the Correlates of War (CoW) Militarized Interstate Dispute (MID) project. Data are from version 5.0. These were whittled so that there are no duplicate dyad-years. Their primary aim here is merging into a dyad-year data frame.
cow_mid_ddydisps
A data frame with 10234 observations on the following 25 variables.
dispnum
a numeric vector for the CoW-MID dispute number
ccode1
a numeric vector for the focal state in the dyad
ccode2
a numeric vector for the target state in the dyad
year
a numeric vector for the dispute-year
cowmidongoing
a numeric vector for whether there was a dispute ongoing in that year
cowmidonset
a numeric vector for whether it was the onset of a new dispute (or new participant-entry into a recurring dispute)
sidea1
is ccode1
on side A of the dispute?
sidea2
is ccode2
on side A of the dispute?
fatality1
a numeric vector for the overall fatality level of ccode1
in the dispute
fatality2
a numeric vector for the overall fatality level of ccode2
in the dispute
fatalpre1
a numeric vector for the known fatalities (with precision) for ccode1
in the dispute
fatalpre2
a numeric vector for the known fatalities (with precision) for ccode2
in the dispute
hiact1
a numeric vector for the highest action of ccode1
in the dispute
hiact2
a numeric vector for the highest action of ccode2
in the dispute
hostlev1
a numeric vector for the hostility level of ccode1
in the dispute
hostlev2
a numeric vector for the hostility level of ccode2
in the dispute
orig1
is ccode1
an originator of the dispute?
orig2
is ccode2
an originator of the dispute?
fatality
a numeric vector for the fatality level of the dispute
hostlev
a numeric vector for the hostility level of the MID
mindur
a numeric vector for the minimum duration of the MID
maxdur
a numeric vector for the maximum duration of the MID
recip
a numeric vector for whether a MID was reciprocated
stmon
a numeric vector for the start month of the MID
The process of creating these is described at one of the references below. Importantly, these data are somewhat
"naive." That is: they won't tell you, for example, that Brazil and Japan never directly fought each other during World War II.
Instead, it will tell you that there were two years of overlap for the two on different sides of the conflict and that the highest
action for both was a war. The data are thus similar to what the EUGene
program would create for users back in the day. Use these
data with that limitation in mind.
Miller, Steven V. 2021. "How to (Meticulously) Convert Participant-Level Dispute Data to Dyadic Dispute-Year Data in R." URL: http://svmiller.com/blog/2021/05/convert-cow-mid-data-to-dispute-year/
Palmer, Glenn, and Roseanne W. McManus and Vito D'Orazio and Michael R. Kenwick and Mikaela Karstens and Chase Bloch and Nick Dietrich and Kayla Kahn and Kellan Ritter and Michael J. Soules. 2021. "The MID5 Dataset, 2011–2014: Procedures, coding rules, and description" Conflict Management and Peace Science.
These are directed dyadic dispute year data derived from the Correlates of War (CoW) Militarized Interstate Dispute (MID) project. Data are from version 5.0.
cow_mid_dirdisps
A data frame with 11390 observations on the following 18 variables.
dispnum
a numeric vector for the CoW-MID dispute number
ccode1
a numeric vector for the focal state in the dyad
ccode2
a numeric vector for the target state in the dyad
year
a numeric vector for the dispute-year
dispongoing
a numeric vector for whether there was a dispute ongoing in that year
disponset
a numeric vector for whether it was the onset of a new dispute (or new participant-entry into a recurring dispute)
sidea1
is ccode1
on side A of the dispute?
sidea2
is ccode2
on side A of the dispute?
fatality1
a numeric vector for the overall fatality level of ccode1
in the dispute
fatality2
a numeric vector for the overall fatality level of ccode2
in the dispute
fatalpre1
a numeric vector for the known fatalities (with precision) for ccode1
in the dispute
fatalpre2
a numeric vector for the known fatalities (with precision) for ccode2
in the dispute
hiact1
a numeric vector for the highest action of ccode1
in the dispute
hiact2
a numeric vector for the highest action of ccode2
in the dispute
hostlev1
a numeric vector for the hostility level of ccode1
in the dispute
hostlev2
a numeric vector for the hostility level of ccode2
in the dispute
orig1
is ccode1
an originator of the dispute?
orig2
is ccode2
an originator of the dispute?
The process of creating these is described at one of the references below. Importantly, these data are somewhat
"naive." That is: they won't tell you, for example, that Brazil and Japan never directly fought each other during World War II.
Instead, it will tell you that there were two years of overlap for the two on different sides of the conflict and that the highest
action for both was a war. The data are thus similar to what the EUGene
program would create for users back in the day. Use these
data with that limitation in mind.
Miller, Steven V. 2021. "How to (Meticulously) Convert Participant-Level Dispute Data to Dyadic Dispute-Year Data in R." URL: http://svmiller.com/blog/2021/05/convert-cow-mid-data-to-dispute-year/
Palmer, Glenn, and Roseanne W. McManus and Vito D'Orazio and Michael R. Kenwick and Mikaela Karstens and Chase Bloch and Nick Dietrich and Kayla Kahn and Kellan Ritter and Michael J. Soules. 2021. "The MID5 Dataset, 2011–2014: Procedures, coding rules, and description" Conflict Management and Peace Science.
This is an abbreviated version of the dispute-level CoW-MID data.
cow_mid_disps
A data frame with 2436 observations on the following 11 variables.
dispnum
a numeric vector for the CoW-MID dispute number
outcome
a numeric vector for the outcome of the MID
styear
a numeric vector for the start year of the MID
stmon
a numeric vector for the start month of the MID
settle
a numeric vector for how the dispute was settled
fatality
a numeric vector for the fatality level of the dispute
mindur
a numeric vector for the minimum duration of the MID
maxdur
a numeric vector for the maximum duration of the MID
hiact
a numeric vector for the highest action of the MID
hostlev
a numeric vector for the hostility level of the MID
recip
a numeric vector for whether a MID was reciprocated
These data are purposely light on information; they're not intended to be used for dispute-level analyses, per se. They're intended to augment the directed dyadic dispute-year data by adding in variables that serve as exclusion rules to whittle the data from dyadic dispute-year to just dyad-year data.
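The whittling step described above can be sketched as a merge followed by a tie-breaking rule. Everything here is a toy: the two data frames are invented, and keeping the dispute with the highest hostility level is only one possible exclusion rule that I assume for illustration. It is not necessarily the sequence of rules the package applies.

```r
# Toy directed dyadic dispute-year rows: two disputes share a dyad-year.
dirdisps <- data.frame(
  dispnum = c(1, 2, 3),
  ccode1 = c(2, 2, 2),
  ccode2 = c(20, 20, 200),
  year = c(1900, 1900, 1900)
)

# Toy dispute-level information used to decide which duplicate to keep.
disps <- data.frame(
  dispnum = c(1, 2, 3),
  hostlev = c(2, 4, 3)
)

# Augment the dyadic dispute-years with dispute-level variables.
merged <- merge(dirdisps, disps, by = "dispnum")

# Assumed rule: within a dyad-year, keep the highest hostility level.
merged <- merged[order(merged$ccode1, merged$ccode2, merged$year,
                       -merged$hostlev), ]
dyadyear <- merged[!duplicated(merged[, c("ccode1", "ccode2", "year")]), ]
dyadyear
```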
Palmer, Glenn, Roseanne W. McManus, Vito D'Orazio, Michael R. Kenwick, Mikaela Karstens, Chase Bloch, Nick Dietrich, Kayla Kahn, Kellan Ritter, and Michael J. Soules. 2021. "The MID5 Dataset, 2011–2014: Procedures, Coding Rules, and Description." Conflict Management and Peace Science.
These are non-directed dyad-year data for the minimum distance between states in the Correlates of War state system from 1886 to 2019. The data are generated from the cshapes package.
cow_mindist
A data frame with 817053 observations on the following 4 variables.
ccode1
the Correlates of War state system code for the first state
ccode2
the Correlates of War state system code for the second state
year
the year
mindist
the minimum distance between states on Jan. 1 of the year, in kilometers
The data are generated from the cshapes package. Data are automatically generated (by default) as directed dyad-years. I elect to make them non-directed for space considerations. Making non-directed dyad-year data into directed dyad-year data isn't too difficult in R. It just looks weird to see the code that does it.
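The conversion from non-directed to directed dyad-years mentioned above can be sketched in base R. This is a minimal illustration on a toy data frame with invented distances, not the package's own code: flip the two state codes, stack the flipped copy under the original, and sort.

```r
# Toy non-directed dyad-years (hypothetical values, not taken from cow_mindist)
ndd <- data.frame(
  ccode1  = c(2, 2, 20),
  ccode2  = c(20, 70, 70),
  year    = c(2000, 2000, 2000),
  mindist = c(0, 0, 2500)
)

# Flip the state codes to get the other direction of each dyad.
flipped <- data.frame(
  ccode1  = ndd$ccode2,
  ccode2  = ndd$ccode1,
  year    = ndd$year,
  mindist = ndd$mindist
)

# Stack the two and sort; the result has twice as many rows.
dd <- rbind(ndd, flipped)
dd <- dd[order(dd$ccode1, dd$ccode2, dd$year), ]
rownames(dd) <- NULL
dd
```

Because minimum distance is symmetric, the flipped rows carry the same `mindist` value as the rows they came from.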
Previous versions of these data were for the minimum distance as of Dec. 31 of the referent year. These are now Jan. 1.
Most of the data I provide elsewhere in this package are to be understood as the data as they were at the start of
the year. This is how I process, for example, the capitals data as they get merged in the add_capital_distance() function.
Accordingly, the script that generates these data is set to Jan. 1 of the year and not Dec. 31. Right now, the cshapes package
does not appear to work on my system and I do not know why. Fortunately, the package authors made these data available.
Schvitz, Guy, Luc Girardin, Seraina Ruegger, Nils B. Weidmann, Lars-Erik Cederman, and Kristian Skrede Gleditsch. 2022. "Mapping The International System, 1886-2017: The CShapes 2.0 Dataset." Journal of Conflict Resolution. 66(1): 144-161.
Weidmann, Nils B. and Kristian Skrede Gleditsch. 2010. "Mapping and Measuring Country Shapes: The cshapes Package." The R Journal 2(1): 18-24
These are version 6.0 of the Correlates of War National Military Capabilities data. The data omit the state abbreviation and version identifier for space considerations.
cow_nmc
A data frame with 15171 observations on the following 9 variables.
ccode
a numeric vector for the Correlates of War country code
year
the year
milex
an estimate of military expenditures (in thousands). See details section for more.
milper
an estimate of the size of military personnel (in thousands) for the state
irst
an estimate of iron and steel production (in thousands of tons)
pec
an estimate of primary energy consumption (thousands of coal-ton equivalents)
tpop
an estimate of the total population size of the state (in thousands)
upop
an estimate of the urban population size of the state (in thousands). See details section for more.
cinc
The Composite Index of National Capability ("CINC") score. See details section for more.
The user will want to be a little careful with how some of these data are used, beyond the typical caveat about how difficult it is to pin-point how many thousands of coal-tons a state like Baden was producing in the 19th century.
First, military expenditures are denominated in British pounds sterling for observations between 1816 and 1913. The observations from 1914 and beyond are denominated in current United States dollars. This is according to the manual.
Second, urban population size is an estimate based on, well, an estimate of the size of the population living in an area with 100,000 or more people.
Third, the Composite Index of National Capability score is calculated as the average of each state's world share of the six component indicators also included in the data in a given year. It is theoretically bound between 0 and 1. A state with a score of 1 would 1) account for all of the military expenditures in the world, 2) be the only state with a military, 3) do all the iron and steel production, 4) account for all the world's primary energy consumption, and 5) be the only state in the world with a population and an urban population. Incidentally, the maximum scores observed in the data belong to the United States in 1945.
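The share-based logic behind the score can be sketched in base R. This is a minimal illustration on an invented three-state "world" in a single year; the column `cinc_sketch` and all values are hypothetical, and real data would need the shares computed within each year.

```r
# Hypothetical three-state "world" in one year; all values are made up.
toy <- data.frame(
  state  = c("A", "B", "C"),
  milex  = c(100, 50, 50),
  milper = c(10, 10, 20),
  irst   = c(30, 10, 0),
  pec    = c(60, 20, 20),
  tpop   = c(500, 300, 200),
  upop   = c(50, 25, 25)
)

components <- c("milex", "milper", "irst", "pec", "tpop", "upop")

# Each state's share of the world total, one column per component.
shares <- sapply(toy[components], function(x) x / sum(x))

# The composite score is the mean of the six shares.
toy$cinc_sketch <- rowMeans(shares)
toy[c("state", "cinc_sketch")]
```

Since every column of shares sums to 1, the scores across all states in a year also sum to 1, which is why a single state can only reach 1 by holding everything.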
Singer, J. David, Stuart Bremer, and John Stuckey. (1972). "Capability Distribution, Uncertainty, and Major Power War, 1820-1965." in Bruce Russett (ed) Peace, War, and Numbers, Beverly Hills: Sage, 19-48.
Singer, J. David. 1987. "Reconstructing the Correlates of War Dataset on Material Capabilities of States, 1816-1985" International Interactions, 14: 115-32.
These are state-year level data for surplus and gross domestic product for Correlates of War state system members. Data also include population estimates for per capita standardization.
cow_sdp_gdp
A data frame with 27753 observations on the following six variables.
ccode
a numeric vector for the Correlates of War state code
year
a numeric vector for the year
wbgdp2011est
a numeric vector for the estimated GDP in 2011 USD (log-transformed)
wbpopest
a numeric vector for the estimated population size (log-transformed)
sdpest
a numeric vector for the estimated surplus domestic product (log-transformed)
wbgdppc2011est
a numeric vector for the estimated GDP per capita (log-transformed)
These were extracted from the actual replication files from International Studies Quarterly. Because these data are ultimately being simulated, a user can expect some slight differences between the Correlates of War version of these data (which Anders et al. published) and the Gleditsch-Ward version of these data (which appears to be the version the authors will more vigorously support going forward).
Space considerations compel me to round these data to three decimal places. These "economic" data are routinely
the biggest in the package, and it's because of the decimal places. The justification for this is that these data are estimated/simulated
anyway and the information loss is beyond the third decimal place. This procedure cuts the size of the data to less than
25% of its original size. The original simulations are available for remote download if you'd like. Type ?download_extdata() for
more information.
Anders, Therese, Christopher J. Fariss, and Jonathan N. Markowitz. 2020. "Bread Before Guns or Butter: Introducing Surplus Domestic Product (SDP)" International Studies Quarterly 64(2): 392–405.
These are the Correlates of War state system membership data.
cow_states
A data frame with 243 observations on the following 10 variables.
stateabb
a character vector for the state abbreviation
ccode
a numeric vector for the Correlates of War country code
statenme
a character vector for the state name
styear
the start year in the system
stmonth
the start month in the system
stday
the start day in the system
endyear
the end year in the system
endmonth
the end month in the system
endday
the end day in the system
version
a version identifier
Data are provided "as-is" with no additional re-cleaning before inclusion into this data set.
Correlates of War Project. 2017. "State System Membership List, v2016." Online, https://correlatesofwar.org/data-sets/state-system-membership/
These are state-year-level data for national trade from the Correlates of War project.
cow_trade_sy
A data frame with 14410 observations on the following four variables.
ccode
the Correlates of War state system code
year
the year
imports
total imports of the state in current million USD
exports
total exports of the state in current million USD
The data-raw directory on the project's GitHub shows how the data were processed.
Barbieri, Katherine and Omar M.G. Keshk. 2016. Correlates of War Project Trade Data Set Codebook, Version 4.0. Online: https://correlatesofwar.org
Barbieri, Katherine, Omar M.G. Keshk, and Brian Pollins. 2009. "TRADING DATA: Evaluating Our Assumptions and Coding Rules." Conflict Management and Peace Science, 26(5): 471-491.
These are a modified version of the inter-state war data from the Correlates of War project. Data are version 4.0. The temporal domain is 1816-2007. Data are functionally directed dyadic war-year.
cow_war_inter
A data frame with 1932 observations on the following 15 variables.
warnum
the Correlates of War war number
ccode1
the Correlates of War state code for side1
ccode2
the Correlates of War state code for side2
year
a numeric vector for the year
cowinteronset
a dummy variable for whether this is an inter-state war onset (i.e. either the year in StartYear1 or StartYear2 in the raw data)
cowinterongoing
a numeric constant of 1
sidea1
a numeric vector for the side in the war for ccode1, either 1 or 2
sidea2
a numeric vector for the side in the war for ccode2, either 1 or 2
initiator1
a dummy variable that equals 1 if ccode1 initiated the war
initiator2
a dummy variable that equals 1 if ccode2 initiated the war
outcome1
the outcome for ccode1 as a numeric vector. Outcomes are 1 (winner), 2 (loser), 3 (compromise/tied), 4 (transformed into another type of war), 5 (ongoing at end of 2007, which is not observed in these data), 6 (stalemate), 7 (conflict continues below severity of war), and 8 (changed sides)
outcome2
the outcome for ccode2 as a numeric vector. Outcomes are 1 (winner), 2 (loser), 3 (compromise/tied), 4 (transformed into another type of war), 5 (ongoing at end of 2007, which is not observed in these data), 6 (stalemate), 7 (conflict continues below severity of war), and 8 (changed sides)
batdeath1
the estimated deaths for ccode1 (-9 = unknown)
batdeath2
the estimated deaths for ccode2 (-9 = unknown)
resume
a dummy variable that equals 1 if this is a conflict resumption episode
See the data-raw directory for how these data were generated. These data are here if you want them, but I caution against using them
as gospel. There are a few problems here. One: -9s proliferate in the data for battle deaths on either side, which is unhelpful. There are 10 cases where the sum
of battle deaths is exactly 1,000 or 1,001. This is suspicious. The "side" variables are not well-explained (in fact, they're not explained at all in the codebook)
and this can lead a user astray if they want to interpret them as analogous to the sidea variables in the Correlates of War Militarized Interstate Dispute
data. You probably want to use the initiator variables for this. Further, the war data routinely betray the MID data and the two do not speak well to each other. The language Sarkees and Wayman (2010) use in their book
talks about how MIDs "precede" a war or are "associated" with a war, which forgets the war data are supposed to be a subset of the MID data. In one case (Gulf War),
they get the associated dispute number wrong and, in one prominent case (War of Bosnian Independence), they argue no MID exists at all (it's actually MID#3557).
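Two of the caveats above (the -9 codes and the battle-death sums of exactly 1,000 or 1,001) are easy to check before any analysis. What follows is a minimal base R sketch on invented rows that only borrow the column names from the variable list; the `total_deaths` and `suspicious` columns are hypothetical helpers.

```r
# Hypothetical rows shaped like cow_war_inter; the values are invented.
wars <- data.frame(
  warnum    = c(1, 1, 2, 3),
  batdeath1 = c(500, -9, 1000, 12000),
  batdeath2 = c(500, 3000, 1, -9)
)

# Treat -9 as missing rather than as a number.
death_cols <- c("batdeath1", "batdeath2")
wars[death_cols] <- lapply(wars[death_cols], function(x) ifelse(x == -9, NA, x))

# Flag rows where the two sides sum to exactly 1,000 or 1,001.
wars$total_deaths <- wars$batdeath1 + wars$batdeath2
wars$suspicious   <- wars$total_deaths %in% c(1000, 1001)
wars
```

Recoding first matters: left as numbers, the -9 values would quietly subtract nine deaths from any sum or mean.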
Sarkees, Meredith Reid, and Frank Wheldon Wayman. 2010. Resort to War: A Data Guide to Inter-State, Extra-State, Intra-State, and Non-State Wars, 1816-2007. Washington DC: CQ Press.
These are a modified version of the intra-state war data from the Correlates of War project. Data are version 4.1. The temporal domain is 1816-2007.
cow_war_intra
A data frame with 1361 observations on the following 17 variables.
warnum
the Correlates of War war number
warname
the Correlates of War war name
wartype
a character vector for the type of war, either "local issues" or "central control"
year
a numeric vector for the year
cowintraonset
a dummy variable for whether this is a civil war onset (i.e. either the year in StartYear1 or StartYear2 in the raw data)
cowintraongoing
a numeric constant of 1
resume_combat
a dummy variable for whether this is a resumption of a conflict (i.e. StartYear2 is not -8)
primary_state
a dummy variable for whether the state is the primary state having the civil war
ccodea
the Correlates of War state code for the participant on Side A. -8 = not applicable (participant is not a state)
sidea
the name of the participant on Side A. -8 = not applicable (no additional party on this side)
ccodeb
the Correlates of War state code for the participant on Side B. -8 = not applicable (participant is not a state)
sideb
the name of the participant on Side B. -8 = not applicable (no additional party on this side)
intnl
a dummy variable for if this is an internationalized civil war
outcome
an unordered-categorical variable for the outcome of the civil war. Values include 1 (Side A wins), 2 (Side B wins), 3 (Compromise), 4 (war transformed into another type of war), 5 (war is ongoing at the end of 2007), 6 (stalemate), 7 (conflict continues below severity of war)
sideadeaths
the estimated deaths for the Side A participant (-9 = unknown, -8 = not applicable)
sidebdeaths
the estimated deaths for the Side B participant (-9 = unknown, -8 = not applicable)
ongo2007
a dummy variable for if this war is ongoing as of the end of 2007
See the data-raw directory for how these data were generated. In the Guinea-Bissau Civil War (1998, 1999),
the "Mane Junta" has the accented e scrubbed to coincide with CRAN's character requirements.
Dixon, Jeffrey, and Meredith Sarkees. 2016. A Guide to Intra-State Wars: An Examination of Civil Wars, 1816-2014. Thousand Oaks, CA: Sage.
Sarkees, Meredith Reid, and Frank Wheldon Wayman. 2010. Resort to War: A Data Guide to Inter-State, Extra-State, Intra-State, and Non-State Wars, 1816-2007. Washington DC: CQ Press.
create_dyadyears() allows you to create dyad-year data from
either the Correlates of War (CoW) state system membership data or the
Gleditsch-Ward (gw) system membership data. The function leans on internal
data provided in the package.
create_dyadyears(system = "cow", mry = TRUE, directed = TRUE, subset_years)
system |
a character specifying whether the user wants Correlates of War state-years ("cow") or Gleditsch-Ward ("gw") state-years. Correlates of War is the default. |
mry |
optional, defaults to TRUE. If TRUE, the function extends the script
beyond the most recent system membership updates to include observations through the
most recently concluded calendar year. For example, the Gleditsch-Ward data extend
to the end of 2017. When |
directed |
optional, defaults to TRUE. If TRUE, the function returns so-called "directed" dyad-year data. In directed dyad-year data, France-Germany (220-255) and Germany-France (255-220) are observationally different. If FALSE, the function returns non-directed data. In non-directed data, France-Germany and Germany-France in the same year are the same observation. The standard here is to drop cases where the country code for the second observation is less than the country code for the first observation. |
subset_years |
an optional character vector for subsetting the years
returned to just some temporal domain of interest to the user. For example,
|
create_dyadyears() takes state system membership data provided
by either Correlates of War or Gleditsch-Ward and returns a dyad-year data frame.
Steven V. Miller
Miller, Steven V. 2019. “Create Country-Year and (Non)-Directed Dyad-Year Data With Just a Few Lines in R” http://svmiller.com/blog/2019/01/create-country-year-dyad-year-from-country-data/
# CoW is default, will include years beyond 2016 (most recent CoW update)
create_dyadyears()
# Gleditsch-Ward, include most recent years
create_dyadyears(system="gw")
# Gleditsch-Ward, don't include most recent years
create_dyadyears(system="gw", mry=FALSE)
# Gleditsch-Ward, don't include most recent years, directed = FALSE
create_dyadyears(system="gw", mry=FALSE, directed = FALSE)
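For readers curious about what happens conceptually, here is a minimal base R sketch of going from membership spells to state-years to dyad-years. It is an illustration under simplifying assumptions (whole years only, invented spells in a hypothetical `states` data frame) and is not the function's actual implementation.

```r
# Hypothetical membership spells: one row per state (invented for the sketch).
states <- data.frame(
  ccode   = c(2, 200, 220),
  styear  = c(1816, 1816, 1816),
  endyear = c(1818, 1818, 1817)
)

# Expand each spell into state-years.
sy <- do.call(rbind, lapply(seq_len(nrow(states)), function(i) {
  data.frame(ccode = states$ccode[i],
             year  = states$styear[i]:states$endyear[i])
}))

# Pair every state with every other state in the same year (directed).
dd <- merge(sy, sy, by = "year", suffixes = c("1", "2"))
dd <- dd[dd$ccode1 != dd$ccode2, c("ccode1", "ccode2", "year")]

# Non-directed: keep only rows where ccode1 < ccode2.
ndd <- dd[dd$ccode1 < dd$ccode2, ]
```

The last step mirrors the rule described for `directed = FALSE`: cases where the second country code is less than the first are dropped, which halves the number of rows.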
create_leaderdays() allows you to generate leader-day data from
leader-level data provided in peacesciencer.
create_leaderdays(system = "archigos", standardize = "none")
system |
a leader system with which to create leader-days. Right now, only "archigos" is supported. |
standardize |
a character vector of length one: "cow", "gw", or "none". If "cow", the function standardizes the leader-days to
just those that overlap with state system membership in the Correlates of War state system (see: |
create_leaderdays(), as of writing, only supports the Archigos data set of leaders. I envision this function
being mostly for internal uses. Basically, create_leaderyears() effectively starts by first running a version of
create_leaderdays(). So, why not have this function too?
create_leaderdays() takes leader-level data available in peacesciencer
and returns a leader-day-level data frame.
Steven V. Miller
Goemans, Henk E., Kristian Skrede Gleditsch, and Giacomo Chiozza. 2009. "Introducing Archigos: A Dataset of Political Leaders" Journal of Peace Research 46(2): 269–83.
create_leaderdays()
create_leaderdays(standardize = "gw")
create_leaderdyadyears() allows you to create leader dyad-year data from the
Archigos data first introduced and described by Goemans et al. (2009).
create_leaderdyadyears(directed = TRUE, system = "gw")
directed |
optional, defaults to TRUE. If TRUE, the function returns so-called "directed" leader dyad-year data. If FALSE, the function returns non-directed data where the state codes for the second leader are all greater than the state codes for the first leader. |
system |
a character specifying whether the user wants Correlates of War state-years ("cow") or Gleditsch-Ward ("gw") state-years. Gleditsch-Ward is the default. |
This is a complete and universal leader dyad-year data frame for all possible dyadic
leader pairings from 1870 to 2015. This has several implications. First: these data are enormous. The
output is over 2 million rows long! Second: the time required to create these data from scratch would
take too long for a normal function call. This amounts to an unholy combination of data that are too large
for CRAN's disk space restrictions (5 MB) and too time-consuming to do from scratch every time. Thus, the
data are pre-generated and stored remotely. Check download_extdata() for more information.
create_leaderdyadyears() takes remote data available for separate download
and returns a complete leader dyad-year data frame for all leaders, and all possible dyads,
from 1870 to 2015.
Steven V. Miller
Goemans, Henk E., Kristian Skrede Gleditsch, and Giacomo Chiozza. 2009. "Introducing Archigos: A Dataset of Political Leaders" Journal of Peace Research 46(2): 269–83.
## Not run:
# download_extdata()
# ^ make sure you've run this first.
# default is directed
create_leaderdyadyears()
# non-directed
create_leaderdyadyears(directed = FALSE)
## End(Not run)
create_leaderyears() allows you to generate leader-year
data from leader-level data provided in peacesciencer.
create_leaderyears(system = "archigos", standardize = "none", subset_years)
system |
a leader system with which to create leader-years. Right now, only "archigos" is supported. |
standardize |
a character vector of length one: "cow", "gw", or "none".
If "cow", the function standardizes the leader-years to just those that
overlap with state system membership in the Correlates of War state
system (see: |
subset_years |
an optional character vector for subsetting the years
returned to just some temporal domain of interest to the user. For example,
|
create_leaderyears(), as of writing, only supports the
Archigos data set of leaders.
Many leader ages are known with precision. Many are not recorded in the Archigos data. Knowing well that years are aggregates of days, the leader age variable that gets returned in this output should be treated as an approximation of the leader's age.
Be mindful that leader tenure is calculated before any standardization argument. Archigos has some leader entries that precede the state system entry for the state, or otherwise do not coincide with state system dates. For example, Lynden Pindling was in his seventh year as leader of The Bahamas (in various titles) before independence in 1973 (in which he became prime minister). Leader tenure is not tethered to state system dates in situations like this (only the dates recorded in the Archigos data).
The leader tenure variable returned here does have the odd effect of
potentially misstating leader tenure, or at least making it seem unusual.
For example, Jimmy Carter (USA-1977) was president in 1977 (year 1),
1978 (year 2), 1979 (year 3), 1980 (year 4), and exited in January 1981
(year 5). Again: years are aggregates of days and it's not evident how else
this information should be perfectly communicated with that in mind. Users
with some R skills can extract the underlying information from the
archigos data and, perhaps, calculate something like the maximum
leader tenure (in days) on either Dec. 31 of the referent year, or leader
exit before Dec. 31 that year, or something to that effect. No matter, I
think this is at least a defensible variable to present to the user
with those limitations in mind. If the user is interested in leader tenure
in a leader-year analysis, this variable should be fine. If the user is
interested in something like the effect of a fifth year on some kind of
leader behavior, they will want to figure out something else.
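The Carter example can be reproduced in a few lines of base R, which also shows the day-level alternative hinted at above. This is a sketch, not the package's code: the column names `year_in_office` and `days_in_year` are hypothetical, and the dates are typed in by hand rather than pulled from the archigos data.

```r
# Start and end dates for one leader; these are Carter's dates in office.
start_date <- as.Date("1977-01-20")
end_date   <- as.Date("1981-01-20")

# Every calendar year touched by the tenure, however briefly.
years <- as.integer(format(start_date, "%Y")):as.integer(format(end_date, "%Y"))

leader_years <- data.frame(
  year           = years,
  year_in_office = seq_along(years)  # running count starting at 1
)

# Days actually served in each calendar year, for comparison.
year_start <- as.Date(paste0(years, "-01-01"))
year_end   <- as.Date(paste0(years, "-12-31"))
leader_years$days_in_year <- as.integer(
  pmin(year_end, end_date) - pmax(year_start, start_date)
) + 1L

leader_years
```

The running count treats 1981 as a full "year 5", while the day count makes clear that it covers only the first weeks of January. That gap is the limitation the paragraph above describes.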
create_leaderyears() takes leader-level data available in
peacesciencer and returns a leader-year-level data frame. This minimal
output contains the observation ID from Archigos, the year, the state code
for the leader (i.e. either Correlates of War or Gleditsch-Ward, depending
on the standardize argument), the leader's name in Archigos (if it
may help the reader to have that), an approximation of the leader's age,
and the year in office for the leader (as a running count, starting at 1).
Steven V. Miller
Goemans, Henk E., Kristian Skrede Gleditsch, and Giacomo Chiozza. 2009. "Introducing Archigos: A Dataset of Political Leaders" Journal of Peace Research 46(2): 269–83.
# standardize = 'none' is default
create_leaderyears()
create_leaderyears(standardize = 'gw')
create_statedays() allows you to create state-day data from
either the Correlates of War (CoW) state system membership data or the
Gleditsch-Ward (gw) system membership data. The function leans on internal
data provided in the package.
create_statedays(system = "cow", mry = TRUE)
system |
a character specifying whether the user wants Correlates of War state-years ("cow") or Gleditsch-Ward ("gw") state-years. Correlates of War is the default. |
mry |
optional, defaults to TRUE. If TRUE, the function extends the script
beyond the most recent system membership updates to include observations through the
most recently concluded calendar year. For example, the Gleditsch-Ward data extend
to the end of 2017. When |
create_statedays()
takes state system membership data provided
by either Correlates of War or Gleditsch-Ward and returns a simple state-day
data frame.
Steven V. Miller
Miller, Steven V. 2019. “Create Country-Year and (Non)-Directed Dyad-Year Data With Just a Few Lines in R” http://svmiller.com/blog/2019/01/create-country-year-dyad-year-from-country-data/
# CoW is default, will include years beyond 2016 (most recent CoW update)
create_statedays()
# Gleditsch-Ward, include most recent years
create_statedays(system="gw")
# Gleditsch-Ward, don't include most recent years
create_statedays(system="gw", mry=FALSE)
create_stateyears() allows you to generate state-year data from
either the Correlates of War (CoW) state system membership data or the
Gleditsch-Ward (gw) system membership data. The function leans on internal
data provided in the package.
create_stateyears(system = "cow", mry = TRUE, subset_years)
system |
a character specifying whether the user wants Correlates of War state-years ("cow") or Gleditsch-Ward ("gw") state-years. Correlates of War is the default. |
mry |
optional, defaults to TRUE. If TRUE, the function extends the script
beyond the most recent system membership updates to include observations through the
most recently concluded calendar year. For example, the Gleditsch-Ward data extend
to the end of 2017. When |
subset_years |
an optional character vector for subsetting the years
returned to just some temporal domain of interest to the user. For example,
|
create_stateyears()
takes state system membership data provided
by either Correlates of War or Gleditsch-Ward and returns a simple state-year
data frame.
Steven V. Miller
Miller, Steven V. 2019. “Create Country-Year and (Non)-Directed Dyad-Year Data With Just a Few Lines in R” http://svmiller.com/blog/2019/01/create-country-year-dyad-year-from-country-data/
# CoW is default, will include years beyond 2016 (most recent CoW update)
create_stateyears()
# Gleditsch-Ward, include most recent years
create_stateyears(system="gw")
# Gleditsch-Ward, don't include most recent years
create_stateyears(system="gw", mry=FALSE)
This is a data set with state-year estimates for ethnic and religious fractionalization/polarization, by way of the Composition of Religious and Ethnic Groups (CREG) project at the University of Illinois. I-L-L.
creg
A data frame with 11523 observations on the following 8 variables.
ccode
a Correlates of War state code
gwcode
a Gleditsch-Ward state code
creg_ccode
a numeric code for the state, mostly patterned off Correlates of War codes but with important differences. See details section for more.
year
the year
ethfrac
an estimate of the ethnic fractionalization index. See details for more.
ethpol
an estimate of the ethnic polarization index. See details for more.
relfrac
an estimate of the religious fractionalization index. See details for more.
relpol
an estimate of the religious polarization index. See details for more.
The data-raw directory on the project's GitHub contains more information about how these data were
created. Pay careful attention to how I assigned CoW/G-W codes. The underlying data are version 1.02.
The state codes provided by the CREG project are mostly Correlates of War codes, but with some differences. Summarizing these differences: the state code for Serbia from 1992 to 2013 is actually the Gleditsch-Ward code (340). Russia after the dissolution of the Soviet Union (1991-onward) is 393 and not 365. The Soviet Union has the 365 code. Yugoslavia has the 345 code. The code for Yemen (678) is effectively the Gleditsch-Ward code because it spans the entire post-World War II temporal domain. Likewise, the code for post-unification Germany is the Gleditsch-Ward code (260) as well. The codebook actually says it's 265 (which would be East Germany's code), but this is assuredly a typo based on the data.
The codebook cautions there are insufficient data for ethnic group estimates for Cameroon, France, India, Kosovo, Montenegro, Mozambique, and Papua New Guinea. The French case is particularly disappointing but the missing data there are a function of both France's constitution and modelling issues for CREG (per the codebook). There are insufficient data to make religious group estimates for China, North Korea, and the short-lived Republic of Vietnam.
The fractionalization estimates are the familiar Herfindahl-Hirschman concentration index. The polarization formula comes by way of Montalvo and Reynal-Querol (2000), though this book does not appear to be published beyond its placement online. I recommend Montalvo and Reynal-Querol (2005) instead. You can cite Alesina (2003) for the fractionalization measure if you'd like.
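Both indices are simple functions of the group proportions in a state-year. A minimal base R sketch, using invented proportions for one hypothetical state-year (the object names `p`, `frac`, and `pol` are illustrative only):

```r
# Group shares for one hypothetical state-year; they must sum to 1.
p <- c(0.6, 0.3, 0.1)

# Herfindahl-style fractionalization: 1 minus the sum of squared shares.
frac <- 1 - sum(p^2)

# Montalvo and Reynal-Querol polarization, as given in their 2005 article.
pol <- 1 - sum(((0.5 - p) / 0.5)^2 * p)

c(fractionalization = frac, polarization = pol)
```

Fractionalization keeps rising as groups become more numerous and smaller, while polarization peaks at 1 when there are exactly two groups of equal size.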
In the most literal sense of "1", the group proportions may not sum to exactly 1 because of rounding in the
data. There were only two problem cases in these data worth mentioning. First, in both data sets, there would
be the occasional duplicates of group names by state-year (for example: Afghanistan in 1951 in the ethnic group
data and the United States in 1948 in the religious group data). In those cases, the script I make available
in the data-raw directory just selects distinct values and that effectively fixes the problem of duplicates,
where they do appear. Second, Costa Rica had a curious problem for most years in the religious group data. All
Costa Rica years have group data for Protestants, Roman Catholics, and "others." Up until 1964 or so, the "others"
are zero. Afterward, there is some small proportion of "others". However, the sum of Protestants, Roman Catholics, and
"others" exceeds 1 (pretty clearly) and the difference between the sum and 1 is entirely the "others." So, I drop
the "others" for all years. I don't think that's terribly problematic, but it's worth saying that's what I did.
Alesina, Alberto, Arnaud Devleeschauwer, William Easterly, Sergio Kurlat and Romain Wacziarg. 2003. "Fractionalization". Journal of Economic Growth 8: 155-194.
Montalvo, Jose G. and Marta Reynal-Querol. 2005. "Ethnic Polarization, Potential Conflict, and Civil Wars" American Economic Review 95(3): 796–816.
Nardulli, Peter F., Cara J. Wong, Ajay Singh, Buddy Peyton, and Joseph Bajjalieh. 2012. The Composition of Religious and Ethnic Groups (CREG) Project. Cline Center for Democracy.
declare_attributes() allows the user to
declare peacesciencer-specific attributes to data they
bring from outside the package. This allows the user to use
package functions as shortcuts, where appropriate.
declare_attributes(data, data_type, system, conflict_type)
data |
a data frame for which you want peacesciencer-specific attributes |
data_type |
optional, but a character vector of length 1 coinciding with the type of data the user believes the data frame is. Options include: 'dyad_year', 'leader_day', 'leader_year', 'leader_dyad_year', 'state_day', or 'state_year'. |
system |
optional, but a character vector of length 1 coinciding with the state system of the data. If specified at all, must be 'cow' or 'gw'. |
conflict_type |
optional, and applicable to just conflict data and the "whittle" class functions in peacesciencer. If specified, must be a character vector of length 1 that is either 'cow' or 'gml'. |
The function's documentation will include what attributes are available to be declared. No doubt, the list of potential attributes will grow in time, but the attributes that can be declared are limited to just what I've built into the package to this point. Users cannot declare more than one attribute of a given type (i.e. a user cannot declare the system to be both Correlates of War and Gleditsch-Ward).
The idea here is, basically, to allow the user to use functions in peacesciencer for data they have created or have acquired from elsewhere. However, this function provides no assurances about quality control in the various merges built elsewhere into this package. This package aggressively tests functions for data generated in-house. If your outside data have merges, the various "add" functions may not perfectly perform. There is no real way I can control for this since the data are coming from outside the package and not through one of the "create" functions. In your particular case, that may not be much of a problem. However, it's the user's responsibility to do their own quality control in this situation.
declare_attributes()
takes a data frame and
adds peacesciencer-specific attributes to the data frame.
This will allow the user to take advantage of many of the
functions in this package without starting the process with one
of the "create" functions. If nothing is declared in the function,
no attribute is added and the function just returns the original
data without any change.
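To make the idea of a declared attribute concrete, here is a minimal base R sketch of attaching metadata to a data frame. The helper `declare_sketch()` and the attribute names `ps_data_type` and `ps_system` are assumptions made for illustration; this documentation does not state what the package calls its attributes internally.

```r
# Hypothetical helper: attach metadata to a data frame as attributes.
# The attribute names used here are assumptions for illustration only.
declare_sketch <- function(data, data_type = NULL, system = NULL) {
  if (!is.null(data_type)) attr(data, "ps_data_type") <- data_type
  if (!is.null(system)) attr(data, "ps_system") <- system
  data
}

usa_years <- data.frame(ccode = 2, year = 1816:1830)
declared <- declare_sketch(usa_years, data_type = "state_year", system = "cow")

# Inspect what was attached.
attr(declared, "ps_data_type")
attr(declared, "ps_system")

# With nothing declared, the data come back unchanged.
identical(declare_sketch(usa_years), usa_years)
```

The point of the sketch is only that attributes ride along with the data frame, so later functions can check them before deciding how to merge.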
Steven V. Miller
# just call `library(tidyverse)` at the top of your script
library(magrittr)

data.frame(ccode = 2, year = c(1816:1830)) -> usa_years

usa_years %>% declare_attributes(data_type = 'state_year', system = 'cow')
download_extdata()
leverages R's inst
directory
flexibility to allow you to download some extra data and store it in
the package.
download_extdata(overwrite = FALSE)
overwrite | logical, defaults to FALSE. If FALSE, the function checks to see if you've already downloaded the data and, if you already have, it does nothing. If TRUE, the function redownloads the data. |
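The `overwrite` behavior described above can be sketched in base R. This is not the package's code: `fetch_if_needed()` and `fake_fetch()` are hypothetical names, and the "download" here just writes a placeholder file to a temporary directory so the example runs offline.

```r
# Hypothetical helper mimicking the `overwrite` logic described above.
# `fetch` stands in for the real download step.
fetch_if_needed <- function(path, fetch, overwrite = FALSE) {
  if (file.exists(path) && !overwrite) {
    return(invisible(FALSE))  # already present: do nothing
  }
  fetch(path)
  invisible(TRUE)             # file was created or recreated
}

target <- file.path(tempdir(), "example_extdata.txt")
unlink(target)  # start from a clean slate

# Stand-in for a download: write a placeholder file.
fake_fetch <- function(path) writeLines("placeholder data", path)

first  <- fetch_if_needed(target, fake_fetch)                    # creates the file
second <- fetch_if_needed(target, fake_fetch)                    # skipped
third  <- fetch_if_needed(target, fake_fetch, overwrite = TRUE)  # fetched again
```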
download_extdata()
downloads some extra data stored on
my website (http://svmiller.com) and sticks them in the extdata
directory in the package.
Running download_extdata()
returns the following data that will be
stored in the package's extdata
directory.
These are directed dyad-year-level data for dyadic trade from the Correlates of War project. The trade values presented here have been rounded to three decimal places to conserve space. The data downloaded by this function are about 4.1 megabytes in size.
COLUMN | DESCRIPTION |
ccode1 | a numeric vector for the Correlates of War state code for the first state |
ccode2 | a numeric vector for the Correlates of War state code for the second state |
year | the year |
flow1 | imports of ccode1 from ccode2, in current million USD |
flow2 | imports of ccode2 from ccode1, in current million USD |
smoothflow1 | smoothed flow1 values |
smoothflow2 | smoothed flow2 values |
These are all directed leader dyad-year data from 1870-2015. Data come from the Archigos data (version 4.1). The data are standardized to just those observations where both leaders and states appear in the CoW state system data. The data downloaded by this function are about 2 megabytes in size.
COLUMN | DESCRIPTION |
year | the year |
obsid1 | the unique Archigos (v. 4.1) observation ID for the first leader |
obsid2 | the unique Archigos (v. 4.1) observation ID for the second leader |
ccode1 | a numeric vector for the Correlates of War state code for the first state |
ccode2 | a numeric vector for the Correlates of War state code for the second state |
gender1 | the gender of obsid1 ("M" or "F") |
gender2 | the gender of obsid2 ("M" or "F") |
leaderage1 | the approximate age (i.e. year - yrborn) for obsid1 in the year |
leaderage2 | the approximate age (i.e. year - yrborn) for obsid2 in the year |
yrinoffice1 | a running count for the tenure of obsid1, starting at 1. |
yrinoffice2 | a running count for the tenure of obsid2, starting at 1. |
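The two derived leader variables in this table (approximate age and the running tenure count) are simple to reproduce. The sketch below uses a made-up leader-year data frame; the observation IDs, years, and birth years are invented, and the single-leader layout is a simplification of the dyadic data described here.

```r
# Toy leader-years; all values are made up for illustration.
leaders <- data.frame(
  obsid  = c("A-1", "A-1", "A-1", "B-1", "B-1"),
  year   = c(1900, 1901, 1902, 1902, 1903),
  yrborn = c(1850, 1850, 1850, 1860, 1860)
)

# Approximate age: calendar year minus birth year.
leaders$leaderage <- leaders$year - leaders$yrborn

# Running count of years in office within each leader, starting at 1.
leaders$yrinoffice <- ave(leaders$year, leaders$obsid, FUN = seq_along)

leaders
```

The age is "approximate" because it ignores the leader's birthday within the year.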
These are all directed leader dyad-year data from 1870-2015. Data come from the Archigos data (version 4.1). The data represent every possible dyadic leader-pairing in the Archigos data (which is denominated in the Gleditsch-Ward system), but standardize leader dyad-years to Gleditsch-Ward state system dates. The data downloaded by this function are about 2.2 megabytes in size.
COLUMN | DESCRIPTION |
year | the year |
obsid1 | the unique Archigos (v. 4.1) observation ID for the first leader |
obsid2 | the unique Archigos (v. 4.1) observation ID for the second leader |
gwcode1 | a numeric vector for the Gleditsch-Ward state code for the first state |
gwcode2 | a numeric vector for the Gleditsch-Ward state code for the second state |
gender1 | the gender of obsid1 ("M" or "F") |
gender2 | the gender of obsid2 ("M" or "F") |
leaderage1 | the approximate age (i.e. year - yrborn) for obsid1 in the year |
leaderage2 | the approximate age (i.e. year - yrborn) for obsid2 in the year |
yrinoffice1 | a running count for the tenure of obsid1, starting at 1. |
yrinoffice2 | a running count for the tenure of obsid2, starting at 1. |
The FPSIM data set provides measures of foreign policy similarity of dyads based on alliance ties (Correlates of War, version 4.1) and UN General Assembly voting (Voeten, version 17) for all members of the Correlates of War state system. The alliance data cover the time period from 1816 to 2012, and the UN voting data from 1946 to 2015. The similarity measures include various versions of Ritter and Signorino's S (weighted/non-weighted by material capabilities; squared/absolute distance metrics) as well as the chance-corrected measures Cohen's (1960) kappa and Scott's (1955) pi. The measures based on alliance data come in two versions: one is based on valued alliance ties and the other is based on binary alliance ties. Data were last updated on December 7, 2017, and this description was effectively plagiarized (with his blessing) from Frank Haege's Dataverse.
These data are directed dyad-years with 17 columns and 1,872,198 observations. They will almost certainly be the largest data set I nudge/ask you to download remotely. The file containing this information is 18.6 MB in size. To reduce size further, the values have also been rounded to three decimal places.
Haege generated all estimates of dyadic foreign policy similarity, except for the taub column. That was generated separately, by me.
COLUMN | DESCRIPTION |
year | the year |
ccode1 | the Correlates of War state code for the first state |
ccode2 | the Correlates of War state code for the second state |
taub | Tau-b (valued alliance data) |
srsvas | unweighted S (squared distances, valued alliance data) |
srswvas | weighted S (squared distances, valued alliance data) |
srsvaa | unweighted S (absolute distances, valued alliance data) |
srswvaa | weighted S (absolute distances, valued alliance data) |
kappava | Kappa (squared distances, valued alliance data) |
piva | Pi (squared distances, valued alliance data) |
srsba | Unweighted S (binary alliance data) |
srswba | Weighted S (binary alliance data) |
kappaba | Kappa (binary alliance data) |
piba | Pi denominator (binary alliance data) |
srsvvs | Unweighted S (squared distances, valued UN voting data) |
srsvva | Unweighted S (absolute distances, valued UN voting data) |
kappavv | Kappa (squared distances, valued UN voting data) |
pivv | Pi (squared distances, valued UN voting data) |
These are non-directed dyadic minimum distance data from Schvitz et al. (2022) for all Correlates of War states from the start of 1886 to the end of 2019. Note that I call these "data plus", with the idea of informally branding these as a kind of augmentation of what you might otherwise do with the cshapes package. This data set has over 4.4 million rows, one for each recorded dyadic minimum distance across all available years. Within each year, there is a recorded minimum distance for Jan. 1, June 30, Dec. 31 and, in addition, any day within the year where the composition of the international system (or shape of a state) changed, as recorded in cshapes. Sometimes these changes concern the dyadic minimum distance; sometimes they don't. For example, the League of Nations is responsible for a lot of shape changes (i.e. system entry) in the CoW state system data in the year 1920. That obviously won't change the dyadic minimum distance between the U.S. and Canada, which will always be zero. Sometimes the start of the year (Jan. 1), the midpoint of the year (June 30), or the end of the year (Dec. 31) coincides with a system change. Often it doesn't. Note that a referent day (Jan. 1, June 30, Dec. 31) may not appear in a given year for a given dyad if that date exists outside CoW state system membership. For example, Canada doesn't appear as a state system member until Jan. 10, 1920. The goal of this data set is to allow you to more quickly generate dyadic minimum distances within peacesciencer's functionality if you are proficient in tidyverse verbs. You could also use it to highlight how often the dyadic minimum distance may vary within a year for a given dyad.
Despite the dimensions of the data set, it's not too big of a download. The data are about 1.7 MB in size.
COLUMN | DESCRIPTION |
ccode1 | the Correlates of War state code for the first state |
ccode2 | the Correlates of War state code for the second state |
year | the year |
date | a date, coinciding with either a system change date or a referent day (i.e. Jan. 1, June 30, Dec. 31) |
change_date | a date that, when present, indicates the shape of the system changed on that day |
mindist | the dyadic minimum distance (in kilometers) |
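As a rough illustration of the two uses mentioned for these data, the base R sketch below reduces a few rows in this layout to one Jan. 1 observation per dyad-year and then counts how many distinct minimum distances appear within each dyad-year. The state codes and distances are invented, and the object names are hypothetical.

```r
# Toy rows in the layout described above; all values are invented.
dist_plus <- data.frame(
  ccode1  = c(100, 100, 100, 100, 100),
  ccode2  = c(101, 101, 101, 101, 101),
  year    = c(1921, 1921, 1921, 1922, 1922),
  date    = as.Date(c("1921-01-01", "1921-06-30", "1921-12-31",
                      "1922-01-01", "1922-06-30")),
  mindist = c(10, 10, 25, 25, 25)
)

# Keep only the Jan. 1 observation for each dyad-year.
is_jan1 <- format(dist_plus$date, "%m-%d") == "01-01"
jan1 <- dist_plus[is_jan1, c("ccode1", "ccode2", "year", "mindist")]

# Count distinct minimum distances recorded within each dyad-year.
n_distinct <- aggregate(mindist ~ ccode1 + ccode2 + year, data = dist_plus,
                        FUN = function(x) length(unique(x)))

jan1
n_distinct
```

Swapping the `"01-01"` filter for `"12-31"` would give end-of-year distances instead, subject to the caveat above that a referent day can be missing for a dyad.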
These are non-directed dyadic minimum distance data from Schvitz et al. (2022) for all Gleditsch-Ward states from the start of 1886 to the end of 2019. Note that I call these "data plus", with the idea of informally branding these as a kind of augmentation of what you might otherwise do with the cshapes package. This data set has over 3.7 million rows, one for each recorded dyadic minimum distance across all available years. Within each year, there is a recorded minimum distance for Jan. 1, June 30, Dec. 31 and, in addition, any day within the year where the composition of the international system (or shape of a state) changed, as recorded in cshapes. Sometimes these changes concern the dyadic minimum distance; sometimes they don't. For example, the dissolution of the Soviet Union is responsible for a lot of shape changes (i.e. system entry) in 1991. That obviously won't change the dyadic minimum distance between the U.S. and Canada, which will always be zero. Sometimes the start of the year (Jan. 1), the midpoint of the year (June 30), or the end of the year (Dec. 31) coincides with a system change. Often it doesn't. Note that a referent day (Jan. 1, June 30, Dec. 31) may not appear in a given year for a given dyad if that date exists outside G-W state system membership. For example, Haiti disappears from the state system on July 4, 1915 and reappears on Aug. 15, 1934. That means there won't be any dyadic minimum distance observations with the U.S., for example, on Dec. 31, 1915 or June 30, 1934. The goal of this data set is to allow you to more quickly generate dyadic minimum distances within peacesciencer's functionality if you are proficient in tidyverse verbs. You could also use it to highlight how often the dyadic minimum distance may vary within a year for a given dyad.
Despite the dimensions of the data set, it's not too big of a download. The data are about 1.4 MB in size.
COLUMN | DESCRIPTION |
gwcode1 | the Gleditsch-Ward state code for the first state |
gwcode2 | the Gleditsch-Ward state code for the second state |
year | the year |
date | a date, coinciding with either a system change date or a referent day (i.e. Jan. 1, June 30, Dec. 31) |
change_date | a date that, when present, indicates the shape of the system changed on that day |
mindist | the dyadic minimum distance (in kilometers) |
Steven V. Miller
Barbieri, Katherine, Omar M. G. Keshk, and Brian Pollins. 2009. "TRADING DATA: Evaluating our Assumptions and Coding Rules." Conflict Management and Peace Science. 26(5): 471-491.
Goemans, Henk E., Kristian Skrede Gleditsch, and Giacomo Chiozza. 2009. "Introducing Archigos: A Dataset of Political Leaders" Journal of Peace Research 46(2): 269–83.
Haege, Frank. 2011. "Choice or Circumstance? Adjusting Measures of Foreign Policy Similarity for Chance Agreement." Political Analysis 19(3): 287-305.
Schvitz, Guy, Luc Girardin, Seraina Ruegger, Nils B. Weidmann, Lars-Erik Cederman, and Kristian Skrede Gleditsch. 2022. "Mapping The International System, 1886-2017: The CShapes 2.0 Dataset." Journal of Conflict Resolution. 66(1): 144-161.
Weidmann, Nils B. and Kristian Skrede Gleditsch. 2010. "Mapping and Measuring Country Shapes: The cshapes Package." The R Journal 2(1): 18-24.
## Not run:
# Here's where the data are going to be downloaded.
system.file("extdata", package="peacesciencer")

# Now, let's download the data.
download_extdata()

## End(Not run)
This is a simple data set that communicates directed dyads in the Correlates of War data that appear in the same year,
but not in any particular day in the year. They are used in an anti-join in the create_dyadyears()
function in this package.
false_cow_dyads
A data frame with the following four variables.
ccode1
a numeric vector for the Correlates of War state code for the first state
ccode2
a numeric vector for the Correlates of War state code for the second state
year
a numeric vector for the year
in_ps
a constant that equals 1 if these data would appear in create_dyadyears()
if you were not careful to remove them.
Think of the directed Suriname and Republic of Vietnam dyad as illustrative here. The Republic of Vietnam exits the Correlates of War state system on April 30, 1975 whereas Suriname enters the state system on November 25, 1975. Both appear in the same year, but not at the same time.
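The anti-join idea can be sketched in base R. This is only an illustration of the technique, not the package's internal code: the `candidates` and `false_dyads` objects are toy stand-ins, and the state codes are typed in by hand for the example.

```r
# Toy candidate dyad-years, built from year-level membership alone.
# State codes are typed in by hand for illustration.
candidates <- data.frame(
  ccode1 = c(115, 817, 2),
  ccode2 = c(817, 115, 20),
  year   = c(1975, 1975, 1975)
)

# Toy stand-in for the "false dyads" lookup table.
false_dyads <- data.frame(
  ccode1 = c(115, 817),
  ccode2 = c(817, 115),
  year   = c(1975, 1975),
  in_ps  = 1
)

# Base R anti-join: drop candidate rows whose key appears in false_dyads.
key <- function(d) paste(d$ccode1, d$ccode2, d$year, sep = "-")
kept <- candidates[!key(candidates) %in% key(false_dyads), ]

kept
```

Only the dyad that does not appear in the lookup table survives the filter.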
This is a simple data set that communicates directed dyads in the Gleditsch-Ward data that appear in the same year,
but not in any particular day in the year. They are used in an anti-join in the create_dyadyears()
function in this package.
false_gw_dyads
A data frame with the following four variables.
gwcode1
a numeric vector for the Gleditsch-Ward state code for the first state
gwcode2
a numeric vector for the Gleditsch-Ward state code for the second state
year
a numeric vector for the year
in_ps
a constant that equals 1 if these data would appear in create_dyadyears()
if you were not careful to remove them.
Think of the directed Suriname and Republic of Vietnam dyad as illustrative here. The Republic of Vietnam exits the Correlates of War state system on April 30, 1975 whereas Suriname enters the state system on November 25, 1975. Both appear in the same year, but not at the same time.
filter_prd()
filters a dyad-year data frame to just those that
are "politically relevant." This is useful for discarding unnecessary (and unwanted)
observations that just consume space in memory.
filter_prd(data)
data | a dyad-year data frame (either "directed" or "non-directed") |
"Political relevance" can be calculated a few ways. Right now, the function considers only "direct" contiguity and Correlates of War major power status. You can employ maximalist definitions of "direct contiguity" to focus on just the land-contiguous. This function is inclusive of any type of contiguity relationship.
As of version 0.5, filter_prd()
is a shortcut for add_contiguity()
and/or add_cow_majors()
if the function is executed in the absence of the data needed to create
politically relevant dyads. See the example below for what this means.
filter_prd()
takes a dyad-year data frame, assuming it has columns for
major power status and contiguity type, calculates whether the dyad is "politically
relevant", and subsets the data frame to just those observations.
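The logic can be illustrated with a toy data frame. The sketch assumes the common reading of political relevance (a dyad is relevant if it is contiguous or if at least one side is a major power); the column names `contig`, `major1`, and `major2` and their 0/1 codings are hypothetical and are not the package's own variable names.

```r
# Toy dyad-years; column names and codings are assumptions.
# contig: 1 if the dyad has any direct contiguity relationship, else 0.
# major1, major2: 1 if that side of the dyad is a major power, else 0.
dyads <- data.frame(
  ccode1 = c(2, 2, 90, 90),
  ccode2 = c(20, 200, 91, 135),
  year   = 1950,
  contig = c(1, 0, 1, 0),
  major1 = c(1, 1, 0, 0),
  major2 = c(0, 1, 0, 0)
)

# Relevant if contiguous or if either side is a major power.
dyads$prd <- as.integer(
  dyads$contig == 1 | dyads$major1 == 1 | dyads$major2 == 1
)

# Subset to just the politically relevant observations.
relevant <- dyads[dyads$prd == 1, ]
relevant
```

The last toy row is neither contiguous nor includes a major power, so it is the only one dropped.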
Steven V. Miller
Weede, Erich. 1976. "Overwhelming preponderance as a pacifying condition among contiguous Asian dyads." Journal of Conflict Resolution 20: 395-411.
Lemke, Douglas and William Reed. 2001. "The Relevance of Politically Relevant Dyads." Journal of Conflict Resolution 45(1): 126-144.
# just call `library(tidyverse)` at the top of your script
library(magrittr)

A <- cow_ddy %>%
  add_contiguity() %>%
  add_cow_majors() %>%
  filter_prd()

A

# you can also use it as a shortcut for the other functions required
# to calculate politically relevant dyads.
B <- cow_ddy %>% filter_prd()

B

identical(A,B)
This is a simple data set of 71 arms races reported by Gibler et al. in their 2005 article in Journal of Peace Research.
grh_arms_races
A data frame with the following five variables.
race_id
the arms race identifier
ccode1
a numeric vector for the Correlates of War state code for the first state
ccode2
a numeric vector for the Correlates of War state code for the second state
styear
the start year for the arms race
endyear
the end year for the arms race
Data are taken from the appendix of Gibler, Rider, and Hutchison's 2005 article in Journal of Peace Research. Read the article and appendix for more information about coding procedures.
Gibler, Douglas M., Toby J. Rider, and Marc L. Hutchison. 2005. "Taking Arms Against a Sea of Troubles: Conventional Arms Races during Periods of Rivalry" Journal of Peace Research 42(2): 131–47.
This is a complete list of capitals and capital transitions for Gleditsch-Ward state system members. I
use it internally for calculating capital-to-capital distances in the add_capital_distance()
function.
gw_capitals
A data frame with 248 observations on the following 7 variables.
gwcode
a numeric vector for the Gleditsch-Ward state code
statenme
a character vector for the state
capital
a character vector for the name of the capital
styear
a character vector for the start year. See details section for more information.
endyear
a character vector for the end year. See details section for more information.
lat
a numeric vector of the latitude coordinates for the capital
lng
a numeric vector of the longitude coordinates for the capital
For convenience, the start year for most states is 1816. Samoa, for example, was not a state
in 1816. However, the functions that use the gw_capitals
data will not create observations for states
that did not exist at a given point in time.
The data should be current as of the end of 2020.
Cases where a start year is not 1816 indicate a capital transition. For example, Brazil's capital moved from Rio de Janeiro to Brasilia (a planned capital) in 1960. Only 25 states in the data experienced a capital transition. The most recent was Burundi in 2018. Indonesia, as of writing, is planning on a capital transition, but this has not been completed yet.
Kazakhstan renamed its capital for the state leader in 2019. These data retain the name of Astana. This will be changed in the event the software I use records this change.
The capitals data are not without some peculiarities. Prominently, Portugal transferred the Portuguese court from Lisbon to Rio de Janeiro from 1808 to 1821. This is recorded in the data. Anyone with knowledge of the inter-state conflict data will note there was no war or dispute between, say, Portugal and Spain (or Portugal and any other country) at any point during this time, but it does create some weirdness that would suggest a massive distance between two countries, like Portugal and Spain, that are otherwise land-contiguous.
On Spain: the republican government moved the capital at the start of the civil war (in 1936) to Valencia. However, it abandoned this capital by 1937. I elect to not record this capital transition.
On Myanmar: the Gleditsch-Ward system stands out as having Myanmar entered for the bulk of the 19th century. The capitals recorded for Myanmar (Burma) coincide with capitals of the Konbaung dynasty.
The data also do some (I think) reasonable back-dating of capitals to coincide with states in transition without necessarily formal capitals by the first appearance in the state system membership data. These concern Lithuania, Kazakhstan, and the Philippines. Kaunas is the initial post-independence capital of Lithuania. Almaty is the initial post-independence capital of Kazakhstan. Quezon City is the initial post-independence capital of the Philippines. This concerns, at the most, one or two years for each of these three countries.
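To see why the Rio de Janeiro period produces the "weirdness" noted above, a capital-to-capital distance can be approximated from the `lat` and `lng` columns. The sketch below uses the haversine formula as a stand-in technique; the package may compute its distances differently, and the coordinates are approximate values typed in by hand rather than taken from the data.

```r
# Great-circle distance (km) via the haversine formula.
# This is a stand-in technique; the package may compute distances differently.
haversine_km <- function(lat1, lng1, lat2, lng2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * radius * asin(sqrt(pmin(1, a)))
}

# Approximate coordinates, typed in by hand for illustration.
lisbon <- c(lat = 38.72, lng = -9.14)
madrid <- c(lat = 40.42, lng = -3.70)
rio    <- c(lat = -22.91, lng = -43.17)

# Portugal-Spain distance with the capital in Lisbon versus in Rio de Janeiro.
d_lisbon <- unname(haversine_km(lisbon["lat"], lisbon["lng"],
                                madrid["lat"], madrid["lng"]))
d_rio    <- unname(haversine_km(rio["lat"], rio["lng"],
                                madrid["lat"], madrid["lng"]))

c(lisbon_madrid = d_lisbon, rio_madrid = d_rio)
```

The second distance is many times the first, even though the two states share a land border throughout.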
This is a complete (I believe) data set on Gleditsch-Ward states and Correlates of War states, a byproduct
of a full_join()
between gw_states
and cow_states
that leans largely on the
state abbreviation variable.
gw_cow_years
A data frame with 18425 observations on the following 6 variables.
gwcode
a Gleditsch-Ward state code
stateabb
the state abbreviation, which was the greatest source of agreement between both data sets
gw_statename
the state name as it appears in the Gleditsch-Ward data
ccode
a Correlates of War state code
cow_statename
the state name as it appears in the Correlates of War data
year
a numeric vector for the year
The data-raw
directory on the project's Github contains more information about how these data were
created. I'm going to use it for internal stuff. The workflow is going to treat the Gleditsch-Ward state system
membership codes as more of the "master" codes, for which the user can add Correlates of War identifiers as they see
fit. Data are extended to 2020, assuming no changes to state system membership for either data set.
This is a complete directed dyad-year data frame of Gleditsch-Ward state system members. I offer it here as a shortcut for various other functions. As a general rule, this data frame is updated after every calendar year to include the most recently concluded calendar year.
gw_ddy
A data frame with the following 3 variables.
gwcode1
a numeric vector for the Gleditsch-Ward state code for the first state
gwcode2
a numeric vector for the Gleditsch-Ward state code for the second state
year
a numeric vector for the year
Data are a quick generation from the create_dyadyears(system="gw")
function in this package.
These are non-directed dyad-year data for the minimum distance between states in the Gleditsch-Ward state system from 1886 to 2018. The data are generated from the cshapes package.
gw_mindist
A data frame with 868813 observations on the following 4 variables.
gwcode1
the Gleditsch-Ward state system code for the first state
gwcode2
the Gleditsch-Ward state system code for the second state
year
the year
mindist
the minimum distance between states on Jan. 1 of the year, in kilometers
The data are generated from the cshapes package. The package authors purport that the data are generated to be compatible with the Gleditsch-Ward system. I trust them on this; indeed, Gleditsch is one of the authors of the cshapes package.
Data are automatically generated (by default) as directed dyad-years. I elect to make them non-directed for space considerations. Making non-directed dyad-year data into directed dyad-year data isn't too difficult in R. It just looks weird to see the code that does it.
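One way to do that conversion in base R is to stack the data on top of a copy with the two state codes swapped. The data frame below is a toy with invented rows, and the object names are hypothetical.

```r
# Toy non-directed dyad-years (each pair appears once); values are invented.
ndy <- data.frame(
  gwcode1 = c(2, 2),
  gwcode2 = c(20, 70),
  year    = c(1950, 1950),
  mindist = c(0, 0)
)

# Flip the two state codes, then stack the flipped copy under the original.
flipped <- ndy
flipped$gwcode1 <- ndy$gwcode2
flipped$gwcode2 <- ndy$gwcode1

ddy <- rbind(ndy, flipped)
ddy <- ddy[order(ddy$gwcode1, ddy$gwcode2, ddy$year), ]

ddy
```

Because minimum distance is symmetric, the flipped rows carry the same `mindist` as the originals, and the directed version has exactly twice as many rows.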
Previous versions of these data were for the minimum distance as of Dec. 31 of the referent year. These are now Jan. 1.
Most of the data I provide elsewhere in this package are to be understood as the data as they were at the start of the year. This is how I process, for example, the capitals data as they get merged in the add_capital_distance() function. The script that generates these data is likewise set to Jan. 1 of the year and not Dec. 31. Right now, the cshapes package does not appear to work on my system and I do not know why. Fortunately, the package authors made these data available.
Schvitz, Guy, Luc Girardin, Seraina Ruegger, Nils B. Weidmann, Lars-Erik Cederman, and Kristian Skrede Gleditsch. 2022. "Mapping The International System, 1886-2017: The CShapes 2.0 Dataset." Journal of Conflict Resolution. 66(1): 144-161.
Weidmann, Nils B. and Kristian Skrede Gleditsch. 2010. "Mapping and Measuring Country Shapes: The cshapes Package." The R Journal 2(1): 18-24.
These are state-year level data for surplus and gross domestic product for Gleditsch-Ward state system members. Data also include population estimates for per capita standardization.
gw_sdp_gdp
A data frame with 27387 observations on the following six variables.
gwcode
a numeric vector for the Gleditsch-Ward state code
year
a numeric vector for the year
wbgdp2011est
a numeric vector for the estimated natural log of GDP in 2011 USD (log-transformed)
wbpopest
a numeric vector for the estimated population size (log-transformed)
sdpest
a numeric vector for the estimated surplus domestic product (log-transformed)
wbgdppc2011est
a numeric vector for the estimated GDP per capita (log-transformed)
These were provided by Anders on a separate Github repository for this project. Because these data are ultimately being simulated, a user can expect some slight differences between the Correlates of War version of these data (which Anders et al. published) and the Gleditsch-Ward version of these data (which appear to be the one the authors will more vigorously support going forward).
Space considerations compel me to round these data to three decimal points. These "economic" data are routinely
the biggest in the package, and it's because of the decimal points. The justification for this is these data are estimated/simulated
anyways and the information loss is at the 1/1000th decimal point. This procedure basically cuts the size of the data to be less than
25% of its original size. The original simulations are available for remote download if you'd like. Type ?download_extdata()
for
more information.
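The size of the information loss from rounding a log-transformed value to three decimal places can be checked directly. The sketch below uses an invented GDP figure and assumes a natural log transformation, as the column description suggests.

```r
# Invented GDP figure, in 2011 USD.
gdp <- 1234567890

log_gdp         <- log(gdp)          # natural log
log_gdp_rounded <- round(log_gdp, 3) # what a rounded column would store

# Rounding to three places moves the log value by at most 0.0005.
log_error <- abs(log_gdp - log_gdp_rounded)

# Back-transformed, that is a relative error of roughly 0.05% or less.
rel_error <- abs(exp(log_gdp_rounded) - gdp) / gdp

c(log_error = log_error, rel_error = rel_error)
```

For estimates that are themselves simulated, an error of that size is small next to the uncertainty already in the data.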
Anders, Therese, Christopher J. Fariss, and Jonathan N. Markowitz. 2020. "Bread Before Guns or Butter: Introducing Surplus Domestic Product (SDP)" International Studies Quarterly 64(2): 392–405.
These are the independent states in Gleditsch and Ward's data.
gw_states
A data frame with 216 observations on the following 5 variables.
gwcode
a numeric vector for the Gleditsch-Ward country code
stateabb
a character vector for state abbreviation
statename
a character vector for the state name
startdate
the start date in the data
enddate
the end date in the data
Data originally provided by Gleditsch with no column names. Column names
were added before some light re-cleaning in order to generate these data.
"Wuerttemberg" and "Cote D'Ivoire" in the statename
column needed to be
renamed to ensure maximal compliance with CRAN, which raises notes for
every non-ASCII character that appears in its package. I do not think this
to be problematic at all and, after all, state names should never be
a basis for something like a match or merge you would do in
countrycode.
Gleditsch, Kristian S. and Michael D. Ward. 1999. "A Revised List of Independent States since the Congress of Vienna." International Interactions 25(4): 393–413.
These are democracy data for all Gleditsch-Ward state system members.
gwcode_democracy
A data frame with 18289 observations on the following 5 variables.
gwcode
the Gleditsch-Ward system code
year
a numeric vector for the year
v2x_polyarchy
the Varieties of Democracy "polyarchy" estimate
polity2
the polity2 score from the Polity project
xm_qudsest
an extension of the Unified Democracy Scores (UDS) estimates, made possible by the QuickUDS package from Xavier Marquez.
Missing data connote data that are unavailable for various reasons. Either there is no democracy data to code or, in the case of the Polity project, the state system member is outright not evaluated for the variable.
The Polity data are from 2017. The Varieties of Democracy data are version 10. Xavier Marquez' QuickUDS
estimates (i.e. extensions of Pemstein et al. (2010)) come from a package Marquez makes available on his Github (https://github.com/xmarquez/QuickUDS).
Coppedge, Michael, John Gerring, Carl Henrik Knutsen, Staffan I. Lindberg, Jan Teorell, David Altman, Michael Bernhard, M. Steven Fish, Adam Glynn, Allen Hicken, Anna Luhrmann, Kyle L. Marquardt, Kelly McMann, Pamela Paxton, Daniel Pemstein, Brigitte Seim, Rachel Sigman, Svend-Erik Skaaning, Jeffrey Staton, Agnes Cornell, Lisa Gastaldi, Haakon Gjerlow, Valeriya Mechkova, Johannes von Romer, Aksel Sundtrom, Eitan Tzelgov, Luca Uberti, Yi-ting Wang, Tore Wig, and Daniel Ziblatt. 2020. "V-Dem Codebook v10" Varieties of Democracy (V-Dem) Project.
Marshall, Monty G., Ted Robert Gurr, and Keith Jaggers. 2017. "Polity IV Project: Political Regime Characteristics and Transitions, 1800-2017." Center for Systemic Peace.
Marquez, Xavier, "A Quick Method for Extending the Unified Democracy Scores" (March 23, 2016). doi:10.2139/ssrn.2753830
Pemstein, Daniel, Stephen Meserve, and James Melton. 2010. "Democratic Compromise: A Latent Variable Analysis of Ten Measures of Regime Type." Political Analysis 18(4): 426-449.
This is a data set with state-year estimates for ethnic fractionalization.
hief
A data frame with 8808 observations on the following 5 variables.
ccode
a Correlates of War state code
gwcode
a Gleditsch-Ward state code
year
the year
efindex
a numeric vector for the estimate of ethnic fractionalization
The data-raw
directory on the project's Github contains more information about how these data were
created.
Drazanova, Lenka. 2020. "Introducing the Historical Index of Ethnic Fractionalization (HIEF) Dataset: Accounting for Longitudinal Changes in Ethnic Diversity." Journal of Open Humanities Data 6:6 doi:10.5334/johd.16
These are an abbreviated version of the LEAD Data Set, incorporating variables that I think are most interesting or potentially useful from these data.
LEAD
A data frame with 3409 observations on the following 12 variables.
obsid
an observational ID from archigos
leveledu
0 = primary, 1 = secondary, 2 = university, 3 = graduate
milservice
did leader have prior military service?
combat
did leader have prior combat experience in military service?
rebel
was leader previously part of a rebel group?
warwin
was leader previously part of a winning war effort as part of military service?
warloss
was leader previously part of a losing war effort as part of military service?
rebelwin
was leader previously part of a winning war effort as part of a rebel group?
rebelloss
was leader previously part of a losing war effort as part of a rebel group?
yrsexper
previous years of experience in politics before becoming a leader
physhealth
does leader have physical health issues?
mentalhealth
does leader have mental health issues?
Data are ported from Ellis et al. (2015). Users who want more of these variables included in peacesciencer should raise an issue on Github.
Ellis, Carli Mortenson, Michael C. Horowitz, and Allan C. Stam. 2015. "Introducing the LEAD Data Set." International Interactions 41(4): 718–741.
This is a simple data set that matches, as well as one can, leader codes across Archigos 4.1, Archigos 2.9, and the LEAD data set.
leader_codes
A data frame with the following four variables.
obsid
the observation ID in the Archigos data
leadid
the leader ID in version 4.1 of the Archigos data
leadid29
the leader ID in version 2.9 of the Archigos data
leaderid
the leader ID in the LEAD data
These data treat version 4.1 of the Archigos data as the gospel leader data (if you will) for which the observation ID (obsid) is the master code indicating a leader tenure period. It also builds in an assumption that various observations that duplicate in the LEAD data should not have been duplicated. This concerns Francisco Aguilar Barquer (who appears twice), Emile Reuter (who appears twice), and Gunnar Thoroddsen (who appears three times) in the LEAD data despite having uninterrupted tenures in office. None of the covariates associated with these leaders change in the LEAD data, which is why I assume they were duplicates.
These are the estimates of leader willingness to use force as estimated by Carter and Smith (2020).
lwuf
A data frame with 3409 observations on the following 9 variables.
obsid
an observational ID from archigos
theta1_mean
the mean simulated M1 theta, as estimated by Carter and Smith (2020)
theta1_sd
the standard deviation of simulated M1 thetas
theta2_mean
the mean simulated M2 theta, as estimated by Carter and Smith (2020)
theta2_sd
the standard deviation of simulated M2 thetas
theta3_mean
the mean simulated M3 theta, as estimated by Carter and Smith (2020)
theta3_sd
the standard deviation of simulated M3 thetas
theta4_mean
the mean simulated M4 theta, as estimated by Carter and Smith (2020)
theta4_sd
the standard deviation of simulated M4 thetas
The letter published by the authors contains more information about what these thetas refer to. The "M1" theta is a variation of the standard Rasch model from the boilerplate information in the LEAD data. The authors consider this to be "theoretically relevant" or "risk-related" as these all refer to conflict or risk-taking. The "M2" theta expands on "M1" by including political orientation and psychological characteristics. "M3" and "M4" expand on "M1" and "M2" by considering all 36 variables in the LEAD data.
The authors construct and include all these measures, though their analyses suggest "M2" is the best-performing measure.
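Since each theta comes with a mean and a standard deviation of the simulated values, one simple way to carry the uncertainty forward is a rough interval around the mean. The sketch below does this for "M2" on toy data. The values are invented, and the normal approximation (mean plus or minus 1.96 standard deviations) is an assumption of this example, not something the authors prescribe.

```r
# Toy stand-in for lwuf (values are hypothetical)
lwuf_toy <- data.frame(
  obsid       = c("AAA-1", "BBB-1", "CCC-1"),
  theta2_mean = c(-0.4, 0.1, 1.2),
  theta2_sd   = c(0.3, 0.5, 0.2),
  stringsAsFactors = FALSE
)

# Approximate 95% interval, assuming the simulated thetas are
# roughly normal (an assumption made only for this sketch)
lwuf_toy$theta2_lo <- lwuf_toy$theta2_mean - 1.96 * lwuf_toy$theta2_sd
lwuf_toy$theta2_hi <- lwuf_toy$theta2_mean + 1.96 * lwuf_toy$theta2_sd

lwuf_toy
```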
Carter, Jeff and Charles E. Smith, Jr. 2020. "A Framework for Measuring Leaders' Willingness to Use Force." American Political Science Review 114(4): 1352–1358.
These are Zeev Maoz's data for which states are regional or global powers at a given point in time. They are extensions of the Correlates of War major power data, which only codes "major" power without consideration of regional or global distinctions. Think of Austria-Hungary as illustrative of the issue here. Austria-Hungary is a major power in the Correlates of War data, but there is good reason to treat Austria-Hungary as a major power only within Europe. That is what Zeev Maoz tries to do here.
maoz_powers
A data frame with 20 observations on the following 5 variables.
ccode
a numeric vector for the Correlates of War country code
regstdate
the start date for regional power status
regenddate
the end date for regional power status
globstdate
the start date for global power status
globenddate
the end date for global power status
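Because the data store power status as start and end dates, a state-year analysis needs some rule for turning those spells into yearly indicators. The sketch below shows one possible rule (status on Jan. 1 of the year) on toy data. The country codes and dates are invented, and storing the dates as Date objects is an assumption of the example.

```r
# Toy stand-in for maoz_powers (codes and dates are hypothetical)
powers_toy <- data.frame(
  ccode      = c(901, 902),
  regstdate  = as.Date(c("1816-01-01", "1816-01-01")),
  regenddate = as.Date(c("1918-11-03", "1897-12-31"))
)

# Build a small state-year grid and attach the regional power spells
grid <- expand.grid(ccode = powers_toy$ccode, year = 1895:1900)
grid <- merge(grid, powers_toy, by = "ccode")

# Code a state as a regional power in a year if Jan. 1 of that year
# falls inside the spell (one possible rule among several)
jan1 <- as.Date(paste0(grid$year, "-01-01"))
grid$regpower <- as.integer(jan1 >= grid$regstdate & jan1 <= grid$regenddate)

grid[, c("ccode", "year", "regpower")]
```

A Dec. 31 rule or an "any day in the year" rule would code the boundary years differently, so the choice is worth stating in a paper.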
Maoz, Zeev. 2010. Network of Nations: The Evolution, Structure, and Impact of International Networks, 1816-2001. Cambridge University Press.
BibTeX Data Frame of Citations
This is a BibTeX file, loaded as a data frame, to assist the user in properly citing the source material that is used in this package.
ps_bib
A data frame with the following columns.
CATEGORY
the BibTeX entry type
BIBTEXKEY
the BibTeX unique entry key
ADDRESS
another BibTeX field
ANNOTE
another BibTeX field
AUTHOR
a list of authors for this entry
BOOKTITLE
another BibTeX field, for book title (if appropriate)
CHAPTER
another BibTeX field, for chapter (if appropriate)
CROSSREF
another BibTeX field
EDITION
another BibTeX field, for edition of book (if appropriate)
EDITOR
another BibTeX field, for book editor (if appropriate)
HOWPUBLISHED
another BibTeX field
INSTITUTION
another BibTeX field
JOURNAL
another BibTeX field, for the journal name (if appropriate)
KEY
another BibTeX field
MONTH
another BibTeX field
NOTE
another BibTeX field
NUMBER
another BibTeX field, for journal volume number (if appropriate)
ORGANIZATION
another BibTeX field
PAGES
another BibTeX field, for pages of the entry
PUBLISHER
another BibTeX field, for book publisher (if appropriate)
SCHOOL
another BibTeX field
SERIES
another BibTeX field
TITLE
another BibTeX field, for title of the entry
TYPE
another BibTeX field
VOLUME
another BibTeX field, for journal volume (if appropriate)
YEAR
another BibTeX field, for year of publication
KEYWORDS
another BibTeX field, used primarily for selective filtering in this package
URL
another BibTeX field, for website (if appropriate)
OWNER
another BibTeX field
TIMESTAMP
another BibTeX field, used occasionally when I started populating my master file (you will see some old entries here)
DOI
another BibTeX field, for a digital object identifier (used rarely)
EPRINT
another BibTeX field
JOURNALTITLE
another BibTeX field, which I think is actually a BibLaTeX field
ISSN
another BibTeX field
ABSTRACT
another BibTeX field, for entry abstract (if appropriate)
DATE.ADDED
another BibTeX field
DATE.MODIFIED
another BibTeX field
See the data-raw directory for how these data were generated. The data were created by bib2df, which is now a package dependency. I assume the user has some familiarity with BibTeX. Some entries were copy-pasted from my master bibliography file that I started in 2008 or so.
BibTeX Entries Associated with peacesciencer Data and Functions
ps_cite() allows the user to get citations to scholarship that they should include in their papers that incorporate the functions and data in this package.
ps_cite(x, column = "keywords")
x |
a character vector |
column |
a character vector for the particular column of |
The base functionality here is simple pattern-matching on keywords in ps_bib. This simple pattern-matching is in base R. I assume the user has some familiarity with BibTeX.
ps_cite() takes a character vector and scans the ps_bib data in this package to return a BibTeX citation (or citations) for the researcher to use to properly cite the material they are getting from this package. The citations are returned as a full BibTeX entry (or entries) that they can copy-paste into their own BibTeX file.
Steven V. Miller
# Cite the package
ps_cite("peacesciencer")
This is a simple data set that communicates the version numbers of data included in this package. It's a companion to the data frame ps_bib, and other information functions like ps_cite() and ps_version(). The latter uses this data set.
ps_data_version
A data frame with the following four variables.
category
a category for the type of data
data
the name of the particular data source coinciding with the category
version
the version number included in peacesciencer for this data source
bibtexkey
the BibTeX key (as a character) corresponding with an appropriate citation in ps_bib
Version numbers that are years should be understood as data sources with no formal version numbering system, per se. Instead, they communicate a year of last update. For example, the Correlates of War does not formally version number its state system data as it does its MID data. Likewise, the Anders et al. (2020) simulations of population and surplus/gross domestic product are not formally versioned, per se. Instead, the data were published and last updated in 2020.
ps_version() allows the user to see version information about data included in peacesciencer.
ps_version(cat)
cat |
a category of data type the user wants, as a character |
The base functionality here is simple pattern-matching on keywords in ps_data_version. This simple pattern-matching is in base R. I assume the user has some familiarity with the types of data included in this package.
The searching is done by category included in the ps_data_version data. Users may want to just minimally run ps_version() with no argument specified to see for themselves what's in it. Typing unique(ps_data_version$category) may also get them started.
The user can consider this a companion function to ps_cite(). Whereas ps_cite() will return the appropriate citation to use in the bibliography, it may not tell them the version number at all. For example, the classic and suggested citations for the Correlates of War National Material Capabilities data are to Singer et al. (1972) and Singer (1987), though the data included in this package are about 30 years more recent than the later of the two citations.
The information communicated here can/should be included alongside a parenthetical citation. For example, the contiguity data are quite a bit more current than the suggested citation to Stinnett et al. (2002). Thus, a user may want to cite the data in their paper as something like (Stinnett et al. 2002, v. 3.2).
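A minimal sketch of that workflow follows: look up a version by category, then paste it into a parenthetical citation. The toy table stands in for ps_data_version. Its rows are invented, apart from the contiguity version (3.2) and the Stinnett et al. (2002) citation mentioned above.

```r
# Toy stand-in for ps_data_version (rows are hypothetical, except
# the contiguity version number taken from the text above)
ver_toy <- data.frame(
  category = c("contiguity", "democracy"),
  data     = c("Toy Contiguity Data", "Toy Democracy Data"),
  version  = c("3.2", "2020"),
  stringsAsFactors = FALSE
)

# Pattern-match on category, then build the parenthetical citation
v <- ver_toy$version[grepl("contiguity", ver_toy$category)]
cite <- sprintf("(Stinnett et al. 2002, v. %s)", v)

cite
```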
ps_version() takes a character vector and scans the ps_data_version data in this package to return information about the particular data versions included in peacesciencer as well as a suggested citation key for scanning ps_cite(). If no category is specified for searching, it just returns all version information for all data included in functions in this package.
Steven V. Miller
# What can you search for...
unique(ps_data_version$category)

# will show the data versions for everything
ps_version()

# will show data versions for particular categories of data
ps_version("democracy")
ps_version("leaders")
This is a data set on state-level estimates for the "ruggedness" of a state's terrain.
rugged
A data frame with 192 observations on the following 6 variables.
ccode
a Correlates of War state code
gwcode
a Gleditsch-Ward state code
rugged
the terrain ruggedness index
newlmtnest
the (natural log) percentage estimate of the state's terrain that is mountainous
The data-raw directory on the project's Github contains more information about how these data were created. It goes without saying that these data move slowly, so the data are really only applicable for making state-to-state comparisons and not states-in-time comparisons. The terrain ruggedness index was originally introduced by Riley et al. (1999) but amended by Nunn and Puga (2012). The mountain terrain data were originally created by Fearon and Laitin (2003) but extended and amended by Gibler and Miller (2014). The data are functionally time-agnostic (use with caution in your state-year analyses), but all data sets seem to benchmark around 1999-2000. I'm not sure it matters that much, but it matters a little at the margins, I suppose, if you suspect there are major differences in interpretation of how much more "rugged" the Soviet Union was than Russia, or Yugoslavia than Serbia.
Fearon, James D., and David Laitin. 2003. "Ethnicity, Insurgency, and Civil War." American Political Science Review 97: 75–90.
Gibler, Douglas M. and Steven V. Miller. 2014. "External Territorial Threat, State Capacity, and Civil War." Journal of Peace Research 51(5): 634-646.
Nunn, Nathan and Diego Puga. 2012. "Ruggedness: The Blessing of Bad Geography in Africa." Review of Economics and Statistics. 94(1): 20-36.
Riley, Shawn J., Stephen D. DeGloria, and Robert Elliot. 1999. "A Terrain Ruggedness Index That Quantifies Topographic Heterogeneity." Intermountain Journal of Sciences 5: 23–27.
show_duplicates() shows which data are duplicated in data generated in peacesciencer. It's a useful diagnostic tool for users doing some do-it-yourself functions with peacesciencer.
show_duplicates(data)
data |
a dyad-year data frame or a state-year data frame created in peacesciencer. |
The function leans on attributes of the data that are provided by the create_dyadyear() or create_stateyear() function. Make sure that function (or data created by that function) appear at the top of the proverbial pipe.
The data returned will also have a new column called duplicated. Thus, an implicit assumption in this function is the user does not have a column in the data with this name that is of interest to the user. It will be overwritten.
show_duplicates() takes a dyad-year data frame or state-year data frame generated in peacesciencer and shows what observations are duplicated by unique combination of dyad-year or state-year, contingent on what was supplied to it.
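The underlying idea can be sketched in base R on toy dyad-year data: count rows per dyad-year and keep those that occur more than once. This is only an illustration, not the function's source. Storing a count in the duplicated column is an assumption of the example, and the last row (dispnum 999) is invented. The France-Italy rows use the MID numbers mentioned elsewhere in this manual.

```r
# Toy dyadic dispute-year data: three disputes for the same
# dyad-year, plus one hypothetical dyad-year that is unique
dy <- data.frame(
  ccode1  = c(220, 220, 220, 2),
  ccode2  = c(325, 325, 325, 20),
  year    = c(1860, 1860, 1860, 1900),
  dispnum = c(112, 113, 306, 999)
)

# Count how often each dyad-year appears
# (this overwrites any existing column named `duplicated`)
key <- paste(dy$ccode1, dy$ccode2, dy$year)
dy$duplicated <- as.integer(ave(seq_along(key), key, FUN = length))

# Keep only dyad-years that appear more than once
dups <- dy[dy$duplicated > 1, ]
dups
```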
Steven V. Miller
# just call `library(tidyverse)` at the top of your script
library(magrittr)
library(peacesciencer)

gml_dirdisp %>% show_duplicates()

cow_mid_dirdisps %>% show_duplicates()
A simple summary of all strategic (inter-state) rivalries from Thompson and Dreyer (2012).
td_rivalries
A data frame with 197 observations on the following 10 variables.
rivalryno
a numeric vector for the rivalry number
rivalryname
a character vector for the rivalry name
ccode1
the Correlates of War state code for the state with the lowest Correlates of War state code in the rivalry
ccode2
the Correlates of War state code for the state with the highest Correlates of War state code in the rivalry
styear
a numeric vector for the start year of the rivalry
endyear
a numeric vector for the end year of the rivalry
region
a character vector for the region of the rivalry, per Thompson and Dreyer (2012)
type1
a character vector for the primary type of the rivalry (spatial, positional, ideological, or interventionary)
type2
a character vector for the secondary type of the rivalry, if applicable (spatial, positional, ideological, or interventionary)
type3
a character vector for the tertiary type of the rivalry, if applicable (spatial, positional, ideological, or interventionary)
Information gathered from the appendix of Thompson and Dreyer (2012). Ongoing rivalries are right-bound at 2010, the date of publication for Thompson and Dreyer's handbook. Users are free to change this if they like. Data are effectively identical to strategic_rivalries in stevemisc, but include some behind-the-scenes processing (described in a blog post on http://svmiller.com) that is available to see on the project's Github repository. The data object is also renamed to avoid a conflict.
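For users who do want to change the right bound, a minimal sketch on toy data might look like the following. The rivalry rows and the new bound (2015) are invented for illustration. Only the column names and the 2010 right bound come from the documentation.

```r
# Toy stand-in for td_rivalries (rows are hypothetical)
riv_toy <- data.frame(
  rivalryno = 1:3,
  styear    = c(1948, 1816, 1990),
  endyear   = c(2010, 1904, 2010)
)

# Hypothetical new right bound for ongoing rivalries
new_bound <- 2015

# Extend rivalries that are right-bound at 2010
riv_ext <- riv_toy
riv_ext$endyear[riv_ext$endyear == 2010] <- new_bound

riv_ext
```

This treats every end year of 2010 as "ongoing". A rivalry that genuinely ended in 2010 would be extended too, so the user should check such cases against the source before relying on a recode like this.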
Miller, Steven V. 2019. "Create and Extend Strategic (International) Rivalry Data in R". URL: http://svmiller.com/blog/2019/10/create-extend-strategic-rivalry-data-r/
Thompson, William R. and David Dreyer. 2012. Handbook of International Rivalries. CQ Press.
A simple summary of all strategic (inter-state) rivalries from Thompson et al. (2021). This is a simple spreadsheet entry job (with some light cleaning) based on information provided from pages 34 to 46 in their book.
tss_rivalries
A data frame with 264 observations on the following 12 variables.
tssr_id
a numeric vector for the rivalry number
rivalry
a character vector for the rivalry name
ccode1
the Correlates of War state code for the state with the lowest Correlates of War state code in the rivalry
ccode2
the Correlates of War state code for the state with the highest Correlates of War state code in the rivalry
start
a numeric vector for the start year of the rivalry
end
a numeric vector for the end year of the rivalry
positional
a numeric vector that is 1 if Thompson et al. (2021) say the rivalry has a positional element (NA otherwise)
spatial
a numeric vector that is 1 if Thompson et al. (2021) say the rivalry has a spatial element (NA otherwise)
ideological
a numeric vector that is 1 if Thompson et al. (2021) say the rivalry has an ideological element (NA otherwise)
interventionary
a numeric vector that is 1 if Thompson et al. (2021) say the rivalry has an interventionary element (NA otherwise)
principal
a numeric vector that is 1 if Thompson et al. (2021) say the rivalry is the primary (principal) rivalry for the rivals (NA otherwise)
aprin
a numeric vector that is 1 if Thompson et al. (2021) say this is an asymmetric principal rivalry (NA otherwise)
Information gathered from chapter 2 of Thompson et al. (2021). Ongoing rivalries are right-bound at 2020. In several cases, start dates of 1494 and 1816 originally had a "P" attached to them, indicating they were ongoing before that particular year. This is captured in the "raw" spreadsheet included in the "data-raw" directory, though this is adjusted in this finished data product.
This file adjusts for what are assuredly two print errors in Thompson et al. (2021). In print, Thompson et al. (2021) say the Italy-Turkey rivalry extends from 1884-1843 and say the Mauritania-Morocco rivalry extends from 1060-1969. They had meant an end year of 1943 in the first case and a start year of 1960 in the second case. This is fixed in this version.
Venice never appears in any data set in the Correlates of War ecosystem of data, but I gave it a country code of 324 for the sake of these data (and the previous Thompson and Dreyer (2012) version of it). You'll never use this, but it's worth saying out loud that's what I did.
Thompson et al. (2021) dedicate their book to expanding on the various types of rivalry. Users who know the Thompson and Dreyer (2012) version will see a few differences here. First, rivalries no longer have formal primary, secondary, or tertiary types. Instead, rivalries have there/not there markers for whether a particular element of a rivalry type is present in the rivalry. From what I've read so far of Thompson et al. (2021), along with their ordering of the information in Chapter 2, it reads like they've just made informal what was otherwise a more formal classification component to the Thompson and Dreyer (2012) rivalry data. Positional rivalries seem to be an informal "type 1" as Thompson et al. (2021) discuss it, not at all dissimilar to how the classic alliance scholarship treats defense as a "type 1" pledge. No matter, this book is already more explicit that positional and spatial rivalries are clearly different from ideological and interventionary rivalries, and certainly the interventionary rivalries.
"Principal" and "asymmetric principal" rivalries are a new classification in
Thompson et al. (2021), relative to Thompson and Dreyer (2012). "Principal" rivalries exist where 1) the two rivals have no other rivalry or 2) the two rivals elevate this rivalry as their primary rivalry among other rivalries. Asymmetric principal rivalries are when only one of the two rivals sees the other as its primary rival. Consider two U.S.-Russian rivalries as illustrative. The rivalry with the Soviet Union (tssr_id = 100) was the primary rivalry for the U.S. (and the Soviet Union). However, the U.S. presently sees China as its main rival (tssr_id = 211). The ongoing rivalry with Russia (tssr_id = 246) is one where Russia sees the U.S. as its primary rival but the U.S. does not see Russia the same way.
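Because the type and principal-rivalry markers are coded 1 or NA rather than 1 or 0, users who want conventional dummy variables will need a small recode first. The sketch below shows one way to do that on toy data. The rows are invented; only the column names come from the documentation.

```r
# Toy stand-in for tss_rivalries (rows are hypothetical)
tss_toy <- data.frame(
  tssr_id    = c(1, 2, 3),
  positional = c(1, NA, 1),
  spatial    = c(NA, 1, 1),
  principal  = c(1, NA, NA)
)

# Recode the 1/NA markers into 1/0 dummies
markers <- c("positional", "spatial", "principal")
tss_toy[markers] <- lapply(tss_toy[markers], function(x) ifelse(is.na(x), 0, x))

tss_toy
```

This recode assumes NA means "element not present" as the variable descriptions state, rather than "unknown".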
Miller, Steven V. 2019. "Create and Extend Strategic (International) Rivalry Data in R". URL: http://svmiller.com/blog/2019/10/create-extend-strategic-rivalry-data-r/
Thompson, William R., Kentaro Sakuwa, and Prashant Hosur Suhas. 2021. Analyzing Strategic Rivalries in World Politics: Types of Rivalry, Regional Variation, and Escalation/De-escalation. Springer.
These are (kind of) dyadic, but mostly state-level data, used internally for doing stuff with the UCDP armed conflict data.
ucdp_acd
A data frame with 4164 observations on the following 15 variables.
conflict_id
a conflict identifier, not to be confused with an episode identifier (which I don't think UCDP offers)
year
a numeric vector for the year
gwno_a
the Gleditsch-Ward state code for the state on side A of the armed conflict
gwno_a_2nd
the Gleditsch-Ward state code for the state that actively supported side A of the armed conflict with the use of troops
gwno_b
the Gleditsch-Ward state code for the actor on side B of the armed conflict
gwno_b_2nd
the Gleditsch-Ward state code for the state that actively supported side B of the armed conflict with the use of troops
incompatibility
a character vector for the main conflict issue ("territory", "government", "both")
intensity_level
a numeric vector for the intensity level in the calendar year (1 = minor (25-999 battle-related deaths), 2 = war (1,000 or more battle-related deaths))
type_of_conflict
a character vector for the type of conflict ("extrasystemic", "interstate", "intrastate", "II"). "II" is a simple abbreviation of "internationalized intrastate"
start_date
a date of the first battle-related death in the conflict, not to be confused with the first battle-related death of the episode
start_prec
the level of precision for start_date
start_date2
a date of the first battle-related death in the episode, not to be confused with the first battle-related death of the conflict
start_prec2
the level of precision for start_date2
ep_end
a dummy variable for whether the conflict episode ended in the calendar year of observation
ep_end_date
the episode end date, if applicable
The data-raw directory on the project's Github will show how I processed the multiple strings for when there are multiple states on a given side.
Gleditsch, Nils Petter; Peter Wallensteen, Mikael Eriksson, Margareta Sollenberg & Havard Strand (2002) Armed Conflict 1946–2001: A New Dataset. Journal of Peace Research 39(5): 615–637.
Pettersson, Therese; Stina Hogbladh & Magnus Oberg (2019). Organized violence, 1989-2018 and peace agreements. Journal of Peace Research 56(4): 589-603.
These are state-year level data for armed conflict onsets provided by the Uppsala Conflict Data Program (UCDP).
ucdp_onsets
A data frame with 10142 observations on the following eight variables.
gwcode
a numeric vector for the Gleditsch-Ward state code
year
a numeric vector for the year
sumnewconf
a numeric vector for the sum of new conflicts/conflict-dyads
sumonset1
a numeric vector for the sum of new conflict episodes, whether because this is a new conflict or because there is more than one year since last conflict episode
sumonset2
a numeric vector for the sum of new conflict episodes, whether because this is a new conflict or because there is more than two years since last conflict episode
sumonset3
a numeric vector for the sum of new conflict episodes, whether because this is a new conflict or because there is more than three years since last conflict episode
sumonset5
a numeric vector for the sum of new conflict episodes, whether because this is a new conflict or because there is more than five years since last conflict episode
sumonset10
a numeric vector for the sum of new conflict episodes, whether because this is a new conflict or because there is more than 10 years since last conflict episode
The user will want to note that the data provided by UCDP are technically not country-year observations. They instead duplicate observations for cases of new conflicts or new conflict episodes. Further, the original data do not provide any information about the conflict-dyad in question to which those duplicates pertain. That means the most these data can do for the package's mission is provide summary information. The user should probably recode these variables into something else they may want for a particular application.
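One common recode is to collapse the counts into binary onset indicators. The sketch below does this on toy data. The rows are invented and the new variable name (onset1) is hypothetical; only the source column names come from the documentation.

```r
# Toy stand-in for ucdp_onsets (rows are hypothetical)
onsets_toy <- data.frame(
  gwcode     = c(2, 2, 750),
  year       = c(1990, 1991, 1990),
  sumnewconf = c(0, 1, 2),
  sumonset1  = c(0, 1, 3)
)

# Binary indicator: did at least one new conflict episode begin
# in the state-year (using the one-year rule)?
onsets_toy$onset1 <- as.integer(onsets_toy$sumonset1 > 0)

onsets_toy
```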
Gleditsch, Nils Petter; Peter Wallensteen, Mikael Eriksson, Margareta Sollenberg & Havard Strand (2002) Armed Conflict 1946–2001: A New Dataset. Journal of Peace Research 39(5): 615–637.
Pettersson, Therese; Stina Hogbladh & Magnus Oberg (2019). Organized violence, 1989-2018 and peace agreements. Journal of Peace Research 56(4): 589-603.
whittle_conflicts_duration() is in a class of do-it-yourself functions for coercing (i.e. "whittling") conflict-year data with cross-sectional units to unique conflict-year data by cross-sectional unit. The inspiration here is clearly the problem of whittling dyadic dispute-year data into true dyad-year data (like in the Gibler-Miller-Little conflict data). This particular function will keep the observations with the highest estimated duration.
whittle_conflicts_duration(data, durtype = "mindur")
wc_duration(...)
data |
a data frame with a declared conflict attribute type. |
durtype |
a duration on which to filter/whittle the data. Options include |
... |
optional, only to make the shortcut work |
Dyads are capable of having multiple disputes in a given year, which can create a problem for merging into a complete dyad-year data frame. Consider the case of France and Italy in 1860, which had three separate dispute onsets that year (MID#0112, MID#0113, MID#0306), as illustrative of the problem. The default process in peacesciencer employs several rules to whittle down these duplicate dyad-years for merging into a dyad-year data frame. These are available in add_cow_mids() and add_gml_mids().
Some conflicts can be of an unknown length and often come with estimates of a minimum duration and a maximum duration. This will concern the durtype parameter in this function. In many/most conflicts, certainly thinking of the inter-state dispute data, dates are known with precision (to the day) and the estimate of minimum conflict duration is equal to the estimate of maximum conflict duration. For some conflicts, the estimates will vary. This does importantly imply that using this particular whittle function with the default (mindur) will produce different results than using this particular whittle function and asking to retain the highest maximum duration (maxdur). Use the function with that in mind.
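A toy example shows why the choice matters. The helper below (keep_longest(), a hypothetical name) keeps the rows with the highest value of a duration column within each dyad-year. It is a base R sketch of the idea, not the package's implementation, and the data are invented.

```r
# Toy dispute-year data: two disputes in the same dyad-year, one
# dated precisely and one with a wide duration estimate
disp <- data.frame(
  dyad    = c("A-B", "A-B"),
  year    = c(1900, 1900),
  dispnum = c(1, 2),
  mindur  = c(10, 5),
  maxdur  = c(10, 40),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows with the highest value of `col`
# within each dyad-year
keep_longest <- function(d, col) {
  grp <- paste(d$dyad, d$year)
  d[d[[col]] == ave(d[[col]], grp, FUN = max), ]
}

by_min <- keep_longest(disp, "mindur")  # retains dispute 1
by_max <- keep_longest(disp, "maxdur")  # retains dispute 2
```

The same dyad-year keeps a different dispute depending on which duration estimate is used.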
wc_duration() is a simple, less wordy, shortcut for the same function.
whittle_conflicts_duration() takes a dyad-year data frame or leader-dyad-year data frame with a declared conflict attribute type and, grouping by the dyad and year, returns just those observations that have the highest estimated dispute-level duration. This will not eliminate all duplicates, far from it, but it's a sensible cut later into the procedure (after whittling onsets in whittle_conflicts_onsets(), and maybe some other things) to the extent to which dispute-level duration is a heuristic for dispute-level severity/importance.
Steven V. Miller
Miller, Steven V. 2021. "How peacesciencer Coerces Dispute-Year Data into Dyad-Year Data". URL: http://svmiller.com/peacesciencer/articles/coerce-dispute-year-dyad-year.html
# just call `library(tidyverse)` at the top of your script
library(magrittr)
library(peacesciencer)

gml_dirdisp %>% whittle_conflicts_onsets() %>% whittle_conflicts_duration()

cow_mid_dirdisps %>% whittle_conflicts_onsets() %>% whittle_conflicts_duration()
whittle_conflicts_fatality() is in a class of do-it-yourself functions for coercing (i.e. "whittling") conflict-year data with cross-sectional units to unique conflict-year data by cross-sectional unit. The inspiration here is clearly the problem of whittling dyadic dispute-year data into true dyad-year data (like in the Gibler-Miller-Little conflict data). This particular function will keep the observations with the highest observed fatality.
whittle_conflicts_fatality(data)
wc_fatality(...)
data |
a data frame with a declared conflict attribute type. |
... |
optional, only to make the shortcut work |
Dyads are capable of having multiple disputes in a given year, which can create a problem for merging into a complete dyad-year data frame. Consider the case of France and Italy in 1860, which had three separate dispute onsets that year (MID#0112, MID#0113, MID#0306), as illustrative of the problem. The default process in peacesciencer employs several rules to whittle down these duplicate dyad-years for merging into a dyad-year data frame. These are available in add_cow_mids() and add_gml_mids().
As of writing, the Correlates of War and Gibler-Miller-Little conflict data record some -9s for fatalities. In those cases, dispute-level fatality is temporarily recoded to be .5 (i.e. fatal, but without too many fatalities). This is a missing data problem that Gibler and Miller correct in a forthcoming publication in Journal of Conflict Resolution. Until then, this function makes that kind of determination about disputes with missing fatalities.
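The effect of that recode can be sketched on toy data: a dispute with missing (-9) fatalities is treated as .5 and therefore outranks disputes with zero fatalities in the same dyad-year. The data and the fat_adj column name are invented for the example, and this is not the package's source code.

```r
# Toy dispute-year data: three disputes in one dyad-year, one of
# which has missing (-9) fatalities
disp <- data.frame(
  dyad     = "A-B",
  year     = 1900,
  dispnum  = c(1, 2, 3),
  fatality = c(0, -9, 0),
  stringsAsFactors = FALSE
)

# Treat -9 as .5 for the comparison (fatal, but not very fatal)
disp$fat_adj <- ifelse(disp$fatality == -9, 0.5, disp$fatality)

# Keep the rows with the highest adjusted fatality per dyad-year
grp  <- paste(disp$dyad, disp$year)
kept <- disp[disp$fat_adj == ave(disp$fat_adj, grp, FUN = max), ]

kept
```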
wc_fatality() is a simple, less wordy, shortcut for the same function.
whittle_conflicts_fatality() takes a dyad-year data frame or leader-dyad-year data frame with a declared conflict attribute type and, grouping by the dyad and year, returns just those observations that have the highest observed dispute-level fatality. This will not eliminate all duplicates, far from it, but it's a sensible second cut (after whittling onsets in whittle_conflicts_onsets()) to the extent to which dispute-level fatality is a good heuristic for dispute-level severity/importance.
Steven V. Miller
Miller, Steven V. 2021. "How peacesciencer Coerces Dispute-Year Data into Dyad-Year Data". URL: http://svmiller.com/peacesciencer/articles/coerce-dispute-year-dyad-year.html
# just call `library(tidyverse)` at the top of your script
library(magrittr)
library(peacesciencer)

gml_dirdisp %>% whittle_conflicts_onsets() %>% whittle_conflicts_fatality()

cow_mid_dirdisps %>% whittle_conflicts_onsets() %>% whittle_conflicts_fatality()
whittle_conflicts_hostility() is in a class of do-it-yourself functions for coercing (i.e. "whittling") conflict-year data with cross-sectional units to unique conflict-year data by cross-sectional unit. The inspiration here is clearly the problem of whittling dyadic dispute-year data into true dyad-year data (like in the Gibler-Miller-Little conflict data). This particular function will keep the observations with the highest observed hostility.
whittle_conflicts_hostility(data)
wc_hostility(...)
data |
a data frame with a declared conflict attribute type. |
... |
optional, only to make the shortcut work |
Dyads are capable of having multiple disputes in a given year, which can create a problem for merging into a complete dyad-year data frame. Consider the case of France and Italy in 1860, which had three separate dispute onsets that year (MID#0112, MID#0113, MID#0306), as illustrative of the problem. The default process in peacesciencer employs several rules to whittle down these duplicate dyad-years for merging into a dyad-year data frame. These are available in add_cow_mids() and add_gml_mids().
wc_hostility() is a simple, less wordy, shortcut for the same function.
whittle_conflicts_hostility() takes a dyad-year data frame or leader-dyad-year data frame with a declared conflict attribute type and, grouping by the dyad and year, returns just those observations that have the highest observed dispute-level hostility. This will not eliminate all duplicates, far from it, but it's a sensible second or third cut (after whittling onsets in whittle_conflicts_onsets()) to the extent to which dispute-level hostility is a good heuristic for dispute-level severity/importance.
Steven V. Miller
Miller, Steven V. 2021. "How peacesciencer Coerces Dispute-Year Data into Dyad-Year Data". URL: http://svmiller.com/peacesciencer/articles/coerce-dispute-year-dyad-year.html
# just call `library(tidyverse)` at the top of your script
library(magrittr)
library(peacesciencer)

gml_dirdisp %>% whittle_conflicts_onsets() %>% whittle_conflicts_hostility()

cow_mid_dirdisps %>% whittle_conflicts_onsets() %>% whittle_conflicts_hostility()
whittle_conflicts_jds() is in a class of do-it-yourself functions for coercing (i.e. "whittling") conflict-year data with cross-sectional units to unique conflict-year data by cross-sectional unit. The inspiration here is clearly the problem of whittling dyadic dispute-year data into true dyad-year data (like in the Gibler-Miller-Little conflict data). This particular function will just drop something, as a kind of nuclear option.
whittle_conflicts_jds(data)
wc_jds(...)
data |
a data frame with a declared conflict attribute type. |
... |
optional, only to make the shortcut work |
Dyads are capable of having multiple disputes in a given year,
which can create a problem for merging into a complete dyad-year
data frame. Consider the case of France and Italy in 1860, which
had three separate dispute onsets that year (MID#0112, MID#0113, MID#0306),
as illustrative of the problem. The default process in peacesciencer
employs several rules to whittle down these duplicate dyad-years for
merging into a dyad-year data frame. These are available in
add_cow_mids()
and add_gml_mids().
This really should be the absolute last exclusion rule a researcher uses. It's a "nuclear option", if you will. Assuming you've run other case exclusion rules to isolate onsets and severe disputes, what remains at the end should be duplicates that are functionally equivalent observations. Your data cannot have duplicates, and these remaining observations are basically the same. Therefore, just drop something.
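As a rough sketch of what "just drop something" amounts to, the base R snippet below keeps one row per dyad-year on toy data. The helper `drop_to_one()` and all values are hypothetical, and keeping the first row in the current order is an assumption of this sketch. The text does not say which of the remaining duplicates the package retains.

```r
# Toy data: every value below is made up for illustration.
disputes <- data.frame(
  ccode1  = c(991, 991, 993),
  ccode2  = c(992, 992, 994),
  year    = c(1900, 1900, 1900),
  dispnum = c(30, 31, 32)
)

# Keep the first row of each dyad-year and drop the rest.
# (Assumption: "first in current row order" stands in for "drop something".)
drop_to_one <- function(d) {
  d[!duplicated(d[, c("ccode1", "ccode2", "year")]), , drop = FALSE]
}

whittled <- drop_to_one(disputes)
print(whittled)
```

After this step every dyad-year appears exactly once, which is why it only makes sense once the remaining duplicates are functionally equivalent.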
wc_jds()
is a simple, less wordy, shortcut for the same function.
whittle_conflicts_jds()
takes a dyad-year data frame or
leader-dyad-year data frame with a declared conflict attribute type and,
grouping by the dyad and year, returns just one observation per dyad-year,
dropping whatever duplicates remain.
Steven V. Miller
Miller, Steven V. 2021. "How peacesciencer Coerces Dispute-Year Data into Dyad-Year Data". URL: http://svmiller.com/peacesciencer/articles/coerce-dispute-year-dyad-year.html
# just call `library(tidyverse)` at the top of your script
library(magrittr)
gml_dirdisp %>% whittle_conflicts_onsets() %>% whittle_conflicts_jds()
cow_mid_dirdisps %>% whittle_conflicts_onsets() %>% whittle_conflicts_jds()
whittle_conflicts_onsets()
is in a class of
do-it-yourself functions for coercing (i.e. "whittling") conflict-year
data with cross-sectional units to unique conflict-year data by
cross-sectional unit. The inspiration here is clearly the problem
of whittling dyadic dispute-year data into true dyad-year data (like in
the Gibler-Miller-Little conflict data). This particular
function will drop ongoing conflicts in the presence of unique onsets.
whittle_conflicts_onsets(data)
wc_onsets(...)
data | a data frame with a declared conflict attribute type. |
... | optional, only to make the shortcut work |
Dyads are capable of having multiple disputes in a given year,
which can create a problem for merging into a complete dyad-year
data frame. Consider the case of France and Italy in 1860, which
had three separate dispute onsets that year (MID#0112, MID#0113, MID#0306),
as illustrative of the problem. The default process in peacesciencer
employs several rules to whittle down these duplicate dyad-years for
merging into a dyad-year data frame. These are available in
add_cow_mids()
and add_gml_mids().
wc_onsets()
is a simple, less wordy, shortcut for the same function.
whittle_conflicts_onsets()
takes a dyad-year data frame
or leader-dyad-year data frame with a declared conflict attribute type
and, grouping by the dyad and year, returns just those observations with
unique onsets where duplicates exist. This will not eliminate all
duplicates, far from it, but it's a sensible place to start.
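A minimal base R sketch of this rule on toy data follows. It illustrates the idea only and is not the function's source: the helper `keep_onsets()`, the `onset` flag, and all values are hypothetical.

```r
# Toy data: every value below is made up for illustration.
disputes <- data.frame(
  ccode1  = c(991, 991, 991, 993),
  ccode2  = c(992, 992, 992, 994),
  year    = c(1900, 1900, 1900, 1900),
  dispnum = c(20, 21, 22, 23),
  onset   = c(0, 1, 1, 0)   # hypothetical flag: 1 = began this year, 0 = ongoing
)

# Within each dyad-year, drop ongoing disputes when at least one onset exists.
# Dyad-years with no onset at all are left untouched.
keep_onsets <- function(d) {
  key <- paste(d$ccode1, d$ccode2, d$year)     # one key per dyad-year
  any_onset <- ave(d$onset, key, FUN = max)    # 1 if the dyad-year has an onset
  d[d$onset == 1 | any_onset == 0, , drop = FALSE]
}

whittled <- keep_onsets(disputes)
print(whittled)
```

Here the ongoing dispute 20 is dropped because disputes 21 and 22 are onsets in the same dyad-year, while dispute 23 survives as the only row for its dyad-year. The first dyad-year still has two rows, so later rules are needed.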
Steven V. Miller
Miller, Steven V. 2021. "How peacesciencer Coerces Dispute-Year Data into Dyad-Year Data". URL: http://svmiller.com/peacesciencer/articles/coerce-dispute-year-dyad-year.html
# just call `library(tidyverse)` at the top of your script
library(magrittr)
gml_dirdisp %>% whittle_conflicts_onsets()
cow_mid_dirdisps %>% whittle_conflicts_onsets()
whittle_conflicts_reciprocation()
is in a class of
do-it-yourself functions for coercing (i.e. "whittling") conflict-year
data with cross-sectional units to unique conflict-year data by
cross-sectional unit. The inspiration here is clearly the problem
of whittling dyadic dispute-year data into true dyad-year data (like in
the Gibler-Miller-Little conflict data). This particular function will
keep the observations that are reciprocated (i.e. have militarized actions
on both sides of the conflict).
whittle_conflicts_reciprocation(data)
wc_recip(...)
data | a data frame with a declared conflict attribute type. |
... | optional, only to make the shortcut work |
Dyads are capable of having multiple disputes in a given year,
which can create a problem for merging into a complete dyad-year
data frame. Consider the case of France and Italy in 1860, which
had three separate dispute onsets that year (MID#0112, MID#0113, MID#0306),
as illustrative of the problem. The default process in peacesciencer
employs several rules to whittle down these duplicate dyad-years for
merging into a dyad-year data frame. These are available in
add_cow_mids()
and add_gml_mids().
Scholars are free to use this as a heuristic for whittling conflict-year data to be coerced into true dyad-year data, but I would be remiss if I did not offer a caveat about the reciprocation variable in inter-state dispute data. Namely, it is noisy and is not doing what scholars often think it's doing in the inter-state dispute data. Reciprocation is observed only when there is a militarized action on both sides of the conflict. By definition, someone on Side A will have a militarized action. Not every state on Side B does. However, scholars should not interpret that as the absence of militarized responses. In a forthcoming article in Journal of Conflict Resolution, Doug Gibler and I make the case that reciprocation isn't a useful variable to maintain at all because it can only invite errors (as is often the case in the CoW-MID data) and will obscure the fact that states that are attacked by another side routinely fight back. On many occasions, they also successfully repel the attack. Scholars who uncritically use this variable, certainly for hypothesis-testing on audience costs, are borrowing trouble with this measure.
wc_recip()
is a simple, less wordy, shortcut for the same function.
whittle_conflicts_reciprocation()
takes a dyad-year data
frame or leader-dyad-year data frame with a declared conflict attribute
type and, grouping by the dyad and year, returns just those observations
that have militarized actions on both sides of the conflict. This will not
eliminate all duplicates, far from it, but it's a sensible cut later into
the procedure (after whittling onsets in whittle_conflicts_onsets()),
to the extent that dispute-level reciprocation is a heuristic for
dispute-level severity/importance (after some other considerations).
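The cut can be sketched in base R on toy data. This is an illustration under stated assumptions, not the package's code: the helper `keep_reciprocated()`, the `recip` flag, and all values are hypothetical. The sketch also assumes that a dyad-year with no reciprocated dispute is left intact rather than removed, which the text does not spell out.

```r
# Toy data: every value below is made up for illustration.
disputes <- data.frame(
  ccode1  = c(991, 991, 991, 993),
  ccode2  = c(992, 992, 992, 994),
  year    = c(1900, 1900, 1900, 1900),
  dispnum = c(40, 41, 42, 43),
  recip   = c(0, 1, 0, 0)   # hypothetical flag: 1 = militarized action on both sides
)

# Within each dyad-year, prefer reciprocated disputes when any exist.
# (Assumption: dyad-years without a reciprocated dispute keep all their rows.)
keep_reciprocated <- function(d) {
  key <- paste(d$ccode1, d$ccode2, d$year)     # one key per dyad-year
  group_max <- ave(d$recip, key, FUN = max)    # 1 if any dispute was reciprocated
  d[d$recip == group_max, , drop = FALSE]
}

whittled <- keep_reciprocated(disputes)
print(whittled)
```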
Steven V. Miller
Miller, Steven V. 2021. "How peacesciencer Coerces Dispute-Year Data into Dyad-Year Data". URL: http://svmiller.com/peacesciencer/articles/coerce-dispute-year-dyad-year.html
# just call `library(tidyverse)` at the top of your script
library(magrittr)
gml_dirdisp %>% whittle_conflicts_onsets() %>% whittle_conflicts_reciprocation()
cow_mid_dirdisps %>% whittle_conflicts_onsets() %>% whittle_conflicts_reciprocation()
whittle_conflicts_startmonth()
is in a class of
do-it-yourself functions for coercing (i.e. "whittling") conflict-year
data with cross-sectional units to unique conflict-year data by
cross-sectional unit. The inspiration here is clearly the problem
of whittling dyadic dispute-year data into true dyad-year data (like in
the Gibler-Miller-Little conflict data). This particular
function will keep the observations that have the lowest start month.
whittle_conflicts_startmonth(data)
wc_stmon(...)
data | a data frame with a declared conflict attribute type. |
... | optional, only to make the shortcut work |
Dyads are capable of having multiple disputes in a given year,
which can create a problem for merging into a complete dyad-year
data frame. Consider the case of France and Italy in 1860, which
had three separate dispute onsets that year (MID#0112, MID#0113, MID#0306),
as illustrative of the problem. The default process in peacesciencer
employs several rules to whittle down these duplicate dyad-years for
merging into a dyad-year data frame. These are available in
add_cow_mids()
and add_gml_mids().
This really should be one of the last exclusion rules a researcher uses. There is no substantive reason to assume the lower start month matters for the cause of isolating "serious" or "severe" disputes in the presence of duplicates. It's really just a way of isolating which duplicated observation happened first where remaining duplicates are otherwise very similar to each other.
wc_stmon()
is a simple, less wordy, shortcut for the same function.
whittle_conflicts_startmonth()
takes a dyad-year
data frame or leader-dyad-year data frame with a declared conflict
attribute type and, grouping by the dyad and year, returns just
those observations that have the lowest start month.
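For illustration, the same grouping can be written in base R on toy data. This is a sketch of the rule as described, not the function's implementation: the helper `keep_earliest_start()`, the column `stmon`, and all values are hypothetical.

```r
# Toy data: every value below is made up for illustration.
disputes <- data.frame(
  ccode1  = c(991, 991, 991, 993),
  ccode2  = c(992, 992, 992, 994),
  year    = c(1900, 1900, 1900, 1900),
  dispnum = c(50, 51, 52, 53),
  stmon   = c(7, 3, 3, 11)   # hypothetical start month (1 = January)
)

# Keep rows whose start month equals the minimum within their dyad-year.
keep_earliest_start <- function(d) {
  key <- paste(d$ccode1, d$ccode2, d$year)     # one key per dyad-year
  group_min <- ave(d$stmon, key, FUN = min)    # earliest start month per key
  d[d$stmon == group_min, , drop = FALSE]
}

whittled <- keep_earliest_start(disputes)
print(whittled)
```

Disputes 51 and 52 share the earliest start month in this toy example, so a tie survives the cut and a final rule would still be needed to reach one row per dyad-year.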
Steven V. Miller
Miller, Steven V. 2021. "How peacesciencer Coerces Dispute-Year Data into Dyad-Year Data". URL: http://svmiller.com/peacesciencer/articles/coerce-dispute-year-dyad-year.html
# just call `library(tidyverse)` at the top of your script
library(magrittr)
gml_dirdisp %>% whittle_conflicts_onsets() %>% whittle_conflicts_startmonth()
cow_mid_dirdisps %>% whittle_conflicts_onsets() %>% whittle_conflicts_startmonth()