prepare_network_data: Prepare Network Data for LPANDA

Description

Transforms time series data of local election results into a set of network data for use in Local Political Actor Network Diachronic Analysis (LPANDA). The function constructs a bipartite network (candidate – candidate list), its projected one-mode networks (candidate – candidate and list – list), a continuity graph (linking candidate lists between adjacent elections), and an elections network (its node attributes can serve as electoral statistics). It also detects parties (as clusters of candidate lists based on community detection applied to the bipartite network) and constructs their network.

Usage

prepare_network_data(df, input_variable_map = list(), verbose = TRUE, ...)

Value

A list of network data objects for diachronic analysis using LPANDA or other social network analysis tools. Each component contains edgelist (data.frame of edges) and node_attr (data.frame of node attributes). The exact set of columns depends on the input and may evolve. See Output data structure for a description of the returned object.

Arguments

df: A data.frame containing data from elections, with one row per candidate. The function also accepts a single election, though diachronic outputs will then be empty or trivial. See the Expected structure of input data section for the expected data format and required variables.
input_variable_map: A list mapping variable names in df that differ from the expected ones:

elections = unique election identifiers (numeric),
candidate = candidate's name used as a unique identifier (character),
list_name = name of the candidate list (character),
list_pos = candidate's position on the list (numeric),
pref_votes = preferential votes received by the candidate (numeric),
list_votes = * total votes received by the candidate list (numeric),
elected = whether the candidate was elected (logical),
nom_party = party that nominated the candidate (character),
pol_affil = declared political affiliation of the candidate (character),
mayor = whether the councillor became mayor (logical),
dep_mayor = whether the councillor became deputy mayor (logical),
board = whether the councillor became a member of the executive board (logical),
gov_support = whether the councillor supported the executive body (logical),
elig_voters = * number of eligible voters (numeric),
ballots_cast = * number of ballots cast (numeric),
const_size = * size of the constituency (number of seats) (numeric)

* Variables marked with an asterisk should appear only once per election and constituency, in the row of any one candidate running in that specific election and constituency.

See the Expected structure of input data section for details on how to use it.
verbose: Logical, default TRUE. If FALSE, suppresses informative messages.
...: Optional arguments reserved for internal development, experimental features and future extensions, such as include_cores (logical, default FALSE). Not intended for standard use yet (behavior may change without notice). Unknown keys in ... are ignored.

Expected structure of input data

The input data frame (df) must include at least the election identifiers (year[.month]), candidates' names (uniquely identifying individuals), and list names. Other variables are optional. If variable names in the dataset differ from the expected ones, they should be specified in the input_variable_map as a named list (only differing names need to be listed).

A named list is a list in which each element has a name (the expected variable name) and a value (the actual name used in your data frame), for example: list(list_name = "party", elected = "seat", list_votes = "votes_total").
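To illustrate what such a mapping expresses, the following minimal sketch renames columns of a toy data frame in base R. It is not the package's internal implementation; the toy data and the helper variable names (renamed, pos) are assumptions made for illustration only.

```r
# Toy data using non-standard column names ("party", "seat")
df <- data.frame(
  elections = c(2018, 2018),
  candidate = c("John Doe", "Jane Doe, jr."),
  party     = c("Alpha", "Beta"),
  seat      = c(1, 0),
  stringsAsFactors = FALSE
)

# Names = expected variable names, values = actual column names in df
input_variable_map <- list(list_name = "party", elected = "seat")

# Hypothetical illustration: locate the actual columns and rename them
renamed <- df
pos <- match(unlist(input_variable_map), names(renamed))
if (anyNA(pos)) stop("Some mapped columns were not found in df")
names(renamed)[pos] <- names(input_variable_map)

names(renamed)
```

In practice the map is passed directly to prepare_network_data(); renaming by hand as above is not required.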

Examples of expected and acceptable values in df:

elections (required): Election identifier in the format YY[YY][.MM]: e.g., 94 | 02 | 1998 | "2024" | 2022.11
candidate (required): e.g., "John Doe" | "John Smith (5)" | "Jane Doe, jr."
list_name (required): name of the candidate list; for independent candidates, you can use, e.g., "John Smith, Independent Candidate" | "J.S., IND."
list_pos, pref_votes, list_votes: must be numeric
elected, mayor, dep_mayor, board, gov_support: 1 | "0" | T | "F" | "TRUE" | FALSE (non‑logical inputs will be coerced to logical).
nom_party: for independent candidates, you can use: "IND" | "Independent Candidate"
pol_affil: for independent candidates, you can use: "non-partisan"
elig_voters, ballots_cast, const_size: numeric; each value should appear only once per election and constituency, in the row of any one candidate
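The mixed values accepted for the logical variables cannot all be converted with a plain as.logical() call in base R (for instance, as.logical("0") returns NA). The sketch below shows one way such values could be normalised before use. The helper to_logical() is hypothetical and is not part of lpanda; the package's own coercion rules may differ.

```r
# Hypothetical helper: map common truthy/falsy encodings to logical values
to_logical <- function(x) {
  x <- toupper(trimws(as.character(x)))
  out <- rep(NA, length(x))              # anything unrecognised stays NA
  out[x %in% c("1", "T", "TRUE")]  <- TRUE
  out[x %in% c("0", "F", "FALSE")] <- FALSE
  out
}

# Mixed input as it might appear in a character column
raw <- c("1", "0", "T", "F", "TRUE", "FALSE")
to_logical(raw)
```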

If pref_votes are present but list_votes are not, the function assumes a voting system where list votes are calculated by summing the preferential votes of candidates on the list.

If const_size is missing, it will be estimated based on the number of elected candidates (if available).
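The two fallbacks described above can be sketched in base R as follows. This is only an illustration under assumed toy data with a single constituency per election; it is not the package's code, and the object names list_votes_est and const_size_est are made up for the example.

```r
# Toy data: two elections, two lists, no list_votes or const_size columns
df <- data.frame(
  elections  = c(2018, 2018, 2018, 2022, 2022),
  list_name  = c("Alpha", "Alpha", "Beta", "Alpha", "Beta"),
  pref_votes = c(120, 80, 150, 90, 200),
  elected    = c(TRUE, FALSE, TRUE, FALSE, TRUE)
)

# List votes as the sum of preferential votes per list and election
list_votes_est <- aggregate(pref_votes ~ elections + list_name, data = df, FUN = sum)
names(list_votes_est)[3] <- "list_votes"

# Constituency size estimated as the number of elected candidates per election
const_size_est <- aggregate(elected ~ elections, data = df, FUN = sum)
names(const_size_est)[2] <- "const_size"

list_votes_est
const_size_est
```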

For the purposes of analysis, a new variable list_id (class character) is added to the internally processed copy of df and carried to the output. It uniquely identifies each candidate list in a given election by combining list_name and elections, e.g., "Besti Flokkurinn (2010)", "SNP (2019)", or "John Smith (5), IND. (2022.11)". This variable serves as a key identifier in LPANDA for tracking candidate lists across elections and constructing network relations.
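Judging from the examples above, an identifier of this shape could be built by pasting the list name and the election identifier together. The sketch below is an assumption about the format only, not the package's actual code.

```r
# Election identifiers kept as character so that a value such as "2022.10"
# would not lose its trailing zero when converted to text
df <- data.frame(
  elections = c("2010", "2019", "2022.11"),
  list_name = c("Besti Flokkurinn", "SNP", "John Smith (5), IND."),
  stringsAsFactors = FALSE
)

# Assumed format: "<list_name> (<elections>)"
df$list_id <- paste0(df$list_name, " (", df$elections, ")")
df$list_id
```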

Output data structure

The returned object is a named list with up to seven network objects:

bipartite: bipartite network (candidates–lists).
candidates: projected candidate–candidate network.
lists: projected list–list network (directed by election order).
continuity: filtered version of lists network (edges of adjacent elections only).
parties: network of detected party clusters (via community detection applied on bipartite network).
(cores): higher-level clusters of parties. Cores are currently experimental and will not appear in the standard output network data. See Note.
elections: inter-election candidate flow and election-level statistics.

Each object is a list with two components:

edgelist: a data.frame representing network edges
node_attr: a data.frame with attributes for each node

For example, ...$candidates$edgelist contains edges between individuals who appeared on the same candidate list, and ...$elections$node_attr includes several election statistics (e.g., number of candidates, distributed seats, plurality index, and voter turnout for each election).
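To make the idea of a projected network and its edgelist more concrete, the sketch below derives candidate–candidate and list–list projections from a small bipartite incidence matrix in base R. The toy memberships and the column names from, to and weight are assumptions; the columns actually returned by prepare_network_data() may differ.

```r
# Toy candidate-list memberships (Ann runs on both lists)
memberships <- data.frame(
  candidate = c("Ann", "Bob", "Ann", "Cid"),
  list_id   = c("Alpha (2018)", "Alpha (2018)", "Beta (2022)", "Beta (2022)"),
  stringsAsFactors = FALSE
)

# Incidence matrix: rows = candidates, columns = lists
B <- unclass(table(memberships$candidate, memberships$list_id))

cand_proj <- tcrossprod(B)  # candidate x candidate: number of shared lists
list_proj <- crossprod(B)   # list x list: number of shared candidates

# Turn the upper triangle of the candidate projection into an edgelist
idx <- which(upper.tri(cand_proj) & cand_proj > 0, arr.ind = TRUE)
edgelist <- data.frame(
  from   = rownames(cand_proj)[idx[, "row"]],
  to     = colnames(cand_proj)[idx[, "col"]],
  weight = cand_proj[idx],
  stringsAsFactors = FALSE
)
edgelist
```

Here Bob and Cid never share a list, so no edge is created between them, while the two lists are linked through Ann.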

Examples

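library(lpanda)

# Load a sample dataset whose column names differ from the expected ones
data(sample_different_varnames, package = "lpanda")
df <- sample_different_varnames
str(df) # different variable names: "party" and "seat"

# Map expected names (left) to the actual column names in df (right)
input_variable_map <- list(list_name = "party", elected = "seat")

# Build the network data and inspect the structure of the result
netdata <- prepare_network_data(df, input_variable_map, verbose = FALSE)
str(netdata, vec.len = 1)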