# graphframes v0.1.2

## Interface for 'GraphFrames'

A 'sparklyr' <https://spark.rstudio.com/> extension that provides an R
interface for 'GraphFrames' <https://graphframes.github.io/>. 'GraphFrames' is a package
for 'Apache Spark' that provides a DataFrame-based API for working with graphs. Functionality
includes motif finding and common graph algorithms, such as PageRank and Breadth-first
search.

## Readme

# R interface for GraphFrames

- Support for GraphFrames which aims to provide the functionality of GraphX.
- Run graph algorithms such as PageRank, shortest paths, and many others.
- Designed to work with sparklyr and the sparklyr extensions.

## Installation

If you already use `sparklyr`, simply run:

```
install.packages("graphframes")
# or, for the development version,
# devtools::install_github("rstudio/graphframes")
```

Otherwise, first install `sparklyr` from CRAN using:

```
install.packages("sparklyr")
```

The examples make use of the `highschool` dataset from the `ggraph` package.

## Getting Started

We will calculate PageRank over the built-in “friends” dataset as follows.

```
library(graphframes)
library(sparklyr)
library(dplyr)
# connect to spark using sparklyr
sc <- spark_connect(master = "local", version = "2.3.0")
# obtain the example graph
g <- gf_friends(sc)
# compute PageRank
results <- gf_pagerank(g, tol = 0.01, reset_probability = 0.15)
results
```

```
## GraphFrame
## Vertices:
## $ id <chr> "f", "b", "g", "a", "d", "c", "e"
## $ name <chr> "Fanny", "Bob", "Gabby", "Alice", "David", "Charlie",...
## $ age <int> 36, 36, 60, 34, 29, 30, 32
## $ pagerank <dbl> 0.3283607, 2.6555078, 0.1799821, 0.4491063, 0.3283607...
## Edges:
## $ src <chr> "b", "c", "d", "e", "a", "a", "e", "f"
## $ dst <chr> "c", "b", "a", "f", "e", "b", "d", "c"
## $ relationship <chr> "follow", "follow", "friend", "follow", "friend",...
## $ weight <dbl> 1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5, 1.0
```

We can then visualize the results by collecting them into R:

```
library(tidygraph)
library(ggraph)

vertices <- results %>%
  gf_vertices() %>%
  collect()

edges <- results %>%
  gf_edges() %>%
  collect()

edges %>%
  as_tbl_graph() %>%
  activate(nodes) %>%
  left_join(vertices, by = c(name = "id")) %>%
  ggraph(layout = "nicely") +
  geom_node_label(aes(label = name.y, color = pagerank)) +
  geom_edge_link(
    aes(
      alpha = weight,
      start_cap = label_rect(node1.name.y),
      end_cap = label_rect(node2.name.y)
    ),
    arrow = arrow(length = unit(4, "mm"))
  ) +
  theme_graph(fg_text_colour = 'white')
```

## Further Reading

Apart from calculating PageRank using `gf_pagerank()`, many other functions are available, including:

- `gf_bfs()`: Breadth-first search (BFS).
- `gf_connected_components()`: Connected components.
- `gf_shortest_paths()`: Shortest paths algorithm.
- `gf_scc()`: Strongly connected components.
- `gf_triangle_count()`: Computes the number of triangles passing through each vertex.
- `gf_degrees()`: Degrees of vertices.

For instance, one can calculate the degrees of vertices using `gf_degrees()` as follows:

```
gf_friends(sc) %>% gf_degrees()
```

```
## # Source: spark<?> [?? x 2]
##   id    degree
## * <chr>  <int>
## 1 f          2
## 2 b          3
## 3 a          3
## 4 c          3
## 5 e          3
## 6 d          2

Finally, we disconnect from Spark:

```
spark_disconnect(sc)
```

## Functions in graphframes

| Name | Description |
| --- | --- |
| gf_friends | Graph of friends in a social network. |
| gf_grid_ising_model | Generate a grid Ising model with random parameters |
| gf_graphframe | Create a new GraphFrame |
| gf_in_degrees | In-degrees of vertices |
| gf_edges | Extract edges DataFrame |
| gf_find | Motif finding: Searching the graph for structural patterns |
| gf_bfs | Breadth-first search (BFS) |
| gf_cache | Cache the GraphFrame |
| gf_two_blobs | Generate two blobs |
| gf_unpersist | Unpersist the GraphFrame |
| gf_vertex_columns | Vertices column names |
| gf_vertices | Extract vertices DataFrame |
| gf_chain | Chain graph |
| gf_shortest_paths | Shortest paths |
| gf_star | Generate a star graph |
| gf_connected_components | Connected components |
| gf_pagerank | PageRank |
| gf_triangle_count | Computes the number of triangles passing through each vertex. |
| gf_register | Register a GraphFrame object |
| gf_scc | Strongly connected components |
| gf_triplets | Triplets of graph |
| gf_persist | Persist the GraphFrame |
| spark_graphframe | Retrieve a GraphFrame |
| gf_degrees | Degrees of vertices |
| gf_edge_columns | Edges column names |
| gf_lpa | Label propagation algorithm (LPA) |
| gf_out_degrees | Out-degrees of vertices |
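To work with your own data rather than the bundled example graphs, `gf_graphframe()` builds a GraphFrame from a vertices table and an edges table. The following is a minimal sketch under stated assumptions: it opens its own local connection, the small data frames and their table names are made up for illustration, and it relies on the GraphFrames convention that vertices carry an `id` column and edges carry `src` and `dst` columns.

```r
library(sparklyr)
library(graphframes)
library(dplyr)

sc <- spark_connect(master = "local", version = "2.3.0")

# Hypothetical toy data: three vertices and three directed edges.
v_df <- data.frame(
  id = c("a", "b", "c"),
  name = c("A", "B", "C"),
  stringsAsFactors = FALSE
)
e_df <- data.frame(
  src = c("a", "b", "b"),
  dst = c("b", "a", "c"),
  action = c("love", "hate", "follow"),
  stringsAsFactors = FALSE
)

# Copy both tables to Spark, then combine them into a GraphFrame.
v_tbl <- sdf_copy_to(sc, v_df, name = "example_vertices", overwrite = TRUE)
e_tbl <- sdf_copy_to(sc, e_df, name = "example_edges", overwrite = TRUE)
g <- gf_graphframe(v_tbl, e_tbl)

# Collect the out-degrees before closing the connection.
out_deg <- gf_out_degrees(g) %>% collect()
out_deg

spark_disconnect(sc)
```

Results are collected before `spark_disconnect()` because Spark DataFrames are no longer reachable once the connection is closed.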

## Details

| Field | Value |
| --- | --- |
| Type | Package |
| URL | https://github.com/rstudio/graphframes |
| BugReports | https://github.com/rstudio/graphframes/issues |
| License | Apache License 2.0 (file LICENSE) |
| Encoding | UTF-8 |
| LazyData | true |
| RoxygenNote | 6.1.0 |
| NeedsCompilation | no |
| Packaged | 2018-10-30 19:01:05 UTC; kevinykuo |
| Repository | CRAN |
| Date/Publication | 2018-10-30 19:20:03 UTC |
