
This repository contains an R package called finnsurveytext. For further details on how to use the package, please see the package website, which contains tutorials covering all of the functions available in finnsurveytext.

A video demonstrating the first version of the package is available here.

A preprint about the package development is also available here.

Background

DARIAH-FI is one of the two components of FIN-CLARIAH, a research infrastructure project for the Social Sciences and Humanities (SSH) in Finland. DARIAH-FI involves all Finnish universities that conduct research in SSH.

The first version of our package, finnsurveytext, was the output of WP3.3 of DARIAH-FI. WP3.3 is a joint work package of Tampere University, the University of Eastern Finland, the University of Jyvaskyla and the University of Helsinki, with the objective of "better use of unstructured textual data in the context of Finnish surveys."

The second release is the output of WP4.1.6. The main updates in this release are:

  • integration with the survey package, allowing svydesign objects as inputs
  • inclusion of survey response weights in tables and plots
  • simpler splitting of data into groups within the 'comparison functions'
  • support for multiple languages (not just Finnish!)
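To make "response weights within tables" and "splitting data into groups" concrete, here is a minimal base-R sketch of a weighted word-frequency table, overall and per group. It rests on our own assumption, not necessarily on how finnsurveytext computes its tables: each word occurrence contributes its respondent's survey weight instead of 1. The data frame `words` and the helper `weighted_freq()` are hypothetical and are not part of the package.

```r
# Hypothetical data: one row per word occurrence (after annotation and
# stopword removal), with the respondent's survey weight and a group
# taken from a closed question.
words <- data.frame(
  id     = c(1, 1, 2, 2, 3, 3, 4),
  lemma  = c("call", "doctor", "wait", "call", "complain", "doctor", "call"),
  weight = c(0.5, 0.5, 2.0, 2.0, 1.0, 1.0, 1.5),
  group  = c("A", "A", "B", "B", "A", "A", "B")
)

# Hypothetical helper: sum the weights of every occurrence of each lemma,
# giving a weighted frequency table sorted from most to least frequent.
weighted_freq <- function(df) {
  totals <- vapply(split(df$weight, df$lemma), sum, numeric(1))
  sort(totals, decreasing = TRUE)
}

overall  <- weighted_freq(words)
by_group <- lapply(split(words, words$group), weighted_freq)

print(overall)              # call 4.0, wait 2.0, doctor 1.5, complain 1.0
print(table(words$lemma))   # unweighted counts, for comparison
print(by_group)
```

Without weights, "doctor" (2 mentions) outranks "wait" (1 mention); with weights, "wait" moves ahead because its single mention comes from a respondent with weight 2.0.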

Motivation

Open-ended questions are an important but challenging way to obtain informative data in surveys. Analysing open-ended responses usually requires extra time (Fielding et al., 2013), but open-ended questions are particularly useful when researchers do not want to constrain respondents' answers to pre-specified options. They allow respondents to give diverse answers based on their own experience, including answers that researchers may never have anticipated (He & Schonlau, 2021).

There is limited support for conducting qualitative analysis of Finnish open-ended survey responses, and many researchers are more confident analysing responses to closed questions.

This package aims to provide a useful, user-friendly set of tools that social science researchers can use to analyse and understand responses to the open-ended questions in their surveys.

Components

There are five sets of functions in the finnsurveytext package:

  1. Preparation functions (R/01_prepare.R and R/01b_prepare_svydesign.R)
    • These functions annotate survey data into a useful format (CoNLL-U) for analysis. The 'main' function in this set, fst_prepare(), combines the other preparation functions and can be run on its own to prepare data for analysis.
    • The functions in R/01b_prepare_svydesign.R allow a svydesign object to be used as input.
  2. Data exploration functions (R/02_data_exploration.R)
    • This file contains functions for exploratory data analysis, such as making summary tables, plotting frequently occurring words and phrases, and creating wordclouds.
  3. Concept Network functions (R/03_concept_network.R)
    • This file contains all of the functions for a single concept network. A concept network is one way of visualising the data to support interpretation. Our concept network functions use the TextRank algorithm, a graph-based ranking model for text processing. Vertices represent words, and co-occurrence between words is shown through edges. Word importance is determined recursively: a word gets more weight the more words it co-occurs with and the more weight those co-occurring words carry.
  4. Comparison functions (R/04_comparison_functions.R)
    • Each data exploration function has a partner comparison function that compares different sets of data. These comparison functions can be used to compare cohorts of survey respondents defined by their responses to closed questions such as gender, education level, location and age.
  5. Comparison concept network functions (R/05_comparison_concept_network.R)
    • Similarly, this script contains functions for comparing the concept networks of different respondent cohorts.
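The TextRank scoring described for the concept network functions can be sketched in base R. This is a minimal illustration under our own assumptions and is not the code finnsurveytext uses: responses are already tokenised into lemmas with stopwords removed, two words share an edge when they appear next to each other (window = 2), and the damping factor d = 0.85 is the value commonly used for TextRank. Each word's score is (1 - d) + d times the sum, over its neighbours, of the neighbour's score multiplied by the share of that neighbour's total edge weight that links to the word. `textrank_sketch()` is a hypothetical name.

```r
# Minimal TextRank sketch (illustrative only; not finnsurveytext's implementation).
# `responses` is a list of character vectors of lemmas, stopwords already removed.
textrank_sketch <- function(responses, window = 2, d = 0.85,
                            tol = 1e-8, max_iter = 200) {
  stopifnot(window >= 2)
  vocab <- sort(unique(unlist(responses)))
  n <- length(vocab)
  adj <- matrix(0, n, n, dimnames = list(vocab, vocab))

  # Undirected, weighted co-occurrence graph: words that appear within
  # `window` positions of each other in the same response share an edge.
  for (tokens in responses) {
    len <- length(tokens)
    if (len < 2) next
    for (i in seq_len(len - 1)) {
      for (j in (i + 1):min(len, i + window - 1)) {
        a <- tokens[i]
        b <- tokens[j]
        if (a != b) {
          adj[a, b] <- adj[a, b] + 1
          adj[b, a] <- adj[b, a] + 1
        }
      }
    }
  }

  # Each word shares its score among its neighbours in proportion to edge weight
  strength <- colSums(adj)
  strength[strength == 0] <- 1  # avoid dividing by zero for isolated words
  trans <- sweep(adj, 2, strength, "/")

  # Iterate score = (1 - d) + d * trans %*% score until it stops changing
  score <- rep(1, n)
  for (iter in seq_len(max_iter)) {
    new_score <- (1 - d) + d * as.vector(trans %*% score)
    converged <- max(abs(new_score - score)) < tol
    score <- new_score
    if (converged) break
  }
  sort(setNames(score, vocab), decreasing = TRUE)
}

responses <- list(
  c("joe", "call", "doctor"),
  c("call", "doctor", "again"),
  c("joe", "wait", "month")
)
scores <- textrank_sketch(responses)
print(round(scores, 3))
```

In this toy input, "call" and "doctor" co-occur twice and sit at the centre of the graph, so they receive the highest scores, while "again" and "month", each linked to only one other word, score lowest. This illustrates the recursive idea in the description: a word's importance depends on both how many neighbours it has and how important those neighbours are.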

Function Demos and Tutorials

Tutorials accompanying each of these R scripts can be found in the 'Articles' tab of the package website. These tutorials use the sample data outlined below.

A beta demo of the package can also be launched by running finnsurveytext::runDemo() in R.

Sample Data

Our repository also contains sample data which can be used to demonstrate and learn the functionality of finnsurveytext.

The sample data comes from three surveys and can be found in the 'data' folder. The raw data (only the relevant open-ended questions) is in data/bullying_data.rda, data/dev_data.rda and data/english_sample_survey.rda. The data folder also contains versions of this data that have been processed with the preparation functions and split into sample cohort groups.

The raw Finnish data can also be downloaded from the Finnish Social Science Data Archive and the English survey is available from GESIS – Leibniz Institute for the Social Sciences.

  1. Child Barometer 2016 Data
    • Source: FSD3134 Lapsibarometri 2016
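    • Open-ended questions: q3 'Missä asioissa olet hyvä? (Avokysymys)' ('What are you good at? (Open-ended question)'), q7 'Kertoisitko, mitä sinun mielestäsi kiusaaminen on? (Avokysymys)' ('Could you tell us what you think bullying is? (Open-ended question)'), q11 'Mikä tekee sinut iloiseksi? (Avokysymys)' ('What makes you happy? (Open-ended question)')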
    • Licence: (A) openly available for all users without registration (CC BY 4.0).
    • Link to Data: https://urn.fi/urn:nbn:fi:fsd:T-FSD3134
  2. Young People's Views on Development Cooperation 2012 Data
    • Source: FSD2821 Nuorten ajatuksia kehitysyhteistyöstä 2012
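    • Open-ended questions: q11_1 'Jatka lausetta: Kehitysmaa on maa, jossa… (Avokysymys)' ('Complete the sentence: A developing country is a country where… (Open-ended question)'), q11_2 'Jatka lausetta: Kehitysyhteistyö on toimintaa, jossa… (Avokysymys)' ('Complete the sentence: Development cooperation is activity in which… (Open-ended question)'), q11_3 'Jatka lausetta: Maailman kolme suurinta ongelmaa ovat… (Avokysymys)' ('Complete the sentence: The world's three biggest problems are… (Open-ended question)')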
    • Licence: (A) openly available for all users without registration (CC BY 4.0).
    • Link to Data: https://urn.fi/urn:nbn:fi:fsd:T-FSD2821
  3. Patient Joe (open-ended question)
    • Source: GESIS – Leibniz Institute for the Social Sciences
    • Open-ended question: 'Joe’s doctor told him that he would need to return in two weeks to find out whether or not his condition had improved. But when Joe asked the receptionist for an appointment, he was told that it would be over a month before the next available appointment. What should Joe do?'
    • Licence: CC BY 4.0 (Attribution)
    • Link to Data: https://doi.org/10.7802/2474
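The prepared versions of the sample data are in CoNLL-U format, a standard tab-separated format with one token per line and ten fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC); lines starting with "#" are comments and a blank line ends a sentence. As a rough illustration of that format, and of what removing stopwords and punctuation does to it, the base-R sketch below parses a hand-written CoNLL-U fragment and filters it. The fragment, the lower-case column names and the two-word stopword list are our assumptions. In finnsurveytext, annotation is produced by the preparation functions (e.g. fst_prepare()), and the column names of the package's data frames may differ.

```r
# A hand-written CoNLL-U fragment for one response (illustrative, not package output).
# Each token line has 10 tab-separated fields; "#" lines are comments.
conllu_lines <- c(
  "# sent_id = 1",
  "# text = Joe should call the doctor.",
  "1\tJoe\tJoe\tPROPN\t_\t_\t3\tnsubj\t_\t_",
  "2\tshould\tshould\tAUX\t_\t_\t3\taux\t_\t_",
  "3\tcall\tcall\tVERB\t_\t_\t0\troot\t_\t_",
  "4\tthe\tthe\tDET\t_\t_\t5\tdet\t_\t_",
  "5\tdoctor\tdoctor\tNOUN\t_\t_\t3\tobj\t_\tSpaceAfter=No",
  "6\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_",
  ""
)

fields <- c("id", "form", "lemma", "upos", "xpos",
            "feats", "head", "deprel", "deps", "misc")

# Keep token lines only: drop comments and the blank sentence separator
token_lines <- conllu_lines[nzchar(conllu_lines) & !startsWith(conllu_lines, "#")]

# Split each token line on tabs and stack the rows into a data frame
parsed <- do.call(rbind, strsplit(token_lines, "\t", fixed = TRUE))
tokens <- as.data.frame(parsed, stringsAsFactors = FALSE)
names(tokens) <- fields

# Hypothetical stopword list (the prepared sample data uses NLTK stopwords)
stopwords <- c("should", "the")

# Drop punctuation (UPOS tag PUNCT) and stopwords, matching on the lemma
cleaned <- tokens[tokens$upos != "PUNCT" & !(tolower(tokens$lemma) %in% stopwords), ]
print(cleaned[, c("id", "form", "lemma", "upos")])
```

Filtering on the lemma rather than the surface form means inflected variants of a stopword are removed too, which matters for a highly inflected language such as Finnish.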

Installation and License

The package is available under the MIT license. The released version of finnsurveytext can be installed from CRAN:

install.packages("finnsurveytext")

References

Fielding, J., Fielding, N., & Hughes, G. (2013). Opening up open-ended survey data using qualitative software. Quality & Quantity, 47(6), 3261–3276. https://doi.org/10.1007/s11135-012-9716-1

Finnish Children and Youth Foundation: Young People’s Views on Development Cooperation 2012 [dataset]. Version 2.0 (2019-01-22). Finnish Social Science Data Archive [distributor]. https://urn.fi/urn:nbn:fi:fsd:T-FSD2821

He, Z., & Schonlau, M. (2021). Coding Text Answers to Open-ended Questions: Human Coders and Statistical Learning Algorithms Make Similar Mistakes. Methods, Data, Analyses, 15(1), Article 1. https://doi.org/10.12758/mda.2020.10

Schonlau, M. (2022). Patient Joe (open-ended question). GESIS, Cologne. Data File Version 1.0.0. https://doi.org/10.7802/2474

The Office of Ombudsman for Children: Child Barometer 2016 [dataset]. Version 1.0 (2016-12-09). Finnish Social Science Data Archive [distributor]. https://urn.fi/urn:nbn:fi:fsd:T-FSD3134

Version: 2.1.1
License: MIT + file LICENSE

Maintainer: Adeline Clarke
Last Published: March 6th, 2025

Functions in finnsurveytext (2.1.1)

fst_dev_coop_2

Young People's Views on Development Cooperation 2012 q11_3 response data in CoNLL-U format with NLTK stopwords removed
fst_comparison_cloud

Make comparison cloud
fst_concept_network

Concept Network - Make Concept Network plot
fst_cn_search

Concept Network - Search TextRank for concepts
fst_format

Annotate open-ended survey responses into CoNLL-U format
fst_get_unique_ngrams

Get unique n-grams from a list of top n-grams tables
fst_join_unique

Merge N-grams table with unique words
fst_prepare

Read in and format survey text responses
fst_freq

Find and Plot Top Words
fst_find_stopwords

Get available stopwords lists
fst_pos_compare

Compare parts-of-speech
fst_length_compare

Compare response lengths
fst_pos

Make POS Summary Table
fst_freq_compare

Compare and plot top words
fst_ngrams_table2

Make Top N-grams Table 2
fst_summarise

Make Summary Table
fst_rm_stop_punct

Remove stopwords and punctuation from CoNLL-U dataframe
fst_freq_plot

Make Top Words plot
fst_ngrams_table

Make Top N-grams Table
fst_summarise_compare

Make comparison summary
fst_summarise_short

Make Simple Summary Table
fst_freq_table

Make Top Words Table
fst_ngrams_plot

Make N-grams plot
fst_wordcloud

Make Wordcloud
fst_get_unique_ngrams_separate

Get unique n-grams from separate top n-grams tables
fst_use_svydesign

Add `svydesign` weights to CoNLL-U data
fst_ngrams_compare

Compare and plot top n-grams
fst_ngrams_compare_plot

Plot comparison n-grams
fst_length_summary

Make Length Summary Table
fst_ngrams

Find and Plot Top N-grams
fst_prepare_svydesign

Read in and format survey text responses from `svydesign` object
fst_print_available_models

Find treebanks available for use
%>%

Pipe operator
runDemo

Run Shiny App Demo
english_sample_survey

English Sample Survey Data: Patient Joe
child

Child Barometer 2016 response data
fst_cn_compare_plot

Concept Network - Plot comparison Concept Network
fst_cn_get_unique_separate

Concept Network - Get unique nodes from separate top n-grams tables
fst_cn_edges

Concept Network - Get TextRank edges
fst_cn_get_unique

Concept Network - Get unique nodes from a list of top n-grams tables
fst_cn_nodes

Concept Network - Get TextRank nodes
fst_cn_plot

Plot Concept Network
fst_format_svydesign

Annotate open-ended survey responses within a `svydesign` object into CoNLL-U format
dev_coop

Young People's Views on Development Cooperation 2012 response data
fst_child

Child Barometer 2016 Bullying response data in CoNLL-U format with NLTK stopwords removed and background variables
fst_child_2

Child Barometer 2016 Bullying response data in CoNLL-U format with NLTK stopwords removed
fst_concept_network_compare

Concept Network - Compare and plot Concept Network
fst_dev_coop

Young People's Views on Development Cooperation 2012 q11_3 response data in CoNLL-U format with NLTK stopwords removed and background variables