datamart: Common interface to various data sources.
Description
datamart provides several S4 classes to access and cache
data sources on the internet or elsewhere. Its aim is to
extent the functionality of the data() function by
enabling parametrized data requests and offering a data
update process.Details
At the heart of datamart are two new methods: With
query() you actually use your data source and request
some data. With scrape() you curate your data, that is,
you may provide a mechanism to akquire new data or thin
out old data. A common usage is as follows.
You create an instance of your data object, e.g. tw
<- twttr(user='wagezudenken', dbi=sqlite("my.db")
for a
twitter client. Before you start working, you update your
local cache via scrape(tw)
. Now you can query your
data for various time frames e.g. query(tw,
"User_timeline", from=as.POSIXct("2011-06-01"),
to=as.POSIXct("2011-12-31"))
Of course, you can define
additional resources (such as "User_timeline") to mine
your data. A list of all defined resources is available
via queries(tw)
.
The datamart package provides basic infrastructure for
the data collection, i.e. the generic methods, and some
examples to prove the concept. The package is inspired by
the
https://bitbucket.org/ScraperWiki/scraperwiki{scraperwiki
project}, which provides a webbased service for data
collection. Also inspiring are
http://reference.wolfram.com/mathematica/ref/CountryData.html{Mathematica's
xxxData functions}, which provide in-built parametrizable
datasets.References
Karsten W., factbased blogspot.
http://factbased.blogspot.com/search/label/datamart