harvestPage(user, parseFun = parsePost, results = 1, nextToken = NULL, cr = 1)
parseFun: the function applied to each retrieved post, such as the
parsePost function. If set to NULL, the raw list of the retrieved
posts is returned. Defaults to parsePost.

The function returns either the raw list of posts, as accepted by
parsePost, or a data frame with that (or another user-supplied)
function already applied.

When writing your own parsing functions, make sure that the
function takes a single list item from the raw list as its
argument and returns a vector of values or a one-row data
frame. The return values of the function are then fed into
plyr's ldply to turn them into a data frame.
See parsePost for an example.
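
As a rough illustration of the contract described above, here is a minimal sketch of a custom parsing function. The list `rawPosts` and its field names (`id`, `title`) are made up for the example and are not taken from the actual API response; `parseTitle` is a hypothetical name. The sketch uses plyr's ldply when plyr is installed and otherwise falls back to base R, which is only an approximation of what the package does internally.

```r
# Hypothetical raw list standing in for what is returned when parseFun = NULL.
# The field names (id, title) are assumptions made for this illustration.
rawPosts <- list(
  list(id = "a1", title = "First post"),
  list(id = "b2", title = "Second, longer post")
)

# A custom parsing function: takes ONE list item, returns a one-row data frame.
parseTitle <- function(post) {
  data.frame(
    id = post$id,
    title = post$title,
    titleLength = nchar(post$title),
    stringsAsFactors = FALSE
  )
}

# Combine the per-post rows into one data frame. plyr::ldply is what the text
# names; the base R branch is a fallback in case plyr is not installed.
titles.df <- if (requireNamespace("plyr", quietly = TRUE)) {
  plyr::ldply(rawPosts, parseTitle)
} else {
  do.call(rbind, lapply(rawPosts, parseTitle))
}
```

A function written this way could then presumably be passed along as `harvestPage("+google", parseFun = parseTitle)`, provided its field accesses match the real structure of a raw post.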
The length of the list or the number of rows of the data
frame is somewhat ambiguous. Specifying the results
argument will try to get that many results. But there may
be fewer (because Google could not find more) or more
(because Google organizes results in pages and it would
be a waste to discard them automatically). If you really
depend on getting no more rows than you expected, use
standard selection (i.e. [) to trim the results.
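
The trimming step mentioned above might look like the following sketch. The data frame `posts.df` and the list `rawPosts` are stand-ins built locally rather than real harvestPage output, and `wanted` is a hypothetical variable holding the number of results originally requested.

```r
# Stand-in for a data frame returned by harvestPage(): 120 rows came back
# although only 100 were requested.
posts.df <- data.frame(id = sprintf("post%03d", 1:120), stringsAsFactors = FALSE)
wanted <- 100

# Keep at most `wanted` rows. seq_len(min(...)) also behaves correctly when
# fewer rows than requested were returned.
trimmed.df <- posts.df[seq_len(min(wanted, nrow(posts.df))), , drop = FALSE]

# The same idea applied to a raw list (the parseFun = NULL case).
rawPosts <- as.list(1:120)
trimmedList <- rawPosts[seq_len(min(wanted, length(rawPosts)))]
```

Using `seq_len(min(...))` rather than `1:wanted` avoids introducing NA rows when the result is shorter than requested.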
## Not run:
myPosts.df <- harvestPage("115046504166916768425")
gPosts.df <- harvestPage("+google", results = 200)
## End(Not run)