Gym Data Scraping: April 2015

Thursday, 30 April 2015

Earn Money From Price Comparison Through Web Scraping

Many individuals discover the pot of gold just within their reach. They have realized that there is money in the web. Cyber technology has blessed mankind with so many benefits that makes money very possible by just some clicks on the mouse and keyboard. Building a price comparison website is an effective way of helping clients find their desired products while you as the owner earn money at the same time.

Building price comparison websites

There is indeed much money in building price comparison websites but it is not an easy task especially for a novice in maintaining a website of one’s own. Since this entails some serious programming and ample familiarity with data feeds, you have to have a good working plan. In addition, what you are venturing into is greater than the usual blogs about just anything that you can think of. Furthermore, you are stepping into the vast field of electronic marketing, therefore you must be ready.

The first point of consideration is to identify which products or services are you going to include in your website. Choose a product or service that you and a majority of clients are mostly interested in. Suppose you want you to choose sports as your theme then you can include items and prices of sports gear, clothing such as uniforms, training videos, books, and other safety stuff. You need to do some research and even a survey to determine whether the goods and services you are promoting on your website are in demand and are what most people want to know. Moreover, it is on this stage that you may need the help of experts and veterans in the field of building to be assured that you are on the right track.

In addition, be willing to change in case your chosen category is not gaining readership or visitors. Then evaluate whether you need to expand or to be more specific in your description of the products and the comparison of the prices. Make your site prominent by search engine optimization (SEO) and make sure to acknowledge also that not too many people visit a site that is not free.

Helping visitors choose the best product/services

Good marketing strategy starts with knowing who your target audience are. There is indeed a need to do a lot of planning and research in order to understand your client’s needs and preferences. Moreover, knowing them thoroughly leads to achieving 100% consumer satisfaction. When you have provided everything they need to know about certain products, they would not need to seek elsewhere which will also gain you more regular visitors. Remember that your audience are members of communities and social networks such that there is a great possibility that they would spread the word around about the good services you are offering.

If there is a need to conduct a survey in addition to research, you should resort to it. In this manner you can discover what goods and services are not yet completely exhausted by the other websites or web creators. Ample knowledge about your potential visitors and consumers will surely make you effectively provide them with adequate statistics for their needs.

Your site will then look like a complete guidebook for them that will give them the best value for their money. Therefore, it must be thoroughly filled with product details, uses, options, and prices.

Making money as affiliate of eCommerce websites

Maintaining a price comparison website gives you less worry about getting paid or having your products bought and sold because income comes in through advertising and affiliate sales. Affiliate marketing is a way of earning money online by serving as a publisher for promotion of products, services or sites of businesses. The affiliate receives rewards from businesses for each visitor or client that comes to the business website or buys its product through the efforts of the advertising and promotion that is made by the affiliate. This is the online version of the concept of agent or referral fee sales channel. Aside from website owners, bloggers as well as members of community forums can also serve as affiliates. The affiliate earns money in three ways: through pay per link; pay per sale and pay per lead.

Trust in the reliability of the product - You should have a personal belief or confidence in the product you are promoting not only because it makes you sound more convincing, but also because you need to maintain your clients and establish credibility in your blog or website. In other words, don’t just pick any product. If you cannot use them personally, they should at least have several positive reviews and no negative ones.

Maintain credibility with readers and fellow bloggers - Befriend your readers and your co-bloggers by answering their queries sincerely and quickly. Your friendly attitude can win you their trust which is a very vital element of affiliate marketing.

Do reviews - In addition to publishing price comparison, you can gain more visitors by writing about the product and do proper SEO (Search Engine Optimization). So the expected happens, the more prominent the product becomes online, the higher will be your income.

Link with friends thru social media - Your friends have friends and their friends have also friends. Just think of how powerful your social media site can be when you post your link on your account on Facebook, Twitter or MySpace and others. Since trust is built on friendship, it is easy to get clients from among your friends and their friends.

Overall, you get all pertinent information about certain products through web data mining or web scraping. All you need to do is to be keen to the needs of your clients and use web content extraction efficiently.

Source: http://www.loginworks.com/earn-money-price-comparison-web-scraping/

Tuesday, 28 April 2015

A Guide to Web Scraping Tools

Web Scrapers are tools designed to extract / gather data in a website via crawling engine usually made in Java, Python, Ruby and other programming languages.Web Scrapers are also called as Web Data Extractor, Data Harvester , Crawler and so on which most of them are web-based or can be installed in local desktops.

Its main purpose is to enable webmasters, bloggers, journalist and virtual assistants to harvest data from a certain website whether text, numbers, contact details and images in a structured way which cannot be done easily thru manual copy and paste method. Typically, it transforms the unstructured data on the web, from HTML format into a structured data stored in a local database or spreadsheet or automates web human browsing.

Web Scraper Usage

Web Scrapers are also being used by SEO and Online Marketing Analyst to pull out some data privately from the competitor’s website such as high targeted keywords, valuable links, emails & traffic sources that were also perform by SEOClerk, Google and many other web crawling sites.

Includes:

•    Price comparison
•    Weather data monitoring
•    Website change detection
•    Research
•    Web mash up
•    Info graphics
•    Web data integration
•    Web Indexing & rank checking
•    Analyze websites quality links

List of Popular Web Scrapers

There are hundreds of Web Scrapers today available for both commercial and personal use. If you’ve never done any web scraping before, there are basic

Web scraping tools like YahooPipes, Google Web Scrapers and Outwit Firefox extensions that it’s good to start with but if you need something more flexible and has extra functionality then, check out the following:

HarvestMan [ Free Open Source]

HarvestMan is a web crawler application written in the Python programming language. HarvestMan can be used to download files from websites, according to a number of user-specified rules. The latest version of HarvestMan supports as much as 60 plus customization options. HarvestMan is a console (command-line) application. HarvestMan is the only open source, multithreaded web-crawler program written in the Python language. HarvestMan is released under the GNU General Public License.Like Scrapy, HarvestMan is truly flexible however, your first installation would not be easy.

Scraperwiki [Commercial]

Using a minimal programming you will be able to extract anything. Off course, you can also request a private scraper if there’s an exclusive in there you want to protect. In other words, it’s a marketplace for data scraping.

Scraperwiki is a site that encourages programmers, journalists and anyone else to take online information and turn it into legitimate datasets. It’s a great resource for learning how to do your own “real” scrapes using Ruby, Python or PHP. But it’s also a good way to cheat the system a little bit. You can search the existing scrapes to see if your target website has already been done. But there’s another cool feature where you can request new scrapers be built. All in all, a fantastic tool for learning more about scraping and getting the desired results while sharpening your own skills.

Best use: Request help with a scrape, or find a similar scrape to adapt for your purposes.

FiveFilters.org [Commercial]

Is an online web scraper available for commercial use. Provides easy content extraction using Full-Text RSS tool which can identify and extract web content (news articles, blog posts, Wikipedia entries, and more) and return it in an easy to parse format. Advantages; speedy article extraction, Multi-page support, has a Autodetection and you can deploy on the cloud server without database required.

Kimono

Produced by Kimono labs this tool lets you convert data to into apis for automated export.   Benjamin Spiegel did a great Youmoz post on how to build a custom ranking tool with Kimono, well worth checking out!

Mozenda [Commercial]

This is a unique tool for web data extraction or web scarping.Designed for easiest and fastest way of getting data from the web for everyone. It has a point & click interface and with the power of the cloud you can scrape, store, and manage your data all with Mozenda’s incredible back-end hardware. More advance, you can automate your data extraction leaving without a trace using Mozenda’s anonymous proxy feature that could rotate tons of IP’s .

Need that data on a schedule? Every day? Each hour? Mozenda takes the hassle out of automating and publishing extracted data. Tell Mozenda what data you want once, and then get it however frequently you need it. Plus it allows advanced programming using REST API the user can connect directly Mozenda account.

Mozenda’s Data Mining Software is packed full of useful applications especially for sales people. You can do things such as “lead generation, forecasting, acquiring information for establishing budgets, competitor pricing analysis. This software is a great companion for marketing plan & sales plan creating.

Using Refine Capture tetx tool, Mozenda is smart enough to filter the text you want stays clean or get the specific text or split them into pieces.

80Legs [Commercial]

The first time I heard about 80Legs my mind really got confused of what really this software does. 80Legs like Mozenda is a web-based data extraction tool with customizable features:

•    Select which websites to crawl by entering URLs or uploading a seed list
•    Specify what data to extract by using a pre-built extractor or creating your own
•    Run a directed or general web crawler
•    Select how many web pages you want to crawl
•    Choose specific file types to analyze

80 legs offers customized web crawling that lets you get very specific about your crawling parameters, which tell 80legs what web pages you want to crawl and what data to collect from those web pages and also the general web crawling which can collect data like web page content, outgoing links and other data. Large web crawls take advantage of 80legs’ ability to run massively parallel crawls.

Also crawls data feeds and offers web extraction design services. (No installation needed)

ScrapeBox [Commercial]

ScrapeBox are most popular web scraping tools to SEO experts, online marketers and even spammers with its very user-friendly interface you can easily harvest data from a website;

•    Grab Emails
•    Check page rank
•    Checked high value backlinks
•    Export URLS
•    Checked Index
•    Verify working proxies
•    Powerful RSS Submission

Using thousands of rotating proxies you will be able to sneak on the competitor’s site keywords, do research on .gov sites, harvesting data, and commenting without getting blocked.

The latest updates allow the users to spin comments and anchor text to avoid getting detected by search engines.

You can also check out my guide to using Scrapebox for finding guest posting opportunities:

Scrape.it [Commercial]

Using a simple point & click Chrome Extension tool, you can extract data from websites that render in javascript. You can automate filling out forms, extract data from popups, navigate and crawl links across multiple pages, extract images from even the most complex websites with very little learning curve. Schedule jobs to run at regular intervals.

When a website changes layout or your web scraper stops working, scrape.it will fix it automatically so that you can continue to receive data uninterrupted and without the need for you to recreate or edit it yourself.

They work with enterprises using our own tool that we built to deliver fully managed solutions for competitive pricing analysis, business intelligence, market research, lead generation, process automation and compliance & risk management requirements.

Features:

    Very easy web date extraction with Windows like Explorer interface

    Allowing you to extract text, images and files from modern Web 2.0 and HTML5 websites which uses Javascript & AJAX.

    The user could select what features they’re going to pay with

    lifetime upgrade and support at no extra charge on premium license

Scrapy [Free Open Source]

Off course the list would not be cool without Scrapy, it is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

Features:

•         Design with simplicity- Just writes the rules to extract the data from web pages and let Scrapy crawl the entire web site. It can crawl 500 retailers’ sites daily.

•         Ability to attach new code for extensibility without having to touch the framework core

•         Portable, open-source, 100% Python- Scrapy is completely written in Python and runs on Linux, Windows, Mac and BSD

•         Scrapy comes with lots of functionality built in.

•         Scrapy is extensively documented and has an comprehensive test suite with very good code coverage

•         Good community and commercial support

Cons: The installation process is hard to perfect especially for beginners

Needlebase [Commercial]

Many organizations, from private companies to government agencies, store their info in a searchable database that requires you navigate a list page listing results, and a detail page with more information about each result. Grabbing all this information could result in thousands of clicks, but as long as it fits the same formula, Needlebase can do it for you. Point and click on example data from one page once to show Needlebase how your site is structured, and it will use that pattern to extract the information you’re looking for into a dataset. You can query the data through Needle’s site, or you can output it as a CSV or other file format of your choice. Needlebase can also rerun your scraper every day to continuously update your dataset.

OutwitHub [Free]

This Firefox extension is one of the more robust free products that exists Write your own formula to help it find information you’re looking for, or just tell it to download all the PDFs listed on a given page. It will suggest certain pieces of information it can extract easily, but it’s flexible enough for you to be very specific in directing it. The documentation for Outwit is especially well written, they even have a number of tutorials for what you might be looking to do. So if you can’t easily figure out how to accomplish what you want, investing a little time to push it further can go a long way.

Best use: more text

irobotsoft [Free}

This is a free program that is essentially a GUI for web scraping. There’s a pretty steep learning curve to figure out how to work it, and the documentation appears to reference an old version of the software. It’s the latest in a long tradition of tools that lets a user click through the logic of web scraping. Generally, these are a good way to wrap your head around the moving parts of a scrape, but the products have drawbacks of their own that makes them little easier than doing the same thing with scripts.

Cons: The documentation seems outdated

Best use: Slightly complex scrapes involving multiple layers.

iMacros [Free]

The same ethos on how microsoft macros works, iMacros automates repetitive task.Whether you choose the website, Firefox extension, or Internet Explorer add-on flavor of this tool, it can automate navigating through the structure of a website to get to the piece of info you care about. Record your actions once, navigating to a specific page, and entering a search term or username where appropriate. Especially useful for navigating to a specific stock you care about, or campaign contribution data that’s mired deep in an agency website and lacks a unique Web address. Extract that key piece (pieces) of info into a usable form. Can also help convert Web tables into usable data, but OutwitHub is really more suited to that purpose. Helpful video and text tutorials enable you to get up to speed quickly.

Best use: Eliminate repetition in navigating to a particular datapoint in a website that you’re checking up on often by recording a repeatable action that pulls the datapoint out of the clutter it’s naturally surrounded by.

InfoExtractor [Commercial]

This is a neat little web service that generates all sorts of information given a list of urls. Currently, it only works for YouTube video pages, YouTube user profile pages, Wikipedia entries, Huffingtonpost posts, Blogcatalog blog posts and The Heritage Foundation blog (The Foundry). Given a url, the tool will return structured information including title, tags, view count, comments and so on.

Google Web Scraper [Free]

A browser-based web scraper works like Firefox’s Outwit Hub, it’s designed for plain text extraction from any online pages and export to spreadsheets via Google docs. Google Web Scraper can be downloaded as an extension and you can install it in your Chrome browser without seconds. To use it: highlight a part of the webpage you’d like to scrape, right-click and choose “Scrape similar…”. Anything that’s similar to what you highlighted will be rendered in a table ready for export, compatible with Google Docs™. The latest version still had some bugs on spreadsheets.

Cons: It doesn’t work for images and sometimes it can’t perform well on huge volume of text but it’s easy and fast to use.

Tutorials:

Scraping Website Images Manually using Google Inspect Elements

The main purpose of Google Inspect Elements is for debugging like the Firefox Firebug however, if you’re flexible you can use this tool also for harvesting images in a website. Your main goal is to get the specific images like web backgrounds, buttons, banners, header images and product images which is very useful for web designers.

Now, this is a very easy task. First, you will definitely need to download and install the Google Chrome browser in your computer. After the installation do the following:

1. Open the desired webpage in Google Chrome

2. Highlight any part of the website and right click > choose Google Inspect Elements

3. In the Google Inspect Elements, go to Resources tab

4. Under Resources tab, expand all folders. You will eventually see script folders and IMAGES folders

5. In the Images folders, just use arrow keys to find the images you need to have (see the screenshot above)

6. Next, right click the images and choose Open the Image in New Tab

7. Finally, right click the image > choose Save Image As… . (save to your local folder)

You’re done!

How to Extract Links from a Web Page with OutWit Hub

In this tutorial we are going to learn how to extract links from a webpage with OutWit Hub.

Sometimes it can be useful to extract all links from a given web page. OutWit Hub is the easiest way to achieve this goal.

1. Launch OutWit Hub

If you haven’t installed OutWit Hub yet, please refer to the Getting Started with OutWit Hub tutorial.

Begin by launching OutWit Hub from Firefox. Open Firefox then click on the OutWit Button in the toolbar.

If the icon is not visible go to the menu bar and select Tools -> OutWit -> OutWit Hub

OutWit Hub will open displaying the Web page currently loaded on Firefox.

2. Go to the Desired Web Page

In the address bar, type the URL of the Website.

Go to the Page view where you can see the Web page as it would appear in a traditional browser.

Now, select “Links” from the view list.

In the “Links” widget, OutWit Hub displays all the links from the current page.

If you want to export results to Excel, just select all links using ctrl/cmd + A, then copy using ctrl/cmd + C and paste it in Excel (ctrl/cmd + V).

Source: http://www.garethjames.net/a-guide-to-web-scrapping-tools/

Saturday, 25 April 2015

Social Media Crawling & Scraping services for Brand Monitoring

Crawling social media sites for extracting information is a fairly new concept – mainly due to the fact that most of the social media networking sites have cropped up in the last decade or so. But it’s equally (if not more) important to grab this ever-expanding User-Generated-Content (UGC) as this is the data that companies are interested in the most – such as product/service reviews, feedback, complaints, brand monitoring, brand analysis, competitor analysis, overall sentiment towards the brand, and so on.

Scraping social networking sites such as Twitter, Linkedin, Google Plus, Instagram etc. is not an easy task for in-house data acquisition departments of most companies as these sites have complex structures and also restrict the amount and frequency of the data that they let out to crawlers. This kind of a task is best left to an expert, such as PromptCloud’s Social Media Data Acquisition Service – which can take care of your end-to-end requirements and provide you with the desired data in a minimal turnaround time. Most of the popular social networking sites such as Twitter and Facebook let crawlers extract data only through their own API (Application Programming Interface), so as to control the amount of information about their users and their activities.

PromptCloud respects all these restrictions with respect to access to content and frequency of hitting their servers to make sure that user information is not compromised and their experience with the site is unhindered.

Social Media Scraping Experts

At PromptCloud, we have developed an expertise in crawling and scraping social media data in real-time. Such data can be from diverse sources such as – Twitter, Linkedin groups, blogs, news, reviews etc. Popular usage of this data is in brand monitoring, trend watching, sentiment/competitor analysis & customer service, among others.

Our low-latency component can extract data on the basis of specific keywords, categories, geographies, or a combination of these. We can also take care of complexities such as multiple languages as well as tweets and profiles of specific users (based on keywords or geographies). Sample XML data can be accessed through this link – demo.promptcloud.com.

Structured data is delivered via a single REST-based API and every time new content is published, the feed gets updated automatically. We also provide data in any other preferred formats (XML, CSV, XLS etc.).

If you have a social media data acquisition problem that you want to get solved, please do get in touch with us.

Source: https://www.promptcloud.com/social-media-networking-sites-crawling-service/

Tuesday, 21 April 2015

Hand Scraped Flooring For a Natural and Unique Look

An option in hardwood flooring that is being increasingly adopted by those looking for something new, innovative and unique for their homes is hand scraped flooring. This type of wood flooring helps one achieve a distinct natural look on one's floor and also has a couple of advantages.

There are three types of scraping that you can get done on your wooden flooring: light, medium and hard. Preferably, if you have a light colored woodwork, then you should go for light scraping and if your floor has a darker shade, then you should opt for hard scraping. But, irrespective of the type of scraping you go for, you must ensure that the laborers doing the scraping are very skilled and impeccable in their job as hand scraping floors is an art that demands patience, time, talent and hard work.

Nowadays, many people tend to go for machine scraping, attracted by the lower investment involved in it. But such people are unable to achieve the requisite natural effect on their floors as machines create patterns on the floors that are easily detectable. These patterns do not emerge with hand scraping and the consequent look is as random and unique as it gets.

Though such scraped flooring is a costly option in flooring, it demands little maintenance. While with perfectly smooth surfaces, you will be always on the edge ensuring that there are no scratches, with hand scraped floors, you will not have to be concerned about this as any new scratches will only add to the already distressed appearance of the flooring.

Prefinished hand scraped wood flooring is also available in the market nowadays. These eliminate the need of any on-site scraping. But this option is of course unsuitable for those who have already got their floors installed. As it is, if you get on-site scraping done, you will have more control over things as you would be able to see the scraping as it develops and would be therefore in a position to exercise your preferences more.

Source: http://ezinearticles.com/?Hand-Scraped-Flooring-For-a-Natural-and-Unique-Look&id=4581623

Thursday, 9 April 2015

What are the ethics of web scraping?

Someone recently asked: "Is web scraping an ethical concept?" I believe that web scraping is absolutely an ethical concept. Web scraping (or screen scraping) is a mechanism to have a computer read a website. There is absolutely no technical difference between an automated computer viewing a website and a human-driven computer viewing a website. Furthermore, if done correctly, scraping can provide many benefits to all involved.

There are a bunch of great uses for web scraping. First, services like Instapaper, which allow saving content for reading on the go, use screen scraping to save a copy of the website to your phone. Second, services like Mint.com, an app which tells you where and how you are spending your money, uses screen scraping to access your bank's website (all with your permission). This is useful because banks do not provide many ways for programmers to access your financial data, even if you want them to. By getting access to your data, programmers can provide really interesting visualizations and insight into your spending habits, which can help you save money.

That said, web scraping can veer into unethical territory. This can take the form of reading websites much quicker than a human could, which can cause difficulty for the servers to handle it. This can cause degraded performance in the website. Malicious hackers use this tactic in what’s known as a "Denial of Service" attack.

Another aspect of unethical web scraping comes in what you do with that data. Some people will scrape the contents of a website and post it as their own, in effect stealing this content. This is a big no-no for the same reasons that taking someone else's book and putting your name on it is a bad idea. Intellectual property, copyright and trademark laws still apply on the internet and your legal recourse is much the same. People engaging in web scraping should make every effort to comply with the stated terms of service for a website. Even when in compliance with those terms, you should take special care in ensuring your activity doesn't affect other users of a website.

One of the downsides to screen scraping is it can be a brittle process. Minor changes to the backing website can often leave a scraper completely broken. Herein lies the mechanism for prevention: making changes to the structure of the code of your website can wreak havoc on a screen scraper's ability to extract information. Periodically making changes that are invisible to the user but affect the content of the code being returned is the most effective mechanism to thwart screen scrapers. That said, this is only a set-back. Authors of screen scrapers can always update them and, as there is no technical difference between a computer-backed browser and a human-backed browser, there's no way to 100% prevent access.

Going forward, I expect screen scraping to increase. One of the main reasons for screen scraping is that the underlying website doesn't have a way for programmers to get access to the data they want. As the number of programmers (and the need for programmers) increases over time, so too will the need for data sources. It is unreasonable to expect every company to dedicate the resources to build a programmer-friendly access point. Screen scraping puts the onus of data extraction on the programmer, not the company with the data, which can work out well for all involved.

Source: https://quickleft.com/blog/is-web-scraping-ethical/

Tuesday, 7 April 2015

Evest: Easy Web Scraping With R

rvest is new package that makes it easy to scrape (or harvest) data from html web pages, inspired by libraries like beautiful soup. It is designed to work with magrittr so that you can express complex operations as elegant pipelines composed of simple, easily understood pieces. Install it with:

install.packages("rvest")

rvest in action

To see rvest in action, imagine we’d like to scrape some information about The Lego Movie from IMDB. We start by downloading and parsing the file with html():

library(rvest)

lego_movie <- html("http://www.imdb.com/title/tt1490017/")

To extract the rating, we start with selectorgadget to figure out which css selector matches the data we want: strong span. (If you haven’t heard of selectorgadget, make sure to read vignette("selectorgadget") – it’s the easiest way to determine which selector extracts the data that you’re interested in.) We use html_node() to find the first node that matches that selector, extract its contents with html_text(), and convert it to numeric with as.numeric():

lego_movie %>%

html_node("strong span") %>%

html_text() %>%

as.numeric()

#> [1] 7.9

We use a similar process to extract the cast, using html_nodes() to find all nodes that match the selector:

lego_movie %>%

html_nodes("#titleCast .itemprop span") %>%

html_text()

#> [1] "Will Arnett"     "Elizabeth Banks" "Craig Berry"

#> [4] "Alison Brie"     "David Burrows"   "Anthony Daniels"

#> [7] "Charlie Day"     "Amanda Farinos" "Keith Ferguson"

#> [10] "Will Ferrell"    "Will Forte"      "Dave Franco"

#> [13] "Morgan Freeman" "Todd Hansen"     "Jonah Hill"

The titles and authors of recent message board postings are stored in a the third table on the page. We can use html_node() and [[ to find it, then coerce it to a data frame with html_table():

lego_movie %>%

html_nodes("table") %>%

.[[3]] %>%

html_table()

#>                                              X 1            NA

#> 1 this movie is very very deep and philosophical   mrdoctor524

#> 2 This got an 8.0 and Wizard of Oz got an 8.1... marr-justinm

#> 3                         Discouraging Building?       Laestig

#> 4                              LEGO - the plural      neil-476

#> 5                                 Academy Awards   browncoatjw

#> 6                    what was the funniest part? actionjacksin

Other important functions

•    If you prefer, you can use xpath selectors instead of css: html_nodes(doc, xpath = "//table//td")).

•    Extract the tag names with html_tag(), text with html_text(), a single attribute with html_attr() or all attributes with html_attrs().

•    Detect and repair text encoding problems with guess_encoding() and repair_encoding().

•    Navigate around a website as if you’re in a browser with html_session(), jump_to(), follow_link(), back(), and forward(). Extract, modify and submit forms with html_form(), set_values() and submit_form(). (This is still a work in progress, so I’d love your feedback.)

To see these functions in action, check out package demos with demo(package = "rvest").

Source: http://blog.rstudio.org/2014/11/24/rvest-easy-web-scraping-with-r/