WP Scraper Pro Documentation

2018-11-11T10:07:52+00:00

Installation

Uploading via WordPress Dashboard

  1. Navigate to ‘Add New’ in the Plugins dashboard
  2. Navigate to the ‘Upload’ area
  3. Select wp-scraper-pro.zip from your computer
  4. Click ‘Install Now’
  5. Activate the plugin in the Plugins dashboard

Using FTP

  1. Download wp-scraper-pro.zip
  2. Extract wp-scraper-pro.zip to your computer
  3. Upload the extracted wp-scraper-pro directory to the /wp-content/plugins/ directory
  4. Activate the plugin in the Plugins dashboard

Generating URLs

URL Visual Selector
The visual selector works in much the same way as the Content Selector. The difference is that you only need to select one link to the pages you wish to scrape into your site. The visual selector then imports all similar links into a list for you to use with the multiple scraper.

PHP Crawler

Domain Pattern:

Only follow links with the same url. – Follows links on www.example.com and on subdomains such as sub.example.com

Only follow links with the same domain. – Follows links on www.example.com but not on sub.example.com

Only follow links in the same path as the given url. – If the url is www.example.com/path/index.html, only urls under www.example.com/path/ are followed
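
As a rough illustration of the three Domain Pattern options, the Python sketch below decides whether a link would be followed. It is not the plugin's code: the function names, the mode names (include_subdomains, same_host, same_path) and the naive "last two labels" domain guess are assumptions made for this example.

```python
from urllib.parse import urlparse


def registered_domain(host):
    """Naive guess: the last two labels of the host.

    Assumption for this sketch only; it is wrong for suffixes such as .co.uk.
    """
    return ".".join(host.split(".")[-2:])


def should_follow(start_url, link, mode):
    """Return True if `link` may be followed when crawling from `start_url`.

    `mode` is a hypothetical name for one of the three Domain Pattern options.
    Both URLs need a scheme (http:// or https://) so the host can be parsed.
    """
    start, target = urlparse(start_url), urlparse(link)
    start_host = start.hostname or ""
    target_host = target.hostname or ""

    if mode == "include_subdomains":  # www.example.com and sub.example.com
        return registered_domain(start_host) == registered_domain(target_host)
    if mode == "same_host":  # www.example.com, not sub.example.com
        return start_host == target_host
    if mode == "same_path":  # only links under the start URL's directory
        base_dir = start.path.rsplit("/", 1)[0] + "/"
        return start_host == target_host and target.path.startswith(base_dir)
    raise ValueError(f"unknown mode: {mode}")


START = "http://www.example.com/path/index.html"
```

For example, should_follow(START, "http://sub.example.com/page", "include_subdomains") is True, while the same link is rejected under "same_host".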

Number of Pages:
Allowed Options: 10, 25, 50, 75 and 100 – This sets the number of web pages to pull from the URL.

Skip Links:
Optionally skip a certain number of links. This is useful if you have already scraped a number of links from a website and now want to scrape more pages. For example, if you have already created posts from the first 10 links from this URL and now want to grab the next 10 links, enter 10 in the Skip Links box.
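
The way Number of Pages and Skip Links work together can be pictured as taking a window out of the list of links the crawler found. The sketch below is only an illustration under that assumption; select_links and its parameter names are hypothetical and not part of the plugin.

```python
ALLOWED_PAGE_COUNTS = (10, 25, 50, 75, 100)  # the options listed under Number of Pages


def select_links(links, number_of_pages=10, skip_links=0):
    """Return the window of links left after skipping `skip_links` entries."""
    if number_of_pages not in ALLOWED_PAGE_COUNTS:
        raise ValueError(f"number_of_pages must be one of {ALLOWED_PAGE_COUNTS}")
    if skip_links < 0:
        raise ValueError("skip_links must not be negative")
    return links[skip_links:skip_links + number_of_pages]


# Thirty made-up links standing in for what a crawl might return.
found = [f"http://www.example.com/page-{n}" for n in range(1, 31)]

first_batch = select_links(found, number_of_pages=10)                # page-1 .. page-10
next_batch = select_links(found, number_of_pages=10, skip_links=10)  # page-11 .. page-20
```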

Depth Limit:
Optionally set the depth limit for crawling pages. If this value is set to 1, the crawler only gathers web pages that are linked on the entry page. If it is set to 2, it also gathers all web pages linked from the pages found on the entry page.
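
The effect of the depth limit can be sketched as a breadth-first walk that stops expanding pages once they are depth_limit link hops from the entry page. This is a minimal illustration, not the plugin's crawler: the crawl function, the links_on callback and the small SITE link map are all made up for the example.

```python
from collections import deque


def crawl(entry, links_on, depth_limit):
    """Collect pages reachable from `entry` within `depth_limit` link hops.

    `links_on(page)` is a hypothetical callback returning the links on a page.
    """
    seen = {entry}
    found = []
    queue = deque([(entry, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == depth_limit:
            continue  # page is at the limit, so its links are not followed
        for link in links_on(page):
            if link not in seen:
                seen.add(link)
                found.append(link)
                queue.append((link, depth + 1))
    return found


# A made-up site: each key is a page, each value the links found on it.
SITE = {
    "entry": ["a", "b"],
    "a": ["c"],
    "b": ["entry", "d"],
    "c": ["e"],
}


def links_on(page):
    return SITE.get(page, [])
```

With this link map, a depth limit of 1 gathers only a and b, while a depth limit of 2 also gathers c and d but not e.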

Request Delay: Seconds
Optionally delay each request to the URL by a number of seconds. This keeps your site from sending too many requests to the source site in a short time.
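
In outline, a request delay means pausing between consecutive requests. The sketch below assumes the pause happens before every request except the first; fetch_with_delay, the fetch callback and the injectable sleep parameter are hypothetical names used only for illustration.

```python
import time


def fetch_with_delay(urls, fetch, delay_seconds, sleep=time.sleep):
    """Call `fetch` on each URL in order, pausing between requests.

    `sleep` can be replaced (for example in tests) to avoid real waiting.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0 and delay_seconds > 0:
            sleep(delay_seconds)  # wait before every request after the first
        results.append(fetch(url))
    return results
```

Passing the sleep function as a parameter is a design choice for the example: it lets the pacing logic be checked without actually waiting.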

Path Matching: Contains – Ends with
Optionally add a word to match within URLs.
For example, choosing “contains foo” above would add only web pages that have “foo” in the path, such as example.com/foo or example.com/path/this-page-has-foo.
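
The two Path Matching modes can be sketched as simple string tests on the path part of each URL. This is an assumption about the behaviour based on the description above; path_matches and the mode names contains and ends_with are hypothetical.

```python
from urllib.parse import urlparse


def path_matches(url, word, mode):
    """Return True if the URL's path contains or ends with `word`.

    The URL needs a scheme (http:// or https://) so that the host name is
    not mistaken for part of the path.
    """
    path = urlparse(url).path
    if mode == "contains":
        return word in path
    if mode == "ends_with":
        return path.endswith(word)
    raise ValueError(f"unknown mode: {mode}")


urls = [
    "http://example.com/foo",
    "http://example.com/path/this-page-has-foo",
    "http://example.com/foo/bar",
    "http://foo.example.com/bar",  # "foo" is in the host, not the path
]
contains_foo = [u for u in urls if path_matches(u, "foo", "contains")]
ends_with_foo = [u for u in urls if path_matches(u, "foo", "ends_with")]
```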

Note: The list of URLs will vary in quantity and accuracy depending on the site you are retrieving them from.

Once you have generated a list, every URL listed in the box below will be used to generate content for your site. Remove any generated URLs that you do not want to pull content from.

If the PHP crawler and the visual selector do not meet your needs, we recommend trying Screaming Frog SEO Spider (free version):
https://www.screamingfrog.co.uk/seo-spider/

Content Selection

Highlighting And Selection
You may select as much content as you wish by simply clicking to select the blocks of content you want. As your mouse hovers over the page, a blue box will appear to illustrate what content you will get. If there is an area within the blue box that you do not wish to include, simply click it again. A red box will appear inside the blue box to illustrate content that will be excluded.

Add selected content
Click the ‘add selected content to my post’ button at the top of the window, and the content will be added to the WP Scraper post editor.

How much should I select?
Depending on server resources, adding content to your post may take anywhere from a few seconds to a few minutes. The more content you import into your post, the longer scraping will take. If it takes too long to scrape, you could try increasing the “Time Delay Between Scrapes” under Extract Options.

Single Scrape

*URL
Enter the URL you wish to copy content from.

*Title
You may select a title from the source page or add your own.

*Post Content
You may select multiple areas of the source page including images.

Post Type
Post Type: Post, Page – Status: Published, Draft, Pending Review

Options
Only Text and Images – When checked, removes all HTML elements except p, div, table, list, break, headings, span, and images. CSS is not included with this option, and links and videos are removed automatically.
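
To give a feel for what an allowlist filter of this kind does, here is a minimal Python sketch. It is not how the plugin is implemented. The exact tag list, the choice to keep the text of removed links, and the choice to drop the contents of script, style, video and iframe elements are all assumptions made for the example.

```python
from html import escape
from html.parser import HTMLParser

# Assumed mapping of "p, div, table, list, break, headings, span, images" to tags.
ALLOWED = {"p", "div", "span", "br", "img", "ul", "ol", "li",
           "table", "thead", "tbody", "tr", "th", "td"}
ALLOWED |= {f"h{n}" for n in range(1, 7)}
VOID = {"br", "img"}                                   # tags without a closing tag
DROP_CONTENT = {"script", "style", "video", "iframe"}  # removed with their contents


class TextAndImages(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED:
            # Keep only src/alt on images; style and class attributes are dropped.
            kept = [(k, v) for k, v in attrs
                    if tag == "img" and k in ("src", "alt") and v is not None]
            attr_text = "".join(f' {k}="{escape(v)}"' for k, v in kept)
            self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in ALLOWED and tag not in VOID:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))


def only_text_and_images(html):
    parser = TextAndImages()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.out)


sample = ('<p style="color:red">Hi <a href="http://x.test">link</a></p>'
          '<script>alert(1)</script><img src="a.png" class="c">')
cleaned = only_text_and_images(sample)
```

In this sketch the anchor tag is removed but its visible text is kept; whether the plugin keeps or discards link text is not stated in this documentation.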

Remove Links – When checked, removes all external links from the content.

Add source link to the content – When checked, adds a link to the source page to the content.

Categories
Select a category or create a new one.

Tags
Select tags from the source page or add your own.

Featured Image
Select an image from the source page or add your own.

* Required

Multiple Scrape 

*Titles
Select a title from the source page or add your own.

*Post Content
You may select multiple areas of the source page including images.

Post Type
Post Type: Post, Page – Status: Published, Draft, Pending Review

Options – Include Images, Format Tables, Remove Links, Add source link to the content

HTML Options – Strip all HTML, Include Post HTML, Include Basic HTML, or you can specify exactly which HTML to keep in the content

*Categories
Select a category or create a new one.

*Tags
Select tags from the source page or add your own.

*Featured Image
Select an image from the source page or add your own.

Extract Options

Load JavaScript
Some content may need JavaScript enabled to display correctly. Check this box to enable JavaScript while selecting content.

Load Restricted Image Content
Some images will not load due to cross-domain restrictions. Use this feature to load these restricted images. However, it does not work with all server configurations. Use with caution.

Time Delay Between Scrapes
This option scrapes one page at a time, waiting for the chosen delay between posts. This helps manage your server resources. Choices include: None, Ten Seconds, Thirty Seconds, or One Minute.

*If you type your own content into the multiple scraper fields, that content will be repeated in every post. If you choose content from the source page for any of these fields, the scraper will find and add the matching content for each post.

WP Scraper Results

The results of the scraper are shown in real time as posts are created. Please remain on this page until all of your posts are created; leaving it will interrupt the scraping process. Each post is displayed as soon as it is made. You can view or edit any of the new posts from this screen by clicking the provided links. When scraping is complete, the progress bar is removed and a message confirms that the process has finished. After that, you are free to navigate away from the page.

Errors When Scraping

There are usually only two reasons that a page fails to scrape. The first is that your PHP memory limit is too small to handle the scrape; you can raise memory_limit in your php.ini settings. The other is that the selector is not exactly the same on all pages. If this is the case, simply rescrape the remaining pages with new selectors.

If you consistently receive multiple errors, try increasing the “Time Delay Between Scrapes” under Extract Options.

WP Scraper Pro will supply you with a list of the URLs that failed so you can try again.

WP Scraper Pro and WP Live Scraper are intended solely for copying content that is in the public domain or otherwise not protected by any copyright laws in any country.

Please obey the copyright laws of the country you are copying content from. WP Scraper Pro does not assume any legal responsibility or liability for the consequences of copying content that is protected under any copyright law of any country.

For more information about copyright laws please visit http://www.copyright.gov/.