Hi, I am back again with some new tips. I hope you like them and find them useful. You may already know the term "Web Scraping"; if not, there is no need to worry. I will first explain in detail what web scraping is and why we need it, and then I will explain how you can do web scraping in iOS.
What is Web Scraping?
In simple words, we can define it as follows: "Web scraping (web harvesting or web data extraction) is data scraping used for extracting data from websites." Web scraping a web page involves fetching it and extracting data from it. Fetching is the downloading of a page (which a browser does when you view the page). Therefore, web crawling is a main component of web scraping, to fetch pages for later processing. Once a page is fetched, extraction can take place: the content of the page may be parsed, searched, or reformatted, its data copied into a spreadsheet, and so on. Web scrapers typically take something out of a page to use it for another purpose somewhere else. An example would be finding and copying names and phone numbers, or companies and their URLs, to a list (contact scraping).
Web scraping is used for contact scraping, and as a component of applications used for web indexing, web mining and data mining, online price change monitoring and price comparison, product review scraping (to watch the competition), gathering real estate listings, weather data monitoring, website change detection, research, tracking online presence and reputation, web mashups, and web data integration.
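The fetch-then-extract process described above can be sketched in a few lines of Foundation code. NSURLSession does the fetching, and a regular expression does the extracting (here, only the page title). This is a minimal command-line sketch, not a scraper for real pages. The URL https://example.com/ and the helper name ExtractTitle are placeholders of my own. A regular expression is only adequate for a tiny, predictable pattern like this; the parser-based approaches described below are more robust.

```objc
#import <Foundation/Foundation.h>

// Extraction step: return the trimmed text inside the first <title>...</title>,
// or nil if there is none. (ExtractTitle is a hypothetical helper name.)
static NSString *ExtractTitle(NSString *html) {
    if (html == nil) {
        return nil;
    }
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<title[^>]*>(.*?)</title>"
                                                  options:NSRegularExpressionCaseInsensitive |
                                                          NSRegularExpressionDotMatchesLineSeparators
                                                    error:NULL];
    NSTextCheckingResult *match =
        [regex firstMatchInString:html options:0 range:NSMakeRange(0, html.length)];
    if (match == nil) {
        return nil;
    }
    NSString *title = [html substringWithRange:[match rangeAtIndex:1]];
    return [title stringByTrimmingCharactersInSet:
                      [NSCharacterSet whitespaceAndNewlineCharacterSet]];
}

int main(void) {
    @autoreleasepool {
        // Fetching step: download the raw HTML (example.com is a placeholder URL).
        NSURL *url = [NSURL URLWithString:@"https://example.com/"];
        dispatch_semaphore_t done = dispatch_semaphore_create(0);
        NSURLSessionDataTask *task = [[NSURLSession sharedSession]
              dataTaskWithURL:url
            completionHandler:^(NSData *data, NSURLResponse *response, NSError *error) {
                if (data != nil) {
                    NSString *html = [[NSString alloc] initWithData:data
                                                           encoding:NSUTF8StringEncoding];
                    NSLog(@"Title: %@", ExtractTitle(html));
                } else {
                    NSLog(@"Fetch failed: %@", error);
                }
                dispatch_semaphore_signal(done);
            }];
        [task resume];
        // Keep this command-line demo alive until the download finishes.
        dispatch_semaphore_wait(done, DISPATCH_TIME_FOREVER);
    }
    return 0;
}
```

In an app you would not block a thread with a semaphore. Instead, you would handle the result inside the completion handler and hop back to the main queue before touching the UI.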
How can we do Web Scraping in iOS?
There are several ways to do web scraping. First, you can use libxml2 (libxml2.dylib), the open-source XML/HTML parsing library that Apple includes in the iOS SDK. I will not explain this method in detail, but I will provide a link where you can learn how to use this library. You can click here to go directly to that link. The linked tutorial first explains web scraping in detail; if you already know about it, you can skip that part, but I recommend reading it. Next, it explains how to include the library in an Xcode project, along with the code that can be used for web scraping.
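For a taste of what the libxml2 route looks like, here is a minimal sketch (not the linked tutorial's code). It parses an HTML string and returns the text of every node matched by an XPath query. It assumes libxml2 is linked and its headers are on the header search path (typically $(SDKROOT)/usr/include/libxml2 in Xcode). The helper name TextForXPath is hypothetical.

```objc
#import <Foundation/Foundation.h>
#include <libxml/HTMLparser.h>
#include <libxml/xpath.h>

// Parse an HTML string with libxml2 and return the text content of every
// node matched by an XPath query such as @"//td". (Hypothetical helper.)
static NSArray<NSString *> *TextForXPath(NSString *html, NSString *query) {
    NSMutableArray<NSString *> *results = [NSMutableArray array];
    NSData *data = [html dataUsingEncoding:NSUTF8StringEncoding];

    // Lenient HTML parsing; silence warnings and errors about sloppy markup.
    htmlDocPtr doc = htmlReadMemory((const char *)data.bytes, (int)data.length, NULL, "UTF-8",
                                    HTML_PARSE_RECOVER | HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING);
    if (doc == NULL) {
        return results;
    }
    xmlXPathContextPtr context = xmlXPathNewContext(doc);
    xmlXPathObjectPtr xpathResult =
        xmlXPathEvalExpression((const xmlChar *)query.UTF8String, context);

    if (xpathResult != NULL && xpathResult->nodesetval != NULL) {
        xmlNodeSetPtr nodes = xpathResult->nodesetval;
        for (int i = 0; i < nodes->nodeNr; i++) {
            // xmlNodeGetContent returns a newly allocated UTF-8 string.
            xmlChar *text = xmlNodeGetContent(nodes->nodeTab[i]);
            if (text != NULL) {
                [results addObject:[NSString stringWithUTF8String:(const char *)text]];
                xmlFree(text);
            }
        }
    }
    // libxml2 objects are plain C allocations, so free them explicitly.
    if (xpathResult != NULL) {
        xmlXPathFreeObject(xpathResult);
    }
    if (context != NULL) {
        xmlXPathFreeContext(context);
    }
    xmlFreeDoc(doc);
    return results;
}

int main(void) {
    @autoreleasepool {
        NSString *html = @"<table class='table table-bordered'>"
                          "<tr><td>Apple</td><td>10</td></tr></table>";
        NSLog(@"%@", TextForXPath(html, @"//td"));
    }
    return 0;
}
```

XPath lets you target elements by structure as well as by class. For example, the query //table[contains(@class,'table-bordered')]//td selects only the cells of bordered tables.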
The second way of web scraping is to use third-party libraries. These are open-source libraries: you only have to include them in your project and start using them. I will provide links to the libraries that I have used.
If you are coding in Objective-C, click here; if you are coding in Swift, click here.
You might be thinking that I am only providing links here without explaining anything, but that's not the case. I will now explain the easiest way to do web scraping, without using any libraries. So, are you ready? Let's begin!
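The third way of web scraping is, as I said before, very simple. First, add a UIWebView to your storyboard and create an IBOutlet for it (the code below calls it myWebView). I will explain the rest with code, written in Objective-C. Note that Apple deprecated UIWebView in iOS 12 in favor of WKWebView and no longer accepts App Store submissions that use UIWebView. WKWebView supports the same idea through its evaluateJavaScript:completionHandler: method.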
Create a method, as shown below, that accepts a URL as a string and loads it into the web view:
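- (void)loadRequestFromString:(NSString*)urlString
{
    // URLWithString: returns nil if the string is not a valid URL
    NSURL *url = [NSURL URLWithString:urlString];
    if (url == nil) {
        return;
    }
    NSURLRequest *urlRequest = [NSURLRequest requestWithURL:url];
    // Start loading the page into the web view outlet
    [self.myWebView loadRequest:urlRequest];
}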
Now call this method in viewDidLoad like this:
[self loadRequestFromString:pageURL];
where pageURL is an NSString holding the URL of the page you want to scrape. Next, add a button to the storyboard; tapping it will perform the scraping. Create an IBAction as shown below and connect the button to it.
- (IBAction)getTableBtnPressed:(id)sender
{
}
Here I will show you how to scrape a table out of a web page. Now implement the IBAction as shown below:
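- (IBAction)getTableBtnPressed:(id)sender
{
    // Run JavaScript in the loaded page: inner HTML of the first element with classes "table" and "table-bordered"
    NSString* string = [_myWebView stringByEvaluatingJavaScriptFromString: @"document.getElementsByClassName('table table-bordered')[0].innerHTML;"];
    // Wrap the fragment in a minimal HTML document
    NSString *htmlString = [NSString stringWithFormat:@"%@%@%@",@"<html><body><table>",string,@"</table></body></html>"];
    // Replace the current page with the extracted table, turning newlines into <br/> tags
    [_myWebView loadHTMLString:[htmlString stringByReplacingOccurrencesOfString:@"\n" withString:@"<br/>"] baseURL:nil];
}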
You might be confused about what is happening in the above code, but don't worry: I will explain it line by line.
NSString* string = [_myWebView stringByEvaluatingJavaScriptFromString: @"document.getElementsByClassName('table table-bordered')[0].innerHTML;"];
Here, _myWebView is the instance variable behind the myWebView outlet. We call the UIWebView method stringByEvaluatingJavaScriptFromString:, which runs a JavaScript string inside the loaded page and returns the result as a string. The script @"document.getElementsByClassName('table table-bordered')[0].innerHTML;" finds the elements whose class attribute contains both table and table-bordered. It then takes the first one and returns the HTML inside it (its rows and cells). To find the class name of the element you want, load the web page in Google Chrome. After the page has loaded, right-click on it and click Inspect. A window will open that contains the elements of the web page, as shown below.
To get the class of the item you want to scrape, click the arrow (element selection) button at the top of that window, then hover over the item on the page. Below, I show the class of the table that I am scraping.
Now that you know how to get the class of a particular element, let's get back to the code.
If several elements share the same class name, getElementsByClassName returns all of them in an array-like collection (an HTMLCollection). For example, if the page has 2 tables with the same class name, you will get both table elements in the collection. It's up to you whether you want the first table, the second table, or both. The subscript selects a single element: write [0] for the first table, [1] for the second, and so on. Simply removing [0] does not give you both tables, because innerHTML belongs to a single element, not to the collection. To get both, you have to go through the collection and join the HTML of each element. In the above example, I am only retrieving the first table.
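To make the indexing concrete, here is a small sketch that builds the script string for either a single match or all matches. The helper name TableScript, and the use of outerHTML for the all-tables case, are my own assumptions for illustration. The JavaScript relies only on getElementsByClassName, innerHTML/outerHTML, and Array.prototype.map. It also assumes the class name contains no quote characters, because the name is pasted into a single-quoted JavaScript string.

```objc
#import <Foundation/Foundation.h>

// Build the JavaScript to pass to stringByEvaluatingJavaScriptFromString:.
// index >= 0: inner HTML of that one match ([0] = first table, [1] = second, ...).
// index <  0: outer HTML (including the <table> tags) of every match, joined together.
// (TableScript is a hypothetical helper name.)
static NSString *TableScript(NSString *className, NSInteger index) {
    if (index >= 0) {
        return [NSString stringWithFormat:
            @"document.getElementsByClassName('%@')[%ld].innerHTML;",
            className, (long)index];
    }
    // An HTMLCollection has no innerHTML of its own, so visit each element.
    return [NSString stringWithFormat:
        @"Array.prototype.map.call(document.getElementsByClassName('%@'),"
        @" function (el) { return el.outerHTML; }).join('');",
        className];
}

int main(void) {
    @autoreleasepool {
        NSLog(@"%@", TableScript(@"table table-bordered", 1));
        NSLog(@"%@", TableScript(@"table table-bordered", -1));
        // Inside the view controller this would be used as, for example:
        // NSString *allTables = [_myWebView stringByEvaluatingJavaScriptFromString:
        //                           TableScript(@"table table-bordered", -1)];
    }
    return 0;
}
```

Because outerHTML already includes each <table> tag, the all-tables result should be wrapped only in html and body tags, not in an extra table element.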
NSString *htmlString = [NSString stringWithFormat:@"%@%@%@",@"<html><body><table>",string,@"</table></body></html>"];
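This line is very simple: it turns the scraped fragment into a complete HTML document by placing opening and closing html, body, and table tags around it. The table tags are needed because innerHTML returns only what is inside the table element, not the <table> tag itself.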
[_myWebView loadHTMLString:[htmlString stringByReplacingOccurrencesOfString:@"\n" withString:@"<br/>"] baseURL:nil];
Finally, load that HTML string into the web view, after replacing every newline character with a <br/> tag.
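To see exactly what gets loaded into the web view, the last two lines of getTableBtnPressed: can be pulled into a pure function. This is a sketch: the name TableDocument is hypothetical. It uses a single format string instead of the three %@ placeholders, which produces the same result.

```objc
#import <Foundation/Foundation.h>

// Same two steps as getTableBtnPressed:: wrap the table's inner HTML in a
// minimal document, then turn every newline into a <br/> tag.
// (TableDocument is a hypothetical helper name.)
static NSString *TableDocument(NSString *innerHTML) {
    NSString *htmlString =
        [NSString stringWithFormat:@"<html><body><table>%@</table></body></html>", innerHTML];
    return [htmlString stringByReplacingOccurrencesOfString:@"\n" withString:@"<br/>"];
}

int main(void) {
    @autoreleasepool {
        NSLog(@"%@", TableDocument(@"<tr><td>A</td></tr>\n<tr><td>B</td></tr>"));
    }
    return 0;
}
```

For the input in main, the result is <html><body><table><tr><td>A</td></tr><br/><tr><td>B</td></tr></table></body></html>. The <br/> lands between table rows, and HTML parsers typically move such stray elements out of the table. As a result, this step may show up as blank lines above the table rather than inside it.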
Now run the application, wait for the page to finish loading, and tap the button. You will see that you have successfully scraped the table out of the web page.
Below is a screenshot of the table that I have scraped.