JSoup (http://jsoup.org/) is a powerful Java HTML parsing library with many useful methods for extracting information from a site's HTML. My interest in it arose when I needed a way to obtain the metadata for a client's radio station and print it to a non-editable JTextArea in a Swing layout. Here is the solution I came up with:
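import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
// Inside EVRNMediaPlayer, where metainfo is the non-editable JTextArea:
Document doc;
try {
    // Fetch the station's status page, identifying as a desktop browser
    doc = Jsoup.connect("http://37.187.193.36:8104/index.html")
            .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.120 Safari/535.2")
            .get();
    String title = doc.title();
    Element body = doc.body();
    // Every <b> element in the body; one of them holds the current song
    Elements bTags = body.getElementsByTag("b");
    metainfo.setLineWrap(true);      // word wrapping only applies when line wrap is on
    metainfo.setWrapStyleWord(true);
    for (Element i : bTags) {
        String text = i.text();
        // The song is the only entry with a dash, apart from the Nullsoft notice
        if (text.contains("-") && !text.contains("Nullsoft")) {
            System.out.println(text);
            metainfo.setText("Now Playing: " + "\n" + text);
        }
    }
} catch (IOException ex) {
    Logger.getLogger(EVRNMediaPlayer.class.getName()).log(Level.SEVERE, null, ex);
}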
I knew the exact address from which the information could be obtained, and JSoup accesses it through the connect() method. However, the request only worked once a browser-style user agent string was supplied with userAgent(). Once the title and body of the document had been obtained, it was possible to select the <b> tags from the body, one of which contained the information needed. These were returned as an Elements list, and knowing that the 'current song' entry was the only one in that list containing a dash (apart from a copyright notice by Nullsoft at the end), I set up the enhanced for loop. This iterated through the <b> tags and picked out the entry that contained a dash but not 'Nullsoft', giving me my result. That text was then set on the metainfo JTextArea, and voilà!
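The selection rule described above does not depend on JSoup itself, so it can be pulled out into a plain helper that is easy to check without a network connection. Below is a minimal sketch, assuming the texts of the <b> elements have already been collected into a List<String> (for example by calling text() on each Element); the class NowPlaying and its method findNowPlaying are hypothetical names, not part of JSoup.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper isolating the "current song" rule from the network code.
final class NowPlaying {

    private NowPlaying() {
        // static utility class; not meant to be instantiated
    }

    // Returns the first text that contains a dash but does not mention Nullsoft,
    // or Optional.empty() if nothing matches (for example, if the page layout changes).
    static Optional<String> findNowPlaying(List<String> boldTexts) {
        for (String text : boldTexts) {
            if (text.contains("-") && !text.contains("Nullsoft")) {
                return Optional.of(text);
            }
        }
        return Optional.empty();
    }
}
```

Unlike the original loop, which keeps overwriting the text area and so ends up showing the last match, this version stops at the first match and makes the "nothing found" case explicit, so the caller can choose what to display, e.g. findNowPlaying(texts).orElse("unknown") with a placeholder of your choosing.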
A second example of JSoup's usefulness can be seen in this code:
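import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class StockPrices {
    public static void main(String[] args) throws IOException {
        String[] codes = {"AAPL", "MSFT"};
        String baseUrl = "http://finance.yahoo.com/q?s=";
        String ua = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.33 (KHTML, like Gecko) Chrome/27.0.1438.7 Safari/537.33";
        for (String code : codes) {
            String url = baseUrl + code;
            // Timeout is in milliseconds (10 seconds here)
            Document doc = Jsoup.connect(url).userAgent(ua).timeout(10 * 1000).get();
            // Selectors match Yahoo's page layout at the time of writing; first() returns null if nothing matches
            String price = doc.select(".time_rtq_ticker").first().text();
            String name = doc.select(".title h2").first().text();
            System.out.println(String.format("%s [%s] is trading at %s", name, code, price));
        }
    }
}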
First, the array holds the stock ticker symbols for Apple and Microsoft. A base URL and a user agent string are defined next. As in my first example, an enhanced for loop iterates through the 'codes' array, appending each code to baseUrl in the order in which it appears. Then, as before, JSoup connects to the resulting url (this time with a 10-second timeout, given in milliseconds) and uses CSS selectors in select() to pull out the price and the company name, which are finally combined into a readable line with String.format(). The output for this code was:
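The two steps in that loop that do not touch the network, building the request URL and formatting the result line, can also be separated out and checked on their own. This is a minimal sketch under stated assumptions: the class QuoteFormat and its methods are hypothetical, and URL-encoding the ticker with URLEncoder (the Charset overload needs Java 10 or later) is a precaution the original code does not take. Plain tickers such as AAPL pass through unchanged, while symbols containing characters like ^ get percent-encoded. The format string is the one used above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers mirroring the URL building and output formatting above.
final class QuoteFormat {

    static final String BASE_URL = "http://finance.yahoo.com/q?s=";

    private QuoteFormat() {
        // static utility class; not meant to be instantiated
    }

    // Appends the URL-encoded ticker symbol to the base query URL.
    static String buildUrl(String code) {
        return BASE_URL + URLEncoder.encode(code, StandardCharsets.UTF_8);
    }

    // Produces a line such as "Apple Inc. (AAPL) [AAPL] is trading at 109.55".
    static String formatQuote(String name, String code, String price) {
        return String.format("%s [%s] is trading at %s", name, code, price);
    }
}
```

Inside the loop, the url line could then become String url = QuoteFormat.buildUrl(code); and the println call would print QuoteFormat.formatQuote(name, code, price).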
"Apple Inc. (AAPL) [AAPL] is trading at 109.55
Microsoft Corporation (MSFT) [MSFT] is trading at 45.92"