Webscraping

Python is well suited to web scraping. One common approach pairs the requests package, which downloads a page, with BeautifulSoup (the bs4 package), which parses the HTML so elements can be looked up with CSS selectors.

import requests
import bs4

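# Download the page
result = requests.get('https://en.wikipedia.org/wiki/Jonas_Salk')

# Parse the HTML (the "lxml" parser requires the lxml package to be installed)
soup = bs4.BeautifulSoup(result.text, "lxml")

# select() takes a CSS selector and returns a list of matching elements
print(soup.select('title')[0].getText())
# soup.select('.some_class')  # elements with class "some_class"
# soup.select('#some_id')     # the element with id "some_id"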

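# Images
# Take the first element with class "thumbimage" (raises IndexError if none match)
pic_element = soup.select('.thumbimage')[0]
print(pic_element['src'])

# The src value has no scheme (it starts with '//'), so prepend one before downloading
image_link = requests.get('https:' + pic_element['src'])

# Write image to file; the with block closes the file even if the write fails
with open('my_img.jpg', 'wb') as f:
    f.write(image_link.content)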
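
The selector lookups above can also be tried without any network access. The following is a minimal sketch, not a definitive implementation: the HTML snippet, the page address, the class and id names, and the summarize helper are all made up for illustration. It uses Python's built-in "html.parser" instead of "lxml" so nothing extra needs installing, select_one() to get the first match (or None) rather than indexing into a list, and urljoin from the standard library to turn an image's src into a full URL instead of prepending the scheme by hand.

```python
from urllib.parse import urljoin

import bs4

# Hypothetical HTML standing in for a downloaded page (result.text).
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <p id="intro">First paragraph.</p>
    <p class="note">Second paragraph.</p>
    <p class="note">Third paragraph.</p>
    <img class="thumb" src="//upload.example.org/pic.jpg">
  </body>
</html>
"""

# Assumed address the HTML was fetched from; used to resolve relative links.
PAGE_URL = "https://example.org/wiki/Sample"


def summarize(html, page_url):
    """Collect a few values from the page using CSS selectors."""
    soup = bs4.BeautifulSoup(html, "html.parser")

    # Tag selector: the <title> element
    title = soup.select_one("title").get_text()

    # Id selector: the single element with id="intro"
    intro = soup.select_one("#intro").get_text()

    # Class selector: every element with class="note"
    notes = [p.get_text() for p in soup.select(".note")]

    # select_one() returns None when nothing matches, so check before indexing
    img = soup.select_one("img.thumb")
    image_url = urljoin(page_url, img["src"]) if img is not None else None

    return {"title": title, "intro": intro, "notes": notes, "image_url": image_url}


summary = summarize(SAMPLE_HTML, PAGE_URL)
print(summary)
```

Because urljoin takes the scheme from the page address, the same code handles src values that start with '//', values that start with '/', and values that are already complete URLs.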
