Complete the Web scraping in R course on DataCamp. Read the selectorgadget vignette.
IMPORTANT: The lack of an open API on a website may indicate that the host is reluctant to share data. Respect this by not republishing their data without consent; see katalogskyddet (the Swedish catalogue protection) for the legal rights involved.
Note that there is also an R package, robotstxt, which uses an informal standard (the site's robots.txt file) for specifying to what degree scraping etc. is allowed on a website. Check the package vignette for further information.
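As a minimal sketch of how such a check might look, the example below feeds a made-up robots.txt to the package as text, so no request is sent. The rules and the paths are assumptions for illustration only and do not describe any real site.

```r
library(robotstxt)

# Hypothetical robots.txt content, supplied directly as text
rules <- "User-agent: *\nDisallow: /private/\n"

# Build a robotstxt object from the text instead of downloading it
rt <- robotstxt(text = rules)

# Ask whether any bot ("*") may fetch each of the two example paths
allowed <- rt$check(paths = c("/", "/private/data"), bot = "*")
allowed
```

For a live site, paths_allowed() with a domain argument downloads and checks the file in one step.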
Scrape title, author, rating, price, … on books listed at pockettoppen (some may be extracted with html_text, others with html_attr). Make use of selectorgadget to learn about the structure of the webpage!
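The difference between html_text and html_attr can be sketched on a small inline page. The markup, class names and values below are invented for illustration (the real page will differ, which is what selectorgadget helps with), and html_elements assumes rvest 1.0 or later.

```r
library(rvest)

# Made-up markup standing in for two entries of a book list
page <- minimal_html('
  <ul>
    <li class="book">
      <a class="title" href="/books/1"> First Book </a>
      <img class="rating" alt="4.5" src="stars.png">
    </li>
    <li class="book">
      <a class="title" href="/books/2"> Second Book </a>
      <img class="rating" alt="3" src="stars.png">
    </li>
  </ul>')

books <- html_elements(page, "li.book")

# Visible text between the tags: html_text
titles <- books %>% html_element("a.title") %>% html_text(trim = TRUE)

# Values stored in tag attributes: html_attr
links   <- books %>% html_element("a.title") %>% html_attr("href")
ratings <- books %>% html_element("img.rating") %>% html_attr("alt") %>% as.numeric()

data.frame(title = titles, link = links, rating = ratings)
```

Calling html_element (singular) on each book keeps the vectors aligned, since it returns one result per book even when a field is missing.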
Given a player URL (e.g. http://www.shl.se/lag/087a-087aTQv9u__frolunda-hc/qQ9-a5b4QRqdS__ryan-lasch), extract name, date of birth, age, nationality…
Given a player's statistics URL (e.g. http://www.shl.se/lag/087a-087aTQv9u__frolunda-hc/qQ9-a5b4QRqdS__ryan-lasch/statistics), extract season statistics (säsongsstatistik) with html_table.
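A rough sketch of html_table on an inline table follows. The table id, column names and numbers are placeholders rather than real statistics, and the tibble return value assumes rvest 1.0 or later.

```r
library(rvest)

# Placeholder table; the structure of the real page is an assumption
page <- minimal_html('
  <table id="season">
    <tr><th>Season</th><th>GP</th><th>G</th><th>A</th></tr>
    <tr><td>Season A</td><td>10</td><td>1</td><td>2</td></tr>
    <tr><td>Season B</td><td>12</td><td>2</td><td>3</td></tr>
  </table>')

# Select the one table of interest, then convert it to a data frame
stats <- page %>% html_element("table#season") %>% html_table()
stats
```

Applied to a whole page instead of a single element, html_table returns a list with one data frame per table, so selecting the table first keeps the result easier to work with.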
Given a team URL (e.g. https://www.shl.se/lag/8e6f-8e6fUXJvi__malmo-redhawks/roster), extract a list of player URLs for the team’s players.
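One possible approach is to collect the href attributes of the player links and resolve them against the page address. In the sketch below, the base address, the markup and the class names are all hypothetical.

```r
library(rvest)

# Placeholder address of the roster page
base_url <- "https://www.example.com/team/roster"

# Made-up roster markup with a duplicate link and an unrelated link
page <- minimal_html('
  <div class="roster">
    <a class="player" href="/team/p1__first-player">First Player</a>
    <a class="player" href="/team/p2__second-player">Second Player</a>
    <a class="player" href="/team/p1__first-player">First Player</a>
    <a href="/tickets">Tickets</a>
  </div>')

# Keep only the player links and read their href attribute
hrefs <- page %>% html_elements("div.roster a.player") %>% html_attr("href")

# Turn relative paths into full URLs and drop duplicates
player_urls <- unique(xml2::url_absolute(hrefs, base_url))
player_urls
```

The resulting vector can then be passed, one URL at a time, to whatever function extracts the details of a single player.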
Scrape the headlines at https://www.dn.se/.