Data collection is a very common task. As part of our daily routines, we might need to pull data from a website through web scraping. Python has many packages for extracting data from websites, such as Selenium and Beautiful Soup.
Recently, a colleague asked me to pull data from an Excel sheet. The sheet has a column in which each value carries a hyperlink. She asked me to write code that could open each hyperlink in that column and pull the data from the landing page. Excel has no direct formula to extract the hyperlink from a cell value; we would need a VBA function to get the links embedded in the cells. I used Python to do the same job.
When I opened the Excel file with pandas, it read only the cell values. Pandas did not load the hyperlinks. To get the hyperlinks, I followed these steps.
I loaded the Excel file with the openpyxl module, extracted the hyperlink from each cell in the column with a for loop, and saved the data to a temp file. I used that temp file for all the manipulations thereafter.
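A minimal sketch of this approach might look like the following. It is an illustration under stated assumptions, not the exact code from my project: the function names (`build_demo_workbook`, `extract_hyperlinks`, `save_links`), the example URLs, the single header row, the choice of the first column, and CSV as the temp file format are all hypothetical. The only openpyxl features it relies on are `load_workbook`, `iter_rows`, and the `hyperlink` attribute of a cell, whose `target` holds the URL.

```python
import csv
import tempfile
from pathlib import Path

from openpyxl import Workbook, load_workbook


def build_demo_workbook(path):
    """Create a small workbook with one linked cell and one plain cell.

    Hypothetical stand-in for the colleague's file, so the sketch is runnable.
    """
    wb = Workbook()
    ws = wb.active
    ws["A1"] = "Name"  # header row
    ws["A2"] = "Report"
    ws["A2"].hyperlink = "https://example.com/report"  # made-up URL
    ws["A3"] = "No link"
    wb.save(path)


def extract_hyperlinks(xlsx_path, column=1, header_rows=1):
    """Return (cell text, hyperlink target) pairs for one worksheet column."""
    # Default load mode; hyperlinks are attached to regular cells here.
    wb = load_workbook(xlsx_path)
    ws = wb.active
    rows = []
    for (cell,) in ws.iter_rows(
        min_row=header_rows + 1, min_col=column, max_col=column
    ):
        # cell.hyperlink is None when the cell carries no link.
        target = cell.hyperlink.target if cell.hyperlink else None
        rows.append((cell.value, target))
    return rows


def save_links(rows, out_path):
    """Write the extracted pairs to a CSV file (assumed temp file format)."""
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["text", "url"])
        writer.writerows(rows)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        xlsx_path = Path(tmp) / "demo.xlsx"
        csv_path = Path(tmp) / "links.csv"
        build_demo_workbook(xlsx_path)
        links = extract_hyperlinks(xlsx_path)
        save_links(links, csv_path)
        for text, url in links:
            print(text, url)
```

From here, the saved file can be read back with pandas or the `csv` module, and each URL can be passed to whatever scraping tool fits the landing page.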