Why do you need web scraping in Python?
Raw data is the foundation of successful data science work. There are many sources of data, and websites are one of them. They often serve as a secondary source of information: data aggregation sites (Worldometers), news sites (CNBC), social media (Twitter), e-commerce platforms (Shopee), and so on. These websites provide the information needed for data science projects.
But how do we collect the data? We can’t copy and paste it manually, can we? In such a situation, the solution is web scraping in Python. The language has the powerful BeautifulSoup library, as well as the automation tool Selenium. Both are widely used by specialists to collect data in different formats. In this section, we will first get acquainted with BeautifulSoup.
STEP 1. INSTALLING LIBRARIES

First of all, we need to install the necessary libraries, namely:
BeautifulSoup4
Requests
pandas
lxml
To install these libraries, you can use pip install [library name], or conda install [library name] if you use the Anaconda Prompt.
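For example, all four can be installed with a single command from the terminal (the PyPI package names are assumed to match the list above):

pip install beautifulsoup4 requests pandas lxml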
“Requests” is the next library to install. Its job is to send a request to the website’s server and download the page’s HTML for us. We then need pandas to build the data frame, and lxml to convert the HTML into a Python-friendly format.
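To make these roles concrete, here is a minimal sketch of how the three libraries work together. The URL is a placeholder, and collecting links is only an illustration, not the project we build in the following steps:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://example.com"  # placeholder; replace with the site you want to scrape
response = requests.get(url)  # ask the server for the page
soup = BeautifulSoup(response.text, "lxml")  # lxml parses the HTML for Python

# As a toy example, collect every link on the page into a data frame.
links = [a.get("href") for a in soup.find_all("a")]
df = pd.DataFrame({"link": links})
print(df.head())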
STEP 2. IMPORTING LIBRARIES
After installing the libraries, open your favorite development environment. We suggest Spyder 4.2.5: at some stages of the work we will encounter large volumes of output data, and Spyder handles them more conveniently than Jupyter Notebook.
So, Spyder is open and we can import the required libraries. A minimal import block matching the installs above is shown next; note that lxml needs no explicit import, since BeautifulSoup uses it internally as a parser:
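from bs4 import BeautifulSoup  # parsing and navigating the HTML
import requests  # sending HTTP requests to the website's server
import pandas as pd  # building the data frame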