You are required to scrape data from the IMDb Top 250 movies page. The output should contain only three fields: movie name, year, and rating.
Here is a Python example that uses the Beautiful Soup library to scrape movie names, years, and ratings from the IMDb Top 250 Movies page. It also needs the requests library to make the HTTP request. Make sure both libraries are installed:
    pip install requests beautifulsoup4
Now, you can use the following code to scrape the IMDb data:
import requests
from bs4 import BeautifulSoup

# Define the URL of the IMDb Top 250 Movies page
url = "https://www.imdb.com/chart/top/"

# Send an HTTP GET request to the URL (the timeout avoids hanging indefinitely)
response = requests.get(url, timeout=10)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Parse the HTML content of the page using BeautifulSoup
    soup = BeautifulSoup(response.text, "html.parser")

    # Find the table containing the movie data
    table = soup.find("table", {"class": "chart full-width"})
    if table is None:
        # find() returns None when nothing matches, e.g. if the page markup differs
        raise SystemExit("Could not find the chart table; the page markup may have changed.")

    # Initialize lists to store movie names, years, and ratings
    movie_names = []
    release_years = []
    ratings = []

    # Iterate through the rows of the table, skipping the header row
    for row in table.find_all("tr")[1:]:
        cells = row.find_all("td")
        movie_name = cells[1].find("a").text.strip()
        # The year is rendered as "(1994)", so strip the parentheses
        year = cells[1].find("span", {"class": "secondaryInfo"}).text.strip("() ")
        rating = cells[2].find("strong").text.strip()
        movie_names.append(movie_name)
        release_years.append(year)
        ratings.append(rating)

    # Print the scraped data
    for name, year, rating in zip(movie_names, release_years, ratings):
        print(f"{name} ({year}): {rating}")
else:
    print("Failed to retrieve data. Status code:", response.status_code)
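
The row-parsing logic above can be checked without hitting the network. The sketch below runs the same lookups against a small hand-written HTML fragment and returns one record per movie with only the three requested fields. The fragment, the placeholder titles, and the helper name parse_chart are assumptions made for illustration; the fragment mirrors the table layout the scraper expects, which may not match the markup IMDb currently serves.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: hand-written to mirror the table layout the scraper
# assumes. It is not copied from IMDb, and the titles are placeholders.
SAMPLE_HTML = """
<table class="chart full-width">
  <tr><th></th><th>Rank &amp; Title</th><th>IMDb Rating</th></tr>
  <tr>
    <td class="posterColumn"></td>
    <td class="titleColumn">1. <a href="#">Example Movie A</a>
      <span class="secondaryInfo">(1994)</span></td>
    <td class="ratingColumn imdbRating"><strong>9.2</strong></td>
  </tr>
  <tr>
    <td class="posterColumn"></td>
    <td class="titleColumn">2. <a href="#">Example Movie B</a>
      <span class="secondaryInfo">(1972)</span></td>
    <td class="ratingColumn imdbRating"><strong>9.1</strong></td>
  </tr>
</table>
"""


def parse_chart(html):
    """Return a list of {"name", "year", "rating"} dicts parsed from chart HTML.

    Hypothetical helper: it applies the same cell lookups as the scraper above
    and returns an empty list when the expected table is missing.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "chart full-width"})
    if table is None:
        return []

    movies = []
    # Skip the header row, then read the title, year, and rating cells
    for row in table.find_all("tr")[1:]:
        cells = row.find_all("td")
        movies.append({
            "name": cells[1].find("a").text.strip(),
            "year": cells[1].find("span", {"class": "secondaryInfo"}).text.strip("() "),
            "rating": cells[2].find("strong").text.strip(),
        })
    return movies


for movie in parse_chart(SAMPLE_HTML):
    print(f"{movie['name']} ({movie['year']}): {movie['rating']}")
```

Keeping the parsing in a function that takes an HTML string means the same code can be pointed at response.text from the live request, and an empty result is a quick signal that the selectors no longer match the page.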