Thursday, September 19, 2024

Mapping out the connections of Oscar Winners | by Milan Janosov | Feb, 2024

Must read


Towards Data Science

On this brief piece, I exploit public Wikipedia information, Python programming, and community evaluation to extract and draw up a community of Oscar-winning actors and actresses.

All pictures have been created by the writer.

Wikipedia, as the most important free, crowdsourced on-line encyclopedia, serves as a tremendously wealthy information supply on varied public domains. Many of those domains, from movie to politics, contain varied layers of networks beneath, expressing differing types of social phenomena similar to collaboration. Because of the approaching Academy Awards Ceremony, right here I present the instance of Oscar-winning actors and actresses on how we will use easy Pythonic strategies to show Wiki websites into networks.

First, let’s check out how, for example, the Wiki listing of all Oscar-winning actors is structured:

Wiki listing of all Oscar-winning actors

This subpage properly exhibits all of the individuals who have ever obtained an Oscar and have been granted a Wiki profile (most certainly, no actors and actresses have been missed by the followers). On this article, I give attention to appearing, which may be discovered within the following 4 subpages — together with most important and supporting actors and actresses:

urls = { 'actor'         :'https://en.wikipedia.org/wiki/Class:Best_Actor_Academy_Award_winners',
'actress' : 'https://en.wikipedia.org/wiki/Class:Best_Actress_Academy_Award_winners',
'supporting_actor' : 'https://en.wikipedia.org/wiki/Class:Best_Supporting_Actor_Academy_Award_winners',
'supporting_actress' : 'https://en.wikipedia.org/wiki/Class:Best_Supporting_Actress_Academy_Award_winners'}

Now let’s write a easy block of code that checks every of those 4 listings, and utilizing the packages urllib and beautifulsoup, extracts the title of all artists:

from urllib.request import urlopen
import bs4 as bs
import re

# Iterate throughout the 4 classes
people_data = []

for class, url in urls.gadgets():

# Question the title itemizing web page and…



Supply hyperlink

More articles

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Latest article