For my midterm, I plan to use the wikipedia package, which lets you search and retrieve information from Wikipedia using Python.
In the tutorial, we will:
- install the wikipedia package and import it
- use "Weibo" as an example and extract data from its page, such as the number of images and hyperlinks
- compare Weibo with Twitter and Reddit, and visualize how this data could be used in psychology research
What is Wikipedia?
The wikipedia package is a Python library that makes it easy to access and parse data from Wikipedia. I'll start by explaining how Wikipedia works, and I may also open some of the links generated through Python in a browser. This will help everyone get a clearer sense of what the output typically looks like.
Getting started
First, we need to install the wikipedia package. Go to your terminal and type: pip install wikipedia
Here I'll help everyone install the package. The package does not require an API key, because it reads from Wikipedia's public API.
Once you have that, we're ready to begin. Let's import the wikipedia package!
# load wikipedia
import wikipedia
How exactly do we use the functions in the wikipedia package?
Using wikipedia.WikipediaPage() to Query a Term
With the wikipedia package, we can look up specific terms just as we would on the Wikipedia website. For example, to learn about Weibo, we first store the title "weibo" in a variable called page_title; it has to be a string, because that is what the function expects.
Next, we call wikipedia.WikipediaPage(page_title) to load the "Weibo" page, much like opening it on the Wikipedia website. The resulting page object gives us access to important information from the page, such as its summary, links, and references.
page_title="weibo"
page = wikipedia.WikipediaPage(page_title)
page
<WikipediaPage 'Weibo'>
Properties of the page
Summary:
page.summary returns the introduction of the page (the text before the first section heading), which gives an overview of the topic. Here we can see how Wikipedia summarizes Weibo.
summary = page.summary  # the introductory text of the page, as a string
summary
'Weibo (Chinese: 微博; pinyin: Wēibó), previously Sina Weibo (Chinese: 新浪微博; pinyin: Xīnlàng Wēibó), is a Chinese microblogging (weibo) website. Launched by Sina Corporation on 14 August 2009, it is one of the biggest social media platforms in China, with over 582 million monthly active users (252 million daily active users) as of Q1 2022. The platform has been a huge financial success, with surging stocks, lucrative advertising sales and high revenue and total earnings per quarter. At the start of 2018, it surpassed the US$30 billion market valuation mark for the first time.\nIn March 2014, Sina Corporation announced a spinoff of Sina Weibo, a separate entity called simply "Weibo", and filed an IPO under the symbol WB. Sina carved out 11% of Weibo in the IPO, with Alibaba owning 32% post-IPO. The company began trading publicly on 17 April 2014. In March 2017, Sina launched Sina Weibo International Version. In November 2018, Sina Weibo suspended its registration function for minors under the age of 14. In July 2019, Sina Weibo announced that it would launch a two-month campaign to clean up pornographic and vulgar information, named "Project Deep Blue" (蔚蓝计划). On 29 September 2020, the company announced it would go private again due to rising tensions between the US and China. Sina had gone public on the Nasdaq in 2000. As of September 2021 Sina Weibo had 523 million active monthly users, with three in seven of those using the site daily. Sina Weibo has attracted criticism over censoring its users.\n\n'
Content
If we want the full content of the page, we can use page.content, which returns the entire plain text of the Wikipedia page for the term we're searching, with section headings marked by equals signs (for example, == Name ==).
content = page.content  # the full plain text of the page
content
'Weibo (Chinese: 微博; pinyin: Wēibó), previously Sina Weibo (Chinese: 新浪微博; pinyin: Xīnlàng Wēibó), is a Chinese microblogging (weibo) website. Launched by Sina Corporation on 14 August 2009, it is one of the biggest social media platforms in China, with over 582 million monthly active users (252 million daily active users) as of Q1 2022. The platform has been a huge financial success, with surging stocks, lucrative advertising sales and high revenue and total earnings per quarter. At the start of 2018, it surpassed the US$30 billion market valuation mark for the first time.\nIn March 2014, Sina Corporation announced a spinoff of Sina Weibo, a separate entity called simply "Weibo", and filed an IPO under the symbol WB. Sina carved out 11% of Weibo in the IPO, with Alibaba owning 32% post-IPO. The company began trading publicly on 17 April 2014. In March 2017, Sina launched Sina Weibo International Version. In November 2018, Sina Weibo suspended its registration function for minors under the age of 14. In July 2019, Sina Weibo announced that it would launch a two-month campaign to clean up pornographic and vulgar information, named "Project Deep Blue" (蔚蓝计划). On 29 September 2020, the company announced it would go private again due to rising tensions between the US and China. Sina had gone public on the Nasdaq in 2000. As of September 2021 Sina Weibo had 523 million active monthly users, with three in seven of those using the site daily. Sina Weibo has attracted criticism over censoring its users.\n\n\n== Name ==\n"Weibo" (微博) is the Chinese word for "microblog". Sina Weibo launched its new domain name weibo.com on 7 April 2011, deactivating and redirecting from the old domain, t.sina.com.cn, to the new one. Due to its popularity, the media sometimes refers to the platform simply as "Weibo," despite the numerous other Chinese microblogging/weibo services including Tencent Weibo (腾讯微博), Sohu Weibo (搜狐微博), and NetEase Weibo (网易微博). 
However, the latter three have stopped providing services.\n\n\n== Background ==\nSina Weibo is a platform based on fostering user relationships to share, disseminate, and receive information. Through the website or the mobile app, users can upload pictures and videos publicly for instant sharing, with other users being able to comment with text, pictures and videos, or use a multimedia instant messaging service. The company initially invited a large number of celebrities to join the platform at the beginning and has since invited many media personalities, government departments, businesses and non-governmental organizations to open accounts for the purpose of publishing and communicating information. To avoid the impersonation of celebrities, Sina Weibo uses verification symbols; celebrity accounts have an orange letter "V" and organizations\' accounts have a blue letter "V". Sina Weibo has more than 500 million registered users; out of these, 313 million are monthly active users, 85% use the Weibo mobile app, 70% are college-aged, 50.10% are male and 49.90% are female. There are over 100 million messages posted by users each day. With more than 100 million followers, actress Xie Na holds the record for the most followers on the platform. Despite fierce competition among Chinese social media platforms, Sina Weibo remains the most popular.\n\n\n== History ==\n\nAfter the July 2009 Ürümqi riots, China shut down most domestic microblogging services, including Fanfou, the very first weibo service. Many popular non-China-based microblogging services like Twitter, Facebook, and Plurk have since been blocked. Sina Corporation CEO Charles Chao considered this to be an opportunity, and on 14 August 2009, Sina launched the tested version of Sina Weibo. Basic functions including message, private message, comment and reposting were made available that September. 
A Sina Weibo–compatible API platform for developing third-party applications was launched on 28 July 2010.\nOn 1 December 2010, the website experienced an outage, which administrators later said was due to the ever-increasing numbers of users and posts. Registered users surpassed 100 million in February 2011. Since 23 March 2011, t.cn has been used as Sina Weibo\'s official shortened URL in lieu of sinaurl.cn. On 7 April 2011, weibo.com replaced t.sina.com.cn as the new main domain name used by the website. The official logo was also updated. In June 2011, Sina announced an English-language version of Sina Weibo would be developed and launched, though content would still be governed by Chinese law.\nOn 11 January 2013, Sina Weibo and Alibaba China (a subsidiary of Alibaba Group) signed a strategic cooperation agreement.\nWith more and more foreign celebrities using Sina Weibo, language translation has become an urgent need for Chinese users who wish to communicate with their idols online, especially Korean. In January 2013, Sina Weibo and NetEase.com announced that they had reached a strategic cooperation agreement. When users browse foreign language content, they can now directly obtain translation results through the YouDao Dictionary.\nThe Sina Weibo financial report in February 2013 showed that its total revenue was approximately US$66 million and that the number of registered users had exceeded the 500 million mark.\nIn April 2013, Sina officially announced that Sina Weibo had signed a strategic cooperation agreement with Alibaba. The two sides conducted in-depth cooperation in areas such as user account interoperability, data exchange, online payment, and internet marketing. 
At the same time, Sina announced that Alibaba, through its wholly owned subsidiary, had purchased the preferred shares and common shares issued by Sina Weibo Company for US$586 million, which accounted for approximately 18% of Weibo\'s fully diluted and diluted total shares.\n\n\n=== Ownership ===\nOn 9 April 2013, Alibaba Group announced that it would acquire 18% of Sina Weibo for US$586 million, with the option to buy up to 30% in the future. Alibaba exercised this option when Weibo was listed on NASDAQ in April 2014.\n\n\n== Users ==\nAccording to iResearch\'s report on 30 March 2011, Sina Weibo had 56.5% of China\'s microblogging market based on active users and 86.6% based on browsing time over competitors such as Tencent Weibo and Baidu. The top 100 users had over 485 million followers combined. More than 5,000 companies and 2,700 media organizations in China use Sina Weibo. The site is maintained by a growing microblogging department of 200 employees responsible for technology, design, operations, and marketing.\nSina executives invited and persuaded many Chinese celebrities to join the platform. Users now include Asian celebrities, movie stars, singers, famous business and media figures, athletes, scholars, artists, organizations, religious figures, government departments, and officials from Hong Kong, Mainland China, Malaysia, Singapore, Taiwan, and Macau, as well as some famous foreign individuals and organizations, including Kevin Rudd, Boris Johnson, David Cameron, Narendra Modi, Toshiba, and the Germany national football team. Sina Weibo has a verification program for known people and organizations. 
Once an account is verified, a verification badge is added beside the account name.\nAccording to research by Sina Corporation, the number of active users reached over 400 million by Q1 2018, making Sina Weibo the 7th platform with at least 400 million active users, and daily usage increased by 21%.\nIn June 2020, Weibo was among 58 other Chinese apps that were banned by the Government of India. Following this, Prime Minister of India Narendra Modi\'s account was deactivated.\n\n\n== Features ==\nMany of Sina Weibo\'s features resemble those of Twitter. A user may post with a 140-character limit (increased to 2,000 as of January 2016 with the exception of reposts and comments), mention or talk to other people using "@UserName" formatting, add hashtags, follow other users to make their posts appear in one\'s own timeline, re-post with "//@UserName" similar to Twitter\'s retweet function "RT @UserName", select posts for one\'s favorites list, and verify the account if the user is a celebrity, brand, business or otherwise of public interest. URLs are automatically shortened using the domain name t.cn, akin to Twitter\'s t.co. Official and third-party applications can access Sina Weibo from other websites or platforms.\nUsers may:\n\nSubmit up to 18 images/video files in every post\nSend personal messages to followers\nFollow others and be followed\nPost "stories" just like on Instagram\nReact to posts using different emojis\nReceive monetary rewards that can be used in a digital store linked to Weibo\nView posts identified as hot or popular\nDisplay the location you post from\nHashtags differ slightly between Sina Weibo and Twitter, using the double-hashtag "#HashName#" format (the lack of spacing between Chinese characters necessitates a closing tag). Users can own a hashtag by requesting hashtag monitoring; the company reviews these requests and responds within one to three days. 
Once a user owns a hashtag, they have access to a wide variety of functions available only to them on the condition that they remain active (less than 1 post per calendar week revokes these privileges).\nAdditionally, comments appear as a list below each post. A commenter can also choose to re-post the comment, quoting the whole original post, to their own page.\nUnregistered users can only browse a few posts by verified accounts. Neither unverified account pages nor comments to posts by verified accounts are accessible to unregistered users.\nAlthough often described as a Chinese version of Twitter, Sina Weibo combines elements of Twitter, Facebook, and Medium, along with other social media platforms. Sina Weibo users interact more than Twitter users do, and while many topics that go viral on Weibo also originate from the platform itself, Twitter topics often come from outside news or events.\nDuring the outbreak of the COVID-19, Weibo was also a data collecting station to collect and detect the spread of the coronavirus.\nTrending topics\nSina Weibo\'s "trending topics" is a list of current popular topics based partly on tracking user participation and partly on the preference of Weibo staff. Once a topic is trending, it often becomes a heated issue and can have wide-ranging social influence. As such, the list has reshaped how Chinese people relate to the news media.\n\n\n=== Verification ===\nSina Weibo has a verification policy, much like Twitter\'s account verification, for confirming the identity of a user (celebrities, organizations etc.). Once a user is verified, a colorful V is appended to their username; individuals receive an orange V, while organizations and companies receive a blue V. A graph and declaration certifying the verification appear on verified user pages. 
There are several kinds of verifications: personal, college, organization, verification for official accounts (government departments, social media platforms and famous companies), and Weibo Master (linked with phone numbers and followers).\nTo protect the rights and interests of celebrities, Sina Weibo has launched a celebrity authentication system. The celebrity authentication logo is a gold "V" logo after the verified user\'s name. The certified figures are mainly stars of various industries, business executives and important news parties. From 22:00 on June 12, 2020, users who post comments must follow the blogger for more than 7 days, except for those who have set "people I follow" to comment on themselves. This adjustment will last for 7 days.\n\n\n=== Clients ===\nSina produces mobile applications for various platforms to access Sina Weibo, including Android, BlackBerry OS, iOS, Symbian S60, Windows Mobile, Windows Phone and HarmonyOS. Sina has also released a desktop client for Microsoft Windows under the product name Weibo Desktop.\n\n\n=== International versions ===\nSina Weibo is available in both simplified and traditional Chinese characters. The site also has versions that cater to users from Hong Kong and Taiwan. In 2011, Weibo developed an international edition in English and other languages. On 9 January 2018, the company ran a week-long public test of its English edition.\nSina Weibo\'s official iPhone and iPad apps are available in English.\nWeibo International supports existing Weibo accounts and allows Facebook accounts to link to the platform; users can also use their mobile phone number (including international mobile phone numbers) to register new accounts.\n\n\n=== Weibo Stories ===\nOne of the most recent features of Weibo is Stories. "Weibo\'s stories" is a video function allowing users to record a video and save them in a separate "Story" menu in their profile page.\n\n\n=== Weibo VLOG ===\nWeibo has also launched a new "Vlog" function. 
Now, every video with a hashtag VLOG will be available in the main search page under "VLOG" sub-menu.\n\n\n=== Weibo interviews ===\nWeibo interviews are text-based interviews hosted on the Weibo platform.:\u200a176\u200a Users post questions to the person being interviewed via Weibo posts and that person responds in real-time.:\u200a176\u200a\n\n\n=== Posting via text message ===\nIf a user links their Weibo account to a cell phone number, the user can both make and receive Weibo posts via text message.:\u200a148\u200a The user can then upload posts by texting them to 1069 009 009 and they will appear on Weibo in real time.:\u200a158\u200a Replies or comments to those posts are sent to the user via text message.:\u200a158\u200a\n\n\n=== IP address ===\nWeibo began displaying IP addresses of users when posting and commenting in April 2022.\n\n\n=== Super-hashtags ===\nA centralised approach to information in the form of a topic. Users can post and discuss within the super-hashtags. It is different from regular hashtags, users can apply to be the host and having the authorities to audit and shield the posts. Once the user choose to subscribe the super-hashtag, they will become a member of that community. Their level of membership will increase depend on the sacrifice to the degree of discussion of the super-hashtags (such as signing in, posting, commenting).\n\n\n=== Paid promotion ===\nAfter publishing a post, you can choose to increase the exposure of the post by paying for it and promoting the content to a wider potential audience.\n\n\n=== Other services ===\nWeilingdi (微领地, literally, micro fief) is another service bundled with Weibo. Similar to Foursquare, Weilingdi is a location-based social networking website for mobile devices; the site grew out of Sina\'s 2011 joint venture with GeoSentric\'s GyPSii. Sina\'s Tuding (图钉) photo-sharing service, similar to Instagram, is also produced by the same joint venture. 
Sina Lady Weibo (新浪女性微博) specializes in women\'s interests. Weibo Data Center enables users to access data analysis about a topic of their choice, Sina Weibo\'s official data, and demographic information. Sina Weibo has also recently released a desktop version available for free download at its website.\n\n\n== Controversies ==\nOn 2 May 2021, a Weibo account belonging to the Chinese Communist Party\'s Central Political and Legal Affairs Commission posted an image of rocket Long March 5B\'s launch next to a photo of mass cremations of the dead in India as a result of the COVID-19 pandemic with the caption "China lighting a fire versus India lighting a fire". The post was quickly deleted after it faced massive backlash from users and hashtag related to the post also was deleted.\nAccording to a report by the Human Rights Watch, racist content targeting black people are strongly prevalent in Chinese social media platforms including Weibo.\n\n\n=== Censorship ===\n\nIn cooperation with internet censorship in China, Sina sets strict controls over the posts on its services. Posts with links using some URL shortening services (including Google\'s goo.gl), or containing blacklisted keywords, are not allowed on Sina Weibo. Posts on politically sensitive topics are deleted after manual checking. Users with few followers may be able to post on censored topics with relative freedom until they reach a critical mass of followers, which triggers enforced content supervision.\nSina Weibo is believed to employ a distributed, heterogeneous strategy for censorship that has a great amount of defense-in-depth, which ranges from keyword list filtering to individual user monitoring. 
Nearly 30% of the total deletion events occur within 5–30 minutes, and nearly 90% of the deletions happen within the first 24 hours.\nOn 9 March 2010, the posts by Chinese artist and activist Ai Weiwei at Sina Weibo to appeal for information on the 2008 Sichuan earthquake going public were deleted and his account was closed by the site administrator. Attempts to register accounts with usernames alluding to Ai Weiwei were blocked. On 30 March 2010, Hong Kong singer Gigi Leung blogged about the jailed Zhao Lianhai, an activist and father to a 2008 Chinese milk scandal victim; that post was also deleted by an administrator shortly thereafter.\nOn 16 March 2012, all users of Sina Weibo in Beijing were told to register with their real names.\nStarting on 31 March 2012, the comment function of Sina Weibo was shut down for three days, along with Tencent QQ.\nIn May 2012, Sina Weibo introduced new restrictions on the content its users can post.\nIn October 2012, Sina Weibo heavily censored discussion of the Foxconn strikes in October 2012.\nOn 4 June 2013, Sina Weibo blocked the terms "Today," "Tonight," "June 4," and "Big Yellow Duck." If a user searched using these terms, a message would appear stating that according to relevant laws, statutes and policies, the results of the search couldn\'t be shown. This censorship was implemented because a photoshopped version of Tank Man which swapped all tanks in the photo with the sculpture Rubber Duck had been circulating on Twitter.\nAccording to a BBC News report, the decreasing number of users since 2014 can be attributed both to the crackdown by the Chinese government on the use of aliases to create accounts and to the rising threat from competitor WeChat.\nOn 8 September 2017, Weibo gave an ultimatum to its users to verify their accounts with their real names by 15 September. The platform announced that same month that it would hire 1000 "supervisors" from among its users to engage in censorship. 
These supervisors were supposed to report at least 200 content pieces per month, with those with the best results being rewarded with special prizes, including iPhones and notebooks.\nOn 18 February 2018, Sina Weibo provided a "Comment moderation" function for both head users and official members. Comments received after opening this feature will not be displayed immediately, instead of requiring approval from moderators. Users can utilize this feature to avoid illegal content appearing in their comment section.\nIn April 2018, Weibo began a crackdown on anime, games, and short videos depicting "pornography, gore, violence and homosexuality". The CCP criticized Weibo\'s move, following which the company decided to exclude homosexual content from the purge.\nOn 11 June 2020, the Cybersecurity Administration of China ordered Weibo to suspend its "trending topics" page for a week. The CAC accused Weibo of "dissemination of illegal information".\nOn 22 February 2022, Horizon News accidentally posted on its Weibo page its instructions not to post anti-Russia content related to the crisis between Russia and Ukraine.\nIn January 2023, Sina Weibo suspended more than 1,000 social media accounts of critics of the Chinese government response to COVID-19.\n\n\n=== Fake social media engagement ===\nChinese social media is dominated by a strong influencer and celebrity fandom culture. Celebrities and digital influencers, or key opinion leaders (KOLs), compete fiercely for higher follower counts to attract lucrative brand deals. Despite some efforts undertaken by Weibo to curb fake engagement, the issue remains pervasive due to the incentives for influencers and the advanced nature of fake engagement tools.\nIn 2018, a government crackdown exposed widespread manipulation on Sina Weibo, resulting in the temporary banning of numerous celebrities from its rankings. 
Notable figures like Wang Sicong were removed from the "hot searches" list, revealing a black market for manipulating rankings. Celebrities and KOLs exploit these tactics to enhance their visibility and suppress unfavorable stories. Weibo acknowledged this problem, listing banned terms and promising increased efforts to manage illegal content. Despite these measures, services offering to boost hashtags into top trending topics for a fee remain prevalent.\nWeibo is also inundated with fake followers, with 10,000 zombie followers costing around 10 yuan according to a 2019 Caixin report. Celebrity fan clubs act as comprehensive fake social media traffic generators, employing dedicated teams to create content and boost engagement figures. Reports indicate that a significant portion of top influencers have used these services to meet the minimum follower requirements for attracting advertisers.\n\n\n== Promotions ==\n\n\n=== Weibo Paid Ads ===\nAverage organic post view is around 10% – 15% on Weibo. To attract more followers, there are 3 types of paid ads options available:\n\nSponsored Post: Promotes to current followers and/or potential followers.\nWeibo Tasks: Allows advertisers to pay for other accounts to repost, which in turn reach target audiences.\nFensi Tong (粉丝通): The most well known paid advertising option on Weibo; allows more specific targeting options, including interests, gender, location and devices. Advertisers can choose between CPM (cost per million; 0.5CNY per thousand exposure) and CPC (cost per engagement; 0.5CNY per effective engagement). Companies or organizations often use Fensitong and pay well-known Sina Weibo users (usually those with more than 1 million followers) to advertise to their followers.\n\n\n=== Livery airplane ===\nOn 8 June 2011, Tianjin Airlines unveiled an Embraer E-190 jet in special Sina Weibo livery and named it "Sina Weibo plane" (新浪微博号). 
It is the first commercial airplane to be named after a website in China.\n\n\n=== Villarreal CF ===\nIn January 2012, Sina Weibo also announced that they would be sponsoring Spanish football club Villarreal CF for its match against FC Barcelona, to increase its fanbase in China.\n\n\n=== CCTV 2018 New Year\'s Gala ===\nOn 5 February 2018, Weibo officially announced that it will become the exclusive partner of the New Media Social Platform of the CCTV Spring Festival Gala in 2018 to attract more Chinese people worldwide to use Weibo.\n\n\n== Statistics ==\n\n\n=== Sina Weibo\'s official accounts ===\nWeibo\'s Secretary: 194,144,293\nWeibo\'s Service Center: 180,564,151\nWeibo\'s Staff: 155,444,287\n\n\n=== Most popular accounts (individuals) ===\nAs of 19 April 2019, the following ten individuals managed the most popular accounts (name handle in parentheses) and the number of followers:\n\nXie Na (xiena): 125,742,516\nHe Jiong (hejiong): 120,013,900\nYang Mi (yangmiblog): 107,601,756\nAngelababy (realangelababy): 102,212,814\nChen Kun (chenkun): 93,456,957\nZhao Liying (zhaoliying): 86,690,864\nVicky Zhao (zhaowei): 85,650,051\nJackson Yee (yiyangqianxi): 84,620,416\nYao Chen (yaochen): 83,811,714\nDeng Chao (dengchao): 80,972,525\n\n\n=== Record-setting posts ===\nOn 13 September 2013, the unverified handle "veggieg" (widely believed to be Faye Wong) posted a message suggesting that she had divorced her husband. The message was commented and re-posted more than a million times in four hours. The record was broken on 31 March 2014 by Wen Zhang, who posted a long apology admitting an extramarital affair when his wife Ma Yili was pregnant with their second child. This message was commented and re-posted more than 2.5 million times in 10 hours. (Ma\'s response generated 2.18 million responses in 12 hours.) On 22 June 2014, TFBOYS member Wang Junkai was awarded a Guinness World Record title for a Weibo post that was reposted 42,776,438 times. 
Luhan holds the Guinness World Record for most comments.\n\n\n== See also ==\nList of social networking services\nTencent Weibo\nFreeWeibo – the uncensored and anonymous version of Sina Weibo, operated by an unaffiliated third party\n\n\n== References ==\n\n\n== External links ==\n\nOfficial website (in Chinese)'
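Because page.content marks each heading with equals signs (== Name == for a section, === Ownership === for a subsection), the long string above can be split into sections. The package also offers page.section("Name") for pulling out one section at a time; the sketch below instead splits every section at once with a regular expression. The helper name split_sections and the "Introduction" key are labels chosen here for illustration, not part of the wikipedia package.

```python
import re

# Matches heading lines such as "== Name ==" or "=== Ownership ===" in page.content.
# [ \t]* is used instead of \s* so a match never spills onto the next line.
HEADING = re.compile(r"^(={2,})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def split_sections(content):
    """Split page.content into an ordered dict of {heading: section text}.

    Text before the first heading (the summary) is stored under "Introduction".
    If a heading appears twice, the later section overwrites the earlier one.
    """
    matches = list(HEADING.finditer(content))
    intro_end = matches[0].start() if matches else len(content)
    sections = {"Introduction": content[:intro_end].strip()}
    for i, match in enumerate(matches):
        start = match.end()
        # Each section runs until the next heading, or to the end of the text
        end = matches[i + 1].start() if i + 1 < len(matches) else len(content)
        sections[match.group(2)] = content[start:end].strip()
    return sections
```

For example, list(split_sections(content)) gives the section titles of the Weibo page in order, and split_sections(content)["Censorship"] returns just that section's text.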
Try it yourself!
You can look up any topic by replacing the title in the code below with whatever you want to search for. Remember to put the title in quotation marks inside the parentheses, since it must be a string.
wikipedia.WikipediaPage("New York City").summary
"New York, often called New York City or NYC, is the most populous city in the United States, located at the southern tip of New York State on one of the world's largest natural harbors. The city comprises five boroughs, each coextensive with a respective county. New York is a global center of finance and commerce, culture, technology, entertainment and media, academics and scientific output, the arts and fashion, and, as home to the headquarters of the United Nations, international diplomacy.\nWith an estimated population in 2023 of 8,258,035 distributed over 300.46 square miles (778.2 km2), the city is the most densely populated major city in the United States. New York City has more than double the population of Los Angeles, the nation's second-most populous city. New York is the geographical and demographic center of both the Northeast megalopolis and the New York metropolitan area, the largest metropolitan area in the U.S. by both population and urban area. With more than 20.1 million people in its metropolitan statistical area and 23.5 million in its combined statistical area as of 2020, New York City is one of the world's most populous megacities. The city and its metropolitan area are the premier gateway for legal immigration to the United States. As many as 800 languages are spoken in New York City, making it the most linguistically diverse city in the world. In 2021, the city was home to nearly 3.1 million residents born outside the U.S., the largest foreign-born population of any city in the world.\nNew York City traces its origins to Fort Amsterdam and a trading post founded on Manhattan Island by Dutch colonists around 1624. The settlement was named New Amsterdam in 1626 and was chartered as a city in 1653. The city came under English control in 1664 and was temporarily renamed New York after King Charles II granted the lands to his brother, the Duke of York, before being permanently renamed New York in November 1674. New York City was the U.S. 
capital from 1785 until 1790. The modern city was formed by the 1898 consolidation of its five boroughs: Manhattan, Brooklyn, Queens, The Bronx, and Staten Island.\nAnchored by Wall Street in the Financial District, Manhattan, New York City has been called both the world's premier financial and fintech center and the most economically powerful city in the world. As of 2022, the New York metropolitan area is the largest metropolitan economy in the world, with a gross metropolitan product of over US$2.16 trillion. If the New York metropolitan area were its own country, it would have the tenth-largest economy in the world. The city is home to the world's two largest stock exchanges by market capitalization of their listed companies: the New York Stock Exchange and Nasdaq. New York City is an established safe haven for global investors. As of 2023, New York City is the most expensive city in the world for expatriates, and Fifth Avenue is the most expensive shopping street in the world. New York City is home by a significant margin to the highest number of billionaires, individuals of ultra-high net worth (greater than US$30 million), and millionaires of any city in the world."
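To try several titles at once (for example, the pages this tutorial plans to compare), a small loop can collect the summaries and keep going when one title fails. This is a sketch: collect_summaries and its fetch parameter are hypothetical names introduced here, and by default it calls wikipedia.summary with sentences=2 to keep each summary short. Missing pages and ambiguous titles make the package raise exceptions (such as PageError or DisambiguationError), which the loop records as None.

```python
def collect_summaries(titles, fetch=None):
    """Return {title: summary or None} for each title in titles.

    fetch is any function that maps a title to a summary string. By default it
    uses wikipedia.summary, imported lazily so the helper can be tried offline
    with a stand-in function.
    """
    if fetch is None:
        import wikipedia  # requires: pip install wikipedia (and internet access)

        def fetch(title):
            # sentences=2 trims the summary; auto_suggest=False looks up the
            # title as written instead of using the search suggestion
            return wikipedia.summary(title, sentences=2, auto_suggest=False)

    results = {}
    for title in titles:
        try:
            results[title] = fetch(title)
        except Exception as err:  # e.g. PageError or DisambiguationError
            print(f"Could not fetch {title!r}: {err}")
            results[title] = None
    return results
```

With an internet connection, collect_summaries(["Weibo", "Twitter", "Reddit"]) returns one short summary per platform in the order given.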
Images:
We can also extract the URLs of all images on the page with page.images, and count how many there are with len().
The number of images can be useful in internet-related psychology research. For example:
- Does a higher number of images on an app's Wikipedia page influence people's willingness to download the app?
- Do more images lead to increased interest?
- Might more images help users understand the app better and find it more attractive?
images = page.images
images
['https://upload.wikimedia.org/wikipedia/commons/a/a8/Disc_Plain_blue_dark.svg', 'https://upload.wikimedia.org/wikipedia/commons/f/ff/Wikidata-logo.svg', 'https://upload.wikimedia.org/wikipedia/commons/5/56/%E5%BE%AE%E5%8D%9A%E4%B9%8B%E5%A4%9C%E4%BC%97%E6%98%9F%E4%BA%91%E9%9B%86_%E6%9E%97%E5%BF%97%E7%8E%B2%E5%91%A8%E5%86%AC%E9%9B%A8%E7%AD%89%E2%80%9C%E4%BA%89%E5%A5%87%E6%96%97%E8%89%B3%E2%80%9D.webm', 'https://upload.wikimedia.org/wikipedia/en/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg', 'https://upload.wikimedia.org/wikipedia/en/9/99/Question_book-new.svg', 'https://upload.wikimedia.org/wikipedia/en/6/6e/Sina_Weibo.svg']
Here, we can see a list of image links. Some of these images may not be directly related to the search term because they are default icons that Wikipedia adds to many pages, not images about the topic itself.
num_images = len(images)
num_images
6
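The Weibo page has only 6 images, and four of them are Wikipedia interface icons (Disc_Plain_blue_dark.svg, Wikidata-logo.svg, OOjs_UI_icon_edit-ltr-progressive.svg, and Question_book-new.svg). Of the remaining two, Sina_Weibo.svg is the Weibo logo and the .webm file is a video, so page.images can include other media files besides pictures.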
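To keep only the images that are likely about the topic, we can drop Wikipedia's interface icons by file name. This is a minimal sketch. The helper name `filter_content_images` and the icon set are our own assumptions, built from the four icon files in the Weibo output. Other pages may use icons that are not in this set.

```python
from urllib.parse import unquote

# Hypothetical set of Wikipedia interface icons, taken from the Weibo output;
# other pages may use icons that are not listed here.
WIKIPEDIA_ICONS = {
    "Disc_Plain_blue_dark.svg",
    "Wikidata-logo.svg",
    "OOjs_UI_icon_edit-ltr-progressive.svg",
    "Question_book-new.svg",
}


def filter_content_images(image_urls, icons=WIKIPEDIA_ICONS):
    """Return the image URLs whose file name is not a known interface icon."""
    kept = []
    for url in image_urls:
        # The file name is the last path segment; unquote decodes %-escapes
        file_name = unquote(url.rsplit("/", 1)[-1])
        if file_name not in icons:
            kept.append(url)
    return kept


# The Weibo list returned by page.images
weibo_images = [
    "https://upload.wikimedia.org/wikipedia/commons/a/a8/Disc_Plain_blue_dark.svg",
    "https://upload.wikimedia.org/wikipedia/commons/f/ff/Wikidata-logo.svg",
    "https://upload.wikimedia.org/wikipedia/commons/5/56/%E5%BE%AE%E5%8D%9A%E4%B9%8B%E5%A4%9C%E4%BC%97%E6%98%9F%E4%BA%91%E9%9B%86_%E6%9E%97%E5%BF%97%E7%8E%B2%E5%91%A8%E5%86%AC%E9%9B%A8%E7%AD%89%E2%80%9C%E4%BA%89%E5%A5%87%E6%96%97%E8%89%B3%E2%80%9D.webm",
    "https://upload.wikimedia.org/wikipedia/en/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg",
    "https://upload.wikimedia.org/wikipedia/en/9/99/Question_book-new.svg",
    "https://upload.wikimedia.org/wikipedia/en/6/6e/Sina_Weibo.svg",
]
content_images = filter_content_images(weibo_images)
print(len(content_images))  # → 2
```

In the notebook you could call `filter_content_images(page.images)` directly instead of copying the list.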
References:
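We can retrieve the external references listed on the page with page.references and count them. According to the package's documentation, this list holds the URLs of external links on the page. It may therefore include links that are not formally cited anywhere.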
references = page.references
num_references = len(references)
num_references
208
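A raw count does not tell us where the references come from. Suppose we want to see which websites a page cites most often. A minimal sketch can group the URLs by domain using the standard library's `urllib.parse.urlparse` and `collections.Counter`. The helper name `count_reference_domains` and the sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def count_reference_domains(reference_urls):
    """Count how many reference URLs point to each website (network location)."""
    domains = []
    for url in reference_urls:
        domain = urlparse(url).netloc.lower()
        # Treat "www.example.com" and "example.com" as the same site
        if domain.startswith("www."):
            domain = domain[4:]
        domains.append(domain)
    return Counter(domains)


# Hypothetical URLs for illustration; in the notebook, pass page.references instead
sample_references = [
    "https://www.example.com/news/1",
    "https://example.com/news/2",
    "https://another-example.org/report",
]
print(count_reference_domains(sample_references).most_common())
# → [('example.com', 2), ('another-example.org', 1)]
```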
Hyperlinks:
Lastly, we can pull out the hyperlinks within a Wikipedia page and count how many there are. page.links returns the titles of the Wikipedia articles that the page links to.
Hyperlinks in Wikipedia are clickable links that connect one Wikipedia page to another. They allow users to navigate between related topics by clicking on highlighted words or phrases within the text.
# Reuse the page object fetched above instead of downloading it again
links = page.links
links
['2008 Chinese milk scandal', '2008 Sichuan earthquake', '23snaps', 'ANobii', 'API', 'AT Protocol', 'Academia.edu', 'ActivityPub', 'Activity stream', 'Ai Weiwei', 'Alibaba Group', 'Amikumu', 'Android (operating system)', 'Angelababy', 'App.net', 'Apple Daily', 'ArXiv (identifier)', 'Are.na', 'AsianAve', 'Ask.fm', 'Associated Press', 'Attention inequality', 'Avatars United', 'BBC News', 'Backchannel', 'Badoo', 'Baidu', 'BeReal', 'Bebo', 'Behance', 'BlackBerry OS', 'Black people', 'Bluesky (social network)', 'Bolt (website)', 'Bondee', 'Bopomofo', 'Boris Johnson', 'Boxun.com', 'Brainly', 'BranchOut', 'Brand page', 'Bumble', 'CCTV Spring Festival Gala', 'CNN', 'COVID-19 pandemic', 'Caixin', 'Cantonese', 'Capazoo', 'Cara (app)', 'Central Political and Legal Affairs Commission', 'Character (symbol)', 'Charles Chao', 'Charmaine Sheh', 'Chen Duling', 'Chen Kun', 'China', 'Chinese Communist Party', 'Chinese characters', 'Chinese government response to COVID-19', 'Chinese language', 'Cloob', 'Clubhouse (app)', 'Cohost', 'Comparison of microblogging and similar services', 'Comparison of online dating services', 'Comparison of social networking software', 'Confessions page', 'Convoz', 'Cybersectarianism', 'Cyworld', 'David Cameron', 'Defense-in-depth', 'Deng Chao', 'Di Lieba', 'Diaspora (social network)', 'Display (social network)', 'Distributed Social Networking Protocol', 'Doi (identifier)', 'Domain name', 'Douban', 'Draugiem.lv', 'EConozco', 'EWorld', 'Edmodo', 'Ello (social network)', 'Embraer E-190', 'Emojli', 'English-language spelling reform', 'Erenlai', 'Eyegroove', 'FC Barcelona', 'Facebook', 'Fanfou', 'Faye Wong', 'Fediverse', 'Financial Times', 'FitFinder', 'Forbes', 'Forbes Asia', 'Foursquare City Guide', 'Foursquare Swarm', 'Foxconn', 'Frank McCourt (executive)', 'FreeWeibo', 'FriendFeed', 'Friendica', 'Friends Reunited', 'Friendster', 'GNU social', 'Gab (social network)', 'Gapo', 'Gas (app)', 'Gender differences in social network service use', 'Germany national football team', 'Gettr', 'Gigi Leung', 'Google+', 'Google Buzz', 'Google Currents (social app)', 'Government of India', 'Great Firewall of China', 'Grono.net', 'Group (online social networking)', 'GyPSii', 'Gülnezer Bextiyar', 'HCL Connections', 'Hanyu Pinyin', 'HarmonyOS', 'Hashtag', 'He Jiong', 'Heello', 'Hello (social network)', 'Hi5', 'Highlight (application)', 'Hindustan Times', 'Hive Social', 'Homosexuality', 'Hong Kong', 'Hospitality exchange service', 'Houseparty (app)', 'Hua Chenyu', 'Huang Xiaoming', 'Huddles (app)', 'Human Rights Watch', 'Hyves', 'IGTV', 'IOS (Apple)', 'IPO', 'IRC-Galleria', 'IResearch Consulting Group', 'ISBN (identifier)', 'ISSN (identifier)', 'ITunes Ping', 'IWiW', 'IdeaPlane', 'Identi.ca', 'Idka', 'Instagram', 'Internet censorship in China', "Internet censorship in the People's Republic of China", 'Issues relating to social networking services', 'JJ Lin', 'Jackson Wang', 'Jackson Yee', 'Jaiku', 'July 2009 Ürümqi riots', 'Jyutping', 'Keek', 'Kevin Rudd', 'Koo (social network)', 'Korean language', 'Kuaishou', 'Kumu (social network)', 'Letterboxd', 'Li Yifeng', 'Lifeknot', 'Lifestreaming', 'Like button', 'Likee', 'Lin Chi-ling', 'LinkedIn', 'List of defunct social networking services', 'List of social networking services', 'List of virtual communities with more than 1 million users', 'Liu Ye (actor)', 'LiveJournal', 'Livery', 'Long March 5B', 'Luhan (singer)', 'LunarStorm', 'MX Player', 'Ma Yili', 'Macau', 'Mainland China', 'Malaysia', 'Marco Polo (app)', 'MarketWatch', 'Market capitalization', 'Mastodon (social network)', 'Me2day', 'MeWe', 'Meerkat (app)', 'Meetup', 'Mention (blogging)', 'Meta (academic company)', 'Miaopai', 'Micro.blog', 'Microblogging', 'Microblogging in China', 'Micropub (protocol)', 'Microsoft Windows', 'Migme', 'Miiverse', 'Minds (social network)', 'Misskey', 'MixBit', 'Mixi', 'Mobile device', 'Mobile social network', 'Mobli', 'Monthly active users', 'Moodle', 'Mugshot (website)', 'Multiply (website)', 'Musical.ly', 'My World@Mail.Ru', 'Myspace', 'NASDAQ', 'NK.pl', 'Narendra Modi', 'Natter (social network)', 'Natter Social Network', 'NetEase Weibo', 'Netlog', 'Nextdoor', 'Nine Percent', 'Ning (website)', 'Non-governmental organization', 'Nostr', 'OStatus', 'Odnoklassniki', 'Online dating', 'Online identity', 'Online petition', 'Open-access poll', 'OpenMicroBlogging', 'OpenSocial', 'Orkut', 'Parler', 'Path (social network)', 'Peach (social network)', 'Periscope (service)', 'Pheed', 'Photoshop', 'Piczo', 'Pillowfort', 'Pinterest', 'Pinyin', 'Pixnet', 'PlanetAll', 'Pleroma (software)', 'Plurk', 'Pornography', 'Posterous', 'Pownce', 'Prime Minister of India', 'Privacy concerns with social networking services', 'Problematic social media use', 'Professional network service', 'Promo.com', 'Pump.io', 'Qaiku', 'Qzone', 'Readgeek', 'Reblogging', 'Renren', 'ResearchGate', 'Rubber Duck (sculpture)', 'Rutgers University Press', 'S2CID (identifier)', 'ShareChat', 'Shawn Yue', 'Simplified Chinese', 'Simplified Chinese characters', 'Sina Corp', 'Sina Weibo', 'Singapore', 'SixDegrees.com', 'Skyrock (social network site)', 'Small-world experiment', 'Small-world network', 'Snapchat', 'Snow (app)', 'So.cl', 'Social media platform', 'Social media use in politics', 'Social network', 'Social network advertising', 'Social network analysis software', 'Social network hosting service', 'Social networking service', 'Social profiling', 'Sohu Weibo', 'Solaborate', 'South China Morning Post', 'Spaces (social network)', 'Spelling in Gwoyeu Romatzyh', 'Spotify Live', 'Spring.me', 'Standard Chinese', 'Stories (social media)', 'Streetlife (website)', 'StudiVZ', 'Sun Yi (actress)', 'Surfbook', 'Symbian S60', 'TFBOYS', 'TV Time', 'Tagged (website)', 'Taiwan', 'Tal Canal', 'Talkbits', 'Tank Man', 'Taringa!', 'Tbh (app)', 'Tea Party Community', 'Tencent QQ', 'Tencent Weibo', 'The Meet Group', 'Third Voice', 'Thirst trap', 'Threads (social network)', 'Tiananmen Square protests of 1989', 'Tianjin Airlines', 'TikTok', 'Tinder (app)', 'Tongyong Pinyin', 'Toshiba', 'Tout (company)', 'Traditional Chinese', 'Traditional Chinese characters', 'Tribe.net', 'Triller (app)', 'Truth Social', 'Tuenti', 'Tumblr', 'Tvtag', 'Twister (software)', 'Twitter', 'URL shortening', 'Uniform Resource Locator', 'United States', 'Untappd', 'Use of social network websites in investigations', 'User interface', 'User profile', 'VK (service)', 'Vero (app)', 'Viadeo', 'Vicky Zhao', 'Villarreal CF', 'Vine (service)', 'Violence', 'Virtual community', 'Wade–Giles', 'Wall.fm', 'Wang Sicong', 'Wayback Machine', 'WeChat', 'Web 2.0 Suicide Machine', 'Weibo (disambiguation)', 'Weibo Corporation', 'Wen Zhang', 'Whisper (app)', 'White-label product', 'Wikidata', 'Windows Live Spaces', 'Windows Mobile', 'Windows Phone', 'Wretch (website)', 'Wu Dajing', 'XING', 'XMPP', 'Xanga', 'Xiaohongshu', 'Xie Na', 'Yahoo! 360°', 'Yahoo! Kickstart', 'Yahoo! Mash', 'Yahoo! Meme', 'Yale romanization of Cantonese', 'Yammer', 'Yang Mi', 'Yao Chen', 'Yik Yak', 'Yo (app)', 'Youdao', 'Zhao Lianhai', 'Zhao Liying', 'Zhou Dongyu', 'Zhu Yilong']
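Not every entry in the links list is a topical link. The Weibo list includes citation-related pages such as 'Doi (identifier)' and 'ISBN (identifier)'. These most likely come from the reference formatting rather than from the article text. A minimal sketch can remove them before counting. It assumes that such pages always end in " (identifier)", which holds for the entries shown here but is not guaranteed for every page. The helper name `topical_links` is hypothetical.

```python
def topical_links(links):
    """Drop links to identifier pages such as 'Doi (identifier)'."""
    return [title for title in links if not title.endswith(" (identifier)")]


# A few titles taken from the Weibo links list above
sample_links = [
    "2008 Sichuan earthquake",
    "ArXiv (identifier)",
    "Doi (identifier)",
    "Hashtag",
    "ISBN (identifier)",
]
print(topical_links(sample_links))
# → ['2008 Sichuan earthquake', 'Hashtag']
```

In the notebook, `len(topical_links(page.links))` would give a count without these identifier pages.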
Here is one way to use hyperlinks: say we want to study whether the number of links on a page is related to the time people spend on that page.
We can use the links property to count the number of links embedded in each page!
# Count the hyperlinks on each page
for page_title in ["Weibo", "Twitter"]:
    links = wikipedia.WikipediaPage(page_title).links
    num_links = len(links)
    print(f"The page '{page_title}' contains {num_links} hyperlinks.")
The page 'Weibo' contains 401 hyperlinks.
The page 'Twitter' contains 911 hyperlinks.
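Testing the idea that link count relates to time spent would require a separate measure of time spent on each page, and the wikipedia package does not provide one. Suppose we collect that measure ourselves. A Pearson correlation is one simple starting point, and the sketch below implements it by hand. `pearson_r` is a hypothetical helper name, and Python 3.10+ also offers `statistics.correlation`. With only a few pages the estimate would be very unstable, so a real study would need many more.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations (the covariance numerator)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each sequence
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical usage: link_counts would come from len(page.links) for each page,
# and time_spent from your own study data (the wikipedia package does not provide it).
# r = pearson_r(link_counts, time_spent)
```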
Example
We can also compare the number of images, hyperlinks, and references for Weibo (a Chinese social media platform), Twitter (a US social media platform), and Reddit.
# Compare the number of images, references, and hyperlinks across three platforms
page_titles = ["Weibo", "Twitter", "Reddit"]
for page_title in page_titles:
    platform_page = wikipedia.WikipediaPage(page_title)
    num_images = len(platform_page.images)
    num_references = len(platform_page.references)
    num_links = len(platform_page.links)
    print(f"{page_title}'s number of images is {num_images}, "
          f"number of references is {num_references}, "
          f"and number of hyperlinks is {num_links}")
Weibo's number of images is 6, number of references is 208, and number of hyperlinks is 401
Twitter's number of images is 43, number of references is 927, and number of hyperlinks is 911
Reddit's number of images is 22, number of references is 783, and number of hyperlinks is 553
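The three metrics are on very different scales. One way to visualize the comparison is to scale each metric by its largest value across the platforms. The sketch below prints a plain-text bar chart using only the standard library, with the counts copied from the output above. The function name `text_bar_chart` and the 30-character bar width are our own choices. A plotting library such as matplotlib would give a proper figure.

```python
# Counts copied from the comparison output above
counts = {
    "Weibo": {"images": 6, "references": 208, "hyperlinks": 401},
    "Twitter": {"images": 43, "references": 927, "hyperlinks": 911},
    "Reddit": {"images": 22, "references": 783, "hyperlinks": 553},
}


def text_bar_chart(counts, metric, width=30):
    """Return one text line per platform, with bars scaled to the largest value."""
    # Avoid dividing by zero if every value is 0
    largest = max(values[metric] for values in counts.values()) or 1
    lines = []
    for platform, values in counts.items():
        value = values[metric]
        bar = "#" * round(width * value / largest)
        lines.append(f"{platform:<8} {bar} {value}")
    return lines


for metric in ["images", "references", "hyperlinks"]:
    print(metric)
    for line in text_bar_chart(counts, metric):
        print("  " + line)
```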
Those are our ideas. What do you guys think?
Wikipedia is one of the largest online encyclopedias in the world. It is helpful, but it may also affect people's perceptions of different things.
Engagement: Do platforms whose Wikipedia pages have more images and hyperlinks engage users more? Do users tend to prefer social media platforms with more images?
Trustworthiness: Are platforms with more references seen as more credible? Do users trust the accuracy of information more when a page has more references?
Exploration: Do more hyperlinks encourage users to explore related topics, leading to deeper engagement with the platform?