Or how I scraped the web and did some amateur analysis.
As I have said in the past, I am no pro. I do all this for fun, and my methodologies can definitely be questionable. Please question them by commenting below.
Step 1: Getting the data.
All in all, I was able to scrape 13000 to 14000 tweets off each of their feeds. This corpus, I thought, was enough. Also, this process was kind of time-consuming, and I got bored.
Each tweet object contained the following data:
- A timestamp for when the tweet was posted.
- The number of retweets.
- The number of likes.
- The actual text of the tweet (Obviously).
- And a few other details I did not use.
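To make the later steps easier to follow, here is a rough sketch of how one such tweet could be held in Python. The class name and field names are my own placeholders, not what the scraping library actually returns, and the sample record is made up.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tweet:
    """One scraped tweet, reduced to the fields used in this post."""
    timestamp: datetime  # when the tweet was posted
    retweets: int        # number of retweets
    likes: int           # number of likes
    text: str            # the actual text of the tweet


# A made-up example record, not real data.
sample = Tweet(
    timestamp=datetime(2017, 12, 18, 20, 30),
    retweets=12,
    likes=34,
    text="#BREAKING | Something happened #ExampleTag",
)

# The ISO week number is what the weekly charts later in the post are keyed on.
sample_week = sample.timestamp.isocalendar()[1]
```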
I used Python for this. There are a lot of libraries that you can scrape Twitter with. I usually use tweepy. For this project I tried something called twitterscraper. It is a very good library. You can read its README to know more about it.
The data I have ranges from week 40ish of this calendar year to week 51. There are 13819 tweets from Republic TV and 11503 tweets from Times Now.
Step 2: Mucking around.
Hashtags and wordclouds.
Indian news is obsessed with hashtags. At least these two channels are, with hashtags flying around the screen like there is no tomorrow.
So, I thought looking at these would be a good start. I started by extracting the hashtags from the tweets. This was done easily by the power of regular expressions.
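The extraction step can be sketched roughly like this. The pattern and the function name are assumptions on my part (the exact regex used is not recorded here), and the three tweets are made-up examples.

```python
import re
from collections import Counter

# Assumed pattern: a '#' followed by letters, digits or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")


def count_hashtags(texts):
    """Count every hashtag (without the leading '#') across all tweet texts."""
    counts = Counter()
    for text in texts:
        counts.update(HASHTAG_RE.findall(text))
    return counts


# Made-up example tweets, not real data.
tweets = [
    "#BREAKING | Results are in #Dec18WithArnab",
    "Tune in tonight #Dec18WithArnab",
    "No hashtag in this one",
]
hashtag_counts = count_hashtags(tweets)
print(hashtag_counts.most_common(2))
```

A frequency table like `hashtag_counts` is also exactly the kind of input a wordcloud needs, since each word is sized by its count.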
At first glance, I noticed that Republic tends to keep using one hashtag multiple times, whereas Times Now uses a more varied array of hashtags. A good way to visualise this, I thought, would be a wordcloud. In a wordcloud, the size of each word is proportional to its frequency in a corpus. These are the results I got:
Times Now wordcloud
It is very clear from the wordcloud of Times Now that they do use a lot of varied hashtags. #Dec18WithTimesNow is one that stands out. Another one that stands out is #ModiUnstoppable.
These were their most used hashtags:
- Dec18WithTimesNow, 869
- BREAKING, 446
- ModiUnstoppable, 333
- UPANotGuilty, 247
- TNExclusive, 207
- ModiInvincible, 153
- CongPFIBhakt, 112
- TamilNaduDiaryGate, 112
- StandWithAnthem, 109
- VadraJawaabDo, 103
For those interested, the next 40. Unformatted because lazy.
(‘ModiMidTermPoll’, 103), (‘AarushiWantsJustice’, 99), (‘RahulNeechPolitics’, 96), (‘VadraTape’, 87), (‘NDABacksDialogue’, 84), (‘SriSriMandirMove’, 84), (‘HafizSaeedConfessions’, 83), (‘TarmacTerrorTape’, 83), (‘WATCH’, 81), (‘StandForAnthem’, 81), (‘RahulEraDawns’, 80), (‘GujaratModiVerdict’, 79), (‘AAPKiAag’, 79), (‘CongBetrayedRam’, 77), (‘RahulSeparatistBhakt’, 77), (‘RaGaSomnathSelfGoal’, 77), (‘AreYouSeriousRahul’, 76), (‘MersalVendetta’, 75), (‘HadiyaConversionTwist’, 74), (‘HumanityTowedAway’, 73), (‘DeepikaThreatened’, 72), (‘CongVsMandir’, 71), (‘KoiBaatNahiCops’, 69), (‘RagaRiggedPollTape’, 67), (‘SoniaTejpalQuidProQuo’, 65), (‘DeepikaVsSena’, 63), (‘VadraTicketGate’, 62), (‘ZakirBackOnTV’, 61), (‘RaGaShortCircuit’, 61), (‘ShahDaresLeft’, 60), (‘SabkaSardar’, 59), (‘CondomCurfew’, 59), (‘DeMoTaxRats’, 59), (‘Modinomics’, 58), (‘CMOnlyForMuslims’, 58), (‘RKNagarCashForVotes’, 57), (‘TripleTalaqBill’, 57), (‘CowSlaughterCruelty’, 57), (‘HeadScarfDebate’, 57), (‘KnowYourCandidates’, 57)
Clearly, unlike Republic, they have not tried creating a cult around their main anchor. After Arnab left Times Now, they probably learnt that a brand is bigger than a person. Also, the clear tilt towards our glorious leader Modi is apparent from the frequent appearances of hashtags like ModiInvincible, Modinomics and ModiUnstoppable. They are clearly not huge fans of Rahul Gandhi and the Congress either, going by hashtags like CongBetrayedRam, RahulNeechPolitics and PappuCensored. Keen readers would also notice how disproportionately often the Congress as a party appears in the hashtags. We will quantify this later in the article. Frankly, this is a great wordcloud, and I recommend everyone to take a magnifying glass and observe it.
Republic TV wordcloud
Republic TV, the new kid on the block. The “Independent” media we have all dreamt of and yearned for. The true messiah saving us from the clutches of the “Lutyens” circles.
Their hashtags are way less varied than those of Times Now. Arnab is very prominent in their most tweeted hashtags (not surprising, considering that the whole channel is where it is because of his popularity).
When Republic starts using a hashtag, they truly beat it to death. These are their top 10 most used hashtags with their frequency:
- BREAKING, 820
- Dec18WithArnab, 506
- RepublicAppLaunch, 223
- ScamOfScams, 211
- CorruptionHighOrLow, 191
- AreHindusSoftTarget, 186
- WontForgetScams, 185
- MallyaTicketgate, 180
- ChurchVsNationalists, 176
- CongNeechPolitics, 161
These are the next 40 for those who are interested. I was too lazy to format.
(‘PadmavatiFight’, 157), (‘NetasForVIPs’, 157), (‘Shocker2GVerdict’, 156), (‘RamMandirDebate’, 154), (‘ModiBigChanges’, 143), (‘AnthemFirstNoCompromise’, 140), (‘GujaratVotes’, 132), (‘AmitShahSpeaksToArnab’, 130), (‘SoniaLetterLeaked’, 130), (‘JaitleySpeaksToArnab’, 129), (‘CongChaiwalaAttack’, 126), (‘LutyensAyodhyaFormula’, 120), (‘ClericsBackChildMarriage’, 120), (‘WhoKilledAarushi’, 119), (‘GujaratHinduCard’, 118), (‘AreHindusTargeted’, 117), (‘WhoCommunalisedPolitics’, 117), (‘OneNationOnePoll’, 117), (‘WhoDumpedVikas’, 116), (‘IndiaWillGetSaeed’, 114), (‘BiggestBoforsInterview’, 112), (‘HrithikSpeaksToArnab’, 112), (‘SmritiVsRahul’, 107), (‘BoforsPakistanLink’, 106), (‘SunandaMailTrail’, 103), (‘NationalAnthemDebate’, 100), (‘CashForJustice’, 100), (‘NewIndiaPlan’, 99), (‘IndiaAgainstVVIPNetas’, 99), (‘MallyaNamesPawar’, 97), (‘PoliticsOverVeterans’, 94), (‘IndiaBacksVinodRai’, 94), (‘FringeVsPadmavati’, 92), (‘PiyushSpeaksToArnab’, 88), (‘SackSangeetSom’, 88), (‘ModiFaithAttacked’, 86), (‘RahulBotAttack’, 85), (‘RahulMughalEmperor’, 84), (‘SriSriMandirMeet’, 83), (‘BoforsOpened’, 83)
Republic shoehorns BREAKING into a lot of their tweets, hence the top spot. Here also the tilt against Congress and Rahul is very apparent. There are absolutely no negative hashtags about Modi in the top 50. Also, when it comes to the nation, our man does not muck about one bit.
Who is more popular a topic, Modi or Rahul?
Honestly, with all this coverage, it seems that these news channels are the ones keeping Rahul Gandhi and the Congress relevant. I made these graphs that support my findings.
From these charts it is clear that Rahul is way more of a person of interest than Modi.
Week 42 and Week 46 are clear anomalies, as Modi far outshines Rahul in those weeks. This can be justified by looking at what happened during those weeks.
Week 42: That was when PM Modi visited his home state of Gujarat and made some characteristically fiery speeches that captured quite a few headlines.
Week 46: This was when Moody's bumped up India's credit rating, which was widely covered. A lot of credit was given to Modi, hence he trended.
I was not able to figure out why week 45 was also a light spot. Week 47 onwards was the election hype, with Rahul Gandhi given way more coverage. Republic used practically no Modi hashtags during that period.
Overall, some mildly interesting facts were unearthed.
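For the curious, the weekly Modi versus Rahul comparison above can be approximated with something like the sketch below. It counts hashtags that contain a name, grouped by ISO week number. The keyword lists are assumptions (the exact spellings matched are not recorded here, and "raga" could catch unrelated tags), and the sample tweets are made up.

```python
import re
from collections import Counter
from datetime import datetime

HASHTAG_RE = re.compile(r"#(\w+)")

# Assumed keyword lists, matched case-insensitively inside each hashtag.
KEYWORDS = {"Modi": ("modi",), "Rahul": ("rahul", "raga")}


def weekly_mentions(tweets):
    """tweets: iterable of (timestamp, text) pairs.

    Returns a Counter keyed by (iso_week, person).
    """
    counts = Counter()
    for timestamp, text in tweets:
        week = timestamp.isocalendar()[1]
        for tag in HASHTAG_RE.findall(text):
            lowered = tag.lower()
            for person, needles in KEYWORDS.items():
                if any(needle in lowered for needle in needles):
                    counts[(week, person)] += 1
    return counts


# Made-up example tweets, not real data.
sample_tweets = [
    (datetime(2017, 10, 17, 9, 0), "Speech today #ModiUnstoppable"),
    (datetime(2017, 10, 18, 21, 0), "Debate at 9 #ModiInvincible #RahulNeechPolitics"),
    (datetime(2017, 12, 18, 10, 0), "#RaGaShortCircuit"),
]
mentions = weekly_mentions(sample_tweets)
```

Plotting `mentions` as two lines per channel, one for each name, gives the kind of week-by-week chart discussed above.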
Common words and phrases.
N-grams are just sequences of n consecutive words. So, in “My name is Advait”, the bigrams would be “My name”, “name is”, “is Advait”. That’s what they basically are.
I figured that finding the most common unigrams (i.e., the most common words) and the most common bigrams in the data would be interesting. I went one step ahead and plotted them for the viewing pleasure of y’all folk. Please note that I removed all stop words (words like ‘is’, ‘a’, ‘an’ etc. They don’t convey much meaning.) and punctuation to make the data clearer.
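Here is a minimal sketch of that counting step, using only the standard library. The stop-word list is a tiny illustrative one and the function names are placeholders; a real run would use a fuller stop-word list.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list, not the one actually used.
STOP_WORDS = {"is", "a", "an", "the", "of", "to", "and", "in", "on"}


def tokenize(text):
    """Lowercase the text, drop punctuation, and remove stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(texts, n):
    """Count n-grams tweet by tweet, so no n-gram spans two tweets."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(tokenize(text), n))
    return counts


# The example from the text, without any stop-word removal.
example_bigrams = ngrams("My name is Advait".split(), 2)

# Made-up example tweets, not real data.
texts = ["Rahul Gandhi is in Gujarat", "Rahul Gandhi speaks to the press"]
bigram_counts = count_ngrams(texts, 2)
```

One design choice worth knowing about: because stop words are removed before the bigrams are formed, a bigram can join two words that were separated by a stop word in the original tweet.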
Times Now - Most common words
Times Now most common bigrams.
As far as the most common single word goes, BJP scores high with them, blazing past the likes of Congress and PM. Honestly, I did not find the unigrams too interesting, as a single word does not convey too much meaning.
The bigrams are far more interesting. It is not very surprising that their most popular phrase is ‘TIMES NOW’, since everyone self promotes. Now here is the interesting bit: a close runner-up is ‘Rahul Gandhi’, which appears much, much more often than ‘PM Modi’. This just backs the fact that it is the news channels that give way too much attention to him.
Now let’s replicate this for Republic TV.
Republic TV most common words
Here again, I think the unigrams don’t convey much, but I put them up anyway to fill in the pages. Here is the good stuff:
This is quite a bit more interesting than the previous graph. We can see many variations of ‘send us your views’, with the most popular one being “fire views”?
Also Sambit Patra seems to be super popular with this channel. Not surprising since he is a panel regular at the debates, and he and Arnab seem to know each other from his Times Now days.
Also unsurprising is the fact that Rahul Gandhi is the first non-promotional bigram that appears on the list. From all this it is easy to conclude that Rahul generates many clicks and TRP. In hindsight, I should have put his name in the title of this blog for more views.
F-scores and Stunning charts.
This is an interesting corpus. It would be nice to see what words make this corpus truly what it is, or, more simply put, what words and phrases are more characteristic of one category than the other.
To find terms of importance, I used a scaled F-score. It is basically a way of scoring how characteristic a term is of one category compared to the other. After computing all the F-scores and the term frequencies, this can be plotted:
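To give a feel for the idea, here is a simplified sketch of one common way to build a scaled F-score. For each term it takes precision (how exclusively the term belongs to one channel) and frequency share (how often that channel uses it), squashes both through a normal CDF so they sit on the same 0 to 1 scale, and combines them with a harmonic mean. This is an assumption about the general technique, not the exact formula or library behind the chart, and the counts are made up.

```python
from collections import Counter
from statistics import NormalDist, mean, pstdev


def normal_cdf_scale(values):
    """Map raw values to (0, 1) using a normal CDF fitted to those values."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # All values identical: nothing to distinguish, so use the midpoint.
        return [0.5] * len(values)
    dist = NormalDist(mu, sigma)
    return [dist.cdf(value) for value in values]


def scaled_f_scores(category_counts, other_counts):
    """Score each term by how characteristic it is of `category_counts`.

    Assumes every term has a positive count in at least one of the two inputs.
    """
    terms = sorted(set(category_counts) | set(other_counts))
    total = sum(category_counts.values())
    precision = [
        category_counts.get(t, 0)
        / (category_counts.get(t, 0) + other_counts.get(t, 0))
        for t in terms
    ]
    share = [category_counts.get(t, 0) / total for t in terms]
    scores = {}
    scaled = zip(terms, normal_cdf_scale(precision), normal_cdf_scale(share))
    for term, p, s in scaled:
        # Harmonic mean of the two scaled values.
        scores[term] = 2 * p * s / (p + s) if p + s else 0.0
    return scores


# Made-up term counts, not real data.
republic = Counter({"arnab": 50, "debate": 40, "rahul": 30})
times_now = Counter({"tnexclusive": 45, "debate": 38, "rahul": 28})
republic_scores = scaled_f_scores(republic, times_now)
```

With these toy counts, `arnab` comes out as the most characteristic Republic term and `tnexclusive` as the least, because the harmonic mean only rewards terms that are both frequent and exclusive.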
Don’t be intimidated by the beauty of this chart. It is very interesting, and not too hard to understand. At first glance though, I do admit it looks like one of those fancy meaningless ones you’d find in the annals of r/dataisbeautiful.
The y-axis is the frequency of a term as used by Times Now, and the x-axis is the frequency of the term as used by Republic TV.
For Republic TV, the most frequent yet characteristic term would be arnab. Wow, what a surprise. Look at the bottom right of the chart for more such terms.
On the flip side, we can see such terms used by Times Now in the top left corner of the chart. Here the theme of not deifying an anchor continues, with tnexclusive showing up rather than the name of a primetime anchor.
The top right of the chart is very interesting. These are the terms that are characteristic of both the Times Now dataset and the Republic TV dataset. The most prominent terms in that region are debate (duh, because who does not like a good shout fest), rahul (that theme continues) and bjp.
You can find more such interesting tidbits by actually interacting with the chart. I do have an interactive version, where you can find term occurrences and search terms right here.
On the right side, we find characteristic terms listed out for both Times Now and Republic TV.
We can see which news anchors are given priority over the others here in the Times Now list. Also, Dr Sambit is high on the characteristic-city of Republic, featuring prominently everywhere.
The “Characteristic” list is the list of characteristic terms of both the datasets combined. There are more Republic-related terms here because of the simple fact that Republic tweets way more than Times Now. Also, they repeat themselves a lot. The usuals, ‘rahul’, ‘modi’, ‘mallya’ and ‘patra’, feature on this list. This is a fun graph, and if you actually download it from the link, note that it can take some time to load in your browser as it is a relatively large file (4.9 MB).
Retweets and Likes.
In this section I could not figure out how to present the data in any fancy way. Honestly, there isn’t much in the data either. Here are some boring facts:
Republic TV:
Total likes : 704150 Average likes : 50.955206599609234
Total retweets : 283275 Average retweets : 20.498950720023156
Tweets analyzed : 13819
Times Now:
Total likes : 765255 Average likes : 66.52655828914196
Total retweets : 318855 Average retweets : 27.719290619838304
Tweets analyzed : 11503
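Numbers like these are just sums and divisions, and a quick sketch like the one below reproduces the averages from the totals quoted above. The second helper, which measures how much of the total comes from the n biggest tweets, is my own addition for illustration, and the function names are placeholders.

```python
def engagement_summary(values):
    """Return (total, average) for a list of per-tweet likes or retweets."""
    total = sum(values)
    return total, total / len(values)


def top_share(values, n):
    """Fraction of the total contributed by the n largest values."""
    total = sum(values)
    if total == 0:
        return 0.0
    return sum(sorted(values, reverse=True)[:n]) / total


# Sanity check using the totals and tweet counts quoted above.
republic_avg_likes = 704150 / 13819
times_now_avg_likes = 765255 / 11503
print(round(republic_avg_likes, 2), round(times_now_avg_likes, 2))
```

Calling `top_share(likes, 150)` on the real per-tweet like counts would show how concentrated the engagement is in the most popular tweets.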
From this it is evident that Times Now's social media outreach on Twitter far exceeds that of Republic TV. Despite having over 2000 fewer tweets, Times Now has many more likes and retweets than Republic TV. But we should not forget the fact that Times Now has many more followers on Twitter than Republic. There is quite the follower burn that is to be expected of an account that tweets so often. Also, most of their likes and retweets come from their top 150ish tweets. This is evident in these completely unnecessary charts:
Let us have a look at Times Now's top five most liked tweets:
- "Rahul Gandhi could not even win municipal elections in his own constituency Amethi, says BJP president @AmitShah speaking with @navikakumar #FranklySpeakingWithShah", 3293 likes
- "This protocol that PM cannot sit with a foreign pilot but can have a foreign wife, this, I don't understand: Sambit Patra, Spokesperson, BJP #Dec18WithTimesNow", 3086 likes
- "6 hours after Gujarat loss, Rahul Gandhi was watching ‘Star Wars’ at a cinema hall in Delhi. #AreYouSeriousRahul Watch @thenewshour with @navikakumar", 2506 likes
- "HARD FACT: 87% Christians in Mizoram have been given minority status despite Hindu's being at 2.75% #HinduRightsBoost", 2245 likes
- "I would like PM to take lesson from this and set up a warlike council for fighting corruption: @Swamy39, BJP #UPANotGuilty", 2192 likes
3 out of the 5 tweets have a very clear anti-Congress tilt, which makes sense as they are not a particularly popular party and have countless flaws. But we as a people are huge fans of whataboutism.
These are their most retweeted tweets:
- "HARD FACT: 87% Christians in Mizoram have been given minority status despite Hindu's being at 2.75% #HinduRightsBoost", 2463
- "Big step to conflict resolution. ‘Positive of outcome soon’ #SriSriMandirMove", 1604
- "This is how AAP responded to TIMES NOW's expose on #AAPHallOfShame", 1583
- "Congress ally Jignesh Mewani tries wooing Muslim voters, asks crowd to chant Allah-Uh-Akbar but the crowd hits back with Modi Chants #AllahRamRaGa", 1334
- "Hard Facts: 19:00 PM Modi conducted post-mortem on Gujarat election results; at 19:40, Rahul Gandhi watched Star Wars at a cinema hall #AreYouSeriousRahul", 1210
I don’t know what to make of this. Very pro incumbent government.
Now for Republic TV. Most liked :
- "#AmitShahSpeaksToArnab | A significant portion of Gujarat was a dark zone. Now, it isn't. Narmada's waters are reaching many parts: Amit Shah", 2835
- "#AmitShahSpeaksToArnab |I don't dismiss anybody. Every politician has their own standing in an election. But our track record speaks for itself and the Gujarat public will vote for us based on the work we have done: Amit Shah", 2812
- "#AmitShahSpeaksToArnab | WATCH: Amit Shah on the Congress' 'Chaiwala' meme attack at the Prime Minister ", 2344
- '#CongNeechPolitics | Arnab: I want Rahul Gandhi to see this video and comment on it right now. How proud does this make you? @OfficeOfRG', 2336
- 'WATCH the full #AmitShahSpeaksToArnab here [link]', 2134
Most retweeted :
- '#CongNeechPolitics | Arnab: I want Rahul Gandhi to see this video and comment on it right now. How proud does this make you? @OfficeOfRG', 1660
- 'Where do you stand on the #FirecrackerDebate?', 1289
- "#RepublicWatchesPadmavati | We've watched Padmavati. Get the real story here", 1229
- '#SoniaLetterLeaked | Proven: Sonia Gandhi interfered directly in the Tehelka investigation [link]', 1119
- '#BiggestBoforsInterview | Watch how GujaratCongress Chief Bharatsinh Solanki manhandled Republic TV crew for asking questions on Bofors', 1088
I always was of the opinion that the duty of the media was to keep the incumbent government on its toes, and bring to the forefront important stories that would bring a positive change to the citizens. These media houses don’t seem to share my opinion.
Tweet frequencies and heatmaps
These are a few heatmaps I generated to find tweeting patterns of these channels.
First, I wanted to look at what days they tweeted the most. Here is a heatmap of their tweet frequencies over the days of the week for all twelve weeks of data:
Week 51 Monday and Thursday were on fire for obvious reasons - Gujarat Elections. Other than that, it can be seen that on an average, their tweet frequency reduces as the week passes by.
Here are individual heatmaps. Note that week 40 Mon and Tue data is missing for Republic.
We can notice that Republic TV tries to keep its tweet frequency constant (also, they just tweet more), but the waning of tweet frequency is very evident in the Times Now heatmap. What would you do with this information? Nothing. This is very, I repeat very, useless information.
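The numbers behind a week-by-weekday heatmap like the ones above are just a grid of counts. Here is a minimal sketch of building that grid from tweet timestamps; the function names are placeholders, the timestamps are made up, and the actual plotting is left to whichever charting library you like.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def week_by_day_counts(timestamps):
    """Count tweets per (ISO week number, weekday name)."""
    counts = Counter()
    for ts in timestamps:
        _, week, weekday = ts.isocalendar()  # weekday: 1 = Monday, 7 = Sunday
        counts[(week, DAYS[weekday - 1])] += 1
    return counts


def as_grid(counts, weeks):
    """Rows are weeks, columns are Mon to Sun; missing cells become 0."""
    return [[counts[(week, day)] for day in DAYS] for week in weeks]


# Made-up timestamps, not real data.
stamps = [
    datetime(2017, 12, 18, 9, 0),
    datetime(2017, 12, 18, 21, 0),
    datetime(2017, 12, 21, 10, 0),
]
grid = as_grid(week_by_day_counts(stamps), weeks=[51])
```

Cells with no data simply show up as zeros, so days with missing data look the same as days with no tweets unless you mask them separately.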
A nicer heatmap would be their distribution throughout the day over the week. Here is what I got:
Republic TV really ups its tweet game in the 8PM to 12AM slot, because my man Arnab is on screen. There is a very steady increase in the number of tweets throughout the day. Their Sunday afternoon slot is also pretty loaded because of their strong weekend programming game (like that Anupam Kher show).
Times Now's tweeting pattern is extremely similar to that of Republic TV. Their lack of debates in weekend primetime slots is also evident. They do have a bright spot during the 12-2 slot on Saturday, which I am guessing is because they want to promote their show India Upfront.
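One thing to watch with hour-of-day heatmaps is the timezone. The sketch below assumes the scraped timestamps are in UTC and shifts them to Indian Standard Time before bucketing by weekday and hour. That UTC assumption is mine and should be checked against the actual data; the function names and timestamps are made up too.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
IST = timezone(timedelta(hours=5, minutes=30))  # Indian Standard Time


def day_by_hour_counts(timestamps_utc):
    """Count tweets per (weekday name, hour of day) in IST.

    Expects timezone-aware datetimes.
    """
    counts = Counter()
    for ts in timestamps_utc:
        local = ts.astimezone(IST)
        counts[(DAYS[local.weekday()], local.hour)] += 1
    return counts


def busiest_hour(counts, day):
    """Return the hour (0 to 23) with the most tweets on the given weekday."""
    return max(range(24), key=lambda hour: counts[(day, hour)])


# Made-up timestamps, not real data.
stamps_utc = [
    datetime(2017, 12, 18, 15, 0, tzinfo=timezone.utc),   # 20:30 IST, Monday
    datetime(2017, 12, 18, 15, 20, tzinfo=timezone.utc),  # 20:50 IST, Monday
    datetime(2017, 12, 18, 19, 0, tzinfo=timezone.utc),   # 00:30 IST, Tuesday
]
hour_counts = day_by_hour_counts(stamps_utc)
```

Without the shift, the 8PM to 12AM primetime block would show up five and a half hours early and partly on the wrong day.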
The Grand Conclusion.
So what? What is so surprising?
Frankly, there is not much new here. The graphs look nice. I did try some sentiment analyses, but it was quite inaccurate and I had to toss that.
Why did I select Times Now and Republic?
They are loud channels that claim to be the best. They are what I would call the pioneers of the new age of crass journalism. Very puerile debates and little to no subtlety. Honestly, I hate these channels and their brand of journalism, so please do consider my biases while reading the article. Maybe I will do NDTV in the future.
If you liked what you read, please share this. Also, if you disliked or disagree with what you read, feel free to tell me why. I invite all sorts of constructive criticism.