
Building an AI in Python for Sentiment Analysis on Economic Development Projects

Sentiment analysis of news coverage on economic development projects and the overall economy

Introduction

This article walks through code that performs sentiment analysis on news articles related to economic development. It uses the requests library to fetch articles from the New York Times API, the nltk library for text preprocessing, the sklearn library to train a Naive Bayes classifier, and the textblob library for sentiment analysis on unseen data.
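
If you want to follow along, all four libraries can be installed with pip; these are their standard PyPI package names:

pip install requests nltk scikit-learn textblob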

Step 1: Getting News Articles

In step 1, we use the requests library to make a GET request to the New York Times Article Search API. The endpoint is specified in the url variable, and the search parameters are specified in the params dictionary: the search term "economic development" and the API key. Replace your_api_key with a key obtained from the New York Times developer portal.

The API response is stored in the response variable, and the articles are parsed out of the JSON body and stored in the articles list. Finally, the code prints the headline and lead paragraph of each article.

import requests

# API endpoint for the New York Times Article Search API
url = "https://api.nytimes.com/svc/search/v2/articlesearch.json"

# Search parameters for articles containing the term "economic development"
params = {
    "q": "economic development",
    "api-key": "your_api_key"
}

# Send a GET request to the API and store the response
response = requests.get(url, params=params)

# Extract the articles from the response
articles = response.json()["response"]["docs"]

# Iterate over the articles and print their headline and lead_paragraph
for article in articles:
    print("Headline:", article["headline"]["main"])
    print("Lead Paragraph:", article["lead_paragraph"])

Result:

Headline: A Timeline of Hurricane Sandy
Lead Paragraph: A tropical wave leaves the west coast of Africa and within a week reaches the Caribbean Sea, where it develops into a hurricane by Oct. 24. It makes landfall in Jamaica, then Cuba, before passing through the Bahamas, where it increases greatly in size.
Headline: I’m No Longer Sure New York Will Protect Itself From Rising Waters
Lead Paragraph: Ten years ago, as Hurricane Sandy bore down on New York City, I was standing in knee-high, yellow rubber boots on the ground floor of the Downtown Manhattan building where I live and work. We had piled sandbags outside the front door, covered them with tarps and borrowed sump pumps in preparation. I was frightened.
Headline: As Tough Elections Loom in Turkey, Erdogan Is Spending for Victory
Lead Paragraph: Just months before pivotal elections that could reshape Turkey’s domestic and foreign policy, the government is spending billions of dollars in state funds to bolster President Recep Tayyip Erdogan and his governing party at the ballot box while unleashing an array of legal threats to weaken those who seek to unseat him.
...
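
Note that the Article Search API returns ten results per page. If you need more articles, you can request additional pages via the API's page parameter. Below is a minimal sketch of how that might look, reusing the url and params from above and checking the HTTP status code (the API is rate-limited, so a failed request is worth catching):

all_articles = []
for page in range(3):  # fetch the first three pages (10 articles per page)
    params["page"] = page
    response = requests.get(url, params=params)
    if response.status_code != 200:  # stop on errors or rate limiting
        print("Request failed with status", response.status_code)
        break
    all_articles.extend(response.json()["response"]["docs"])

print("Fetched", len(all_articles), "articles")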

Step 2: Pre-processing the Text

In step 2, we perform text preprocessing on the lead paragraph of each article. We use the nltk library for this task. The code downloads the resources required for preprocessing: the Punkt tokenizer, the stop word list, and the WordNet data used by the lemmatizer.

The code tokenizes each paragraph, removes stop words and punctuation, lemmatizes the remaining words, and joins them back into a single string. The preprocessed text is stored in the preprocessed_text field of each article.

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

# Iterate over the articles
for article in articles:
    # Tokenize the lead_paragraph
    words = word_tokenize(article["lead_paragraph"])
    # Remove stop words and punctuation
    words = [word.lower() for word in words if word.isalpha() and word.lower() not in stop_words]
    # Lemmatize the words
    words = [lemmatizer.lemmatize(word) for word in words]
    # Join the words back into a single string
    preprocessed_text = ' '.join(words)
    # Store the preprocessed text
    article["preprocessed_text"] = preprocessed_text
    print("Preprocessed Text:", preprocessed_text)

Result:

Preprocessed Text: tropical wave leaf west coast africa within week reach caribbean sea develops hurricane make landfall jamaica cuba passing bahamas increase greatly size
Preprocessed Text: ten year ago hurricane sandy bore new york city standing yellow rubber boot ground floor downtown manhattan building live work piled sandbag outside front door covered tarp borrowed sump pump preparation frightened
Preprocessed Text: month pivotal election could reshape turkey domestic foreign policy government spending billion dollar state fund bolster president recep tayyip erdogan governing party ballot box unleashing array legal threat weaken seek unseat
Preprocessed Text: close confidant new york city mayor eric adam put payroll city economic development corporation earlier year earning salary making among highest paid employee city government according city record document released late friday afternoon
Preprocessed Text: higher inflation slower growth heavy price global economy paying russia war ukraine organization economic cooperation development said tuesday
...
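
Since we will want the same cleaning logic again for unseen text, it can be handy to wrap it in a helper function. Here is a minimal sketch using the stop_words and lemmatizer objects defined above (the sample output assumes NLTK's default English stop word list):

def preprocess(text):
    # Tokenize, drop stop words and punctuation, lemmatize, and rejoin
    words = word_tokenize(text)
    words = [w.lower() for w in words if w.isalpha() and w.lower() not in stop_words]
    return ' '.join(lemmatizer.lemmatize(w) for w in words)

print(preprocess("The economy is developing faster than expected."))
# economy developing faster expected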

Step 3: Sentiment Analysis

In step 3, we perform sentiment analysis on the preprocessed text of each article. We use the SentimentIntensityAnalyzer class from the nltk library, which computes a compound sentiment score for each article.

The code then labels each article "positive", "negative", or "neutral" depending on whether its compound score is at least 0.05, at most -0.05, or in between.

Finally, the label is stored in the article's sentiment_label field and printed.

nltk.download('vader_lexicon')
from nltk.sentiment import SentimentIntensityAnalyzer

sentiment_analyzer = SentimentIntensityAnalyzer()

# Iterate over the articles
for article in articles:
    # Get the sentiment score from the SentimentIntensityAnalyzer
    sentiment_score = sentiment_analyzer.polarity_scores(article["preprocessed_text"])["compound"]
    # Assign the sentiment label based on the sentiment score
    if sentiment_score >= 0.05:
        sentiment_label = "positive"
    elif sentiment_score <= -0.05:
        sentiment_label = "negative"
    else:
        sentiment_label = "neutral"
    # Store the sentiment label
    article["sentiment_label"] = sentiment_label
    print("Sentiment Label:", sentiment_label)

Result:

Sentiment Label: positive
Sentiment Label: negative
Sentiment Label: negative
Sentiment Label: neutral
Sentiment Label: negative
...
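
To see how the articles are distributed across the three categories, a quick tally with Python's collections.Counter does the job. A small sketch using the sentiment_label field we just stored (the counts shown are illustrative):

from collections import Counter

counts = Counter(article["sentiment_label"] for article in articles)
print(counts)  # e.g. Counter({'negative': 5, 'positive': 3, 'neutral': 2})

# Headlines of the negative articles
for article in articles:
    if article["sentiment_label"] == "negative":
        print(article["headline"]["main"])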

Step 4: Training a Naïve Bayes Classifier

In step 4, we train a Naive Bayes classifier using the sklearn library. We use a bag-of-words representation of each article's preprocessed text as the feature set and a hand-assigned numeric sentiment label for each of the ten articles (1 = positive, 0 = neutral, -1 = negative) as the target.

The code splits the data into training and testing sets and trains the classifier on the training set. The code uses the MultinomialNB class from the sklearn.naive_bayes module.

Finally, the code evaluates the performance of the classifier on the testing set and prints the accuracy score.

from sklearn.feature_extraction.text import CountVectorizer

# Initialize the CountVectorizer
vectorizer = CountVectorizer()

# Fit the CountVectorizer to the preprocessed text
vectorized_text = vectorizer.fit_transform([article["preprocessed_text"] for article in articles])

# Convert the vectorized text to an array
bag_of_words = vectorized_text.toarray()
print("Bag of Words Shape:", bag_of_words.shape)

from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hand-assigned labels for the ten articles: 1 = positive, 0 = neutral, -1 = negative
labels = [1, -1, -1, 0, -1, 1, 1, 0, 1, 1]

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(bag_of_words, labels, test_size=0.2, random_state=42)

# Initialize the Naive Bayes classifier
nb_classifier = MultinomialNB()

# Fit the classifier to the training data
nb_classifier.fit(X_train, y_train)

# Predict the sentiment of the testing data
y_pred = nb_classifier.predict(X_test)

# Calculate the accuracy of the classifier
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Result:

Accuracy: 0.0

As you can see, our model is highly inaccurate. That should not be surprising: with only ten hand-labeled articles, the test set holds just two examples, which is far too little data for the classifier to learn anything meaningful. This model is the basis for an expanded one I created using many different news sources and techniques to build a much stronger AI.
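
One easy improvement over hard-coding the label list is to derive it from the sentiment_label field computed in step 3, so the labels always line up with however many articles were fetched. A minimal sketch:

# Map the step 3 labels to the numeric encoding used by the classifier
label_map = {"positive": 1, "neutral": 0, "negative": -1}
labels = [label_map[article["sentiment_label"]] for article in articles]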

Step 5: Sentiment Analysis on Unseen Data

In step 5, we use the textblob library to perform sentiment analysis on unseen data. We create a TextBlob object for each new sentence and use its sentiment.polarity attribute to get a sentiment score.

The code then labels each sentence positive, negative, or neutral based on the sign of its polarity score and prints the text alongside its predicted sentiment.

# Step 5: sentiment analysis on unseen data
from textblob import TextBlob

# New, unseen data
new_data = [
    "The government's recent economic policies have had a positive impact on the country's growth.",
    "The government's recent economic policies have had a negative impact on the country's growth.",
    "Many people are angry that they can't make ends meet despite the announcement.",
    "The current state of the economy is uncertain and causing concern among investors.",
    "XYZ Corp is making strives in economic development",
    "XYZ Corp is falling behind in developments"
]

# Predict the sentiment of the new data
for text in new_data:
    blob = TextBlob(text)
    sentiment = blob.sentiment.polarity
    if sentiment > 0:
        print(f"Text: {text}\nSentiment: Positive\n")
    elif sentiment < 0:
        print(f"Text: {text}\nSentiment: Negative\n")
    else:
        print(f"Text: {text}\nSentiment: Neutral\n")

Result:

Text: The government's recent economic policies have had a positive impact on the country's growth.
Sentiment: Positive

Text: The government's recent economic policies have had a negative impact on the country's growth.
Sentiment: Negative

Text: Many people are angry that they can't make ends meet despite the announcement.
Sentiment: Neutral

Text: The current state of the economy is uncertain and causing concern among investors.
Sentiment: Neutral

Text: XYZ Corp is making strives in economic development
Sentiment: Positive

Text: XYZ Corp is falling behind in developments
Sentiment: Negative

As you can see, our AI does a decent job of predicting the sentiment of a statement on economic development and even the broader economy.
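
For comparison, the Naive Bayes classifier from step 4 can score the same sentences, as long as they are cleaned and vectorized the same way as the training data. Here is a sketch that assumes vectorizer, nb_classifier, and the preprocess helper from step 2 are still in scope; with only ten training articles, its predictions should be taken with a large grain of salt:

# Preprocess and vectorize the unseen sentences with the *fitted* vectorizer
new_vectors = vectorizer.transform([preprocess(text) for text in new_data])

# Predict with the trained classifier (1 = positive, 0 = neutral, -1 = negative)
for text, label in zip(new_data, nb_classifier.predict(new_vectors)):
    print(f"Text: {text}\nNaive Bayes label: {label}\n")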

Conclusion

  • We first import the necessary libraries and download the required NLTK resources.

  • Then we use the New York Times Article Search API to retrieve articles related to “economic development”. The response from the API is stored as a JSON object, from which the articles are extracted and processed.

  • The text of each article’s lead paragraph is preprocessed by tokenizing the words, removing stop words and punctuation, and lemmatizing the remaining words.

  • Sentiment analysis is performed on the preprocessed text using the SentimentIntensityAnalyzer from the NLTK library. A compound sentiment score is calculated for each article, which is then labeled “positive”, “negative”, or “neutral”.

  • A bag of words representation is created using the CountVectorizer from scikit-learn.

  • The Naive Bayes classifier is trained using the bag of words representation and the sentiment labels, and the accuracy of the classifier is calculated on a test set.

  • Finally, the code uses TextBlob to perform sentiment analysis on some new, unseen data.

Thanks for reading!

-Nick
