
Commit f67c3af

initial commit
Release of version 1.0.2 of the library on PyPI
1 parent d923b98 commit f67c3af

17 files changed: +586 −0 lines changed

PerSent.egg-info/PKG-INFO

Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
Metadata-Version: 2.4
Name: PerSent
Version: 1.0.2
Summary: Persian Sentiment Analysis Toolkit
Home-page: https://github.com/RezaGooner/PerSent
Author: RezaGooner
Author-email: RezaAsadiProgrammer@Gmail.com
Keywords: persian sentiment analysis nlp
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: hazm>=0.7.0
Requires-Dist: gensim>=4.0.0
Requires-Dist: scikit-learn>=1.0.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: tqdm>=4.62.0
Requires-Dist: joblib>=1.1.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# PerSent - Persian Sentiment Analyzer
[![فارسی](https://img.shields.io/badge/Persian-فارسی-blue.svg)](README.fa.md)

![PerSent Logo](https://github.com/user-attachments/assets/6bb1633b-6ed3-47fa-aae2-f97886dc4e22)

## Introduction
PerSent is a Python library for Persian sentiment analysis; the name stands for "Persian Sentiment Analyzer". Currently in its early testing phase, PerSent provides tools for analyzing sentiment in Persian text, particularly useful for product reviews and service feedback.

## Features
- Sentiment classification into three categories:
  - `recommended`
  - `not_recommended`
  - `no_idea`
- Single text analysis
- Batch processing from CSV files
- Summary report generation

## Installation
Install the latest version using pip:

```bash
pip install PerSent
```

For a specific version:

```bash
pip install PerSent==<VERSION_NUMBER>
```

## Basic Usage
### Single Text Analysis
```python
from PerSent import CommentAnalyzer

# Initialize the analyzer
analyzer = CommentAnalyzer()

# Load the pre-trained model
analyzer.load_model()

# Analyze a text
text = "کیفیت عالی داشت"
result = analyzer.predict(text)
print(f"Sentiment: {result}")
# Output: Sentiment: recommended
```
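
`CommentAnalyzer` also takes a `model_dir` argument (it defaults to `PerSent/model` in `CommentAnalyzer.py`), and `save_model()`/`load_model()` read `classifier.joblib` and `word2vec.model` from that directory. A minimal sketch of loading from a custom path, assuming a model was previously saved there (the path is illustrative):

```python
from PerSent import CommentAnalyzer

# Load classifier.joblib and word2vec.model from a custom directory (illustrative path)
analyzer = CommentAnalyzer(model_dir="my_models/persent")
analyzer.load_model()
print(analyzer.predict("کیفیت عالی داشت"))
```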

### Training Your Own Model
```python
# Train the model using a CSV file containing:
# - Comments
# - Recommendation status (recommended/not_recommended/no_idea)
analyzer.train("train.csv")
```
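
Per `train()` in `CommentAnalyzer.py`, the training CSV needs a `body` column with the comment text and a `recommendation_status` column holding one of the three labels; `train()` returns the accuracy on a held-out split. A minimal sketch that builds a toy `train.csv` with pandas (the rows are illustrative):

```python
import pandas as pd

# Toy training file with the two columns train() expects (rows are illustrative)
pd.DataFrame({
    "body": ["کیفیت عالی داشت", "اصلا راضی نبودم", "فرقی با بقیه نداشت"],
    "recommendation_status": ["recommended", "not_recommended", "no_idea"],
}).to_csv("train.csv", index=False, encoding="utf-8-sig")

accuracy = analyzer.train("train.csv")  # returns accuracy on the test split
print(f"Accuracy: {accuracy:.2%}")
```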

## Batch Processing
### CSV Processing

```python
analyzer.csvPredict(
    input_csv="comments.csv",
    output_path="results.csv"
)
```

### Advanced CSV Processing Options
```python
# Using a column index
analyzer.csvPredict("comments.csv", "results.csv", None, 0)

# Using a column name
analyzer.csvPredict("comments.csv", "results.csv", None, "Comments")

# With a summary report
analyzer.csvPredict("comments.csv", "results.csv", "summary.csv")
```
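
`csvPredict` appends a `sentiment` column to the input rows, and when `summary_path` is given it writes a report with `Category`, `Count`, and `Percentage` columns (see `_generate_summary` in `CommentAnalyzer.py` below). A minimal sketch of inspecting both outputs with pandas:

```python
import pandas as pd

# results.csv is the input plus a new 'sentiment' column
results = pd.read_csv("results.csv")
print(results["sentiment"].value_counts())

# summary.csv holds Category / Count / Percentage rows
print(pd.read_csv("summary.csv").to_string(index=False))
```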

## Dataset
A sample training dataset is available:
[Download Dataset](https://github.com/RezaGooner/Sentiment-Survey-Analyzer/tree/main/Dataset/big_train)

## Contribution
We welcome contributions and feedback:

- [Fork the repository & open a pull request](https://github.com/RezaGooner/PerSent/fork)
- [Open an issue](https://github.com/RezaGooner/PerSent/issues/new)
- E-mail: `RezaAsadiProgrammer@gmail.com`
- Telegram: `@RezaGooner`

PerSent.egg-info/SOURCES.txt

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
README.md
setup.py
PerSent/CommentAnalyzer.py
PerSent/__init__.py
PerSent.egg-info/PKG-INFO
PerSent.egg-info/SOURCES.txt
PerSent.egg-info/dependency_links.txt
PerSent.egg-info/requires.txt
PerSent.egg-info/top_level.txt
PerSent/model/classifier.joblib
PerSent/model/word2vec.model

PerSent.egg-info/dependency_links.txt

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@

PerSent.egg-info/requires.txt

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
hazm>=0.7.0
gensim>=4.0.0
scikit-learn>=1.0.0
pandas>=1.3.0
tqdm>=4.62.0
joblib>=1.1.0

PerSent.egg-info/top_level.txt

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
PerSent

PerSent/CommentAnalyzer.py

Lines changed: 199 additions & 0 deletions
@@ -0,0 +1,199 @@
# Import the necessary libraries
import pandas as pd
from hazm import Normalizer, word_tokenize, Stemmer, stopwords_list
import re
from tqdm import tqdm
from gensim.models import Word2Vec
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import os
import joblib

class CommentAnalyzer:
    def __init__(self, model_dir='PerSent/model'):
        self.normalizer = Normalizer()
        self.stemmer = Stemmer()
        self.stopwords = set(stopwords_list())
        self.model_dir = model_dir
        self.vectorizer = None
        self.classifier = None

        # Create the model directory if it does not exist
        os.makedirs(self.model_dir, exist_ok=True)

    def _preprocess_text(self, text):
        """Preprocess Persian text."""
        # Normalize
        text = self.normalizer.normalize(str(text))

        # Remove digits and punctuation (hyphen placed last so it is not read as a range)
        text = re.sub(r'[!()\[\]{};:\'",؟<>./?@#$%^&*_~۰-۹\d-]+', ' ', text)
        text = re.sub(r'\s+', ' ', text).strip()

        # Tokenize, drop stopwords, and stem
        tokens = word_tokenize(text)
        processed_tokens = [
            self.stemmer.stem(token)
            for token in tokens
            if token not in self.stopwords and len(token) > 1
        ]

        return processed_tokens

    def _sentence_vector(self, sentence, model):
        """Convert a tokenized sentence to a vector by averaging its Word2Vec word vectors."""
        vectors = []
        for word in sentence:
            try:
                vectors.append(model.wv[word])
            except KeyError:
                # Out-of-vocabulary words contribute a zero vector
                vectors.append(np.zeros(model.vector_size))
        return np.mean(vectors, axis=0) if vectors else np.zeros(model.vector_size)

    def train(self, train_csv, test_size=0.2, vector_size=100, window=5):
        """Train the Word2Vec vectorizer and the logistic-regression classifier."""
        # Read the training data
        df = pd.read_csv(train_csv)
        df['tokens'] = df['body'].apply(self._preprocess_text)

        # Train the Word2Vec model
        self.vectorizer = Word2Vec(
            sentences=df['tokens'],
            vector_size=vector_size,
            window=window,
            min_count=1,
            workers=4
        )

        # Convert each tokenized comment to a sentence vector
        X = np.array([self._sentence_vector(s, self.vectorizer) for s in df['tokens']])
        y = df['recommendation_status'].map({
            "no_idea": 2,
            "recommended": 1,
            "not_recommended": 0
        }).values

        # Split into train and test sets
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size)

        self.classifier = LogisticRegression(max_iter=1000)
        self.classifier.fit(X_train, y_train)

        # Save the trained model
        self.save_model()

        # Evaluate on the held-out split
        accuracy = self.classifier.score(X_test, y_test)
        return accuracy

    def predict(self, text):
        """Predict the sentiment of a single text."""
        if not self.classifier or not self.vectorizer:
            raise Exception("Model not trained! Call train() first or load a pretrained model.")

        tokens = self._preprocess_text(text)
        vector = self._sentence_vector(tokens, self.vectorizer)
        prediction = self.classifier.predict([vector])[0]

        return {
            0: "not_recommended",
            1: "recommended",
            2: "no_idea"
        }[prediction]

    def save_model(self):
        """Save the trained model to disk."""
        joblib.dump(self.classifier, os.path.join(self.model_dir, 'classifier.joblib'))
        self.vectorizer.save(os.path.join(self.model_dir, 'word2vec.model'))

    def load_model(self):
        """Load a previously saved model from disk."""
        self.classifier = joblib.load(os.path.join(self.model_dir, 'classifier.joblib'))
        self.vectorizer = Word2Vec.load(os.path.join(self.model_dir, 'word2vec.model'))

    def csvPredict(self, input_csv, output_path, summary_path=None, text_column=0):
        """
        Analyze sentiment for comments in a CSV file and save the results.

        Parameters:
            input_csv (str): Path to the input CSV file
            output_path (str): Path to save the output CSV file
            summary_path (str, optional): Path to save the prediction summary report.
                                          If None, no summary is saved.
            text_column (str/int, optional): Name or index (0-based) of the column containing comments.
                                             Defaults to 0 (first column).
        """
        try:
            # Read the input CSV
            df = pd.read_csv(input_csv)

            # Determine the correct column
            if isinstance(text_column, int):
                # Handle negative indices
                if text_column < 0:
                    text_column = len(df.columns) + text_column

                if text_column >= len(df.columns) or text_column < 0:
                    raise ValueError(f"Column index {text_column} is out of range")

                column_name = df.columns[text_column]
            else:
                if text_column not in df.columns:
                    raise ValueError(f"Column '{text_column}' not found in CSV file")
                column_name = text_column

            # Analyze each comment
            tqdm.pandas(desc="Analyzing comments")
            df['sentiment'] = df[column_name].progress_apply(self.predict)

            # Save the results
            df.to_csv(output_path, index=False, encoding='utf-8-sig')
            print(f"Results saved to {output_path}")

            # Generate and save the summary if requested
            if summary_path:
                summary = self._generate_summary(df)
                summary.to_csv(summary_path, index=False, encoding='utf-8-sig')
                print(f"Summary report saved to {summary_path}")

            return df

        except Exception as e:
            print(f"Error: {str(e)}")
            return None

    def _generate_summary(self, df):
        """Generate summary statistics for the predictions."""
        # Count each sentiment
        counts = df['sentiment'].value_counts().to_dict()

        # Create the summary dataframe
        summary = pd.DataFrame({
            'Category': [
                'Recommended',
                'Not Recommended',
                'No Idea',
                'Total',
                'Model Accuracy'
            ],
            'Count': [
                counts.get('recommended', 0),
                counts.get('not_recommended', 0),
                counts.get('no_idea', 0),
                len(df),
                'N/A'  # Accuracy has to be computed during training
            ],
            'Percentage': [
                f"{100 * counts.get('recommended', 0) / len(df):.2f}%",
                f"{100 * counts.get('not_recommended', 0) / len(df):.2f}%",
                f"{100 * counts.get('no_idea', 0) / len(df):.2f}%",
                '100%',
                'N/A'
            ]
        })

        return summary


# Github : RezaGooner

PerSent/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
from .CommentAnalyzer import CommentAnalyzer

__version__ = "1.0.2"
__all__ = ['CommentAnalyzer']

PerSent/model/classifier.joblib

3.22 KB
Binary file not shown.

PerSent/model/word2vec.model

28.8 MB
Binary file not shown.
