Skip to content

Create Moving-avg.py #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 8, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
34 changes: 34 additions & 0 deletions Moving-avg.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
# In a DataFrame df_sales with columns Date, ProductID, and QuantitySold, how would you calculate a 7-day rolling average of QuantitySold for each product?

from pyspark.sql import SparkSession
from pyspark.sql import Row
from pyspark.sql.window import Window
import pyspark.sql.functions as F

# Initialize Spark session
spark = SparkSession.builder.appName("RollingAverageCalculation").getOrCreate()

# Sample data
data = [Row(Date='2023-01-01', ProductID=100, QuantitySold=10),
Row(Date='2023-01-02', ProductID=100, QuantitySold=15),
Row(Date='2023-01-03', ProductID=100, QuantitySold=20),
Row(Date='2023-01-04', ProductID=100, QuantitySold=25),
Row(Date='2023-01-05', ProductID=100, QuantitySold=30),
Row(Date='2023-01-06', ProductID=100, QuantitySold=35),
Row(Date='2023-01-07', ProductID=100, QuantitySold=40),
Row(Date='2023-01-08', ProductID=100, QuantitySold=45)]

# Create DataFrame
df_sales = spark.createDataFrame(data)

# Convert Date string to Date type
df_sales = df_sales.withColumn("Date", F.to_date(F.col("Date")))

# Window specification for 7-day rolling average
windowSpec = Window.partitionBy('ProductID').orderBy('Date').rowsBetween(-6, 0)

# Calculating the rolling average
rollingAvg = df_sales.withColumn('7DayAvg', F.avg('QuantitySold').over(windowSpec))

# Show results
rollingAvg.show()