Shivanand Pokhriyal
Member
Discuss Data science with R Deepti's class
this is a very helpful site! thanks shivanand
Also, try browsing the blogs below from DataFlair; it is a very comprehensive one-stop shop for data science topics. If you have similar links, please share them.
https://data-flair.training/blogs/data-science-tutorials-home/
Guys have you done both projects???
Hello
hi everyone
-----------------------------------------------------------------------------------------------------
hi everyone,
please suggest!! Is this coding OK for the Walmart project??
data1 <- read.csv("Walmart_Store_sales.csv")
View(data1)
# dividing data into training and validation sets
library(caTools) # provides sample.split()
set.seed(123) # any fixed seed makes the random split reproducible
# sample the input data with 70% for training and 30% for testing
is_train <- sample.split(data1$Weekly_Sales, SplitRatio = 0.70)
table(is_train) # check the split sizes
train_data <- subset(data1, is_train == TRUE) # rows selected for training
test_data <- subset(data1, is_train == FALSE) # remaining rows for testing
# model building
# fit on all columns first to see which independent variables are significant
model <- lm(Weekly_Sales ~ ., data = train_data)
summary(model)
# filter the significant variables
# rerun the model using only the significant variables: consider this the final model
model1 <- lm(Weekly_Sales ~ Fuel_Price + CPI + Unemployment, data = train_data)
summary(model1)
# prediction on the testing data set
predtest <- predict(model1, newdata = test_data)
predtest
# attach it to a data frame
pred1 <- data.frame(predtest)
# plotting actual versus predicted values
# red = actual, blue = predicted; closely overlapping lines suggest a good fit,
# but confirm it with the RMSE computed below rather than by eye alone
plot(test_data$Weekly_Sales, col = "red", type = "l")
lines(predtest, col = "blue", type = "l")
# bind the predictions to the test data set with cbind
final_data <- cbind(test_data, pred1)
# root mean squared error (RMSE) on the test set
sqrt(mean((final_data$Weekly_Sales - final_data$predtest)^2))
write.csv(final_data, "linear_out.csv")
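One way to judge the fit with a number rather than only by the plot is to compare the test RMSE against a baseline that always predicts the training mean. This is just a rough sketch on made-up data: the column names mirror the post, but the `toy` data frame, the coefficients used to generate it, and the `rmse` helper are assumptions for illustration, and the split uses base R instead of caTools.

```r
# Toy stand-in for Walmart_Store_sales.csv (synthetic values, illustration only)
set.seed(42)                       # fix the RNG so the example is repeatable
n <- 200
toy <- data.frame(
  Fuel_Price   = runif(n, 2.5, 4.5),
  CPI          = runif(n, 120, 230),
  Unemployment = runif(n, 4, 14)
)
# Invented relationship plus noise, only so the model has something to find
toy$Weekly_Sales <- 1e6 - 2e4 * toy$Unemployment + 1e3 * toy$CPI +
  rnorm(n, sd = 2e4)

# 70/30 split with base R
train_idx  <- sample(seq_len(n), size = floor(0.7 * n))
train_data <- toy[train_idx, ]
test_data  <- toy[-train_idx, ]

fit  <- lm(Weekly_Sales ~ Fuel_Price + CPI + Unemployment, data = train_data)
pred <- predict(fit, newdata = test_data)

# Hypothetical helper: root mean squared error
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

test_rmse     <- rmse(test_data$Weekly_Sales, pred)
# Baseline: always predict the mean of the training sales
baseline_rmse <- rmse(test_data$Weekly_Sales, mean(train_data$Weekly_Sales))
```

If `test_rmse` is not clearly lower than `baseline_rmse` on the real data, the three predictors are probably not explaining much, even if the plotted lines look similar.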
yes...finally I submitted my project.
-----------------------------------------------------------------------------------------------------
Mine is almost the same process, except for finding the max sales...
# Finding which store has maximum sales
max_sales <- data1[order(data1$Weekly_Sales, decreasing = TRUE), ]
head(max_sales) # rows with the six highest weekly sales
# 14th store has max sales.
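Note that ordering the raw rows finds the highest single weekly figure. If the question is which store has the highest total sales, summing per store first is one option. A small sketch on made-up numbers: the `toy` data frame is invented, and `Store` is assumed to be the name of the store column in the CSV.

```r
# Synthetic example: three stores, three weeks each (values invented for illustration)
toy <- data.frame(
  Store        = rep(1:3, each = 3),
  Weekly_Sales = c(100, 120, 110,   # store 1
                   300,  50,  40,   # store 2: biggest single week
                   150, 160, 170)   # store 3: biggest total
)

# Row with the highest single weekly figure
top_week <- toy[which.max(toy$Weekly_Sales), ]

# Total sales per store, then pick the store with the largest total
totals    <- aggregate(Weekly_Sales ~ Store, data = toy, FUN = sum)
top_store <- totals$Store[which.max(totals$Weekly_Sales)]
```

In this toy data the two answers differ (store 2 has the best week, store 3 the best total), so it is worth checking which one the project question is asking for.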