Post by CC06 on Jul 12, 2020 21:11:24 GMT -5
As most of you have probably seen in the GroupMe, I've created a program that will compile a player's statistics over a given day range. Some of you have asked me for the code, so I figured I'd go ahead and share it publicly. First, if you don't have R downloaded on your computer, go ahead and do so now.
Link: www.r-project.org/
Next, run this code in R to create the functions that compile the statistics:
rm(list=ls())  # note: this clears everything in your current R workspace
options(stringsAsFactors = FALSE)
options(digits=3)
library(rvest)

# Base URL of the league site. For past seasons, add the year plus "/" to this path.
path <- "http://tsfbl.dx.am/"

# The position of each team in this vector is used as its roster page number.
teams <- c("Celtics", "Heat", "Nets", "Knicks", "Magic", "76ers", "Wizards",
           "Hawks", "Hornets", "Bulls", "Cavaliers", "Pistons", "Pacers",
           "Bucks", "Raptors", "Mavericks", "Nuggets", "Rockets",
           "Timberwolves", "Spurs", "Jazz", "Grizzlies", "Warriors",
           "Clippers", "Lakers", "Suns", "Trail Blazers", "Kings",
           "SuperSonics")

# Column-wise maximum, ignoring NAs
colMax <- function(data) sapply(data, max, na.rm = TRUE)
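
# Print a player's averages (and optionally per-36 numbers) plus shooting
# percentages, using the box scores for days start through end.
get_player_stats <- function(player, team, start=1, end=200, per36=FALSE){
  stats <- c()
  for(i in seq_along(team)){
    # The team's position in `teams` is its roster page number on the site
    index <- which(team[i] == teams)
    if(length(index) == 0) stop("Unknown team name: ", team[i])
    url <- paste(path, "/rosters/roster", index, "sched.htm", sep="")
    webpage <- read_html(url)
    # Keep links 10 through the second-to-last on the schedule page;
    # these are treated as the box score links
    links <- html_attr(html_nodes(webpage, "a"), "href")
    links <- links[10:(length(links)-1)]
    for(j in seq_along(links)){
      url <- paste(path, substr(links[j], 3, nchar(links[j])), sep="")
      # The day number sits between "boxes/" and the first "-" in the URL
      day <- as.numeric(substr(url, gregexpr("boxes/", url)[[1]][1]+6, gregexpr("-", url)[[1]][1]-1))
      if(day >= start && day <= end){
        full_box <- html_table(read_html(url), header=TRUE, fill=TRUE)
        # Tables 2 and 3 hold the two teams' box scores; find which one is ours
        if(colnames(full_box[[2]])[1] == team[i]){
          my_team_index <- 2
          opp_team_index <- 3
        } else{
          my_team_index <- 3
          opp_team_index <- 2
        }
        box <- full_box[[my_team_index]]
        # Drop the last two rows and the last column
        box_team <- box[1:(nrow(box)-2), 1:(ncol(box)-1)]
        box_team[,3:16] <- lapply(box_team[,3:16], as.numeric)
        # Rename the three-point columns so they can be accessed with $
        colnames(box_team)[6:7] <- c("X3P", "X3PA")
        if(colnames(box_team)[1] == team[i]){
          player_rows <- which(box_team[,1] == player)
          if(length(player_rows) > 0){
            colnames(box_team)[1] <- "Team"
            stats <- rbind(stats, box_team[player_rows,])
          }
        }
      }
    }
  }
  if(is.null(stats) || nrow(stats) == 0){
    cat("No stats found.\n")
  } else{
    # GM is the number of games found; columns 3:16 are the numeric stats
    stats <- cbind.data.frame(GM=nrow(stats), stats[, 3:16])
    cat("AVERAGES\n")
    averages <- colMeans(stats)
    print(round(averages, digits=1))
    if(per36 == TRUE){
      cat("\nPER 36\n")
      # averages[2] is treated as minutes per game; scale everything
      # except GM so that it comes out to 36
      constant <- (36 / averages[2])
      averages[-1] <- constant * averages[-1]
      print(round(averages, digits=1))
    }
    cat("\nPERCENTAGES\n")
    # Percentages are total makes over total attempts across all games
    print(c("FG%" = sum(stats$FG) / sum(stats$FGA),
            "3P%" = sum(stats$X3P) / sum(stats$X3PA),
            "FT%" = sum(stats$FT) / sum(stats$FTA)))
  }
}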
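
# `team` is a vector of player names stored in a variable named after the team;
# deparse(substitute(team)) turns that variable name into the team name string.
get_team_stats <- function(team){
  for(i in seq_along(team)){
    cat(team[i])
    cat("\n")
    get_player_stats(team[i], deparse(substitute(team)), start = 1)
    cat("\n")
  }
}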
If you're getting an error that R can't find the rvest package, type the following line of code first, and then re-run (you'll only have to do this once):
install.packages("rvest")
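If you'd rather not think about whether the package is already there, a check like the one below installs it only when it's missing. This is just a sketch and not part of the code above; ensure_package is a made-up helper name, and it assumes your machine can reach CRAN.

```r
# Hypothetical helper (not part of the original code): install a package
# only if it is not already available, then report whether it can be loaded.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Usage (uncomment to run; may download from CRAN):
# ensure_package("rvest")
# library(rvest)
```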
Once you've done that, you should be ready to compile the statistics on your own. To pull the stats for a given player, pass the player's name and team name, for example:
get_player_stats("Kobe Bryant", "SuperSonics")
If you want to narrow down the date range, you can add the start and end arguments. For example, if you wanted the same stats but only for Days 61-90:
get_player_stats("Kobe Bryant", "SuperSonics", start=61, end=90)
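For anyone curious how the day filter works: the function reads the day number out of each box score link and keeps the game only if it falls between start and end. Here's a small sketch of that step on its own. The example URL is made up, and the "boxes/day-game.htm" shape is an assumption based on how the code parses it.

```r
# Hypothetical box score URL; the "boxes/<day>-<game>.htm" shape is an assumption
url <- "http://tsfbl.dx.am//boxes/61-3.htm"

# Same extraction as in get_player_stats: the text after "boxes/"
# and before the first "-" is the day number
day_start <- gregexpr("boxes/", url)[[1]][1] + 6
day_end <- gregexpr("-", url)[[1]][1] - 1
day <- as.numeric(substr(url, day_start, day_end))

# Keep the game only if the day is inside the requested range
start <- 61
end <- 90
in_range <- day >= start && day <= end

print(day)
print(in_range)
```

Both start and end are inclusive, so Days 61 and 90 themselves are counted.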
Finally, you can set per36=TRUE to have the code print per-36-minute stats as well. For example:
get_player_stats("Kobe Bryant", "SuperSonics", per36=TRUE)
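To show what the per-36 and percentage numbers mean, here's a rough sketch of the same math on made-up box score lines. The numbers and the reduced set of columns are invented for illustration; they are not real league data.

```r
# Made-up box score lines for two games (not real league data)
stats <- data.frame(
  MIN = c(30, 40),
  FG = c(8, 6),
  FGA = c(16, 14),
  PTS = c(20, 15)
)

# GM is the number of games, repeated on every row so colMeans returns it as-is
stats <- cbind.data.frame(GM = nrow(stats), stats)
averages <- colMeans(stats)

# Per 36: scale everything except GM so that minutes become exactly 36
per36_avgs <- averages
per36_avgs[-1] <- (36 / averages[["MIN"]]) * averages[-1]

# Percentages pool makes and attempts across games instead of
# averaging each game's percentage
fg_pct <- sum(stats$FG) / sum(stats$FGA)

print(round(averages, digits = 1))
print(round(per36_avgs, digits = 1))
print(fg_pct)
```

With these numbers the player averages 35 minutes and 17.5 points, which scales to 18 points per 36 minutes.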
You can combine multiple arguments in one call if you wish.
As some of you may have seen, I've made it so the code can quickly do multiple people on the same team at once. To do that, first create a variable whose name exactly matches the team name and that holds the players' names. Then run the get_team_stats function on it. For example:
SuperSonics <- c("Kenny Anderson", "Kobe Bryant")
get_team_stats(SuperSonics)
Unfortunately, get_team_stats takes the team name from the variable name, so as written it can't be run properly for the Trail Blazers (two-word name) or the 76ers (name starts with a number). Please also note that get_team_stats doesn't accept start, end, or per36 itself; if you want to change them like above, you must edit the get_player_stats call inside get_team_stats.
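One possible way around both limits is a wrapper that takes the team name as a plain string and passes any extra arguments along. This is only a sketch: get_team_stats_by_name and its stats_fun argument are made-up names that aren't part of the code above, and it assumes get_player_stats has already been defined as shown earlier.

```r
# Hypothetical wrapper (not part of the original code).
# players:   character vector of player names
# team_name: team name as a string, e.g. "Trail Blazers" or "76ers"
# ...:       passed through to the stats function (start, end, per36)
# stats_fun: function that does the work; defaults to get_player_stats,
#            which must already exist when the default is used
get_team_stats_by_name <- function(players, team_name, ...,
                                   stats_fun = get_player_stats) {
  for (p in players) {
    cat(p, "\n", sep = "")
    stats_fun(p, team_name, ...)
    cat("\n")
  }
  invisible(NULL)
}

# Usage (placeholder names; needs get_player_stats and a connection to the site):
# blazers_players <- c("Player A", "Player B")
# get_team_stats_by_name(blazers_players, "Trail Blazers", start = 61, end = 90)
```

Since the team name is an ordinary string here, spaces and leading digits are no longer an issue.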
Feel free to reach out to me with any questions about this if you run into trouble.