Reading PDF files in R

 

Get a list of the files in a folder:

 

# bare file names (no folder part) from the "data" folder under the working directory
file_vector <- list.files(path = "data")

head(file_vector)

 

Mark which of those files are PDFs (grepl() returns TRUE or FALSE for each file name):

# "\\." is a literal dot, "$" anchors the match to the end of the name
grepl("\\.pdf$", file_vector, ignore.case = TRUE)
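The pattern given to grepl() is a regular expression, so the dot in ".pdf" matches any character and the match can occur anywhere in the name. A quick check on some made-up file names (not files from the data folder) shows how an escaped, end-anchored, case-insensitive pattern behaves differently:

```r
# Made-up file names, for illustration only
files <- c("report.pdf", "notes.txt", "mypdfguide.docx", "SLIDES.PDF")

# ".pdf": "." matches any character, "pdf" may appear anywhere, case matters
loose <- grepl(".pdf", files)
print(loose)   # TRUE FALSE TRUE FALSE

# Escaped dot, anchored to the end of the name, upper or lower case
strict <- grepl("\\.pdf$", files, ignore.case = TRUE)
print(strict)  # TRUE FALSE FALSE TRUE
```

The loose pattern wrongly accepts "mypdfguide.docx" and misses "SLIDES.PDF"; the strict one does neither.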

 

Keep only the PDF files by using that logical vector as an index:

pdf_list <- file_vector[grepl("\\.pdf$", file_vector, ignore.case = TRUE)]

head(pdf_list)
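list.files() can also do the filtering itself through its pattern argument, and full.names = TRUE keeps the folder in each result, which saves joining paths later. A minimal sketch that builds a throwaway folder under tempdir() so it runs anywhere; the folder and file names are made up:

```r
# Throwaway folder with a few empty files, so the example is self-contained
demo_dir <- file.path(tempdir(), "pdf_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("a.pdf", "b.PDF", "c.txt"))))

# Filter by regular expression inside list.files(); keep the folder in each path
pdf_paths <- list.files(path = demo_dir, pattern = "\\.pdf$",
                        ignore.case = TRUE, full.names = TRUE)
print(basename(pdf_paths))  # "a.pdf" "b.PDF"
```

With the real data this would be list.files(path = "data", pattern = "\\.pdf$", ignore.case = TRUE, full.names = TRUE).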

 

Install the pdftools package (this only needs to be done once):

install.packages("pdftools")
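install.packages() downloads and reinstalls the package every time it runs, so a script that is rerun often guards the call. One common pattern, shown here as a hypothetical helper named ensure_package() (not part of pdftools or base R), checks with requireNamespace() first:

```r
# Hypothetical helper: install a package only if it is not already available,
# then return (invisibly) whether it can now be loaded
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Usage: ensure_package("pdftools")
```

Calling ensure_package("pdftools") at the top of a script installs pdftools on the first run only; later runs skip the download.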

 

Load the package into the current R session:

library(pdftools)

 

 

Extract the text of one PDF. pdf_text() returns a character vector with one element per page:

pdf_text("data/geo plitics main ppt.pdf")

pdftxt <- pdf_text("data/geo plitics main ppt.pdf")
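list.files(path = "data") returns bare file names, so the names in pdf_list have to be joined with the folder before they are passed to pdf_text(). A sketch that reads every PDF in the folder into a named list; read_pdf_folder() is a hypothetical helper, and its reader argument (defaulting to pdftools::pdf_text) exists only so the function can be tried without real PDFs:

```r
# Hypothetical helper: read every file in `files` (bare names inside `folder`)
# into a named list, one element per file, one string per page.
# `reader` defaults to pdftools::pdf_text; pass a stub to test without PDFs.
read_pdf_folder <- function(folder, files, reader = pdftools::pdf_text) {
  paths <- file.path(folder, files)  # list.files() gave names without the folder
  texts <- lapply(paths, reader)
  names(texts) <- files
  texts
}

# Usage with the objects from the earlier steps:
# all_text <- read_pdf_folder("data", pdf_list)
# length(all_text[["geo plitics main ppt.pdf"]])  # number of pages
```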

splitpdftxt <- strsplit(pdftxt, split = "\n")  # one character vector of lines per page

print(splitpdftxt)
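strsplit() leaves a list with one character vector per page, and the lines often carry leading spaces because pdf_text() tries to preserve the page layout. A minimal sketch, using a hypothetical helper pdf_lines_df(), that flattens this into a data frame with one row per non-blank line and its page number; the input is assumed to be a character vector like pdftxt:

```r
# Hypothetical helper: one row per non-blank line, tagged with its page number
pdf_lines_df <- function(pdftxt) {
  pages <- strsplit(pdftxt, split = "\n", fixed = TRUE)
  df <- data.frame(
    page = rep(seq_along(pages), lengths(pages)),  # repeat page number per line
    text = trimws(unlist(pages)),                  # also strips a stray "\r"
    stringsAsFactors = FALSE
  )
  df <- df[df$text != "", , drop = FALSE]          # drop blank lines
  rownames(df) <- NULL
  df
}

# Usage: head(pdf_lines_df(pdftxt))
```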