yfR facilitates importing stock prices from Yahoo
Finance, organizing the data in the
tidy format and
speeding up the process with a cache system and parallel computing.
yfR is the second, backwards-incompatible version of BatchGetSymbols,
a package first released in 2016 (see the vignette "yfR
and BatchGetSymbols" for details).
In a nutshell, Yahoo Finance (YF) provides a vast repository of stock price data from around the globe. It covers a large number of markets and assets and is used extensively in academic research and teaching. To import financial data from YF, all you need is a ticker (the id of a stock, e.g. “GM” for General Motors) and a time period (first and last date).
The main function of the package,
yfR::yf_get, returns a
dataframe with the financial data. All price data is measured in the
currency of the exchange where the asset trades. For example, price data for GM
(NYSE/US) is measured in US dollars, while price data for PETR3.SA
(B3/BR) is measured in reais (Brazilian currency).
The returned data contains the following columns:
ticker: The requested tickers (ids of stocks);
ref_date: The reference day (this can also be year/month/week when using argument freq_data);
price_open: The opening price of the day/period;
price_high: The highest price of the day/period;
price_low: The lowest price of the day/period;
price_close: The close/last price of the day/period;
volume: The financial volume of the day/period, in the unit of the exchange;
price_adjusted: The stock price adjusted for corporate events such as splits, dividends and others – this is usually what you want/need for studying stocks as it represents the real financial performance of stockholders;
ret_adjusted_prices: The arithmetic or log return (see input type_return) for the adjusted stock prices;
ret_closing_prices: The arithmetic or log return (see input type_return) for the closing stock prices;
cumret_adjusted_prices: The accumulated arithmetic/log return for the period (starts at 100%).
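To make the return columns above concrete, the sketch below computes arithmetic and log returns, plus a cumulative series starting at 1 (that is, 100%), from a short vector of adjusted prices in base R. It is a minimal illustration, not yfR's internal code: the helpers `calc_returns` and `calc_cumret` and the toy prices are hypothetical, and the argument name only mirrors the `type_return` input mentioned above.

```r
# Hypothetical helper (not part of yfR): period returns for a price vector.
# The first element is NA because there is no previous price to compare with.
calc_returns <- function(prices, type_return = c("arit", "log")) {
  type_return <- match.arg(type_return)
  if (type_return == "arit") {
    # arithmetic return: P_t / P_{t-1} - 1
    ret <- prices[-1] / prices[-length(prices)] - 1
  } else {
    # log return: log(P_t) - log(P_{t-1})
    ret <- diff(log(prices))
  }
  c(NA, ret)
}

# Hypothetical helper: cumulative return as a growth factor starting at 1 (100%)
calc_cumret <- function(ret, type_return = c("arit", "log")) {
  type_return <- match.arg(type_return)
  ret[is.na(ret)] <- 0  # the first period contributes no return
  if (type_return == "arit") {
    cumprod(1 + ret)    # compound the arithmetic returns
  } else {
    exp(cumsum(ret))    # log returns add up over time
  }
}

prices <- c(100, 110, 99)  # toy adjusted prices: +10%, then -10%

ret_arit <- calc_returns(prices, "arit")
ret_log  <- calc_returns(prices, "log")
cum_arit <- calc_cumret(ret_arit, "arit")
cum_log  <- calc_cumret(ret_log, "log")
```

With these toy prices both conventions end at a growth factor of 0.99, a 1% loss over the period, because a 10% gain followed by a 10% loss does not bring the price back to where it started.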
The easiest way to find the ticker of a company's stock is to search for it on Yahoo Finance’s website. At the top of the page you’ll find a search bar:
A company can have many different stocks traded on different markets (see picture above). As the example shows, Petrobras is traded at NYQ (New York Stock Exchange), SAO (Sao Paulo/Brazil, B3 exchange) and BUE (Buenos Aires/Argentina exchange), all with different symbols (tickers). For market indices, a list of tickers is available here.
Fetches daily/weekly/monthly/annual stock prices/returns from Yahoo Finance and outputs a dataframe (tibble) in the long format (stacked data);
A new feature called collections facilitates the
download of multiple tickers from a particular market/index. You can,
for example, download data for all stocks in the SP500 index with a
simple call to
A session-persistent smart cache system is available by default. This means that the data is saved locally and only missing portions are downloaded, if needed.
All dates are compared to those of a benchmark ticker such as the SP500 and, whenever an individual asset does not cover enough of the benchmark's dates, the software drops it from the output. This means you can choose to ignore tickers with a high proportion of missing dates.
A customized function called
can transform the long dataframe into a wide format (tickers as
columns), commonly used in portfolio optimization. The output is a list
where each element is a different target variable (prices, returns,
Parallel computing with the
furrr package is available,
speeding up data import.
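The benchmark check in the feature list can be sketched as follows: for each ticker, measure what share of the benchmark's dates it covers and keep it only if that share reaches a threshold. This is a minimal base R sketch, not yfR's implementation; the function `filter_by_benchmark`, its `thresh` argument and the toy data are hypothetical. In yfR itself this behaviour appears to be controlled by the `thresh_bad_data` and `bench_ticker` arguments of `yf_get()`; check the function's help page before relying on that.

```r
# Hypothetical helper (not part of yfR): keep only tickers whose dates
# cover at least `thresh` (a share between 0 and 1) of the benchmark dates.
filter_by_benchmark <- function(df, bench_dates, thresh = 0.75) {
  # share of benchmark dates present for each ticker
  coverage <- tapply(df$ref_date, df$ticker, function(d) {
    mean(bench_dates %in% d)
  })
  keep <- names(coverage)[coverage >= thresh]
  df[df$ticker %in% keep, , drop = FALSE]
}

bench_dates <- as.Date("2023-01-02") + 0:9  # 10 toy benchmark dates

# Toy long-format data: AAA has every benchmark date, BBB only the first 4
df <- data.frame(
  ticker         = c(rep("AAA", 10), rep("BBB", 4)),
  ref_date       = c(bench_dates, bench_dates[1:4]),
  price_adjusted = c(seq(10, 19), seq(20, 23)),
  stringsAsFactors = FALSE
)

df_ok <- filter_by_benchmark(df, bench_dates, thresh = 0.75)
unique(df_ok$ticker)  # "AAA" (BBB covers only 40% of the benchmark dates)
```

Comparing against a benchmark, rather than against a fixed calendar, means exchange holidays do not count as missing data for assets that trade on the same schedule as the benchmark.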
```r
# CRAN (stable)
install.packages('yfR')

# Github (dev version)
devtools::install_github('ropensci/yfR')

# ropensci
install.packages("yfR", repos = "https://ropensci.r-universe.dev")
```
```r
library(yfR)

# set options for algorithm
my_ticker <- 'META'
first_date <- Sys.Date() - 30
last_date <- Sys.Date()

# fetch data
df_yf <- yf_get(tickers = my_ticker,
                first_date = first_date,
                last_date = last_date)
#> 
#> ── Running yfR for 1 stocks | 2023-01-17 --> 2023-02-16 (30 days) ──
#> 
#> ℹ Downloading data for benchmark ticker ^GSPC
#> ℹ (1/1) Fetching data for META
#> !   - not cached
#> ✔   - cache saved successfully
#> ✔   - got 22 valid rows (2023-01-17 --> 2023-02-15)
#> ✔   - got 100% of valid prices -- Time for some tea?
#> ℹ Binding price data
#> 
#> ── Diagnostics ─────────────────────────────────────────────────────────────────
#> ✔ Returned dataframe with 22 rows -- Youre doing good!
#> ℹ Using 6.3 kB at /tmp/RtmpvCnCwr/yf_cache for 2 cache files
#> ℹ Out of 1 requested tickers, you got 1 (100%)

# output is a tibble with data
head(df_yf)
#> # A tibble: 6 × 11
#>   ticker ref_date   price_open price_h…¹ price…² price…³  volume price…⁴ ret_ad…⁵
#>   <chr>  <date>          <dbl>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>    <dbl>
#> 1 META   2023-01-17       136.      137.    134.    135.  2.11e7    135. NA      
#> 2 META   2023-01-18       136.      137.    133.    133.  2.02e7    133. -1.73e-2
#> 3 META   2023-01-19       132.      137.    132.    136.  2.86e7    136.  2.35e-2
#> 4 META   2023-01-20       136.      140.    135.    139.  2.86e7    139.  2.37e-2
#> 5 META   2023-01-23       139.      144.    139.    143.  2.75e7    143.  2.80e-2
#> 6 META   2023-01-24       142.      145     141.    143.  2.20e7    143. -9.07e-4
#> # … with 2 more variables: ret_closing_prices <dbl>,
#> #   cumret_adjusted_prices <dbl>, and abbreviated variable names ¹price_high,
#> #   ²price_low, ³price_close, ⁴price_adjusted, ⁵ret_adjusted_prices
```
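The tibble returned by `yf_get()` is in the long format, one row per ticker and date. As a rough illustration of what a long-to-wide conversion does, the base R sketch below pivots a small data frame with the same `ticker`, `ref_date` and `price_adjusted` columns into one price column per ticker. The values are made up and `stats::reshape()` is used only as a stand-in; this is not yfR's own conversion helper, which per the feature list returns a list with one wide table per target variable.

```r
# Toy long-format data (made-up values), one row per ticker and date,
# using the same column names as the yf_get() output
df_long <- data.frame(
  ticker         = rep(c("AAA", "BBB"), each = 3),
  ref_date       = rep(as.Date("2023-01-02") + 0:2, times = 2),
  price_adjusted = c(10, 11, 12, 20, 21, 22),
  stringsAsFactors = FALSE
)

# Pivot to wide: one row per date, one price column per ticker
df_wide <- reshape(
  df_long,
  idvar     = "ref_date",
  timevar   = "ticker",
  v.names   = "price_adjusted",
  direction = "wide"
)

# reshape() names the new columns "price_adjusted.<ticker>"; keep only the ticker
names(df_wide) <- sub("^price_adjusted\\.", "", names(df_wide))
rownames(df_wide) <- NULL
print(df_wide)
```

In the wide layout each column is a price series aligned on the same dates, which is the shape most portfolio optimization routines expect as input.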
yfR is based on quantmod (@joshuaulrich) and uses one of its
functions (quantmod::getSymbols) for fetching raw data from
Yahoo Finance. As with any API wrapper, maintaining the code takes
significant work. Joshua was always fast and open-minded in implementing required
changes, and I’m very grateful for it.