
THE PRODUCT

Our new product, SP Story, is a dataset for data scientists and AI/ML model builders who work on stock market prediction. It gives them an integrated package of price, image, story (free text), and demographic data. We believe that stories (the way people describe and predict stock price movement) provide a valuable new type of insight for price prediction models. We have fully integrated these stories with stock price history, graphs, and demographics for use in building more insightful models.

 

Red Weather uses a uniform, repeatable crowdsourcing process to gather free-text descriptions and predictions of S&P 500 stock market data from over 2,100 people from all 50 US states, each tested for English proficiency. Each response is linked to price data, a price movement graph, and the demographic profile of the participant who wrote it; together these make up the SP Story dataset. Data scientists can build predictive models using the SP Story data alone, or can integrate it with economic, newsfeed, social media, and any other data.


PRODUCT FEATURES
The SP Story dataset includes the following files:
 

  • Daily S&P price history: a .CSV file of S&P 500 closing prices for 23,594 trading days, covering over 90 years, with one row per trading day.

  • Price history graphs: 50 .JPG graphs of randomly chosen periods, each covering between 26 and 101 consecutive trading days. These graphs were the input to the crowdsourcing process: each participant viewed one graph, wrote a ‘story’ about it, and made a price prediction based on it.

  • Price-to-graph links: a .CSV file that links each price history graph to its corresponding price history data. Using this file, the data scientist or modeler can find the complete price history (all closing prices) for the period shown in each graph.

  • Survey results: a .CSV file with the crowdsourcing results. This contains 2,156 rows, with one row per respondent. The fields in this file are:

    • The participant’s story, a free-text description of the price movement in the price graph viewed by the participant. These stories average 45 words in length.

    • A prediction from the participant about whether the price would go up or down after the time period covered by the graph.

    • Demographic information for the respondent: age range, US state of residence, gender, and educational level.

    • A randomized identifier of the respondent.

 

There is no PII of the participants in this dataset.

 

Data scientists and model builders can use this integrated set of files, in any combination, to develop predictive models.
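
As a rough illustration of how the files might be combined, the Python sketch below attaches to each survey response the closing prices for the period shown in the graph that respondent viewed. It is not taken from the product itself. The column names (`graph_id`, `start_date`, `end_date`, `date`, `close`), the idea that the link file stores a date range per graph, and the presence of a graph identifier in the survey file are all assumptions made for illustration; the actual field names and layout are described in the User’s Guide.

```python
import csv


def load_rows(path):
    """Read a CSV file into a list of dicts, one dict per row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def attach_price_windows(survey_rows, link_rows, price_rows):
    """Add a 'closes' list to each survey row: the closing prices for the
    period covered by the graph that respondent viewed.

    All column names used here are hypothetical placeholders:
      survey_rows: 'graph_id' plus the story, prediction, demographic fields
      link_rows:   'graph_id', 'start_date', 'end_date'
      price_rows:  'date' (ISO format, YYYY-MM-DD), 'close'
    """
    # ISO-formatted dates sort and compare correctly as plain strings.
    prices = sorted(price_rows, key=lambda r: r["date"])

    # Map each graph to the date range it covers.
    windows = {
        r["graph_id"]: (r["start_date"], r["end_date"]) for r in link_rows
    }

    joined = []
    for row in survey_rows:
        start, end = windows[row["graph_id"]]
        closes = [
            float(p["close"]) for p in prices if start <= p["date"] <= end
        ]
        # Copy the survey row so the caller's input is left unchanged.
        joined.append({**row, "closes": closes})
    return joined
```

With the three CSV files loaded through `load_rows` (using whatever file names the dataset ships with), the result of `attach_price_windows` is one record per respondent that carries the story, prediction, demographics, and the matching price series, ready to be turned into model features.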

 

USER'S GUIDE
 

A complete User’s Guide is included with the SP Story dataset. The User’s Guide gives a full description of each data file, including the details of every field in each CSV file.
