Step 2: Fetch the Data
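With the library installed, you can now fetch the GDP data. The identifier for the dataset we are interested in is 84087ENG. The cbsodata.get_data function returns the table as a list of records, which Pandas can turn into a DataFrame directly. Assuming cbsodata and pandas (as pd) are already imported, here’s how you can retrieve the data: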
data = pd.DataFrame(cbsodata.get_data('84087ENG'))
Step 3: Select the Relevant Columns
The dataset contains numerous columns, but for this example, we’ll focus on the GrossDomesticProduct_21 column, which represents the GDP values. We’ll also include the Periods column to keep track of the time periods:
data = data[['Periods', 'GrossDomesticProduct_21']]
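If one of the requested column names is missing or misspelled, the selection above raises a KeyError. The sketch below is one way to check the names first and report exactly which ones are absent. The helper missing_columns and the sample list of available names are hypothetical; in the notebook you would pass data.columns from the previous step instead.

```python
def missing_columns(available, wanted):
    """Return the names in `wanted` that are not in `available`, in order."""
    available = set(available)
    return [name for name in wanted if name not in available]


# Hypothetical column names, standing in for `data.columns` from the fetch step.
available = ["ID", "Periods", "GrossDomesticProduct_21"]
wanted = ["Periods", "GrossDomesticProduct_21"]

missing = missing_columns(available, wanted)
if missing:
    # Fail early with a message that names the absent columns.
    raise KeyError(f"Columns not found in the dataset: {missing}")
```

With the real DataFrame, the call would look like missing_columns(data.columns, wanted), made before the selection step.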
Step 4: Convert Pandas DataFrame to Spark DataFrame
Databricks is optimized for working with Spark DataFrames, so the next step is to convert our Pandas DataFrame to a Spark DataFrame. Databricks notebooks provide a ready-made SparkSession named spark, so no extra setup is needed:
spark_data = spark.createDataFrame(data)
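As an optional sanity check, you can confirm that the conversion kept every row. The following is a minimal sketch, not part of the original walkthrough: the helper name to_spark_checked is hypothetical, and it assumes spark is the SparkSession that Databricks provides in the notebook.

```python
def to_spark_checked(spark, pandas_df):
    """Convert a Pandas DataFrame to a Spark DataFrame and compare row counts."""
    spark_df = spark.createDataFrame(pandas_df)
    expected = len(pandas_df)  # number of rows on the Pandas side
    actual = spark_df.count()  # runs a Spark job to count the rows
    if actual != expected:
        raise ValueError(f"Row count mismatch: pandas={expected}, spark={actual}")
    return spark_df
```

In the notebook this would replace the plain conversion: spark_data = to_spark_checked(spark, data). Note that count() triggers a Spark job, which is negligible for a dataset of this size but worth keeping in mind for larger tables.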
Step 5: Display the Spark DataFrame
Finally, you can display the Spark DataFrame in your Databricks notebook to verify that the data has been loaded correctly:
display(spark_data)
Complete Code Example
Here’s the complete code snippet for easy reference: