Step 2: Fetch the Data
With the library installed, you can now fetch the GDP data. The identifier for the dataset we are interested in is 84087ENG. Here’s how you can retrieve the data and load it into a Pandas DataFrame:
data = pd.DataFrame(cbsodata.get_data('84087ENG'))
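If you want the notebook to fail early when a fetch comes back empty, the call can be wrapped in a small helper. This is only a sketch: `load_table` and its `fetcher` parameter are hypothetical names introduced here for illustration. It relies on `cbsodata.get_data` returning a list of row dictionaries.

```python
import pandas as pd


def load_table(table_id, fetcher=None):
    """Fetch a CBS table and return it as a pandas DataFrame.

    `fetcher` is a hypothetical hook: any callable that takes a table id
    and returns a list of row dictionaries. It defaults to
    cbsodata.get_data.
    """
    if fetcher is None:
        # Imported lazily so the helper can be exercised without network access.
        import cbsodata
        fetcher = cbsodata.get_data

    frame = pd.DataFrame(fetcher(table_id))
    if frame.empty:
        # Stop here instead of passing an empty frame to later steps.
        raise ValueError(f"Table {table_id!r} returned no rows")
    return frame
```

In the notebook, `data = load_table('84087ENG')` would then stand in for the one-line version above.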
Step 3: Select the Relevant Columns
The dataset contains numerous columns, but for this example, we’ll focus on the GrossDomesticProduct_21 column, which represents the GDP values. We’ll also include the Periods column to keep track of the time periods:
data = data[['Periods', 'GrossDomesticProduct_21']]
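Column names in published tables can change between versions, so it may help to check that the expected columns exist before selecting them. The sketch below uses a hypothetical helper, `select_columns`, which reports the available columns when something is missing. The sample values in the usage lines are placeholders, not real GDP figures.

```python
import pandas as pd

REQUIRED_COLUMNS = ["Periods", "GrossDomesticProduct_21"]


def select_columns(frame, columns=REQUIRED_COLUMNS):
    """Return a copy of `frame` restricted to `columns`.

    Raises KeyError listing the missing and the available column names
    if any requested column is absent.
    """
    missing = [name for name in columns if name not in frame.columns]
    if missing:
        raise KeyError(
            f"Missing columns: {missing}; available: {list(frame.columns)}"
        )
    return frame[list(columns)].copy()


# Usage with a stand-in frame (placeholder values, not actual CBS data).
sample = pd.DataFrame(
    {
        "ID": [0, 1],
        "Periods": ["period A", "period B"],
        "GrossDomesticProduct_21": [1.0, 2.0],
    }
)
selected = select_columns(sample)
```

Returning a copy keeps later edits to the selected frame from raising pandas' chained-assignment warnings.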
Step 4: Convert Pandas DataFrame to Spark DataFrame
Databricks is optimized for working with Spark DataFrames, so the next step is to convert our Pandas DataFrame to a Spark DataFrame:
spark_data = spark.createDataFrame(data)
Step 5: Display the Spark DataFrame
Finally, you can display the Spark DataFrame in your Databricks notebook to verify that the data has been loaded correctly:
display(spark_data)
Complete Code Example
Here’s the complete code snippet for easy reference: