hw3_Question1¶
Question1¶
In [5]:
import pandas as pd
import matplotlib.pyplot as plt
1. Load the data¶
In [7]:
df = pd.read_csv('NVDA_compustat.csv')
2. Select and report on columns¶
In [9]:
all_columns = df.columns
In [11]:
print(f"Total number of columns in dataset: {len(all_columns)}")
Total number of columns in dataset: 981
In [13]:
# Keep only specified columns
columns_needed = ['conm', 'datadate', 'fyear', 'sale', 'cogs', 'xsga', 'xrd', 'ebitda']
df = df[columns_needed]
print("\nKept columns:", ', '.join(columns_needed))
Kept columns: conm, datadate, fyear, sale, cogs, xsga, xrd, ebitda
3. Analyze reporting frequency¶
In [15]:
print("\nReporting Frequency Analysis:")
df['datadate'] = pd.to_datetime(df['datadate'])
df = df.sort_values('datadate')
days_between = df['datadate'].diff().value_counts()
print("\nDays between consecutive reports:")
print(days_between)
Reporting Frequency Analysis: Days between consecutive reports: datadate 365 days 20 366 days 6 396 days 1 Name: count, dtype: int64
4. Retrieve 2016 sales in three ways¶
In [17]:
print("\nNVIDIA's 2016 Sales Analysis:")
# Method 1: iloc (after finding the correct row)
row_idx = df[df['fyear'] == 2016].index[0]
col_idx = df.columns.get_loc('sale')
sales_2016_iloc = df.iloc[row_idx, col_idx]
print(f"2016 sales using iloc: ${sales_2016_iloc:,.0f}")
# Method 2: Boolean indexing
sales_2016_bool = df[df['fyear'] == 2016]['sale'].values[0]
print(f"2016 sales using boolean indexing: ${sales_2016_bool:,.0f}")
# Method 3: loc with fyear as index
df_indexed = df.set_index('fyear')
sales_2016_loc = df_indexed.loc[2016, 'sale']
print(f"2016 sales using loc: ${sales_2016_loc:,.0f}")
NVIDIA's 2016 Sales Analysis: 2016 sales using iloc: $6,910 2016 sales using boolean indexing: $6,910 2016 sales using loc: $6,910
In [19]:
# Reset index for further analysis
df = df_indexed.reset_index()
5. Calculate and plot sales growth¶
In [24]:
df = df.sort_values('fyear')
df['sales_growth'] = (df['sale'] / df['sale'].shift(1) - 1) * 100
In [26]:
# Create bar chart of sales growth
plt.figure(figsize=(12, 6))
growth_data = df[df['fyear'] >= 2000]
plt.bar(growth_data['fyear'], growth_data['sales_growth'])
plt.title('NVIDIA Annual Sales Growth Rate (2000-Present)')
plt.xlabel('Fiscal Year')
plt.ylabel('Growth Rate (%)')
plt.grid(True, axis='y')
plt.xticks(growth_data['fyear'], rotation=45)
Out[26]:
([<matplotlib.axis.XTick at 0x13aa5fd70>, <matplotlib.axis.XTick at 0x13aa5fc50>, <matplotlib.axis.XTick at 0x13aa5d9a0>, <matplotlib.axis.XTick at 0x13a58ae40>, <matplotlib.axis.XTick at 0x13aab6d80>, <matplotlib.axis.XTick at 0x13aab6570>, <matplotlib.axis.XTick at 0x13aab57c0>, <matplotlib.axis.XTick at 0x13aab49b0>, <matplotlib.axis.XTick at 0x13aab5580>, <matplotlib.axis.XTick at 0x13aacfc80>, <matplotlib.axis.XTick at 0x13aab4710>, <matplotlib.axis.XTick at 0x13aaceff0>, <matplotlib.axis.XTick at 0x13aace150>, <matplotlib.axis.XTick at 0x13aacd400>, <matplotlib.axis.XTick at 0x13aacc560>, <matplotlib.axis.XTick at 0x13aab71d0>, <matplotlib.axis.XTick at 0x13aacda30>, <matplotlib.axis.XTick at 0x13aaeba70>, <matplotlib.axis.XTick at 0x13aaeacc0>, <matplotlib.axis.XTick at 0x13aae9e20>, <matplotlib.axis.XTick at 0x13aae8fe0>, <matplotlib.axis.XTick at 0x13aaea720>, <matplotlib.axis.XTick at 0x13aae8560>, <matplotlib.axis.XTick at 0x13aaea6f0>], [Text(2000, 0, '2000'), Text(2001, 0, '2001'), Text(2002, 0, '2002'), Text(2003, 0, '2003'), Text(2004, 0, '2004'), Text(2005, 0, '2005'), Text(2006, 0, '2006'), Text(2007, 0, '2007'), Text(2008, 0, '2008'), Text(2009, 0, '2009'), Text(2010, 0, '2010'), Text(2011, 0, '2011'), Text(2012, 0, '2012'), Text(2013, 0, '2013'), Text(2014, 0, '2014'), Text(2015, 0, '2015'), Text(2016, 0, '2016'), Text(2017, 0, '2017'), Text(2018, 0, '2018'), Text(2019, 0, '2019'), Text(2020, 0, '2020'), Text(2021, 0, '2021'), Text(2022, 0, '2022'), Text(2023, 0, '2023')])
6. Find min and max growth years¶
In [28]:
growth_subset = df[df['fyear'] >= 2000]
min_growth_year = growth_subset.loc[growth_subset['sales_growth'].idxmin()]
max_growth_year = growth_subset.loc[growth_subset['sales_growth'].idxmax()]
In [31]:
print("\nSales Growth Analysis:")
print(f"Lowest growth: {min_growth_year['fyear']} ({min_growth_year['sales_growth']:.1f}%)")
print(f"Highest growth: {max_growth_year['fyear']} ({max_growth_year['sales_growth']:.1f}%)")
Sales Growth Analysis: Lowest growth: 2008 (-16.4%) Highest growth: 2023 (125.9%)
7. Calculate gross profit¶
In [34]:
df['gross_profit'] = df['sale'] - df['cogs']
recent_year = df['fyear'].max()
recent_gross_profit = df[df['fyear'] == recent_year]['gross_profit'].values[0]
print(f"\nGross Profit (most recent year): ${recent_gross_profit:,.0f}")
Gross Profit (most recent year): $45,809
8. Calculate and verify EBITDA¶
In [39]:
df['calculated_ebitda'] = df['sale'] - df['cogs'] - df['xsga']
ebitda_match = (df['calculated_ebitda'] - df['ebitda']).abs().mean() < 1
print("\nEBITDA Verification:")
print(f"EBITDA calculation matches dataset: {ebitda_match}")
EBITDA Verification: EBITDA calculation matches dataset: True
9. Calculate R&D ratio¶
In [43]:
df['rd_ratio'] = df['xrd'] / df['gross_profit'] * 100
avg_rd_ratio = df['rd_ratio'].mean()
print(f"\nR&D Analysis:")
print(f"Average R&D as % of gross profit: {avg_rd_ratio:.1f}%")
R&D Analysis: Average R&D as % of gross profit: 47.4%
In [45]:
plt.tight_layout()
plt.show()
<Figure size 640x480 with 0 Axes>
In [ ]: