Who am I?
My name is Jason Gu, and I recently earned a B.S. in Data Science from UC San Diego. I enjoy working with data to find practical solutions that improve how things work in both tech and business settings. I have experience with Python, SQL, Tableau, and AWS, and I’m looking to apply those skills in a full-time role.
Aspiring Data Scientist & SWE.
- Birthday: Oct 4th, 2001
- Website: https://jingchenggu.github.io/
- Phone: +1 (626) 277-8952
- City: La Jolla, CA, USA
- Age: 23
- School: University of California, San Diego
- Major: Data Science, B.S.
- Email: jaesongu@gmail.com
Education
University of California, San Diego
September 2022 - March 2025
B.S. - Data Science
- Relevant Coursework:
- Practice of Data Science, Statistical Methods, Data Science Theory, Machine Learning, Optimization, Natural Language Processing, Data Management, Systems for Scalable Analytics, Data Visualization, Reinforcement Learning, Linear Algebra, Differential Equations
- Involvements:
- - Data Science Student Society : VP Internal
- - Delta Sigma Pi : Chancellor, Vice President of Alumni Relations
University of California, Santa Cruz
September 2020 - June 2022
A.S. - Computer Engineering & Computer Science
- Relevant Coursework:
- Data Science Fundamentals, Data Structures & Algorithms, Object Oriented Analysis and Design
- Involvements:
- - The Mathematics and Statistics of Poker Club : President
Skills
Languages
Python, Java, C/C++/C#, JavaScript, HTML, SQL, R
Frameworks / Libraries
Pandas, NumPy, PyTorch, TensorFlow, OpenCV, JUnit, Raspberry Pi Tools, Beautiful Soup, Scikit-Learn
Tools
Amazon Web Services (S3, Lambda, DynamoDB), GitHub, Git, Excel, Tableau, Power BI, IntelliJ, MongoDB, D3.js, Svelte
Experience
Data Science Fellow
San Diego Gas & Electric - San Diego, CA
September 2024 - March 2025
- Conducted time series and geospatial analysis in Python (Pandas, Scikit-Learn, GeoPandas, Folium) to evaluate EV charger density and growth across 1100+ chargers in San Diego, identifying underserved areas for infrastructure development.
- Projected a 57% increase in EV adoption in 2024 through regression modeling on DMV vehicle registration data using Python and SQL.
- Applied statistical modeling with Statsmodels to evaluate the correlation between EV ownership and charger availability in San Diego.
- Authored a findings report on EV adoption trends and charger optimization opportunities and presented it to the Director of Data Science.
Data Science Intern
Mercury Alert AI - San Diego, CA
June 2023 - October 2023
- Independently drove the development of an internal quality assurance dashboard using Python, designed to monitor 50+ devices and provide real-time reports on temperature, empty frames, and device capture errors.
- Performed anomaly detection using AWS QuickSight to identify and analyze time-stamped image captures with low confidence scores.
- Reviewed and updated the Jupyter Notebook data management system by relabeling mispredictions and annotating low confidence score images identified through time-stamped analysis, improving the retraining efficiency of AWS Lambda by 30%.
Data Analyst Intern
Redrock Biometrics - San Francisco, CA
June 2022 - September 2022
- Implemented a custom image processing pipeline using OpenCV and NumPy Python libraries to efficiently load, preprocess, and analyze 1,200 palm print images, enabling accurate edge detection and feature extraction for biometric analysis.
- Optimized palm print recognition accuracy by identifying the ideal Top-N predictions, reducing the False Rejection Rate (FRR) by 63.6%.
- Showcased results of analysis using Python, SQL, and Tableau to influence software engineers' decisions on the ideal Top-N prediction.
Projects
Spotify Monthly Listener Predictor
GitHub
- Developed a supervised regression pipeline to predict Spotify artists' monthly listener count using features such as follower count, popularity score, and release timeline (first and last year of release).
- Implemented and compared multiple regression models including Linear, Ridge, Lasso, SVR, Decision Tree, Random Forest, XGBoost, LightGBM, and MLP Neural Network; utilized GridSearchCV to optimize hyperparameters for tree-based models.
- Designed the tool for use by A&R professionals to quantify and forecast independent artists' streaming potential based on publicly available metrics.
- Built a daily batch ETL pipeline to automate news and price API pulls for 10 popular tech stocks using Airflow, Docker, and PostgreSQL.
- Applied FinBERT-based NLP sentiment scoring with PyTorch to analyze financial news headlines and descriptions, generating daily sentiment scores from -1 to 1 to quantify stock-specific news sentiment; designed data modeling for the database to store generated data.
- Designed a dynamic Tableau dashboard to compare sentiment trends with stock price movements and support exploration of the financial data.
- Curated a two-step Python pipeline using NVIDIA’s mit-b3 for semantic segmentation of EV charger components, followed by a binary classifier to determine component health.
- Developed a mobile app to enable EV owners to report charger faults, streamlining repair workflows and improving charger reliability.
- Designed a Figma demo for users to report charger faults, integrating guided photo capture, fault category selection, and an input text box.
- Presented the product to 200+ SDG&E employees, including principal engineers, project managers, and directors, showcasing the pipeline's impact on charger fault detection and maintenance optimization.
- Conducted geospatial analysis using Cenpy and Folium to evaluate EV charger density across 1100+ public charging stations in SDG&E territories, identifying underserved areas with 30% fewer public chargers despite high population density and lower median household income.
- Performed time-series analysis with Statsmodels on DMV vehicle registration data, projecting a 40% annual growth rate in EV adoption, enabling SDG&E to proactively plan infrastructure investments aligned with adoption trends.
- Designed EDA pipelines integrating AFDC API, census data, and DMV registration records to uncover correlations between socio-economic factors, EV ownership, and public charger availability, driving recommendations for equitable infrastructure expansion.
- Delivered findings in a presentation to SDG&E’s Data Science Manager and other Data Science Leads, influencing strategic planning for future EV infrastructure investments.
- Conducted EDA in Pandas on 83k+ recipes and 730k+ reviews, examining how protein content correlates with average ratings and cooking times to understand the factors associated with different protein levels.
- Improved the R-squared value of the baseline model by 250% by transitioning from a Linear Regression model to a Random Forest Regressor, lowering the RMSE from 24.7 to 8.4 by incorporating a broader range of nutritional features and employing GridSearchCV for meticulous hyperparameter optimization.
- Delivered a comprehensive presentation to a UCSD health & fitness club, assisting members in identifying optimally balanced protein diets tailored for bodybuilding and weight lifting.
Academic Performance Analysis at UCSD
Report
- Performed a data-driven analysis of student grades at UCSD to identify factors influencing academic performance, utilizing multiple regression models in Scikit-Learn to quantify the impact of predictors such as class size and department, achieving an R-squared value of 0.702.
- Enhanced prediction model performance through feature engineering, creating 5 new variables and converting 3 categorical variables for effective inclusion in regression models, resulting in improved interpretability and accuracy that enabled UCSD to optimize class planning, improve resource allocation, and ultimately improve student performance in the data science department by 30%.
Worldwide Covid Investigation Using Svelte and D3.js
Report
- Created a visualization platform using Svelte and the D3.js library that lets users interact with a global map and a timeframe slider to see the rapid spread of COVID-19 across the world during the peak quarantine period.
- Implemented a dynamic line graph for the counts of total cases, recovered, and deaths based on the position of the timeframe slider.
Evil Geniuses Social Media Engagement Analysis
Report
- Led a data analysis project for the Evil Geniuses Esports' social media team using Python, Pandas, and Matplotlib, processing over 3500 posts since January 2023 to identify peak engagement times and optimize content strategy across various platforms.
- Assessed the performance of various game titles and media types, highlighting the superior engagement of the DOTA2 account and photo media, while advising against the use of link media due to low engagement rates.
- Improved EG social media engagement rate by 60% by presenting clear, data-driven recommendations to the social media team, including strategic post scheduling, a content focus on the game-specific accounts, and an emphasis on specific media types.
Contact
If you believe I would be an excellent fit for a role within your organization, or if you simply wish to chat, please do not hesitate to reach out! I am always happy to meet new people and expand my knowledge. Thank you!