Dimensionality Expansion for Environmental Modeling

William McNamara • September 22, 2022

Dimensionality expansion

As described before, dimensionality expansion is a simple technique for building a high-dimensional, dense data structure for machine learning applications. It is useful for large data samples, such as the long sequences presented in the previous example. Large image data could also be transformed this way to bring related attributes closer together.

To test that hypothesis, a dataset built from NASA AIRS data is used to check whether the technique can find an accurate representation of the data. The data consists of a series of scans of temperature, pressure, ozone, cloudiness, and other variables. It is sampled at 1-degree resolution, resulting in arrays of shape 180×360. Each array can then be reshaped into an array of shape (32, 32, 64); since 180 × 360 = 64,800 values is slightly fewer than the 65,536 the target shape holds, the remainder has to be filled, for example by padding.
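The reshaping step can be sketched in a few lines of NumPy. This is a minimal sketch, not the code behind the post: the function names `expand_grid` and `collapse_grid`, the zero fill value, and the random stand-in scan are all assumptions made for illustration. It pads the 64,800 grid values up to the 65,536 slots of a (32, 32, 64) array and shows that the original grid can be recovered.

```python
import numpy as np

def expand_grid(grid, target_shape=(32, 32, 64), fill_value=0.0):
    """Flatten a 2-D scan and pad it so it fits target_shape (hypothetical helper)."""
    flat = np.asarray(grid, dtype=np.float32).ravel()
    target_size = int(np.prod(target_shape))
    if flat.size > target_size:
        raise ValueError("grid has more values than the target shape can hold")
    # Fill the unused tail with a constant; zero padding is an assumption here.
    padded = np.full(target_size, fill_value, dtype=np.float32)
    padded[:flat.size] = flat
    return padded.reshape(target_shape)

def collapse_grid(cube, grid_shape=(180, 360)):
    """Invert expand_grid by dropping the padding and restoring the 2-D grid."""
    n_values = grid_shape[0] * grid_shape[1]
    return cube.ravel()[:n_values].reshape(grid_shape)

# Random stand-in for a single 1-degree scan (not real AIRS data).
rng = np.random.default_rng(0)
scan = rng.normal(size=(180, 360)).astype(np.float32)

cube = expand_grid(scan)
restored = collapse_grid(cube)
```

Padding at the end of the flattened array keeps the transformation exactly invertible, which matters when decoded outputs need to be mapped back onto the latitude-longitude grid.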

Applying a simple convolutional variational autoencoder makes it possible to encode and decode the data. To better understand the nature of the learned representation and how it might affect the output data, a latent walk is applied to the model, resulting in a series of images that reconstruct the data.
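A latent walk of this kind can be sketched as follows. The sketch assumes a two-dimensional latent space and uses a toy linear decoder (`toy_decode`) as a stand-in for the trained model's decoder; the helper name `latent_walk` and the walk range of -3 to 3 are likewise illustrative choices, not details taken from the post.

```python
import numpy as np

def latent_walk(decode, latent_dim=2, axis=0, start=-3.0, stop=3.0, steps=9):
    """Decode evenly spaced points along one latent axis, holding the others at zero."""
    values = np.linspace(start, stop, steps)
    z = np.zeros((steps, latent_dim), dtype=np.float32)
    z[:, axis] = values
    frames = np.stack([decode(point) for point in z])
    return values, frames

# Toy decoder standing in for a trained VAE decoder (an assumption for illustration):
# a fixed random linear map followed by tanh, reshaped to the (32, 32, 64) data shape.
rng = np.random.default_rng(0)
weights = rng.normal(size=(2, 32 * 32 * 64)).astype(np.float32)

def toy_decode(z):
    return np.tanh(z @ weights).reshape(32, 32, 64)

values, frames = latent_walk(toy_decode, axis=0)
```

With a real model, `decode` would wrap the decoder's prediction call, and each entry of `frames` could be plotted side by side to see how the reconstruction changes along the chosen axis.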

However, no specific pattern or cluster can be found in the learned dimensions. This is the case for the pressure data.

Autoencoding other data sources resulted in weak clustering of the different samples within the learned representation.

Also, the latent walk shows recognizable patterns that change in the same direction as the clustering axis.

The previous two examples show how a single data source can be used to train a simple autoencoder and obtain a compact representation of the data. The learned representation also appears to capture specific time-related changes that could be used in further applications.
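One simple way to probe whether a latent dimension tracks time is to correlate each latent coordinate with the acquisition date of the sample. The sketch below is an assumption about how such a check could look, not the analysis used in this post: `time_correlation` is a hypothetical helper, and the latent codes are synthetic, with one dimension built to drift with time and one made of pure noise.

```python
import numpy as np

def time_correlation(latents, timestamps):
    """Pearson correlation between each latent dimension and the sample time."""
    z = np.asarray(latents, dtype=np.float64)
    t = np.asarray(timestamps, dtype=np.float64)
    z_centered = z - z.mean(axis=0)
    t_centered = t - t.mean()
    covariance = z_centered.T @ t_centered
    scale = np.linalg.norm(z_centered, axis=0) * np.linalg.norm(t_centered)
    return covariance / scale

# Synthetic latent codes for one year of daily samples (illustrative only).
rng = np.random.default_rng(0)
days = np.arange(365)
latents = np.column_stack([
    0.01 * days + rng.normal(scale=0.1, size=days.size),  # drifts with time
    rng.normal(size=days.size),                           # unrelated to time
])

correlations = time_correlation(latents, days)
```

A correlation close to ±1 for one dimension would suggest that it encodes a slow temporal trend. Seasonal or periodic structure would need a different check, since a full cycle can produce a weak linear correlation.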

The specific time scale obtained from this analysis could be used as a general time scale to improve weather and environmental modeling. Yet the specific identity of such a scale is not presented or investigated at the moment.



Now you have an example of how to use a simple technique to analyze climate data, and how to extend it with minimal changes to the code. As always, the complete code for this post can be found on my GitHub by clicking here, and a live example on Kaggle can be found here. See you in the next one.
