AutoML

I recently wanted to play around with Kaggle again, so I made a fresh account to hide what I worked on two years ago and dived into a playground competition. These competitions in particular don’t demand any “novel technique” to do well in, which made them a nice, light reintroduction.

I’d heard about AutoML and understood the principles behind it, but I had never actually tried it. To no surprise it worked, and well enough that I found myself asking: “What’s Left?”

What’s Left?

Again, to no surprise, there is so much more left for an aspiring “machine learning engineer”.

AutoML currently handles tabular workflows and does model selection quite well, but it is still lacking workflows for other forms of data. The reason it does its job so well is that it knows the best practices for a given problem type, and where it doesn’t, it falls back on fixed presets.

AutoML workflows usually include the “manual” things that are fine being automated, like generating interaction terms, normalizing data, and general “feature engineering” tasks. Specifically, it will generate different feature sets, evaluate them on a single model type, and repeat this for all the model types.

In the end, it ensembles the model predictions and voila, you have a pretty good model. There is no need to understand optimization, validation, or model formulae; just run an AutoML workflow and it will generate an effective model. So what is left?
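To make that loop concrete, here is a minimal sketch of the idea using scikit-learn. The feature-set and model grids below are my own toy illustration, not the search space of any particular AutoML library:

```python
# Toy version of the AutoML loop described above: try a few candidate feature
# pipelines against a few candidate models, then ensemble the winners.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Feature sets": raw, normalized, normalized + interaction terms.
feature_steps = {
    "raw": [],
    "scaled": [StandardScaler()],
    "scaled+interactions": [StandardScaler(), PolynomialFeatures(2, interaction_only=True)],
}
models = {
    "logreg": LogisticRegression(max_iter=5000),
    "rf": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Cross-validate every feature-set x model combination, keep the best per model.
best = {}
for m_name, model in models.items():
    scored = []
    for f_name, steps in feature_steps.items():
        pipe = make_pipeline(*steps, model)
        score = cross_val_score(pipe, X_train, y_train, cv=5).mean()
        scored.append((score, f_name, pipe))
    best[m_name] = max(scored)  # (score, feature_set_name, pipeline)

# Ensemble the per-model winners via soft voting, and voila.
ensemble = VotingClassifier(
    [(name, pipe) for name, (_, _, pipe) in best.items()], voting="soft"
)
ensemble.fit(X_train, y_train)
print("ensemble test accuracy:", ensemble.score(X_test, y_test))
```

Real AutoML tools search far larger spaces and add things like hyperparameter tuning and stacked ensembling, but the generate-features, fit-each-model, ensemble shape is the same.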

“Actual” Feature Engineering

If AutoML were that good at everything, it would be used everywhere. That isn’t what I see when browsing around different Kaggle competitions. What I see in “real” competitions worth their prize money is that feature engineering gets you 80% while model selection gets you 20%; the usual Pareto split.

Feature engineering here refers to the fact that these competitions are not working with tabular data; they work with image, sound, textual, or heavily multivariate forms of data. These are the forms of data that demand more “novel” or “ingenious” approaches when creating model features. And that is on top of needing to know what data is even required to predict a given phenomenon.

The competition I have been playing around with involves distinguishing between a real and a fake pair of text elements. It is possible to build a logistic discriminator on the labeled data that assesses a single text input, but the higher-scoring models don’t do that. They use “Siamese” networks to extract dot-product, cosine, or Euclidean features from the two text inputs, which are then fed to a downstream model.
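As a rough sketch of that paired-text feature idea, the version below uses a frozen shared encoder from the sentence-transformers package rather than an end-to-end trained Siamese network; the checkpoint name, example pairs, and labels are placeholders:

```python
# Embed both texts with a shared encoder, derive similarity features from the
# pair, and feed those features to a downstream classifier.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # example checkpoint

def pair_features(texts_a, texts_b):
    """Dot-product, cosine, and Euclidean-distance features for each text pair."""
    a = encoder.encode(texts_a)  # shape (n_pairs, dim)
    b = encoder.encode(texts_b)
    dot = np.sum(a * b, axis=1)
    cos = dot / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    euc = np.linalg.norm(a - b, axis=1)
    return np.column_stack([dot, cos, euc])

# Toy labeled pairs: 1 = "real" pair, 0 = "fake" pair (placeholder data).
pairs_a = ["the cat sat on the mat", "gradient descent minimizes loss"]
pairs_b = ["a cat is sitting on a mat", "my favourite colour is blue"]
labels = np.array([1, 0])

clf = LogisticRegression().fit(pair_features(pairs_a, pairs_b), labels)
print(clf.predict(pair_features(["dogs bark loudly"], ["a dog is barking"])))
```

The higher-scoring setups train the twin encoder itself, but the feature-extraction step has the same shape.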

The point being made here is that, once you start working with non-tabular data, you have to get inventive with how you pre- and post-process the model inputs and outputs. This is where “actual” feature engineering begins, and it is something I don’t think AutoML workflows can perform, as it is a very “bespoke” task. That is, until we have “general repeatable workflows” for things like time series or text comparisons, etc.

Model Interpretation

The second thing that you shouldn’t expect AutoML to do well is hypothesis testing.

Everyone treats AI/ML as a field whose whole point is to create predictive models, as if that were the be-all and end-all. However, statistical models originated as tools for verifying hypotheses, not for chasing 100% accuracy. I mean, even “t-tests” can be studied as a “generalized linear model”.

Like, what ML person actually looks at the “statistical significance” of their parameters over their model performance metrics?
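The t-test-as-a-linear-model point is easy to see in code: a pooled two-sample t-test and an OLS fit on a group dummy give the same p-value and, up to sign, the same t-statistic. A quick sketch on made-up data:

```python
# Two-sample t-test vs. linear regression on a group indicator.
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 50)  # group A samples
b = rng.normal(0.5, 1.0, 50)  # group B samples

# Classic pooled-variance two-sample t-test.
t, p = stats.ttest_ind(a, b, equal_var=True)

# Same test as a linear model: y ~ intercept + group_dummy.
y = np.concatenate([a, b])
group = np.concatenate([np.zeros(50), np.ones(50)])
fit = sm.OLS(y, sm.add_constant(group)).fit()

print(t, p)                            # t-test result
print(fit.tvalues[1], fit.pvalues[1])  # same p-value; t-statistic up to sign
```

This is the kind of parameter-level reading of a model that a performance-metric leaderboard never asks you to do.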

The point here is just to say that stats people use models with a different telos than the one AutoML tools build models with. It is also to say that building predictive models often isn’t about “generating knowledge”¹, unless you are trying to extract a function via the “universal approximation theorem”.

Compute and Data

Another thing AutoML can’t “solve” is the problem of compute and data. In some Kaggle competitions it isn’t even the feature engineering that is holding people back from scoring better; it is access to compute and data.

For example, there is a text-classification challenge where the inputs are math questions and solution explanations from pre-K through high-school students. The feature input takes all the tabular fields and generates a prompt for an LLM to “classify” into some label, which, for the record, is different from “text generation”, but also just requires tacking a “head” onto a pre-built model.

In this case, better scores are achieved not by training models locally or finding a more clever feature; it boils down to which existing 7B+ model has been trained specifically on mathematical data and how you can slightly tune it for this dataset. It would be both impractical and probably unethical to retrain a model of that size from scratch.
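For a sense of what “tacking on a head” and lightly tuning looks like, here is a hedged sketch with Hugging Face transformers. distilbert-base-uncased is only a small stand-in for the math-heavy checkpoint you would actually reach for, and the dataset column names are assumptions:

```python
# Load a pretrained checkpoint with a freshly initialized classification head,
# then fine-tune it lightly on the competition data.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "distilbert-base-uncased"  # stand-in; swap in a math-heavy model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=3  # new, randomly initialized classification head
)

def preprocess(batch):
    # Assumed columns: the question text and the student's explanation.
    return tokenizer(batch["question"], batch["explanation"], truncation=True)

# With a datasets.Dataset holding "question", "explanation", and "label",
# the light fine-tune is the usual Trainer loop:
# tokenized = train_dataset.map(preprocess, batched=True)
# Trainer(model=model,
#         args=TrainingArguments(output_dir="out", num_train_epochs=1),
#         train_dataset=tokenized,
#         tokenizer=tokenizer).train()
```

Even this “slight tune” assumes a GPU that can hold the checkpoint, which is exactly the compute wall being described.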

Suggestions

learn math/statistics?

A lot of companies only hire PhDs or master’s students into ML-related industry roles because they want you to understand why the models work well and how to select the best predictive model.

The added point is that to do “good” model witchcrafting, specifically NN witchcrafting, you have to understand what it means to change each part when creating a bespoke model.

learn computer science?

There is a lot of fruitful research going into converting mathematical and statistical theory into algorithms, and fast algorithms at that: optimizing the “compute”, whether through hardware or software means.

infrastructure?

The deployment of machine learning models and algorithms is something to look into: MLOps, though I have little knowledge of its idiosyncrasies myself.


  1. If adding what I ate for lunch as a feature increases model performance, it will be kept.