10 December, 2021

Zillow did not have metallic balls

https://www.stevenbuccini.com/zillow-offers 

  1. You cannot bootstrap off an existing dataset. Full stop. These datasets can contain implicit assumptions or associations that you are not aware of. This is the original sin of many an algorithmic risk-underwriting startup.
  2. You are operating in an adversarial environment. Most folks in ML are used to working with pretty boring data: demographic data, handwriting samples, etc. That changes as soon as you introduce cold, hard cash into the equation. Once there is money to be made, fraudsters will be hard at work reverse-engineering your model. Have you separated your fraud detection models from your risk underwriting models? Do you have systems in place to detect fraudulent requests, and are you directing each request into the correct training pipeline?
  3. Startups underestimate how much money it will take to train the model. As previously noted, you should expect to lose 50% of the capital allocated to underwriting. I suspect many startups drastically underestimate this amount and then realize they are going to run out of money. That means raising capital under duress, which means extremely bad terms, which makes future success even less likely.
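The separation described in point 2 can be sketched roughly like this. It is a minimal illustration, not Zillow's or the linked article's actual design. It assumes a separate fraud model has already assigned each request a `fraud_score` in [0, 1]. The names `Request`, `TrainingPipelines`, `route_request` and the 0.8 cutoff are all hypothetical. Suspected fraud goes into its own training set so that it never teaches the underwriting model what a "normal" request looks like.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    request_id: str
    features: dict
    # Hypothetical: score produced upstream by a separate fraud model, in [0, 1].
    fraud_score: float


@dataclass
class TrainingPipelines:
    # Hypothetical stand-ins for two independent training datasets.
    underwriting: list = field(default_factory=list)
    fraud: list = field(default_factory=list)


# Hypothetical cutoff; in practice it would be tuned against labeled fraud cases.
FRAUD_THRESHOLD = 0.8


def route_request(req: Request, pipelines: TrainingPipelines,
                  threshold: float = FRAUD_THRESHOLD) -> str:
    """Route suspected fraud away from the underwriting training data.

    Requests at or above the threshold go only to the fraud pipeline.
    Everything else goes to the underwriting pipeline.
    """
    if req.fraud_score >= threshold:
        pipelines.fraud.append(req)
        return "fraud"
    pipelines.underwriting.append(req)
    return "underwriting"


if __name__ == "__main__":
    pipes = TrainingPipelines()
    route_request(Request("a", {"sqft": 1800}, fraud_score=0.95), pipes)
    route_request(Request("b", {"sqft": 2100}, fraud_score=0.10), pipes)
    print([r.request_id for r in pipes.fraud])         # ['a']
    print([r.request_id for r in pipes.underwriting])  # ['b']
```

A single hard threshold is the simplest possible routing rule. A real system would more likely hold borderline requests for review instead of silently sending them to either pipeline.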