As a computational scientist, I have been fortunate to work with many collaborators to generate biological data for training and testing my models. With increasing amounts of data becoming accessible in the public domain, we are now in a position to generate models and predictions relevant to finding molecules for many diseases. For example, databases like ChEMBL and PubChem become a potential goldmine when combined with computational algorithms that can be applied quickly to identify new molecules to test. Over the past 8 years I have made extensive use of public datasets for screening against M. tuberculosis and T. cruzi, which are responsible for tuberculosis and Chagas disease, respectively. This work has led to collaborators testing a small number of compounds and finding novel actives. This approach is more broadly applicable to other diseases. The subsequent publication of the models and molecules into the public domain has also been addressed by developing desktop tools that enable machine learning models to be shared and used on mobile devices. Many of the lessons learned have also been applied to work on rare diseases. For example, some of the high-throughput screening data generated to identify inhibitors of the Ebola virus have been used to create machine learning models for finding new molecules not previously tested. For many rare diseases, there are huge data gaps, and other approaches need to be taken. In spite of the many incentives offered, most rare diseases likely do not present big commercial opportunities, and hence there is definitely a place for more openness and collaboration to bring ideas from the lab to the clinic quickly.