The number of theoretically possible molecules is larger than the number of atoms in the universe. The image shows a tiny fraction of them. | Image: 2. Stock Süd

How many molecules can actually exist? Initially, it was a purely academic question for the chemist Jean-Louis Reymond from the University of Bern. He wanted to know how many molecules are known and how many new, unknown molecules might still exist. So in 2001, he and his team began to count molecules and to collect them systematically in a gigantic database. Not out of boredom, but because he wanted to find new active agents that might help to cure serious illnesses.

To this end, Reymond developed computational methods to generate practically all the molecules that are theoretically possible up to a certain size, and to predict their properties. His computer puts the atoms together like Lego bricks, he says. Reymond takes delight in explaining how this method led him to his first “lovely molecule”: three interlocking norbornane units built from just two types of atom. Norbornane is a bicyclic hydrocarbon whose family includes camphor, for example, which is used in ointments for colds.

It took years to build up the database. At first, the available computing capacity meant the molecules had to be limited to a maximum of 11 atoms. A second database raised the limit to 13 atoms, and a third ultimately to 17. Only a handful of elements were allowed, mainly carbon, hydrogen, oxygen, nitrogen and fluorine, with hydrogen atoms not counted toward the size limit, and the molecules had to comply with simple basic rules of chemical stability. The permitted bond types and bond angles were prescribed as well.
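The “Lego brick” principle can be illustrated on the simplest possible case. The Python sketch below is a toy, not Reymond’s actual method. It builds only acyclic, saturated hydrocarbon skeletons (alkanes), leaves hydrogens implicit, and applies a single rule: each carbon bonds to at most four others. Starting from one carbon, it attaches a new carbon wherever a free bond remains. It then discards duplicates by comparing a canonical string for each tree, using a centre-rooted AHU encoding. The function names are hypothetical.

```python
"""Toy 'Lego' enumeration of alkane skeletons (carbon trees), hydrogens implicit."""

MAX_VALENCE = 4  # a carbon atom forms at most four bonds


def centers(adj):
    """Return the one or two central atoms of a tree by repeatedly peeling leaves."""
    n = len(adj)
    if n <= 2:
        return list(range(n))
    degree = [len(nbrs) for nbrs in adj]
    leaves = [v for v in range(n) if degree[v] == 1]
    remaining = n
    while remaining > 2:
        remaining -= len(leaves)
        new_leaves = []
        for leaf in leaves:
            for nbr in adj[leaf]:
                degree[nbr] -= 1
                if degree[nbr] == 1:
                    new_leaves.append(nbr)
        leaves = new_leaves
    return leaves


def rooted_code(adj, v, parent):
    """AHU-style string for the subtree at v; sorting makes child order irrelevant."""
    child_codes = sorted(rooted_code(adj, c, v) for c in adj[v] if c != parent)
    return "(" + "".join(child_codes) + ")"


def canonical(adj):
    """Canonical string of an unrooted tree: the smallest code over its centres."""
    return min(rooted_code(adj, c, -1) for c in centers(adj))


def add_carbon(adj, v):
    """Return a copy of the skeleton with a new carbon bonded to atom v."""
    grown = [list(nbrs) for nbrs in adj]
    grown.append([v])
    grown[v].append(len(grown) - 1)
    return grown


def enumerate_alkanes(max_carbons):
    """Map n -> number of distinct alkane skeletons with n carbon atoms."""
    level = {canonical([[]]): [[]]}  # a single carbon, no C-C bonds
    counts = {1: 1}
    for n in range(2, max_carbons + 1):
        next_level = {}
        for adj in level.values():
            for v in range(len(adj)):
                if len(adj[v]) < MAX_VALENCE:  # atom v still has a free bond
                    grown = add_carbon(adj, v)
                    next_level.setdefault(canonical(grown), grown)
        level = next_level
        counts[n] = len(level)
    return counts


print(enumerate_alkanes(10))
# {1: 1, 2: 1, 3: 1, 4: 2, 5: 3, 6: 5, 7: 9, 8: 18, 9: 35, 10: 75}
```

These are the known numbers of alkane isomers. Even in this stripped-down setting, the counts climb steadily. Adding rings, double bonds and other elements, as the GDB project does, makes them grow far faster.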

Despite these limitations, the number of molecules grew to an unimaginable size. The Lego game led Reymond into the infinite expanses of chemistry. He soon saw that there were far more unknown molecules than known ones.

His original ‘Generated Data Base’, GDB-11, held 26.4 million molecules. GDB-17 has 166 billion entries, making it the world’s largest database of small molecules. Just listing the names of the molecules would take a computer more than ten hours. “You mustn’t let yourself be dazzled by the sheer number of molecules”, says Reymond. “Our main task isn’t to build ever-bigger databases, but to design them so that you can search them. We have to find the new substances in them that can actually lead to useful structures. It’s like panning for gold, and there isn’t an endless supply of it”.
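Simple arithmetic on the figures above puts them in perspective. Because listing GDB-17 takes more than ten hours, the computer must be producing fewer than about 4.6 million names per second. GDB-17 is also roughly 6,300 times larger than GDB-11:

```latex
\frac{166 \times 10^{9}\ \text{names}}{10\ \text{h} \times 3600\ \text{s/h}} \approx 4.6 \times 10^{6}\ \text{names per second},
\qquad
\frac{166 \times 10^{9}}{26.4 \times 10^{6}} \approx 6.3 \times 10^{3}
```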

Artificial intelligence on the hunt for crystals

Anatole von Lilienfeld from the University of Basel is also exploring ‘chemical space’ for new, exciting compounds. He is looking for elpasolite crystals made of four different elements. This makes their structure more complex than that of many common crystals, which consist of just two or three components, like table salt. “Elpasolites have interesting material properties that make them candidates for use as scintillators”, says von Lilienfeld. Scintillators are materials that emit light of a specific colour when excited by radiation.

Using methods aided by artificial intelligence, von Lilienfeld and his team have so far discovered 90 previously unknown crystals. Their starting point was a database of 10,000 crystals whose properties had been calculated using quantum mechanics. They then used a model trained on these data to predict the properties of two million further crystals. “With artificial intelligence, we save ourselves CHF 2 million in computing time”, says von Lilienfeld. Actually producing these elpasolites in the lab is then the task of partners such as IBM Zurich or the Swiss Nanoscience Institute (SNI) in Basel.
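The strategy described above can be sketched in plain Python: run the expensive calculation on a small set, then let a trained model handle a much larger one. The sketch assumes kernel ridge regression, a method widely used for this kind of task, which is not necessarily the team’s exact model. Several other parts are also illustrative assumptions. A single number stands in for each crystal’s descriptor. `math.sin` stands in for the costly quantum-mechanical calculation (the hypothetical `expensive_property`). The scale is shrunk to 40 training points and 2,000 predictions.

```python
import math
import random


def gaussian_kernel(a, b, width):
    """Similarity of two descriptors: 1.0 when identical, decaying with distance."""
    return math.exp(-((a - b) ** 2) / (2.0 * width ** 2))


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def train_krr(xs, ys, width=0.5, reg=1e-6):
    """Kernel ridge regression: solve (K + reg*I) alpha = y and return a predictor."""
    n = len(xs)
    kernel = [[gaussian_kernel(xs[i], xs[j], width) + (reg if i == j else 0.0)
               for j in range(n)] for i in range(n)]
    alpha = solve(kernel, ys)

    def predict(x):
        # A prediction is a weighted sum of similarities to the training examples.
        return sum(a * gaussian_kernel(x, xi, width) for a, xi in zip(alpha, xs))

    return predict


def expensive_property(x):
    """Hypothetical stand-in for a costly quantum-mechanical calculation."""
    return math.sin(x)


# Small reference set: the expensive calculation is run only here.
train_x = [6.0 * i / 39 for i in range(40)]
model = train_krr(train_x, [expensive_property(x) for x in train_x])

# Much larger set of new candidates: only the cheap model is evaluated.
rng = random.Random(0)
new_x = [rng.uniform(0.0, 6.0) for _ in range(2000)]
# Checking against the expensive function here is only for demonstration.
max_error = max(abs(model(x) - expensive_property(x)) for x in new_x)
print(f"largest error on {len(new_x)} unseen points: {max_error:.4f}")
```

In this sketch, building the model calls the expensive function 40 times. The 2,000 predictions need only cheap kernel evaluations. The final error check calls the expensive function again, but only to show that the model is accurate. In real use, skipping those calls is exactly where the saving comes from.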

Thanks to greater computing power and better algorithms, chemists can now use big-data search engines and self-learning systems to find the ‘gems’ among billions of compounds. This should also help to accelerate the development of new drugs.

Researchers can now simulate possible biochemical processes on the computer, with a particular focus on small molecules. Algorithms predict whether such a molecule can dock with a specific protein, that is, bind to it and thereby take effect. This allows researchers to comb through databases for suitable active agents.

This is how a team of chemists led by Brian Shoichet of the University of California, San Francisco, searched through more than three million substances to find a new painkiller. The molecule they were looking for had to activate a particular opioid receptor and so relieve pain without triggering the usual side effects, such as slowed breathing or constipation. The algorithm identified 23 possible candidates, seven of which showed the desired effect in initial lab tests. The pharma start-up Epiodyne, founded by Shoichet, is now trying to turn these results into a safe drug.
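The screening funnel described above can be sketched in a few lines of Python: score a huge library, keep a short list and send it to the lab. Everything here is hypothetical. The library is scaled down to 100,000 entries and holds random numbers in place of real docking scores. The sketch also assumes that lower scores mean a better predicted fit, as is common for energy-like docking scores. Only the 23 candidates and 7 hits come from the text.

```python
import heapq
import random


def pick_candidates(library, k):
    """Return the k (molecule_id, score) entries with the best (lowest) score.

    Assumption for this sketch: a lower docking score means a better
    predicted fit to the receptor.
    """
    return heapq.nsmallest(k, library, key=lambda entry: entry[1])


# Hypothetical stand-in for a docked library: random scores, not real data.
rng = random.Random(42)
library = [(f"mol-{i}", rng.uniform(-60.0, 0.0)) for i in range(100_000)]

shortlist = pick_candidates(library, k=23)
print(len(shortlist))              # 23 molecules go on to lab tests
print(f"hit rate: {7 / 23:.0%}")   # 7 of the 23 were active in the study
```

The point of the funnel is the ratio: out of millions of docked molecules, only a couple of dozen need to be made and tested. In the study described above, about 30 per cent of those turned out to be active.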

Entering the chemspace

If such searches for new active agents are to succeed, researchers need better tools for finding their way through the growing world of data. PubChem, a database run by the US National Institutes of Health (NIH), collects all available information on more than 96 million molecules. SureChEMBL lists some 17 million compounds taken from patents. In recent years, besides Jean-Louis Reymond’s three GDBs, numerous other specialised databases have been created.

Reymond then came up with the idea of a kind of coordinate system for chemical space. “We thought about the simple properties that are important for the behaviour of a molecule, and after quite a bit of trial and error, we settled on 42”, he explains. Every molecule has countable characteristics, such as the number of bonds, its ring structures, or the number and type of its atoms. Together, these values give every molecule its coordinates in this space. “I’m surprised myself that this simple system of 42 properties still delivers such wonderful results to this day”, says Reymond.
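The following minimal Python sketch makes the idea of countable properties concrete. It computes five such counts for a molecule drawn without its hydrogens. The five properties, the function name `count_descriptors` and the input format are all assumptions for illustration. They form a small hypothetical subset, not the 42 properties Reymond’s group actually uses, which were published as molecular quantum numbers (MQN). The ring count relies on a standard graph identity: independent rings = bonds - atoms + fragments.

```python
from collections import Counter


def count_descriptors(atoms, bonds):
    """Five countable properties of a molecule drawn without its hydrogens.

    atoms: element symbols, one per non-hydrogen atom
    bonds: (i, j) index pairs, one per bond between those atoms
           (a double bond counts once, as a single connection)
    """
    elements = Counter(atoms)

    # Union-find to count disconnected fragments (1 for a single molecule).
    parent = list(range(len(atoms)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in bonds:
        parent[find(i)] = find(j)
    fragments = len({find(i) for i in range(len(atoms))})

    return {
        "C": elements["C"],
        "N": elements["N"],
        "O": elements["O"],
        "bonds": len(bonds),
        # Graph identity: independent rings = bonds - atoms + fragments.
        "rings": len(bonds) - len(atoms) + fragments,
    }


# Norbornane (bicyclo[2.2.1]heptane): atoms 0 and 3 are the bridgeheads.
NORBORNANE = (
    ["C"] * 7,
    [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (6, 3)],
)
# Camphor: norbornane plus three methyl carbons and a C=O oxygen on atom 1.
CAMPHOR = (
    ["C"] * 10 + ["O"],
    NORBORNANE[1] + [(0, 7), (6, 8), (6, 9), (1, 10)],
)

print(count_descriptors(*NORBORNANE))
# {'C': 7, 'N': 0, 'O': 0, 'bonds': 8, 'rings': 2}
print(count_descriptors(*CAMPHOR))
# {'C': 10, 'N': 0, 'O': 1, 'bonds': 12, 'rings': 2}
```

For norbornane, 8 bonds minus 7 atoms plus 1 fragment gives 2 rings, matching its bicyclic structure. Camphor adds three methyl groups and an oxygen but keeps the same two rings.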

Recently, Reymond and his doctoral student Daniel Probst published a new method for depicting the 42-dimensional chemical space of small molecules as 2D and 3D maps. Using virtual-reality headsets, you can walk around in this space and explore its structures. The maps condense the basic information about the molecules and make the differences between active compounds visible.

The chemists first sorted the molecules in the DrugBank database by size and structural characteristics, such as rigidity or electrical polarity. “In this way, we can create a kind of shadow play in which similar molecules gather in clusters”, says Probst. You can start with proven active agents, look for similar neighbours and have them displayed with a click of the mouse. “It’s about developing new ideas for molecules”, says Reymond. If he succeeds, these chemists will truly have found a vein of gold in the endless universe of molecules.
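Searching for “similar neighbours” can be sketched as a nearest-neighbour search over count vectors. In this hypothetical Python example, each molecule is reduced to five hand-counted values: carbon, nitrogen and oxygen atoms, bonds between non-hydrogen atoms, and rings. Closeness is measured with the city-block (Manhattan) distance, the sum of absolute differences, which is a natural fit for integer counts. Both choices are assumptions. The maps described above are built with more elaborate methods, which this sketch does not attempt to reproduce.

```python
def city_block(u, v):
    """Sum of absolute differences between two count vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def nearest_neighbours(query, space, k):
    """Return the k (name, distance) pairs closest to `query`, excluding itself."""
    ranked = sorted(
        (city_block(space[query], vec), name)
        for name, vec in space.items()
        if name != query
    )
    return [(name, dist) for dist, name in ranked[:k]]


# Hypothetical mini "chemical space": (C, N, O, bonds, rings), hydrogens ignored,
# counted by hand for five well-known molecules.
SPACE = {
    "ethanol": (2, 0, 1, 2, 0),
    "benzene": (6, 0, 0, 6, 1),
    "pyridine": (5, 1, 0, 6, 1),
    "norbornane": (7, 0, 0, 8, 2),
    "camphor": (10, 0, 1, 12, 2),
}

print(nearest_neighbours("camphor", SPACE, k=2))
# [('norbornane', 8), ('benzene', 12)]
```

Starting from camphor, the search finds its parent skeleton norbornane first. This is the kind of “proven agent, then similar neighbours” step the researchers describe, here in a space small enough to check by hand.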

Hubert Filser regularly works for the TV show Quarks & Co, and lives in Munich.