The Importance of Data Quality in the Age of AI/ML

When I signed up for the Multi-Omics workshop at the Festival of Genomics, I wasn't quite sure what to expect. As the Head of Scientific Development at UK Biocentre, my daily grind didn't involve direct immersion in genomics data processing and analysis. The prospect of attending a workshop in a field that felt somewhat distant left me intrigued yet a bit unsure. After all, what could a sample processor like me possibly glean from an event so deeply rooted in the intricacies of genomics data?

To my surprise, the workshop unfolded as a transformative experience. In my first conversations with attendees, I quickly noticed that everyone around me was a bioinformatician, a computational biologist, a data scientist, or someone who handled massive genomics, proteomics, and epigenomics datasets every day. I'll admit I felt a bit out of place at first. As the workshop progressed, however, what had felt like unfamiliar terrain gradually became a captivating journey of revelation.

I soon realised that, although my role at UK Biocentre focuses primarily on sample processing, it has a profound connection to the world of omics data, especially through the generation of sample metadata. Sample metadata became a focal point for me. I began to understand that even if we weren't directly generating genomic data, our contribution in producing high-quality sample metadata was crucial in the grand scheme of genomics. It was a revelation: the data we produce behind the scenes matters more than I had thought.

Data Quality Is Non-Negotiable

The workshop highlighted a fundamental truth: garbage in, garbage out. This simple yet profound concept emphasised the critical role of data quality, even though we weren't dealing with the genomic sequences directly. Each piece of sample metadata we generate at UK Biocentre helps shape the outcomes of the genomics data produced downstream.

Delving deeper into the workshop discussions, I discovered how much ontologies and controlled vocabularies matter. Even though we weren't the direct producers of genomic data, the need to use the correct terminology and adhere to global ontologies for our sample metadata became crystal clear. Our role wasn't just about processing samples; it was about shaping data that could be a valuable resource for future AI/ML endeavours.
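To make the idea of a controlled vocabulary concrete, here is a minimal sketch of how a sample metadata record might be checked against a list of permitted terms. Everything in it is an assumption made for illustration: the field names, the allowed terms, and the `validate_metadata` helper are hypothetical and are not taken from any real ontology or from systems in use at UK Biocentre.

```python
# Hypothetical controlled vocabulary: field name -> permitted terms.
# The terms are illustrative only, not drawn from a real ontology release.
ALLOWED_TERMS = {
    "sample_type": {"whole blood", "plasma", "serum", "saliva"},
    "storage_temperature": {"-80C", "-20C", "4C"},
}


def validate_metadata(record):
    """Return a list of problems found in one sample metadata record."""
    problems = []
    for field, allowed in ALLOWED_TERMS.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif value not in allowed:
            problems.append(f"{field}: '{value}' is not a controlled term")
    return problems


# A free-text entry such as "Blood (whole)" is flagged, while the
# controlled term "-80C" passes.
record = {"sample_type": "Blood (whole)", "storage_temperature": "-80C"}
print(validate_metadata(record))
```

A check of this kind would catch free-text variations at the point of entry, before they reach anyone analysing the data downstream. In practice the permitted terms would more likely come from a published ontology than from a hand-written list.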

Another striking discovery was the potential for harmonising all our data. It wasn't just about the present; it was about paving the way for the future of artificial intelligence and machine learning (AI/ML). The more we harmonised our data, the more we could empower these evolving technologies. It was awe-inspiring to think that machines might one day take the reins, driven by the data we provide.

AI/ML: Transforming Medicine and Igniting Concerns

The workshop painted a vivid picture of the transformative power of AI/ML in drug discovery, cell and gene therapy, and healthcare processes. It was inspiring to imagine a future where these technologies could revolutionise our approach to medicine. However, amidst the excitement, a sense of scepticism crept in. The portrayal of AI/ML in sci-fi movies lingered in my mind, raising questions about the security of these systems and the potential consequences of a machine-driven world.

Should we celebrate the growth, or should we worry about potential pitfalls? I don't have a definitive answer, but I'd like to approach this with the same mindset I bring to any continuous improvement at work. That means asking questions such as: Will this enhance our processes? What are the risks? How likely are those risks, and can we mitigate them? Finally, it means implementing the changes, because I believe that the rate of change is directly proportional to the rate of growth.

In conclusion, what began as a hesitant step into the unknown turned into a journey of unexpected discoveries. The Multi-Omics workshop at the Festival of Genomics 2024 not only broadened my understanding of genomics but also highlighted the crucial role we play at UK Biocentre, even if we aren't directly handling genomic data.

As we navigate these uncharted waters, let us embrace the evolving landscape of genomics with curiosity and a commitment to ensuring that the data we contribute is not just processed but is a beacon guiding the future of AI/ML in the field.

Sandhya Anantharaman PhD FIBMS
Head of Scientific Support and Development
