If you are running a High Content Screening pipeline you probably have a lot of moving pieces:

- Trigger CellProfiler analyses, either from a LIMS system, by watching a filesystem, or through some other process.
- Keep track of dependencies between CellProfiler analyses: first run an illumination correction, then your analysis.
- If you have a large dataset and you want it analyzed sometime this century, split your analysis, run it, and then gather the results.
- Once you have results, decide on a method of organization. You need to put your data in a database and set up in-depth analysis pipelines.

These tasks are much easier to accomplish when you have a system or framework that is built for scientific workflows.

> Airflow is a platform created by the community to programmatically author, schedule and monitor workflows.

If you prefer to watch, I have a video where I go through all the steps in this tutorial.

Apache Airflow gives you a framework to organize your analyses into DAGs, or Directed Acyclic Graphs. There are a ton of great introductory resources out there on Apache Airflow, but I will very briefly go over it here.

Apache Airflow uses DAGs, which are the bucket you throw your analysis in. If you aren't familiar with this term, it's really just a way of saying Step3 depends upon Step2, which depends upon Step1, or Step1 -> Step2 -> Step3.

Your DAG is composed of Operators and Sensors. Operators are an abstraction over the kind of task you are completing. These will often be Bash, Python, or SSH, but can also be even cooler things like Docker, Kubernetes, AWS Batch, AWS ECS, database operations, file pushers, and more. Then there are also Sensors, which are nice and shiny ways of waiting for various operations, whether that is waiting for a file to appear, for a record to appear in a database, or for another task to complete.

Out of the box you get lots of niceness, including a nice web interface with a visual browser of your tasks, a scheduler, configurable parallelism, logging, watchers, and any number of executors. Because all of your configuration is written in code, it is also extremely flexible. It can integrate with existing systems or stand on its own.

## Computational Backends

There are any number of scientific workflow managers out there, and by the time I finish this article a few more will have popped into existence. Apache Airflow is my favorite, but you should shop around to see what clicks with you!

I briefly touched on this earlier, but one of the perks that initially drew me to Apache Airflow is just how completely agnostic it is to your compute environment. You could have a laptop, a single server, an HPC cluster, or execute on AWS or GCP. All you need to do is map out your logic, make sure the data is available, and use whichever operator is appropriate.

In this post I'm going to discuss the BBBC021 dataset and how I would organize and batch the analysis. I decided to go for a simple setup, which is to use Apache Airflow with docker-compose and use the Docker operator to execute the CellProfiler analysis.