We do have plans to integrate with production-grade schedulers in the very near future.

The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Airflow ships with a pretty rich UI. Would love to know if any drake folks would be interested in this effort.

I'm no longer working at Netflix but wrote the original R interface and will be involved in the open source release of the R API. We have many common features covered:

This looks like a fantastically clean API for Python data and ML pipelines. Glad these folks don't write RFCs.

We are on Azure using Spark via Databricks. For model monitoring, we haven’t found a good enough UI that can handle the diversity of models and use cases we see internally, and believe that notebooks are an excellent visualisation medium that gives the power to the end user (data scientists) to craft dashboards as they see fit. There is also an open issue for fast-data access, which you can upvote if interested in seeing it open-sourced.

4. Is it intended to compete with TensorFlow and PyTorch or to be an industrial-strength version of sklearn? I looked through the tutorial on my mobile and the answer was not immediately clear.

Is the benefit that it auto-scales on AWS without having to think through the infrastructure?

I'm of the opinion that adopting some standard DAG meta format for data science may make a positive impact on the reproducibility issues we have in science generally. Collaborating with yourself is another scenario where Metaflow can be useful, since it takes care of versioning and archiving various artifacts.

Re 4, aren't Kubeflow and Lyft's recently open-sourced "Flyte" pretty similar?

That's true of Kubeflow. I don't like to criticise new frameworks / tools without first understanding them, but I like to know what some key differences are without the marketing/PR fluff before giving one a go. Your first bullet point should be highlighted on your project front page! These are interesting questions. I replied to someone complaining about their Netflix movie recommendations being bad. I'll play around with the tutorial and try to set up the AWS environment this weekend.
If there is only one thing to do right, then it's to not bet on one tool but keep the whole stack flexible.

I am still missing well-established standards for data formats, workflow definitions and project descriptions - hopefully open source ninjas will deliver on this front before proprietary pirates destroy the field with progress-inhibiting closed things. Each node is either an input node with a provided value, or a computed node with a function to calculate its value. If it interests you, react to this issue (Let us know if you notice any other interesting features missing! Snakemake can also control k8s clusters. It seems to be too late to create an "Autocad" or "Word" file format for data science, and I see no clear winner atm, but hopefully my sight is bad - please enlighten me!

- Automatic publishing of web apps: we have this internally but it is not open-source yet. Edit: just went to the Amazon CodeGuru homepage. ) How is this different from / better than existing tools or workflows? We are exploring what a Metaflow-specific UI might look like.

As for comparisons with Airflow, it is an excellent production-grade scheduler.
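To make the "input node vs. computed node" description above concrete, here is a minimal sketch in Python. It is only an illustration: `Node`, `evaluate`, and the example graph are hypothetical names made up for this comment, not the API of any tool mentioned in this thread, and it assumes the graph has no cycles.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass
class Node:
    """A graph node: an input node carries a value, a computed node carries a function."""
    name: str
    value: Any = None                          # used only by input nodes
    func: Optional[Callable[..., Any]] = None  # used only by computed nodes
    deps: Tuple[str, ...] = ()                 # names of the nodes passed to func, in order

    @property
    def is_input(self) -> bool:
        return self.func is None


def evaluate(nodes: Dict[str, Node], target: str,
             cache: Optional[Dict[str, Any]] = None) -> Any:
    """Compute the value of `target`, evaluating each dependency at most once."""
    cache = {} if cache is None else cache
    if target in cache:
        return cache[target]
    node = nodes[target]
    if node.is_input:
        result = node.value
    else:
        # Resolve dependencies first, then apply the node's function to them.
        args = [evaluate(nodes, dep, cache) for dep in node.deps]
        result = node.func(*args)
    cache[target] = result
    return result


# Hypothetical example graph: two inputs and two computed nodes.
graph = {
    "a": Node("a", value=2),
    "b": Node("b", value=3),
    "total": Node("total", func=lambda a, b: a + b, deps=("a", "b")),
    "squared": Node("squared", func=lambda t: t * t, deps=("total",)),
}
# evaluate(graph, "squared") returns 25
```

The cache is what keeps a shared dependency from being recomputed when several downstream nodes need it. A real workflow tool would add cycle detection, persistence and scheduling on top of this.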

and then export that to airflow/etc compatible format?

What workflow engine do you guys use and primarily support in Metaflow?

At Netflix, we use an internal workflow engine called Meson. Could you elaborate, or point me at any reviews of their product? - minus the input spec being not YAML but more language native (pythonic for e.g. Metaflow doesn't have this. the MovieStatsFlow in [here](

I'm sure this design decision was considered and/or that use-case doesn't come up a lot at Netflix (although I've encountered that a lot), or maybe I'm missing something very obvious, but I'd love to hear your thoughts on that.

A big difference between Metaflow and other workflow frameworks for ML is that Metaflow doesn't only execute your DAG, it helps you to design and implement the code that runs inside the DAG. or can run on K8s. I personally think your approach could be great. It can even rotate the secret without much headache.

/edit: I could have a wrapper script that reads the secret and then os.execve()...

Can you please explain how you were able to improve on the performance of the aws cli?

Our S3 client just handles multiple worker processes correctly with error handling.

Can you say a little about which niche this would occupy, and what the motivation is?
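On the wrapper-script idea from the /edit above (read the secret, then `os.execve()`), a rough sketch could look like the following. The file-based secret, the `APP_SECRET` variable name, and both function names are assumptions made up for illustration. A real setup would read from whatever secret store is in use.

```python
import os
from pathlib import Path
from typing import Dict, List, Mapping


def build_child_env(secret_file: str, base_env: Mapping[str, str],
                    var_name: str = "APP_SECRET") -> Dict[str, str]:
    """Return a copy of base_env with the secret added under var_name (hypothetical name)."""
    env = dict(base_env)
    env[var_name] = Path(secret_file).read_text().strip()
    return env


def exec_with_secret(secret_file: str, argv: List[str]) -> None:
    """Replace the current process with argv, passing the secret via the environment.

    os.execve does not search PATH, so argv[0] must be a path to the executable,
    and on success it never returns.
    """
    env = build_child_env(secret_file, os.environ)
    os.execve(argv[0], argv, env)
```

Since the secret is read at exec time, a rotated secret is picked up the next time the wrapper starts the program. The wrapped process itself never needs to know where the secret came from.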

With one object or many objects in the same prefix?

With many objects under the same S3 bucket - say for a flow or a run (with many tasks).

Hey, I'm one of the authors of Metaflow. 3) Do you plan to release integration with Spark/Yarn?

Does this do experiment tracking similar to MLflow? If your company has an existing scheduler, e.g. in a notebook. aws CLI today easily

You need more connections than what a single AWS CLI process opens to saturate the network on a big box. Many data scientists work in organizations which have far less mature data infrastructure than Netflix, and/or data science needs of a much smaller scale than Netflix. We have an optimized S3 client as part of this release -

Is there a reason to use this over DVC[1], which is language and framework agnostic and supports a large number of storage backends?
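For anyone curious what "more connections plus error handling" means in practice, here is a minimal sketch of the general pattern. To be clear, this is not Metaflow's S3 client: `fetch_all` is a made-up helper, `fetch` is a placeholder for whatever call actually downloads one object, and it uses threads rather than worker processes to keep the example short.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Iterable, Tuple


def fetch_all(keys: Iterable[str], fetch: Callable[[str], Any],
              workers: int = 16, retries: int = 3,
              backoff: float = 0.0) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Fetch many keys concurrently, retrying each key and collecting failures."""
    if retries < 1:
        raise ValueError("retries must be at least 1")

    def fetch_with_retry(key: str) -> Any:
        last_error: Exception = RuntimeError("not attempted")
        for attempt in range(retries):
            try:
                return fetch(key)
            except Exception as exc:  # a real client would catch narrower errors
                last_error = exc
                time.sleep(backoff * (2 ** attempt))  # exponential backoff
        raise last_error

    results: Dict[str, Any] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch_with_retry, key): key for key in keys}
        for future in as_completed(futures):
            key = futures[future]
            try:
                results[key] = future.result()
            except Exception as exc:
                # One bad object should not abort the whole batch.
                errors[key] = exc
    return results, errors
```

Each worker holds its own connection, so throughput scales with `workers` until the network or the storage side becomes the bottleneck. Returning failures separately lets the caller decide whether a partial result is acceptable.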