The second option is a "Local notebook": you downsample the data on your clusters and pull it to your laptop (downsample: if you have 100 GB of data on your clusters, you reduce it to 1 GB without losing too much important information). Then you can process the data in your local Jupyter notebook.

The problems of the "local notebook" are:

- You have to write extra code to downsample the data.
- Downsampling can lose vital information about the data, especially when you are working on visualization or machine learning models.
- You have to spend extra hours making sure your code also works on the original data.
- You have to guarantee that the local development environment is the same as the remote cluster. If not, it takes extra hours to figure out what's wrong; it is error-prone and may cause data issues that are hard to detect.

Ok, "No notebook" and "Local notebook" are obviously not the best approaches. What if your data team has access to the cloud, e.g. AWS? Yes, AWS provides Jupyter notebooks on its EMR clusters and in SageMaker. This approach is called "Remote notebook on a cloud": the notebook server is accessed through the AWS web console and is ready to use as soon as the clusters are ready.

The problems of the "remote notebook on a cloud" are:

- You have to set up your development environment every time the clusters spin up.
- If you want your notebook to run on different clusters or regions, you have to do it manually and repeatedly.
- If the clusters are terminated unexpectedly, you lose your work on those clusters.
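The downsampling step described above can be sketched as follows. This is a minimal illustration in plain Python with a hypothetical `downsample` helper; in practice you would run the equivalent on the cluster itself (e.g. Spark's `DataFrame.sample(fraction=...)`) and only pull the small result down to your laptop.

```python
import random

def downsample(rows, fraction, seed=42):
    """Keep roughly `fraction` of the rows (hypothetical helper).

    A fixed seed makes the sample reproducible, which matters when
    you later need to verify your code against the original data.
    """
    rng = random.Random(seed)
    k = max(1, int(len(rows) * fraction))
    return rng.sample(rows, k)

rows = list(range(100_000))     # stand-in for 100 GB of records
small = downsample(rows, 0.01)  # keep ~1%, like going from 100 GB to 1 GB
print(len(small))               # 1000
```

Note the trade-off the article points out: a 1% random sample is fine for smoke-testing code, but rare events (outliers, minority classes) may vanish from it, which is exactly why extra validation against the full data is still needed.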