Author: Zizhun Guo
Link: Amazon EC2 On-Demand Pricing
Rather than running an instance 24/7, On-Demand pricing suits quick data-analysis jobs well: the machine stays stopped when not in use, which saves pennies. There is an even cheaper strategy, the Spot Instance, which bids against the On-Demand price each time the instance is launched but gets shut down whenever the market price rises above the bid. A snapshot image can then be used to restore the system and data. A CLI sketch of such a Spot request follows the pricing table below.
Instance name | On-Demand hourly rate | vCPU | Memory | Storage | Network performance |
---|---|---|---|---|---|
c5.2xlarge | $0.34 | 8 | 16 GiB | EBS Only | Up to 10 Gigabit |
Amazon EC2 Instance Types: https://aws.amazon.com/ec2/instance-types/
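As a rough sketch of the Spot strategy mentioned above (the AMI ID and key name are placeholders, not values from this post), a one-time Spot request that bids the On-Demand rate could look like:
# bid the On-Demand rate ($0.34) for a one-time c5.2xlarge Spot instance
aws ec2 request-spot-instances --spot-price "0.34" --instance-count 1 --type "one-time" --launch-specification '{"ImageId":"ami-0123456789abcdef0","InstanceType":"c5.2xlarge","KeyName":"xxx"}'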
Link: Amazon EBS Volumes Pricing
The default gp2 volume type works fine for personal use; it lets users freely attach, detach, replace, and delete volumes on an instance on demand. See: Amazon Elastic Block Store (Amazon EBS).
The price for gp2 is $0.10/GB per month, so maintaining a large volume is not a good deal: AWS charges from the moment the EBS volume is created, even while the instance it is attached to is off. One way to lower the cost is to delete the volume once the instance is off, after creating a snapshot of it stored on S3, which costs only $0.023/GB per month. See: Amazon EBS snapshots.
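As a rough sketch of this snapshot-then-delete routine (the volume, snapshot, and instance IDs and the Availability Zone below are placeholders, not values from this post), the AWS CLI flow could look like:
# snapshot the volume before deleting it
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "data volume backup"
# once the snapshot completes, delete the volume to stop the gp2 charge
aws ec2 delete-volume --volume-id vol-0123456789abcdef0
# later, restore a new volume from the snapshot in the instance's Availability Zone and attach it
aws ec2 create-volume --snapshot-id snap-0123456789abcdef0 --availability-zone us-east-1a --volume-type gp2
aws ec2 attach-volume --volume-id vol-0fedcba9876543210 --instance-id i-0123456789abcdef0 --device /dev/sdf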
Standard S3 provides stable storage for data at $0.023 per GB per month (first 50 TB/month) and $0.01 per 1,000 requests, and data transferred from an Amazon S3 bucket to any AWS service within the same AWS Region as the bucket is free. What a deal!
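Moving analysis data into a Standard S3 bucket is then a one-liner with the AWS CLI (the bucket name below is a placeholder):
aws s3 mb s3://my-analysis-bucket
aws s3 sync ./results s3://my-analysis-bucket/results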
For VS Code, install the Remote - SSH extension and the Jupyter extension.
Whether using VS Code + Jupyter or the plain Jupyter Notebook, the client end needs the .pem key verified for authorization in order to connect to the cloud Jupyter server.
Create a self-signed certificate (.pem) for the authorization:
ubuntu@xxx:~$ sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout xxx.pem -out xxx.pem
Insert these lines into the Jupyter configuration:
ubuntu@xxx:~$ vi ~/.jupyter/jupyter_notebook_config.py
c = get_config()
c.NotebookApp.certfile = u'/home/ubuntu/certs/mycert.pem'  # the self-signed certificate created above
c.NotebookApp.ip = '*'  # listen on all interfaces so remote clients can connect
c.NotebookApp.open_browser = False  # headless server: do not launch a local browser
c.NotebookApp.port = 8888  # the port to expose (open it in the security group)
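With the configuration saved, starting the server and reaching it from the local machine is straightforward (the hostname below is a placeholder):
ubuntu@xxx:~$ jupyter notebook
# then browse to https://<ec2-public-dns>:8888 from the local machine;
# the self-signed certificate triggers a one-time browser warning to accept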
Some troubles might be encountered during the configuration:
Trouble with Jupyter certificates: Permission Denied
Change the ownership of the certs to the current user through:
sudo chown -R user:user ~/.local/share/jupyter
Generate a larger key through:
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mycert.pem -out mycert.pem
Enforce https:// rather than Chrome's default http://
How to install xfs and create xfs file system on Debian/Ubuntu Linux
sudo apt install xfsprogs
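After installing xfsprogs, formatting and mounting a freshly attached EBS volume might look like the following sketch (the device name /dev/xvdf and mount point /data are assumptions; check lsblk for the actual device):
sudo mkfs.xfs /dev/xvdf      # create the XFS file system on the empty volume
sudo mkdir -p /data          # create a mount point
sudo mount /dev/xvdf /data   # mount the volume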
Resources
spark.executor.extraClassPath /jars/aws-java-sdk-bundle-1.11.375.jar:/lib/hadoop-aws-3.2.0.jar
spark.driver.extraClassPath /jars/aws-java-sdk-bundle-1.11.375.jar:/lib/hadoop-aws-3.2.0.jar
spark.driver.memory 8g
Some suggest 5g of driver memory for maximum throughput, but my tests show 8g works best for this instance. Since I used only local mode, where both the driver program and the executor run in the same JVM instance, I cannot set the number of executors or the number of cores per executor. The current driver program, together with its executor, uses all 8 cores for its threads.
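For reference, a minimal sketch of launching a job under these settings (analysis_job.py is a hypothetical script name, not from this post):
spark-submit --master "local[8]" --driver-memory 8g analysis_job.py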
I may try running my programs on a cluster of instances, but for this big-data processing a single machine seems workable. I am migrating my old deep learning project onto the instance and hope the TensorFlow program will work there. However, this plan may be delayed, as my job hunt is still ongoing.
Configuration resources:
How to set Apache Spark Executor memory?
Spark executors and shuffle in local mode
Spark standalone configuration having multiple executors
Number of Executors in Spark Local Mode
How to change number of executors in local mode?
Not able to setup spark.driver.cores
spark.driver.cores setting in spark standalone cluster mode
Configuring memory and CPU options
Apache Spark Memory Management
Temp files cleaning for Spark
Apache Spark in Practice (Part 1): cleaning temporary files and changing the log level under Standalone deployment mode
Copyright @ 2021 Zizhun Guo. All Rights Reserved.