Tuesday, 6 September 2016


Cloud File Systems.


The industry classifies cloud storage by how data is persisted and accessed. Although there are hybrid systems, one division you will come across is the following.

Ephemeral Storage. 


This storage is tied to the life cycle of the VM: if your VM is terminated, the storage disappears. Nova in OpenStack and Amazon EC2 provide ephemeral storage automatically. It is usually backed by SSD disks.

Persistent Storage.


Persistent storage means that the storage is always available, no matter the state of the instance or VM. It has three subcategories: object storage, block storage and shared file systems.
We will see some implementations below, but the OpenStack documentation is a good reference, with a useful table to help decide between them: http://docs.openstack.org/ops-guide/arch-storage.html
Table. OpenStack storage

Used to…
  Ephemeral storage: Run operating system and scratch space
  Block storage: Add additional persistent storage to a virtual machine (VM)
  Object storage: Store data, including VM images
  Shared File System storage: Add additional persistent storage to a virtual machine

Accessed through…
  Ephemeral storage: A file system
  Block storage: A block device that can be partitioned, formatted, and mounted (such as, /dev/vdc)
  Object storage: The REST API
  Shared File System storage: A Shared File Systems service share (either manila managed or an external one registered in manila) that can be partitioned, formatted and mounted (such as /dev/vdc)

Accessible from…
  Ephemeral storage: Within a VM
  Block storage: Within a VM
  Object storage: Anywhere
  Shared File System storage: Within a VM

Managed by…
  Ephemeral storage: OpenStack Compute (nova)
  Block storage: OpenStack Block Storage (cinder)
  Object storage: OpenStack Object Storage (swift)
  Shared File System storage: OpenStack Shared File System Storage (manila)

Persists until…
  Ephemeral storage: VM is terminated
  Block storage: Deleted by user
  Object storage: Deleted by user
  Shared File System storage: Deleted by user

Sizing determined by…
  Ephemeral storage: Administrator configuration of size settings, known as flavors
  Block storage: User specification in initial request
  Object storage: Amount of available physical storage
  Shared File System storage:
    • User specification in initial request
    • Requests for extension
    • Available user-level quotas
    • Limitations applied by Administrator

Encryption set by…
  Ephemeral storage: Parameter in nova.conf
  Block storage: Admin establishing encrypted volume type, then user selecting encrypted volume
  Object storage: Not yet available
  Shared File System storage: Shared File Systems service does not apply any additional encryption above what the share’s back-end storage provides

Example of typical usage…
  Ephemeral storage: 10 GB first disk, 30 GB second disk
  Block storage: 1 TB disk
  Object storage: 10s of TBs of dataset storage
  Shared File System storage: Depends completely on the size of back-end storage specified when a share was being created. In case of thin provisioning it can be partial space reservation (for more details see Capabilities and Extra-Specs specification)

              

1.       Ceph. A state-of-the-art distributed storage system that Linux clients can access directly. It gives all the parallelism and data redundancy expected in a cloud or distributed system: reliability, no single point of failure, and scalability. Data placement is based on the CRUSH algorithm. Ceph seems to be better suited to general-purpose use than HDFS. As in other distributed systems, the first step in the design is to decouple the data completely from its metadata. Ceph tries to remove any need for clients to understand the internals of the system, and relies on something like the FUSE (libfuse) libraries so that files can be accessed in a POSIX manner, with ordinary shell commands.

2.       HDFS. Hadoop Distributed File System. As the name suggests, it is the file system of Hadoop, the MapReduce implementation. HBase uses it too. Written in Java, it relies on the POSIX file system of the underlying host and is rack-aware, by DNS name or IP address. It follows a master/slave architecture: a NameNode (the master) performs the file system operations through an RPC interface, and the slaves are called DataNodes.

3.       SWIFT. This is an object store; it belongs to the OpenStack IaaS. Swift stores blobs and serves them over the web. The object store can hold any data, but a typical use case is storing images or videos. The links to those objects can be kept in a traditional database, while the objects themselves are fetched over the web. The API is a set of RESTful services: a PUT or a GET on a URL that carries the path to the object. Swift is highly available: requests go through a load balancer plus proxies that translate each request into the actual path to the object and its node. The objects are replicated so that requests can be served in parallel.

4.       CINDER. This storage falls under the type called block storage. A Cinder volume is attached directly to the VM. You can think of it as something like a USB drive attached to your laptop. However, the volume is raw, so you need to install a file system on it. Like AWS EBS, it therefore requires operational expertise.


5.       HIVE. A database built on top of MapReduce that gives eventual consistency. It has HQL, which is a kind of SQL, and translates queries into MapReduce jobs. Hive uses a traditional SQL database to store its metadata; it can be MySQL, Oracle or others. Note that updates and deletes are not supported.

6.       Amazon S3. It provides a service very similar to Swift, so you get web access through REST. It also offered SOAP, although that is now being phased out. S3 also provides CLI commands that work in a shell-like way. Files can be up to 5 TB each. They are stored in a container called a ‘bucket’, and users choose a key that maps to the object in order to fetch it. A bucket is organized flatly, with no hierarchy at all, and you can store as many files as you want, each up to 5 TB. Like Swift, it has a weak consistency model called eventual consistency. This kind of consistency is one answer to the trade-off described by the CAP theorem: if all writes stopped, every read would eventually return the expected value, but while writes are concurrent, reads are not guaranteed to be consistent. So you should not build a system that needs ACID guarantees, such as a bank account database, on top of it.

7.       Amazon AWS EBS. Elastic Block Store. It is similar to Cinder in OpenStack. Amazon provides a list of volume types (gp2, io1, st1, sc1) that differ in whether they are backed by SSD or magnetic disks and in their IOPS capacity.

8.       Amazon AWS Glacier. Archive service. As its name suggests, it is meant for long-term storage of archived files. Archives are stored in ‘vaults’, and if you want to download a file you will need to wait 3 to 5 hours for it to become ready.

9.       Amazon EFS. Elastic File System. It falls under the shared file system category. It is SSD-backed, so it is fast. Amazon EBS and Cinder require operational work to prepare the disks and install a file system. EFS, by contrast, provides fully NFSv4-compliant network access and is elastic, growing as needed.

10.   Dropbox. Built on Amazon S3, with Amazon EC2 for the business logic. It has two levels of API: Drop-ins, to embed into a web UI, and the Core API. It supports OAuth v1 and v2.
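Going back to the Ceph entry (item 1): the point of computing placement instead of looking it up is that any client can work out where an object lives from its name alone. The sketch below is not CRUSH. It uses plain rendezvous hashing as a much simpler stand-in to show the idea, and the node names and the place function are made up for illustration.

```python
import hashlib

def place(object_name, nodes, replicas=3):
    """Pick `replicas` distinct nodes for an object.

    Simplified stand-in for CRUSH (rendezvous hashing): every node gets a
    score derived from the node name and the object name, and the highest
    scores win. No central lookup table is needed.
    """
    def score(node):
        digest = hashlib.sha256(f"{node}:{object_name}".encode()).hexdigest()
        return int(digest, 16)

    return sorted(nodes, key=score, reverse=True)[:replicas]

# Hypothetical storage node names.
nodes = ["osd-a", "osd-b", "osd-c", "osd-d", "osd-e"]

first = place("photo.jpg", nodes)
second = place("photo.jpg", nodes)  # same inputs, same answer, on any client

# Removing a node that did not hold the object leaves its placement untouched.
unused = [n for n in nodes if n not in first][0]
after_removal = place("photo.jpg", [n for n in nodes if n != unused])
```

Because the result depends only on the object name and the list of nodes, two clients that share the same node list agree on the placement without talking to each other.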
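For the Swift entry (item 3), the "PUT or GET on a URL with the path to the object" can be sketched as follows. The endpoint, account, container and token values are placeholders, and the helper names are my own. The requests are only built here, never sent, so nothing in this sketch touches a real cluster.

```python
from urllib.parse import quote
from urllib.request import Request

def swift_object_url(endpoint, account, container, obj):
    """Build the URL of an object: <endpoint>/v1/<account>/<container>/<object>."""
    parts = [endpoint.rstrip("/"), "v1", quote(account), quote(container), quote(obj)]
    return "/".join(parts)

def upload_request(url, data, token):
    # PUT stores (or overwrites) the object at that path.
    return Request(url, data=data, method="PUT", headers={"X-Auth-Token": token})

def download_request(url, token):
    # GET fetches the object back.
    return Request(url, method="GET", headers={"X-Auth-Token": token})

# Placeholder values, for illustration only.
url = swift_object_url("https://swift.example.com/", "AUTH_demo", "photos", "holiday/beach 1.jpg")
put = upload_request(url, b"...image bytes...", token="placeholder-token")
get = download_request(url, token="placeholder-token")
```

The URL is what you would store in the traditional database mentioned above; the object itself never goes through that database.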
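For the Hive entry (item 5), this is roughly what "translating a query into a MapReduce job" means. The query in the comment and the visits table are invented for the example, and the Python below only imitates the map, shuffle and reduce phases in memory. It is not how Hive is implemented.

```python
from collections import defaultdict

# Conceptual equivalent of the (hypothetical) HQL query:
#   SELECT city, COUNT(*) FROM visits GROUP BY city;

def map_phase(rows):
    # Emit one (key, 1) pair per row; the GROUP BY column becomes the key.
    for row in rows:
        yield row["city"], 1

def shuffle(pairs):
    # Group all values that share a key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # COUNT(*) is the sum of the ones emitted for each key.
    return {key: sum(values) for key, values in groups.items()}

rows = [{"city": "Barcelona"}, {"city": "Girona"}, {"city": "Barcelona"}]
counts = reduce_phase(shuffle(map_phase(rows)))
```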
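For the S3 entry (item 6), eventual consistency is easier to see with a toy model. The class below is an invented in-memory simulation, not an S3 client: a write lands on one replica at once and reaches the others only when sync is called, so a read in between can return an old value.

```python
class EventuallyConsistentStore:
    """Toy model: replica 0 takes writes, the others catch up on sync()."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # (replica_index, key, value) not yet propagated

    def put(self, key, value):
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def get(self, key, replica_index):
        return self.replicas[replica_index].get(key)

    def sync(self):
        # Apply queued writes in order; once writes stop, replicas converge.
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()

store = EventuallyConsistentStore()
store.put("balance", 100)
store.sync()
store.put("balance", 80)

fresh = store.get("balance", replica_index=0)  # the replica that took the write
stale = store.get("balance", replica_index=2)  # has not seen the second write yet

store.sync()
converged = [store.get("balance", i) for i in range(3)]
```

The stale read is exactly why a bank account is a poor fit: two readers can see different balances at the same moment, even though both replicas end up with the same value.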
