Cloud File Systems.
The industry classifies cloud storage by how data persists and how it is
accessed. Although hybrid systems exist, one common division is:
Ephemeral Storage.
This storage is tied to the life cycle of the VM: when the VM terminates, the
storage disappears. Nova in OpenStack and Amazon EC2 provide ephemeral storage
automatically. It is usually backed by SSDs.
Persistent Storage.
Persistent storage is always available, regardless of the state of the
instance or VM. It has three subcategories: object storage, block storage and
shared file systems.
Table. OpenStack storage
| | Ephemeral storage | Block storage | Object storage | Shared File System storage |
|---|---|---|---|---|
| Used to… | Run operating system and scratch space | Add additional persistent storage to a virtual machine (VM) | Store data, including VM images | Add additional persistent storage to a virtual machine |
| Accessed through… | A file system | A block device that can be partitioned, formatted, and mounted (such as /dev/vdc) | The REST API | A Shared File Systems service share (either manila managed or an external one registered in manila) that can be partitioned, formatted, and mounted (such as /dev/vdc) |
| Accessible from… | Within a VM | Within a VM | Anywhere | Within a VM |
| Managed by… | OpenStack Compute (nova) | OpenStack Block Storage (cinder) | OpenStack Object Storage (swift) | OpenStack Shared File System Storage (manila) |
| Persists until… | VM is terminated | Deleted by user | Deleted by user | Deleted by user |
| Sizing determined by… | Administrator configuration of size settings, known as flavors | User specification in initial request | Amount of available physical storage | User specification in initial request; requests for extension; available user-level quotas; limitations applied by administrator |
| Encryption set by… | Parameter in nova.conf | Admin establishing encrypted volume type, then user selecting encrypted volume | Not yet available | Shared File Systems service does not apply any additional encryption above what the share’s back-end storage provides |
| Example of typical usage… | 10 GB first disk, 30 GB second disk | 1 TB disk | 10s of TBs of dataset storage | Depends completely on the size of back-end storage specified when the share was created. With thin provisioning it can be a partial space reservation (for more details see the Capabilities and Extra-Specs specification) |
1. Ceph. A state-of-the-art distributed storage system for Linux that also
offers web (REST) access. It provides the parallelism and data redundancy
expected of cloud and distributed systems: reliability, no single point of
failure and scalability. Data placement is based on CRUSH (Controlled
Replication Under Scalable Hashing). Ceph seems better suited to general-purpose
workloads than HDFS. As in other distributed systems, the first step is to
decouple the data completely from its metadata. Ceph tries to spare clients
any need to understand the system's internals: files can be accessed in a
POSIX manner from the shell, for example through a FUSE (libfuse) client.
2. HDFS. Hadoop Distributed File System.
As the name suggests, it is the file system of Hadoop, the MapReduce
implementation. HBase uses it too. It is written in Java, stores its blocks on
the underlying POSIX file system of each node, and is rack-aware, mapping nodes
to racks by DNS name or IP address. It follows a master/slave architecture: the
NameNode (master) performs file system operations exposed through an RPC
interface, and the slaves, called DataNodes, store the data blocks.
3. Swift. This is an object store and part of the OpenStack IaaS. Swift stores
blobs and makes them available over the web. It can store any data, but a
typical use case is storing images or videos: links to those objects can be
kept in a traditional database, and the objects themselves are fetched over
the web. The API is RESTful: clients issue PUT or GET requests against a URL
that contains the path to the object. Swift is highly available: requests pass
through a load balancer to proxy servers, which translate each request into the
actual path of the object and the node that holds it. Objects are replicated,
so requests can be served in parallel.
4. Cinder. This storage falls under the block storage category. A Cinder
volume is attached directly to the VM; you can think of it as a USB drive
plugged into your laptop. However, the volume is raw, so you must create a file
system on it before use. Like AWS EBS, it therefore requires some operational
expertise.
5. Hive. A database built on the Hadoop MapReduce implementation that offers
eventual consistency. It provides HQL, an SQL-like language, and translates
queries into MapReduce jobs. Hive stores its metadata (the metastore) in a
traditional SQL database, such as MySQL or Oracle. Note that UPDATE and DELETE
are not supported in older releases; later versions added them for
transactional tables.
6. Amazon S3. It provides a service very similar to Swift: you access it over
the web through a REST API. It also supported SOAP, although that interface is
being phased out. S3 also offers shell-style CLI commands (for example,
aws s3 ls and aws s3 cp). Objects are stored in containers called ‘buckets’,
and each object can be up to 5 TB. Users choose a key that maps to the object
and use it to fetch the object. The bucket namespace is flat, with no real
directory hierarchy, and you can store as many objects as you want. Like Swift,
S3 used a weak consistency model called eventual consistency (since December
2020, AWS guarantees strong read-after-write consistency for S3). Eventual
consistency is a common answer to the trade-off described by the CAP theorem:
if all writes stop, every read eventually returns the expected (last written)
value, but concurrent writes do not guarantee consistent reads. So you should
not build an ACID system, such as a bank account database, on top of it.
7. Amazon AWS EBS. Elastic Block Store. It is similar to Cinder in OpenStack.
Amazon offers several volume types, gp2, io1, st1 and sc1, which differ in
their backing disks (gp2 and io1 are SSD-based, st1 and sc1 use magnetic HDDs)
and in their IOPS capacity.
8. Amazon AWS Glacier. Archive service.
As its name suggests, it is meant for long-term storage of archived files.
Archives are stored in ‘vaults’, and if you want to download a file you will
need to wait 3 to 5 hours for it to be ready.
9. Amazon EFS. Elastic File System. It falls under the shared file system
category and is SSD-backed, so it is fast. Whereas Amazon EBS or Cinder require
operational tasks to prepare the disks and create a file system, EFS provides
fully NFSv4-compliant network access and is elastic, growing as needed.
10. Dropbox. Built on Amazon S3, with Amazon EC2 running its business logic.
It offers two levels of API: Drop-ins, to embed into a web UI, and the Core
API. It supports OAuth v1 and v2.