Exploring Public Storage Traces

What are they, where are they, and are they right for you?

by Raluca Diaconu | Towards Data Science | Jan 2024

Photo by Hongwei FAN on Unsplash

Input and output (I/O) operations refer to the transfer of data between a computer's main memory and various peripherals. Storage peripherals such as HDDs and SSDs have particular performance characteristics in terms of latency, throughput, and cost which can influence the performance of the computer system they power. Extrapolating, the performance and design of distributed and cloud-based data storage depends on that of the medium. This article is intended to be a bridge between Data Science and Storage Systems: 1/ I am sharing a few datasets of various sources and sizes which I hope will be novel for Data Scientists and 2/ I am pointing to the potential for advanced analytics in Distributed Systems.

Storage access traces are "a treasure trove of information for optimizing cloud workloads." They are essential for capacity planning, data placement, and system design and evaluation suited to modern applications. Diverse and up-to-date datasets are particularly needed in academic research to study novel and unintuitive access patterns, support the design of new hardware architectures, new caching algorithms, or hardware simulations.

Storage traces are notoriously difficult to find. The SNIA website is the best-known "repository for storage-related I/O trace files, associated tools, and other related information" but many traces do not comply with its licensing or upload format. Finding traces becomes a tedious process of scanning the academic literature or attempting to generate one's own.

Popular traces, which are easier to find, tend to be old and overused. Traces older than 10 years should not be used in modern research and development due to changes in application workloads and hardware capabilities. Also, overuse of specific traces can bias the understanding of real workloads, so it is recommended to use traces from multiple independent sources when possible.

This post is an organized collection of recent public traces that I found and used. In the first part I categorize them by the level of abstraction they represent in the IO stack. In the second part I list and discuss some relevant datasets. The last part is a summary of them all, with a personal view on the gaps in storage tracing datasets.

I distinguish between three types of traces based on data representation and access model. Let me explain. A user, at the application layer, sees data stored in files or objects which are accessed by a wide variety of abstract operations such as open or append. Closer to the media, the data is stored in a continuous memory address space and accessed as blocks of fixed size which may only be read or written. At a higher abstraction level, within the application layer, we may also have a data presentation layer which may log access to data presentation units; these may be, for example, rows composing tables and databases, or articles and paragraphs composing news feeds. The access may be create table, or post article.

While traces can be taken anywhere in the IO stack and contain information from multiple layers, I am choosing to structure the following classification based on the Linux IO stack depicted below.

Linux I/O Stack Diagram (adapted from [1], [2] and [3])

Block storage traces

The data in these traces is representative of the operations at the block layer. In Linux, this data is typically collected with blktrace (and rendered readable with blkparse), iostat, or dtrace. The traces contain information about the operation, the device, the CPU, the process, and the storage location accessed. The first trace listed below is an example of blktrace output.

The raw data generated by tracing programs may be too detailed for analysis and publication purposes, so it is often simplified. Typical public traces contain operation, offset, size, and sometimes timing. At this layer the operations are only read and write. Each operation accesses the address starting at offset and is applied to a continuous extent of memory specified as a number of blocks (4KiB in NTFS). For example, a trace entry for a read operation contains the address where the read starts (offset) and the number of blocks read (size). The timing information may contain the time the request was issued (start time), the time it was completed (end time), the processing in between (latency), and the time the request waited (queuing time).

Available traces sport different features, have wildly different sizes, and are the output of a variety of workloads. Selecting the right one depends on what one is looking for. For example, trace replay only needs the order of operations and their size; for performance analysis, timing information is required.

Disk access visualization with iowatcher (source)

Object storage traces

At the application layer, data lives in files and objects which can be created, opened, appended, or closed, and then discovered via a tree structure. From a user's point of view, the storage media is decoupled, hiding fragmentation and allowing random byte access.

I will group file and object traces together despite a subtle difference between the two. Files follow the file system's naming convention, which is structured (typically hierarchical). Often the extension suggests the content type and usage of the file. Objects, on the other hand, are used in large-scale storage systems dealing with massive amounts of diverse data. In object storage systems the structure is not intrinsic; instead, it is defined externally, by the user, with specific metadata files managed by their workload.

Being generated within the application space, often as the result of an application logging mechanism, object traces are more diverse in terms of format and content. The information recorded may be more specific: operations, for example, can also be delete, copy, or append. Objects typically have variable size, and even the same object's size may vary over time after appends and overwrites. The object identifier can be a string of variable size. It may encode extra information, for example, an extension that tells the content type. Other meta-information may come from the range accessed, which may tell us, for example, whether the header, the footer, or the body of an image, Parquet, or CSV file was accessed.

Object storage traces are better suited to understanding user access patterns. In terms of block access, a video stream and a sequential read of an entire file generate the same pattern: multiple sequential IOs at regular time intervals. But these trace entries should be treated differently if we are to replay them: blocks of a video stream must be accessed with the same time delta between them, regardless of the latency of each individual block, while reading the entire file should be done as fast as possible.
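To make the distinction concrete, below is a minimal replay sketch in Python. The (timestamp, operation) entry layout and the do_io callback are hypothetical stand-ins for a real replay harness, not part of any dataset discussed here.

import time

def replay_timed(entries, do_io):
    # Replay preserving inter-arrival times (e.g., a video stream).
    start = time.monotonic()
    t0 = entries[0][0]
    for ts, op in entries:
        # Wait until this entry's relative timestamp has elapsed.
        delay = (ts - t0) - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)
        do_io(op)

def replay_asap(entries, do_io):
    # Replay as fast as possible (e.g., a bulk sequential read).
    for _ts, op in entries:
        do_io(op)

# Example: two reads half a second apart; print stands in for real IO.
replay_timed([(0.0, "read block 0"), (0.5, "read block 1")], print)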

Access traces

Specific to each application, data may be abstracted further. Data units may be instances of a class, records in a database, or ranges in a file. A single data access may not even generate a file open or a disk IO if caching is involved. I choose to include such traces because they may be used to understand and optimize storage access, and cloud storage in particular. For example, the access traces from Twitter's Memcache are useful in understanding popularity distributions and therefore may inform data formatting and placement decisions. Often they are not storage traces per se, but they can be useful in the context of cache simulation, IO reduction, or data layout (indexing).

The data format in these traces can be even more diverse due to the new layer of abstraction, for example, keyed by tweet identifiers in Memcached.

Let's take a look at a few traces in each of the categories above. The list details some of the newer traces — none older than 10 years — and is by no means exhaustive.

Block traces

YCSB RocksDB SSD 2020

These are SSD traces collected on a 28-core, 128 GB host with two 512 GB NVMe SSD drives, running Ubuntu. The dataset is the result of running the YCSB-0.15.0 benchmark with RocksDB.

The first SSD stores all blktrace output, while the second hosts YCSB and RocksDB. YCSB Workload A consists of 50% reads and 50% updates of 1B operations on 250M records. The runtime is 9.7 hours, which generates over 352M block I/O requests at the file system level, writing a total of 6.8 TB to the disk, with a read throughput of 90 MBps and a write throughput of 196 MBps.

The dataset is small compared to all the others in the list, and limited in terms of workload, but it is a great starting point due to its manageable size. Another benefit is reproducibility: it uses open source tracing tools and benchmarking beds atop a relatively inexpensive hardware setup.

Format: These are SSD traces taken with blktrace and have the typical format after parsing with blkparse: [Device Major Number,Device Minor Number] [CPU Core ID] [Record ID] [Timestamp (in nanoseconds)] [ProcessID] [Trace Action] [OperationType] [SectorNumber + I/O Size] [ProcessName]

259,2    0        1     0.000000000  4020  Q   R 282624 + 8 [java]
259,2    0        2     0.000001581  4020  G   R 282624 + 8 [java]
259,2    0        3     0.000003650  4020  U   N [java] 1
259,2    0        4     0.000003858  4020  I  RS 282624 + 8 [java]
259,2    0        5     0.000005462  4020  D  RS 282624 + 8 [java]
259,2    0        6     0.013163464     0  C  RS 282624 + 8 [0]
259,2    0        7     0.013359202  4020  Q   R 286720 + 128 [java]
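A few lines of Python are enough to turn the sector-addressed records above into structured events. This is a sketch assuming the whitespace-separated blkparse layout shown; records without a "sector + size" pair (such as the unplug line marked U) are skipped.

from collections import namedtuple

BlkEvent = namedtuple(
    "BlkEvent",
    "device cpu seq timestamp pid action op sector nblocks process")

def parse_blkparse_line(line):
    parts = line.split()
    if len(parts) < 11 or parts[8] != "+":
        return None  # not a sector-addressed record
    return BlkEvent(
        device=parts[0], cpu=int(parts[1]), seq=int(parts[2]),
        timestamp=float(parts[3]), pid=int(parts[4]),
        action=parts[5], op=parts[6],
        sector=int(parts[7]), nblocks=int(parts[9]),
        process=parts[10].strip("[]"))

print(parse_blkparse_line("259,2 0 1 0.000000000 4020 Q R 282624 + 8 [java]"))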

Where to find it: http://iotta.snia.org/traces/block-io/28568

License: SNIA Trace Data Files Download License

Alibaba Block Traces 2020

The dataset consists of "block-level I/O requests collected from 1,000 volumes, where each has a raw capacity from 40 GiB to 5 TiB. The workloads span diverse types of cloud applications. Each collected I/O request specifies the volume number, request type, request offset, request size, and timestamp."

Limitations (from the academic paper):

  • the traces do not record the response times of the I/O requests, making them unsuitable for latency analysis.
  • the specific applications running atop are not mentioned, so the traces cannot be used to extract application workloads and their I/O patterns.
  • the traces capture the access to virtual devices, so they are not representative of performance and reliability (e.g., data placement and failure statistics) for physical block storage devices.

A downside of this dataset is its size. Uncompressed, it results in a 751GB file which is difficult to store and manage.

Format: device_id,opcode,offset,length,timestamp

  • device_id: ID of the virtual disk, uint32
  • opcode: Either 'R' or 'W', indicating whether this operation is a read or a write
  • offset: Offset of this operation, in bytes, uint64
  • length: Length of this operation, in bytes, uint32
  • timestamp: Timestamp of this operation received by the server, in microseconds, uint64
419,W,8792731648,16384,1577808144360767
725,R,59110326272,360448,1577808144360813
12,R,350868463616,8192,1577808144360852
725,R,59110686720,466944,1577808144360891
736,R,72323657728,516096,1577808144360996
12,R,348404277248,8192,1577808144361031
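Given the file size, processing in chunks is advisable. Here is a sketch using pandas to tally the read/write mix; the file name is illustrative.

import pandas as pd

cols = ["device_id", "opcode", "offset", "length", "timestamp"]
reads = writes = 0
# Stream the CSV in 10M-row chunks rather than loading 751GB at once.
for chunk in pd.read_csv("io_traces.csv", names=cols, chunksize=10_000_000):
    reads += (chunk["opcode"] == "R").sum()
    writes += (chunk["opcode"] == "W").sum()
print(f"reads: {reads}, writes: {writes}")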

Additionally, there is an extra file mapping each virtual device's device_id to its total capacity.

Where to find it: https://github.com/alibaba/block-traces

License: CC-4.0.

Tencent Block Storage 2018

This dataset consists of "216 I/O traces from a warehouse (also called a failure domain) of a production cloud block storage system (CBS). The traces are I/O requests from 5584 cloud virtual volumes (CVVs) for ten days (from Oct. 1st to Oct. 10th, 2018). The I/O requests from the CVVs are mapped and redirected to a storage cluster consisting of 40 storage nodes (i.e., disks)."

Limitations:

  • Timestamps are in seconds, a granularity too coarse for determining the order of operations. As a consequence many requests appear to be issued at the same time, making the trace unsuitable for queuing analysis.
  • There is no latency information about the duration of each operation, making the trace unsuitable for latency or queuing analytics.
  • There is no extra information about each volume, such as its total size.

Format: Timestamp,Offset,Size,IOType,VolumeID

  • Timestamp is the Unix time the I/O was issued, in seconds.
  • Offset is the starting offset of the I/O in sectors from the start of the logical virtual volume. 1 sector = 512 bytes.
  • Size is the transfer size of the I/O request in sectors.
  • IOType is Read (0) or Write (1).
  • VolumeID is the ID number of a CVV.
1538323200,12910952,128,0,1063
1538323200,6338688,8,1,1627
1538323200,1904106400,384,0,1360
1538323200,342884064,256,0,1360
1538323200,15114104,8,0,3607
1538323200,140441472,32,0,1360
1538323200,15361816,520,1,1371
1538323200,23803384,8,0,2363
1538323200,5331600,4,1,3171
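Since offsets and sizes are in sectors, a replay or analysis tool must convert them to bytes first. A sketch, with an illustrative file name:

SECTOR = 512  # bytes per sector, per the dataset description

reads = writes = bytes_read = bytes_written = 0
with open("tencent_cbs.csv") as f:
    for line in f:
        _ts, _offset, size, io_type, _vol = line.strip().split(",")
        nbytes = int(size) * SECTOR
        if io_type == "0":   # 0 marks a read
            reads += 1
            bytes_read += nbytes
        else:                # 1 marks a write
            writes += 1
            bytes_written += nbytes
print(reads, writes, bytes_read, bytes_written)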

Where to find it: http://iotta.snia.org/traces/parallel/27917

License: SNIA Trace Data Files Download License

K5cloud Traces 2018

This dataset contains traces of virtual cloud storage from the FUJITSU K5 cloud service. The data was gathered over a week, but not continuously, because "a single day's IO access logs often consumed the storage capacity of the capture system." There are 24 billion records from 3088 virtual storage nodes.

The data was captured on the TCP/IP network between servers running on hypervisors and storage systems in a K5 data center in Japan. The data is split into three datasets by virtual storage volume ID. Each virtual storage volume ID is unique within the same dataset, but not across the different datasets.

Limitations:

  • There is no latency information, so the traces cannot be used for performance analysis.
  • The total node size is missing, but it can be approximated from the maximum offset accessed in the traces (see the sketch after the sample records below).
  • Some applications may require a complete dataset, which makes this one unsuitable due to the missing data.

The fields in the IO access log are: ID,Timestamp,Type,Offset,Length

  • ID is the virtual storage volume ID.
  • Timestamp is the time elapsed from the first IO request in all the IO access logs, in seconds but with microsecond granularity.
  • Type is R (Read) or W (Write).
  • Offset is the starting offset of the IO access, in bytes from the start of the virtual storage volume.
  • Length is the transfer size of the IO request in bytes.
1157,3.828359000,W,7155568640,4096
1157,3.833921000,W,7132311552,8192
1157,3.841602000,W,15264690176,28672
1157,3.842341000,W,28121042944,4096
1157,3.857702000,W,15264718848,4096
1157,9.752752000,W,7155568640,4096
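As noted in the limitations, a volume's size can be approximated by the furthest byte it is seen to touch. A sketch, with an illustrative file name:

from collections import defaultdict

max_extent = defaultdict(int)
with open("k5cloud.csv") as f:
    for line in f:
        vol_id, _ts, _op, offset, length = line.strip().split(",")
        # The furthest byte accessed is a lower bound on capacity.
        max_extent[vol_id] = max(max_extent[vol_id], int(offset) + int(length))

for vol_id, size in sorted(max_extent.items()):
    print(f"volume {vol_id}: at least {size / 2**30:.2f} GiB")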

Where to find it: http://iotta.snia.org/traces/parallel/27917

License: CC-4.0.

Object traces

Server-side I/O request arrival traces 2019

This repository contains two datasets of IO block traces with additional file identifiers: 1/ from parallel file systems (PFS) and 2/ from I/O nodes.

Notes:

  • The access patterns result from the MPI-IO test benchmark run atop Grid5000, a large-scale test bed for parallel and High Performance Computing (HPC). These traces are not representative of general user or cloud workloads, but rather specific to HPC and parallel computing.
  • The setup for the PFS scenario uses OrangeFS as the file system; the IO nodes setup uses the I/O Forwarding Scalability Layer (IOFSL). In both cases the scheduler was set to the AGIOS I/O scheduling library. This setup is perhaps too specific for most use cases targeted by this article and was designed to reflect some proposed solutions.
  • The hardware setup for PFS consists of four server nodes with 600 GB HDDs each and 64 client nodes. For IO nodes, it has four server nodes with a similar disk configuration in a cluster, and 32 clients in a different cluster.

Format: The format is slightly different for the two datasets, an artifact of the different file systems. For IO nodes, it consists of multiple files, each with tab-separated values: Timestamp FileHandle RequestType Offset Size. A peculiarity is that reads and writes are in separate files, named accordingly.

  • Timestamp is a number representing the internal timestamp in nanoseconds.
  • FileHandle is the file handle, in hexadecimal, of size 64.
  • RequestType is the type of the request, inverted: "W" for reads and "R" for writes.
  • Offset is a number giving the request offset in bytes.
  • Size is the size of the request in bytes.
265277355663  00000000fbffffffffffff0f729db77200000000000000000000000000000000  W  2952790016  32768
265277587575  00000000fbffffffffffff0f729db77200000000000000000000000000000000  W  1946157056  32768
265277671107  00000000fbffffffffffff0f729db77200000000000000000000000000000000  W  973078528   32768
265277913090  00000000fbffffffffffff0f729db77200000000000000000000000000000000  W  4026531840  32768
265277985008  00000000fbffffffffffff0f729db77200000000000000000000000000000000  W  805306368   32768

The PFS scenario has two concurrent applications, "app1" and "app2", and its traces are in folders named accordingly. Each row entry has the following format: [<Timestamp>] REQ SCHED SCHEDULING, handle: <FileHandle>, queue_element: <QueueElement>, type: <RequestType>, offset: <Offset>, len: <Size>. Different from the above are:

  • RequestType is 0 for reads and 1 for writes.
  • QueueElement is not used; I believe it is an artifact of the tracing tool.
[D 01:11:03.153625] REQ SCHED SCHEDULING, handle: 5764607523034233445, queue_element: 0x12986c0, type: 1, offset: 369098752, len: 1048576
[D 01:11:03.153638] REQ SCHED SCHEDULING, handle: 5764607523034233445, queue_element: 0x1298e30, type: 1, offset: 268435456, len: 1048576
[D 01:11:03.153651] REQ SCHED SCHEDULING, handle: 5764607523034233445, queue_element: 0x1188b80, type: 1, offset: 0, len: 1048576
[D 01:11:03.153664] REQ SCHED SCHEDULING, handle: 5764607523034233445, queue_element: 0xf26340, type: 1, offset: 603979776, len: 1048576
[D 01:11:03.153676] REQ SCHED SCHEDULING, handle: 5764607523034233445, queue_element: 0x102d6e0, type: 1, offset: 637534208, len: 1048576
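These lines are regular enough to parse with a single regular expression. A sketch matching the format above:

import re

PFS_RE = re.compile(
    r"\[D (?P<time>[\d:.]+)\] REQ SCHED SCHEDULING, "
    r"handle: (?P<handle>\d+), queue_element: (?P<qe>0x[0-9a-f]+), "
    r"type: (?P<type>[01]), offset: (?P<offset>\d+), len: (?P<len>\d+)")

line = ("[D 01:11:03.153625] REQ SCHED SCHEDULING, handle: 5764607523034233445, "
        "queue_element: 0x12986c0, type: 1, offset: 369098752, len: 1048576")
m = PFS_RE.match(line)
if m:
    op = "write" if m["type"] == "1" else "read"  # 1 marks a write here
    print(op, int(m["offset"]), int(m["len"]))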

Where to find it: https://zenodo.org/record/3340631#.XUNa-uhKg2x

License: CC-4.0.

IBM Cloud Object Store 2019

These are anonymized traces from the IBM Cloud Object Storage service, collected with the primary goal of studying data flows to the object store.

The dataset consists of 98 traces containing around 1.6 billion requests for 342 million unique objects. The traces themselves are about 88 GB in size. Each trace contains the REST operations issued against a single bucket in IBM Cloud Object Storage during a single week in 2019, with between 22,000 and 187,000,000 object requests each. All the traces were collected during the same week in 2019 and contain all the data access requests issued over that week by a single tenant of the service. Object names are anonymized.

Some characteristics of the workload have been published in this paper, although the dataset used there was larger:

  • The authors were "able to identify some of the workloads as SQL queries, Deep Learning workloads, Natural Language Processing (NLP), Apache Spark data analytics, and document and media servers. But many of the workloads' types remain unknown."
  • "A vast majority of the objects (85%) in the traces are smaller than a megabyte, yet these objects only account for 3% of the stored capacity." This made the data suitable for a cache analysis.

Format: <time stamp of request> <request type> <object ID> <optional: size of object> <optional: beginning offset> <optional: ending offset>. The timestamp is the number of milliseconds from the point where collection of the traces began.

1219008 REST.PUT.OBJECT 8d4fcda3d675bac9 1056
1221974 REST.HEAD.OBJECT 39d177fb735ac5df 528
1232437 REST.HEAD.OBJECT 3b8255e0609a700d 1456
1232488 REST.GET.OBJECT 95d363d3fbdc0b03 1168 0 1167
1234545 REST.GET.OBJECT bfc07f9981aa6a5a 528 0 527
1256364 REST.HEAD.OBJECT c27efddbeef2b638 12752
1256491 REST.HEAD.OBJECT 13943e909692962f 9760
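The per-request size field makes it easy to probe the small-object observation quoted above. This sketch is only indicative, since the paper counts unique objects and stored capacity rather than requests; the file name is illustrative.

small = total = small_bytes = total_bytes = 0
with open("ibm_cos_trace_000") as f:
    for line in f:
        parts = line.split()
        if len(parts) < 4:
            continue  # the size field is optional
        size = int(parts[3])
        total += 1
        total_bytes += size
        if size < 2**20:  # smaller than one megabyte
            small += 1
            small_bytes += size
print(f"{small / total:.0%} of sized requests are under 1 MB, "
      f"covering {small_bytes / total_bytes:.0%} of the bytes moved")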

Where to find it: http://iotta.snia.org/traces/key-value/36305

License: SNIA Trace Data Files Download License

Access traces

Wiki Analytics Datasets 2019

The wiki dataset contains data for 1/ upload (image) web requests of Wikimedia and 2/ text (HTML pageview) web requests from one CDN cache server of Wikipedia. The most recent dataset, from 2019, contains 21 upload data files and 21 text data files.

Format: Each upload data file, denoted cache-u, contains exactly 24 hours of consecutive data. These files are each approximately 1.5GB in size and hold roughly 4GB of decompressed data each.

This dataset is the result of a single type of workload, which may limit its applicability, but it is large and complete, which makes it a good testbed.

Each decompressed upload data file has the following format: relative_unix hashed_path_query image_type response_size time_firstbyte

  • relative_unix: Seconds since the start timestamp of the dataset, int
  • hashed_path_query: Salted hash of the path and query of the request, bigint
  • image_type: Image type from the Content-Type header of the response, string
  • response_size: Response size in bytes, int
  • time_firstbyte: Seconds to first byte, double
0 833946053 jpeg 9665 1.85E-4
0 -1679404160 png 17635 2.09E-4
0 -374822678 png 3333 2.18E-4
0 -1125242883 jpeg 4733 1.57E-4
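With relative_unix in seconds, per-second request rates fall out of a simple tally. A sketch, with an illustrative file name:

from collections import Counter

per_second = Counter()
with open("cache-u-01") as f:
    for line in f:
        # The first field is seconds since the start of the dataset.
        per_second[int(line.split()[0])] += 1
print(f"peak: {max(per_second.values())} requests/s "
      f"across {len(per_second)} distinct seconds")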

Each text data file, denoted cache-t, contains exactly 24 hours of consecutive data. These files are each approximately 100MB in size and hold roughly 300MB of decompressed data each.

Each decompressed text data file has the following format: relative_unix hashed_host_path_query response_size time_firstbyte

4619 540675535 57724 1.92E-4
4619 1389231206 31730 2.29E-4
4619 -176296145 20286 1.85E-4
4619 74293765 14154 2.92E-4

Where to find it: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Caching

License: CC-4.0.

Memcached 2020

This dataset contains one-week-long traces from Twitter's in-memory caching (Twemcache / Pelikan) clusters. The data comes from the 54 largest clusters in March 2020: Anonymized Cache Request Traces from Twitter Production.

Format: Each trace file is a CSV with the format: timestamp,anonymized key,key size,value size,client id,operation,TTL

  • timestamp: the time when the cache receives the request, in seconds
  • anonymized key: the original key, anonymized, with namespaces preserved; for example, if the anonymized key is nz:u:eeW511W3dcH3de3d15ec, the first two fields nz and u are namespaces. Note that the namespaces are not necessarily delimited by ":"; different workloads use different delimiters and different numbers of namespaces.
  • key size: the size of the key in bytes
  • value size: the size of the value in bytes
  • client id: the anonymized client (frontend service) that sends the request
  • operation: one of get/gets/set/add/replace/cas/append/prepend/delete/incr/decr
  • TTL: the time-to-live (TTL) of the object set by the client; it is 0 when the request is not a write request.
0,q:q:1:8WTfjZU14ee,17,213,4,get,0
0,yDqF:3q:1AJrrJ1nnCJKKrnGx1A,27,27,5,get,0
0,q:q:1:8WTw2gCuJe8,17,720,6,get,0
0,yDqF:vS:1AJr9JnArxCJGxn919K,27,27,7,get,0
0,yDqF:vS:1AJrrKG1CAnr1C19KxC,27,27,8,get,0
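A short sketch of the operation mix and a crude namespace tally; the file name is illustrative, and ":" is only one of several possible delimiters, per the notes above.

import csv
from collections import Counter

ops = Counter()
namespaces = Counter()
with open("cluster01.csv", newline="") as f:
    for _ts, key, _ksize, _vsize, _client, op, _ttl in csv.reader(f):
        ops[op] += 1
        namespaces[key.split(":", 1)[0]] += 1  # first namespace field
print(ops.most_common(5))
print(namespaces.most_common(5))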

License: CC-4.0.

If you're still here and haven't gone diving into one of the traces linked above, it may be because you haven't found what you're looking for. There are a few gaps that current storage traces have yet to fill:

  • Multi-tenant cloud storage: Large cloud storage providers hold some of the richest datasets out there. Their workloads reflect large-scale system architectures and are the result of a diverse set of applications. Storage providers are also extra cautious about sharing this data: there is very little financial incentive to make it public, and a fear of unintended customer data leaks.
  • Full stack: Each layer in the stack offers a different view on access patterns, none alone being enough to understand cause-and-effect relationships in storage systems. Optimizing a system to suit modern workloads requires a holistic view of data access, and such views are not publicly available.
  • Distributed tracing: Most data is nowadays accessed remotely and managed in large-scale distributed systems. Many components and layers (such as indexes or caches) alter the access patterns. In such an environment, end-to-end means tracing a request across the multiple components of a complex architecture. This data can be truly valuable for designing large-scale systems but, at the same time, may be too specific to the system inspected which, again, limits the incentive to publish it.
  • Data quality: The traces above have limitations in the level of detail they represent. As we have seen, some have missing data, some have coarse-grained timestamps, and others are inconveniently large to use. Cleaning data is a tedious process, which limits dataset publishing nowadays.


