Copying data doesn't go through short-circuit when using the `load` command.

Alluxio Version: v2.0.0

To Reproduce

I copied the same data on the same node (a virtual machine) twice. I expected the second copy to go through short-circuit, but the data was still copied from remote nodes in Alluxio.

  1. Deploy the master, workers, and FUSE in Kubernetes.

k8s.zip

  2. Go to the master and run distributedLoad on the whole directory in Alluxio:
time /opt/alluxio/bin/alluxio fs  distributedLoad --replication 1 /training-data/images
...../training-data/images/train-00304-of-01024 loaded
..../training-data/images/train-00475-of-01024 loaded
/training-data/images loaded

real    151m11.859s
user    0m44.141s
sys     0m15.720s
  3. After that, check the persistence status. All of the files are persisted.
bash-4.4# /opt/alluxio/bin/alluxio fs ls /training-data/images| wc -l
1152
bash-4.4# /opt/alluxio/bin/alluxio fs ls /training-data/images| grep  PERSIST|wc -l
1152
bash-4.4# /opt/alluxio/bin/alluxio fs ls /training-data/images| grep -v PERSIST|wc -l
0
  4. I've checked that the data is fully loaded into Alluxio and that it is spread across different nodes.
bash-4.4# /opt/alluxio/bin/alluxio fsadmin report ufs
Alluxio under storage system information:
oss://imagenet-huabei5/images      on  /training-data/images  (oss, capacity=-1B, used=-1B, read-only, not shared, properties={fs.oss.accessKeySecret=******, fs.oss.accessKeyId=******, fs.oss.endpoint=oss-cn-internal.aliyuncs.com})
/opt/alluxio-2.0.0/underFSStorage  on  /                      (local, capacity=4843.27GB, used=-1B(0%), not read-only, not shared, properties={})
bash-4.4# /opt/alluxio/bin/alluxio fsadmin report capacity
Capacity information for all workers:
    Total Capacity: 9.38TB
        Tier: MEM  Size: 1600.00GB
        Tier: SSD  Size: 7.81TB
    Used Capacity: 143.67GB
        Tier: MEM  Size: 143.67GB
        Tier: SSD  Size: 0B
    Used Percentage: 1%
    Free Percentage: 99%

Worker Name                  Last Heartbeat   Storage       Total            MEM           SSD
192.168.0.117                0                capacity      600.00GB         100.00GB      500.00GB
                                              used          32.07GB (5%)     0B            0B
192.168.0.118                0                capacity      600.00GB         100.00GB      500.00GB
                                              used          39.50GB (6%)     0B            0B
192.168.0.119                0                capacity      600.00GB         100.00GB      500.00GB
                                              used          37.40GB (6%)     0B            0B
192.168.0.120                0                capacity      600.00GB         100.00GB      500.00GB
                                              used          34.70GB (5%)     0B            0B
  5. I went to the node (192.168.0.120) and copied the data for the first time. I think the data should also be downloaded to the local node.
time cp -r /alluxio-fuse/training-data/images /test

real    10m23.516s
user    0m1.190s
sys     2m40.581s
  6. Checking the metrics, I can see that most of the data came from remote nodes in Alluxio. This is as expected.
bash-4.4# ./bin/alluxio fsadmin report metrics
Total IO:
    Short-circuit Read                                         0B
    Short-circuit Read (Domain Socket)                  1232.86GB
    From Remote Instances                                220.87GB
    Under Filesystem Read                                 29.00MB
    Alluxio Write                                              0B
    Alluxio Write (Domain Socket)                        718.40GB
    Under Filesystem Write                                     0B

Total IO Throughput (Last Minute):
    Short-circuit Read                                         0B
    Short-circuit Read (Domain Socket)                    19.93MB
    From Remote Instances                                 69.23MB
    Under Filesystem Read                                      0B
    Alluxio Write                                              0B
    Alluxio Write (Domain Socket)                              0B
    Under Filesystem Write                                     0B

Cache Hit Rate (Percentage):
    Alluxio Local                                            0.00
    Alluxio Remote                                          99.99
    Miss                                                     0.01
  7. I went to the node (192.168.0.120) and copied the data a second time. I think the data should now be read locally.
time cp -r /alluxio-fuse/training-data/images /test

real    10m23.516s
user    0m1.190s
sys     2m40.581s
  8. But the metrics show that the data is still being downloaded from remote nodes in Alluxio.
# ./bin/alluxio fsadmin report metrics
Total IO:
    Short-circuit Read                                         0B
    Short-circuit Read (Domain Socket)                  1266.88GB
    From Remote Instances                                325.68GB
    Under Filesystem Read                                 29.00MB
    Alluxio Write                                              0B
    Alluxio Write (Domain Socket)                        718.40GB
    Under Filesystem Write                                     0B

Total IO Throughput (Last Minute):
    Short-circuit Read                                         0B
    Short-circuit Read (Domain Socket)                    13.62MB
    From Remote Instances                                 37.96MB
    Under Filesystem Read                                      0B
    Alluxio Write                                              0B
    Alluxio Write (Domain Socket)                              0B
    Under Filesystem Write                                     0B

Cache Hit Rate (Percentage):
    Alluxio Local                                            0.00
    Alluxio Remote                                          99.99
    Miss                                                     0.01
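
To see how much of the second copy was served remotely, the "Total IO" counters from the two metrics reports above can be subtracted. The following is a minimal sketch, assuming the counters are cumulative and that nothing else was reading from the cluster between the two snapshots (neither is confirmed in this report). The variable names and the `delta` helper are hypothetical; only the four numbers come from the output above.

```shell
#!/bin/sh
# Totals copied from the two `fsadmin report metrics` outputs above, in GB.
before_remote=220.87    # From Remote Instances, after the first copy
after_remote=325.68     # From Remote Instances, after the second copy
before_local=1232.86    # Short-circuit Read (Domain Socket), after the first copy
after_local=1266.88     # Short-circuit Read (Domain Socket), after the second copy

# delta A B prints B - A with two decimals.
# awk is used because POSIX sh has no floating-point arithmetic.
delta() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f", b - a }'
}

remote_delta=$(delta "$before_remote" "$after_remote")
local_delta=$(delta "$before_local" "$after_local")

# Share of the bytes read between the two snapshots that came from remote workers.
remote_share=$(awk -v r="$remote_delta" -v l="$local_delta" \
    'BEGIN { printf "%.1f", 100 * r / (r + l) }')

echo "remote: ${remote_delta}GB  short-circuit: ${local_delta}GB  remote share: ${remote_share}%"
```

Under those assumptions, roughly three quarters of the bytes read between the two snapshots came from remote workers. For comparison, the capacity report above lists 34.70GB used on 192.168.0.120, which is close to the short-circuit increase; this is only suggestive, since the second snapshot may have been taken while the copy was still running.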

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

cheyang commented, Jul 24, 2019 (1 reaction)

It's not a blocking issue, but it is still critical. For performance-sensitive distributed applications, loading data over the network also puts heavy pressure on the network bandwidth, which should be reserved for the application's own interconnect traffic.

yuzhu commented, Oct 25, 2021 (0 reactions)

@cheyang @zrss has this been addressed in recent versions?
