WIP implementation of remote shuffles #1
Open
yifeih wants to merge 9 commits into operation-remote-shuffles
…into operation-remote-shuffles-yifeih-1
mccheah reviewed Dec 11, 2018
public ExternalShuffleDataIO(
    SparkConf sparkConf,
    int execId) {
Owner
Author
Yeah, I haven't implemented the ESS (external shuffle service) server side yet. I'll see if I can remove it altogether once I figure out that code.
@ifilonenko for SA. I think the next step here will be to have the external shuffle service write its own partition index file so that when it serves the shuffle blocks it can know what segments of the files to return. But this is a good start!
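
The partition index idea in the comment above can be sketched as follows. This is only an illustration under stated assumptions: the class `PartitionIndexSketch`, its method names, and the layout (cumulative `long` offsets, one more entry than there are partitions) are hypothetical and are not taken from this PR's code.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: not the external shuffle service's actual index format.
final class PartitionIndexSketch {

    /**
     * Encodes cumulative offsets for a data file whose partitions are stored
     * back to back. The result holds partitionLengths.length + 1 longs,
     * starting at offset 0.
     */
    static byte[] writeIndex(long[] partitionLengths) {
        ByteBuffer buf = ByteBuffer.allocate((partitionLengths.length + 1) * Long.BYTES);
        long offset = 0L;
        buf.putLong(offset);
        for (long length : partitionLengths) {
            offset += length;
            buf.putLong(offset);
        }
        return buf.array();
    }

    /**
     * Looks up the segment for one partition.
     * Returns a two-element array: {start offset in the data file, length}.
     */
    static long[] segmentFor(byte[] index, int partitionId) {
        ByteBuffer buf = ByteBuffer.wrap(index);
        long start = buf.getLong(partitionId * Long.BYTES);
        long end = buf.getLong((partitionId + 1) * Long.BYTES);
        return new long[] {start, end - start};
    }
}
```

With an index like this, a server asked for partition `p` would read two adjacent offsets and return only the bytes between them, instead of scanning the whole data file.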
Also @felixcheung for SA (see also my PR linked above)
yifeih pushed a commit that referenced this pull request Apr 18, 2019
…te temporary path in local staging directory

## What changes were proposed in this pull request?

The environment of my cluster is as follows:
```
OS: Linux version 2.6.32-220.7.1.el6.x86_64 (mockbuildc6b18n3.bsys.dev.centos.org) (gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC) ) #1 SMP Wed Mar 7 00:52:02 GMT 2012
Hadoop: 2.7.2
Spark: 2.3.0 or 3.0.0 (master branch)
Hive: 1.2.1
```
My Spark runs in yarn-client deploy mode.

If I execute the SQL `insert overwrite local directory '/home/test/call_center/' select * from call_center`, a HiveException appears as follows:

`Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Mkdirs failed to create file:/home/xitong/hive/stagingdir_hive_2019-02-19_17-31-00_678_1816816774691551856-1/-ext-10000/_temporary/0/_temporary/attempt_20190219173233_0002_m_000000_3 (exists=false, cwd=file:/data10/yarn/nm-local-dir/usercache/xitong/appcache/application_1543893582405_6126857/container_e124_1543893582405_6126857_01_000011) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:249)`

Currently, Spark SQL generates a local temporary path in the local staging directory. The scheme of the local temporary path starts with `file`, so the HiveException appears.

This PR changes the local temporary path to an HDFS temporary path, and uses a DistributedFileSystem instance to copy the data from the HDFS temporary path to the local directory.

If Spark runs in local deploy mode, 'insert overwrite local directory' works fine.

## How was this patch tested?

Unit tests cannot cover yarn-client mode. The change was tested in my production environment.

Closes apache#23841 from beliefer/fix-bug-of-insert-overwrite-local-dir.

Authored-by: gengjiaan <gengjiaan@360.cn>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
yifeih pushed a commit that referenced this pull request Apr 18, 2019
## What changes were proposed in this pull request?
This PR supports `OpenJ9` in addition to `IBM JDK` and `OpenJDK` in Spark by handling `System.getProperty("java.vendor") = "Eclipse OpenJ9"`.
In `inferDefaultMemory()` and `getKrb5LoginModuleName()`, this PR takes the non-`IBM` code path for OpenJ9.
```
$ ~/jdk-11.0.2+9_openj9-0.12.1/bin/jshell
| Welcome to JShell -- Version 11.0.2
| For an introduction type: /help intro
jshell> System.out.println(System.getProperty("java.vendor"))
Eclipse OpenJ9
jshell> System.out.println(System.getProperty("java.vm.info"))
JRE 11 Linux amd64-64-Bit Compressed References 20190204_127 (JIT enabled, AOT enabled)
OpenJ9 - 90dd8cb40
OMR - d2f4534b
JCL - 289c70b6844 based on jdk-11.0.2+9
jshell> System.out.println(Class.forName("com.ibm.lang.management.OperatingSystemMXBean").getDeclaredMethod("getTotalPhysicalMemory"))
public abstract long com.ibm.lang.management.OperatingSystemMXBean.getTotalPhysicalMemory()
jshell> System.out.println(Class.forName("com.sun.management.OperatingSystemMXBean").getDeclaredMethod("getTotalPhysicalMemorySize"))
public abstract long com.sun.management.OperatingSystemMXBean.getTotalPhysicalMemorySize()
jshell> System.out.println(Class.forName("com.ibm.security.auth.module.Krb5LoginModule"))
| Exception java.lang.ClassNotFoundException: com.ibm.security.auth.module.Krb5LoginModule
| at Class.forNameImpl (Native Method)
| at Class.forName (Class.java:339)
| at (#1:1)
jshell> System.out.println(Class.forName("com.sun.security.auth.module.Krb5LoginModule"))
class com.sun.security.auth.module.Krb5LoginModule
```
## How was this patch tested?
Existing test suites
Manual testing with OpenJ9.
Closes apache#24308 from kiszk/SPARK-27397.
Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
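
The vendor handling described in the commit message above can be illustrated with a small sketch. The class `JvmVendorSketch` and its method names are hypothetical, and the assumption that an IBM JDK is recognized by `"IBM"` appearing in `java.vendor` is mine, not something stated in the commit message. The class and method names returned as strings are the ones shown in the jshell transcript.

```java
// Hypothetical sketch of vendor-based selection; not Spark's actual code.
final class JvmVendorSketch {

    /** Assumption: an IBM JDK reports a java.vendor value containing "IBM". */
    static boolean usesIbmClasses(String javaVendor) {
        // "Eclipse OpenJ9" does not contain "IBM", so it takes the non-IBM path.
        return javaVendor != null && javaVendor.contains("IBM");
    }

    /** Picks the Kerberos login module class name for the given vendor. */
    static String krb5LoginModuleName(String javaVendor) {
        return usesIbmClasses(javaVendor)
            ? "com.ibm.security.auth.module.Krb5LoginModule"
            : "com.sun.security.auth.module.Krb5LoginModule";
    }

    /** Picks the OperatingSystemMXBean method that reports total physical memory. */
    static String totalMemoryMethodName(String javaVendor) {
        return usesIbmClasses(javaVendor)
            ? "getTotalPhysicalMemory"
            : "getTotalPhysicalMemorySize";
    }
}
```

In practice the argument would come from `System.getProperty("java.vendor")`. Passing the vendor in as a parameter keeps the selection logic testable on any JVM.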
yifeih pushed a commit that referenced this pull request Sep 17, 2019
…comparison assertions

## What changes were proposed in this pull request?

This PR removes a few hardware-dependent assertions which can cause a failure in `aarch64`.

**x86_64**
```
rootdonotdel-openlab-allinone-l00242678:/home/ubuntu# uname -a
Linux donotdel-openlab-allinone-l00242678 4.4.0-154-generic #181-Ubuntu SMP Tue Jun 25 05:29:03 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

scala> import java.lang.Float.floatToRawIntBits
import java.lang.Float.floatToRawIntBits
scala> floatToRawIntBits(0.0f/0.0f)
res0: Int = -4194304
scala> floatToRawIntBits(Float.NaN)
res1: Int = 2143289344
```

**aarch64**
```
[rootarm-huangtianhua spark]# uname -a
Linux arm-huangtianhua 4.14.0-49.el7a.aarch64 #1 SMP Tue Apr 10 17:22:26 UTC 2018 aarch64 aarch64 aarch64 GNU/Linux

scala> import java.lang.Float.floatToRawIntBits
import java.lang.Float.floatToRawIntBits
scala> floatToRawIntBits(0.0f/0.0f)
res1: Int = 2143289344
scala> floatToRawIntBits(Float.NaN)
res2: Int = 2143289344
```

## How was this patch tested?

Pass the Jenkins (This removes the test coverage).

Closes apache#25186 from huangtianhua/special-test-case-for-aarch64.

Authored-by: huangtianhua <huangtianhua@huawei.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
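
The transcripts above show why raw-bit comparisons of NaN values are hardware-dependent: `0.0f/0.0f` produced `-4194304` (`0xffc00000`) on x86_64 but `2143289344` (`0x7fc00000`) on aarch64. The commit itself simply removes the affected assertions. As a separate illustration, not part of that commit, the sketch below shows a comparison that does not depend on the NaN bit pattern, using `Float.floatToIntBits`, which collapses every NaN to the canonical value `0x7fc00000`. The class `NaNBitsSketch` and its method names are hypothetical.

```java
// Hypothetical sketch; an alternative to raw-bit assertions, not the commit's change.
final class NaNBitsSketch {

    /** Builds a float from an explicit bit pattern, e.g. 0xffc00000 for a NaN with the sign bit set. */
    static float fromBits(int bits) {
        return Float.intBitsToFloat(bits);
    }

    /**
     * Compares two floats by canonical bits: all NaN values compare equal,
     * while 0.0f and -0.0f remain distinct.
     */
    static boolean sameCanonicalBits(float a, float b) {
        return Float.floatToIntBits(a) == Float.floatToIntBits(b);
    }
}
```

A test written with `sameCanonicalBits(0.0f / 0.0f, Float.NaN)` should behave the same on both architectures, whereas one written with `floatToRawIntBits` may not.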
Based on mccheah#4