Coiled
Schedule Python Jobs with Prefect and Coiled
Prefect makes it easy to write production workflows in Python. Getting started on a laptop usually takes just a few minutes.
Coiled makes it easy to deploy Prefect in the cloud. You might want to run a workflow, or a specific task within a workflow, on the cloud because:
- You want an always-on machine for long-running, or regularly scheduled, jobs
- You want to run close to your cloud-hosted data
- You want specific hardware (like GPUs, or lots of memory)
- You want many machines for running tasks in parallel
In this webinar, we deploy a Prefect workflow on the cloud with Coiled that processes a daily updated cloud dataset. This is a common pattern that shows up in fields like machine learning, finance, remote sensing, sports betting, and more.
Learn more about the Prefect+Coiled API: docs.coiled.io/user_guide/labs/prefect.html
Run this example yourself: docs.coiled.io/user_guide/labs/prefect-scheduled-jobs.html
384 views


Churn Through Cloud Files in Parallel
154 views · 7 months ago
People often want to run the same function over many files. However, processing files in cloud storage is often slow and expensive due to transferring cloud data in and out of AWS/GCP/Azure. In this webinar recording we’ll show how to run this “same function on many files” pattern on the cloud with Coiled, so you can run existing code faster and cheaper with minimal changes. We’ll also highligh...
Analyzing the National Water Model with Xarray, Dask, and Coiled
431 views · 9 months ago
Mean weekly water table depth for US counties from 1979-2020. Water table depth fluctuates seasonally, decreasing with more precipitation in the winter and increasing with more periods of drought in the summer. 1m is optimal for many types of agriculture. Blog post: docs.coiled.io/blog/coiled-xarray.html Code: github.com/coiled/examples/tree/main/national-water-model
Dask DataFrame is Fast Now
1.2K views · 9 months ago
In this webinar, Patrick Höfler and Rick Zamora show how recent development efforts have driven performance improvements in Dask DataFrame.
Key Moments
00:00 Intro
00:19 Dask DataFrame is fast now
02:06 Historical pain points
03:51 PyArrow-backed strings in Dask
06:04 Demo: PyArrow strings
08:53 Demo: Task-based shuffling is slow
11:11 Better performance with P2P shuffling
16:29 Sub-optimal que...
Spark, Dask, DuckDB, Polars: TPC-H Benchmarks at Scale
7K views · 10 months ago
We run the common TPC-H benchmark suite at 10 GB, 100 GB, 1 TB, and 10 TB scale on the cloud and on a local machine, and compare performance for common large dataframe libraries. No tool does universally well. We look at common bottlenecks and compare performance between the different systems. This talk was originally given at PyData NYC 2023. These results are preliminary, and come from only a couple ...
How do I Set Up Coiled?
359 views · 11 months ago
Set up Coiled to run Dask or other cloud processing APIs easily:
1. Create an account
2. Register an API token
3. Connect to your cloud
00:00 Introduction
00:34 pip install coiled
00:51 Authenticate
01:25 Connect your Cloud
03:48 Add a Region
05:00 Hello, world!
06:25 Teams
07:11 Summary
Run Your Jupyter Notebooks in the Cloud
759 views · a year ago
When you're only processing 10-100GB of data, a hundred-worker cluster is probably overkill when a single, big VM will do. You can use Coiled notebooks to start a JupyterLab instance on any machine you’d like, whether that’s a better GPU or a single VM with hundreds of GBs of memory. Examples in our docs: docs.coiled.io/user_guide/usage/notebooks/index.html Get started with Coiled: coiled.io/st...
Coiled Overview
485 views · a year ago
Learn how to easily process data on the cloud with Coiled. This 15-minute video is an overview of many aspects of Coiled. For a more in-depth treatment, please consider the more topic-specific videos at youtube.com/@coiled
00:00 Introduction
01:14 API: CLI commands
02:41 API: Serverless Functions
03:40 API: Dask
06:25 API: Jupyter Notebooks
07:38 Management Dashboard
09:56 Architecture and Data Pri...
Run Python Scripts with Coiled Functions & Coiled Run
312 views · a year ago
Run a script or Python function in any cloud region on any hardware. Sometimes you don’t need a huge cluster for your workflows, and you just want to run your Python function on a VM in the cloud. In this webinar, we'll walk through these two APIs: Coiled Functions and Coiled Run. We'll see how to run a computation on a VM close to our data, train a PyTorch model on a GPU in the cloud, and scal...
Run Python Scripts in the Cloud with Coiled
765 views · a year ago
Sometimes you don’t need a huge cluster for your workflows, and you just want to run your Python function on a VM in the cloud. You might want to do this for a few reasons:
- You want a big machine
- You want a GPU
- You want to run close to your data
- You want to run the script many times while scaling out
With Coiled, you can run any Python function, script, or executable in your AWS or GCP account,...
How do I get my software onto cloud VMs? Automatic Package Synchronization with Coiled
152 views · a year ago
Getting your software onto cloud VMs is hard. Coiled makes it easy...mostly. This video talks about how Coiled manages software for Python development in the cloud, and methods to escape when things go wrong. More information available at docs.coiled.io/user_guide/software/ Blog posts: How many PEPs does it take to install a package? medium.com/coiled-hq/how-many-peps-does-it-take-to-install-a-...
Coiled Cluster Configuration
175 views · a year ago
Learn how to configure your Coiled resources, including selecting instance types, regions, and different hardware choices. Documentation at docs.coiled.io/user_guide/clusters/
More videos to help you set up Coiled:
czcams.com/video/QXql9O8kSPk/video.html
czcams.com/video/ukkOJPF2URY/video.html
czcams.com/video/eXP-YuERvi4/video.html
Get started with Coiled for free: coiled.io/start
Jupyter Notebooks with Coiled
345 views · a year ago
Jupyter notebooks on large VMs in the cloud using Coiled. This approach synchronizes your local packages and files, giving a smooth Big Laptop experience. Check out this blog post for more details: medium.com/coiled-hq/coiled-notebooks-d4577596ff4a
Key Moments
00:00 Intro
01:00 coiled notebook start
02:17 Cloud Notebook Starts
03:11 File sync
04:52 Summary
Scale Your Python Workloads with Dask ...
Dask Futures Tutorial: Parallelize Python Code with Dask
1.7K views · a year ago
In this lesson, we'll parallelize a custom Python workflow that scrapes, parses, and cleans data from Stack Overflow. We'll get to: - Learn how to do arbitrary task scheduling using the Dask Futures API - Utilize blocking and non-blocking distributed calculations Notebook here: github.com/coiled/dask-tutorial/blob/main/1-Parallelize-your-python-code_Futures_API.ipynb Tutorial repo: github.com/c...
Dask DataFrames Tutorial: Best practices for larger-than-memory dataframes
2.1K views · a year ago
Learn best practices for larger-than-memory dataframes. Investigate Uber/Lyft data and learn to do the following: - Manipulate Parquet files and optimize queries - Navigate inconvenient file sizes and data types - Tune Parquet storage, build features, and explore a challenging dataset with Pandas and Dask. Notebook here: github.com/coiled/dask-tutorial/blob/main/2-Get_better-at-dask-dataframes....
Databricks vs. Dask and Coiled
416 views · a year ago
Coiled Xarray Example
554 views · a year ago
Coiled Dashboard: Monitor Teams and Manage Costs Easily and Efficiently
190 views · a year ago
Dask + Pandas for Parallel ETL
1.2K views · a year ago
XGBoost and HyperParameter Optimization
869 views · a year ago
Dask Futures for General Parallelism
890 views · a year ago
Engineering a Technical Newsletter: A transparent analysis of the Coiled newsletter
57 views · a year ago
Six Coiled features for Dask users
434 views · a year ago
Dask Infrastructure with Coiled for Pangeo
378 views · a year ago
Dask on Single Machine with Coiled
378 views · a year ago
Dask and Optuna for Hyper Parameter Optimization
2.1K views · a year ago
Measuring the GIL | Does pandas release the GIL?
566 views · a year ago
High Performance Visualization | Parallel performance with Dask & Datashader
4.3K views · a year ago
Transforming Parquet Data at Scale on the Cloud with Dask & Coiled | NYC Taxi Uber/Lyft Data
479 views · a year ago
Scale Python with Dask and Coiled | Setting up a production environment in the cloud
1K views · a year ago

Comments

  • @edzme · a day ago

    Thanks for making this, Coiled seems to be what I'm looking for

  • @fida47 · 9 days ago

    Can someone share the dataset link? Where can I download the 10 CSV files of the NYC flights dataset?

  • @Andikan4U · 15 days ago

    Thank you

  • @FabioRBelotto · a month ago

    If I run Dask without importing the Client, does it not work on many workers?

  • @FabioRBelotto · a month ago

    The source was only one big Parquet file? Did Dask set the partitions by itself?

  • @FabioRBelotto · 2 months ago

    My main issue with Dask is the lack of community support (very different from pandas!)

  • @richerite · 2 months ago

    Great talk! What would you recommend for ingesting about 100-200GB of geospatial data on-premises?

  • @mohitparwani4235 · 3 months ago

    { "name": "CancelledError", "message": "('mul-floordiv-3770c7fe5e6231d62ed3d68e48276fbd', 0)", "stack": "--------------------------------------------------------------------------- CancelledError Traceback (most recent call last) File <timed eval>:2 File c:\\Users\\mohit.parwani\\.conda\\envs\\parApat\\Lib\\site-packages\\dask_expr\\_collection.py:476, in FrameBase.compute(self, fuse, **kwargs) 474 out = out.repartition(npartitions=1) 475 out = out.optimize(fuse=fuse) --> 476 return DaskMethodsMixin.compute(out, **kwargs) File c:\\Users\\mohit.parwani\\.conda\\envs\\parApat\\Lib\\site-packages\\dask\\base.py:375, in DaskMethodsMixin.compute(self, **kwargs) 351 def compute(self, **kwargs): 352 \"\"\"Compute this dask collection 353 354 This turns a lazy Dask collection into its in-memory equivalent. (...) 373 dask.compute 374 \"\"\" --> 375 (result,) = compute(self, traverse=False, **kwargs) 376 return result File c:\\Users\\mohit.parwani\\.conda\\envs\\parApat\\Lib\\site-packages\\dask\\base.py:661, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs) 658 postcomputes.append(x.__dask_postcompute__()) 660 with shorten_traceback(): --> 661 results = schedule(dsk, keys, **kwargs) 663 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)]) File c:\\Users\\mohit.parwani\\.conda\\envs\\parApat\\Lib\\site-packages\\distributed\\client.py:2235, in Client._gather(self, futures, errors, direct, local_worker) 2233 else: 2234 raise exception.with_traceback(traceback) -> 2235 raise exc 2236 if errors == \"skip\": 2237 bad_keys.add(key) CancelledError: ('mul-floordiv-3770c7fe5e6231d62ed3d68e48276fbd', 0)" } I'm getting this error when i use client can someone please help with any possible solution i definitely need that. please!

  • @as978 · 3 months ago

    So happy to see this. Better late than never. Hopefully Dask gets the popularity it deserves and becomes a serious contender to Spark down the line.

  • @gemini_537 · 3 months ago

    Gemini 1.5 Pro: This video is an introduction to Dask DataFrames, and it covers when to use them, how to use them, and performance tips. The video explains that pandas is great for tabular data sets that fit into memory, but Dask is useful for working with data sets that are larger than your machine can handle. Dask can cut up your big data set into smaller bits and execute those smaller parts in parallel. Here are the key points covered in the video:
    - When to use Dask DataFrames: You should use Dask DataFrames if your data doesn't fit into memory and your computations are complex. Pandas might run into a memory error if the data is too large, but Dask can handle those types of large-scale computations comfortably.
    - Dask DataFrames vs Pandas DataFrames: Dask DataFrames are similar to Pandas DataFrames and implement a well-used portion of the Pandas API. This means that a lot of Dask DataFrames code will look and feel pretty familiar to Pandas users. However, there are some key differences. For instance, unlike Pandas DataFrames, Dask DataFrames are lazy, meaning they only create the task graph (a recipe or road map) to get to the final result but don't actually execute it until you specifically tell Dask to do so by calling compute.
    - Working with Partitions: Dask DataFrames are cut up into small bits, which are partitions, and each partition is actually just a Pandas DataFrame. This means you can perform Pandas operations on these partitions.
    - Performance tips: The video also covers performance tips, such as when to call compute. It is recommended to call compute when you want to combine computations into a single task graph. This is because the task graphs for these results have been merged, which means that Dask only needs to read the data from the CSV file once instead of twice.
    The video concludes by mentioning that this is module two of the introduction to Dask tutorial and the next module will cover processing array data with Dask Arrays.

  • @zapy422 · 4 months ago

    How does this setup solve dependencies for the Python code?

    • @MatthewRocklin · 4 months ago

      We scrape the local environment for package versions, move those to the target architecture, use mamba to solve and fill in any missing pieces, then we download the new packages on the fly onto each machine. It all happens seamlessly in the background. Users don't need to care about this detail (other than that it works)

  • @maksimhajiyev7857 · 5 months ago

    The problem is that in fact Rust-based tooling actually wins, and all the paid promotions just suck. The actual reason why Rust-based tooling is sort of suppressed is very simple: hyperscalers (big cloud tech) earn a lot of money, and if things are faster there are no huge bills for your Spark clusters 😊)). I was playing with Rust and huge datasets myself, without external benchmarks, because I don't trust all this market stuff. Rust-based EDA is maybe witchcraft, but this thing runs like a beast. Try it yourself, guys, with huge datasets.

  • @carlostph · 5 months ago

    When you say "now", from what version are we talking about? To future-proof the video.

  • @manojjoshi4321 · 6 months ago

    It's a great introduction with very cool and easy-to-follow illustrations. Great job!

  • @kokizzu · 6 months ago

    Clickhouse ftw

  • @giselleandreaulloadelarosa1869

    Would you please share a link to the GitHub?

  • @henrywittler5046 · 7 months ago

    Great work 🙂 Dask will help many people solve computational data analysis issues

  • @snowaIker · 7 months ago

    How does delayed get around the GIL?

  • @wayne7936 · 7 months ago

    This is such a clear, simple, yet extremely powerful introduction. Alright, you convinced me to try Coiled again.

    • @Coiled · 7 months ago

      Achievement unlocked! If you tried out Coiled more than a year ago, then it's definitely worth trying again. Admittedly, the product was kinda bad early on. Now it is quite delightful.

  • @ravishmahajan9314 · 7 months ago

    But DuckDB is good if your data fits on one single machine. The benchmarks show a different story when data is distributed. What about that?

  • @henrywittler5046 · 8 months ago

    Thanks for this tutorial and the other material at Dask and Coiled, will help heaps in a large data project 🙂

  • @taylorpaskett3703 · 8 months ago

    What software did you use for generating / displaying your plots? It looked really nice

    • @taylorpaskett3703 · 8 months ago

      Never mind, if I had just kept watching: you showed the GitHub repo, where it says Ibis and Altair. Thanks!

  • @randywilliams7696 · 8 months ago

    Great video! Recently switched from Dask to Duckdb on my ~1TB workloads, interesting to see some of the same issues I found brought up here. One gotcha I've found is that it is REALLY easy to blunder your way into making non-performant queries in dask (things that end up shuffling, partitioning, etc. a lot behind the scenes). It was more straightforward for my use case to write performant SQL queries for duckdb since that is much more of a common, solved problem. The scale-out feature of Dask and Spark is interesting too, as we are considering the merits of a natively clustered solution vs just breaking up our queries into chunks that can fit on multiple single instances for duckdb.

    • @MatthewRocklin · 8 months ago

      Yup. Totally agreed. The query optimization in Dask Dataframe should handle what you ran into historically. The problem wasn't unique to you :)

    • @ravishmahajan9314 · 7 months ago

      But what about distributed databases? Is DuckDB able to query distributed databases? Is this technology replacing the Spark framework?

  • @rjv · 8 months ago

    Such a good video! So many good insights clearly communicated with proper data. Also love the interfaces you've built, very meaningful, clean and minimalistic. Have you got comparison benchmarks where cloud cost is the only constraint and the number of machines or their size and type (GPU machines with cudf) is not restricted?

  • @mooncop · 10 months ago

    you are most welcome (suffered well) worth it for the duck

  • @bbbbbbao · 10 months ago

    It's not clear to me if you can use autoscaling with Coiled.

    • @Coiled · 10 months ago

      You can use autoscaling with Coiled. See the `coiled.Cluster.adapt` method.

  • @o0o0oo00oo00 · 10 months ago

    I don't see DuckDB and Polars kick Spark/Dask ass at the 10GB level in my practical usage. 😅 We can't always trust TPC-H benchmarks.

  • @andrewm4894 · 10 months ago

    Great talk, thanks

  • @Amapramaadhy · 11 months ago

    Some people were meant to teach, and Matt is one of them! One piece of feedback: I know you have covered it elsewhere, but it might be helpful to talk about the graphs (like what a yellow vs red block means). You have them up on the screen; they must be serving some purpose. Again, brilliant presentation.

  • @kamranpersianable · a year ago

    Thanks, this is amazing! I have tried integrating Optuna hyperparameter search with Dask and it works great, but I have noticed that if I increase the number of iterations, at some point my system crashes due to insufficient memory. From what I can see, Dask keeps a copy of each iteration, so it ends up consuming more memory than needed; is there any way I can release the memory after each iteration?

    • @Coiled · a year ago

      The copy that Dask keeps is just the result of the objective function (scores, metrics). This should be pretty lightweight. That's not to say that there isn't some memory leak somewhere (XGBoost, Pandas, ...). If you're able to provide a reproducer to a Dask issue tracker that would be welcome. Alternatively if you run on Coiled infrastructure there's lots of measurement tools there that get run automatically that could help to diagnose.

    • @kamranpersianable · a year ago

      @Coiled thanks, I will check further to see what is going wrong! From what I can see, for 500 iterations about 9GB of additional data accumulates in memory.

  • @ButchCassidyAndSundanceKid

    Does Dask Delayed use the GPU as well?

  • @UmmadikTas · a year ago

    I had an issue with parallelization and the random sampler for hyperparameter search. When I submit the optimize function in parallel, Optuna keeps repeating the same hyperparameters across all processes. I could not figure out how to reseed the sampler for different processes.

    • @Coiled · a year ago

      Are the different processes communicating hyperparameters through a central Optuna Storage object? This video shows using DaskStorage, which helps all of the Optuna search functions coordinate and share results with each other using Dask. Other ways to do this include using something like a database (although we think that Dask is easier).

  • @ButchCassidyAndSundanceKid

    What about Dask Bag and Dask Futures?

  • @irfams · a year ago

    Would you please share a link to the notebook?

  • @UmmadikTas · a year ago

    Thank you so much. This is very helpful with my research.

  • @chaitanyamadduri5826

    The video is very informative, and kudos to Richard for making it intuitive. Could you help me with the questions below?
    1. How can we perform a time series regression using Dask? I see we are breaking the huge dataset into chunks; how are we going to maintain the time continuity between the chunks?
    2. You have used Coiled clusters, and I believe these are external CPU clusters; how is Dask more powerful than PySpark in this case?
    3. Can Dask only be utilised for CPU execution, or can it be used for parallel GPU execution as well? Share your comments on this. Thanks in advance

    • @Coiled · a year ago

      Thanks for the questions! First, you can always post more detailed questions on the Dask Forum: dask.discourse.group/. For your question on time series regression, you may find this example helpful: examples.dask.org/applications/forecasting-with-prophet.html. If you're curious to learn more about the pros/cons of Dask vs. Spark, check out our blog post: www.coiled.io/blog/spark-vs-dask. You can use Dask (and Coiled!) with GPU-enabled machines; learn more in the Coiled docs docs.coiled.io/user_guide/clusters/gpu.html or the Dask documentation docs.dask.org/en/stable/gpu.html

  • @Lemuz90 · a year ago

    This looks great! I remember trying to use coiled jobs to do something like this a while ago.

    • @Coiled · a year ago

      Thank you! Let us know how you end up using this!

  • @orlandogarcia885 · a year ago

    What upcoming features does Coiled have planned?

    • @Coiled · a year ago

      We are working on lots of new things - check out Coiled Notebooks: czcams.com/video/mibhDHYun0M/video.html and our upcoming webinar about Coiled Functions and Jobs, which allow you to run any python function in the cloud: czcams.com/video/JuBmG39zLY8/video.html.

  • @thomasmoore3175 · a year ago

    Great stuff, Matt!

  • @bvenkateshx · a year ago

    I have a use case to read data from an Oracle table, split this into files, zip them, and move them to S3. Would Dask be a benefit or overhead for such a use case? (cx_Oracle is used. Currently using multiprocessing on a 20-core server)

    • @Coiled · a year ago

      Thanks for the question! It's hard to answer without more details on the size of your data, but feel free to post your question on the Dask Forum dask.discourse.group/

  • @Coiled · a year ago

    Update: pandas 2.0 has been released! See www.coiled.io/blog/pyarrow-strings-in-dask-dataframes for the latest on PyArrow strings improvements.

  • @user-be4vx5by8p · a year ago

    Thank you very much for this useful information

  • @billyblackburn864 · a year ago

    The one at 15min is really nice... what is the cluster you're running it on?

  • @exeb1t_solopharm · a year ago

    Thank you very much! Great video series, keep up the good work!

  • @user-lx5gf4vd4c · a year ago

    Good video! Can you help me? Where can I find the notebook from this video?

  • @mikecmw8492 · a year ago

    This is a very good video. I have to ask because I am in the situation of setting up a Dask cluster that will be querying large weather datasets in AWS S3, and I have never done it. Do you have a video on setting up the cluster? I have not explored your channel yet... thx

  • @pieter5466 · a year ago

    33:00 Surprising that there aren't existing open-source solutions that support "marginal" arrays, so to speak... has this changed?

  • @francescos7361 · a year ago

    Thanks, interesting for oceanographic research.

  • @NajiShajarisales · a year ago

    Thanks for this video! I am not sure how it is beneficial to have Dask worker code inside the same process where the user code is called. After all, pinging the process that runs the user code does not need to happen often, and that way the GIL would not block the heartbeat from being communicated to the scheduler. Am I missing something here? Any pointer is appreciated.